Geography-independent mucosal microbiota alterations in primary sclerosing cholangitis persist after liver transplantation

1. Hypothesis: PSC is associated with the gut microbiome changes

1 Introduction

1.1 About

Pre_LTx vs Post_LTx vs Healthy analysis on merged data

Alpha diversity – Richness, Shannon, Simpson, and Pielou indices tested by linear fixed effect or mixed effect model (group effect, country effect, interaction effect)
Beta diversity – PERMANOVA (group effect, country effect, interaction effect), PCA visualization
Differential abundance testing:
- o Group effect – linDA + MaAsLin2 intersection
- o Country effect – taxa with significant interaction effect were excluded based on individual post-hoc analysis. Taxa had to have the same direction in both countries
ML models, trained and validated using out-of-sample bootstrap (500 resamplings):
o ENET
o kNN
o GBoost
o RF

Pairwise comparisons:

Pre_ltx vs Healthy – an effect of disease, liver cirrhosis and overall bad clinical condition

Pre_ltx vs post_ltx – an effect of transplantation + general improvement of the clinical state

Post_ltx vs Healthy – does LTx lead to „healthy microbiome”?

Report structure The analysis is divided into two segments: terminal ileum and colon. Each part begins with data filtering, where sequencing depth is assessed and nearZeroVar() is applied to remove taxa with no variance in the dataset. This is followed by a series of analytical steps:

Alpha diversity
Beta diversity
Univariate testing
Machine learning models training and validation

At the end of the analysis, a brief summary of the results is provided, as well as final plots that are reported in the original publication.

Importing libraries and custom functions built for this analysis

source("custom_functions.R")

1.2 Clinical characteristics

clinical_char <- read.xlsx("../../../results/clinical_characteristics.xlsx",
                           colNames = FALSE)
colnames(clinical_char) <- clinical_char[3,]
clinical_char <- clinical_char[-c(1,2,3),]
header_groups <- c(" " = 2, "HC" = 2, "pre_LTx" = 2, "post_LTx_non-rPSC"=2, "post_LTx_rPSC"=2)

knitr::kable(clinical_char, align="l", escape=FALSE,digits=3) %>% 
  kable_styling(bootstrap_options = c("hover", "striped", "responsive")) %>% 
  kable_styling(full_width = TRUE) %>%
  add_header_above(header_groups)

		HC		pre_LTx		post_LTx_non-rPSC		post_LTx_rPSC
	NA	Norway	Czech	Norway	Czech	Norway	Czech	Norway	Czech
4	F/M [n]	10/30	30/26	19/65	12/19	15/36	26/57	13	3/22
5	Age [years]	NA	50 (23;68)	40 (17;77)	35 (17;63)	49 (23;70)	48 (32;81)	NA	48 (26;81)
6	IBD yes/no [n]	-	-	66/18	25/6	31/7	72/11	11/2	24/1
7	Total bilirubin [µmol/l]	-	21 (3;68)	24	62 (9;500)	19.5 (11;49)	18 (5;132)	24 (13;52)	24 (6;148)
8	AST [µkat/l]	-	0.5 (0.3;1.9)	1.1 (0.2;8.6)	1.6 (0.5;4.0)	0.6 (0.2;3.2)	0.4 (0.2;1.7)	1.1 (0.3, 4.0)	0.5 (0.2;1.9)
9	ALT [µkat/l]	-	0.5 (0.3;1.7)	1.6 (0.5;8.6)	1.7 (0.5;5.2)	0.6 (0.1,4.1)	0.5 (0.3;4.0)	1.1 (0.2;4.5)	0.5 (0.3;3.5)
10	ALP [µkat/l]	-	1.3 (0.7;2.3)	4.2 (0.9;17.3)	6.0 (1.6;17.8)	1.3 (0.6;7.9)	1.4 (0.6;3.5)	3.7 (0.5;13.9)	2.3 (0.5;13.1)
11	GGT [µkat/l]	-	0.4 (0.2;5.0)	-	2.6 (0.6;15.2)	-	0.4 (0.1; 4.8)	-	1.4 (0.3;13.7)
12	INR [-]	-	1.1 (1.1;1.2)	-	1.1 (0.9;1.8)	-	1.1 (0.9;1.6)	-	1.1 (1.0;1.7)
13	Creatinine [µmol/l]	-	77 (50;115)	65 (39; 232)	60 (44;101)	84 (59; 159)	87 (49;248)	81 (65; 133)	85 (50;780)
14	Albumin [g/l]	-	50 (43;57)	41 (26;49)	38 (20;50)	43 (35;49)	45 (26;51)	38 (34; 50)	42 (26;50)
15	Fecal calprotectin [µg/g]	-	-	59 (1;2844)	123 (6;4513)	30 (1;1945)	275 (6;4821)	55 (10;832)	564 (14;4301)
16	NANCY_max [-]	-	-	-	2 (0;4)	-	2 (0;4)	-	2 (0;4)
17	eMayo [-]	-	-	-	1 (0;2)	-	1 (0;3)	-	1 (0;2)
18	Mayo_DAI [-]	-	-	-	2 (0;5)	-	1 (0;8)	-	2 (0;7)
19	Mayo_PSC risk score [-]	-	-	0.3 (;2.2;3.5)	0.3 (;2.2;3.5)	0.3 (;1.1;2.2)	-	1.0 (;1.4;3.1)	-
20	AOM_score [-]	-	-	1.9 (0.5;3.8)	2 (1.1;5.0)	1.9 (0.8;2.7)	-	2.2 (1.2;3.5)	-
21	APRI_score [-]	-	0.2 (0.1;1.5)	0.7 (0.1;5.5)	1.1 (0.3;14.3)	0.5 (0.2;2.3)	0.3 (0.1;1.5)	0.7 (0.2;3.6)	0.4 (0.1;1.4)
22	FIB-4_score [-]	-	0.8 (0.4;3.4)	1.1 (0.8;8.6)	1.6 (0.4;23.1)	1.5 (0.6;3.6)	1.1 (0.3; 5.0)	2.2 (0.5;5.8)	1.1 (0.5;3.1)
23	MELD_score [-]	-	7.8 (7.1;8.6)	-	8.2 (6.4;18.0)	-	8.0 (6.4;14.6)	-	7.8 (6.5;22.8)
24	Platelets [10^9/l]	-	267 (145;400)	265 (37; 712)	201 (20;477)	197 (101;434)	208 (41;442)	212 (69;409)	213 (83;416)

1.3 Data Import

Importing ASV, taxa and metadata tables for both Czech and Norway samples.

Czech

path = "../../../../data/analysis_ready_data/ikem/"
asv_tab_ikem <- as.data.frame(fread(file.path(path,"asv_table_ikem.csv"),
                                    check.names = FALSE))
taxa_tab_ikem <- as.data.frame(fread(file.path(path,"taxa_table_ikem.csv"),
                                     check.names = FALSE))
metadata_ikem <- as.data.frame(fread(file.path(path,"metadata_ikem.csv"),
                                     check.names = FALSE))

Norway

path = "../../../../data/analysis_ready_data/norway/"
asv_tab_norway <- as.data.frame(fread(file.path(path,"asv_table_norway.csv"),
                                    check.names = FALSE))
taxa_tab_norway <- as.data.frame(fread(file.path(path,"taxa_table_norway.csv"),
                                    check.names = FALSE))
metadata_norway <- as.data.frame(fread(file.path(path,"metadata_norway.csv"),
                                    check.names = FALSE))

1.4 Merging

Merging two countries based on the different matrices - Ileum, Colon.

Terminal ileum

ileum_data <- merging_data(asv_tab_1=asv_tab_ikem,
                           asv_tab_2=asv_tab_norway,
                           taxa_tab_1=taxa_tab_ikem,
                           taxa_tab_2=taxa_tab_norway,
                           metadata_1=metadata_ikem,
                           metadata_2=metadata_norway,
                           segment="TI",Q="Q1")

Removing 1498 ASV(s)

Removing 1834 ASV(s)

Merging at ASV level

Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment

ileum_asv_tab <- ileum_data[[1]]
ileum_taxa_tab <- ileum_data[[2]]
ileum_metadata <- ileum_data[[3]]

Reads statistics

summary(colSums(ileum_asv_tab[,-1]))

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   3441   16579   20972   24071   28473   85910

Colon

colon_data <- merging_data(asv_tab_1=asv_tab_ikem,
                           asv_tab_2=asv_tab_norway,
                           taxa_tab_1=taxa_tab_ikem,
                           taxa_tab_2=taxa_tab_norway,
                           metadata_1=metadata_ikem,
                           metadata_2=metadata_norway,
                           segment="colon",Q="Q1")

Removing 739 ASV(s)

Removing 266 ASV(s)

Merging at ASV level

Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment

colon_asv_tab <- colon_data[[1]]
colon_taxa_tab <- colon_data[[2]]
colon_metadata <- colon_data[[3]]

Reads statistics

summary(colSums(colon_asv_tab[,-1]))

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     63   19814   26676   30841   37553  136048

2 Data Analysis - Terminal ileum

segment="terminal_ileum"

2.1 Filtering

Rules:

sequencing depth > 10000
nearZeroVar() with default settings

Rarefaction Curve

path="../../../intermediate_files/rarecurves"
seq_depth_threshold <- 10000

ps <- construct_phyloseq(ileum_asv_tab,ileum_taxa_tab,ileum_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_ileum.Rdata"))

load(file.path(path,"rarefaction_ileum.Rdata"))

prare <- ggrarecurve(obj=rareres,
                      factorNames="Country",
                      indexNames=c("Observe")) + 
  theme_bw() +
  theme(axis.text=element_text(size=8), panel.grid=element_blank(),
        strip.background = element_rect(colour=NA,fill="grey"),
        strip.text.x = element_text(face="bold")) + 
  geom_vline(xintercept = seq_depth_threshold, 
             linetype="dashed", 
             color = "red") + 
  xlim(0, 20000)

The color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)

prare

Library size

read_counts(ileum_asv_tab, line = c(5000,10000))

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

2.1.1 Sequencing depth

data_filt <- seq_depth_filtering(ileum_asv_tab,
                                 ileum_taxa_tab,
                                 ileum_metadata,
                                 seq_depth_threshold = 10000)

Removing 131 ASV(s)

filt_ileum_asv_tab <- data_filt[[1]]; alpha_ileum_asv_tab <- filt_ileum_asv_tab
filt_ileum_taxa_tab <- data_filt[[2]]; alpha_ileum_taxa_tab <- filt_ileum_taxa_tab
filt_ileum_metadata <- data_filt[[3]]; alpha_ileum_metadata <- filt_ileum_metadata

seq_step <- dim(filt_ileum_asv_tab)[1]

Library size

read_counts(filt_ileum_asv_tab,line = c(5000,10000))

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

2.1.2 NearZeroVar

data_filt <- nearzerovar_filtering(filt_ileum_asv_tab, 
                                   filt_ileum_taxa_tab,
                                   filt_ileum_metadata)

filt_ileum_asv_tab <- data_filt[[1]]
filt_ileum_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_ileum_asv_tab)[1]

Library size

read_counts(filt_ileum_asv_tab,line = c(5000,10000))

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

2.1.3 Final Counts

final_counts_filtering(ileum_asv_tab,
                       filt_ileum_asv_tab,
                       filt_ileum_metadata,
                       seq_step, 0, nearzero_step) %>% `colnames<-`("Count")

                            Count
Raw data: ASVs               5132
Raw data: Samples             294
Sequencing depth filt: ASVs  5001
Prevalence filt: ASVs           0
NearZeroVar filt: ASVs        479
Filt data: ASVs               479
Filt data: Samples            281
Filt data: Patients           281
Filt data: Patients.1           0
Filtered samples               13
Matrices                       TI
healthy                        73
non-rPSC                        0
rPSC                            0
pre_ltx                        70
post_ltx                      138
ETOH                            0

2.2 Alpha diversity

Calculating Richness, Shannon, Simpson, Pielou indexes on raw (unfiltered) rarefied data. Samples with sequencing depth < 10000 were excluded. Testing by linear model.

path = "../../../results/merged_sites/main_analysis/Q1/alpha_diversity"

Calculation

# Construct MPSE object
alpha_ileum_metadata$Sample <- alpha_ileum_metadata$SampleID
ileum_mpse <- as.MPSE(construct_phyloseq(alpha_ileum_asv_tab,
                                         alpha_ileum_taxa_tab,
                                         alpha_ileum_metadata))

ileum_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)

# Calculate alpha diversity - rarefied counts
ileum_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)

alpha_div_plots <- list()

# preparing data frame
alpha_data <- data.frame(SampleID=ileum_mpse$Sample.x,
                         Observe=ileum_mpse$Observe,
                         Shannon=ileum_mpse$Shannon,
                         Simpson=ileum_mpse$Simpson,
                         Pielou=ileum_mpse$Pielou,
                         Group=ileum_mpse$Group,
                         Country=ileum_mpse$Country,
                         Patient=ileum_mpse$Patient)

# save the result
write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
          row.names = FALSE)

2.2.1 Plots

p_boxplot_alpha <- alpha_diversity_countries(alpha_data)

Using SampleID, Group, Country, Patient as id variables

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Using SampleID, Group, Country, Patient as id variables

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha

# see the results
p_boxplot_alpha

2.2.2 Linear Models

path = "../../../results/merged_sites/main_analysis/Q1/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))

Richness

# run model
results_model <- pairwise.lm(formula = "Observe ~ Group * Country",
                             factors=alpha_data$Group,
                             data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_observe <- results_model[[1]]
  results_model_observe_emeans <- results_model[[2]]
} else {
  results_model_observe <- results_model
  results_model_observe_emeans <- NA
}

# save the results
pc_observed <- list(); 
pc_observed[[segment]] <- results_model_observe

# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")

Raw results of linear model of richness estimation.
	Estimate	Std..Error	t.value	Pr…t..	p.adj	sig
healthy vs Grouppre_ltx	-53.642	11.742	-4.568	0.000	0.000	***
healthy , pre_ltx - CZ vs NO	11.219	10.488	1.070	0.287	0.361
healthy vs Grouppre_ltx:CountryNO	6.313	15.403	0.410	0.683	0.683
pre_ltx vs Grouppost_ltx	25.623	11.366	2.254	0.025	0.057
pre_ltx , post_ltx - CZ vs NO	17.533	12.615	1.390	0.166	0.249
pre_ltx vs Grouppost_ltx:CountryNO	-35.933	15.921	-2.257	0.025	0.057
healthy vs Grouppost_ltx	-28.019	9.237	-3.033	0.003	0.012	*
healthy , post_ltx - CZ vs NO	11.219	11.267	0.996	0.321	0.361
healthy vs Grouppost_ltx:CountryNO	-29.620	14.629	-2.025	0.044	0.080

knitr::kable(results_model_observe_emeans,digits = 3,
caption = "Raw results of independent country analysis")

Raw results of independent country analysis
contrast	Country	estimate	SE	df	t.ratio	p.value
post_ltx - pre_ltx	CZ	25.623	11.366	204	2.254	0.025
post_ltx - pre_ltx	NO	-10.310	11.148	204	-0.925	0.356
post_ltx - healthy	CZ	-28.019	9.237	207	-3.033	0.003
post_ltx - healthy	NO	-57.639	11.344	207	-5.081	0.000

Shannon

# run model
results_model <- pairwise.lm(formula = "Shannon ~ Group * Country",
                             factors=alpha_data$Group,
                             data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_shannon <- results_model[[1]]
  results_model_shannon_emeans <- results_model[[2]]
} else {
  results_model_shannon <- results_model
  results_model_shannon_emeans <- NA
}

# save the results
pc_shannon <- list(); 
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)

# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")

Raw results of linear model of Shannon estimation.
	Estimate	Std..Error	t.value	Pr…t..	p.adj	sig
healthy vs Grouppre_ltx	-0.505	0.159	-3.171	0.002	0.017	*
healthy , pre_ltx - CZ vs NO	0.007	0.142	0.046	0.963	0.963
healthy vs Grouppre_ltx:CountryNO	0.144	0.209	0.687	0.493	0.634
pre_ltx vs Grouppost_ltx	0.372	0.148	2.514	0.013	0.049	*
pre_ltx , post_ltx - CZ vs NO	0.150	0.164	0.915	0.361	0.542
pre_ltx vs Grouppost_ltx:CountryNO	-0.502	0.207	-2.423	0.016	0.049	*
healthy vs Grouppost_ltx	-0.134	0.111	-1.200	0.231	0.416
healthy , post_ltx - CZ vs NO	0.007	0.136	0.048	0.962	0.963
healthy vs Grouppost_ltx:CountryNO	-0.358	0.176	-2.031	0.044	0.098

knitr::kable(results_model_shannon_emeans,digits = 3,
caption = "Raw results of independent country analysis")

Raw results of independent country analysis
contrast	Country	estimate	SE	df	t.ratio	p.value
post_ltx - pre_ltx	CZ	0.372	0.148	204	2.514	0.013
post_ltx - pre_ltx	NO	-0.130	0.145	204	-0.898	0.370
post_ltx - healthy	CZ	-0.134	0.111	207	-1.200	0.231
post_ltx - healthy	NO	-0.492	0.137	207	-3.597	0.000

Simpson

# run model
results_model <- pairwise.lm(formula = "Simpson ~ Group * Country",
                                     factors=alpha_data$Group,
                                     data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_simpson <- results_model[[1]]
  results_model_simpson_emeans <- results_model[[2]]
} else {
  results_model_simpson <- results_model
  results_model_simpson_emeans <- NA
}


# save the results
pc_simpson <- list(); 
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)

# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")

Raw results of linear model of Simpson estimation.
	Estimate	Std..Error	t.value	Pr…t..	p.adj
healthy vs Grouppre_ltx	-0.046	0.019	-2.481	0.014	0.129
healthy , pre_ltx - CZ vs NO	-0.008	0.017	-0.466	0.642	0.692
healthy vs Grouppre_ltx:CountryNO	0.027	0.024	1.092	0.277	0.498
pre_ltx vs Grouppost_ltx	0.037	0.021	1.763	0.079	0.238
pre_ltx , post_ltx - CZ vs NO	0.019	0.023	0.811	0.418	0.627
pre_ltx vs Grouppost_ltx:CountryNO	-0.060	0.029	-2.033	0.043	0.195
healthy vs Grouppost_ltx	-0.009	0.016	-0.572	0.568	0.692
healthy , post_ltx - CZ vs NO	-0.008	0.020	-0.396	0.692	0.692
healthy vs Grouppost_ltx:CountryNO	-0.033	0.025	-1.303	0.194	0.436

knitr::kable(results_model_simpson_emeans,digits = 3,
caption = "Raw results of independent country analysis")

Raw results of independent country analysis
contrast	Country	estimate	SE	df	t.ratio	p.value
post_ltx - pre_ltx	CZ	0.037	0.021	204	1.763	0.079
post_ltx - pre_ltx	NO	-0.023	0.021	204	-1.106	0.270

Pielou

# run model
results_model <- pairwise.lm(formula = "Pielou ~ Group * Country",
                                     factors=alpha_data$Group,
                                     data=alpha_data)

# check interaction

if (!is.data.frame(results_model)){
  results_model_pielou <- results_model[[1]]
  results_model_pielou_emeans <- results_model[[2]]
} else {
  results_model_pielou <- results_model
  results_model_pielou_emeans <- NA
}

# save the results
pc_pielou <- list(); 
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)

# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")

Raw results of linear model of Pielou estimation.
	Estimate	Std..Error	t.value	Pr…t..	p.adj
healthy vs Grouppre_ltx	-0.040	0.024	-1.708	0.090	0.285
healthy , pre_ltx - CZ vs NO	-0.008	0.021	-0.394	0.694	0.943
healthy vs Grouppre_ltx:CountryNO	0.010	0.031	0.332	0.740	0.943
pre_ltx vs Grouppost_ltx	0.042	0.022	1.922	0.056	0.285
pre_ltx , post_ltx - CZ vs NO	0.002	0.024	0.082	0.935	0.943
pre_ltx vs Grouppost_ltx:CountryNO	-0.051	0.030	-1.677	0.095	0.285
healthy vs Grouppost_ltx	0.001	0.017	0.072	0.943	0.943
healthy , post_ltx - CZ vs NO	-0.008	0.020	-0.407	0.684	0.943
healthy vs Grouppost_ltx:CountryNO	-0.041	0.027	-1.525	0.129	0.290

knitr::kable(results_model_pielou_emeans,digits = 3,
caption = "Raw results of independent country analysis")

Raw results of independent country analysis
contrast	Country	estimate	SE	df	t.ratio	p.value
post_ltx - pre_ltx	CZ	0.042	0.022	204	1.922	0.056
post_ltx - pre_ltx	NO	-0.009	0.021	204	-0.435	0.664

2.2.3 Saving results

alpha_list <- list(
  Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
  Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
  Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
  Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
                   
write.xlsx(alpha_list, 
           file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))

2.3 Beta diversity

Calculating Aitchison distance (euclidean distance on clr-transformed data) at genus level (main analysis). In supplements, there are also bray-curtis and Jaccard distance at ASV and genus levels. Testing by PERMANOVA.

2.3.1 Main analysis

Genus level, Aitchison distance

level="genus"
path = "../../../results/merged_sites/main_analysis/Q1/beta_diversity"

pairwise_aitchison_raw <- list()
pca_plots_list <- list()

Aggregation, filtering

# Aggregation
genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level=level,
                             names=TRUE)
# Filtration
filt_data <- filtering_steps(genus_data[[1]],
                             genus_data[[2]],
                             ileum_metadata,
                             seq_depth_threshold=10000)

Removing 5 ASV(s)

filt_ileum_genus_tab <- filt_data[[1]]
filt_ileum_genus_taxa <- filt_data[[2]]
filt_ileum_metadata <- filt_data[[3]]

2.3.1.0.1 PERMANOVA

# prepare dataset
pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_ileum_metadata$Group,
                           covariate = filt_ileum_metadata$Country, 
                           sim.method = "robust.aitchison", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_ileum_metadata$Group,
                          covariate = filt_ileum_metadata$Country, 
                          interaction = TRUE, sim.method = "robust.aitchison",
                          p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)

# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")

PERMANOVA, GROUP separation
pairs	Df	SumsOfSqs	F.Model	R2	p.value	p.adjusted	sig
pre_ltx vs healthy	1	695.306	4.259	0.029	0.001	0.002	**
pre_ltx vs post_ltx	1	309.768	1.811	0.009	0.003	0.003	**
post_ltx vs healthy	1	914.409	5.454	0.025	0.001	0.002	**

knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")

PERMANOVA, COUNTRY separation
pairs	Df	SumsOfSqs	F.Model	R2	p.value	p.adjusted	sig
pre_ltx vs healthy , Country	1	482.074	2.953	0.020	0.001	0.001	***
pre_ltx vs post_ltx , Country	1	699.075	4.087	0.019	0.001	0.001	***
post_ltx vs healthy , Country	1	728.020	4.342	0.020	0.001	0.001	***

knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")

PERMANOVA, INTERACTION GROUP:Country
pairs	Df	SumsOfSqs	F.Model	R2	p.value	p.adjusted	sig
pre_ltx vs healthy : Country	1	237.598	1.460	0.010	0.010	0.030	*
pre_ltx vs post_ltx : Country	1	237.265	1.390	0.007	0.021	0.032	*
post_ltx vs healthy : Country	1	202.888	1.211	0.006	0.106	0.106

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.1]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2)
  print(result_list)
}

$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Model     1    275.1 0.02844 1.7272  0.003      0.003
Residual 59   9396.0 0.97156                         
Total    60   9671.1 1.00000                         

$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs     R2      F Pr(>F) p.adjusted
Model     1    657.8 0.0474 3.9806  0.001  0.0013333
Residual 80  13220.8 0.9526                         
Total    81  13878.7 1.0000                         

$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2     F Pr(>F) p.adjusted
Model     1    362.6 0.03086 2.165  0.001  0.0013333
Residual 68  11387.2 0.96914                        
Total    69  11749.8 1.00000                        

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Model     1    357.1 0.03082 2.2579  0.001  0.0013333
Residual 71  11229.6 0.96918                         
Total    72  11586.7 1.00000                         

$pre_ltx_post_ltx_CZ
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs     R2      F Pr(>F) p.adjusted
Model      1    184.3 0.0085 1.0629  0.314      0.314
Residual 124  21497.3 0.9915                         
Total    125  21681.6 1.0000                         

$pre_ltx_post_ltx_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Model     1    362.8 0.02649 2.1767  0.001  0.0013333
Residual 80  13332.7 0.97351                         
Total    81  13695.4 1.00000                         

$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2     F Pr(>F) p.adjusted
Model     1    362.6 0.03086 2.165  0.001  0.0013333
Residual 68  11387.2 0.96914                        
Total    69  11749.8 1.00000                        

$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Model      1    573.8 0.02389 3.3288  0.001  0.0013333
Residual 136  23442.8 0.97611                         
Total    137  24016.6 1.00000

2.3.1.0.2 Plots

p <- pca_plot_custom(filt_ileum_genus_tab,
                                 filt_ileum_genus_taxa,
                                 filt_ileum_metadata,
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend= FALSE)

# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p

# see the results
p

2.3.1.1 Saving results

write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]], 
           file = file.path(path,
           paste0("beta_diversity_results_", segment,".xlsx")))

2.4 Univariate Analysis

Using two DAA tools - linDA and Maaslin. Only the intersection of sets selected by each tool was shown to minimize false positives. Taxa with a significant interaction effect were excluded based on individual post-hoc analysis of the Czech and Norwegian cohorts. Only taxa with significant log fold change that showed the same change direction in both countries were retained.

Main analysis is performed at genus level. In supplements, analysis is performed at ASV and phylum levels.

2.4.1 Main analysis

level="genus"

# needed paths
path = "../../../results/merged_sites/main_analysis/Q1/univariate_analysis"
path_maaslin=file.path("../../../intermediate_files/maaslin/Q1",level)

# variables
raw_linda_results_genus <- list();
raw_linda_results_genus[[segment]] <- list()
linda_results_genus <- list(); 
linda_results_genus[[segment]] <- list()

# country and interaction problems
list_country_union <- list()
list_intersections <- list()
list_venns <- list()
uni_statistics <- list()

# workbook for final df
wb <- createWorkbook()

# PSC effect
psc_effect <- list()

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

ileum_genus_asv_taxa_tab <- create_asv_taxa_table(ileum_genus_tab,
                                                  ileum_genus_taxa_tab)

2.4.1.1 pre_ltx vs healthy

We can assume, that the differentially abundant taxa between these two groups are caused by PSC disease and other factors, like liver cirrhosis and overall bad clinical condition.

group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

2.4.1.1.1 linDA

# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")

Removing 84 ASV(s)

Removing 10 ASV(s)

filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')

0  features are filtered!
The filtered data has  143  samples and  184  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# volcano plots
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano

2.4.1.1.2 MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("NO vs CZ")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

2.4.1.1.3 Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

2.4.1.1.4 Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

NULL

## results for czech cohort
list_interaction_significant[[2]]

[1] NA

## results for norwegian cohort
list_interaction_significant[[3]]

[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

2.4.1.1.5 Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

2.4.1.2 pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

2.4.1.2.1 linDA

# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")

Removing 46 ASV(s)

Removing 6 ASV(s)

filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')

0  features are filtered!
The filtered data has  208  samples and  172  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

volcano

2.4.1.2.2 MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

2.4.1.2.3 Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

2.4.1.2.4 Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

NULL

## results for czech cohort
list_interaction_significant[[2]]

[1] NA

## results for norwegian cohort
list_interaction_significant[[3]]

[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

2.4.1.2.5 Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

2.4.1.3 post_ltx vs healthy

We can assume, that the differentially abundant taxa between these two groups remain because of the persistent gut MB alteration due to the PSC disease. Also, transplantation itself could add to this difference, as it has definitely strong impact on microbial composition.

group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

2.4.1.3.1 linDA

# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")

Removing 45 ASV(s)

Removing 2 ASV(s)

filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')

0  features are filtered!
The filtered data has  211  samples and  180  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano

2.4.1.3.2 MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

2.4.1.3.3 Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

2.4.1.3.4 Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

NULL

## results for czech cohort
list_interaction_significant[[2]]

[1] NA

## results for norwegian cohort
list_interaction_significant[[3]]

[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

2.4.1.3.5 Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

2.4.1.4 Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab) 
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
                                      ileum_taxa_tab) + xlab("") + ylab("")

min_clr -1.767585 
max_clr 6.908912 
min_log -5.062147 
max_log 4.892498

dotheatmap_linda

Horizontal bar plot

p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))

Using SeqID as id variables

p_prevalence_final <- ggarrange(p_prevalence,
                                ggplot() + theme_minimal(),
                                nrow = 2,heights = c(1,0.09))

p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),
               p_prevalence_final,
               ncol=2,widths = c(1,0.3))

Warning: Removed 80 rows containing missing values or values outside the scale range
(`geom_point()`).

Warning: Removed 80 rows containing missing values or values outside the scale range
(`geom_text()`).

dot_heatmap_ileum <- p

2.4.1.5 PSC effect

To get only PSC associated taxa, we can intersect differentially abundant taxa from the previous analyses (see diagram below).

pre_LTx vs HC and Post_LTx vs HC intersection

A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)

psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
  
# see the results
psc_effect[[paste(segment,level)]]

                           SeqID
10                 Enterorhabdus
16                   Barnesiella
19                 Butyricimonas
20                   Odoribacter
27                     Alistipes
29               Parabacteroides
41                    Holdemania
64                   Coprococcus
68              Fusicatenibacter
72             Lachnoclostridium
74  Lachnospiraceae_FCS020_group
105                Oscillibacter
118             Faecalibacterium
155                  Pseudomonas
164                 Enterococcus
                                                                                                        Taxonomy
10     k__Bacteria;p__Actinobacteriota;c__Coriobacteriia;o__Coriobacteriales;f__Eggerthellaceae;g__Enterorhabdus
16                 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Barnesiellaceae;g__Barnesiella
19                k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Butyricimonas
20                  k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Odoribacter
27                     k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes
29              k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Tannerellaceae;g__Parabacteroides
41               k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Holdemania
64                   k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus
68              k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Fusicatenibacter
72             k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium
74  k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_FCS020_group
105              k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Oscillibacter
118            k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium
155   k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas
164                   k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;g__Enterococcus
    log2FoldChange      p_value         padj
10       -1.050549 1.087023e-02 4.545734e-02
16       -3.177161 1.263665e-03 7.750481e-03
19       -2.900425 6.571752e-05 9.301556e-04
20       -2.971657 3.696980e-05 6.184040e-04
27       -2.559742 5.731016e-04 4.336862e-03
29       -3.418185 1.366753e-05 2.794251e-04
41       -1.213660 1.082333e-02 4.545734e-02
64       -3.868534 6.688508e-07 3.076714e-05
68       -2.864295 5.224510e-04 4.336862e-03
72       -1.781011 1.555333e-03 9.231654e-03
74       -2.231945 8.971276e-04 6.113758e-03
105      -1.970465 2.772265e-03 1.416936e-02
118      -3.360183 8.947299e-06 2.351861e-04
155       1.573079 8.758249e-03 4.028794e-02
164       4.892498 4.554602e-10 8.380467e-08

2.4.1.6 Saving results

# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
             overwrite = TRUE)

# PSC effect
write.xlsx(psc_effect[[paste(segment,level)]],file.path(path,paste0("psc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa

write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("significant_taxa_",segment,".xlsx")))

2.5 Machine learning

Binary classification was performed using four independent models: Elastic Net (ENET), RF (RF), Gradient Boosting (GBoost), K-nearest Neighbors (kNN). Training and validation were performed via bootstrapping (N=500) on clr-transformed data. Model performance metrics expressed as AUC were calculated based on an out-of-sample principle.

The supplementary analysis contains training and validating at clr-transformed data at ASV level and relative-abundance data at ASV and genus level.

path = "../../../results/merged_sites/main_analysis/Q1/models"

2.5.1 ENET

model="enet"

2.5.1.1 ASV level

level="ASV"

2.5.1.1.1 pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")

Removing 1598 ASV(s)

Removing 146 ASV(s)

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ <- list()
models_cm <- list()
betas <- list()
roc_cs <- list()

models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()

                                     [,1]
alpha                           0.4000000
lambda                          0.0256910
auc                             1.0000000
auc_czech                       1.0000000
auc_no                          1.0000000
auc_optimism_corrected          0.9562938
auc_optimism_corrected_CIL      0.8940903
auc_optimism_corrected_CIU      0.9955143
accuracy                        1.0000000
accuracy_czech                        NaN
accuracy_no                     1.0000000
accuracy_optimism_corrected     0.8824810
accuracy_optimism_corrected_CIL 0.7808636
accuracy_optimism_corrected_CIU 0.9600000

enet_model$conf_matrices

$original
    Predicted
True  0  1
   0 73  0
   1  0 70

$czech
    Predicted
True  0  1
   0 37  0
   1  0 24

$no
    Predicted
True  0  1
   0 36  0
   1  0 46

enet_model$plot

`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

roc_c

2.5.1.1.2 pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

Removing 979 ASV(s)

Removing 68 ASV(s)

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()

                                     [,1]
alpha                           0.2000000
lambda                          0.1292328
auc                             0.9967909
auc_czech                       0.9979575
auc_no                          0.9969807
auc_optimism_corrected          0.8645187
auc_optimism_corrected_CIL      0.7852767
auc_optimism_corrected_CIU      0.9318161
accuracy                        0.9278846
accuracy_czech                        NaN
accuracy_no                     0.9512195
accuracy_optimism_corrected     0.8220597
accuracy_optimism_corrected_CIL 0.7333333
accuracy_optimism_corrected_CIU 0.8924940

enet_model$conf_matrices

$original
    Predicted
True   0   1
   0 138   0
   1  15  55

$czech
    Predicted
True   0   1
   0 102   0
   1  11  13

$no
    Predicted
True  0  1
   0 36  0
   1  4 42

enet_model$plot

`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

roc_c

2.5.1.1.3 post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

Removing 641 ASV(s)

Removing 104 ASV(s)

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()

                                      [,1]
alpha                           0.20000000
lambda                          0.04799701
auc                             1.00000000
auc_czech                       1.00000000
auc_no                          1.00000000
auc_optimism_corrected          0.95799961
auc_optimism_corrected_CIL      0.89999175
auc_optimism_corrected_CIU      0.98977781
accuracy                        1.00000000
accuracy_czech                         NaN
accuracy_no                     1.00000000
accuracy_optimism_corrected     0.89383497
accuracy_optimism_corrected_CIL 0.82118019
accuracy_optimism_corrected_CIU 0.96028870

enet_model$conf_matrices

$original
    Predicted
True   0   1
   0  73   0
   1   0 138

$czech
    Predicted
True   0   1
   0  37   0
   1   0 102

$no
    Predicted
True  0  1
   0 36  0
   1  0 36

enet_model$plot

`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

roc_c

2.5.1.2 Genus level

level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

2.5.1.2.1 pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

Removing 84 ASV(s)

Removing 10 ASV(s)

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=500,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()

                                      [,1]
alpha                           0.20000000
lambda                          0.03984448
auc                             1.00000000
auc_czech                       1.00000000
auc_no                          1.00000000
auc_optimism_corrected          0.96582239
auc_optimism_corrected_CIL      0.91575521
auc_optimism_corrected_CIU      0.99568966
accuracy                        1.00000000
accuracy_czech                         NaN
accuracy_no                     1.00000000
accuracy_optimism_corrected     0.89754798
accuracy_optimism_corrected_CIL 0.80769231
accuracy_optimism_corrected_CIU 0.97141139

enet_model$conf_matrices

$original
    Predicted
True  0  1
   0 73  0
   1  0 70

$czech
    Predicted
True  0  1
   0 37  0
   1  0 24

$no
    Predicted
True  0  1
   0 36  0
   1  0 46

enet_model$plot

`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))

[1] "p_value: 0"

roc_c

2.5.1.2.2 pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

Removing 46 ASV(s)

Removing 6 ASV(s)

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()

                                     [,1]
alpha                           0.0000000
lambda                          0.6674012
auc                             0.9726708
auc_czech                       0.9616013
auc_no                          0.9800725
auc_optimism_corrected          0.8344709
auc_optimism_corrected_CIL      0.7487803
auc_optimism_corrected_CIU      0.9154473
accuracy                        0.8653846
accuracy_czech                        NaN
accuracy_no                     0.8780488
accuracy_optimism_corrected     0.7788698
accuracy_optimism_corrected_CIL 0.6973684
accuracy_optimism_corrected_CIU 0.8516141

enet_model$conf_matrices

$original
    Predicted
True   0   1
   0 136   2
   1  26  44

$czech
    Predicted
True   0   1
   0 101   1
   1  17   7

$no
    Predicted
True  0  1
   0 35  1
   1  9 37

enet_model$plot

`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))

[1] "p_value: 0"

roc_c

2.5.1.2.3 post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,group, 
                                     usage="ml_clr")

Removing 45 ASV(s)

Removing 2 ASV(s)

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()

                                      [,1]
alpha                           0.00000000
lambda                          0.07097722
auc                             1.00000000
auc_czech                       1.00000000
auc_no                          1.00000000
auc_optimism_corrected          0.96566611
auc_optimism_corrected_CIL      0.92200341
auc_optimism_corrected_CIU      0.99386451
accuracy                        1.00000000
accuracy_czech                         NaN
accuracy_no                     1.00000000
accuracy_optimism_corrected     0.90031834
accuracy_optimism_corrected_CIL 0.82800926
accuracy_optimism_corrected_CIU 0.96054545

enet_model$conf_matrices

$original
    Predicted
True   0   1
   0  73   0
   1   0 138

$czech
    Predicted
True   0   1
   0  37   0
   1   0 102

$no
    Predicted
True  0  1
   0 36  0
   1  0 36

enet_model$plot

`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))

[1] "p_value: 0"

roc_c

2.5.1.3 Saving results

models_summ_df_ileum <- do.call(rbind, 
  models_summ[grep(segment,names(models_summ),value = TRUE)])

write.csv(models_summ_df_ileum,file.path(path,paste0("elastic_net_",segment,".csv")))

2.6 Results overview

2.6.0.1 Alpha diversity

knitr::kable(pc_observed[[segment]],
             digits = 3,
             caption = "Results of linear model testing ASV Richness")

Results of linear model testing ASV Richness
	Estimate	Std..Error	t.value	Pr…t..	p.adj	sig
healthy vs Grouppre_ltx	-53.642	11.742	-4.568	0.000	0.000	***
healthy , pre_ltx - CZ vs NO	11.219	10.488	1.070	0.287	0.361
healthy vs Grouppre_ltx:CountryNO	6.313	15.403	0.410	0.683	0.683
pre_ltx vs Grouppost_ltx	25.623	11.366	2.254	0.025	0.057
pre_ltx , post_ltx - CZ vs NO	17.533	12.615	1.390	0.166	0.249
pre_ltx vs Grouppost_ltx:CountryNO	-35.933	15.921	-2.257	0.025	0.057
healthy vs Grouppost_ltx	-28.019	9.237	-3.033	0.003	0.012	*
healthy , post_ltx - CZ vs NO	11.219	11.267	0.996	0.321	0.361
healthy vs Grouppost_ltx:CountryNO	-29.620	14.629	-2.025	0.044	0.080

knitr::kable(pc_shannon[[segment]],
             digits = 3,
             caption = "Results of linear model testing Shannon index")

Results of linear model testing Shannon index
	Estimate	Std..Error	t.value	Pr…t..	p.adj	sig
healthy vs Grouppre_ltx	-0.505	0.159	-3.171	0.002	0.017	*
healthy , pre_ltx - CZ vs NO	0.007	0.142	0.046	0.963	0.963
healthy vs Grouppre_ltx:CountryNO	0.144	0.209	0.687	0.493	0.634
pre_ltx vs Grouppost_ltx	0.372	0.148	2.514	0.013	0.049	*
pre_ltx , post_ltx - CZ vs NO	0.150	0.164	0.915	0.361	0.542
pre_ltx vs Grouppost_ltx:CountryNO	-0.502	0.207	-2.423	0.016	0.049	*
healthy vs Grouppost_ltx	-0.134	0.111	-1.200	0.231	0.416
healthy , post_ltx - CZ vs NO	0.007	0.136	0.048	0.962	0.963
healthy vs Grouppost_ltx:CountryNO	-0.358	0.176	-2.031	0.044	0.098

knitr::kable(pc_simpson[[segment]],
             digits = 3,
             caption = "Results of linear model testing Simpson index")

Results of linear model testing Simpson index
	Estimate	Std..Error	t.value	Pr…t..	p.adj
healthy vs Grouppre_ltx	-0.046	0.019	-2.481	0.014	0.129
healthy , pre_ltx - CZ vs NO	-0.008	0.017	-0.466	0.642	0.692
healthy vs Grouppre_ltx:CountryNO	0.027	0.024	1.092	0.277	0.498
pre_ltx vs Grouppost_ltx	0.037	0.021	1.763	0.079	0.238
pre_ltx , post_ltx - CZ vs NO	0.019	0.023	0.811	0.418	0.627
pre_ltx vs Grouppost_ltx:CountryNO	-0.060	0.029	-2.033	0.043	0.195
healthy vs Grouppost_ltx	-0.009	0.016	-0.572	0.568	0.692
healthy , post_ltx - CZ vs NO	-0.008	0.020	-0.396	0.692	0.692
healthy vs Grouppost_ltx:CountryNO	-0.033	0.025	-1.303	0.194	0.436

knitr::kable(pc_pielou[[segment]],
             digits = 3,
             caption = "Results of linear model testing Pielou index")

Results of linear model testing Pielou index
	Estimate	Std..Error	t.value	Pr…t..	p.adj
healthy vs Grouppre_ltx	-0.040	0.024	-1.708	0.090	0.285
healthy , pre_ltx - CZ vs NO	-0.008	0.021	-0.394	0.694	0.943
healthy vs Grouppre_ltx:CountryNO	0.010	0.031	0.332	0.740	0.943
pre_ltx vs Grouppost_ltx	0.042	0.022	1.922	0.056	0.285
pre_ltx , post_ltx - CZ vs NO	0.002	0.024	0.082	0.935	0.943
pre_ltx vs Grouppost_ltx:CountryNO	-0.051	0.030	-1.677	0.095	0.285
healthy vs Grouppost_ltx	0.001	0.017	0.072	0.943	0.943
healthy , post_ltx - CZ vs NO	-0.008	0.020	-0.407	0.684	0.943
healthy vs Grouppost_ltx:CountryNO	-0.041	0.027	-1.525	0.129	0.290

Plots

alpha_div_plots[[paste(segment,"Country")]]

2.6.0.2 Beta diversity

Main analysis

knitr::kable(pairwise_aitchison_raw[[paste("genus", segment)]],
             digits = 3,
             caption = "Results of PERMANOVA - robust aitchison distance")

Results of PERMANOVA - robust aitchison distance
pairs	Df	SumsOfSqs	F.Model	R2	p.value	p.adjusted	sig
pre_ltx vs healthy	1	695.306	4.259	0.029	0.001	0.002	**
pre_ltx vs post_ltx	1	309.768	1.811	0.009	0.003	0.003	**
post_ltx vs healthy	1	914.409	5.454	0.025	0.001	0.002	**
pre_ltx vs healthy , Country	1	482.074	2.953	0.020	0.001	0.001	***
pre_ltx vs post_ltx , Country	1	699.075	4.087	0.019	0.001	0.001	***
post_ltx vs healthy , Country	1	728.020	4.342	0.020	0.001	0.001	***
pre_ltx vs healthy : Country	1	237.598	1.460	0.010	0.010	0.030	*
pre_ltx vs post_ltx : Country	1	237.265	1.390	0.007	0.021	0.032	*
post_ltx vs healthy : Country	1	202.888	1.211	0.006	0.106	0.106

PCA

pca_plots_list[[paste(segment,"genus custom")]]

2.6.0.3 Univariate analysis

Number of significant taxa

knitr::kable(cbind(as.data.frame(lapply(list_intersections,nrow)),
      as.data.frame(lapply(psc_effect,nrow))) %>% t() %>% 
  `colnames<-`("Count") %>% 
  `rownames<-`(c(names(list_intersections),"PSC effect Genus")),caption="Number of significant taxa")

Number of significant taxa
	Count
terminal_ileum genus healthy vs pre_ltx	25
terminal_ileum genus pre_ltx vs post_ltx	4
terminal_ileum genus healthy vs post_ltx	38
PSC effect Genus	15

2.6.0.4 Machine learning

Main models

knitr::kable(models_summ_df_ileum %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
             digits=2,caption="Elastic net results")

Elastic net results
	alpha	lambda	auc_optimism_corrected	auc_optimism_corrected_CIL	auc_optimism_corrected_CIU
pre_ltx vs healthy ASV terminal_ileum	0.4	0.03	0.96	0.89	1.00
pre_ltx vs post_ltx ASV terminal_ileum	0.2	0.13	0.86	0.79	0.93
post_ltx vs healthy ASV terminal_ileum	0.2	0.05	0.96	0.90	0.99
pre_ltx vs healthy genus terminal_ileum	0.2	0.04	0.97	0.92	1.00
pre_ltx vs post_ltx genus terminal_ileum	0.0	0.67	0.83	0.75	0.92
post_ltx vs healthy genus terminal_ileum	0.0	0.07	0.97	0.92	0.99

ROC - Genus level

p <- roc_curve_all_custom(roc_cs[c(4:6)],Q="Q1",
                     model_name="enet_model")

p

3 Data Analysis - Colon

segment="colon"

3.1 Filtering

Rules: - sequencing depth > 10000 - nearZeroVar() with default settings

Rarefaction Curve

path="../../../intermediate_files/rarecurves"
seq_depth_threshold <- 10000

ps <- construct_phyloseq(colon_asv_tab,colon_taxa_tab,colon_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_colon.Rdata"))

load(file.path(path,"rarefaction_colon.Rdata"))
seq_depth_threshold <- 10000
prare <- ggrarecurve(obj=rareres,
                      factorNames="Country",
                      indexNames=c("Observe")) + 
        theme_bw()+
        theme(axis.text=element_text(size=8), 
              panel.grid=element_blank(),
              strip.background = element_rect(colour=NA,fill="grey"),
              strip.text.x = element_text(face="bold")) + 
        geom_vline(xintercept = seq_depth_threshold, 
                   linetype="dashed", color = "red") + 
        xlim(0, 20000)

The color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)

prare

Library size

read_counts(colon_asv_tab, line = c(5000,10000))

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

3.1.1 Sequencing depth

data_filt <- seq_depth_filtering(colon_asv_tab,
                                 colon_taxa_tab,
                                 colon_metadata,
                                 seq_depth_threshold = 10000)

Removing 75 ASV(s)

filt_colon_asv_tab <- data_filt[[1]]; alpha_colon_asv_tab <- filt_colon_asv_tab
filt_colon_taxa_tab <- data_filt[[2]]; alpha_colon_taxa_tab <- filt_colon_taxa_tab
filt_colon_metadata <- data_filt[[3]]; alpha_colon_metadata <- filt_colon_metadata

seq_step <- dim(filt_colon_asv_tab)[1]

Library size

read_counts(filt_colon_asv_tab,line = c(10000))

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

3.1.2 NearZeroVar

data_filt <- nearzerovar_filtering(filt_colon_asv_tab,
                                   filt_colon_taxa_tab,
                                   filt_colon_metadata)

filt_colon_asv_tab <- data_filt[[1]]
filt_colon_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_colon_asv_tab)[1]

Library size

read_counts(filt_colon_asv_tab,line = c(5000,10000))

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

Check zero depth

data_filt <- check_zero_depth(filt_colon_asv_tab, 
                              filt_colon_taxa_tab, 
                              filt_colon_metadata)

filt_colon_asv_tab <- data_filt[[1]]; 
filt_colon_taxa_tab <- data_filt[[2]]; 
filt_colon_metadata <- data_filt[[3]];

Library size

read_counts(filt_colon_asv_tab,line = c(5000,10000))

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

3.1.3 Final Counts

final_counts_filtering(colon_asv_tab,
                       filt_colon_asv_tab,
                       filt_colon_metadata,
                       seq_step, 0, nearzero_step)

                                               V1
Raw data: ASVs                               6936
Raw data: Samples                             789
Sequencing depth filt: ASVs                  6861
Prevalence filt: ASVs                           0
NearZeroVar filt: ASVs                        381
Filt data: ASVs                               381
Filt data: Samples                            760
Filt data: Patients                           354
Filt data: Patients.1                           0
Filtered samples                               29
Matrices                    Cecum;Rectum;CD;CA;SI
healthy                                       161
non-rPSC                                        0
rPSC                                            0
pre_ltx                                       250
post_ltx                                      349
ETOH                                            0

3.2 Alpha diversity

Calculating Richness, Shannon, Simpson, Pielou indexes on raw (unfiltered) rarefied data. Samples with sequencing depth < 10000 were excluded. Testing by linear mixed-effect model.

path = "../../../results/merged_sites/main_analysis/Q1/alpha_diversity"

Calculation

# Construct MPSE object
alpha_colon_metadata$Sample <- alpha_colon_metadata$SampleID
colon_mpse <- as.MPSE(construct_phyloseq(alpha_colon_asv_tab,
                                         alpha_colon_taxa_tab,
                                         alpha_colon_metadata))

colon_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)

# Calculate alpha diversity - rarefied counts
colon_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)

alpha_data <- data.frame(SampleID=colon_mpse$Sample.x,
                         Observe=colon_mpse$Observe,
                         Shannon=colon_mpse$Shannon,
                         Simpson=colon_mpse$Simpson,
                         Pielou=colon_mpse$Pielou,
                         Group=colon_mpse$Group,
                         Country=colon_mpse$Country,
                         Patient=colon_mpse$Patient)

write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
          row.names = FALSE)

3.2.1 Plots

p_boxplot_alpha <- alpha_diversity_countries(alpha_data)

Using SampleID, Group, Country, Patient as id variables

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Using SampleID, Group, Country, Patient as id variables

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha

# see the results
p_boxplot_alpha

3.2.2 Linear Model

path = "../../../results/merged_sites/main_analysis/Q1/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))

Richness

results_model <- pairwise.lmer(
  formula = "Observe ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_observe <- results_model[[1]]
  results_model_observe_detailed <- results_model[[2]]
} else {
  results_model_observe <- results_model
  results_model_observe_detailed <- NA
}

# save the results
pc_observed[[segment]] <- results_model_observe

# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")

Raw results of linear model of richness estimation.
	Estimate	Std..Error	df	t.value	Pr…t..	p.adj	sig
healthy vs Grouppre_ltx	-56.230	11.738	195.570	-4.791	0.000	0.000	***
healthy vs pre_ltx - CZ vs NO	14.193	10.341	194.538	1.372	0.171	0.256
healthy vs Grouppre_ltx:CountryNO	-7.489	15.169	194.358	-0.494	0.622	0.622
pre_ltx vs Grouppost_ltx	28.886	11.811	267.502	2.446	0.015	0.045	*
pre_ltx vs post_ltx - CZ vs NO	6.742	12.174	265.784	0.554	0.580	0.622
pre_ltx vs Grouppost_ltx:CountryNO	-24.047	15.112	262.828	-1.591	0.113	0.203
healthy vs Grouppost_ltx	-27.299	8.699	250.458	-3.138	0.002	0.009	**
healthy vs post_ltx - CZ vs NO	14.360	11.153	251.729	1.288	0.199	0.256
healthy vs Grouppost_ltx:CountryNO	-31.681	14.195	247.531	-2.232	0.027	0.060

knitr::kable(results_model_observe_detailed,digits = 3,
caption = "Raw results of independent country analysis")

Raw results of independent country analysis

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_CZ	171.780	7.001	162.548	24.535	0.000	NA
Grouppost_ltx	-27.276	8.558	161.345	-3.187	0.002	0.003

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_NO	186.110	8.829	88.980	21.080	0	NA
Grouppost_ltx	-58.943	11.539	87.242	-5.108	0	0

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_CZ	171.793	6.342	89.560	27.090	0.000	NA
CountryNO	14.358	9.941	89.408	1.444	0.152	0.152

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
post_ltx_CZ	144.490	5.283	158.786	27.352	0.000	NA
CountryNO	-17.319	9.272	154.970	-1.868	0.064	0.085

Shannon

results_model <- pairwise.lmer(
  formula = "Shannon ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_shannon <- results_model[[1]]
  results_model_shannon_detailed <- results_model[[2]]
} else {
  results_model_shannon <- results_model
  results_model_shannon_detailed <- NA
}

# save the results
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)

# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")

Raw results of linear model of Shannon estimation.
	Estimate	Std..Error	df	t.value	Pr…t..	p.adj	sig
healthy vs Grouppre_ltx	-0.585	0.143	197.588	-4.084	0.000	0.001	***
healthy vs pre_ltx - CZ vs NO	0.015	0.126	196.213	0.116	0.907	0.909
healthy vs Grouppre_ltx:CountryNO	0.081	0.185	195.965	0.437	0.663	0.852
pre_ltx vs Grouppost_ltx	0.364	0.158	265.003	2.307	0.022	0.056
pre_ltx vs post_ltx - CZ vs NO	0.095	0.163	263.448	0.585	0.559	0.839
pre_ltx vs Grouppost_ltx:CountryNO	-0.455	0.202	260.765	-2.253	0.025	0.056
healthy vs Grouppost_ltx	-0.220	0.102	247.586	-2.153	0.032	0.058
healthy vs post_ltx - CZ vs NO	0.015	0.131	248.759	0.114	0.909	0.909
healthy vs Grouppost_ltx:CountryNO	-0.376	0.167	244.904	-2.255	0.025	0.056

knitr::kable(results_model_shannon_detailed,digits = 3,
caption = "Raw results of independent country analysis")

Raw results of independent country analysis

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
pre_ltx_CZ	3.102	0.144	131.105	21.535	0.000	NA
Grouppost_ltx	0.363	0.160	130.742	2.275	0.025	0.049

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
pre_ltx_NO	3.197	0.078	129.531	41.004	0.000	NA
Grouppost_ltx	-0.091	0.125	129.050	-0.732	0.466	0.563

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
pre_ltx_CZ	3.101	0.143	105.863	21.629	0.000	NA
CountryNO	0.095	0.164	104.934	0.580	0.563	0.563

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
post_ltx_CZ	3.466	0.068	156.394	51.151	0.000	NA
CountryNO	-0.360	0.119	153.436	-3.025	0.003	0.012

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_CZ	3.686	0.082	159.084	44.805	0.00	NA
Grouppost_ltx	-0.220	0.101	158.215	-2.190	0.03	0.04

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_NO	3.701	0.104	90.164	35.654	0	NA
Grouppost_ltx	-0.596	0.136	88.069	-4.396	0	0

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_CZ	3.685	0.054	89.090	68.738	0.000	NA
CountryNO	0.017	0.084	88.849	0.205	0.838	0.838

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
post_ltx_CZ	3.466	0.068	156.394	51.151	0.000	NA
CountryNO	-0.360	0.119	153.436	-3.025	0.003	0.006

Simpson

results_model <- pairwise.lmer(
  formula = "Simpson ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_simpson <- results_model[[1]]
  results_model_simpson_detailed <- results_model[[2]]
} else {
  results_model_simpson <- results_model
  results_model_simpson_detailed <- NA
}

# save the results
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)

# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")

Raw results of linear model of Simpson estimation.
	Estimate	Std..Error	df	t.value	Pr…t..	p.adj	sig
healthy vs Grouppre_ltx	-0.059	0.019	199.853	-3.066	0.002	0.022	*
healthy vs pre_ltx - CZ vs NO	-0.007	0.017	198.127	-0.430	0.667	0.738
healthy vs Grouppre_ltx:CountryNO	0.020	0.025	197.803	0.813	0.417	0.626
pre_ltx vs Grouppost_ltx	0.031	0.026	263.530	1.213	0.226	0.455
pre_ltx vs post_ltx - CZ vs NO	0.013	0.027	261.964	0.476	0.635	0.738
pre_ltx vs Grouppost_ltx:CountryNO	-0.052	0.033	259.262	-1.556	0.121	0.363
healthy vs Grouppost_ltx	-0.027	0.017	246.082	-1.594	0.112	0.363
healthy vs post_ltx - CZ vs NO	-0.007	0.022	247.135	-0.335	0.738	0.738
healthy vs Grouppost_ltx:CountryNO	-0.032	0.028	243.697	-1.147	0.253	0.455

knitr::kable(results_model_simpson_detailed,digits = 3,
caption = "Raw results of independent country analysis")

Table: Raw results of independent country analysis

Pielou

results_model <- pairwise.lmer(
  formula = "Pielou ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_pielou <- results_model[[1]]
  results_model_pielou_detailed <- results_model[[2]]
} else {
  results_model_pielou <- results_model
  results_model_pielou_detailed <- NA
}

# save the results
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)

# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")

Raw results of linear model of Pielou estimation.
	Estimate	Std..Error	df	t.value	Pr…t..	p.adj	sig
healthy vs Grouppre_ltx	-0.057	0.020	201.130	-2.828	0.005	0.046	*
healthy vs pre_ltx - CZ vs NO	-0.008	0.018	199.017	-0.446	0.656	0.781
healthy vs Grouppre_ltx:CountryNO	0.015	0.026	198.602	0.565	0.573	0.781
pre_ltx vs Grouppost_ltx	0.039	0.024	265.797	1.631	0.104	0.234
pre_ltx vs post_ltx - CZ vs NO	0.007	0.025	263.735	0.274	0.785	0.785
pre_ltx vs Grouppost_ltx:CountryNO	-0.062	0.031	260.213	-2.015	0.045	0.202
healthy vs Grouppost_ltx	-0.018	0.016	247.076	-1.130	0.260	0.467
healthy vs post_ltx - CZ vs NO	-0.008	0.020	248.451	-0.394	0.694	0.781
healthy vs Grouppost_ltx:CountryNO	-0.047	0.026	243.881	-1.828	0.069	0.206

knitr::kable(results_model_pielou_detailed,digits = 3,
caption = "Raw results of independent country analysis")

Raw results of independent country analysis

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
pre_ltx_CZ	0.664	0.022	130.842	29.916	0.000	NA
Grouppost_ltx	0.039	0.025	130.452	1.582	0.116	0.232

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
pre_ltx_NO	0.670	0.012	129.502	57.528	0.000	NA
Grouppost_ltx	-0.023	0.019	128.819	-1.219	0.225	0.3

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
pre_ltx_CZ	0.664	0.020	107.760	32.578	0.000	NA
CountryNO	0.007	0.023	106.251	0.291	0.772	0.772

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
post_ltx_CZ	0.703	0.011	155.706	65.557	0.000	NA
CountryNO	-0.055	0.019	152.292	-2.919	0.004	0.016

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_CZ	0.721	0.013	158.287	55.872	0.000	NA
Grouppost_ltx	-0.018	0.016	157.427	-1.146	0.254	0.338

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_NO	0.713	0.016	92.108	44.931	0.000	NA
Grouppost_ltx	-0.065	0.021	89.257	-3.151	0.002	0.008

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_CZ	0.720	0.007	89.312	100.415	0.000	NA
CountryNO	-0.008	0.011	88.968	-0.675	0.502	0.502

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
post_ltx_CZ	0.703	0.011	155.706	65.557	0.000	NA
CountryNO	-0.055	0.019	152.292	-2.919	0.004	0.008

3.2.3 Saving results

alpha_list <- list(
  Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
  Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
  Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
  Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
                   
write.xlsx(alpha_list, 
           file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))

3.3 Beta diversity

3.3.1 Main analysis

Genus level, Aitchison distance

level="genus"

path = "../../../results/merged_sites/main_analysis/Q1/beta_diversity"

Aggregation, filtering

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level=level,
                             names=TRUE)

filt_data <- filtering_steps(genus_data[[1]],
                             genus_data[[2]],
                             colon_metadata,
                            seq_depth_threshold=10000)

Removing 10 ASV(s)

filt_colon_genus_tab <- filt_data[[1]]
filt_colon_genus_taxa <- filt_data[[2]]
filt_colon_genus_metadata <- filt_data[[3]]

3.3.2 PERMANOVA

pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
                           covariate = filt_colon_genus_metadata$Country, 
                           patients = filt_colon_genus_metadata$Patient,
                           sim.method = "robust.aitchison", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
                          covariate = filt_colon_genus_metadata$Country, 
                          interaction = TRUE, 
                          patients = filt_colon_genus_metadata$Patient,
                          sim.method = "robust.aitchison", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <-rbind(pp_factor,pp_cov,pp_fac.cov)

# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")

PERMANOVA, GROUP separation
pairs	Df	SumsOfSqs	F.Model	R2	p.value	p.adjusted	sig
pre_ltx vs healthy	1	1222.921	7.950	0.019	0.001	0.002	**
pre_ltx vs post_ltx	1	630.146	4.062	0.007	0.008	0.008	**
post_ltx vs healthy	1	1736.169	11.014	0.021	0.001	0.002	**

knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")

PERMANOVA, COUNTRY separation
pairs	Df	SumsOfSqs	F.Model	R2	p.value	p.adjusted	sig
pre_ltx vs healthy , Country	1	738.576	4.801	0.011	0.001	0.001	***
pre_ltx vs post_ltx , Country	1	1532.675	9.879	0.016	0.001	0.001	***
post_ltx vs healthy , Country	1	1728.746	10.966	0.021	0.001	0.001	***

knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")

PERMANOVA, INTERACTION GROUP:Country
pairs	Df	SumsOfSqs	F.Model	R2	p.value	p.adjusted
pre_ltx vs healthy : Country	1	318.719	2.077	0.005	0.325	0.488
pre_ltx vs post_ltx : Country	1	322.405	2.082	0.003	0.524	0.524
post_ltx vs healthy : Country	1	398.820	2.538	0.005	0.112	0.336

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.1]

if (length(interaction_sig)>0){
 for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_genus_metadata$Group,
                      covariate = filt_colon_genus_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      patients = filt_colon_genus_metadata$Patient)
  print(result_list)
} 
}

3.3.2.0.1 Plots

p <- pca_plot_custom(filt_colon_genus_tab,
                                 filt_colon_genus_taxa,
                                 filt_colon_genus_metadata,
                                 show_boxplots = TRUE,
                                 variable = "Group", size=2, 
                                 show_legend= FALSE)

# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p

# see the results
p

3.3.2.1 Saving results

write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]], 
           file = file.path(path,
           paste0("beta_diversity_results_", segment,".xlsx")))

3.4 Univariate Analysis

Using two DAA tools - linDA and Maaslin2. Only the intersection of sets selected by each tool was shown to minimize false positives. Taxa with a significant interaction effect were excluded based on individual post-hoc analysis of the Czech and Norwegian cohorts. Only taxa with significant log fold change that showed the same change direction in both countries were retained.

Main analysis is performed at genus level. In supplementary analysis, it is performed at ASV and phylum levels.

3.4.1 Main analysis

level="genus"

# needed paths
path = "../../../results/merged_sites/main_analysis/Q1/univariate_analysis"
path_maaslin=file.path("../../../intermediate_files/maaslin/Q1",level)

# variables
raw_linda_results_genus[[segment]] <- list()
linda_results_genus[[segment]] <- list()

# country and interaction problems
list_country_union <- list()
list_venns <- list()

# workbook for final df
wb <- createWorkbook()

# PSC effect
psc_effect <- list()

3.4.1.1 Genus level

level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

colon_genus_asv_taxa_tab <- create_asv_taxa_table(colon_genus_tab,
                                                  colon_genus_taxa_tab)

3.4.1.1.1 pre_ltx vs healthy

3.4.1.1.1.1 linDA

group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")

Removing 135 ASV(s)

Removing 10 ASV(s)

filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')

0  features are filtered!
The filtered data has  411  samples and  148  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")


for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}

# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

3.4.1.1.1.2 MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

3.4.1.1.1.3 Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

3.4.1.1.1.4 Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

NULL

## results for czech cohort
list_interaction_significant[[2]]

[1] NA

## results for norwegian cohort
list_interaction_significant[[3]]

[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

3.4.1.1.1.5 Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

3.4.1.1.2 pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

3.4.1.1.2.1 linDA

# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")

Removing 17 ASV(s)

Removing 10 ASV(s)

filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')

0  features are filtered!
The filtered data has  600  samples and  130  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}

# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

3.4.1.1.2.2 MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

3.4.1.1.2.3 Group - Intersection

intersection_results <- group_intersection(group, 
                                           list_intersections, 
                                           list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

3.4.1.1.2.4 Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

NULL

## results for czech cohort
list_interaction_significant[[2]]

[1] NA

## results for norwegian cohort
list_interaction_significant[[3]]

[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

3.4.1.1.2.5 Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

3.4.1.1.3 post_ltx vs healthy

group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

3.4.1.1.3.1 linDA

# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,group,
                            usage="linDA")

Removing 71 ASV(s)

Removing 5 ASV(s)

filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')

0  features are filtered!
The filtered data has  511  samples and  153  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}

# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

3.4.1.1.3.2 MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

3.4.1.1.3.3 Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

3.4.1.1.3.4 Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

NULL

## results for czech cohort
list_interaction_significant[[2]]

[1] NA

## results for norwegian cohort
list_interaction_significant[[3]]

[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

3.4.1.1.3.5 Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

3.4.1.1.4 Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$colon[grepl(level,names(uni_statistics$colon))],
                                      colon_taxa_tab) + xlab("") + ylab("")

min_clr -2.221209 
max_clr 6.886638 
min_log -3.771476 
max_log 5.049706

dotheatmap_linda

Horizontal bar plot

p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))

Using SeqID as id variables

p_prevalence_final <- ggarrange(p_prevalence,
                                ggplot() + theme_minimal(),
                                nrow = 2,heights = c(1,0.09))
p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),p_prevalence_final,ncol=2,widths = c(1,0.3))

p

dot_heatmap_colon <- p

3.4.1.1.5 PSC effect

To get only PSC associated taxa, we can intersect differentially abundant taxa from the previous analyses (see diagram below).

pre_LTx vs HC and Post_LTx vs HC intersection

A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)

psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
  
# see the results
psc_effect[[paste(segment,level)]]

                           SeqID
3                         Rothia
15                   Barnesiella
18                 Butyricimonas
19                   Odoribacter
27                     Alistipes
28               Parabacteroides
39                    Holdemania
57                   Coprococcus
61              Fusicatenibacter
64                    Hungatella
65             Lachnoclostridium
67  Lachnospiraceae_FCS020_group
89              Colidextribacter
91                Intestinimonas
93                 Oscillibacter
109             Negativibacillus
122     Family_XIII_AD3011_group
133                    Dialister
135                  Veillonella
143                 Enterococcus
146                   Klebsiella
148                  Pseudomonas
                                                                                                                          Taxonomy
3                                   k__Bacteria;p__Actinobacteriota;c__Actinobacteria;o__Micrococcales;f__Micrococcaceae;g__Rothia
15                                   k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Barnesiellaceae;g__Barnesiella
18                                  k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Butyricimonas
19                                    k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Odoribacter
27                                       k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes
28                                k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Tannerellaceae;g__Parabacteroides
39                                 k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Holdemania
57                                     k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus
61                                k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Fusicatenibacter
64                                      k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Hungatella
65                               k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium
67                    k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_FCS020_group
89                              k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Colidextribacter
91                                k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Intestinimonas
93                                 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Oscillibacter
109                              k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Negativibacillus
122 k__Bacteria;p__Firmicutes;c__Clostridia;o__Peptostreptococcales-Tissierellales;f__Anaerovoracaceae;g__Family_XIII_AD3011_group
133                   k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Dialister
135                 k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonella
143                                     k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;g__Enterococcus
146                   k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Klebsiella
148                     k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas
    log2FoldChange      p_value         padj
3         1.842842 1.452081e-05 1.953709e-04
15       -3.590912 7.871469e-05 6.852808e-04
18       -3.002837 3.969499e-06 7.343572e-05
19       -3.406707 3.432387e-07 1.693311e-05
27       -3.110558 1.594988e-05 1.967152e-04
28       -3.568149 2.950689e-06 6.238599e-05
39       -1.174869 7.087889e-03 2.439553e-02
57       -3.357534 5.546126e-06 8.208266e-05
61       -1.917547 1.097200e-02 3.455012e-02
64        2.775798 1.422148e-04 1.107779e-03
65       -1.736481 1.812625e-03 8.383390e-03
67       -2.111851 1.228192e-03 6.268013e-03
89       -1.624469 8.895010e-03 2.991958e-02
91       -1.592037 6.335927e-03 2.404403e-02
93       -2.063990 9.829922e-04 5.538365e-03
109      -1.649295 1.442001e-02 4.268322e-02
122      -1.590837 1.010377e-03 5.538365e-03
133       2.144460 1.055678e-02 3.396528e-02
135       3.864870 1.310501e-07 9.697706e-06
143       5.049706 2.872348e-10 4.251075e-08
146       2.288824 1.471023e-03 7.257045e-03
148       1.318798 5.015789e-03 2.006316e-02

3.4.1.2 Saving results

# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
             overwrite = TRUE)

# PSC effect
write.xlsx(psc_effect[[paste(segment,level)]],file.path(path,paste0("psc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa

write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("significant_taxa_",segment,".xlsx")))

3.5 Machine learning

The supplementary analysis contains training and validating at clr-transformed data at ASV level and relative-abundance data at ASV and genus level.

path = "../../../results/merged_sites/main_analysis/Q1/models"

3.5.1 ENET

model="enet"

3.5.1.1 ASV level

level="ASV"

3.5.1.1.1 pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

Removing 2138 ASV(s)

Removing 70 ASV(s)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()

                                       [,1]
alpha                           0.200000000
lambda                          0.007285712
auc                             1.000000000
auc_czech                       1.000000000
auc_no                          1.000000000
auc_optimism_corrected          0.944482930
auc_optimism_corrected_CIL      0.879483337
auc_optimism_corrected_CIU      0.986503241
accuracy                        1.000000000
accuracy_czech                          NaN
accuracy_no                     1.000000000
accuracy_optimism_corrected     0.872590475
accuracy_optimism_corrected_CIL 0.773863933
accuracy_optimism_corrected_CIU 0.937692308

enet_model$conf_matrices

$original
    Predicted
True   0   1
   0 161   0
   1   0 250

$czech
    Predicted
True  0  1
   0 95  0
   1  0 42

$no
    Predicted
True   0   1
   0  66   0
   1   0 208

enet_model$plot

`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

roc_c

3.5.1.1.2 pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)

Removing 1157 ASV(s)

Removing 52 ASV(s)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()

                                       [,1]
alpha                           1.000000000
lambda                          0.004424186
auc                             0.999667622
auc_czech                       0.999656593
auc_no                          0.999522640
auc_optimism_corrected          0.788320408
auc_optimism_corrected_CIL      0.715647773
auc_optimism_corrected_CIU      0.842743983
accuracy                        0.993322204
accuracy_czech                          NaN
accuracy_no                     0.994269341
accuracy_optimism_corrected     0.725877580
accuracy_optimism_corrected_CIL 0.657080387
accuracy_optimism_corrected_CIU 0.769266055

enet_model$conf_matrices

$original
    Predicted
True   0   1
   0 347   2
   1   2 248

$czech
    Predicted
True   0   1
   0 207   1
   1   1  41

$no
    Predicted
True   0   1
   0 140   1
   1   1 207

enet_model$plot

`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

roc_c

3.5.1.1.3 post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

Removing 1096 ASV(s)

Removing 50 ASV(s)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()

                                      [,1]
alpha                           0.40000000
lambda                          0.00203662
auc                             1.00000000
auc_czech                       1.00000000
auc_no                          1.00000000
auc_optimism_corrected          0.96930614
auc_optimism_corrected_CIL      0.93705851
auc_optimism_corrected_CIU      0.98669590
accuracy                        1.00000000
accuracy_czech                         NaN
accuracy_no                     1.00000000
accuracy_optimism_corrected     0.91353052
accuracy_optimism_corrected_CIL 0.86451155
accuracy_optimism_corrected_CIU 0.95141794

enet_model$conf_matrices

$original
    Predicted
True   0   1
   0 161   0
   1   0 349

$czech
    Predicted
True   0   1
   0  95   0
   1   0 208

$no
    Predicted
True   0   1
   0  66   0
   1   0 141

enet_model$plot

`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

roc_c

3.5.1.2 Genus level

level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

3.5.1.2.1 pre_ltx vs healthy

group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

Removing 135 ASV(s)

Removing 10 ASV(s)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=500,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()

                                      [,1]
alpha                           0.00000000
lambda                          0.04262835
auc                             0.99955280
auc_czech                       0.99974937
auc_no                          0.99927156
auc_optimism_corrected          0.93420491
auc_optimism_corrected_CIL      0.88169929
auc_optimism_corrected_CIU      0.97185321
accuracy                        0.98783455
accuracy_czech                         NaN
accuracy_no                     0.98905109
accuracy_optimism_corrected     0.85834482
accuracy_optimism_corrected_CIL 0.79082492
accuracy_optimism_corrected_CIU 0.92284119

enet_model$conf_matrices

$original
    Predicted
True   0   1
   0 159   2
   1   3 247

$czech
    Predicted
True  0  1
   0 94  1
   1  1 41

$no
    Predicted
True   0   1
   0  65   1
   1   2 206

enet_model$plot

`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))

[1] "p_value: 0"

roc_c

3.5.1.2.2 pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

Removing 17 ASV(s)

Removing 10 ASV(s)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=500,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()

                                      [,1]
alpha                           0.00000000
lambda                          0.05181695
auc                             0.96996571
auc_czech                       0.95887446
auc_no                          0.96447081
auc_optimism_corrected          0.75746917
auc_optimism_corrected_CIL      0.67636751
auc_optimism_corrected_CIU      0.83310363
accuracy                        0.91000000
accuracy_czech                         NaN
accuracy_no                     0.90544413
accuracy_optimism_corrected     0.69549915
accuracy_optimism_corrected_CIL 0.61504214
accuracy_optimism_corrected_CIU 0.77137530

enet_model$conf_matrices

$original
    Predicted
True   0   1
   0 325  25
   1  29 221

$czech
    Predicted
True   0   1
   0 202   7
   1  14  28

$no
    Predicted
True   0   1
   0 123  18
   1  15 193

enet_model$plot

`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))

[1] "p_value: 0"

roc_c

3.5.1.2.3 post_ltx vs healthy

group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

Removing 71 ASV(s)

Removing 5 ASV(s)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group",
                              N=500,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()

                                       [,1]
alpha                           0.600000000
lambda                          0.002245931
auc                             1.000000000
auc_czech                       1.000000000
auc_no                          1.000000000
auc_optimism_corrected          0.961311372
auc_optimism_corrected_CIL      0.923354885
auc_optimism_corrected_CIU      0.985361058
accuracy                        1.000000000
accuracy_czech                          NaN
accuracy_no                     1.000000000
accuracy_optimism_corrected     0.892756667
accuracy_optimism_corrected_CIL 0.838856178
accuracy_optimism_corrected_CIU 0.940785776

enet_model$conf_matrices

$original
    Predicted
True   0   1
   0 161   0
   1   0 350

$czech
    Predicted
True   0   1
   0  95   0
   1   0 209

$no
    Predicted
True   0   1
   0  66   0
   1   0 141

enet_model$plot

`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))

[1] "p_value: 0"

roc_c

3.5.1.3 Saving results

models_summ_df_colon <- do.call(rbind, 
  models_summ[grep(segment,names(models_summ),value = TRUE)])

write.csv(models_summ_df_colon,file.path(path,paste0("elastic_net_",segment,".csv")))

3.6 Results overview

3.6.0.1 Alpha diversity

knitr::kable(pc_observed[[segment]],
             digits = 3,
             caption = "Results of linear model testing ASV Richness")

Results of linear model testing ASV Richness
	Estimate	Std..Error	df	t.value	Pr…t..	p.adj	sig
healthy vs Grouppre_ltx	-56.230	11.738	195.570	-4.791	0.000	0.000	***
healthy vs pre_ltx - CZ vs NO	14.193	10.341	194.538	1.372	0.171	0.256
healthy vs Grouppre_ltx:CountryNO	-7.489	15.169	194.358	-0.494	0.622	0.622
pre_ltx vs Grouppost_ltx	28.886	11.811	267.502	2.446	0.015	0.045	*
pre_ltx vs post_ltx - CZ vs NO	6.742	12.174	265.784	0.554	0.580	0.622
pre_ltx vs Grouppost_ltx:CountryNO	-24.047	15.112	262.828	-1.591	0.113	0.203
healthy vs Grouppost_ltx	-27.299	8.699	250.458	-3.138	0.002	0.009	**
healthy vs post_ltx - CZ vs NO	14.360	11.153	251.729	1.288	0.199	0.256
healthy vs Grouppost_ltx:CountryNO	-31.681	14.195	247.531	-2.232	0.027	0.060

knitr::kable(pc_shannon[[segment]],
             digits = 3,
             caption = "Results of linear model testing Shannon index")

Results of linear model testing Shannon index
	Estimate	Std..Error	df	t.value	Pr…t..	p.adj	sig
healthy vs Grouppre_ltx	-0.585	0.143	197.588	-4.084	0.000	0.001	***
healthy vs pre_ltx - CZ vs NO	0.015	0.126	196.213	0.116	0.907	0.909
healthy vs Grouppre_ltx:CountryNO	0.081	0.185	195.965	0.437	0.663	0.852
pre_ltx vs Grouppost_ltx	0.364	0.158	265.003	2.307	0.022	0.056
pre_ltx vs post_ltx - CZ vs NO	0.095	0.163	263.448	0.585	0.559	0.839
pre_ltx vs Grouppost_ltx:CountryNO	-0.455	0.202	260.765	-2.253	0.025	0.056
healthy vs Grouppost_ltx	-0.220	0.102	247.586	-2.153	0.032	0.058
healthy vs post_ltx - CZ vs NO	0.015	0.131	248.759	0.114	0.909	0.909
healthy vs Grouppost_ltx:CountryNO	-0.376	0.167	244.904	-2.255	0.025	0.056

knitr::kable(pc_simpson[[segment]],
             digits = 3,
             caption = "Results of linear model testing Simpson index")

Results of linear model testing Simpson index
	Estimate	Std..Error	df	t.value	Pr…t..	p.adj	sig
healthy vs Grouppre_ltx	-0.059	0.019	199.853	-3.066	0.002	0.022	*
healthy vs pre_ltx - CZ vs NO	-0.007	0.017	198.127	-0.430	0.667	0.738
healthy vs Grouppre_ltx:CountryNO	0.020	0.025	197.803	0.813	0.417	0.626
pre_ltx vs Grouppost_ltx	0.031	0.026	263.530	1.213	0.226	0.455
pre_ltx vs post_ltx - CZ vs NO	0.013	0.027	261.964	0.476	0.635	0.738
pre_ltx vs Grouppost_ltx:CountryNO	-0.052	0.033	259.262	-1.556	0.121	0.363
healthy vs Grouppost_ltx	-0.027	0.017	246.082	-1.594	0.112	0.363
healthy vs post_ltx - CZ vs NO	-0.007	0.022	247.135	-0.335	0.738	0.738
healthy vs Grouppost_ltx:CountryNO	-0.032	0.028	243.697	-1.147	0.253	0.455

knitr::kable(pc_pielou[[segment]],
             digits = 3,
             caption = "Results of linear model testing Pielou index")

Results of linear model testing Pielou index
	Estimate	Std..Error	df	t.value	Pr…t..	p.adj	sig
healthy vs Grouppre_ltx	-0.057	0.020	201.130	-2.828	0.005	0.046	*
healthy vs pre_ltx - CZ vs NO	-0.008	0.018	199.017	-0.446	0.656	0.781
healthy vs Grouppre_ltx:CountryNO	0.015	0.026	198.602	0.565	0.573	0.781
pre_ltx vs Grouppost_ltx	0.039	0.024	265.797	1.631	0.104	0.234
pre_ltx vs post_ltx - CZ vs NO	0.007	0.025	263.735	0.274	0.785	0.785
pre_ltx vs Grouppost_ltx:CountryNO	-0.062	0.031	260.213	-2.015	0.045	0.202
healthy vs Grouppost_ltx	-0.018	0.016	247.076	-1.130	0.260	0.467
healthy vs post_ltx - CZ vs NO	-0.008	0.020	248.451	-0.394	0.694	0.781
healthy vs Grouppost_ltx:CountryNO	-0.047	0.026	243.881	-1.828	0.069	0.206

Plots

alpha_div_plots[[paste(segment,"Country")]]

3.6.0.2 Beta diversity

Main analysis

knitr::kable(pairwise_aitchison_raw[[paste("genus", segment)]],
             digits = 3,
             caption = "Results of PERMANOVA - robust aitchison distance")

Results of PERMANOVA - robust aitchison distance
pairs	Df	SumsOfSqs	F.Model	R2	p.value	p.adjusted	sig
pre_ltx vs healthy	1	1222.921	7.950	0.019	0.001	0.002	**
pre_ltx vs post_ltx	1	630.146	4.062	0.007	0.008	0.008	**
post_ltx vs healthy	1	1736.169	11.014	0.021	0.001	0.002	**
pre_ltx vs healthy , Country	1	738.576	4.801	0.011	0.001	0.001	***
pre_ltx vs post_ltx , Country	1	1532.675	9.879	0.016	0.001	0.001	***
post_ltx vs healthy , Country	1	1728.746	10.966	0.021	0.001	0.001	***
pre_ltx vs healthy : Country	1	318.719	2.077	0.005	0.325	0.488
pre_ltx vs post_ltx : Country	1	322.405	2.082	0.003	0.524	0.524
post_ltx vs healthy : Country	1	398.820	2.538	0.005	0.112	0.336

PCA

pca_plots_list[[paste(segment,"genus custom")]]

3.6.0.3 Univariate analysis

Number of significant taxa

knitr::kable(cbind(as.data.frame(lapply(list_intersections,nrow)),
      as.data.frame(lapply(psc_effect,nrow))) %>% t() %>% 
  `colnames<-`("Count") %>% 
  `rownames<-`(c(names(list_intersections),"PSC effect Genus")),caption="Number of significant taxa")

Number of significant taxa
	Count
terminal_ileum genus healthy vs pre_ltx	25
terminal_ileum genus pre_ltx vs post_ltx	4
terminal_ileum genus healthy vs post_ltx	38
colon genus healthy vs pre_ltx	33
colon genus pre_ltx vs post_ltx	4
colon genus healthy vs post_ltx	41
PSC effect Genus	22

3.6.0.4 Machine learning

Main models

knitr::kable(models_summ_df_colon %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
             digits=3,caption="Elastic net results")

Elastic net results
	alpha	lambda	auc_optimism_corrected	auc_optimism_corrected_CIL	auc_optimism_corrected_CIU
pre_ltx vs healthy ASV colon	0.2	0.007	0.944	0.879	0.987
pre_ltx vs post_ltx ASV colon	1.0	0.004	0.788	0.716	0.843
post_ltx vs healthy ASV colon	0.4	0.002	0.969	0.937	0.987
pre_ltx vs healthy genus colon	0.0	0.043	0.934	0.882	0.972
pre_ltx vs post_ltx genus colon	0.0	0.052	0.757	0.676	0.833
post_ltx vs healthy genus colon	0.6	0.002	0.961	0.923	0.985

ROC - Genus level

p <- roc_curve_all_custom(roc_cs[c(10:12)],Q="Q1",
                     model_name="enet_model")

p

4 Paper-ready visualizations

Alpha diversity

p_A <- alpha_div_plots$`terminal_ileum Country` +
  ggtitle("Terminal ileum")+
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)) 

p_B <-  alpha_div_plots$`colon Country` +
  ggtitle("Colon") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)) 

Q1_alpha <- ggarrange(p_A,ggplot() + theme_minimal(),p_B,nrow=1, ncol=3,widths = c(1,0.1,1))

Q1_alpha

Beta diversity

pca_ti <- pca_plots_list$`terminal_ileum genus custom` 
pca_colon <- pca_plots_list$`colon genus custom` 

genus_Q1_beta <- ggarrange(pca_ti,
                           ggplot() + theme_minimal(),
                           pca_colon,ncol=3,
                           widths = c(1,0.1,1))
genus_Q1_beta

Alpha + Beta diversity

alpha_beta <- ggarrange(Q1_alpha,genus_Q1_beta,
                        ncol = 1,nrow=2,labels = c("A","B"))
alpha_beta

Elastic net

Genus level

#models_to_plot <- c("knn_model","rf_model","gbm_model","enet_model")
#names(models_to_plot) <- c("kNN","RF","GBoost","ENET")

models_to_plot <- c("enet_model")
names(models_to_plot) <- c("ENET")
rocs_list <- list()
rocs_list[["enet_model"]] <- roc_cs

# ILEUM
plot_list_ileum <- list()
for (model_name in models_to_plot) {
  plot_list_ileum[[model_name]] <- 
    roc_curve_all_custom(rocs_list[[model_name]][c(4:6)],
                       Q="Q1",
                       model_name=model_name,legend = FALSE) + 
    #ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) + 
    theme(plot.title = element_text(face = "bold",size = 10)) 
}

roc_curve_ti <- ggarrange(plotlist = plot_list_ileum)

# COLON

plot_list_colon <- list()
for (model_name in models_to_plot) {
  plot_list_colon[[model_name]] <- 
    roc_curve_all_custom(rocs_list[[model_name]][c(10:12)],
                       Q="Q1",
                       model_name=model_name,legend = FALSE) + 
    #ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) + 
    theme(plot.title = element_text(face = "bold",size = 10))   
}

roc_curve_colon <- ggarrange(plotlist = plot_list_colon)


roc_curve_plot <- ggarrange(roc_curve_ti,
                            ggplot() + theme_minimal(),
                            roc_curve_colon,
                            ncol=3, widths = c(1,0.1,1))
roc_curve_plot

4.1 FIGURE 2

alpha_beta_elastic <- ggarrange(Q1_alpha,genus_Q1_beta,roc_curve_plot,
                        ncol = 1,nrow=3,labels = LETTERS,heights = c(1,1,0.8))

alpha_beta_elastic <- ggarrange(alpha_beta_elastic,ggplot() + theme_minimal(),ncol=2,
                                widths = c(1,0.15))

alpha_beta_elastic

4.2 FIGURE 3

p_ileum <- dot_heatmap_ileum + 
  ggtitle("Terminal ileum") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
        legend.position = "none")


p_colon <- dot_heatmap_colon  +
  ggtitle("Colon") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
        legend.position = "none")

heatmap_plot <- ggarrange(p_ileum,
                          p_colon,
                          ncol = 2,labels=c("A","B"))

heatmap_plot

5 Session info

sessionInfo()

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 26100)

Matrix products: default


locale:
[1] LC_COLLATE=Czech_Czechia.utf8  LC_CTYPE=Czech_Czechia.utf8   
[3] LC_MONETARY=Czech_Czechia.utf8 LC_NUMERIC=C                  
[5] LC_TIME=Czech_Czechia.utf8    

time zone: Europe/Prague
tzcode source: internal

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] mice_3.17.0              picante_1.8.2            ape_5.8-1               
 [4] lubridate_1.9.3          forcats_1.0.0            stringr_1.5.1           
 [7] tidyverse_2.0.0          kableExtra_1.4.0         tidyr_1.3.1             
[10] gbm_2.2.2                doParallel_1.0.17        iterators_1.0.14        
[13] foreach_1.5.2            ranger_0.17.0            ggvenn_0.1.15           
[16] Maaslin2_1.16.0          purrr_1.0.2              pROC_1.18.5             
[19] glmnet_4.1-8             MicrobiomeStat_1.2       caret_6.0-94            
[22] openxlsx_4.2.8           magrittr_2.0.3           emmeans_1.11.2-8        
[25] lmerTest_3.1-3           robustlmm_3.3-1          lme4_1.1-37             
[28] Matrix_1.6-5             mgcv_1.9-1               nlme_3.1-164            
[31] pheatmap_1.0.13          reshape2_1.4.4           vegan_2.6-10            
[34] lattice_0.22-6           permute_0.9-8            ggplotify_0.1.2         
[37] ggrepel_0.9.6            ggpubr_0.6.1             MicrobiotaProcess_1.14.1
[40] phyloseq_1.46.0          ggplot2_3.5.2            tibble_3.2.1            
[43] dplyr_1.1.4              cowplot_1.2.0            readr_2.1.5             
[46] igraph_2.1.4             ccrepe_1.38.1            data.table_1.15.4       

loaded via a namespace (and not attached):
  [1] fs_1.6.5                    matrixStats_1.5.0          
  [3] bitops_1.0-9                RColorBrewer_1.1-3         
  [5] numDeriv_2016.8-1.1         tools_4.3.1                
  [7] backports_1.4.1             R6_2.6.1                   
  [9] lazyeval_0.2.2              jomo_2.7-6                 
 [11] rhdf5filters_1.14.1         withr_3.0.2                
 [13] gridExtra_2.3               cli_3.6.2                  
 [15] Biobase_2.62.0              logging_0.10-108           
 [17] biglm_0.9-3                 sandwich_3.1-1             
 [19] labeling_0.4.3              mvtnorm_1.3-3              
 [21] robustbase_0.99-4-1         pbapply_1.7-4              
 [23] systemfonts_1.2.2           yulab.utils_0.2.1          
 [25] svglite_2.1.3               parallelly_1.38.0          
 [27] rstudioapi_0.17.1           generics_0.1.4             
 [29] gridGraphics_0.5-1          shape_1.4.6.1              
 [31] car_3.1-3                   zip_2.3.2                  
 [33] biomformat_1.30.0           S4Vectors_0.40.2           
 [35] abind_1.4-8                 infotheo_1.2.0.1           
 [37] lifecycle_1.0.4             multcomp_1.4-28            
 [39] yaml_2.3.10                 carData_3.0-5              
 [41] SummarizedExperiment_1.32.0 Rtsne_0.17                 
 [43] rhdf5_2.46.1                recipes_1.3.1              
 [45] SparseArray_1.2.4           grid_4.3.1                 
 [47] mitml_0.4-5                 crayon_1.5.3               
 [49] pillar_1.11.0               knitr_1.50                 
 [51] optparse_1.7.5              GenomicRanges_1.54.1       
 [53] statip_0.2.3                boot_1.3-32                
 [55] estimability_1.5.1          future.apply_1.11.3        
 [57] codetools_0.2-20            pan_1.9                    
 [59] glue_1.7.0                  ggfun_0.2.0                
 [61] vctrs_0.6.5                 treeio_1.26.0              
 [63] Rdpack_2.6.4                gtable_0.3.6               
 [65] maditr_0.8.6                gower_1.0.1                
 [67] xfun_0.52                   rbibutils_2.3              
 [69] S4Arrays_1.2.1              prodlim_2024.06.25         
 [71] libcoin_1.0-10              coda_0.19-4.1              
 [73] reformulas_0.4.1            pcaPP_2.0-5                
 [75] modeest_2.4.0               survival_3.5-8             
 [77] timeDate_4041.110           hardhat_1.4.2              
 [79] lava_1.8.1                  statmod_1.5.0              
 [81] TH.data_1.1-4               ipred_0.9-15               
 [83] ggtree_3.10.1               GenomeInfoDb_1.38.8        
 [85] fBasics_4041.97             rpart_4.1.23               
 [87] colorspace_2.1-1            BiocGenerics_0.48.1        
 [89] DBI_1.2.3                   nnet_7.3-19                
 [91] ade4_1.7-23                 tidyselect_1.2.1           
 [93] timeSeries_4041.111         compiler_4.3.1             
 [95] microbiome_1.24.0           xml2_1.3.8                 
 [97] DelayedArray_0.28.0         scales_1.4.0               
 [99] DEoptimR_1.1-4              spatial_7.3-18             
[101] rappdirs_0.3.3              digest_0.6.35              
[103] minqa_1.2.8                 rmarkdown_2.29             
[105] XVector_0.42.0              htmltools_0.5.8.1          
[107] pkgconfig_2.0.3             MatrixGenerics_1.14.0      
[109] stabledist_0.7-2            fastmap_1.2.0              
[111] rlang_1.1.3                 htmlwidgets_1.6.4          
[113] farver_2.1.2                zoo_1.8-13                 
[115] jsonlite_2.0.0              ModelMetrics_1.2.2.2       
[117] RCurl_1.98-1.17             modeltools_0.2-24          
[119] Formula_1.2-5               GenomeInfoDbData_1.2.11    
[121] patchwork_1.3.2             Rhdf5lib_1.24.2            
[123] Rcpp_1.0.12                 ggnewscale_0.5.2           
[125] fastGHQuad_1.0.1            stringi_1.8.3              
[127] ggstar_1.0.4                stable_1.1.6               
[129] zlibbioc_1.48.2             MASS_7.3-60.0.1            
[131] plyr_1.8.9                  listenv_0.9.1              
[133] Biostrings_2.70.3           splines_4.3.1              
[135] hash_2.2.6.3                multtest_2.58.0            
[137] hms_1.1.3                   ggtreeExtra_1.12.0         
[139] ggsignif_0.6.4              stats4_4.3.1               
[141] rmutil_1.1.10               evaluate_1.0.5             
[143] nloptr_2.2.1                tzdb_0.5.0                 
[145] getopt_1.20.4               future_1.34.0              
[147] clue_0.3-66                 coin_1.4-3                 
[149] broom_1.0.9                 xtable_1.8-4               
[151] tidytree_0.4.6              rstatix_0.7.2              
[153] viridisLite_0.4.2           class_7.3-22               
[155] aplot_0.2.8                 IRanges_2.36.0             
[157] cluster_2.1.8.1             timechange_0.3.0           
[159] globals_0.18.0