Geography-independent mucosal microbiota alterations in primary sclerosing cholangitis persist after liver transplantation

1. Hypothesis: PSC is associated with the gut microbiome changes

1 Introduction

1.1 About

Pre_LTx vs Post_LTx vs Healthy analysis on merged data

  • Alpha diversity – Richness, Shannon, Simpson, and Pielou indices tested by linear fixed effect or mixed effect model (group effect, country effect, interaction effect)

  • Beta diversity – PERMANOVA (group effect, country effect, interaction effect), PCA visualization

  • Differential abundance testing:

    • o Group effect – linDA + MaAsLin2 intersection

    • o Country effect – taxa with significant interaction effect were excluded based on individual post-hoc analysis. Taxa had to have the same direction in both countries

  • ML models, trained and validated using out-of-sample bootstrap (500 resamplings):

  • o ENET

  • o kNN

  • o GBoost

  • o RF


Pairwise comparisons:

Pre_ltx vs Healthy – an effect of disease, liver cirrhosis and overall bad clinical condition

Pre_ltx vs post_ltx – an effect of transplantation + general improvement of the clinical state

Post_ltx vs Healthy – does LTx lead to „healthy microbiome”?

Report structure The analysis is divided into two segments: terminal ileum and colon. Each part begins with data filtering, where sequencing depth is assessed and nearZeroVar() is applied to remove taxa with no variance in the dataset. This is followed by a series of analytical steps:

  1. Alpha diversity

  2. Beta diversity

  3. Univariate testing

  4. Machine learning models training and validation

At the end of the analysis, a brief summary of the results is provided, as well as final plots that are reported in the original publication.

Importing libraries and custom functions built for this analysis

source("custom_functions.R")

1.2 Clinical characteristics

clinical_char <- read.xlsx("../../../results/clinical_characteristics.xlsx",
                           colNames = FALSE)
colnames(clinical_char) <- clinical_char[3,]
clinical_char <- clinical_char[-c(1,2,3),]
header_groups <- c(" " = 2, "HC" = 2, "pre_LTx" = 2, "post_LTx_non-rPSC"=2, "post_LTx_rPSC"=2)
knitr::kable(clinical_char, align="l", escape=FALSE,digits=3) %>% 
  kable_styling(bootstrap_options = c("hover", "striped", "responsive")) %>% 
  kable_styling(full_width = TRUE) %>%
  add_header_above(header_groups)
HC
pre_LTx
post_LTx_non-rPSC
post_LTx_rPSC
NA Norway Czech Norway Czech Norway Czech Norway Czech
4 F/M [n] 10/30 30/26 19/65 12/19 15/36 26/57 13 3/22
5 Age [years] NA 50 (23;68) 40 (17;77) 35 (17;63) 49 (23;70) 48 (32;81) NA 48 (26;81)
6 IBD yes/no [n] - - 66/18 25/6 31/7 72/11 11/2 24/1
7 Total bilirubin [µmol/l] - 21 (3;68) 24 62 (9;500) 19.5 (11;49) 18 (5;132) 24 (13;52) 24 (6;148)
8 AST [µkat/l] - 0.5 (0.3;1.9) 1.1 (0.2;8.6) 1.6 (0.5;4.0) 0.6 (0.2;3.2) 0.4 (0.2;1.7) 1.1 (0.3, 4.0) 0.5 (0.2;1.9)
9 ALT [µkat/l] - 0.5 (0.3;1.7) 1.6 (0.5;8.6) 1.7 (0.5;5.2) 0.6 (0.1,4.1) 0.5 (0.3;4.0) 1.1 (0.2;4.5) 0.5 (0.3;3.5)
10 ALP [µkat/l] - 1.3 (0.7;2.3) 4.2 (0.9;17.3) 6.0 (1.6;17.8) 1.3 (0.6;7.9) 1.4 (0.6;3.5) 3.7 (0.5;13.9) 2.3 (0.5;13.1)
11 GGT [µkat/l] - 0.4 (0.2;5.0) - 2.6 (0.6;15.2) - 0.4 (0.1; 4.8) - 1.4 (0.3;13.7)
12 INR [-] - 1.1 (1.1;1.2) - 1.1 (0.9;1.8) - 1.1 (0.9;1.6) - 1.1 (1.0;1.7)
13 Creatinine [µmol/l] - 77 (50;115) 65 (39; 232) 60 (44;101) 84 (59; 159) 87 (49;248) 81 (65; 133) 85 (50;780)
14 Albumin [g/l] - 50 (43;57) 41 (26;49) 38 (20;50) 43 (35;49) 45 (26;51) 38 (34; 50) 42 (26;50)
15 Fecal calprotectin [µg/g] - - 59 (1;2844) 123 (6;4513) 30 (1;1945) 275 (6;4821) 55 (10;832) 564 (14;4301)
16 NANCY_max [-] - - - 2 (0;4) - 2 (0;4) - 2 (0;4)
17 eMayo [-] - - - 1 (0;2) - 1 (0;3) - 1 (0;2)
18 Mayo_DAI [-] - - - 2 (0;5) - 1 (0;8) - 2 (0;7)
19 Mayo_PSC risk score [-] - - 0.3 (;2.2;3.5) 0.3 (;2.2;3.5) 0.3 (;1.1;2.2) - 1.0 (;1.4;3.1) -
20 AOM_score [-] - - 1.9 (0.5;3.8) 2 (1.1;5.0) 1.9 (0.8;2.7) - 2.2 (1.2;3.5) -
21 APRI_score [-] - 0.2 (0.1;1.5) 0.7 (0.1;5.5) 1.1 (0.3;14.3) 0.5 (0.2;2.3) 0.3 (0.1;1.5) 0.7 (0.2;3.6) 0.4 (0.1;1.4)
22 FIB-4_score [-] - 0.8 (0.4;3.4) 1.1 (0.8;8.6) 1.6 (0.4;23.1) 1.5 (0.6;3.6) 1.1 (0.3; 5.0) 2.2 (0.5;5.8) 1.1 (0.5;3.1)
23 MELD_score [-] - 7.8 (7.1;8.6) - 8.2 (6.4;18.0) - 8.0 (6.4;14.6) - 7.8 (6.5;22.8)
24 Platelets [10^9/l] - 267 (145;400) 265 (37; 712) 201 (20;477) 197 (101;434) 208 (41;442) 212 (69;409) 213 (83;416)

1.3 Data Import

Importing ASV, taxa and metadata tables for both Czech and Norway samples.

Czech

path = "../../../../data/analysis_ready_data/ikem/"
asv_tab_ikem <- as.data.frame(fread(file.path(path,"asv_table_ikem.csv"),
                                    check.names = FALSE))
taxa_tab_ikem <- as.data.frame(fread(file.path(path,"taxa_table_ikem.csv"),
                                     check.names = FALSE))
metadata_ikem <- as.data.frame(fread(file.path(path,"metadata_ikem.csv"),
                                     check.names = FALSE))

Norway

path = "../../../../data/analysis_ready_data/norway/"
asv_tab_norway <- as.data.frame(fread(file.path(path,"asv_table_norway.csv"),
                                    check.names = FALSE))
taxa_tab_norway <- as.data.frame(fread(file.path(path,"taxa_table_norway.csv"),
                                    check.names = FALSE))
metadata_norway <- as.data.frame(fread(file.path(path,"metadata_norway.csv"),
                                    check.names = FALSE))

1.4 Merging

Merging two countries based on the different matrices - Ileum, Colon.

Terminal ileum

ileum_data <- merging_data(asv_tab_1=asv_tab_ikem,
                           asv_tab_2=asv_tab_norway,
                           taxa_tab_1=taxa_tab_ikem,
                           taxa_tab_2=taxa_tab_norway,
                           metadata_1=metadata_ikem,
                           metadata_2=metadata_norway,
                           segment="TI",Q="Q1")
Removing 1498 ASV(s)
Removing 1834 ASV(s)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment
ileum_asv_tab <- ileum_data[[1]]
ileum_taxa_tab <- ileum_data[[2]]
ileum_metadata <- ileum_data[[3]]

Reads statistics

summary(colSums(ileum_asv_tab[,-1]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
   3441   16579   20972   24071   28473   85910 

Colon

colon_data <- merging_data(asv_tab_1=asv_tab_ikem,
                           asv_tab_2=asv_tab_norway,
                           taxa_tab_1=taxa_tab_ikem,
                           taxa_tab_2=taxa_tab_norway,
                           metadata_1=metadata_ikem,
                           metadata_2=metadata_norway,
                           segment="colon",Q="Q1")
Removing 739 ASV(s)
Removing 266 ASV(s)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment
colon_asv_tab <- colon_data[[1]]
colon_taxa_tab <- colon_data[[2]]
colon_metadata <- colon_data[[3]]

Reads statistics

summary(colSums(colon_asv_tab[,-1]))
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
     63   19814   26676   30841   37553  136048 

2 Data Analysis - Terminal ileum

segment="terminal_ileum"

2.1 Filtering

Rules:

  • sequencing depth > 10000

  • nearZeroVar() with default settings

Rarefaction Curve

path="../../../intermediate_files/rarecurves"
seq_depth_threshold <- 10000
ps <- construct_phyloseq(ileum_asv_tab,ileum_taxa_tab,ileum_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_ileum.Rdata"))
load(file.path(path,"rarefaction_ileum.Rdata"))

prare <- ggrarecurve(obj=rareres,
                      factorNames="Country",
                      indexNames=c("Observe")) + 
  theme_bw() +
  theme(axis.text=element_text(size=8), panel.grid=element_blank(),
        strip.background = element_rect(colour=NA,fill="grey"),
        strip.text.x = element_text(face="bold")) + 
  geom_vline(xintercept = seq_depth_threshold, 
             linetype="dashed", 
             color = "red") + 
  xlim(0, 20000)
The color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)
prare

Library size

read_counts(ileum_asv_tab, line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

2.1.1 Sequencing depth

data_filt <- seq_depth_filtering(ileum_asv_tab,
                                 ileum_taxa_tab,
                                 ileum_metadata,
                                 seq_depth_threshold = 10000)
Removing 131 ASV(s)
filt_ileum_asv_tab <- data_filt[[1]]; alpha_ileum_asv_tab <- filt_ileum_asv_tab
filt_ileum_taxa_tab <- data_filt[[2]]; alpha_ileum_taxa_tab <- filt_ileum_taxa_tab
filt_ileum_metadata <- data_filt[[3]]; alpha_ileum_metadata <- filt_ileum_metadata

seq_step <- dim(filt_ileum_asv_tab)[1]

Library size

read_counts(filt_ileum_asv_tab,line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

2.1.2 NearZeroVar

data_filt <- nearzerovar_filtering(filt_ileum_asv_tab, 
                                   filt_ileum_taxa_tab,
                                   filt_ileum_metadata)

filt_ileum_asv_tab <- data_filt[[1]]
filt_ileum_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_ileum_asv_tab)[1]

Library size

read_counts(filt_ileum_asv_tab,line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

2.1.3 Final Counts

final_counts_filtering(ileum_asv_tab,
                       filt_ileum_asv_tab,
                       filt_ileum_metadata,
                       seq_step, 0, nearzero_step) %>% `colnames<-`("Count")
                            Count
Raw data: ASVs               5132
Raw data: Samples             294
Sequencing depth filt: ASVs  5001
Prevalence filt: ASVs           0
NearZeroVar filt: ASVs        479
Filt data: ASVs               479
Filt data: Samples            281
Filt data: Patients           281
Filt data: Patients.1           0
Filtered samples               13
Matrices                       TI
healthy                        73
non-rPSC                        0
rPSC                            0
pre_ltx                        70
post_ltx                      138
ETOH                            0

2.2 Alpha diversity

Calculating Richness, Shannon, Simpson, Pielou indexes on raw (unfiltered) rarefied data. Samples with sequencing depth < 10000 were excluded. Testing by linear model.

path = "../../../results/merged_sites/main_analysis/Q1/alpha_diversity"

Calculation

# Construct MPSE object
alpha_ileum_metadata$Sample <- alpha_ileum_metadata$SampleID
ileum_mpse <- as.MPSE(construct_phyloseq(alpha_ileum_asv_tab,
                                         alpha_ileum_taxa_tab,
                                         alpha_ileum_metadata))

ileum_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)

# Calculate alpha diversity - rarefied counts
ileum_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)
alpha_div_plots <- list()

# preparing data frame
alpha_data <- data.frame(SampleID=ileum_mpse$Sample.x,
                         Observe=ileum_mpse$Observe,
                         Shannon=ileum_mpse$Shannon,
                         Simpson=ileum_mpse$Simpson,
                         Pielou=ileum_mpse$Pielou,
                         Group=ileum_mpse$Group,
                         Country=ileum_mpse$Country,
                         Patient=ileum_mpse$Patient)

# save the result
write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
          row.names = FALSE)

2.2.1 Plots

p_boxplot_alpha <- alpha_diversity_countries(alpha_data)
Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Using SampleID, Group, Country, Patient as id variables

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha

# see the results
p_boxplot_alpha

2.2.2 Linear Models

path = "../../../results/merged_sites/main_analysis/Q1/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))

Richness

# run model
results_model <- pairwise.lm(formula = "Observe ~ Group * Country",
                             factors=alpha_data$Group,
                             data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_observe <- results_model[[1]]
  results_model_observe_emeans <- results_model[[2]]
} else {
  results_model_observe <- results_model
  results_model_observe_emeans <- NA
}

# save the results
pc_observed <- list(); 
pc_observed[[segment]] <- results_model_observe
# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")
Raw results of linear model of richness estimation.
Estimate Std..Error t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -53.642 11.742 -4.568 0.000 0.000 ***
healthy , pre_ltx - CZ vs NO 11.219 10.488 1.070 0.287 0.361
healthy vs Grouppre_ltx:CountryNO 6.313 15.403 0.410 0.683 0.683
pre_ltx vs Grouppost_ltx 25.623 11.366 2.254 0.025 0.057
pre_ltx , post_ltx - CZ vs NO 17.533 12.615 1.390 0.166 0.249
pre_ltx vs Grouppost_ltx:CountryNO -35.933 15.921 -2.257 0.025 0.057
healthy vs Grouppost_ltx -28.019 9.237 -3.033 0.003 0.012 *
healthy , post_ltx - CZ vs NO 11.219 11.267 0.996 0.321 0.361
healthy vs Grouppost_ltx:CountryNO -29.620 14.629 -2.025 0.044 0.080
knitr::kable(results_model_observe_emeans,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
contrast Country estimate SE df t.ratio p.value
post_ltx - pre_ltx CZ 25.623 11.366 204 2.254 0.025
post_ltx - pre_ltx NO -10.310 11.148 204 -0.925 0.356
post_ltx - healthy CZ -28.019 9.237 207 -3.033 0.003
post_ltx - healthy NO -57.639 11.344 207 -5.081 0.000

Shannon

# run model
results_model <- pairwise.lm(formula = "Shannon ~ Group * Country",
                             factors=alpha_data$Group,
                             data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_shannon <- results_model[[1]]
  results_model_shannon_emeans <- results_model[[2]]
} else {
  results_model_shannon <- results_model
  results_model_shannon_emeans <- NA
}

# save the results
pc_shannon <- list(); 
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)
# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")
Raw results of linear model of Shannon estimation.
Estimate Std..Error t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.505 0.159 -3.171 0.002 0.017 *
healthy , pre_ltx - CZ vs NO 0.007 0.142 0.046 0.963 0.963
healthy vs Grouppre_ltx:CountryNO 0.144 0.209 0.687 0.493 0.634
pre_ltx vs Grouppost_ltx 0.372 0.148 2.514 0.013 0.049 *
pre_ltx , post_ltx - CZ vs NO 0.150 0.164 0.915 0.361 0.542
pre_ltx vs Grouppost_ltx:CountryNO -0.502 0.207 -2.423 0.016 0.049 *
healthy vs Grouppost_ltx -0.134 0.111 -1.200 0.231 0.416
healthy , post_ltx - CZ vs NO 0.007 0.136 0.048 0.962 0.963
healthy vs Grouppost_ltx:CountryNO -0.358 0.176 -2.031 0.044 0.098
knitr::kable(results_model_shannon_emeans,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
contrast Country estimate SE df t.ratio p.value
post_ltx - pre_ltx CZ 0.372 0.148 204 2.514 0.013
post_ltx - pre_ltx NO -0.130 0.145 204 -0.898 0.370
post_ltx - healthy CZ -0.134 0.111 207 -1.200 0.231
post_ltx - healthy NO -0.492 0.137 207 -3.597 0.000

Simpson

# run model
results_model <- pairwise.lm(formula = "Simpson ~ Group * Country",
                                     factors=alpha_data$Group,
                                     data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_simpson <- results_model[[1]]
  results_model_simpson_emeans <- results_model[[2]]
} else {
  results_model_simpson <- results_model
  results_model_simpson_emeans <- NA
}


# save the results
pc_simpson <- list(); 
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)
# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")
Raw results of linear model of Simpson estimation.
Estimate Std..Error t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.046 0.019 -2.481 0.014 0.129
healthy , pre_ltx - CZ vs NO -0.008 0.017 -0.466 0.642 0.692
healthy vs Grouppre_ltx:CountryNO 0.027 0.024 1.092 0.277 0.498
pre_ltx vs Grouppost_ltx 0.037 0.021 1.763 0.079 0.238
pre_ltx , post_ltx - CZ vs NO 0.019 0.023 0.811 0.418 0.627
pre_ltx vs Grouppost_ltx:CountryNO -0.060 0.029 -2.033 0.043 0.195
healthy vs Grouppost_ltx -0.009 0.016 -0.572 0.568 0.692
healthy , post_ltx - CZ vs NO -0.008 0.020 -0.396 0.692 0.692
healthy vs Grouppost_ltx:CountryNO -0.033 0.025 -1.303 0.194 0.436
knitr::kable(results_model_simpson_emeans,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
contrast Country estimate SE df t.ratio p.value
post_ltx - pre_ltx CZ 0.037 0.021 204 1.763 0.079
post_ltx - pre_ltx NO -0.023 0.021 204 -1.106 0.270

Pielou

# run model
results_model <- pairwise.lm(formula = "Pielou ~ Group * Country",
                                     factors=alpha_data$Group,
                                     data=alpha_data)

# check interaction

if (!is.data.frame(results_model)){
  results_model_pielou <- results_model[[1]]
  results_model_pielou_emeans <- results_model[[2]]
} else {
  results_model_pielou <- results_model
  results_model_pielou_emeans <- NA
}

# save the results
pc_pielou <- list(); 
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)
# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")
Raw results of linear model of Pielou estimation.
Estimate Std..Error t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.040 0.024 -1.708 0.090 0.285
healthy , pre_ltx - CZ vs NO -0.008 0.021 -0.394 0.694 0.943
healthy vs Grouppre_ltx:CountryNO 0.010 0.031 0.332 0.740 0.943
pre_ltx vs Grouppost_ltx 0.042 0.022 1.922 0.056 0.285
pre_ltx , post_ltx - CZ vs NO 0.002 0.024 0.082 0.935 0.943
pre_ltx vs Grouppost_ltx:CountryNO -0.051 0.030 -1.677 0.095 0.285
healthy vs Grouppost_ltx 0.001 0.017 0.072 0.943 0.943
healthy , post_ltx - CZ vs NO -0.008 0.020 -0.407 0.684 0.943
healthy vs Grouppost_ltx:CountryNO -0.041 0.027 -1.525 0.129 0.290
knitr::kable(results_model_pielou_emeans,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
contrast Country estimate SE df t.ratio p.value
post_ltx - pre_ltx CZ 0.042 0.022 204 1.922 0.056
post_ltx - pre_ltx NO -0.009 0.021 204 -0.435 0.664

2.2.3 Saving results

alpha_list <- list(
  Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
  Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
  Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
  Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
                   
write.xlsx(alpha_list, 
           file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))

2.3 Beta diversity

Calculating Aitchison distance (euclidean distance on clr-transformed data) at genus level (main analysis). In supplements, there are also bray-curtis and Jaccard distance at ASV and genus levels. Testing by PERMANOVA.

2.3.1 Main analysis

Genus level, Aitchison distance

level="genus"
path = "../../../results/merged_sites/main_analysis/Q1/beta_diversity"
pairwise_aitchison_raw <- list()
pca_plots_list <- list()

Aggregation, filtering

# Aggregation
genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level=level,
                             names=TRUE)
# Filtration
filt_data <- filtering_steps(genus_data[[1]],
                             genus_data[[2]],
                             ileum_metadata,
                             seq_depth_threshold=10000)
Removing 5 ASV(s)
filt_ileum_genus_tab <- filt_data[[1]]
filt_ileum_genus_taxa <- filt_data[[2]]
filt_ileum_metadata <- filt_data[[3]]
2.3.1.0.1 PERMANOVA
# prepare dataset
pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_ileum_metadata$Group,
                           covariate = filt_ileum_metadata$Country, 
                           sim.method = "robust.aitchison", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,
                          filt_ileum_metadata$Group,
                          covariate = filt_ileum_metadata$Country, 
                          interaction = TRUE, sim.method = "robust.aitchison",
                          p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 695.306 4.259 0.029 0.001 0.002 **
pre_ltx vs post_ltx 1 309.768 1.811 0.009 0.003 0.003 **
post_ltx vs healthy 1 914.409 5.454 0.025 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy , Country 1 482.074 2.953 0.020 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 699.075 4.087 0.019 0.001 0.001 ***
post_ltx vs healthy , Country 1 728.020 4.342 0.020 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy : Country 1 237.598 1.460 0.010 0.010 0.030 *
pre_ltx vs post_ltx : Country 1 237.265 1.390 0.007 0.021 0.032 *
post_ltx vs healthy : Country 1 202.888 1.211 0.006 0.106 0.106

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.1]

for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2)
  print(result_list)
}
$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Model     1    275.1 0.02844 1.7272  0.003      0.003
Residual 59   9396.0 0.97156                         
Total    60   9671.1 1.00000                         

$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs     R2      F Pr(>F) p.adjusted
Model     1    657.8 0.0474 3.9806  0.001  0.0013333
Residual 80  13220.8 0.9526                         
Total    81  13878.7 1.0000                         

$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2     F Pr(>F) p.adjusted
Model     1    362.6 0.03086 2.165  0.001  0.0013333
Residual 68  11387.2 0.96914                        
Total    69  11749.8 1.00000                        

$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Model     1    357.1 0.03082 2.2579  0.001  0.0013333
Residual 71  11229.6 0.96918                         
Total    72  11586.7 1.00000                         

$pre_ltx_post_ltx_CZ
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
          Df SumOfSqs     R2      F Pr(>F) p.adjusted
Model      1    184.3 0.0085 1.0629  0.314      0.314
Residual 124  21497.3 0.9915                         
Total    125  21681.6 1.0000                         

$pre_ltx_post_ltx_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
         Df SumOfSqs      R2      F Pr(>F) p.adjusted
Model     1    362.8 0.02649 2.1767  0.001  0.0013333
Residual 80  13332.7 0.97351                         
Total    81  13695.4 1.00000                         

$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
         Df SumOfSqs      R2     F Pr(>F) p.adjusted
Model     1    362.6 0.03086 2.165  0.001  0.0013333
Residual 68  11387.2 0.96914                        
Total    69  11749.8 1.00000                        

$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999

adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
          Df SumOfSqs      R2      F Pr(>F) p.adjusted
Model      1    573.8 0.02389 3.3288  0.001  0.0013333
Residual 136  23442.8 0.97611                         
Total    137  24016.6 1.00000                         
2.3.1.0.2 Plots
p <- pca_plot_custom(filt_ileum_genus_tab,
                                 filt_ileum_genus_taxa,
                                 filt_ileum_metadata,
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend= FALSE)

# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p

# see the results
p

2.3.1.1 Saving results

write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]], 
           file = file.path(path,
           paste0("beta_diversity_results_", segment,".xlsx")))

2.4 Univariate Analysis

Using two DAA tools - linDA and Maaslin. Only the intersection of sets selected by each tool was shown to minimize false positives. Taxa with a significant interaction effect were excluded based on individual post-hoc analysis of the Czech and Norwegian cohorts. Only taxa with significant log fold change that showed the same change direction in both countries were retained.

Main analysis is performed at genus level. In supplements, analysis is performed at ASV and phylum levels.

2.4.1 Main analysis

level="genus"

# needed paths
path = "../../../results/merged_sites/main_analysis/Q1/univariate_analysis"
path_maaslin=file.path("../../../intermediate_files/maaslin/Q1",level)
# variables
raw_linda_results_genus <- list();
raw_linda_results_genus[[segment]] <- list()
linda_results_genus <- list(); 
linda_results_genus[[segment]] <- list()

# country and interaction problems
list_country_union <- list()
list_intersections <- list()
list_venns <- list()
uni_statistics <- list()

# workbook for final df
wb <- createWorkbook()

# PSC effect
psc_effect <- list()

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

ileum_genus_asv_taxa_tab <- create_asv_taxa_table(ileum_genus_tab,
                                                  ileum_genus_taxa_tab)

2.4.1.1 pre_ltx vs healthy

We can assume, that the differentially abundant taxa between these two groups are caused by PSC disease and other factors, like liver cirrhosis and overall bad clinical condition.

group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
2.4.1.1.1 linDA
# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 84 ASV(s)
Removing 10 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  143  samples and  184  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plots
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano

2.4.1.1.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("NO vs CZ")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

2.4.1.1.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

2.4.1.1.4 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
2.4.1.1.5 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

2.4.1.2 pre_ltx vs post_ltx

group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
2.4.1.2.1 linDA
# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 46 ASV(s)
Removing 6 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  208  samples and  172  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

volcano

2.4.1.2.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

2.4.1.2.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

2.4.1.2.4 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
2.4.1.2.5 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

2.4.1.3 post_ltx vs healthy

We can assume, that the differentially abundant taxa between these two groups remain because of the persistent gut MB alteration due to the PSC disease. Also, transplantation itself could add to this difference, as it has definitely strong impact on microbial composition.

group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
2.4.1.3.1 linDA
# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 45 ASV(s)
Removing 2 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  211  samples and  180  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano

2.4.1.3.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

2.4.1.3.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

2.4.1.3.4 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
2.4.1.3.5 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

2.4.1.4 Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab) 
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
                                      ileum_taxa_tab) + xlab("") + ylab("")
min_clr -1.767585 
max_clr 6.908912 
min_log -5.062147 
max_log 4.892498 
dotheatmap_linda

Horizontal bar plot

p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))
Using SeqID as id variables
p_prevalence_final <- ggarrange(p_prevalence,
                                ggplot() + theme_minimal(),
                                nrow = 2,heights = c(1,0.09))

p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),
               p_prevalence_final,
               ncol=2,widths = c(1,0.3))
Warning: Removed 80 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 80 rows containing missing values or values outside the scale range
(`geom_text()`).
p

dot_heatmap_ileum <- p

2.4.1.5 PSC effect

To get only PSC associated taxa, we can intersect differentially abundant taxa from the previous analyses (see diagram below).

pre_LTx vs HC and Post_LTx vs HC intersection

imagename
A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)

psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
  
# see the results
psc_effect[[paste(segment,level)]] 
                           SeqID
10                 Enterorhabdus
16                   Barnesiella
19                 Butyricimonas
20                   Odoribacter
27                     Alistipes
29               Parabacteroides
41                    Holdemania
64                   Coprococcus
68              Fusicatenibacter
72             Lachnoclostridium
74  Lachnospiraceae_FCS020_group
105                Oscillibacter
118             Faecalibacterium
155                  Pseudomonas
164                 Enterococcus
                                                                                                        Taxonomy
10     k__Bacteria;p__Actinobacteriota;c__Coriobacteriia;o__Coriobacteriales;f__Eggerthellaceae;g__Enterorhabdus
16                 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Barnesiellaceae;g__Barnesiella
19                k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Butyricimonas
20                  k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Odoribacter
27                     k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes
29              k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Tannerellaceae;g__Parabacteroides
41               k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Holdemania
64                   k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus
68              k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Fusicatenibacter
72             k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium
74  k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_FCS020_group
105              k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Oscillibacter
118            k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium
155   k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas
164                   k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;g__Enterococcus
    log2FoldChange      p_value         padj
10       -1.050549 1.087023e-02 4.545734e-02
16       -3.177161 1.263665e-03 7.750481e-03
19       -2.900425 6.571752e-05 9.301556e-04
20       -2.971657 3.696980e-05 6.184040e-04
27       -2.559742 5.731016e-04 4.336862e-03
29       -3.418185 1.366753e-05 2.794251e-04
41       -1.213660 1.082333e-02 4.545734e-02
64       -3.868534 6.688508e-07 3.076714e-05
68       -2.864295 5.224510e-04 4.336862e-03
72       -1.781011 1.555333e-03 9.231654e-03
74       -2.231945 8.971276e-04 6.113758e-03
105      -1.970465 2.772265e-03 1.416936e-02
118      -3.360183 8.947299e-06 2.351861e-04
155       1.573079 8.758249e-03 4.028794e-02
164       4.892498 4.554602e-10 8.380467e-08

2.4.1.6 Saving results

# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
             overwrite = TRUE)

# PSC effect
write.xlsx(psc_effect[[paste(segment,level)]],file.path(path,paste0("psc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa

write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("significant_taxa_",segment,".xlsx")))

2.5 Machine learning

Binary classification was performed using four independent models: Elastic Net (ENET), RF (RF), Gradient Boosting (GBoost), K-nearest Neighbors (kNN). Training and validation were performed via bootstrapping (N=500) on clr-transformed data. Model performance metrics expressed as AUC were calculated based on an out-of-sample principle.

The supplementary analysis contains training and validating at clr-transformed data at ASV level and relative-abundance data at ASV and genus level.

path = "../../../results/merged_sites/main_analysis/Q1/models"

2.5.1 ENET

model="enet"

2.5.1.1 ASV level

level="ASV"
2.5.1.1.1 pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, usage="ml_clr")
Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC curve
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ <- list()
models_cm <- list()
betas <- list()
roc_cs <- list()

models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.4000000
lambda                          0.0256910
auc                             1.0000000
auc_czech                       1.0000000
auc_no                          1.0000000
auc_optimism_corrected          0.9562938
auc_optimism_corrected_CIL      0.8940903
auc_optimism_corrected_CIU      0.9955143
accuracy                        1.0000000
accuracy_czech                        NaN
accuracy_no                     1.0000000
accuracy_optimism_corrected     0.8824810
accuracy_optimism_corrected_CIL 0.7808636
accuracy_optimism_corrected_CIU 0.9600000
enet_model$conf_matrices
$original
    Predicted
True  0  1
   0 73  0
   1  0 70

$czech
    Predicted
True  0  1
   0 37  0
   1  0 24

$no
    Predicted
True  0  1
   0 36  0
   1  0 46
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

roc_c

2.5.1.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.2000000
lambda                          0.1292328
auc                             0.9967909
auc_czech                       0.9979575
auc_no                          0.9969807
auc_optimism_corrected          0.8645187
auc_optimism_corrected_CIL      0.7852767
auc_optimism_corrected_CIU      0.9318161
accuracy                        0.9278846
accuracy_czech                        NaN
accuracy_no                     0.9512195
accuracy_optimism_corrected     0.8220597
accuracy_optimism_corrected_CIL 0.7333333
accuracy_optimism_corrected_CIU 0.8924940
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 138   0
   1  15  55

$czech
    Predicted
True   0   1
   0 102   0
   1  11  13

$no
    Predicted
True  0  1
   0 36  0
   1  4 42
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

roc_c

2.5.1.1.3 post_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
                                     ileum_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.20000000
lambda                          0.04799701
auc                             1.00000000
auc_czech                       1.00000000
auc_no                          1.00000000
auc_optimism_corrected          0.95799961
auc_optimism_corrected_CIL      0.89999175
auc_optimism_corrected_CIU      0.98977781
accuracy                        1.00000000
accuracy_czech                         NaN
accuracy_no                     1.00000000
accuracy_optimism_corrected     0.89383497
accuracy_optimism_corrected_CIL 0.82118019
accuracy_optimism_corrected_CIU 0.96028870
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0  73   0
   1   0 138

$czech
    Predicted
True   0   1
   0  37   0
   1   0 102

$no
    Predicted
True  0  1
   0 36  0
   1  0 36
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

roc_c

2.5.1.2 Genus level

level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]
2.5.1.2.1 pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=500,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.20000000
lambda                          0.03984448
auc                             1.00000000
auc_czech                       1.00000000
auc_no                          1.00000000
auc_optimism_corrected          0.96582239
auc_optimism_corrected_CIL      0.91575521
auc_optimism_corrected_CIU      0.99568966
accuracy                        1.00000000
accuracy_czech                         NaN
accuracy_no                     1.00000000
accuracy_optimism_corrected     0.89754798
accuracy_optimism_corrected_CIL 0.80769231
accuracy_optimism_corrected_CIU 0.97141139
enet_model$conf_matrices
$original
    Predicted
True  0  1
   0 73  0
   1  0 70

$czech
    Predicted
True  0  1
   0 37  0
   1  0 24

$no
    Predicted
True  0  1
   0 36  0
   1  0 46
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

2.5.1.2.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.0000000
lambda                          0.6674012
auc                             0.9726708
auc_czech                       0.9616013
auc_no                          0.9800725
auc_optimism_corrected          0.8344709
auc_optimism_corrected_CIL      0.7487803
auc_optimism_corrected_CIU      0.9154473
accuracy                        0.8653846
accuracy_czech                        NaN
accuracy_no                     0.8780488
accuracy_optimism_corrected     0.7788698
accuracy_optimism_corrected_CIL 0.6973684
accuracy_optimism_corrected_CIU 0.8516141
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 136   2
   1  26  44

$czech
    Predicted
True   0   1
   0 101   1
   1  17   7

$no
    Predicted
True  0  1
   0 35  1
   1  9 37
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

2.5.1.2.3 post_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,group, 
                                     usage="ml_clr")
Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.00000000
lambda                          0.07097722
auc                             1.00000000
auc_czech                       1.00000000
auc_no                          1.00000000
auc_optimism_corrected          0.96566611
auc_optimism_corrected_CIL      0.92200341
auc_optimism_corrected_CIU      0.99386451
accuracy                        1.00000000
accuracy_czech                         NaN
accuracy_no                     1.00000000
accuracy_optimism_corrected     0.90031834
accuracy_optimism_corrected_CIL 0.82800926
accuracy_optimism_corrected_CIU 0.96054545
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0  73   0
   1   0 138

$czech
    Predicted
True   0   1
   0  37   0
   1   0 102

$no
    Predicted
True  0  1
   0 36  0
   1  0 36
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

2.5.1.3 Saving results

models_summ_df_ileum <- do.call(rbind, 
  models_summ[grep(segment,names(models_summ),value = TRUE)])

write.csv(models_summ_df_ileum,file.path(path,paste0("elastic_net_",segment,".csv")))

2.6 Results overview

2.6.0.1 Alpha diversity

knitr::kable(pc_observed[[segment]],
             digits = 3,
             caption = "Results of linear model testing ASV Richness")
Results of linear model testing ASV Richness
Estimate Std..Error t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -53.642 11.742 -4.568 0.000 0.000 ***
healthy , pre_ltx - CZ vs NO 11.219 10.488 1.070 0.287 0.361
healthy vs Grouppre_ltx:CountryNO 6.313 15.403 0.410 0.683 0.683
pre_ltx vs Grouppost_ltx 25.623 11.366 2.254 0.025 0.057
pre_ltx , post_ltx - CZ vs NO 17.533 12.615 1.390 0.166 0.249
pre_ltx vs Grouppost_ltx:CountryNO -35.933 15.921 -2.257 0.025 0.057
healthy vs Grouppost_ltx -28.019 9.237 -3.033 0.003 0.012 *
healthy , post_ltx - CZ vs NO 11.219 11.267 0.996 0.321 0.361
healthy vs Grouppost_ltx:CountryNO -29.620 14.629 -2.025 0.044 0.080
knitr::kable(pc_shannon[[segment]],
             digits = 3,
             caption = "Results of linear model testing Shannon index")
Results of linear model testing Shannon index
Estimate Std..Error t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.505 0.159 -3.171 0.002 0.017 *
healthy , pre_ltx - CZ vs NO 0.007 0.142 0.046 0.963 0.963
healthy vs Grouppre_ltx:CountryNO 0.144 0.209 0.687 0.493 0.634
pre_ltx vs Grouppost_ltx 0.372 0.148 2.514 0.013 0.049 *
pre_ltx , post_ltx - CZ vs NO 0.150 0.164 0.915 0.361 0.542
pre_ltx vs Grouppost_ltx:CountryNO -0.502 0.207 -2.423 0.016 0.049 *
healthy vs Grouppost_ltx -0.134 0.111 -1.200 0.231 0.416
healthy , post_ltx - CZ vs NO 0.007 0.136 0.048 0.962 0.963
healthy vs Grouppost_ltx:CountryNO -0.358 0.176 -2.031 0.044 0.098
knitr::kable(pc_simpson[[segment]],
             digits = 3,
             caption = "Results of linear model testing Simpson index")
Results of linear model testing Simpson index
Estimate Std..Error t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.046 0.019 -2.481 0.014 0.129
healthy , pre_ltx - CZ vs NO -0.008 0.017 -0.466 0.642 0.692
healthy vs Grouppre_ltx:CountryNO 0.027 0.024 1.092 0.277 0.498
pre_ltx vs Grouppost_ltx 0.037 0.021 1.763 0.079 0.238
pre_ltx , post_ltx - CZ vs NO 0.019 0.023 0.811 0.418 0.627
pre_ltx vs Grouppost_ltx:CountryNO -0.060 0.029 -2.033 0.043 0.195
healthy vs Grouppost_ltx -0.009 0.016 -0.572 0.568 0.692
healthy , post_ltx - CZ vs NO -0.008 0.020 -0.396 0.692 0.692
healthy vs Grouppost_ltx:CountryNO -0.033 0.025 -1.303 0.194 0.436
knitr::kable(pc_pielou[[segment]],
             digits = 3,
             caption = "Results of linear model testing Pielou index")
Results of linear model testing Pielou index
Estimate Std..Error t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.040 0.024 -1.708 0.090 0.285
healthy , pre_ltx - CZ vs NO -0.008 0.021 -0.394 0.694 0.943
healthy vs Grouppre_ltx:CountryNO 0.010 0.031 0.332 0.740 0.943
pre_ltx vs Grouppost_ltx 0.042 0.022 1.922 0.056 0.285
pre_ltx , post_ltx - CZ vs NO 0.002 0.024 0.082 0.935 0.943
pre_ltx vs Grouppost_ltx:CountryNO -0.051 0.030 -1.677 0.095 0.285
healthy vs Grouppost_ltx 0.001 0.017 0.072 0.943 0.943
healthy , post_ltx - CZ vs NO -0.008 0.020 -0.407 0.684 0.943
healthy vs Grouppost_ltx:CountryNO -0.041 0.027 -1.525 0.129 0.290

Plots

alpha_div_plots[[paste(segment,"Country")]]

2.6.0.2 Beta diversity

Main analysis

knitr::kable(pairwise_aitchison_raw[[paste("genus", segment)]],
             digits = 3,
             caption = "Results of PERMANOVA - robust aitchison distance")
Results of PERMANOVA - robust aitchison distance
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 695.306 4.259 0.029 0.001 0.002 **
pre_ltx vs post_ltx 1 309.768 1.811 0.009 0.003 0.003 **
post_ltx vs healthy 1 914.409 5.454 0.025 0.001 0.002 **
pre_ltx vs healthy , Country 1 482.074 2.953 0.020 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 699.075 4.087 0.019 0.001 0.001 ***
post_ltx vs healthy , Country 1 728.020 4.342 0.020 0.001 0.001 ***
pre_ltx vs healthy : Country 1 237.598 1.460 0.010 0.010 0.030 *
pre_ltx vs post_ltx : Country 1 237.265 1.390 0.007 0.021 0.032 *
post_ltx vs healthy : Country 1 202.888 1.211 0.006 0.106 0.106

PCA

pca_plots_list[[paste(segment,"genus custom")]]

2.6.0.3 Univariate analysis

Number of significant taxa

knitr::kable(cbind(as.data.frame(lapply(list_intersections,nrow)),
      as.data.frame(lapply(psc_effect,nrow))) %>% t() %>% 
  `colnames<-`("Count") %>% 
  `rownames<-`(c(names(list_intersections),"PSC effect Genus")),caption="Number of significant taxa")
Number of significant taxa
Count
terminal_ileum genus healthy vs pre_ltx 25
terminal_ileum genus pre_ltx vs post_ltx 4
terminal_ileum genus healthy vs post_ltx 38
PSC effect Genus 15

2.6.0.4 Machine learning

Main models

knitr::kable(models_summ_df_ileum %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
             digits=2,caption="Elastic net results")
Elastic net results
alpha lambda auc_optimism_corrected auc_optimism_corrected_CIL auc_optimism_corrected_CIU
pre_ltx vs healthy ASV terminal_ileum 0.4 0.03 0.96 0.89 1.00
pre_ltx vs post_ltx ASV terminal_ileum 0.2 0.13 0.86 0.79 0.93
post_ltx vs healthy ASV terminal_ileum 0.2 0.05 0.96 0.90 0.99
pre_ltx vs healthy genus terminal_ileum 0.2 0.04 0.97 0.92 1.00
pre_ltx vs post_ltx genus terminal_ileum 0.0 0.67 0.83 0.75 0.92
post_ltx vs healthy genus terminal_ileum 0.0 0.07 0.97 0.92 0.99

ROC - Genus level

p <- roc_curve_all_custom(roc_cs[c(4:6)],Q="Q1",
                     model_name="enet_model")

p

3 Data Analysis - Colon

segment="colon"

3.1 Filtering

Rules: - sequencing depth > 10000 - nearZeroVar() with default settings

Rarefaction Curve

path="../../../intermediate_files/rarecurves"
seq_depth_threshold <- 10000
ps <- construct_phyloseq(colon_asv_tab,colon_taxa_tab,colon_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_colon.Rdata"))
load(file.path(path,"rarefaction_colon.Rdata"))
seq_depth_threshold <- 10000
prare <- ggrarecurve(obj=rareres,
                      factorNames="Country",
                      indexNames=c("Observe")) + 
        theme_bw()+
        theme(axis.text=element_text(size=8), 
              panel.grid=element_blank(),
              strip.background = element_rect(colour=NA,fill="grey"),
              strip.text.x = element_text(face="bold")) + 
        geom_vline(xintercept = seq_depth_threshold, 
                   linetype="dashed", color = "red") + 
        xlim(0, 20000)
The color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)
prare

Library size

read_counts(colon_asv_tab, line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

3.1.1 Sequencing depth

data_filt <- seq_depth_filtering(colon_asv_tab,
                                 colon_taxa_tab,
                                 colon_metadata,
                                 seq_depth_threshold = 10000)
Removing 75 ASV(s)
filt_colon_asv_tab <- data_filt[[1]]; alpha_colon_asv_tab <- filt_colon_asv_tab
filt_colon_taxa_tab <- data_filt[[2]]; alpha_colon_taxa_tab <- filt_colon_taxa_tab
filt_colon_metadata <- data_filt[[3]]; alpha_colon_metadata <- filt_colon_metadata

seq_step <- dim(filt_colon_asv_tab)[1]

Library size

read_counts(filt_colon_asv_tab,line = c(10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

3.1.2 NearZeroVar

data_filt <- nearzerovar_filtering(filt_colon_asv_tab,
                                   filt_colon_taxa_tab,
                                   filt_colon_metadata)

filt_colon_asv_tab <- data_filt[[1]]
filt_colon_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_colon_asv_tab)[1]

Library size

read_counts(filt_colon_asv_tab,line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

Check zero depth

data_filt <- check_zero_depth(filt_colon_asv_tab, 
                              filt_colon_taxa_tab, 
                              filt_colon_metadata)

filt_colon_asv_tab <- data_filt[[1]]; 
filt_colon_taxa_tab <- data_filt[[2]]; 
filt_colon_metadata <- data_filt[[3]]; 

Library size

read_counts(filt_colon_asv_tab,line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

3.1.3 Final Counts

final_counts_filtering(colon_asv_tab,
                       filt_colon_asv_tab,
                       filt_colon_metadata,
                       seq_step, 0, nearzero_step)
                                               V1
Raw data: ASVs                               6936
Raw data: Samples                             789
Sequencing depth filt: ASVs                  6861
Prevalence filt: ASVs                           0
NearZeroVar filt: ASVs                        381
Filt data: ASVs                               381
Filt data: Samples                            760
Filt data: Patients                           354
Filt data: Patients.1                           0
Filtered samples                               29
Matrices                    Cecum;Rectum;CD;CA;SI
healthy                                       161
non-rPSC                                        0
rPSC                                            0
pre_ltx                                       250
post_ltx                                      349
ETOH                                            0

3.2 Alpha diversity

Calculating Richness, Shannon, Simpson, Pielou indexes on raw (unfiltered) rarefied data. Samples with sequencing depth < 10000 were excluded. Testing by linear mixed-effect model.

path = "../../../results/merged_sites/main_analysis/Q1/alpha_diversity"

Calculation

# Construct MPSE object
alpha_colon_metadata$Sample <- alpha_colon_metadata$SampleID
colon_mpse <- as.MPSE(construct_phyloseq(alpha_colon_asv_tab,
                                         alpha_colon_taxa_tab,
                                         alpha_colon_metadata))

colon_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)

# Calculate alpha diversity - rarefied counts
colon_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)
alpha_data <- data.frame(SampleID=colon_mpse$Sample.x,
                         Observe=colon_mpse$Observe,
                         Shannon=colon_mpse$Shannon,
                         Simpson=colon_mpse$Simpson,
                         Pielou=colon_mpse$Pielou,
                         Group=colon_mpse$Group,
                         Country=colon_mpse$Country,
                         Patient=colon_mpse$Patient)

write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
          row.names = FALSE)

3.2.1 Plots

p_boxplot_alpha <- alpha_diversity_countries(alpha_data)
Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Using SampleID, Group, Country, Patient as id variables

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha

# see the results
p_boxplot_alpha

3.2.2 Linear Model

path = "../../../results/merged_sites/main_analysis/Q1/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))

Richness

results_model <- pairwise.lmer(
  formula = "Observe ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_observe <- results_model[[1]]
  results_model_observe_detailed <- results_model[[2]]
} else {
  results_model_observe <- results_model
  results_model_observe_detailed <- NA
}

# save the results
pc_observed[[segment]] <- results_model_observe
# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")
Raw results of linear model of richness estimation.
Estimate Std..Error df t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -56.230 11.738 195.570 -4.791 0.000 0.000 ***
healthy vs pre_ltx - CZ vs NO 14.193 10.341 194.538 1.372 0.171 0.256
healthy vs Grouppre_ltx:CountryNO -7.489 15.169 194.358 -0.494 0.622 0.622
pre_ltx vs Grouppost_ltx 28.886 11.811 267.502 2.446 0.015 0.045 *
pre_ltx vs post_ltx - CZ vs NO 6.742 12.174 265.784 0.554 0.580 0.622
pre_ltx vs Grouppost_ltx:CountryNO -24.047 15.112 262.828 -1.591 0.113 0.203
healthy vs Grouppost_ltx -27.299 8.699 250.458 -3.138 0.002 0.009 **
healthy vs post_ltx - CZ vs NO 14.360 11.153 251.729 1.288 0.199 0.256
healthy vs Grouppost_ltx:CountryNO -31.681 14.195 247.531 -2.232 0.027 0.060
knitr::kable(results_model_observe_detailed,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 171.780 7.001 162.548 24.535 0.000 NA
Grouppost_ltx -27.276 8.558 161.345 -3.187 0.002 0.003
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_NO 186.110 8.829 88.980 21.080 0 NA
Grouppost_ltx -58.943 11.539 87.242 -5.108 0 0
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 171.793 6.342 89.560 27.090 0.000 NA
CountryNO 14.358 9.941 89.408 1.444 0.152 0.152
Estimate Std. Error df t value Pr(>|t|) p.adjusted
post_ltx_CZ 144.490 5.283 158.786 27.352 0.000 NA
CountryNO -17.319 9.272 154.970 -1.868 0.064 0.085

Shannon

results_model <- pairwise.lmer(
  formula = "Shannon ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_shannon <- results_model[[1]]
  results_model_shannon_detailed <- results_model[[2]]
} else {
  results_model_shannon <- results_model
  results_model_shannon_detailed <- NA
}

# save the results
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)
# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")
Raw results of linear model of Shannon estimation.
Estimate Std..Error df t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.585 0.143 197.588 -4.084 0.000 0.001 ***
healthy vs pre_ltx - CZ vs NO 0.015 0.126 196.213 0.116 0.907 0.909
healthy vs Grouppre_ltx:CountryNO 0.081 0.185 195.965 0.437 0.663 0.852
pre_ltx vs Grouppost_ltx 0.364 0.158 265.003 2.307 0.022 0.056
pre_ltx vs post_ltx - CZ vs NO 0.095 0.163 263.448 0.585 0.559 0.839
pre_ltx vs Grouppost_ltx:CountryNO -0.455 0.202 260.765 -2.253 0.025 0.056
healthy vs Grouppost_ltx -0.220 0.102 247.586 -2.153 0.032 0.058
healthy vs post_ltx - CZ vs NO 0.015 0.131 248.759 0.114 0.909 0.909
healthy vs Grouppost_ltx:CountryNO -0.376 0.167 244.904 -2.255 0.025 0.056
knitr::kable(results_model_shannon_detailed,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
Estimate Std. Error df t value Pr(>|t|) p.adjusted
pre_ltx_CZ 3.102 0.144 131.105 21.535 0.000 NA
Grouppost_ltx 0.363 0.160 130.742 2.275 0.025 0.049
Estimate Std. Error df t value Pr(>|t|) p.adjusted
pre_ltx_NO 3.197 0.078 129.531 41.004 0.000 NA
Grouppost_ltx -0.091 0.125 129.050 -0.732 0.466 0.563
Estimate Std. Error df t value Pr(>|t|) p.adjusted
pre_ltx_CZ 3.101 0.143 105.863 21.629 0.000 NA
CountryNO 0.095 0.164 104.934 0.580 0.563 0.563
Estimate Std. Error df t value Pr(>|t|) p.adjusted
post_ltx_CZ 3.466 0.068 156.394 51.151 0.000 NA
CountryNO -0.360 0.119 153.436 -3.025 0.003 0.012
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 3.686 0.082 159.084 44.805 0.00 NA
Grouppost_ltx -0.220 0.101 158.215 -2.190 0.03 0.04
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_NO 3.701 0.104 90.164 35.654 0 NA
Grouppost_ltx -0.596 0.136 88.069 -4.396 0 0
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 3.685 0.054 89.090 68.738 0.000 NA
CountryNO 0.017 0.084 88.849 0.205 0.838 0.838
Estimate Std. Error df t value Pr(>|t|) p.adjusted
post_ltx_CZ 3.466 0.068 156.394 51.151 0.000 NA
CountryNO -0.360 0.119 153.436 -3.025 0.003 0.006

Simpson

results_model <- pairwise.lmer(
  formula = "Simpson ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_simpson <- results_model[[1]]
  results_model_simpson_detailed <- results_model[[2]]
} else {
  results_model_simpson <- results_model
  results_model_simpson_detailed <- NA
}

# save the results
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)
# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")
Raw results of linear model of Simpson estimation.
Estimate Std..Error df t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.059 0.019 199.853 -3.066 0.002 0.022 *
healthy vs pre_ltx - CZ vs NO -0.007 0.017 198.127 -0.430 0.667 0.738
healthy vs Grouppre_ltx:CountryNO 0.020 0.025 197.803 0.813 0.417 0.626
pre_ltx vs Grouppost_ltx 0.031 0.026 263.530 1.213 0.226 0.455
pre_ltx vs post_ltx - CZ vs NO 0.013 0.027 261.964 0.476 0.635 0.738
pre_ltx vs Grouppost_ltx:CountryNO -0.052 0.033 259.262 -1.556 0.121 0.363
healthy vs Grouppost_ltx -0.027 0.017 246.082 -1.594 0.112 0.363
healthy vs post_ltx - CZ vs NO -0.007 0.022 247.135 -0.335 0.738 0.738
healthy vs Grouppost_ltx:CountryNO -0.032 0.028 243.697 -1.147 0.253 0.455
knitr::kable(results_model_simpson_detailed,digits = 3,
caption = "Raw results of independent country analysis")

Table: Raw results of independent country analysis

Pielou

results_model <- pairwise.lmer(
  formula = "Pielou ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_pielou <- results_model[[1]]
  results_model_pielou_detailed <- results_model[[2]]
} else {
  results_model_pielou <- results_model
  results_model_pielou_detailed <- NA
}

# save the results
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)
# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")
Raw results of linear model of Pielou estimation.
Estimate Std..Error df t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.057 0.020 201.130 -2.828 0.005 0.046 *
healthy vs pre_ltx - CZ vs NO -0.008 0.018 199.017 -0.446 0.656 0.781
healthy vs Grouppre_ltx:CountryNO 0.015 0.026 198.602 0.565 0.573 0.781
pre_ltx vs Grouppost_ltx 0.039 0.024 265.797 1.631 0.104 0.234
pre_ltx vs post_ltx - CZ vs NO 0.007 0.025 263.735 0.274 0.785 0.785
pre_ltx vs Grouppost_ltx:CountryNO -0.062 0.031 260.213 -2.015 0.045 0.202
healthy vs Grouppost_ltx -0.018 0.016 247.076 -1.130 0.260 0.467
healthy vs post_ltx - CZ vs NO -0.008 0.020 248.451 -0.394 0.694 0.781
healthy vs Grouppost_ltx:CountryNO -0.047 0.026 243.881 -1.828 0.069 0.206
knitr::kable(results_model_pielou_detailed,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
Estimate Std. Error df t value Pr(>|t|) p.adjusted
pre_ltx_CZ 0.664 0.022 130.842 29.916 0.000 NA
Grouppost_ltx 0.039 0.025 130.452 1.582 0.116 0.232
Estimate Std. Error df t value Pr(>|t|) p.adjusted
pre_ltx_NO 0.670 0.012 129.502 57.528 0.000 NA
Grouppost_ltx -0.023 0.019 128.819 -1.219 0.225 0.3
Estimate Std. Error df t value Pr(>|t|) p.adjusted
pre_ltx_CZ 0.664 0.020 107.760 32.578 0.000 NA
CountryNO 0.007 0.023 106.251 0.291 0.772 0.772
Estimate Std. Error df t value Pr(>|t|) p.adjusted
post_ltx_CZ 0.703 0.011 155.706 65.557 0.000 NA
CountryNO -0.055 0.019 152.292 -2.919 0.004 0.016
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 0.721 0.013 158.287 55.872 0.000 NA
Grouppost_ltx -0.018 0.016 157.427 -1.146 0.254 0.338
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_NO 0.713 0.016 92.108 44.931 0.000 NA
Grouppost_ltx -0.065 0.021 89.257 -3.151 0.002 0.008
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 0.720 0.007 89.312 100.415 0.000 NA
CountryNO -0.008 0.011 88.968 -0.675 0.502 0.502
Estimate Std. Error df t value Pr(>|t|) p.adjusted
post_ltx_CZ 0.703 0.011 155.706 65.557 0.000 NA
CountryNO -0.055 0.019 152.292 -2.919 0.004 0.008

3.2.3 Saving results

alpha_list <- list(
  Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
  Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
  Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
  Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
                   
write.xlsx(alpha_list, 
           file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))

3.3 Beta diversity

Calculating Aitchison distance (euclidean distance on clr-transformed data) at genus level (main analysis). In supplements, there are also bray-curtis and Jaccard distance at ASV and genus levels. Testing by PERMANOVA.

3.3.1 Main analysis

Genus level, Aitchison distance

level="genus"
path = "../../../results/merged_sites/main_analysis/Q1/beta_diversity"

Aggregation, filtering

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level=level,
                             names=TRUE)

filt_data <- filtering_steps(genus_data[[1]],
                             genus_data[[2]],
                             colon_metadata,
                            seq_depth_threshold=10000)
Removing 10 ASV(s)
filt_colon_genus_tab <- filt_data[[1]]
filt_colon_genus_taxa <- filt_data[[2]]
filt_colon_genus_metadata <- filt_data[[3]]

3.3.2 PERMANOVA

pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
                           covariate = filt_colon_genus_metadata$Country, 
                           patients = filt_colon_genus_metadata$Patient,
                           sim.method = "robust.aitchison", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
                          covariate = filt_colon_genus_metadata$Country, 
                          interaction = TRUE, 
                          patients = filt_colon_genus_metadata$Patient,
                          sim.method = "robust.aitchison", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <-rbind(pp_factor,pp_cov,pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 1222.921 7.950 0.019 0.001 0.002 **
pre_ltx vs post_ltx 1 630.146 4.062 0.007 0.008 0.008 **
post_ltx vs healthy 1 1736.169 11.014 0.021 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy , Country 1 738.576 4.801 0.011 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 1532.675 9.879 0.016 0.001 0.001 ***
post_ltx vs healthy , Country 1 1728.746 10.966 0.021 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy : Country 1 318.719 2.077 0.005 0.325 0.488
pre_ltx vs post_ltx : Country 1 322.405 2.082 0.003 0.524 0.524
post_ltx vs healthy : Country 1 398.820 2.538 0.005 0.112 0.336

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.1]

if (length(interaction_sig)>0){
 for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_genus_metadata$Group,
                      covariate = filt_colon_genus_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      patients = filt_colon_genus_metadata$Patient)
  print(result_list)
} 
}
3.3.2.0.1 Plots
p <- pca_plot_custom(filt_colon_genus_tab,
                                 filt_colon_genus_taxa,
                                 filt_colon_genus_metadata,
                                 show_boxplots = TRUE,
                                 variable = "Group", size=2, 
                                 show_legend= FALSE)

# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p

# see the results
p

3.3.2.1 Saving results

write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]], 
           file = file.path(path,
           paste0("beta_diversity_results_", segment,".xlsx")))

3.4 Univariate Analysis

Using two DAA tools - linDA and Maaslin2. Only the intersection of sets selected by each tool was shown to minimize false positives. Taxa with a significant interaction effect were excluded based on individual post-hoc analysis of the Czech and Norwegian cohorts. Only taxa with significant log fold change that showed the same change direction in both countries were retained.

Main analysis is performed at genus level. In supplementary analysis, it is performed at ASV and phylum levels.

3.4.1 Main analysis

level="genus"
# needed paths
path = "../../../results/merged_sites/main_analysis/Q1/univariate_analysis"
path_maaslin=file.path("../../../intermediate_files/maaslin/Q1",level)

# variables
raw_linda_results_genus[[segment]] <- list()
linda_results_genus[[segment]] <- list()

# country and interaction problems
list_country_union <- list()
list_venns <- list()

# workbook for final df
wb <- createWorkbook()

# PSC effect
psc_effect <- list()

3.4.1.1 Genus level

level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

colon_genus_asv_taxa_tab <- create_asv_taxa_table(colon_genus_tab,
                                                  colon_genus_taxa_tab)
3.4.1.1.1 pre_ltx vs healthy
3.4.1.1.1.1 linDA
group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")
Removing 135 ASV(s)
Removing 10 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  411  samples and  148  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")


for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

3.4.1.1.1.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

3.4.1.1.1.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

3.4.1.1.1.4 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
3.4.1.1.1.5 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
3.4.1.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
3.4.1.1.2.1 linDA
# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")
Removing 17 ASV(s)
Removing 10 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  600  samples and  130  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

3.4.1.1.2.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

3.4.1.1.2.3 Group - Intersection
intersection_results <- group_intersection(group, 
                                           list_intersections, 
                                           list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

3.4.1.1.2.4 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
3.4.1.1.2.5 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
3.4.1.1.3 post_ltx vs healthy
group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
3.4.1.1.3.1 linDA
# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,group,
                            usage="linDA")
Removing 71 ASV(s)
Removing 5 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  511  samples and  153  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

3.4.1.1.3.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("Country effect")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

3.4.1.1.3.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

3.4.1.1.3.4 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
3.4.1.1.3.5 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
3.4.1.1.4 Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$colon[grepl(level,names(uni_statistics$colon))],
                                      colon_taxa_tab) + xlab("") + ylab("")
min_clr -2.221209 
max_clr 6.886638 
min_log -3.771476 
max_log 5.049706 
dotheatmap_linda

Horizontal bar plot

p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))
Using SeqID as id variables
p_prevalence_final <- ggarrange(p_prevalence,
                                ggplot() + theme_minimal(),
                                nrow = 2,heights = c(1,0.09))
p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),p_prevalence_final,ncol=2,widths = c(1,0.3))

p

dot_heatmap_colon <- p
3.4.1.1.5 PSC effect

To get only PSC associated taxa, we can intersect differentially abundant taxa from the previous analyses (see diagram below).

pre_LTx vs HC and Post_LTx vs HC intersection

imagename
A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)

psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
  
# see the results
psc_effect[[paste(segment,level)]] 
                           SeqID
3                         Rothia
15                   Barnesiella
18                 Butyricimonas
19                   Odoribacter
27                     Alistipes
28               Parabacteroides
39                    Holdemania
57                   Coprococcus
61              Fusicatenibacter
64                    Hungatella
65             Lachnoclostridium
67  Lachnospiraceae_FCS020_group
89              Colidextribacter
91                Intestinimonas
93                 Oscillibacter
109             Negativibacillus
122     Family_XIII_AD3011_group
133                    Dialister
135                  Veillonella
143                 Enterococcus
146                   Klebsiella
148                  Pseudomonas
                                                                                                                          Taxonomy
3                                   k__Bacteria;p__Actinobacteriota;c__Actinobacteria;o__Micrococcales;f__Micrococcaceae;g__Rothia
15                                   k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Barnesiellaceae;g__Barnesiella
18                                  k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Butyricimonas
19                                    k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Odoribacter
27                                       k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes
28                                k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Tannerellaceae;g__Parabacteroides
39                                 k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Holdemania
57                                     k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus
61                                k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Fusicatenibacter
64                                      k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Hungatella
65                               k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium
67                    k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_FCS020_group
89                              k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Colidextribacter
91                                k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Intestinimonas
93                                 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Oscillibacter
109                              k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Negativibacillus
122 k__Bacteria;p__Firmicutes;c__Clostridia;o__Peptostreptococcales-Tissierellales;f__Anaerovoracaceae;g__Family_XIII_AD3011_group
133                   k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Dialister
135                 k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonella
143                                     k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;g__Enterococcus
146                   k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Klebsiella
148                     k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas
    log2FoldChange      p_value         padj
3         1.842842 1.452081e-05 1.953709e-04
15       -3.590912 7.871469e-05 6.852808e-04
18       -3.002837 3.969499e-06 7.343572e-05
19       -3.406707 3.432387e-07 1.693311e-05
27       -3.110558 1.594988e-05 1.967152e-04
28       -3.568149 2.950689e-06 6.238599e-05
39       -1.174869 7.087889e-03 2.439553e-02
57       -3.357534 5.546126e-06 8.208266e-05
61       -1.917547 1.097200e-02 3.455012e-02
64        2.775798 1.422148e-04 1.107779e-03
65       -1.736481 1.812625e-03 8.383390e-03
67       -2.111851 1.228192e-03 6.268013e-03
89       -1.624469 8.895010e-03 2.991958e-02
91       -1.592037 6.335927e-03 2.404403e-02
93       -2.063990 9.829922e-04 5.538365e-03
109      -1.649295 1.442001e-02 4.268322e-02
122      -1.590837 1.010377e-03 5.538365e-03
133       2.144460 1.055678e-02 3.396528e-02
135       3.864870 1.310501e-07 9.697706e-06
143       5.049706 2.872348e-10 4.251075e-08
146       2.288824 1.471023e-03 7.257045e-03
148       1.318798 5.015789e-03 2.006316e-02

3.4.1.2 Saving results

# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
             overwrite = TRUE)

# PSC effect
write.xlsx(psc_effect[[paste(segment,level)]],file.path(path,paste0("psc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa

write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("significant_taxa_",segment,".xlsx")))

3.5 Machine learning

Binary classification was performed using four independent models: Elastic Net (ENET), RF (RF), Gradient Boosting (GBoost), K-nearest Neighbors (kNN). Training and validation were performed via bootstrapping (N=500) on clr-transformed data. Model performance metrics expressed as AUC were calculated based on an out-of-sample principle.

The supplementary analysis contains training and validating at clr-transformed data at ASV level and relative-abundance data at ASV and genus level.

path = "../../../results/merged_sites/main_analysis/Q1/models"

3.5.1 ENET

model="enet"

3.5.1.1 ASV level

level="ASV"
3.5.1.1.1 pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                       [,1]
alpha                           0.200000000
lambda                          0.007285712
auc                             1.000000000
auc_czech                       1.000000000
auc_no                          1.000000000
auc_optimism_corrected          0.944482930
auc_optimism_corrected_CIL      0.879483337
auc_optimism_corrected_CIU      0.986503241
accuracy                        1.000000000
accuracy_czech                          NaN
accuracy_no                     1.000000000
accuracy_optimism_corrected     0.872590475
accuracy_optimism_corrected_CIL 0.773863933
accuracy_optimism_corrected_CIU 0.937692308
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 161   0
   1   0 250

$czech
    Predicted
True  0  1
   0 95  0
   1  0 42

$no
    Predicted
True   0   1
   0  66   0
   1   0 208
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

roc_c

3.5.1.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, usage="ml_clr",
                                     patient = TRUE)
Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                       [,1]
alpha                           1.000000000
lambda                          0.004424186
auc                             0.999667622
auc_czech                       0.999656593
auc_no                          0.999522640
auc_optimism_corrected          0.788320408
auc_optimism_corrected_CIL      0.715647773
auc_optimism_corrected_CIU      0.842743983
accuracy                        0.993322204
accuracy_czech                          NaN
accuracy_no                     0.994269341
accuracy_optimism_corrected     0.725877580
accuracy_optimism_corrected_CIL 0.657080387
accuracy_optimism_corrected_CIU 0.769266055
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 347   2
   1   2 248

$czech
    Predicted
True   0   1
   0 207   1
   1   1  41

$no
    Predicted
True   0   1
   0 140   1
   1   1 207
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

roc_c

3.5.1.1.3 post_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
                                     colon_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=10,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.40000000
lambda                          0.00203662
auc                             1.00000000
auc_czech                       1.00000000
auc_no                          1.00000000
auc_optimism_corrected          0.96930614
auc_optimism_corrected_CIL      0.93705851
auc_optimism_corrected_CIU      0.98669590
accuracy                        1.00000000
accuracy_czech                         NaN
accuracy_no                     1.00000000
accuracy_optimism_corrected     0.91353052
accuracy_optimism_corrected_CIL 0.86451155
accuracy_optimism_corrected_CIU 0.95141794
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 161   0
   1   0 349

$czech
    Predicted
True   0   1
   0  95   0
   1   0 208

$no
    Predicted
True   0   1
   0  66   0
   1   0 141
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

roc_c

3.5.1.2 Genus level

level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]
3.5.1.2.1 pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=500,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.00000000
lambda                          0.04262835
auc                             0.99955280
auc_czech                       0.99974937
auc_no                          0.99927156
auc_optimism_corrected          0.93420491
auc_optimism_corrected_CIL      0.88169929
auc_optimism_corrected_CIU      0.97185321
accuracy                        0.98783455
accuracy_czech                         NaN
accuracy_no                     0.98905109
accuracy_optimism_corrected     0.85834482
accuracy_optimism_corrected_CIL 0.79082492
accuracy_optimism_corrected_CIU 0.92284119
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 159   2
   1   3 247

$czech
    Predicted
True  0  1
   0 94  1
   1  1 41

$no
    Predicted
True   0   1
   0  65   1
   1   2 206
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

3.5.1.2.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=500,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.00000000
lambda                          0.05181695
auc                             0.96996571
auc_czech                       0.95887446
auc_no                          0.96447081
auc_optimism_corrected          0.75746917
auc_optimism_corrected_CIL      0.67636751
auc_optimism_corrected_CIU      0.83310363
accuracy                        0.91000000
accuracy_czech                         NaN
accuracy_no                     0.90544413
accuracy_optimism_corrected     0.69549915
accuracy_optimism_corrected_CIL 0.61504214
accuracy_optimism_corrected_CIU 0.77137530
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 325  25
   1  29 221

$czech
    Predicted
True   0   1
   0 202   7
   1  14  28

$no
    Predicted
True   0   1
   0 123  18
   1  15 193
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

3.5.1.2.3 post_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group",
                              N=500,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q1")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                       [,1]
alpha                           0.600000000
lambda                          0.002245931
auc                             1.000000000
auc_czech                       1.000000000
auc_no                          1.000000000
auc_optimism_corrected          0.961311372
auc_optimism_corrected_CIL      0.923354885
auc_optimism_corrected_CIU      0.985361058
accuracy                        1.000000000
accuracy_czech                          NaN
accuracy_no                     1.000000000
accuracy_optimism_corrected     0.892756667
accuracy_optimism_corrected_CIL 0.838856178
accuracy_optimism_corrected_CIU 0.940785776
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 161   0
   1   0 350

$czech
    Predicted
True   0   1
   0  95   0
   1   0 209

$no
    Predicted
True   0   1
   0  66   0
   1   0 141
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))
[1] "p_value: 0"
roc_c

3.5.1.3 Saving results

models_summ_df_colon <- do.call(rbind, 
  models_summ[grep(segment,names(models_summ),value = TRUE)])

write.csv(models_summ_df_colon,file.path(path,paste0("elastic_net_",segment,".csv")))

3.6 Results overview

3.6.0.1 Alpha diversity

knitr::kable(pc_observed[[segment]],
             digits = 3,
             caption = "Results of linear model testing ASV Richness")
Results of linear model testing ASV Richness
Estimate Std..Error df t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -56.230 11.738 195.570 -4.791 0.000 0.000 ***
healthy vs pre_ltx - CZ vs NO 14.193 10.341 194.538 1.372 0.171 0.256
healthy vs Grouppre_ltx:CountryNO -7.489 15.169 194.358 -0.494 0.622 0.622
pre_ltx vs Grouppost_ltx 28.886 11.811 267.502 2.446 0.015 0.045 *
pre_ltx vs post_ltx - CZ vs NO 6.742 12.174 265.784 0.554 0.580 0.622
pre_ltx vs Grouppost_ltx:CountryNO -24.047 15.112 262.828 -1.591 0.113 0.203
healthy vs Grouppost_ltx -27.299 8.699 250.458 -3.138 0.002 0.009 **
healthy vs post_ltx - CZ vs NO 14.360 11.153 251.729 1.288 0.199 0.256
healthy vs Grouppost_ltx:CountryNO -31.681 14.195 247.531 -2.232 0.027 0.060
knitr::kable(pc_shannon[[segment]],
             digits = 3,
             caption = "Results of linear model testing Shannon index")
Results of linear model testing Shannon index
Estimate Std..Error df t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.585 0.143 197.588 -4.084 0.000 0.001 ***
healthy vs pre_ltx - CZ vs NO 0.015 0.126 196.213 0.116 0.907 0.909
healthy vs Grouppre_ltx:CountryNO 0.081 0.185 195.965 0.437 0.663 0.852
pre_ltx vs Grouppost_ltx 0.364 0.158 265.003 2.307 0.022 0.056
pre_ltx vs post_ltx - CZ vs NO 0.095 0.163 263.448 0.585 0.559 0.839
pre_ltx vs Grouppost_ltx:CountryNO -0.455 0.202 260.765 -2.253 0.025 0.056
healthy vs Grouppost_ltx -0.220 0.102 247.586 -2.153 0.032 0.058
healthy vs post_ltx - CZ vs NO 0.015 0.131 248.759 0.114 0.909 0.909
healthy vs Grouppost_ltx:CountryNO -0.376 0.167 244.904 -2.255 0.025 0.056
knitr::kable(pc_simpson[[segment]],
             digits = 3,
             caption = "Results of linear model testing Simpson index")
Results of linear model testing Simpson index
Estimate Std..Error df t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.059 0.019 199.853 -3.066 0.002 0.022 *
healthy vs pre_ltx - CZ vs NO -0.007 0.017 198.127 -0.430 0.667 0.738
healthy vs Grouppre_ltx:CountryNO 0.020 0.025 197.803 0.813 0.417 0.626
pre_ltx vs Grouppost_ltx 0.031 0.026 263.530 1.213 0.226 0.455
pre_ltx vs post_ltx - CZ vs NO 0.013 0.027 261.964 0.476 0.635 0.738
pre_ltx vs Grouppost_ltx:CountryNO -0.052 0.033 259.262 -1.556 0.121 0.363
healthy vs Grouppost_ltx -0.027 0.017 246.082 -1.594 0.112 0.363
healthy vs post_ltx - CZ vs NO -0.007 0.022 247.135 -0.335 0.738 0.738
healthy vs Grouppost_ltx:CountryNO -0.032 0.028 243.697 -1.147 0.253 0.455
knitr::kable(pc_pielou[[segment]],
             digits = 3,
             caption = "Results of linear model testing Pielou index")
Results of linear model testing Pielou index
Estimate Std..Error df t.value Pr…t.. p.adj sig
healthy vs Grouppre_ltx -0.057 0.020 201.130 -2.828 0.005 0.046 *
healthy vs pre_ltx - CZ vs NO -0.008 0.018 199.017 -0.446 0.656 0.781
healthy vs Grouppre_ltx:CountryNO 0.015 0.026 198.602 0.565 0.573 0.781
pre_ltx vs Grouppost_ltx 0.039 0.024 265.797 1.631 0.104 0.234
pre_ltx vs post_ltx - CZ vs NO 0.007 0.025 263.735 0.274 0.785 0.785
pre_ltx vs Grouppost_ltx:CountryNO -0.062 0.031 260.213 -2.015 0.045 0.202
healthy vs Grouppost_ltx -0.018 0.016 247.076 -1.130 0.260 0.467
healthy vs post_ltx - CZ vs NO -0.008 0.020 248.451 -0.394 0.694 0.781
healthy vs Grouppost_ltx:CountryNO -0.047 0.026 243.881 -1.828 0.069 0.206

Plots

alpha_div_plots[[paste(segment,"Country")]]

3.6.0.2 Beta diversity

Main analysis

knitr::kable(pairwise_aitchison_raw[[paste("genus", segment)]],
             digits = 3,
             caption = "Results of PERMANOVA - robust aitchison distance")
Results of PERMANOVA - robust aitchison distance
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
pre_ltx vs healthy 1 1222.921 7.950 0.019 0.001 0.002 **
pre_ltx vs post_ltx 1 630.146 4.062 0.007 0.008 0.008 **
post_ltx vs healthy 1 1736.169 11.014 0.021 0.001 0.002 **
pre_ltx vs healthy , Country 1 738.576 4.801 0.011 0.001 0.001 ***
pre_ltx vs post_ltx , Country 1 1532.675 9.879 0.016 0.001 0.001 ***
post_ltx vs healthy , Country 1 1728.746 10.966 0.021 0.001 0.001 ***
pre_ltx vs healthy : Country 1 318.719 2.077 0.005 0.325 0.488
pre_ltx vs post_ltx : Country 1 322.405 2.082 0.003 0.524 0.524
post_ltx vs healthy : Country 1 398.820 2.538 0.005 0.112 0.336

PCA

pca_plots_list[[paste(segment,"genus custom")]]

3.6.0.3 Univariate analysis

Number of significant taxa

knitr::kable(cbind(as.data.frame(lapply(list_intersections,nrow)),
      as.data.frame(lapply(psc_effect,nrow))) %>% t() %>% 
  `colnames<-`("Count") %>% 
  `rownames<-`(c(names(list_intersections),"PSC effect Genus")),caption="Number of significant taxa")
Number of significant taxa
Count
terminal_ileum genus healthy vs pre_ltx 25
terminal_ileum genus pre_ltx vs post_ltx 4
terminal_ileum genus healthy vs post_ltx 38
colon genus healthy vs pre_ltx 33
colon genus pre_ltx vs post_ltx 4
colon genus healthy vs post_ltx 41
PSC effect Genus 22

3.6.0.4 Machine learning

Main models

knitr::kable(models_summ_df_colon %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
             digits=3,caption="Elastic net results")
Elastic net results
alpha lambda auc_optimism_corrected auc_optimism_corrected_CIL auc_optimism_corrected_CIU
pre_ltx vs healthy ASV colon 0.2 0.007 0.944 0.879 0.987
pre_ltx vs post_ltx ASV colon 1.0 0.004 0.788 0.716 0.843
post_ltx vs healthy ASV colon 0.4 0.002 0.969 0.937 0.987
pre_ltx vs healthy genus colon 0.0 0.043 0.934 0.882 0.972
pre_ltx vs post_ltx genus colon 0.0 0.052 0.757 0.676 0.833
post_ltx vs healthy genus colon 0.6 0.002 0.961 0.923 0.985

ROC - Genus level

p <- roc_curve_all_custom(roc_cs[c(10:12)],Q="Q1",
                     model_name="enet_model")

p

4 Paper-ready visualizations

Alpha diversity

p_A <- alpha_div_plots$`terminal_ileum Country` +
  ggtitle("Terminal ileum")+
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)) 

p_B <-  alpha_div_plots$`colon Country` +
  ggtitle("Colon") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)) 

Q1_alpha <- ggarrange(p_A,ggplot() + theme_minimal(),p_B,nrow=1, ncol=3,widths = c(1,0.1,1))

Q1_alpha

Beta diversity

pca_ti <- pca_plots_list$`terminal_ileum genus custom` 
pca_colon <- pca_plots_list$`colon genus custom` 

genus_Q1_beta <- ggarrange(pca_ti,
                           ggplot() + theme_minimal(),
                           pca_colon,ncol=3,
                           widths = c(1,0.1,1))
genus_Q1_beta

Alpha + Beta diversity

alpha_beta <- ggarrange(Q1_alpha,genus_Q1_beta,
                        ncol = 1,nrow=2,labels = c("A","B"))
alpha_beta

Elastic net

Genus level

#models_to_plot <- c("knn_model","rf_model","gbm_model","enet_model")
#names(models_to_plot) <- c("kNN","RF","GBoost","ENET")

models_to_plot <- c("enet_model")
names(models_to_plot) <- c("ENET")
rocs_list <- list()
rocs_list[["enet_model"]] <- roc_cs

# ILEUM
plot_list_ileum <- list()
for (model_name in models_to_plot) {
  plot_list_ileum[[model_name]] <- 
    roc_curve_all_custom(rocs_list[[model_name]][c(4:6)],
                       Q="Q1",
                       model_name=model_name,legend = FALSE) + 
    #ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) + 
    theme(plot.title = element_text(face = "bold",size = 10)) 
}

roc_curve_ti <- ggarrange(plotlist = plot_list_ileum)

# COLON

plot_list_colon <- list()
for (model_name in models_to_plot) {
  plot_list_colon[[model_name]] <- 
    roc_curve_all_custom(rocs_list[[model_name]][c(10:12)],
                       Q="Q1",
                       model_name=model_name,legend = FALSE) + 
    #ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) + 
    theme(plot.title = element_text(face = "bold",size = 10))   
}

roc_curve_colon <- ggarrange(plotlist = plot_list_colon)


roc_curve_plot <- ggarrange(roc_curve_ti,
                            ggplot() + theme_minimal(),
                            roc_curve_colon,
                            ncol=3, widths = c(1,0.1,1))
roc_curve_plot

4.1 FIGURE 2

alpha_beta_elastic <- ggarrange(Q1_alpha,genus_Q1_beta,roc_curve_plot,
                        ncol = 1,nrow=3,labels = LETTERS,heights = c(1,1,0.8))

alpha_beta_elastic <- ggarrange(alpha_beta_elastic,ggplot() + theme_minimal(),ncol=2,
                                widths = c(1,0.15))

alpha_beta_elastic

4.2 FIGURE 3

p_ileum <- dot_heatmap_ileum + 
  ggtitle("Terminal ileum") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
        legend.position = "none")


p_colon <- dot_heatmap_colon  +
  ggtitle("Colon") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
        legend.position = "none")

heatmap_plot <- ggarrange(p_ileum,
                          p_colon,
                          ncol = 2,labels=c("A","B"))

heatmap_plot 

5 Session info

sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 26100)

Matrix products: default


locale:
[1] LC_COLLATE=Czech_Czechia.utf8  LC_CTYPE=Czech_Czechia.utf8   
[3] LC_MONETARY=Czech_Czechia.utf8 LC_NUMERIC=C                  
[5] LC_TIME=Czech_Czechia.utf8    

time zone: Europe/Prague
tzcode source: internal

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] mice_3.17.0              picante_1.8.2            ape_5.8-1               
 [4] lubridate_1.9.3          forcats_1.0.0            stringr_1.5.1           
 [7] tidyverse_2.0.0          kableExtra_1.4.0         tidyr_1.3.1             
[10] gbm_2.2.2                doParallel_1.0.17        iterators_1.0.14        
[13] foreach_1.5.2            ranger_0.17.0            ggvenn_0.1.15           
[16] Maaslin2_1.16.0          purrr_1.0.2              pROC_1.18.5             
[19] glmnet_4.1-8             MicrobiomeStat_1.2       caret_6.0-94            
[22] openxlsx_4.2.8           magrittr_2.0.3           emmeans_1.11.2-8        
[25] lmerTest_3.1-3           robustlmm_3.3-1          lme4_1.1-37             
[28] Matrix_1.6-5             mgcv_1.9-1               nlme_3.1-164            
[31] pheatmap_1.0.13          reshape2_1.4.4           vegan_2.6-10            
[34] lattice_0.22-6           permute_0.9-8            ggplotify_0.1.2         
[37] ggrepel_0.9.6            ggpubr_0.6.1             MicrobiotaProcess_1.14.1
[40] phyloseq_1.46.0          ggplot2_3.5.2            tibble_3.2.1            
[43] dplyr_1.1.4              cowplot_1.2.0            readr_2.1.5             
[46] igraph_2.1.4             ccrepe_1.38.1            data.table_1.15.4       

loaded via a namespace (and not attached):
  [1] fs_1.6.5                    matrixStats_1.5.0          
  [3] bitops_1.0-9                RColorBrewer_1.1-3         
  [5] numDeriv_2016.8-1.1         tools_4.3.1                
  [7] backports_1.4.1             R6_2.6.1                   
  [9] lazyeval_0.2.2              jomo_2.7-6                 
 [11] rhdf5filters_1.14.1         withr_3.0.2                
 [13] gridExtra_2.3               cli_3.6.2                  
 [15] Biobase_2.62.0              logging_0.10-108           
 [17] biglm_0.9-3                 sandwich_3.1-1             
 [19] labeling_0.4.3              mvtnorm_1.3-3              
 [21] robustbase_0.99-4-1         pbapply_1.7-4              
 [23] systemfonts_1.2.2           yulab.utils_0.2.1          
 [25] svglite_2.1.3               parallelly_1.38.0          
 [27] rstudioapi_0.17.1           generics_0.1.4             
 [29] gridGraphics_0.5-1          shape_1.4.6.1              
 [31] car_3.1-3                   zip_2.3.2                  
 [33] biomformat_1.30.0           S4Vectors_0.40.2           
 [35] abind_1.4-8                 infotheo_1.2.0.1           
 [37] lifecycle_1.0.4             multcomp_1.4-28            
 [39] yaml_2.3.10                 carData_3.0-5              
 [41] SummarizedExperiment_1.32.0 Rtsne_0.17                 
 [43] rhdf5_2.46.1                recipes_1.3.1              
 [45] SparseArray_1.2.4           grid_4.3.1                 
 [47] mitml_0.4-5                 crayon_1.5.3               
 [49] pillar_1.11.0               knitr_1.50                 
 [51] optparse_1.7.5              GenomicRanges_1.54.1       
 [53] statip_0.2.3                boot_1.3-32                
 [55] estimability_1.5.1          future.apply_1.11.3        
 [57] codetools_0.2-20            pan_1.9                    
 [59] glue_1.7.0                  ggfun_0.2.0                
 [61] vctrs_0.6.5                 treeio_1.26.0              
 [63] Rdpack_2.6.4                gtable_0.3.6               
 [65] maditr_0.8.6                gower_1.0.1                
 [67] xfun_0.52                   rbibutils_2.3              
 [69] S4Arrays_1.2.1              prodlim_2024.06.25         
 [71] libcoin_1.0-10              coda_0.19-4.1              
 [73] reformulas_0.4.1            pcaPP_2.0-5                
 [75] modeest_2.4.0               survival_3.5-8             
 [77] timeDate_4041.110           hardhat_1.4.2              
 [79] lava_1.8.1                  statmod_1.5.0              
 [81] TH.data_1.1-4               ipred_0.9-15               
 [83] ggtree_3.10.1               GenomeInfoDb_1.38.8        
 [85] fBasics_4041.97             rpart_4.1.23               
 [87] colorspace_2.1-1            BiocGenerics_0.48.1        
 [89] DBI_1.2.3                   nnet_7.3-19                
 [91] ade4_1.7-23                 tidyselect_1.2.1           
 [93] timeSeries_4041.111         compiler_4.3.1             
 [95] microbiome_1.24.0           xml2_1.3.8                 
 [97] DelayedArray_0.28.0         scales_1.4.0               
 [99] DEoptimR_1.1-4              spatial_7.3-18             
[101] rappdirs_0.3.3              digest_0.6.35              
[103] minqa_1.2.8                 rmarkdown_2.29             
[105] XVector_0.42.0              htmltools_0.5.8.1          
[107] pkgconfig_2.0.3             MatrixGenerics_1.14.0      
[109] stabledist_0.7-2            fastmap_1.2.0              
[111] rlang_1.1.3                 htmlwidgets_1.6.4          
[113] farver_2.1.2                zoo_1.8-13                 
[115] jsonlite_2.0.0              ModelMetrics_1.2.2.2       
[117] RCurl_1.98-1.17             modeltools_0.2-24          
[119] Formula_1.2-5               GenomeInfoDbData_1.2.11    
[121] patchwork_1.3.2             Rhdf5lib_1.24.2            
[123] Rcpp_1.0.12                 ggnewscale_0.5.2           
[125] fastGHQuad_1.0.1            stringi_1.8.3              
[127] ggstar_1.0.4                stable_1.1.6               
[129] zlibbioc_1.48.2             MASS_7.3-60.0.1            
[131] plyr_1.8.9                  listenv_0.9.1              
[133] Biostrings_2.70.3           splines_4.3.1              
[135] hash_2.2.6.3                multtest_2.58.0            
[137] hms_1.1.3                   ggtreeExtra_1.12.0         
[139] ggsignif_0.6.4              stats4_4.3.1               
[141] rmutil_1.1.10               evaluate_1.0.5             
[143] nloptr_2.2.1                tzdb_0.5.0                 
[145] getopt_1.20.4               future_1.34.0              
[147] clue_0.3-66                 coin_1.4-3                 
[149] broom_1.0.9                 xtable_1.8-4               
[151] tidytree_0.4.6              rstatix_0.7.2              
[153] viridisLite_0.4.2           class_7.3-22               
[155] aplot_0.2.8                 IRanges_2.36.0             
[157] cluster_2.1.8.1             timechange_0.3.0           
[159] globals_0.18.0