source("custom_functions.R")Geography-independent mucosal microbiota alterations in primary sclerosing cholangitis persist after liver transplantation
1. Hypothesis: PSC is associated with the gut microbiome changes
1 Introduction
1.1 About
Pre_LTx vs Post_LTx vs Healthy analysis on merged data
Alpha diversity – Richness, Shannon, Simpson, and Pielou indices tested by linear fixed effect or mixed effect model (group effect, country effect, interaction effect)
Beta diversity – PERMANOVA (group effect, country effect, interaction effect), PCA visualization
Differential abundance testing:
o Group effect – linDA + MaAsLin2 intersection
o Country effect – taxa with significant interaction effect were excluded based on individual post-hoc analysis. Taxa had to have the same direction in both countries
ML models, trained and validated using out-of-sample bootstrap (500 resamplings):
o ENET
o kNN
o GBoost
o RF
Pairwise comparisons:
Pre_ltx vs Healthy – an effect of disease, liver cirrhosis and overall bad clinical condition
Pre_ltx vs post_ltx – an effect of transplantation + general improvement of the clinical state
Post_ltx vs Healthy – does LTx lead to „healthy microbiome”?
Report structure The analysis is divided into two segments: terminal ileum and colon. Each part begins with data filtering, where sequencing depth is assessed and nearZeroVar() is applied to remove taxa with no variance in the dataset. This is followed by a series of analytical steps:
Alpha diversity
Beta diversity
Univariate testing
Machine learning models training and validation
At the end of the analysis, a brief summary of the results is provided, as well as final plots that are reported in the original publication.
Importing libraries and custom functions built for this analysis
1.2 Clinical characteristics
clinical_char <- read.xlsx("../../../results/clinical_characteristics.xlsx",
colNames = FALSE)
colnames(clinical_char) <- clinical_char[3,]
clinical_char <- clinical_char[-c(1,2,3),]
header_groups <- c(" " = 2, "HC" = 2, "pre_LTx" = 2, "post_LTx_non-rPSC"=2, "post_LTx_rPSC"=2)knitr::kable(clinical_char, align="l", escape=FALSE,digits=3) %>%
kable_styling(bootstrap_options = c("hover", "striped", "responsive")) %>%
kable_styling(full_width = TRUE) %>%
add_header_above(header_groups)| NA | Norway | Czech | Norway | Czech | Norway | Czech | Norway | Czech | |
|---|---|---|---|---|---|---|---|---|---|
| 4 | F/M [n] | 10/30 | 30/26 | 19/65 | 12/19 | 15/36 | 26/57 | 13 | 3/22 |
| 5 | Age [years] | NA | 50 (23;68) | 40 (17;77) | 35 (17;63) | 49 (23;70) | 48 (32;81) | NA | 48 (26;81) |
| 6 | IBD yes/no [n] | - | - | 66/18 | 25/6 | 31/7 | 72/11 | 11/2 | 24/1 |
| 7 | Total bilirubin [µmol/l] | - | 21 (3;68) | 24 | 62 (9;500) | 19.5 (11;49) | 18 (5;132) | 24 (13;52) | 24 (6;148) |
| 8 | AST [µkat/l] | - | 0.5 (0.3;1.9) | 1.1 (0.2;8.6) | 1.6 (0.5;4.0) | 0.6 (0.2;3.2) | 0.4 (0.2;1.7) | 1.1 (0.3, 4.0) | 0.5 (0.2;1.9) |
| 9 | ALT [µkat/l] | - | 0.5 (0.3;1.7) | 1.6 (0.5;8.6) | 1.7 (0.5;5.2) | 0.6 (0.1,4.1) | 0.5 (0.3;4.0) | 1.1 (0.2;4.5) | 0.5 (0.3;3.5) |
| 10 | ALP [µkat/l] | - | 1.3 (0.7;2.3) | 4.2 (0.9;17.3) | 6.0 (1.6;17.8) | 1.3 (0.6;7.9) | 1.4 (0.6;3.5) | 3.7 (0.5;13.9) | 2.3 (0.5;13.1) |
| 11 | GGT [µkat/l] | - | 0.4 (0.2;5.0) | - | 2.6 (0.6;15.2) | - | 0.4 (0.1; 4.8) | - | 1.4 (0.3;13.7) |
| 12 | INR [-] | - | 1.1 (1.1;1.2) | - | 1.1 (0.9;1.8) | - | 1.1 (0.9;1.6) | - | 1.1 (1.0;1.7) |
| 13 | Creatinine [µmol/l] | - | 77 (50;115) | 65 (39; 232) | 60 (44;101) | 84 (59; 159) | 87 (49;248) | 81 (65; 133) | 85 (50;780) |
| 14 | Albumin [g/l] | - | 50 (43;57) | 41 (26;49) | 38 (20;50) | 43 (35;49) | 45 (26;51) | 38 (34; 50) | 42 (26;50) |
| 15 | Fecal calprotectin [µg/g] | - | - | 59 (1;2844) | 123 (6;4513) | 30 (1;1945) | 275 (6;4821) | 55 (10;832) | 564 (14;4301) |
| 16 | NANCY_max [-] | - | - | - | 2 (0;4) | - | 2 (0;4) | - | 2 (0;4) |
| 17 | eMayo [-] | - | - | - | 1 (0;2) | - | 1 (0;3) | - | 1 (0;2) |
| 18 | Mayo_DAI [-] | - | - | - | 2 (0;5) | - | 1 (0;8) | - | 2 (0;7) |
| 19 | Mayo_PSC risk score [-] | - | - | 0.3 (;2.2;3.5) | 0.3 (;2.2;3.5) | 0.3 (;1.1;2.2) | - | 1.0 (;1.4;3.1) | - |
| 20 | AOM_score [-] | - | - | 1.9 (0.5;3.8) | 2 (1.1;5.0) | 1.9 (0.8;2.7) | - | 2.2 (1.2;3.5) | - |
| 21 | APRI_score [-] | - | 0.2 (0.1;1.5) | 0.7 (0.1;5.5) | 1.1 (0.3;14.3) | 0.5 (0.2;2.3) | 0.3 (0.1;1.5) | 0.7 (0.2;3.6) | 0.4 (0.1;1.4) |
| 22 | FIB-4_score [-] | - | 0.8 (0.4;3.4) | 1.1 (0.8;8.6) | 1.6 (0.4;23.1) | 1.5 (0.6;3.6) | 1.1 (0.3; 5.0) | 2.2 (0.5;5.8) | 1.1 (0.5;3.1) |
| 23 | MELD_score [-] | - | 7.8 (7.1;8.6) | - | 8.2 (6.4;18.0) | - | 8.0 (6.4;14.6) | - | 7.8 (6.5;22.8) |
| 24 | Platelets [10^9/l] | - | 267 (145;400) | 265 (37; 712) | 201 (20;477) | 197 (101;434) | 208 (41;442) | 212 (69;409) | 213 (83;416) |
1.3 Data Import
Importing ASV, taxa and metadata tables for both Czech and Norway samples.
Czech
path = "../../../../data/analysis_ready_data/ikem/"
asv_tab_ikem <- as.data.frame(fread(file.path(path,"asv_table_ikem.csv"),
check.names = FALSE))
taxa_tab_ikem <- as.data.frame(fread(file.path(path,"taxa_table_ikem.csv"),
check.names = FALSE))
metadata_ikem <- as.data.frame(fread(file.path(path,"metadata_ikem.csv"),
check.names = FALSE))Norway
path = "../../../../data/analysis_ready_data/norway/"
asv_tab_norway <- as.data.frame(fread(file.path(path,"asv_table_norway.csv"),
check.names = FALSE))
taxa_tab_norway <- as.data.frame(fread(file.path(path,"taxa_table_norway.csv"),
check.names = FALSE))
metadata_norway <- as.data.frame(fread(file.path(path,"metadata_norway.csv"),
check.names = FALSE))1.4 Merging
Merging two countries based on the different matrices - Ileum, Colon.
Terminal ileum
ileum_data <- merging_data(asv_tab_1=asv_tab_ikem,
asv_tab_2=asv_tab_norway,
taxa_tab_1=taxa_tab_ikem,
taxa_tab_2=taxa_tab_norway,
metadata_1=metadata_ikem,
metadata_2=metadata_norway,
segment="TI",Q="Q1")Removing 1498 ASV(s)
Removing 1834 ASV(s)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment
ileum_asv_tab <- ileum_data[[1]]
ileum_taxa_tab <- ileum_data[[2]]
ileum_metadata <- ileum_data[[3]]Reads statistics
summary(colSums(ileum_asv_tab[,-1])) Min. 1st Qu. Median Mean 3rd Qu. Max.
3441 16579 20972 24071 28473 85910
Colon
colon_data <- merging_data(asv_tab_1=asv_tab_ikem,
asv_tab_2=asv_tab_norway,
taxa_tab_1=taxa_tab_ikem,
taxa_tab_2=taxa_tab_norway,
metadata_1=metadata_ikem,
metadata_2=metadata_norway,
segment="colon",Q="Q1")Removing 739 ASV(s)
Removing 266 ASV(s)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment
colon_asv_tab <- colon_data[[1]]
colon_taxa_tab <- colon_data[[2]]
colon_metadata <- colon_data[[3]]Reads statistics
summary(colSums(colon_asv_tab[,-1])) Min. 1st Qu. Median Mean 3rd Qu. Max.
63 19814 26676 30841 37553 136048
2 Data Analysis - Terminal ileum
segment="terminal_ileum"2.1 Filtering
Rules:
sequencing depth > 10000
nearZeroVar() with default settings
Rarefaction Curve
path="../../../intermediate_files/rarecurves"
seq_depth_threshold <- 10000ps <- construct_phyloseq(ileum_asv_tab,ileum_taxa_tab,ileum_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_ileum.Rdata"))load(file.path(path,"rarefaction_ileum.Rdata"))
prare <- ggrarecurve(obj=rareres,
factorNames="Country",
indexNames=c("Observe")) +
theme_bw() +
theme(axis.text=element_text(size=8), panel.grid=element_blank(),
strip.background = element_rect(colour=NA,fill="grey"),
strip.text.x = element_text(face="bold")) +
geom_vline(xintercept = seq_depth_threshold,
linetype="dashed",
color = "red") +
xlim(0, 20000)The color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)
prareLibrary size
read_counts(ileum_asv_tab, line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
2.1.1 Sequencing depth
data_filt <- seq_depth_filtering(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
seq_depth_threshold = 10000)Removing 131 ASV(s)
filt_ileum_asv_tab <- data_filt[[1]]; alpha_ileum_asv_tab <- filt_ileum_asv_tab
filt_ileum_taxa_tab <- data_filt[[2]]; alpha_ileum_taxa_tab <- filt_ileum_taxa_tab
filt_ileum_metadata <- data_filt[[3]]; alpha_ileum_metadata <- filt_ileum_metadata
seq_step <- dim(filt_ileum_asv_tab)[1]Library size
read_counts(filt_ileum_asv_tab,line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
2.1.2 NearZeroVar
data_filt <- nearzerovar_filtering(filt_ileum_asv_tab,
filt_ileum_taxa_tab,
filt_ileum_metadata)
filt_ileum_asv_tab <- data_filt[[1]]
filt_ileum_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_ileum_asv_tab)[1]Library size
read_counts(filt_ileum_asv_tab,line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
2.1.3 Final Counts
final_counts_filtering(ileum_asv_tab,
filt_ileum_asv_tab,
filt_ileum_metadata,
seq_step, 0, nearzero_step) %>% `colnames<-`("Count") Count
Raw data: ASVs 5132
Raw data: Samples 294
Sequencing depth filt: ASVs 5001
Prevalence filt: ASVs 0
NearZeroVar filt: ASVs 479
Filt data: ASVs 479
Filt data: Samples 281
Filt data: Patients 281
Filt data: Patients.1 0
Filtered samples 13
Matrices TI
healthy 73
non-rPSC 0
rPSC 0
pre_ltx 70
post_ltx 138
ETOH 0
2.2 Alpha diversity
Calculating Richness, Shannon, Simpson, Pielou indexes on raw (unfiltered) rarefied data. Samples with sequencing depth < 10000 were excluded. Testing by linear model.
path = "../../../results/merged_sites/main_analysis/Q1/alpha_diversity"Calculation
# Construct MPSE object
alpha_ileum_metadata$Sample <- alpha_ileum_metadata$SampleID
ileum_mpse <- as.MPSE(construct_phyloseq(alpha_ileum_asv_tab,
alpha_ileum_taxa_tab,
alpha_ileum_metadata))
ileum_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)
# Calculate alpha diversity - rarefied counts
ileum_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)alpha_div_plots <- list()
# preparing data frame
alpha_data <- data.frame(SampleID=ileum_mpse$Sample.x,
Observe=ileum_mpse$Observe,
Shannon=ileum_mpse$Shannon,
Simpson=ileum_mpse$Simpson,
Pielou=ileum_mpse$Pielou,
Group=ileum_mpse$Group,
Country=ileum_mpse$Country,
Patient=ileum_mpse$Patient)
# save the result
write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
row.names = FALSE)2.2.1 Plots
p_boxplot_alpha <- alpha_diversity_countries(alpha_data)Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha
# see the results
p_boxplot_alpha2.2.2 Linear Models
path = "../../../results/merged_sites/main_analysis/Q1/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))Richness
# run model
results_model <- pairwise.lm(formula = "Observe ~ Group * Country",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_observe <- results_model[[1]]
results_model_observe_emeans <- results_model[[2]]
} else {
results_model_observe <- results_model
results_model_observe_emeans <- NA
}
# save the results
pc_observed <- list();
pc_observed[[segment]] <- results_model_observe# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -53.642 | 11.742 | -4.568 | 0.000 | 0.000 | *** |
| healthy , pre_ltx - CZ vs NO | 11.219 | 10.488 | 1.070 | 0.287 | 0.361 | |
| healthy vs Grouppre_ltx:CountryNO | 6.313 | 15.403 | 0.410 | 0.683 | 0.683 | |
| pre_ltx vs Grouppost_ltx | 25.623 | 11.366 | 2.254 | 0.025 | 0.057 | |
| pre_ltx , post_ltx - CZ vs NO | 17.533 | 12.615 | 1.390 | 0.166 | 0.249 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -35.933 | 15.921 | -2.257 | 0.025 | 0.057 | |
| healthy vs Grouppost_ltx | -28.019 | 9.237 | -3.033 | 0.003 | 0.012 | * |
| healthy , post_ltx - CZ vs NO | 11.219 | 11.267 | 0.996 | 0.321 | 0.361 | |
| healthy vs Grouppost_ltx:CountryNO | -29.620 | 14.629 | -2.025 | 0.044 | 0.080 |
knitr::kable(results_model_observe_emeans,digits = 3,
caption = "Raw results of independent country analysis")| contrast | Country | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|---|
| post_ltx - pre_ltx | CZ | 25.623 | 11.366 | 204 | 2.254 | 0.025 |
| post_ltx - pre_ltx | NO | -10.310 | 11.148 | 204 | -0.925 | 0.356 |
| post_ltx - healthy | CZ | -28.019 | 9.237 | 207 | -3.033 | 0.003 |
| post_ltx - healthy | NO | -57.639 | 11.344 | 207 | -5.081 | 0.000 |
Shannon
# run model
results_model <- pairwise.lm(formula = "Shannon ~ Group * Country",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_shannon <- results_model[[1]]
results_model_shannon_emeans <- results_model[[2]]
} else {
results_model_shannon <- results_model
results_model_shannon_emeans <- NA
}
# save the results
pc_shannon <- list();
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.505 | 0.159 | -3.171 | 0.002 | 0.017 | * |
| healthy , pre_ltx - CZ vs NO | 0.007 | 0.142 | 0.046 | 0.963 | 0.963 | |
| healthy vs Grouppre_ltx:CountryNO | 0.144 | 0.209 | 0.687 | 0.493 | 0.634 | |
| pre_ltx vs Grouppost_ltx | 0.372 | 0.148 | 2.514 | 0.013 | 0.049 | * |
| pre_ltx , post_ltx - CZ vs NO | 0.150 | 0.164 | 0.915 | 0.361 | 0.542 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.502 | 0.207 | -2.423 | 0.016 | 0.049 | * |
| healthy vs Grouppost_ltx | -0.134 | 0.111 | -1.200 | 0.231 | 0.416 | |
| healthy , post_ltx - CZ vs NO | 0.007 | 0.136 | 0.048 | 0.962 | 0.963 | |
| healthy vs Grouppost_ltx:CountryNO | -0.358 | 0.176 | -2.031 | 0.044 | 0.098 |
knitr::kable(results_model_shannon_emeans,digits = 3,
caption = "Raw results of independent country analysis")| contrast | Country | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|---|
| post_ltx - pre_ltx | CZ | 0.372 | 0.148 | 204 | 2.514 | 0.013 |
| post_ltx - pre_ltx | NO | -0.130 | 0.145 | 204 | -0.898 | 0.370 |
| post_ltx - healthy | CZ | -0.134 | 0.111 | 207 | -1.200 | 0.231 |
| post_ltx - healthy | NO | -0.492 | 0.137 | 207 | -3.597 | 0.000 |
Simpson
# run model
results_model <- pairwise.lm(formula = "Simpson ~ Group * Country",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_simpson <- results_model[[1]]
results_model_simpson_emeans <- results_model[[2]]
} else {
results_model_simpson <- results_model
results_model_simpson_emeans <- NA
}
# save the results
pc_simpson <- list();
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.046 | 0.019 | -2.481 | 0.014 | 0.129 | |
| healthy , pre_ltx - CZ vs NO | -0.008 | 0.017 | -0.466 | 0.642 | 0.692 | |
| healthy vs Grouppre_ltx:CountryNO | 0.027 | 0.024 | 1.092 | 0.277 | 0.498 | |
| pre_ltx vs Grouppost_ltx | 0.037 | 0.021 | 1.763 | 0.079 | 0.238 | |
| pre_ltx , post_ltx - CZ vs NO | 0.019 | 0.023 | 0.811 | 0.418 | 0.627 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.060 | 0.029 | -2.033 | 0.043 | 0.195 | |
| healthy vs Grouppost_ltx | -0.009 | 0.016 | -0.572 | 0.568 | 0.692 | |
| healthy , post_ltx - CZ vs NO | -0.008 | 0.020 | -0.396 | 0.692 | 0.692 | |
| healthy vs Grouppost_ltx:CountryNO | -0.033 | 0.025 | -1.303 | 0.194 | 0.436 |
knitr::kable(results_model_simpson_emeans,digits = 3,
caption = "Raw results of independent country analysis")| contrast | Country | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|---|
| post_ltx - pre_ltx | CZ | 0.037 | 0.021 | 204 | 1.763 | 0.079 |
| post_ltx - pre_ltx | NO | -0.023 | 0.021 | 204 | -1.106 | 0.270 |
Pielou
# run model
results_model <- pairwise.lm(formula = "Pielou ~ Group * Country",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_pielou <- results_model[[1]]
results_model_pielou_emeans <- results_model[[2]]
} else {
results_model_pielou <- results_model
results_model_pielou_emeans <- NA
}
# save the results
pc_pielou <- list();
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.040 | 0.024 | -1.708 | 0.090 | 0.285 | |
| healthy , pre_ltx - CZ vs NO | -0.008 | 0.021 | -0.394 | 0.694 | 0.943 | |
| healthy vs Grouppre_ltx:CountryNO | 0.010 | 0.031 | 0.332 | 0.740 | 0.943 | |
| pre_ltx vs Grouppost_ltx | 0.042 | 0.022 | 1.922 | 0.056 | 0.285 | |
| pre_ltx , post_ltx - CZ vs NO | 0.002 | 0.024 | 0.082 | 0.935 | 0.943 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.051 | 0.030 | -1.677 | 0.095 | 0.285 | |
| healthy vs Grouppost_ltx | 0.001 | 0.017 | 0.072 | 0.943 | 0.943 | |
| healthy , post_ltx - CZ vs NO | -0.008 | 0.020 | -0.407 | 0.684 | 0.943 | |
| healthy vs Grouppost_ltx:CountryNO | -0.041 | 0.027 | -1.525 | 0.129 | 0.290 |
knitr::kable(results_model_pielou_emeans,digits = 3,
caption = "Raw results of independent country analysis")| contrast | Country | estimate | SE | df | t.ratio | p.value |
|---|---|---|---|---|---|---|
| post_ltx - pre_ltx | CZ | 0.042 | 0.022 | 204 | 1.922 | 0.056 |
| post_ltx - pre_ltx | NO | -0.009 | 0.021 | 204 | -0.435 | 0.664 |
2.2.3 Saving results
alpha_list <- list(
Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
write.xlsx(alpha_list,
file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))2.3 Beta diversity
Calculating Aitchison distance (euclidean distance on clr-transformed data) at genus level (main analysis). In supplements, there are also bray-curtis and Jaccard distance at ASV and genus levels. Testing by PERMANOVA.
2.3.1 Main analysis
Genus level, Aitchison distance
level="genus"
path = "../../../results/merged_sites/main_analysis/Q1/beta_diversity"pairwise_aitchison_raw <- list()
pca_plots_list <- list()Aggregation, filtering
# Aggregation
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level=level,
names=TRUE)
# Filtration
filt_data <- filtering_steps(genus_data[[1]],
genus_data[[2]],
ileum_metadata,
seq_depth_threshold=10000)Removing 5 ASV(s)
filt_ileum_genus_tab <- filt_data[[1]]
filt_ileum_genus_taxa <- filt_data[[2]]
filt_ileum_metadata <- filt_data[[3]]2.3.1.0.1 PERMANOVA
# prepare dataset
pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,
filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
sim.method = "robust.aitchison", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,
filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
interaction = TRUE, sim.method = "robust.aitchison",
p.adjust.m="BH")
# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 695.306 | 4.259 | 0.029 | 0.001 | 0.002 | ** |
| pre_ltx vs post_ltx | 1 | 309.768 | 1.811 | 0.009 | 0.003 | 0.003 | ** |
| post_ltx vs healthy | 1 | 914.409 | 5.454 | 0.025 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy , Country | 1 | 482.074 | 2.953 | 0.020 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 699.075 | 4.087 | 0.019 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 728.020 | 4.342 | 0.020 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy : Country | 1 | 237.598 | 1.460 | 0.010 | 0.010 | 0.030 | * |
| pre_ltx vs post_ltx : Country | 1 | 237.265 | 1.390 | 0.007 | 0.021 | 0.032 | * |
| post_ltx vs healthy : Country | 1 | 202.888 | 1.211 | 0.006 | 0.106 | 0.106 |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.1]
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_ileum_metadata$Group,
covariate = filt_ileum_metadata$Country,
group1 = group1,
group2 = group2)
print(result_list)
}$pre_ltx_healthy_CZ
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Model 1 275.1 0.02844 1.7272 0.003 0.003
Residual 59 9396.0 0.97156
Total 60 9671.1 1.00000
$pre_ltx_healthy_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Model 1 657.8 0.0474 3.9806 0.001 0.0013333
Residual 80 13220.8 0.9526
Total 81 13878.7 1.0000
$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Model 1 362.6 0.03086 2.165 0.001 0.0013333
Residual 68 11387.2 0.96914
Total 69 11749.8 1.00000
$healthy_CZ_vs_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Model 1 357.1 0.03082 2.2579 0.001 0.0013333
Residual 71 11229.6 0.96918
Total 72 11586.7 1.00000
$pre_ltx_post_ltx_CZ
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Model 1 184.3 0.0085 1.0629 0.314 0.314
Residual 124 21497.3 0.9915
Total 125 21681.6 1.0000
$pre_ltx_post_ltx_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Fac, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Model 1 362.8 0.02649 2.1767 0.001 0.0013333
Residual 80 13332.7 0.97351
Total 81 13695.4 1.00000
$pre_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Model 1 362.6 0.03086 2.165 0.001 0.0013333
Residual 68 11387.2 0.96914
Total 69 11749.8 1.00000
$post_ltx_CZ_vs_NO
Permutation test for adonis under reduced model
Permutation: free
Number of permutations: 999
adonis2(formula = x1 ~ Cov, data = x2, permutations = perm)
Df SumOfSqs R2 F Pr(>F) p.adjusted
Model 1 573.8 0.02389 3.3288 0.001 0.0013333
Residual 136 23442.8 0.97611
Total 137 24016.6 1.00000
2.3.1.0.2 Plots
p <- pca_plot_custom(filt_ileum_genus_tab,
filt_ileum_genus_taxa,
filt_ileum_metadata,
show_boxplots = TRUE,
variable = "Group", size=3, show_legend= FALSE)
# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p
# see the results
p2.3.1.1 Saving results
write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]],
file = file.path(path,
paste0("beta_diversity_results_", segment,".xlsx")))2.4 Univariate Analysis
Using two DAA tools - linDA and Maaslin. Only the intersection of sets selected by each tool was shown to minimize false positives. Taxa with a significant interaction effect were excluded based on individual post-hoc analysis of the Czech and Norwegian cohorts. Only taxa with significant log fold change that showed the same change direction in both countries were retained.
Main analysis is performed at genus level. In supplements, analysis is performed at ASV and phylum levels.
2.4.1 Main analysis
level="genus"
# needed paths
path = "../../../results/merged_sites/main_analysis/Q1/univariate_analysis"
path_maaslin=file.path("../../../intermediate_files/maaslin/Q1",level)# variables
raw_linda_results_genus <- list();
raw_linda_results_genus[[segment]] <- list()
linda_results_genus <- list();
linda_results_genus[[segment]] <- list()
# country and interaction problems
list_country_union <- list()
list_intersections <- list()
list_venns <- list()
uni_statistics <- list()
# workbook for final df
wb <- createWorkbook()
# PSC effect
psc_effect <- list()Aggregate taxa
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level = level)
ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]
ileum_genus_asv_taxa_tab <- create_asv_taxa_table(ileum_genus_tab,
ileum_genus_taxa_tab)2.4.1.1 pre_ltx vs healthy
We can assume, that the differentially abundant taxa between these two groups are caused by PSC disease and other factors, like liver cirrhosis and overall bad clinical condition.
group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])2.4.1.1.1 linDA
# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group, usage="linDA")Removing 84 ASV(s)
Removing 10 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_ileum_uni_data,
filt_ileum_uni_metadata,
formula = '~ Group * Country')0 features are filtered!
The filtered data has 143 samples and 184 features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
}# volcano plots
volcano_1 <- volcano_plot_linda(linda.output,
group1,
taxa_table = filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano2.4.1.1.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") +
ggtitle("NO vs CZ")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano2.4.1.1.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn2.4.1.1.4 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_ileum_uni_data,
filt_ileum_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)2.4.1.1.5 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)2.4.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])2.4.1.2.1 linDA
# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group, usage="linDA")Removing 46 ASV(s)
Removing 6 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_ileum_uni_data,
filt_ileum_uni_metadata,
formula = '~ Group * Country')0 features are filtered!
The filtered data has 208 samples and 172 features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output,
group1,
taxa_table = filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano2.4.1.2.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano2.4.1.2.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn2.4.1.2.4 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_ileum_uni_data,
filt_ileum_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)2.4.1.2.5 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)2.4.1.3 post_ltx vs healthy
We can assume, that the differentially abundant taxa between these two groups remain because of the persistent gut MB alteration due to the PSC disease. Also, transplantation itself could add to this difference, as it has definitely strong impact on microbial composition.
group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])2.4.1.3.1 linDA
# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group, usage="linDA")Removing 45 ASV(s)
Removing 2 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_ileum_uni_data,
filt_ileum_uni_metadata,
formula = '~ Group * Country')0 features are filtered!
The filtered data has 211 samples and 180 features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_ileum_uni_data,
filt_ileum_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output,
group1,
taxa_table = filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_ileum_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano2.4.1.3.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) +
ggtitle(comparison_title)
volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano2.4.1.3.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn2.4.1.3.4 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_ileum_uni_data,
filt_ileum_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)2.4.1.3.5 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)2.4.1.4 Visualization
Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.
list_heatmap <- list_intersections[grep(paste(segment,level),
names(list_intersections),value=TRUE)]
p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab)
p_heatmap_lindaDot heatmap
dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
ileum_taxa_tab) + xlab("") + ylab("")min_clr -1.767585
max_clr 6.908912
min_log -5.062147
max_log 4.892498
dotheatmap_lindaHorizontal bar plot
p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))Using SeqID as id variables
p_prevalence_final <- ggarrange(p_prevalence,
ggplot() + theme_minimal(),
nrow = 2,heights = c(1,0.09))
p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),
p_prevalence_final,
ncol=2,widths = c(1,0.3))Warning: Removed 80 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 80 rows containing missing values or values outside the scale range
(`geom_text()`).
pdot_heatmap_ileum <- p2.4.1.5 PSC effect
To get only PSC associated taxa, we can intersect differentially abundant taxa from the previous analyses (see diagram below).
pre_LTx vs HC and Post_LTx vs HC intersection

A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)
psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
# see the results
psc_effect[[paste(segment,level)]] SeqID
10 Enterorhabdus
16 Barnesiella
19 Butyricimonas
20 Odoribacter
27 Alistipes
29 Parabacteroides
41 Holdemania
64 Coprococcus
68 Fusicatenibacter
72 Lachnoclostridium
74 Lachnospiraceae_FCS020_group
105 Oscillibacter
118 Faecalibacterium
155 Pseudomonas
164 Enterococcus
Taxonomy
10 k__Bacteria;p__Actinobacteriota;c__Coriobacteriia;o__Coriobacteriales;f__Eggerthellaceae;g__Enterorhabdus
16 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Barnesiellaceae;g__Barnesiella
19 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Butyricimonas
20 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Odoribacter
27 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes
29 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Tannerellaceae;g__Parabacteroides
41 k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Holdemania
64 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus
68 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Fusicatenibacter
72 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium
74 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_FCS020_group
105 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Oscillibacter
118 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Faecalibacterium
155 k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas
164 k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;g__Enterococcus
log2FoldChange p_value padj
10 -1.050549 1.087023e-02 4.545734e-02
16 -3.177161 1.263665e-03 7.750481e-03
19 -2.900425 6.571752e-05 9.301556e-04
20 -2.971657 3.696980e-05 6.184040e-04
27 -2.559742 5.731016e-04 4.336862e-03
29 -3.418185 1.366753e-05 2.794251e-04
41 -1.213660 1.082333e-02 4.545734e-02
64 -3.868534 6.688508e-07 3.076714e-05
68 -2.864295 5.224510e-04 4.336862e-03
72 -1.781011 1.555333e-03 9.231654e-03
74 -2.231945 8.971276e-04 6.113758e-03
105 -1.970465 2.772265e-03 1.416936e-02
118 -3.360183 8.947299e-06 2.351861e-04
155 1.573079 8.758249e-03 4.028794e-02
164 4.892498 4.554602e-10 8.380467e-08
2.4.1.6 Saving results
# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
overwrite = TRUE)
# PSC effect
write.xlsx(psc_effect[[paste(segment,level)]],file.path(path,paste0("psc_effect_",segment,".xlsx")))
# SIGNIFICANT taxa
write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
`names<-`(gsub(segment, "", names(
list_intersections[grepl(segment,names(list_intersections))]))),
file.path(path,paste0("significant_taxa_",segment,".xlsx")))2.5 Machine learning
Binary classification was performed using four independent models: Elastic Net (ENET), RF (RF), Gradient Boosting (GBoost), K-nearest Neighbors (kNN). Training and validation were performed via bootstrapping (N=500) on clr-transformed data. Model performance metrics expressed as AUC were calculated based on an out-of-sample principle.
The supplementary analysis contains training and validating at clr-transformed data at ASV level and relative-abundance data at ASV and genus level.
path = "../../../results/merged_sites/main_analysis/Q1/models"2.5.1 ENET
model="enet"2.5.1.1 ASV level
level="ASV"2.5.1.1.1 pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group, usage="ml_clr")Removing 1598 ASV(s)
Removing 146 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC curve
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ <- list()
models_cm <- list()
betas <- list()
roc_cs <- list()
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.4000000
lambda 0.0256910
auc 1.0000000
auc_czech 1.0000000
auc_no 1.0000000
auc_optimism_corrected 0.9562938
auc_optimism_corrected_CIL 0.8940903
auc_optimism_corrected_CIU 0.9955143
accuracy 1.0000000
accuracy_czech NaN
accuracy_no 1.0000000
accuracy_optimism_corrected 0.8824810
accuracy_optimism_corrected_CIL 0.7808636
accuracy_optimism_corrected_CIU 0.9600000
enet_model$conf_matrices$original
Predicted
True 0 1
0 73 0
1 0 70
$czech
Predicted
True 0 1
0 37 0
1 0 24
$no
Predicted
True 0 1
0 36 0
1 0 46
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
roc_c2.5.1.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 979 ASV(s)
Removing 68 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.2000000
lambda 0.1292328
auc 0.9967909
auc_czech 0.9979575
auc_no 0.9969807
auc_optimism_corrected 0.8645187
auc_optimism_corrected_CIL 0.7852767
auc_optimism_corrected_CIU 0.9318161
accuracy 0.9278846
accuracy_czech NaN
accuracy_no 0.9512195
accuracy_optimism_corrected 0.8220597
accuracy_optimism_corrected_CIL 0.7333333
accuracy_optimism_corrected_CIU 0.8924940
enet_model$conf_matrices$original
Predicted
True 0 1
0 138 0
1 15 55
$czech
Predicted
True 0 1
0 102 0
1 11 13
$no
Predicted
True 0 1
0 36 0
1 4 42
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
roc_c2.5.1.1.3 post_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 641 ASV(s)
Removing 104 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.20000000
lambda 0.04799701
auc 1.00000000
auc_czech 1.00000000
auc_no 1.00000000
auc_optimism_corrected 0.95799961
auc_optimism_corrected_CIL 0.89999175
auc_optimism_corrected_CIU 0.98977781
accuracy 1.00000000
accuracy_czech NaN
accuracy_no 1.00000000
accuracy_optimism_corrected 0.89383497
accuracy_optimism_corrected_CIL 0.82118019
accuracy_optimism_corrected_CIU 0.96028870
enet_model$conf_matrices$original
Predicted
True 0 1
0 73 0
1 0 138
$czech
Predicted
True 0 1
0 37 0
1 0 102
$no
Predicted
True 0 1
0 36 0
1 0 36
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
roc_c2.5.1.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level = level)
ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]2.5.1.2.1 pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 84 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.20000000
lambda 0.03984448
auc 1.00000000
auc_czech 1.00000000
auc_no 1.00000000
auc_optimism_corrected 0.96582239
auc_optimism_corrected_CIL 0.91575521
auc_optimism_corrected_CIU 0.99568966
accuracy 1.00000000
accuracy_czech NaN
accuracy_no 1.00000000
accuracy_optimism_corrected 0.89754798
accuracy_optimism_corrected_CIL 0.80769231
accuracy_optimism_corrected_CIU 0.97141139
enet_model$conf_matrices$original
Predicted
True 0 1
0 73 0
1 0 70
$czech
Predicted
True 0 1
0 37 0
1 0 24
$no
Predicted
True 0 1
0 36 0
1 0 46
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c2.5.1.2.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,
group,
usage="ml_clr")Removing 46 ASV(s)
Removing 6 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group", N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.0000000
lambda 0.6674012
auc 0.9726708
auc_czech 0.9616013
auc_no 0.9800725
auc_optimism_corrected 0.8344709
auc_optimism_corrected_CIL 0.7487803
auc_optimism_corrected_CIU 0.9154473
accuracy 0.8653846
accuracy_czech NaN
accuracy_no 0.8780488
accuracy_optimism_corrected 0.7788698
accuracy_optimism_corrected_CIL 0.6973684
accuracy_optimism_corrected_CIU 0.8516141
enet_model$conf_matrices$original
Predicted
True 0 1
0 136 2
1 26 44
$czech
Predicted
True 0 1
0 101 1
1 17 7
$no
Predicted
True 0 1
0 35 1
1 9 37
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c2.5.1.2.3 post_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
ileum_genus_taxa_tab,
ileum_metadata,group,
usage="ml_clr")Removing 45 ASV(s)
Removing 2 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.00000000
lambda 0.07097722
auc 1.00000000
auc_czech 1.00000000
auc_no 1.00000000
auc_optimism_corrected 0.96566611
auc_optimism_corrected_CIL 0.92200341
auc_optimism_corrected_CIU 0.99386451
accuracy 1.00000000
accuracy_czech NaN
accuracy_no 1.00000000
accuracy_optimism_corrected 0.90031834
accuracy_optimism_corrected_CIL 0.82800926
accuracy_optimism_corrected_CIU 0.96054545
enet_model$conf_matrices$original
Predicted
True 0 1
0 73 0
1 0 138
$czech
Predicted
True 0 1
0 37 0
1 0 102
$no
Predicted
True 0 1
0 36 0
1 0 36
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c2.5.1.3 Saving results
models_summ_df_ileum <- do.call(rbind,
models_summ[grep(segment,names(models_summ),value = TRUE)])
write.csv(models_summ_df_ileum,file.path(path,paste0("elastic_net_",segment,".csv")))2.6 Results overview
2.6.0.1 Alpha diversity
knitr::kable(pc_observed[[segment]],
digits = 3,
caption = "Results of linear model testing ASV Richness")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -53.642 | 11.742 | -4.568 | 0.000 | 0.000 | *** |
| healthy , pre_ltx - CZ vs NO | 11.219 | 10.488 | 1.070 | 0.287 | 0.361 | |
| healthy vs Grouppre_ltx:CountryNO | 6.313 | 15.403 | 0.410 | 0.683 | 0.683 | |
| pre_ltx vs Grouppost_ltx | 25.623 | 11.366 | 2.254 | 0.025 | 0.057 | |
| pre_ltx , post_ltx - CZ vs NO | 17.533 | 12.615 | 1.390 | 0.166 | 0.249 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -35.933 | 15.921 | -2.257 | 0.025 | 0.057 | |
| healthy vs Grouppost_ltx | -28.019 | 9.237 | -3.033 | 0.003 | 0.012 | * |
| healthy , post_ltx - CZ vs NO | 11.219 | 11.267 | 0.996 | 0.321 | 0.361 | |
| healthy vs Grouppost_ltx:CountryNO | -29.620 | 14.629 | -2.025 | 0.044 | 0.080 |
knitr::kable(pc_shannon[[segment]],
digits = 3,
caption = "Results of linear model testing Shannon index")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.505 | 0.159 | -3.171 | 0.002 | 0.017 | * |
| healthy , pre_ltx - CZ vs NO | 0.007 | 0.142 | 0.046 | 0.963 | 0.963 | |
| healthy vs Grouppre_ltx:CountryNO | 0.144 | 0.209 | 0.687 | 0.493 | 0.634 | |
| pre_ltx vs Grouppost_ltx | 0.372 | 0.148 | 2.514 | 0.013 | 0.049 | * |
| pre_ltx , post_ltx - CZ vs NO | 0.150 | 0.164 | 0.915 | 0.361 | 0.542 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.502 | 0.207 | -2.423 | 0.016 | 0.049 | * |
| healthy vs Grouppost_ltx | -0.134 | 0.111 | -1.200 | 0.231 | 0.416 | |
| healthy , post_ltx - CZ vs NO | 0.007 | 0.136 | 0.048 | 0.962 | 0.963 | |
| healthy vs Grouppost_ltx:CountryNO | -0.358 | 0.176 | -2.031 | 0.044 | 0.098 |
knitr::kable(pc_simpson[[segment]],
digits = 3,
caption = "Results of linear model testing Simpson index")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.046 | 0.019 | -2.481 | 0.014 | 0.129 | |
| healthy , pre_ltx - CZ vs NO | -0.008 | 0.017 | -0.466 | 0.642 | 0.692 | |
| healthy vs Grouppre_ltx:CountryNO | 0.027 | 0.024 | 1.092 | 0.277 | 0.498 | |
| pre_ltx vs Grouppost_ltx | 0.037 | 0.021 | 1.763 | 0.079 | 0.238 | |
| pre_ltx , post_ltx - CZ vs NO | 0.019 | 0.023 | 0.811 | 0.418 | 0.627 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.060 | 0.029 | -2.033 | 0.043 | 0.195 | |
| healthy vs Grouppost_ltx | -0.009 | 0.016 | -0.572 | 0.568 | 0.692 | |
| healthy , post_ltx - CZ vs NO | -0.008 | 0.020 | -0.396 | 0.692 | 0.692 | |
| healthy vs Grouppost_ltx:CountryNO | -0.033 | 0.025 | -1.303 | 0.194 | 0.436 |
knitr::kable(pc_pielou[[segment]],
digits = 3,
caption = "Results of linear model testing Pielou index")| Estimate | Std..Error | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.040 | 0.024 | -1.708 | 0.090 | 0.285 | |
| healthy , pre_ltx - CZ vs NO | -0.008 | 0.021 | -0.394 | 0.694 | 0.943 | |
| healthy vs Grouppre_ltx:CountryNO | 0.010 | 0.031 | 0.332 | 0.740 | 0.943 | |
| pre_ltx vs Grouppost_ltx | 0.042 | 0.022 | 1.922 | 0.056 | 0.285 | |
| pre_ltx , post_ltx - CZ vs NO | 0.002 | 0.024 | 0.082 | 0.935 | 0.943 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.051 | 0.030 | -1.677 | 0.095 | 0.285 | |
| healthy vs Grouppost_ltx | 0.001 | 0.017 | 0.072 | 0.943 | 0.943 | |
| healthy , post_ltx - CZ vs NO | -0.008 | 0.020 | -0.407 | 0.684 | 0.943 | |
| healthy vs Grouppost_ltx:CountryNO | -0.041 | 0.027 | -1.525 | 0.129 | 0.290 |
Plots
alpha_div_plots[[paste(segment,"Country")]]2.6.0.2 Beta diversity
Main analysis
knitr::kable(pairwise_aitchison_raw[[paste("genus", segment)]],
digits = 3,
caption = "Results of PERMANOVA - robust aitchison distance")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 695.306 | 4.259 | 0.029 | 0.001 | 0.002 | ** |
| pre_ltx vs post_ltx | 1 | 309.768 | 1.811 | 0.009 | 0.003 | 0.003 | ** |
| post_ltx vs healthy | 1 | 914.409 | 5.454 | 0.025 | 0.001 | 0.002 | ** |
| pre_ltx vs healthy , Country | 1 | 482.074 | 2.953 | 0.020 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 699.075 | 4.087 | 0.019 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 728.020 | 4.342 | 0.020 | 0.001 | 0.001 | *** |
| pre_ltx vs healthy : Country | 1 | 237.598 | 1.460 | 0.010 | 0.010 | 0.030 | * |
| pre_ltx vs post_ltx : Country | 1 | 237.265 | 1.390 | 0.007 | 0.021 | 0.032 | * |
| post_ltx vs healthy : Country | 1 | 202.888 | 1.211 | 0.006 | 0.106 | 0.106 |
PCA
pca_plots_list[[paste(segment,"genus custom")]]2.6.0.3 Univariate analysis
Number of significant taxa
knitr::kable(cbind(as.data.frame(lapply(list_intersections,nrow)),
as.data.frame(lapply(psc_effect,nrow))) %>% t() %>%
`colnames<-`("Count") %>%
`rownames<-`(c(names(list_intersections),"PSC effect Genus")),caption="Number of significant taxa")| Count | |
|---|---|
| terminal_ileum genus healthy vs pre_ltx | 25 |
| terminal_ileum genus pre_ltx vs post_ltx | 4 |
| terminal_ileum genus healthy vs post_ltx | 38 |
| PSC effect Genus | 15 |
2.6.0.4 Machine learning
Main models
knitr::kable(models_summ_df_ileum %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
digits=2,caption="Elastic net results")| alpha | lambda | auc_optimism_corrected | auc_optimism_corrected_CIL | auc_optimism_corrected_CIU | |
|---|---|---|---|---|---|
| pre_ltx vs healthy ASV terminal_ileum | 0.4 | 0.03 | 0.96 | 0.89 | 1.00 |
| pre_ltx vs post_ltx ASV terminal_ileum | 0.2 | 0.13 | 0.86 | 0.79 | 0.93 |
| post_ltx vs healthy ASV terminal_ileum | 0.2 | 0.05 | 0.96 | 0.90 | 0.99 |
| pre_ltx vs healthy genus terminal_ileum | 0.2 | 0.04 | 0.97 | 0.92 | 1.00 |
| pre_ltx vs post_ltx genus terminal_ileum | 0.0 | 0.67 | 0.83 | 0.75 | 0.92 |
| post_ltx vs healthy genus terminal_ileum | 0.0 | 0.07 | 0.97 | 0.92 | 0.99 |
ROC - Genus level
p <- roc_curve_all_custom(roc_cs[c(4:6)],Q="Q1",
model_name="enet_model")
p3 Data Analysis - Colon
segment="colon"3.1 Filtering
Rules: - sequencing depth > 10000 - nearZeroVar() with default settings
Rarefaction Curve
path="../../../intermediate_files/rarecurves"
seq_depth_threshold <- 10000ps <- construct_phyloseq(colon_asv_tab,colon_taxa_tab,colon_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_colon.Rdata"))load(file.path(path,"rarefaction_colon.Rdata"))
seq_depth_threshold <- 10000
prare <- ggrarecurve(obj=rareres,
factorNames="Country",
indexNames=c("Observe")) +
theme_bw()+
theme(axis.text=element_text(size=8),
panel.grid=element_blank(),
strip.background = element_rect(colour=NA,fill="grey"),
strip.text.x = element_text(face="bold")) +
geom_vline(xintercept = seq_depth_threshold,
linetype="dashed", color = "red") +
xlim(0, 20000)The color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)
prareLibrary size
read_counts(colon_asv_tab, line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
3.1.1 Sequencing depth
data_filt <- seq_depth_filtering(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
seq_depth_threshold = 10000)Removing 75 ASV(s)
filt_colon_asv_tab <- data_filt[[1]]; alpha_colon_asv_tab <- filt_colon_asv_tab
filt_colon_taxa_tab <- data_filt[[2]]; alpha_colon_taxa_tab <- filt_colon_taxa_tab
filt_colon_metadata <- data_filt[[3]]; alpha_colon_metadata <- filt_colon_metadata
seq_step <- dim(filt_colon_asv_tab)[1]Library size
read_counts(filt_colon_asv_tab,line = c(10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
3.1.2 NearZeroVar
data_filt <- nearzerovar_filtering(filt_colon_asv_tab,
filt_colon_taxa_tab,
filt_colon_metadata)
filt_colon_asv_tab <- data_filt[[1]]
filt_colon_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_colon_asv_tab)[1]Library size
read_counts(filt_colon_asv_tab,line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Check zero depth
data_filt <- check_zero_depth(filt_colon_asv_tab,
filt_colon_taxa_tab,
filt_colon_metadata)
filt_colon_asv_tab <- data_filt[[1]];
filt_colon_taxa_tab <- data_filt[[2]];
filt_colon_metadata <- data_filt[[3]]; Library size
read_counts(filt_colon_asv_tab,line = c(5000,10000))Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
3.1.3 Final Counts
final_counts_filtering(colon_asv_tab,
filt_colon_asv_tab,
filt_colon_metadata,
seq_step, 0, nearzero_step) V1
Raw data: ASVs 6936
Raw data: Samples 789
Sequencing depth filt: ASVs 6861
Prevalence filt: ASVs 0
NearZeroVar filt: ASVs 381
Filt data: ASVs 381
Filt data: Samples 760
Filt data: Patients 354
Filt data: Patients.1 0
Filtered samples 29
Matrices Cecum;Rectum;CD;CA;SI
healthy 161
non-rPSC 0
rPSC 0
pre_ltx 250
post_ltx 349
ETOH 0
3.2 Alpha diversity
Calculating Richness, Shannon, Simpson, Pielou indexes on raw (unfiltered) rarefied data. Samples with sequencing depth < 10000 were excluded. Testing by linear mixed-effect model.
path = "../../../results/merged_sites/main_analysis/Q1/alpha_diversity"Calculation
# Construct MPSE object
alpha_colon_metadata$Sample <- alpha_colon_metadata$SampleID
colon_mpse <- as.MPSE(construct_phyloseq(alpha_colon_asv_tab,
alpha_colon_taxa_tab,
alpha_colon_metadata))
colon_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)
# Calculate alpha diversity - rarefied counts
colon_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)alpha_data <- data.frame(SampleID=colon_mpse$Sample.x,
Observe=colon_mpse$Observe,
Shannon=colon_mpse$Shannon,
Simpson=colon_mpse$Simpson,
Pielou=colon_mpse$Pielou,
Group=colon_mpse$Group,
Country=colon_mpse$Country,
Patient=colon_mpse$Patient)
write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
row.names = FALSE)3.2.1 Plots
p_boxplot_alpha <- alpha_diversity_countries(alpha_data)Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha
# see the results
p_boxplot_alpha3.2.2 Linear Model
path = "../../../results/merged_sites/main_analysis/Q1/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))Richness
results_model <- pairwise.lmer(
formula = "Observe ~ Group * Country + (1|Patient)",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_observe <- results_model[[1]]
results_model_observe_detailed <- results_model[[2]]
} else {
results_model_observe <- results_model
results_model_observe_detailed <- NA
}
# save the results
pc_observed[[segment]] <- results_model_observe# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -56.230 | 11.738 | 195.570 | -4.791 | 0.000 | 0.000 | *** |
| healthy vs pre_ltx - CZ vs NO | 14.193 | 10.341 | 194.538 | 1.372 | 0.171 | 0.256 | |
| healthy vs Grouppre_ltx:CountryNO | -7.489 | 15.169 | 194.358 | -0.494 | 0.622 | 0.622 | |
| pre_ltx vs Grouppost_ltx | 28.886 | 11.811 | 267.502 | 2.446 | 0.015 | 0.045 | * |
| pre_ltx vs post_ltx - CZ vs NO | 6.742 | 12.174 | 265.784 | 0.554 | 0.580 | 0.622 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -24.047 | 15.112 | 262.828 | -1.591 | 0.113 | 0.203 | |
| healthy vs Grouppost_ltx | -27.299 | 8.699 | 250.458 | -3.138 | 0.002 | 0.009 | ** |
| healthy vs post_ltx - CZ vs NO | 14.360 | 11.153 | 251.729 | 1.288 | 0.199 | 0.256 | |
| healthy vs Grouppost_ltx:CountryNO | -31.681 | 14.195 | 247.531 | -2.232 | 0.027 | 0.060 |
knitr::kable(results_model_observe_detailed,digits = 3,
caption = "Raw results of independent country analysis")
|
Shannon
results_model <- pairwise.lmer(
formula = "Shannon ~ Group * Country + (1|Patient)",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_shannon <- results_model[[1]]
results_model_shannon_detailed <- results_model[[2]]
} else {
results_model_shannon <- results_model
results_model_shannon_detailed <- NA
}
# save the results
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.585 | 0.143 | 197.588 | -4.084 | 0.000 | 0.001 | *** |
| healthy vs pre_ltx - CZ vs NO | 0.015 | 0.126 | 196.213 | 0.116 | 0.907 | 0.909 | |
| healthy vs Grouppre_ltx:CountryNO | 0.081 | 0.185 | 195.965 | 0.437 | 0.663 | 0.852 | |
| pre_ltx vs Grouppost_ltx | 0.364 | 0.158 | 265.003 | 2.307 | 0.022 | 0.056 | |
| pre_ltx vs post_ltx - CZ vs NO | 0.095 | 0.163 | 263.448 | 0.585 | 0.559 | 0.839 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.455 | 0.202 | 260.765 | -2.253 | 0.025 | 0.056 | |
| healthy vs Grouppost_ltx | -0.220 | 0.102 | 247.586 | -2.153 | 0.032 | 0.058 | |
| healthy vs post_ltx - CZ vs NO | 0.015 | 0.131 | 248.759 | 0.114 | 0.909 | 0.909 | |
| healthy vs Grouppost_ltx:CountryNO | -0.376 | 0.167 | 244.904 | -2.255 | 0.025 | 0.056 |
knitr::kable(results_model_shannon_detailed,digits = 3,
caption = "Raw results of independent country analysis")
|
|
Simpson
results_model <- pairwise.lmer(
formula = "Simpson ~ Group * Country + (1|Patient)",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_simpson <- results_model[[1]]
results_model_simpson_detailed <- results_model[[2]]
} else {
results_model_simpson <- results_model
results_model_simpson_detailed <- NA
}
# save the results
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.059 | 0.019 | 199.853 | -3.066 | 0.002 | 0.022 | * |
| healthy vs pre_ltx - CZ vs NO | -0.007 | 0.017 | 198.127 | -0.430 | 0.667 | 0.738 | |
| healthy vs Grouppre_ltx:CountryNO | 0.020 | 0.025 | 197.803 | 0.813 | 0.417 | 0.626 | |
| pre_ltx vs Grouppost_ltx | 0.031 | 0.026 | 263.530 | 1.213 | 0.226 | 0.455 | |
| pre_ltx vs post_ltx - CZ vs NO | 0.013 | 0.027 | 261.964 | 0.476 | 0.635 | 0.738 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.052 | 0.033 | 259.262 | -1.556 | 0.121 | 0.363 | |
| healthy vs Grouppost_ltx | -0.027 | 0.017 | 246.082 | -1.594 | 0.112 | 0.363 | |
| healthy vs post_ltx - CZ vs NO | -0.007 | 0.022 | 247.135 | -0.335 | 0.738 | 0.738 | |
| healthy vs Grouppost_ltx:CountryNO | -0.032 | 0.028 | 243.697 | -1.147 | 0.253 | 0.455 |
knitr::kable(results_model_simpson_detailed,digits = 3,
caption = "Raw results of independent country analysis")Table: Raw results of independent country analysis
Pielou
results_model <- pairwise.lmer(
formula = "Pielou ~ Group * Country + (1|Patient)",
factors=alpha_data$Group,
data=alpha_data)
# check interaction
if (!is.data.frame(results_model)){
results_model_pielou <- results_model[[1]]
results_model_pielou_detailed <- results_model[[2]]
} else {
results_model_pielou <- results_model
results_model_pielou_detailed <- NA
}
# save the results
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.057 | 0.020 | 201.130 | -2.828 | 0.005 | 0.046 | * |
| healthy vs pre_ltx - CZ vs NO | -0.008 | 0.018 | 199.017 | -0.446 | 0.656 | 0.781 | |
| healthy vs Grouppre_ltx:CountryNO | 0.015 | 0.026 | 198.602 | 0.565 | 0.573 | 0.781 | |
| pre_ltx vs Grouppost_ltx | 0.039 | 0.024 | 265.797 | 1.631 | 0.104 | 0.234 | |
| pre_ltx vs post_ltx - CZ vs NO | 0.007 | 0.025 | 263.735 | 0.274 | 0.785 | 0.785 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.062 | 0.031 | 260.213 | -2.015 | 0.045 | 0.202 | |
| healthy vs Grouppost_ltx | -0.018 | 0.016 | 247.076 | -1.130 | 0.260 | 0.467 | |
| healthy vs post_ltx - CZ vs NO | -0.008 | 0.020 | 248.451 | -0.394 | 0.694 | 0.781 | |
| healthy vs Grouppost_ltx:CountryNO | -0.047 | 0.026 | 243.881 | -1.828 | 0.069 | 0.206 |
knitr::kable(results_model_pielou_detailed,digits = 3,
caption = "Raw results of independent country analysis")
|
|
3.2.3 Saving results
alpha_list <- list(
Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
write.xlsx(alpha_list,
file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))3.3 Beta diversity
Calculating Aitchison distance (euclidean distance on clr-transformed data) at genus level (main analysis). In supplements, there are also bray-curtis and Jaccard distance at ASV and genus levels. Testing by PERMANOVA.
3.3.1 Main analysis
Genus level, Aitchison distance
level="genus"path = "../../../results/merged_sites/main_analysis/Q1/beta_diversity"Aggregation, filtering
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level=level,
names=TRUE)
filt_data <- filtering_steps(genus_data[[1]],
genus_data[[2]],
colon_metadata,
seq_depth_threshold=10000)Removing 10 ASV(s)
filt_colon_genus_tab <- filt_data[[1]]
filt_colon_genus_taxa <- filt_data[[2]]
filt_colon_genus_metadata <- filt_data[[3]]3.3.2 PERMANOVA
pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()
# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
patients = filt_colon_genus_metadata$Patient,
sim.method = "robust.aitchison", p.adjust.m="BH")
# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
interaction = TRUE,
patients = filt_colon_genus_metadata$Patient,
sim.method = "robust.aitchison", p.adjust.m="BH")
# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]
cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols;
# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <-rbind(pp_factor,pp_cov,pp_fac.cov)# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 1222.921 | 7.950 | 0.019 | 0.001 | 0.002 | ** |
| pre_ltx vs post_ltx | 1 | 630.146 | 4.062 | 0.007 | 0.008 | 0.008 | ** |
| post_ltx vs healthy | 1 | 1736.169 | 11.014 | 0.021 | 0.001 | 0.002 | ** |
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy , Country | 1 | 738.576 | 4.801 | 0.011 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 1532.675 | 9.879 | 0.016 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 1728.746 | 10.966 | 0.021 | 0.001 | 0.001 | *** |
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy : Country | 1 | 318.719 | 2.077 | 0.005 | 0.325 | 0.488 | |
| pre_ltx vs post_ltx : Country | 1 | 322.405 | 2.082 | 0.003 | 0.524 | 0.524 | |
| post_ltx vs healthy : Country | 1 | 398.820 | 2.538 | 0.005 | 0.112 | 0.336 |
Interaction check
interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.1]
if (length(interaction_sig)>0){
for (i in 1:length(interaction_sig)){
group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
group2 <- unlist(strsplit(group2,split = " : "))[1]
result_list <- adonis_postanalysis(x=pairwise_df,
factors = filt_colon_genus_metadata$Group,
covariate = filt_colon_genus_metadata$Country,
group1 = group1,
group2 = group2,
patients = filt_colon_genus_metadata$Patient)
print(result_list)
}
}3.3.2.0.1 Plots
p <- pca_plot_custom(filt_colon_genus_tab,
filt_colon_genus_taxa,
filt_colon_genus_metadata,
show_boxplots = TRUE,
variable = "Group", size=2,
show_legend= FALSE)
# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p
# see the results
p3.3.2.1 Saving results
write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]],
file = file.path(path,
paste0("beta_diversity_results_", segment,".xlsx")))3.4 Univariate Analysis
Using two DAA tools - linDA and Maaslin2. Only the intersection of sets selected by each tool was shown to minimize false positives. Taxa with a significant interaction effect were excluded based on individual post-hoc analysis of the Czech and Norwegian cohorts. Only taxa with significant log fold change that showed the same change direction in both countries were retained.
Main analysis is performed at genus level. In supplementary analysis, it is performed at ASV and phylum levels.
3.4.1 Main analysis
level="genus"# needed paths
path = "../../../results/merged_sites/main_analysis/Q1/univariate_analysis"
path_maaslin=file.path("../../../intermediate_files/maaslin/Q1",level)
# variables
raw_linda_results_genus[[segment]] <- list()
linda_results_genus[[segment]] <- list()
# country and interaction problems
list_country_union <- list()
list_venns <- list()
# workbook for final df
wb <- createWorkbook()
# PSC effect
psc_effect <- list()3.4.1.1 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level = level)
colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]
colon_genus_asv_taxa_tab <- create_asv_taxa_table(colon_genus_tab,
colon_genus_taxa_tab)3.4.1.1.1 pre_ltx vs healthy
3.4.1.1.1.1 linDA
group <- c("healthy","pre_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group, usage="linDA")Removing 135 ASV(s)
Removing 10 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_colon_uni_data,
filt_colon_uni_metadata,
formula = '~ Group * Country + (1|Patient)')0 features are filtered!
The filtered data has 411 samples and 148 features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
taxa_table = filt_colon_uni_taxa) +
ggtitle(paste(group,collapse=" vs "))
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_colon_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
# see the plot
volcano3.4.1.1.1.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) +
ggtitle(paste(group[1], "vs", group[2]))
volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano3.4.1.1.1.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn3.4.1.1.1.4 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_colon_uni_data,
filt_colon_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)3.4.1.1.1.5 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "pre_ltx"
[1] "healthy"
[1] "pre_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)3.4.1.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])3.4.1.1.2.1 linDA
# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group, usage="linDA")Removing 17 ASV(s)
Removing 10 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_colon_uni_data,
filt_colon_uni_metadata,
formula = '~ Group * Country + (1|Patient)')0 features are filtered!
The filtered data has 600 samples and 130 features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
taxa_table = filt_colon_uni_taxa) +
ggtitle(paste(group,collapse=" vs "))
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_colon_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
# see the plot
volcano3.4.1.1.2.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) +
ggtitle(paste(group[1], "vs", group[2]))
volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano3.4.1.1.2.3 Group - Intersection
intersection_results <- group_intersection(group,
list_intersections,
list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn3.4.1.1.2.4 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_colon_uni_data,
filt_colon_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)3.4.1.1.2.5 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "pre_ltx"
[1] "post_ltx"
[1] "pre_ltx"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)3.4.1.1.3 post_ltx vs healthy
group <- c("healthy","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])3.4.1.1.3.1 linDA
# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,group,
usage="linDA")Removing 71 ASV(s)
Removing 5 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]
# fit the model
linda.obj <- linda(filt_colon_uni_data,
filt_colon_uni_metadata,
formula = '~ Group * Country + (1|Patient)')0 features are filtered!
The filtered data has 511 samples and 153 features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)
# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO")
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")
for (grp in c(group1,group2,group3)){
raw_linda_results_genus[[segment]][[grp]] <-
rawlinda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
linda_results_genus[[segment]][[grp]] <-
linda.df(linda.output,
grp,
filt_colon_uni_data,
filt_colon_uni_taxa)
}# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1,
taxa_table = filt_colon_uni_taxa) +
ggtitle(paste(group,collapse=" vs "))
volcano_2 <- volcano_plot_linda(linda.output, group2,
taxa_table = filt_colon_uni_taxa) +
ggtitle("NO vs CZ")
volcano_3 <- volcano_plot_linda(linda.output, group3,
taxa_table = filt_colon_uni_taxa) +
ggtitle("Interaction effect")
volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
# see the plot
volcano3.4.1.1.3.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) +
ggtitle(paste(group[1], "vs", group[2]))
volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") +
ggtitle("Country effect")
volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano3.4.1.1.3.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
linda.output, fit_data,
raw_linda_results_genus,
segment = segment,
level=level)
list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]
# show the results
venn3.4.1.1.3.4 Interaction effect
list_interaction_significant <- country_interaction(group,
linda.output,
list_intersections,
filt_colon_uni_data,
filt_colon_uni_metadata,
segment=segment,
level=level)
# see the result
## significant interaction effect
list_interaction_significant[[1]]NULL
## results for czech cohort
list_interaction_significant[[2]][1] NA
## results for norwegian cohort
list_interaction_significant[[3]][1] NA
Removing problematic taxa
list_intersections <- removing_interaction_problems(group,
list_interaction_significant,
list_intersections,
segment=segment,
level=level)3.4.1.1.3.5 Basic statistics
uni_df <- merge(basic_univariate_statistics(linda_data,group),
raw_linda_results_genus[[segment]][[group1]],
by="SeqID",all=TRUE)[1] "healthy"
[1] "post_ltx"
[1] "healthy"
[1] "post_ltx"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df
# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)3.4.1.1.4 Visualization
Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.
list_heatmap <- list_intersections[grep(paste(segment,level),
names(list_intersections),value=TRUE)]
p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_lindaDot heatmap
dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
uni_statistics$colon[grepl(level,names(uni_statistics$colon))],
colon_taxa_tab) + xlab("") + ylab("")min_clr -2.221209
max_clr 6.886638
min_log -3.771476
max_log 5.049706
dotheatmap_lindaHorizontal bar plot
p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))Using SeqID as id variables
p_prevalence_final <- ggarrange(p_prevalence,
ggplot() + theme_minimal(),
nrow = 2,heights = c(1,0.09))
p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),p_prevalence_final,ncol=2,widths = c(1,0.3))
pdot_heatmap_colon <- p3.4.1.1.5 PSC effect
To get only PSC associated taxa, we can intersect differentially abundant taxa from the previous analyses (see diagram below).
pre_LTx vs HC and Post_LTx vs HC intersection

A <- list_intersections[[paste(segment,level,"healthy vs pre_ltx")]]$SeqID
B <- list_intersections[[paste(segment,level,"healthy vs post_ltx")]]$SeqID
C <- intersect(A,B)
psc_effect[[paste(segment,level)]] <- list_intersections[[paste(segment,level, "healthy vs pre_ltx")]][list_intersections[[paste(segment,level, "healthy vs pre_ltx")]]$SeqID %in% C,]
# see the results
psc_effect[[paste(segment,level)]] SeqID
3 Rothia
15 Barnesiella
18 Butyricimonas
19 Odoribacter
27 Alistipes
28 Parabacteroides
39 Holdemania
57 Coprococcus
61 Fusicatenibacter
64 Hungatella
65 Lachnoclostridium
67 Lachnospiraceae_FCS020_group
89 Colidextribacter
91 Intestinimonas
93 Oscillibacter
109 Negativibacillus
122 Family_XIII_AD3011_group
133 Dialister
135 Veillonella
143 Enterococcus
146 Klebsiella
148 Pseudomonas
Taxonomy
3 k__Bacteria;p__Actinobacteriota;c__Actinobacteria;o__Micrococcales;f__Micrococcaceae;g__Rothia
15 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Barnesiellaceae;g__Barnesiella
18 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Butyricimonas
19 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Marinifilaceae;g__Odoribacter
27 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Rikenellaceae;g__Alistipes
28 k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Tannerellaceae;g__Parabacteroides
39 k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Holdemania
57 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Coprococcus
61 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Fusicatenibacter
64 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Hungatella
65 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnoclostridium
67 k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_FCS020_group
89 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Colidextribacter
91 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Intestinimonas
93 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Oscillospiraceae;g__Oscillibacter
109 k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Ruminococcaceae;g__Negativibacillus
122 k__Bacteria;p__Firmicutes;c__Clostridia;o__Peptostreptococcales-Tissierellales;f__Anaerovoracaceae;g__Family_XIII_AD3011_group
133 k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Dialister
135 k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonella
143 k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Enterococcaceae;g__Enterococcus
146 k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Klebsiella
148 k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas
log2FoldChange p_value padj
3 1.842842 1.452081e-05 1.953709e-04
15 -3.590912 7.871469e-05 6.852808e-04
18 -3.002837 3.969499e-06 7.343572e-05
19 -3.406707 3.432387e-07 1.693311e-05
27 -3.110558 1.594988e-05 1.967152e-04
28 -3.568149 2.950689e-06 6.238599e-05
39 -1.174869 7.087889e-03 2.439553e-02
57 -3.357534 5.546126e-06 8.208266e-05
61 -1.917547 1.097200e-02 3.455012e-02
64 2.775798 1.422148e-04 1.107779e-03
65 -1.736481 1.812625e-03 8.383390e-03
67 -2.111851 1.228192e-03 6.268013e-03
89 -1.624469 8.895010e-03 2.991958e-02
91 -1.592037 6.335927e-03 2.404403e-02
93 -2.063990 9.829922e-04 5.538365e-03
109 -1.649295 1.442001e-02 4.268322e-02
122 -1.590837 1.010377e-03 5.538365e-03
133 2.144460 1.055678e-02 3.396528e-02
135 3.864870 1.310501e-07 9.697706e-06
143 5.049706 2.872348e-10 4.251075e-08
146 2.288824 1.471023e-03 7.257045e-03
148 1.318798 5.015789e-03 2.006316e-02
3.4.1.2 Saving results
# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
overwrite = TRUE)
# PSC effect
write.xlsx(psc_effect[[paste(segment,level)]],file.path(path,paste0("psc_effect_",segment,".xlsx")))
# SIGNIFICANT taxa
write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
`names<-`(gsub(segment, "", names(
list_intersections[grepl(segment,names(list_intersections))]))),
file.path(path,paste0("significant_taxa_",segment,".xlsx")))3.5 Machine learning
Binary classification was performed using four independent models: Elastic Net (ENET), RF (RF), Gradient Boosting (GBoost), K-nearest Neighbors (kNN). Training and validation were performed via bootstrapping (N=500) on clr-transformed data. Model performance metrics expressed as AUC were calculated based on an out-of-sample principle.
The supplementary analysis contains training and validating at clr-transformed data at ASV level and relative-abundance data at ASV and genus level.
path = "../../../results/merged_sites/main_analysis/Q1/models"3.5.1 ENET
model="enet"3.5.1.1 ASV level
level="ASV"3.5.1.1.1 pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 2138 ASV(s)
Removing 70 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.200000000
lambda 0.007285712
auc 1.000000000
auc_czech 1.000000000
auc_no 1.000000000
auc_optimism_corrected 0.944482930
auc_optimism_corrected_CIL 0.879483337
auc_optimism_corrected_CIU 0.986503241
accuracy 1.000000000
accuracy_czech NaN
accuracy_no 1.000000000
accuracy_optimism_corrected 0.872590475
accuracy_optimism_corrected_CIL 0.773863933
accuracy_optimism_corrected_CIU 0.937692308
enet_model$conf_matrices$original
Predicted
True 0 1
0 161 0
1 0 250
$czech
Predicted
True 0 1
0 95 0
1 0 42
$no
Predicted
True 0 1
0 66 0
1 0 208
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
roc_c3.5.1.1.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group, usage="ml_clr",
patient = TRUE)Removing 1157 ASV(s)
Removing 52 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group", N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 1.000000000
lambda 0.004424186
auc 0.999667622
auc_czech 0.999656593
auc_no 0.999522640
auc_optimism_corrected 0.788320408
auc_optimism_corrected_CIL 0.715647773
auc_optimism_corrected_CIU 0.842743983
accuracy 0.993322204
accuracy_czech NaN
accuracy_no 0.994269341
accuracy_optimism_corrected 0.725877580
accuracy_optimism_corrected_CIL 0.657080387
accuracy_optimism_corrected_CIU 0.769266055
enet_model$conf_matrices$original
Predicted
True 0 1
0 347 2
1 2 248
$czech
Predicted
True 0 1
0 207 1
1 1 41
$no
Predicted
True 0 1
0 140 1
1 1 207
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
roc_c3.5.1.1.3 post_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 1096 ASV(s)
Removing 50 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=10,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.40000000
lambda 0.00203662
auc 1.00000000
auc_czech 1.00000000
auc_no 1.00000000
auc_optimism_corrected 0.96930614
auc_optimism_corrected_CIL 0.93705851
auc_optimism_corrected_CIU 0.98669590
accuracy 1.00000000
accuracy_czech NaN
accuracy_no 1.00000000
accuracy_optimism_corrected 0.91353052
accuracy_optimism_corrected_CIL 0.86451155
accuracy_optimism_corrected_CIU 0.95141794
enet_model$conf_matrices$original
Predicted
True 0 1
0 161 0
1 0 349
$czech
Predicted
True 0 1
0 95 0
1 0 208
$no
Predicted
True 0 1
0 66 0
1 0 141
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
roc_c3.5.1.2 Genus level
level="genus"Aggregate taxa
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level = level)
colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]3.5.1.2.1 pre_ltx vs healthy
group <- c("pre_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 135 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.00000000
lambda 0.04262835
auc 0.99955280
auc_czech 0.99974937
auc_no 0.99927156
auc_optimism_corrected 0.93420491
auc_optimism_corrected_CIL 0.88169929
auc_optimism_corrected_CIU 0.97185321
accuracy 0.98783455
accuracy_czech NaN
accuracy_no 0.98905109
accuracy_optimism_corrected 0.85834482
accuracy_optimism_corrected_CIL 0.79082492
accuracy_optimism_corrected_CIU 0.92284119
enet_model$conf_matrices$original
Predicted
True 0 1
0 159 2
1 3 247
$czech
Predicted
True 0 1
0 94 1
1 1 41
$no
Predicted
True 0 1
0 65 1
1 2 206
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c3.5.1.2.2 pre_ltx vs post_ltx
group <- c("pre_ltx","post_ltx")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 17 ASV(s)
Removing 10 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.00000000
lambda 0.05181695
auc 0.96996571
auc_czech 0.95887446
auc_no 0.96447081
auc_optimism_corrected 0.75746917
auc_optimism_corrected_CIL 0.67636751
auc_optimism_corrected_CIU 0.83310363
accuracy 0.91000000
accuracy_czech NaN
accuracy_no 0.90544413
accuracy_optimism_corrected 0.69549915
accuracy_optimism_corrected_CIL 0.61504214
accuracy_optimism_corrected_CIU 0.77137530
enet_model$conf_matrices$original
Predicted
True 0 1
0 325 25
1 29 221
$czech
Predicted
True 0 1
0 202 7
1 14 28
$no
Predicted
True 0 1
0 123 18
1 15 193
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c3.5.1.2.3 post_ltx vs healthy
group <- c("post_ltx","healthy")
comparison_name <- paste0(group[1], " vs ",group[2])model_name <- paste(comparison_name,level,segment)
# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
colon_genus_taxa_tab,
colon_metadata,
group,
usage="ml_clr",
patient = TRUE)Removing 71 ASV(s)
Removing 5 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
sample_method = "atypboot",
outcome="Group",
N=500,
clust_var="Patient",
reuse=TRUE,
file=model_name,
Q="Q1")
# ROC
roc_c <- roc_curve(enet_model, group)
# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)
# see the results
enet_model$model_summary %>% t() [,1]
alpha 0.600000000
lambda 0.002245931
auc 1.000000000
auc_czech 1.000000000
auc_no 1.000000000
auc_optimism_corrected 0.961311372
auc_optimism_corrected_CIL 0.923354885
auc_optimism_corrected_CIU 0.985361058
accuracy 1.000000000
accuracy_czech NaN
accuracy_no 1.000000000
accuracy_optimism_corrected 0.892756667
accuracy_optimism_corrected_CIL 0.838856178
accuracy_optimism_corrected_CIU 0.940785776
enet_model$conf_matrices$original
Predicted
True 0 1
0 161 0
1 0 350
$czech
Predicted
True 0 1
0 95 0
1 0 209
$no
Predicted
True 0 1
0 66 0
1 0 141
enet_model$plot`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
# p-value
print(paste("p_value:",mean(enet_model$valid_performances$auc_validation<0.5)*2))[1] "p_value: 0"
roc_c3.5.1.3 Saving results
models_summ_df_colon <- do.call(rbind,
models_summ[grep(segment,names(models_summ),value = TRUE)])
write.csv(models_summ_df_colon,file.path(path,paste0("elastic_net_",segment,".csv")))3.6 Results overview
3.6.0.1 Alpha diversity
knitr::kable(pc_observed[[segment]],
digits = 3,
caption = "Results of linear model testing ASV Richness")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -56.230 | 11.738 | 195.570 | -4.791 | 0.000 | 0.000 | *** |
| healthy vs pre_ltx - CZ vs NO | 14.193 | 10.341 | 194.538 | 1.372 | 0.171 | 0.256 | |
| healthy vs Grouppre_ltx:CountryNO | -7.489 | 15.169 | 194.358 | -0.494 | 0.622 | 0.622 | |
| pre_ltx vs Grouppost_ltx | 28.886 | 11.811 | 267.502 | 2.446 | 0.015 | 0.045 | * |
| pre_ltx vs post_ltx - CZ vs NO | 6.742 | 12.174 | 265.784 | 0.554 | 0.580 | 0.622 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -24.047 | 15.112 | 262.828 | -1.591 | 0.113 | 0.203 | |
| healthy vs Grouppost_ltx | -27.299 | 8.699 | 250.458 | -3.138 | 0.002 | 0.009 | ** |
| healthy vs post_ltx - CZ vs NO | 14.360 | 11.153 | 251.729 | 1.288 | 0.199 | 0.256 | |
| healthy vs Grouppost_ltx:CountryNO | -31.681 | 14.195 | 247.531 | -2.232 | 0.027 | 0.060 |
knitr::kable(pc_shannon[[segment]],
digits = 3,
caption = "Results of linear model testing Shannon index")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.585 | 0.143 | 197.588 | -4.084 | 0.000 | 0.001 | *** |
| healthy vs pre_ltx - CZ vs NO | 0.015 | 0.126 | 196.213 | 0.116 | 0.907 | 0.909 | |
| healthy vs Grouppre_ltx:CountryNO | 0.081 | 0.185 | 195.965 | 0.437 | 0.663 | 0.852 | |
| pre_ltx vs Grouppost_ltx | 0.364 | 0.158 | 265.003 | 2.307 | 0.022 | 0.056 | |
| pre_ltx vs post_ltx - CZ vs NO | 0.095 | 0.163 | 263.448 | 0.585 | 0.559 | 0.839 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.455 | 0.202 | 260.765 | -2.253 | 0.025 | 0.056 | |
| healthy vs Grouppost_ltx | -0.220 | 0.102 | 247.586 | -2.153 | 0.032 | 0.058 | |
| healthy vs post_ltx - CZ vs NO | 0.015 | 0.131 | 248.759 | 0.114 | 0.909 | 0.909 | |
| healthy vs Grouppost_ltx:CountryNO | -0.376 | 0.167 | 244.904 | -2.255 | 0.025 | 0.056 |
knitr::kable(pc_simpson[[segment]],
digits = 3,
caption = "Results of linear model testing Simpson index")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.059 | 0.019 | 199.853 | -3.066 | 0.002 | 0.022 | * |
| healthy vs pre_ltx - CZ vs NO | -0.007 | 0.017 | 198.127 | -0.430 | 0.667 | 0.738 | |
| healthy vs Grouppre_ltx:CountryNO | 0.020 | 0.025 | 197.803 | 0.813 | 0.417 | 0.626 | |
| pre_ltx vs Grouppost_ltx | 0.031 | 0.026 | 263.530 | 1.213 | 0.226 | 0.455 | |
| pre_ltx vs post_ltx - CZ vs NO | 0.013 | 0.027 | 261.964 | 0.476 | 0.635 | 0.738 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.052 | 0.033 | 259.262 | -1.556 | 0.121 | 0.363 | |
| healthy vs Grouppost_ltx | -0.027 | 0.017 | 246.082 | -1.594 | 0.112 | 0.363 | |
| healthy vs post_ltx - CZ vs NO | -0.007 | 0.022 | 247.135 | -0.335 | 0.738 | 0.738 | |
| healthy vs Grouppost_ltx:CountryNO | -0.032 | 0.028 | 243.697 | -1.147 | 0.253 | 0.455 |
knitr::kable(pc_pielou[[segment]],
digits = 3,
caption = "Results of linear model testing Pielou index")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| healthy vs Grouppre_ltx | -0.057 | 0.020 | 201.130 | -2.828 | 0.005 | 0.046 | * |
| healthy vs pre_ltx - CZ vs NO | -0.008 | 0.018 | 199.017 | -0.446 | 0.656 | 0.781 | |
| healthy vs Grouppre_ltx:CountryNO | 0.015 | 0.026 | 198.602 | 0.565 | 0.573 | 0.781 | |
| pre_ltx vs Grouppost_ltx | 0.039 | 0.024 | 265.797 | 1.631 | 0.104 | 0.234 | |
| pre_ltx vs post_ltx - CZ vs NO | 0.007 | 0.025 | 263.735 | 0.274 | 0.785 | 0.785 | |
| pre_ltx vs Grouppost_ltx:CountryNO | -0.062 | 0.031 | 260.213 | -2.015 | 0.045 | 0.202 | |
| healthy vs Grouppost_ltx | -0.018 | 0.016 | 247.076 | -1.130 | 0.260 | 0.467 | |
| healthy vs post_ltx - CZ vs NO | -0.008 | 0.020 | 248.451 | -0.394 | 0.694 | 0.781 | |
| healthy vs Grouppost_ltx:CountryNO | -0.047 | 0.026 | 243.881 | -1.828 | 0.069 | 0.206 |
Plots
alpha_div_plots[[paste(segment,"Country")]]3.6.0.2 Beta diversity
Main analysis
knitr::kable(pairwise_aitchison_raw[[paste("genus", segment)]],
digits = 3,
caption = "Results of PERMANOVA - robust aitchison distance")| pairs | Df | SumsOfSqs | F.Model | R2 | p.value | p.adjusted | sig |
|---|---|---|---|---|---|---|---|
| pre_ltx vs healthy | 1 | 1222.921 | 7.950 | 0.019 | 0.001 | 0.002 | ** |
| pre_ltx vs post_ltx | 1 | 630.146 | 4.062 | 0.007 | 0.008 | 0.008 | ** |
| post_ltx vs healthy | 1 | 1736.169 | 11.014 | 0.021 | 0.001 | 0.002 | ** |
| pre_ltx vs healthy , Country | 1 | 738.576 | 4.801 | 0.011 | 0.001 | 0.001 | *** |
| pre_ltx vs post_ltx , Country | 1 | 1532.675 | 9.879 | 0.016 | 0.001 | 0.001 | *** |
| post_ltx vs healthy , Country | 1 | 1728.746 | 10.966 | 0.021 | 0.001 | 0.001 | *** |
| pre_ltx vs healthy : Country | 1 | 318.719 | 2.077 | 0.005 | 0.325 | 0.488 | |
| pre_ltx vs post_ltx : Country | 1 | 322.405 | 2.082 | 0.003 | 0.524 | 0.524 | |
| post_ltx vs healthy : Country | 1 | 398.820 | 2.538 | 0.005 | 0.112 | 0.336 |
PCA
pca_plots_list[[paste(segment,"genus custom")]]3.6.0.3 Univariate analysis
Number of significant taxa
knitr::kable(cbind(as.data.frame(lapply(list_intersections,nrow)),
as.data.frame(lapply(psc_effect,nrow))) %>% t() %>%
`colnames<-`("Count") %>%
`rownames<-`(c(names(list_intersections),"PSC effect Genus")),caption="Number of significant taxa")| Count | |
|---|---|
| terminal_ileum genus healthy vs pre_ltx | 25 |
| terminal_ileum genus pre_ltx vs post_ltx | 4 |
| terminal_ileum genus healthy vs post_ltx | 38 |
| colon genus healthy vs pre_ltx | 33 |
| colon genus pre_ltx vs post_ltx | 4 |
| colon genus healthy vs post_ltx | 41 |
| PSC effect Genus | 22 |
3.6.0.4 Machine learning
Main models
knitr::kable(models_summ_df_colon %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
digits=3,caption="Elastic net results")| alpha | lambda | auc_optimism_corrected | auc_optimism_corrected_CIL | auc_optimism_corrected_CIU | |
|---|---|---|---|---|---|
| pre_ltx vs healthy ASV colon | 0.2 | 0.007 | 0.944 | 0.879 | 0.987 |
| pre_ltx vs post_ltx ASV colon | 1.0 | 0.004 | 0.788 | 0.716 | 0.843 |
| post_ltx vs healthy ASV colon | 0.4 | 0.002 | 0.969 | 0.937 | 0.987 |
| pre_ltx vs healthy genus colon | 0.0 | 0.043 | 0.934 | 0.882 | 0.972 |
| pre_ltx vs post_ltx genus colon | 0.0 | 0.052 | 0.757 | 0.676 | 0.833 |
| post_ltx vs healthy genus colon | 0.6 | 0.002 | 0.961 | 0.923 | 0.985 |
ROC - Genus level
p <- roc_curve_all_custom(roc_cs[c(10:12)],Q="Q1",
model_name="enet_model")
p4 Paper-ready visualizations
Alpha diversity
p_A <- alpha_div_plots$`terminal_ileum Country` +
ggtitle("Terminal ileum")+
theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15))
p_B <- alpha_div_plots$`colon Country` +
ggtitle("Colon") +
theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15))
Q1_alpha <- ggarrange(p_A,ggplot() + theme_minimal(),p_B,nrow=1, ncol=3,widths = c(1,0.1,1))
Q1_alphaBeta diversity
pca_ti <- pca_plots_list$`terminal_ileum genus custom`
pca_colon <- pca_plots_list$`colon genus custom`
genus_Q1_beta <- ggarrange(pca_ti,
ggplot() + theme_minimal(),
pca_colon,ncol=3,
widths = c(1,0.1,1))
genus_Q1_betaAlpha + Beta diversity
alpha_beta <- ggarrange(Q1_alpha,genus_Q1_beta,
ncol = 1,nrow=2,labels = c("A","B"))
alpha_betaElastic net
Genus level
#models_to_plot <- c("knn_model","rf_model","gbm_model","enet_model")
#names(models_to_plot) <- c("kNN","RF","GBoost","ENET")
models_to_plot <- c("enet_model")
names(models_to_plot) <- c("ENET")
rocs_list <- list()
rocs_list[["enet_model"]] <- roc_cs
# ILEUM
plot_list_ileum <- list()
for (model_name in models_to_plot) {
plot_list_ileum[[model_name]] <-
roc_curve_all_custom(rocs_list[[model_name]][c(4:6)],
Q="Q1",
model_name=model_name,legend = FALSE) +
#ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) +
theme(plot.title = element_text(face = "bold",size = 10))
}
roc_curve_ti <- ggarrange(plotlist = plot_list_ileum)
# COLON
plot_list_colon <- list()
for (model_name in models_to_plot) {
plot_list_colon[[model_name]] <-
roc_curve_all_custom(rocs_list[[model_name]][c(10:12)],
Q="Q1",
model_name=model_name,legend = FALSE) +
#ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) +
theme(plot.title = element_text(face = "bold",size = 10))
}
roc_curve_colon <- ggarrange(plotlist = plot_list_colon)
roc_curve_plot <- ggarrange(roc_curve_ti,
ggplot() + theme_minimal(),
roc_curve_colon,
ncol=3, widths = c(1,0.1,1))
roc_curve_plot4.1 FIGURE 2
alpha_beta_elastic <- ggarrange(Q1_alpha,genus_Q1_beta,roc_curve_plot,
ncol = 1,nrow=3,labels = LETTERS,heights = c(1,1,0.8))
alpha_beta_elastic <- ggarrange(alpha_beta_elastic,ggplot() + theme_minimal(),ncol=2,
widths = c(1,0.15))
alpha_beta_elastic4.2 FIGURE 3
p_ileum <- dot_heatmap_ileum +
ggtitle("Terminal ileum") +
theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
legend.position = "none")
p_colon <- dot_heatmap_colon +
ggtitle("Colon") +
theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
legend.position = "none")
heatmap_plot <- ggarrange(p_ileum,
p_colon,
ncol = 2,labels=c("A","B"))
heatmap_plot 5 Session info
sessionInfo()R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 26100)
Matrix products: default
locale:
[1] LC_COLLATE=Czech_Czechia.utf8 LC_CTYPE=Czech_Czechia.utf8
[3] LC_MONETARY=Czech_Czechia.utf8 LC_NUMERIC=C
[5] LC_TIME=Czech_Czechia.utf8
time zone: Europe/Prague
tzcode source: internal
attached base packages:
[1] parallel stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] mice_3.17.0 picante_1.8.2 ape_5.8-1
[4] lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
[7] tidyverse_2.0.0 kableExtra_1.4.0 tidyr_1.3.1
[10] gbm_2.2.2 doParallel_1.0.17 iterators_1.0.14
[13] foreach_1.5.2 ranger_0.17.0 ggvenn_0.1.15
[16] Maaslin2_1.16.0 purrr_1.0.2 pROC_1.18.5
[19] glmnet_4.1-8 MicrobiomeStat_1.2 caret_6.0-94
[22] openxlsx_4.2.8 magrittr_2.0.3 emmeans_1.11.2-8
[25] lmerTest_3.1-3 robustlmm_3.3-1 lme4_1.1-37
[28] Matrix_1.6-5 mgcv_1.9-1 nlme_3.1-164
[31] pheatmap_1.0.13 reshape2_1.4.4 vegan_2.6-10
[34] lattice_0.22-6 permute_0.9-8 ggplotify_0.1.2
[37] ggrepel_0.9.6 ggpubr_0.6.1 MicrobiotaProcess_1.14.1
[40] phyloseq_1.46.0 ggplot2_3.5.2 tibble_3.2.1
[43] dplyr_1.1.4 cowplot_1.2.0 readr_2.1.5
[46] igraph_2.1.4 ccrepe_1.38.1 data.table_1.15.4
loaded via a namespace (and not attached):
[1] fs_1.6.5 matrixStats_1.5.0
[3] bitops_1.0-9 RColorBrewer_1.1-3
[5] numDeriv_2016.8-1.1 tools_4.3.1
[7] backports_1.4.1 R6_2.6.1
[9] lazyeval_0.2.2 jomo_2.7-6
[11] rhdf5filters_1.14.1 withr_3.0.2
[13] gridExtra_2.3 cli_3.6.2
[15] Biobase_2.62.0 logging_0.10-108
[17] biglm_0.9-3 sandwich_3.1-1
[19] labeling_0.4.3 mvtnorm_1.3-3
[21] robustbase_0.99-4-1 pbapply_1.7-4
[23] systemfonts_1.2.2 yulab.utils_0.2.1
[25] svglite_2.1.3 parallelly_1.38.0
[27] rstudioapi_0.17.1 generics_0.1.4
[29] gridGraphics_0.5-1 shape_1.4.6.1
[31] car_3.1-3 zip_2.3.2
[33] biomformat_1.30.0 S4Vectors_0.40.2
[35] abind_1.4-8 infotheo_1.2.0.1
[37] lifecycle_1.0.4 multcomp_1.4-28
[39] yaml_2.3.10 carData_3.0-5
[41] SummarizedExperiment_1.32.0 Rtsne_0.17
[43] rhdf5_2.46.1 recipes_1.3.1
[45] SparseArray_1.2.4 grid_4.3.1
[47] mitml_0.4-5 crayon_1.5.3
[49] pillar_1.11.0 knitr_1.50
[51] optparse_1.7.5 GenomicRanges_1.54.1
[53] statip_0.2.3 boot_1.3-32
[55] estimability_1.5.1 future.apply_1.11.3
[57] codetools_0.2-20 pan_1.9
[59] glue_1.7.0 ggfun_0.2.0
[61] vctrs_0.6.5 treeio_1.26.0
[63] Rdpack_2.6.4 gtable_0.3.6
[65] maditr_0.8.6 gower_1.0.1
[67] xfun_0.52 rbibutils_2.3
[69] S4Arrays_1.2.1 prodlim_2024.06.25
[71] libcoin_1.0-10 coda_0.19-4.1
[73] reformulas_0.4.1 pcaPP_2.0-5
[75] modeest_2.4.0 survival_3.5-8
[77] timeDate_4041.110 hardhat_1.4.2
[79] lava_1.8.1 statmod_1.5.0
[81] TH.data_1.1-4 ipred_0.9-15
[83] ggtree_3.10.1 GenomeInfoDb_1.38.8
[85] fBasics_4041.97 rpart_4.1.23
[87] colorspace_2.1-1 BiocGenerics_0.48.1
[89] DBI_1.2.3 nnet_7.3-19
[91] ade4_1.7-23 tidyselect_1.2.1
[93] timeSeries_4041.111 compiler_4.3.1
[95] microbiome_1.24.0 xml2_1.3.8
[97] DelayedArray_0.28.0 scales_1.4.0
[99] DEoptimR_1.1-4 spatial_7.3-18
[101] rappdirs_0.3.3 digest_0.6.35
[103] minqa_1.2.8 rmarkdown_2.29
[105] XVector_0.42.0 htmltools_0.5.8.1
[107] pkgconfig_2.0.3 MatrixGenerics_1.14.0
[109] stabledist_0.7-2 fastmap_1.2.0
[111] rlang_1.1.3 htmlwidgets_1.6.4
[113] farver_2.1.2 zoo_1.8-13
[115] jsonlite_2.0.0 ModelMetrics_1.2.2.2
[117] RCurl_1.98-1.17 modeltools_0.2-24
[119] Formula_1.2-5 GenomeInfoDbData_1.2.11
[121] patchwork_1.3.2 Rhdf5lib_1.24.2
[123] Rcpp_1.0.12 ggnewscale_0.5.2
[125] fastGHQuad_1.0.1 stringi_1.8.3
[127] ggstar_1.0.4 stable_1.1.6
[129] zlibbioc_1.48.2 MASS_7.3-60.0.1
[131] plyr_1.8.9 listenv_0.9.1
[133] Biostrings_2.70.3 splines_4.3.1
[135] hash_2.2.6.3 multtest_2.58.0
[137] hms_1.1.3 ggtreeExtra_1.12.0
[139] ggsignif_0.6.4 stats4_4.3.1
[141] rmutil_1.1.10 evaluate_1.0.5
[143] nloptr_2.2.1 tzdb_0.5.0
[145] getopt_1.20.4 future_1.34.0
[147] clue_0.3-66 coin_1.4-3
[149] broom_1.0.9 xtable_1.8-4
[151] tidytree_0.4.6 rstatix_0.7.2
[153] viridisLite_0.4.2 class_7.3-22
[155] aplot_0.2.8 IRanges_2.36.0
[157] cluster_2.1.8.1 timechange_0.3.0
[159] globals_0.18.0