Geography-independent mucosal microbiota alterations in primary sclerosing cholangitis persist after liver transplantation

2. Hypothesis: PSC recurrence is associated with a specific composition of the gut microbiome


1 Introduction

1.1 About

rPSC vs non-rPSC vs Healthy analysis on merged data

  • Alpha diversity –> group effect, country effect, interaction effect

  • Beta diversity – PERMANOVA, PCA -> group effect, country effect, interaction effect

  • DAA ->

    • Group effect – linDA + MaAsLin2 intersection

    • Country effect – taxa with significant interaction effect were excluded based on individual post-hoc analysis. Taxa had to have the same direction in both countries

  • ML models, trained and validated using out-of-sample bootstrap (500 resamplings):

  • o ENET

  • o kNN

  • o GBoost

  • o RF


Pairwise comparisons:

rPSC vs non-rPSC –by this comparison, we will find the effect of recurrence (whether the microbial composition is associated with it)

non-rPSC vs Healthy – by this comparison, we will find if the non-rPSC samples are „closer“ to the healthy samples then rPSC samples in terms of the microbial composition. In other words, does non-rPSC samples have „healthy microbiome“?

rPSC vs Healthy – by this comparison, we will find how much are the rPSC samples different from healthy samples.

However, the rPSC and non-rPSC samples are not different at all (see results). Therefore, we can do EXCLUSIVE LEFT JOIN of differentially abundant taxa from the analyses (rPSC vs Healthy, non-rPSC vs Healthy)

Importing libraries and custom functions built for this analysis

source("custom_functions.R")

1.2 Data Import

Importing ASV, taxa and metadata tables for both Czech and Norway samples.

Czech

path = "../../../../data/analysis_ready_data/ikem/"
asv_tab_ikem <- as.data.frame(fread(file.path(path,"asv_table_ikem.csv"),
                                    check.names = FALSE))
taxa_tab_ikem <- as.data.frame(fread(file.path(path,"taxa_table_ikem.csv"),
                                     check.names = FALSE))
metadata_ikem <- as.data.frame(fread(file.path(path,"metadata_ikem.csv"),
                                     check.names = FALSE))

Norway

path = "../../../../data/analysis_ready_data/norway/"
asv_tab_norway <- as.data.frame(fread(file.path(path,"asv_table_norway.csv"),
                                    check.names = FALSE))
taxa_tab_norway <- as.data.frame(fread(file.path(path,"taxa_table_norway.csv"),
                                    check.names = FALSE))
metadata_norway <- as.data.frame(fread(file.path(path,"metadata_norway.csv"),
                                    check.names = FALSE))

1.3 Merging data

Merging two countries to create whole dataset

asv_tab <- merge(asv_tab_ikem,asv_tab_norway,by="SeqID",all=TRUE)
taxa_tab <- merging_taxa_tables(taxa_tab_ikem,taxa_tab_norway)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment

Merging two countries based on the different matrices - Ileum, Colon.

Terminal ileum

ileum_data <- merging_data(asv_tab_1=asv_tab_ikem,
                           asv_tab_2=asv_tab_norway,
                           taxa_tab_1=taxa_tab_ikem,
                           taxa_tab_2=taxa_tab_norway,
                           metadata_1=metadata_ikem,
                           metadata_2=metadata_norway,
                           segment="TI",Q="Q2")
Removing 1498 ASV(s)
Removing 1834 ASV(s)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment
Removing 641 ASV(s)
ileum_asv_tab <- ileum_data[[1]]
ileum_taxa_tab <- ileum_data[[2]]
ileum_metadata <- ileum_data[[3]]

Colon

colon_data <- merging_data(asv_tab_1=asv_tab_ikem,
                           asv_tab_2=asv_tab_norway,
                           taxa_tab_1=taxa_tab_ikem,
                           taxa_tab_2=taxa_tab_norway,
                           metadata_1=metadata_ikem,
                           metadata_2=metadata_norway,
                           segment="colon",Q="Q2")
Removing 739 ASV(s)
Removing 266 ASV(s)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment
Removing 1096 ASV(s)
colon_asv_tab <- colon_data[[1]]
colon_taxa_tab <- colon_data[[2]]
colon_metadata <- colon_data[[3]]

2 Data Analysis - Terminal ileum

segment="terminal_ileum"

2.1 Filtering

Rules:

  • sequencing depth > 10000

  • nearZeroVar() with default settings

Rarefaction Curve

path="../../../intermediate_files/rarecurves"
seq_depth_threshold <- 10000
ps <- construct_phyloseq(ileum_asv_tab,ileum_taxa_tab,ileum_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_ileum.Rdata"))
load(file.path(path,"rarefaction_ileum.Rdata"))

prare <- ggrarecurve(obj=rareres,
                      factorNames="Country",
                      indexNames=c("Observe")) + 
  theme_bw() +
  theme(axis.text=element_text(size=8), panel.grid=element_blank(),
        strip.background = element_rect(colour=NA,fill="grey"),
        strip.text.x = element_text(face="bold")) + 
  geom_vline(xintercept = seq_depth_threshold, 
             linetype="dashed", 
             color = "red") + 
  xlim(0, 20000)
The color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)
prare

Library size

read_counts(ileum_asv_tab, line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

2.1.1 Sequencing depth

data_filt <- seq_depth_filtering(ileum_asv_tab,
                                 ileum_taxa_tab,
                                 ileum_metadata,
                                 seq_depth_threshold = 10000)
Removing 104 ASV(s)
filt_ileum_asv_tab <- data_filt[[1]]; alpha_ileum_asv_tab <- filt_ileum_asv_tab
filt_ileum_taxa_tab <- data_filt[[2]]; alpha_ileum_taxa_tab <- filt_ileum_taxa_tab
filt_ileum_metadata <- data_filt[[3]]; alpha_ileum_metadata <- filt_ileum_metadata

seq_step <- dim(filt_ileum_asv_tab)[1]

Library size

read_counts(filt_ileum_asv_tab,line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

2.1.2 NearZeroVar

data_filt <- nearzerovar_filtering(filt_ileum_asv_tab, 
                                   filt_ileum_taxa_tab,
                                   filt_ileum_metadata)

filt_ileum_asv_tab <- data_filt[[1]]
filt_ileum_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_ileum_asv_tab)[1]

Library size

read_counts(filt_ileum_asv_tab,line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

2.1.3 Final Counts

final_counts_filtering(ileum_asv_tab,
                       filt_ileum_asv_tab,
                       filt_ileum_metadata,
                       seq_step, 0, nearzero_step) %>% `colnames<-`("Count")
                            Count
Raw data: ASVs               4491
Raw data: Samples             220
Sequencing depth filt: ASVs  4387
Prevalence filt: ASVs           0
NearZeroVar filt: ASVs        575
Filt data: ASVs               575
Filt data: Samples            211
Filt data: Patients           211
Filt data: Patients.1           0
Filtered samples                9
Matrices                       TI
healthy                        73
non-rPSC                      105
rPSC                           33
pre_ltx                         0
post_ltx                        0
ETOH                            0

2.2 Alpha diversity

path = "../../../results/merged_sites/main_analysis/Q2/alpha_diversity"

Calculation

# Construct MPSE object
alpha_ileum_metadata$Sample <- alpha_ileum_metadata$SampleID
ileum_mpse <- as.MPSE(construct_phyloseq(alpha_ileum_asv_tab,
                                         alpha_ileum_taxa_tab,
                                         alpha_ileum_metadata))

ileum_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)

# Calculate alpha diversity - rarefied counts
ileum_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)
alpha_div_plots <- list()

# preparing data frame
alpha_data <- data.frame(SampleID=ileum_mpse$Sample.x,
                         Observe=ileum_mpse$Observe,
                         Shannon=ileum_mpse$Shannon,
                         Simpson=ileum_mpse$Simpson,
                         Pielou=ileum_mpse$Pielou,
                         Group=ileum_mpse$Group,
                         Country=ileum_mpse$Country,
                         Patient=ileum_mpse$Patient)

write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
          row.names = FALSE)

2.2.1 Plots

p_boxplot_alpha <- alpha_diversity_countries(alpha_data)
Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Using SampleID, Group, Country, Patient as id variables

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha

# see the results
p_boxplot_alpha

2.2.2 Linear models

path = "../../../results/merged_sites/main_analysis/Q2/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))

Richness

results_model <- pairwise.lm(formula = "Observe ~ Group * Country",
                             factors=alpha_data$Group,
                             data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_observe <- results_model[[1]]
  results_model_observe_emeans <- results_model[[2]]
} else {
  results_model_observe <- results_model
  results_model_observe_emeans <- NA
}

# save the results
pc_observed <- list(); 
pc_observed[[segment]] <- results_model_observe
# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")
Raw results of linear model of richness estimation.
Estimate Std..Error t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -12.199 11.972 -1.019 0.310 0.349
non-rPSC , rPSC - CZ vs NO -21.486 11.452 -1.876 0.063 0.141
non-rPSC vs GrouprPSC:CountryNO 9.347 23.088 0.405 0.686 0.686
healthy vs GrouprPSC -37.741 12.284 -3.072 0.003 0.025 *
healthy , rPSC - CZ vs NO 11.203 10.973 1.021 0.310 0.349
healthy vs GrouprPSC:CountryNO -23.342 21.355 -1.093 0.277 0.349
healthy vs Groupnon-rPSC -25.542 9.322 -2.740 0.007 0.031 *
healthy , non-rPSC - CZ vs NO 11.203 10.933 1.025 0.307 0.349
healthy vs Groupnon-rPSC:CountryNO -32.689 15.108 -2.164 0.032 0.096
knitr::kable(results_model_observe_emeans,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
contrast Country estimate SE df t.ratio p.value
(non-rPSC) - healthy CZ -25.542 9.322 174 -2.740 0.007
(non-rPSC) - healthy NO -58.231 11.889 174 -4.898 0.000

Shannon

results_model <- pairwise.lm(formula = "Shannon ~ Group * Country",
                             factors=alpha_data$Group,
                             data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_shannon <- results_model[[1]]
  results_model_shannon_emeans <- results_model[[2]]
} else {
  results_model_shannon <- results_model
  results_model_shannon_emeans <- NA
}

# save the results
pc_shannon <- list(); 
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)
# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")
Raw results of linear model of Shannon estimation.
Estimate Std..Error t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -0.102 0.146 -0.696 0.487 0.731
non-rPSC , rPSC - CZ vs NO -0.349 0.140 -2.500 0.014 0.123
non-rPSC vs GrouprPSC:CountryNO -0.006 0.281 -0.021 0.983 0.983
healthy vs GrouprPSC -0.210 0.140 -1.501 0.137 0.321
healthy , rPSC - CZ vs NO 0.005 0.125 0.043 0.966 0.983
healthy vs GrouprPSC:CountryNO -0.360 0.244 -1.478 0.142 0.321
healthy vs Groupnon-rPSC -0.109 0.115 -0.948 0.344 0.620
healthy , non-rPSC - CZ vs NO 0.005 0.135 0.040 0.968 0.983
healthy vs Groupnon-rPSC:CountryNO -0.354 0.186 -1.905 0.058 0.263
knitr::kable(results_model_shannon_emeans,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
contrast Country estimate SE df t.ratio p.value
(non-rPSC) - healthy CZ -0.109 0.115 174 -0.948 0.344
(non-rPSC) - healthy NO -0.463 0.146 174 -3.164 0.002

Simpson

results_model <- pairwise.lm(formula = "Simpson ~ Group * Country",
                                     factors=alpha_data$Group,
                                     data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_simpson <- results_model[[1]]
  results_model_simpson_emeans <- results_model[[2]]
} else {
  results_model_simpson <- results_model
  results_model_simpson_emeans <- NA
}

# save the results
pc_simpson <- list(); 
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)
# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")
Raw results of linear model of Simpson estimation.
Estimate Std..Error t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC 0.000 0.023 0.019 0.985 0.985
non-rPSC , rPSC - CZ vs NO -0.034 0.022 -1.573 0.118 0.531
non-rPSC vs GrouprPSC:CountryNO -0.029 0.043 -0.661 0.510 0.767
healthy vs GrouprPSC -0.009 0.018 -0.500 0.618 0.767
healthy , rPSC - CZ vs NO -0.008 0.016 -0.501 0.617 0.767
healthy vs GrouprPSC:CountryNO -0.055 0.031 -1.779 0.078 0.531
healthy vs Groupnon-rPSC -0.009 0.016 -0.564 0.574 0.767
healthy , non-rPSC - CZ vs NO -0.008 0.019 -0.410 0.682 0.767
healthy vs Groupnon-rPSC:CountryNO -0.026 0.027 -0.975 0.331 0.767
knitr::kable(results_model_simpson_emeans,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
contrast Country estimate SE df t.ratio p.value
rPSC - healthy CZ -0.009 0.018 102 -0.500 0.618
rPSC - healthy NO -0.064 0.025 102 -2.526 0.013

Pielou

results_model <- pairwise.lm(formula = "Pielou ~ Group * Country",
                                     factors=alpha_data$Group,
                                     data=alpha_data)

# check interaction

if (!is.data.frame(results_model)){
  results_model_pielou <- results_model[[1]]
  results_model_pielou_emeans <- results_model[[2]]
} else {
  results_model_pielou <- results_model
  results_model_pielou_emeans <- NA
}

# save the results
pc_pielou <- list(); 
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)
# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")
Raw results of linear model of Pielou estimation.
Estimate Std..Error t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -0.004 0.022 -0.184 0.854 0.946
non-rPSC , rPSC - CZ vs NO -0.047 0.021 -2.260 0.025 0.229
non-rPSC vs GrouprPSC:CountryNO -0.005 0.042 -0.108 0.914 0.946
healthy vs GrouprPSC -0.001 0.021 -0.067 0.946 0.946
healthy , rPSC - CZ vs NO -0.009 0.018 -0.463 0.645 0.946
healthy vs GrouprPSC:CountryNO -0.043 0.036 -1.195 0.235 0.705
healthy vs Groupnon-rPSC 0.003 0.018 0.146 0.884 0.946
healthy , non-rPSC - CZ vs NO -0.009 0.021 -0.409 0.683 0.946
healthy vs Groupnon-rPSC:CountryNO -0.038 0.029 -1.329 0.185 0.705
knitr::kable(results_model_pielou_emeans,digits = 3,
caption = "Raw results of independent country analysis")

Table: Raw results of independent country analysis

2.2.3 Saving results

alpha_list <- list(
  Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
  Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
  Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
  Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
                   
write.xlsx(alpha_list, 
           file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))

2.3 Beta diversity

Calculating Aitchison distance (euclidean distance on clr-transformed data), both at ASV and genus level.

2.3.1 Main analysis

Genus level, Aitchison distance

level="genus"
path = "../../../results/merged_sites/main_analysis/Q2/beta_diversity"

pairwise_aitchison_raw <- list()
pca_plots_list <- list()

Aggregation, filtering

# Aggregation
genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level=level,
                             names=TRUE)
# Filtration
filt_data <- filtering_steps(genus_data[[1]],
                             genus_data[[2]],
                             ileum_metadata,
                             seq_depth_threshold=10000)
Removing 2 ASV(s)
filt_ileum_genus_tab <- filt_data[[1]]
filt_ileum_genus_taxa <- filt_data[[2]]
filt_ileum_metadata <- filt_data[[3]]
2.3.1.0.1 PERMANOVA
pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_ileum_metadata$Group,
                           covariate = filt_ileum_metadata$Country, sim.method = "robust.aitchison", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "robust.aitchison", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 215.579 1.208 0.009 0.107 0.107
rPSC vs healthy 1 561.696 3.345 0.030 0.001 0.002 **
non-rPSC vs healthy 1 760.242 4.431 0.024 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC , Country 1 601.929 3.372 0.024 0.001 0.001 ***
rPSC vs healthy , Country 1 496.841 2.959 0.027 0.001 0.001 ***
non-rPSC vs healthy , Country 1 624.474 3.640 0.020 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC : Country 1 156.176 0.874 0.006 0.788 0.788
rPSC vs healthy : Country 1 164.743 0.981 0.009 0.464 0.696
non-rPSC vs healthy : Country 1 209.954 1.225 0.007 0.089 0.267

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

if (length(interaction_sig)>0){
  for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2)
  print(result_list)
}
}
2.3.1.0.2 Plots
p <- pca_plot_custom(filt_ileum_genus_tab,
                                 filt_ileum_genus_taxa,
                                 filt_ileum_metadata,
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend=FALSE)

# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p

# see the results
p

2.3.1.1 Saving results

write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]], 
           file = file.path(path,
           paste0("beta_diversity_results_", segment,".xlsx")))

2.4 Univariate Analysis

2.4.1 Main analysis

level="genus"

# needed paths
path = "../../../results/merged_sites/main_analysis/Q2/univariate_analysis"
path_maaslin=file.path("../../../intermediate_files/maaslin/Q2",level)

# variables
raw_linda_results_genus <- list();
raw_linda_results_genus[[segment]] <- list()
linda_results_genus <- list(); 
linda_results_genus[[segment]] <- list()

# country and interaction problems
list_country_union <- list()
list_intersections <- list()
list_venns <- list()
uni_statistics <- list()

# workbook for final df
wb <- createWorkbook()

# rPSC effect
rpsc_effect <- list()

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

ileum_genus_asv_taxa_tab <- create_asv_taxa_table(ileum_genus_tab,
                                                  ileum_genus_taxa_tab)

2.4.1.1 rPSC vs non-rPSC

group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
2.4.1.1.1 linDA
taxa_to_test <- read.xlsx("../../../results/merged_sites/main_analysis/Q1/univariate_analysis/significant_taxa_terminal_ileum.xlsx", sheet=" genus healthy vs pre_ltx")$SeqID

ileum_genus_tab_2 <- ileum_genus_tab %>% dplyr::filter(SeqID %in% taxa_to_test)
ileum_genus_taxa_tab_2 <- ileum_genus_taxa_tab %>% dplyr::filter(SeqID %in% taxa_to_test)

# prepare the data
linda_data <- binomial_prep(ileum_genus_tab_2,
                            ileum_genus_taxa_tab_2,
                            ileum_metadata,
                            group, usage="linDA", filtering = FALSE)

filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  56  samples and  25  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano

2.4.1.1.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("NO vs CZ")

volcano <- ggarrange(volcano1,volcano2, ncol=2)

volcano

2.4.1.1.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

2.4.1.1.4 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
2.4.1.1.5 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "non-rPSC"
[1] "rPSC"
[1] "non-rPSC"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
2.4.1.1.6 Prevalence testing
uni_df$prevalence_diff <- abs(uni_df$`PREVALENCE_ rPSC` - uni_df$`PREVALENCE_ non-rPSC`)

#uni_df %<>% dplyr::filter(prevalence_diff >=0.2)
#filt_colon_uni_data <- filt_colon_uni_data[uni_df$SeqID,]
filt_ileum_uni_data <- ifelse(filt_ileum_uni_data>0,"YES","NO")
filt_ileum_uni_data %<>% t() %>% as.data.frame()
#filt_colon_uni_taxa %<>% dplyr::filter(SeqID %in% uni_df$SeqID)

filt_ileum_uni_data <- merge(filt_ileum_uni_data %>% rownames_to_column("SampleID"),filt_ileum_metadata,by="SampleID",all.x=TRUE) %>% column_to_rownames("SampleID")

p_values <- c()
for (taxon in uni_df$SeqID){
data <- filt_ileum_uni_data %>%
  group_by(!!sym(taxon), Group) %>%
  count() %>%
  group_by(!!sym(taxon), Group) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  pivot_wider(names_from = Group, values_from = n) %>%
  column_to_rownames(taxon) %>%
  as.matrix()

if (nrow(data)>1){
  data[is.na(data)] <- 0
chi_res <- chisq.test(data)
chi_res$expected
#p_values <- c(p_values,chi_res$p.value)

test <- fisher.test(data)
test$p.value
p_values <- c(p_values,test$p.value)
} else p_values <- c(p_values,NA)

}
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
p_values_adjusted <- p.adjust(p_values,method = "BH")
names(p_values_adjusted) <- uni_df$SeqID
names(p_values) <- uni_df$SeqID
p_values_adjusted[which(p_values_adjusted<0.05)]
named numeric(0)
asv_tab <- as.data.frame(rrarefy(
    linda_data[[1]] %>% t(), 
    sample=10000)) %>% t() %>% as.data.frame()

asv_tab <- ifelse(asv_tab>10,"YES","NO")
asv_tab %<>% t() %>% as.data.frame()

asv_tab <- merge(asv_tab %>% rownames_to_column("SampleID"),filt_ileum_metadata,by="SampleID",all.x=TRUE) %>% column_to_rownames("SampleID")

p_values <- c()
for (taxon in uni_df$SeqID){
data <- asv_tab %>%
  group_by(!!sym(taxon), Group) %>%
  count() %>%
  group_by(!!sym(taxon), Group) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  pivot_wider(names_from = Group, values_from = n) %>%
  column_to_rownames(taxon) %>%
  as.matrix()

if (nrow(data)>1){
data[is.na(data)] <- 0
chi_res <- chisq.test(data)
chi_res$expected
test <- fisher.test(data)
test$p.value
chi_res$p.value
p_values <- c(p_values,test$p.value)
} else p_values <- c(p_values,NA)
}
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
p_values_adjusted <- p.adjust(p_values,method = "BH")
names(p_values_adjusted) <- uni_df$SeqID
names(p_values) <- uni_df$SeqID
p_values_adjusted[which(p_values_adjusted<0.05)]
named numeric(0)
df <- data.frame(SeqID=uni_df$SeqID,
                 p=p_values,
                 p_adj=p_values_adjusted)
prevalence_df <- uni_df[,c("SeqID","PREVALENCE_ non-rPSC","PREVALENCE_ rPSC")]
prevalence_df <- merge(prevalence_df,df,by="SeqID",all=TRUE)
write.xlsx(prevalence_df,file.path(path,paste0("supplements_prevalence_",segment,".xlsx")))

2.4.1.2 rPSC vs healthy

group <- c("healthy","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
2.4.1.2.1 linDA
# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 67 ASV(s)
Removing 4 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  106  samples and  207  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
2.4.1.2.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("NO vs CZ")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

2.4.1.2.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

2.4.1.2.4 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
2.4.1.2.5 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "rPSC"
[1] "healthy"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

2.4.1.3 non-rPSC vs healthy

group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
2.4.1.3.1 linDA
# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")
Removing 24 ASV(s)
Removing 2 ASV(s)
filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')
0  features are filtered!
The filtered data has  178  samples and  179  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano

2.4.1.3.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("NO vs CZ")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

2.4.1.3.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

2.4.1.3.4 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
2.4.1.3.5 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "non-rPSC"
[1] "healthy"
[1] "non-rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

2.4.1.4 Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(
  list_heatmap,                               uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
                                      ileum_taxa_tab) + xlab("") + ylab("")
min_clr -1.727234 
max_clr 7.029952 
min_log -4.215042 
max_log 3.331439 
dotheatmap_linda

Horizontal bar plot

p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))
Using SeqID as id variables
p_prevalence_final <- ggarrange(p_prevalence,
                                ggplot() + theme_minimal(),
                                nrow = 2,heights = c(1,0.085))

p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),p_prevalence_final,ncol=2,widths = c(1,0.3))

p

dot_heatmap_ileum <- p

2.4.1.5 rPSC effect

pre_LTx vs Healthy and Post_LTx vs Healthy intersection

A <- list_intersections[[paste(segment,level,"healthy vs rPSC")]]
B <- list_intersections[[paste(segment,level,"healthy vs non-rPSC")]]
df <- A[!(A$SeqID %in% B$SeqID),]


rpsc_effect[[paste(segment,level)]] <- df
  
# see the results
rpsc_effect[[paste(segment,level)]] 
                            SeqID
6                     Collinsella
8                   Adlercreutzia
15                    Bacteroides
16                    Barnesiella
29                Parabacteroides
32                      Bilophila
41                     Holdemania
75   Lachnospiraceae_ND3007_group
76  Lachnospiraceae_NK4A136_group
84                     Sellimonas
99                 Butyricicoccus
134      Family_XIII_AD3011_group
155                   Pseudomonas
169            Lacticaseibacillus
188               Veillonellaceae
199                    Klebsiella
                                                                                                                          Taxonomy
6                        k__Bacteria;p__Actinobacteriota;c__Coriobacteriia;o__Coriobacteriales;f__Coriobacteriaceae;g__Collinsella
8                        k__Bacteria;p__Actinobacteriota;c__Coriobacteriia;o__Coriobacteriales;f__Eggerthellaceae;g__Adlercreutzia
15                                    k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides
16                                   k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Barnesiellaceae;g__Barnesiella
29                                k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Tannerellaceae;g__Parabacteroides
32                   k__Bacteria;p__Desulfobacterota;c__Desulfovibrionia;o__Desulfovibrionales;f__Desulfovibrionaceae;g__Bilophila
41                                 k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Holdemania
75                    k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_ND3007_group
76                   k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_NK4A136_group
84                                      k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Sellimonas
99                               k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Butyricicoccaceae;g__Butyricicoccus
134 k__Bacteria;p__Firmicutes;c__Clostridia;o__Peptostreptococcales-Tissierellales;f__Anaerovoracaceae;g__Family_XIII_AD3011_group
155                     k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas
169                              k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lacticaseibacillus
188             k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonellaceae
199                   k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Klebsiella
    log2FoldChange      p_value         padj
6        -2.349468 1.092155e-02 0.0443286425
8        -1.638315 7.387504e-03 0.0332437679
15       -2.119974 3.932211e-04 0.0040698386
16       -4.215042 1.576685e-05 0.0004079671
29       -3.049283 2.871549e-05 0.0005403732
32       -2.837220 4.305534e-05 0.0007427046
41       -1.686545 1.351704e-03 0.0099929559
75       -2.216009 4.785012e-03 0.0260657213
76       -2.596492 1.644208e-04 0.0020100837
84        1.773614 1.044921e-02 0.0441425798
99       -1.751985 3.166657e-03 0.0204843095
134      -2.011614 1.298386e-04 0.0019186055
155       1.025212 8.916234e-03 0.0384512596
169       1.281555 1.747899e-04 0.0020100837
188       1.244317 1.665793e-04 0.0020100837
199       2.461632 4.570063e-03 0.0260223513

2.4.1.6 Saving results

# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
             overwrite = TRUE)

# PSC effect
write.xlsx(rpsc_effect[[paste(segment,level)]],file.path(path,paste0("rpsc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa

write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("significant_taxa_",segment,".xlsx")))

2.5 Machine learning

path = "../../../results/merged_sites/main_analysis/Q2/models"

2.5.1 ElasticNet

model="enet"

2.5.1.1 Genus level

level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]
2.5.1.1.1 rPSC vs non-rPSC - all
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=500,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ <- list()
models_cm <- list()
roc_cs <- list()
betas <- list()

models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.4000000
lambda                          0.3119645
auc                             0.6432900
auc_czech                       0.6567842
auc_no                          0.6069959
auc_optimism_corrected          0.5736934
auc_optimism_corrected_CIL      0.3905857
auc_optimism_corrected_CIU      0.7322617
accuracy                        0.7608696
accuracy_czech                        NaN
accuracy_no                     0.7500000
accuracy_optimism_corrected     0.7353858
accuracy_optimism_corrected_CIL 0.6288683
accuracy_optimism_corrected_CIU 0.8277572
enet_model$conf_matrices
$original
    0  
0 105 0
1  33 0

$czech
   0  
0 78 0
1 24 0

$no
   0  
0 27 0
1  9 0
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

roc_c

2.5.1.1.2 rPSC vs non-rPSC - subset
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2]," subset")
taxa_to_test <- read.xlsx("../../../results/merged_sites/main_analysis/Q1/univariate_analysis/significant_taxa_terminal_ileum.xlsx", sheet=" genus healthy vs pre_ltx")

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                taxa_to_test,
                                                usage="ml_clr")
Removing 55 ASV(s)
Removing 2 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=500,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                     [,1]
alpha                           0.2000000
lambda                          0.5017248
auc                             0.5160173
auc_czech                       0.5224359
auc_no                          0.5000000
auc_optimism_corrected          0.5599297
auc_optimism_corrected_CIL      0.3863553
auc_optimism_corrected_CIU      0.7232201
accuracy                        0.7608696
accuracy_czech                        NaN
accuracy_no                     0.7500000
accuracy_optimism_corrected     0.7539422
accuracy_optimism_corrected_CIL 0.6525954
accuracy_optimism_corrected_CIU 0.8442308
enet_model$conf_matrices
$original
    0  
0 105 0
1  33 0

$czech
   0  
0 78 0
1 24 0

$no
   0  
0 27 0
1  9 0
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

roc_c

2.5.1.2 Saving results

models_summ_df_ileum <- do.call(rbind, 
  models_summ[grep(segment,names(models_summ),value = TRUE)])

write.csv(models_summ_df_ileum,file.path(path,paste0("elastic_net_",segment,".csv")))

2.6 Results overview

2.6.0.1 Alpha diversity

Models

knitr::kable(pc_observed[[segment]],
             digits = 3,
             caption = "Results of linear model testing ASV Richness")
Results of linear model testing ASV Richness
Estimate Std..Error t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -12.199 11.972 -1.019 0.310 0.349
non-rPSC , rPSC - CZ vs NO -21.486 11.452 -1.876 0.063 0.141
non-rPSC vs GrouprPSC:CountryNO 9.347 23.088 0.405 0.686 0.686
healthy vs GrouprPSC -37.741 12.284 -3.072 0.003 0.025 *
healthy , rPSC - CZ vs NO 11.203 10.973 1.021 0.310 0.349
healthy vs GrouprPSC:CountryNO -23.342 21.355 -1.093 0.277 0.349
healthy vs Groupnon-rPSC -25.542 9.322 -2.740 0.007 0.031 *
healthy , non-rPSC - CZ vs NO 11.203 10.933 1.025 0.307 0.349
healthy vs Groupnon-rPSC:CountryNO -32.689 15.108 -2.164 0.032 0.096
knitr::kable(pc_shannon[[segment]],
             digits = 3,
             caption = "Results of linear model testing Shannon index")
Results of linear model testing Shannon index
Estimate Std..Error t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -0.102 0.146 -0.696 0.487 0.731
non-rPSC , rPSC - CZ vs NO -0.349 0.140 -2.500 0.014 0.123
non-rPSC vs GrouprPSC:CountryNO -0.006 0.281 -0.021 0.983 0.983
healthy vs GrouprPSC -0.210 0.140 -1.501 0.137 0.321
healthy , rPSC - CZ vs NO 0.005 0.125 0.043 0.966 0.983
healthy vs GrouprPSC:CountryNO -0.360 0.244 -1.478 0.142 0.321
healthy vs Groupnon-rPSC -0.109 0.115 -0.948 0.344 0.620
healthy , non-rPSC - CZ vs NO 0.005 0.135 0.040 0.968 0.983
healthy vs Groupnon-rPSC:CountryNO -0.354 0.186 -1.905 0.058 0.263
knitr::kable(pc_simpson[[segment]],
             digits = 3,
             caption = "Results of linear model testing Simpson index")
Results of linear model testing Simpson index
Estimate Std..Error t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC 0.000 0.023 0.019 0.985 0.985
non-rPSC , rPSC - CZ vs NO -0.034 0.022 -1.573 0.118 0.531
non-rPSC vs GrouprPSC:CountryNO -0.029 0.043 -0.661 0.510 0.767
healthy vs GrouprPSC -0.009 0.018 -0.500 0.618 0.767
healthy , rPSC - CZ vs NO -0.008 0.016 -0.501 0.617 0.767
healthy vs GrouprPSC:CountryNO -0.055 0.031 -1.779 0.078 0.531
healthy vs Groupnon-rPSC -0.009 0.016 -0.564 0.574 0.767
healthy , non-rPSC - CZ vs NO -0.008 0.019 -0.410 0.682 0.767
healthy vs Groupnon-rPSC:CountryNO -0.026 0.027 -0.975 0.331 0.767
knitr::kable(pc_pielou[[segment]],
             digits = 3,
             caption = "Results of linear model testing Pielou index")
Results of linear model testing Pielou index
Estimate Std..Error t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -0.004 0.022 -0.184 0.854 0.946
non-rPSC , rPSC - CZ vs NO -0.047 0.021 -2.260 0.025 0.229
non-rPSC vs GrouprPSC:CountryNO -0.005 0.042 -0.108 0.914 0.946
healthy vs GrouprPSC -0.001 0.021 -0.067 0.946 0.946
healthy , rPSC - CZ vs NO -0.009 0.018 -0.463 0.645 0.946
healthy vs GrouprPSC:CountryNO -0.043 0.036 -1.195 0.235 0.705
healthy vs Groupnon-rPSC 0.003 0.018 0.146 0.884 0.946
healthy , non-rPSC - CZ vs NO -0.009 0.021 -0.409 0.683 0.946
healthy vs Groupnon-rPSC:CountryNO -0.038 0.029 -1.329 0.185 0.705

Plots

alpha_div_plots[[paste(segment,"Country")]]

2.6.0.2 Beta diversity

Main results

knitr::kable(pairwise_aitchison_raw[[paste("genus", segment)]],
             digits = 3,
             caption = "Results of PERMANOVA - robust aitchison distance")
Results of PERMANOVA - robust aitchison distance
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 215.579 1.208 0.009 0.107 0.107
rPSC vs healthy 1 561.696 3.345 0.030 0.001 0.002 **
non-rPSC vs healthy 1 760.242 4.431 0.024 0.001 0.002 **
rPSC vs non-rPSC , Country 1 601.929 3.372 0.024 0.001 0.001 ***
rPSC vs healthy , Country 1 496.841 2.959 0.027 0.001 0.001 ***
non-rPSC vs healthy , Country 1 624.474 3.640 0.020 0.001 0.001 ***
rPSC vs non-rPSC : Country 1 156.176 0.874 0.006 0.788 0.788
rPSC vs healthy : Country 1 164.743 0.981 0.009 0.464 0.696
non-rPSC vs healthy : Country 1 209.954 1.225 0.007 0.089 0.267

PCA

pca_plots_list[[paste(segment,"genus custom")]]

2.6.0.3 Univariate analysis

Number of significant taxa

knitr::kable(cbind(as.data.frame(lapply(list_intersections,nrow)),
      as.data.frame(lapply(rpsc_effect,nrow))) %>% t() %>% 
  `colnames<-`("Count") %>% 
  `rownames<-`(c(names(list_intersections),"PSC effect Genus")),caption="Number of significant taxa")
Number of significant taxa
Count
terminal_ileum genus non-rPSC vs rPSC 0
terminal_ileum genus healthy vs rPSC 34
terminal_ileum genus healthy vs non-rPSC 25
PSC effect Genus 16

2.6.0.4 Machine learning

Main models

knitr::kable(models_summ_df_ileum %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
             digits=2,caption="Elastic net results")
Elastic net results
alpha lambda auc_optimism_corrected auc_optimism_corrected_CIL auc_optimism_corrected_CIU
rPSC vs non-rPSC genus terminal_ileum 0.4 0.31 0.57 0.39 0.73
rPSC vs non-rPSC subset genus terminal_ileum 0.2 0.50 0.56 0.39 0.72

ROC - Genus level

p <- roc_curve_all_custom(roc_cs[c(1:2)],Q="Q2",
                     model_name="enet_model",legend = FALSE)

p

3 Data Analysis - Colon

segment="colon"

3.1 Filtering

Rules: - sequencing depth > 10000

  • nearZeroVar() with default settings

Rarefaction Curve

path="../../../intermediate_files/rarecurves"
seq_depth_threshold <- 10000
ps <- construct_phyloseq(colon_asv_tab,colon_taxa_tab,colon_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_colon.Rdata"))
load(file.path(path,"rarefaction_colon.Rdata"))
seq_depth_threshold <- 10000
prare <- ggrarecurve(obj=rareres,
                      factorNames="Country",
                      indexNames=c("Observe")) + 
        theme_bw()+
        theme(axis.text=element_text(size=8), 
              panel.grid=element_blank(),
              strip.background = element_rect(colour=NA,fill="grey"),
              strip.text.x = element_text(face="bold")) + 
        geom_vline(xintercept = seq_depth_threshold, 
                   linetype="dashed", color = "red") + 
        xlim(0, 20000)
The color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)
prare

Library size

read_counts(colon_asv_tab, line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

3.1.1 Sequencing depth

data_filt <- seq_depth_filtering(colon_asv_tab,
                                 colon_taxa_tab,
                                 colon_metadata,
                                 seq_depth_threshold = 10000)
Removing 50 ASV(s)
filt_colon_asv_tab <- data_filt[[1]]; alpha_colon_asv_tab <- filt_colon_asv_tab
filt_colon_taxa_tab <- data_filt[[2]]; alpha_colon_taxa_tab <- filt_colon_taxa_tab
filt_colon_metadata <- data_filt[[3]]; alpha_colon_metadata <- filt_colon_metadata

seq_step <- dim(filt_colon_asv_tab)[1]

Library size

read_counts(filt_colon_asv_tab,line = c(10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

3.1.2 NearZeroVar

data_filt <- nearzerovar_filtering(filt_colon_asv_tab,
                                   filt_colon_taxa_tab,
                                   filt_colon_metadata)

filt_colon_asv_tab <- data_filt[[1]]
filt_colon_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_colon_asv_tab)[1]

Library size

read_counts(filt_colon_asv_tab,line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

Check zero depth

data_filt <- check_zero_depth(filt_colon_asv_tab, 
                              filt_colon_taxa_tab, 
                              filt_colon_metadata)

filt_colon_asv_tab <- data_filt[[1]]; 
filt_colon_taxa_tab <- data_filt[[2]]; 
filt_colon_metadata <- data_filt[[3]]; 

Library size

read_counts(filt_colon_asv_tab,line = c(5000,10000))
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

3.1.3 Final Counts

final_counts_filtering(colon_asv_tab,
                       filt_colon_asv_tab,
                       filt_colon_metadata,
                       seq_step, 0, nearzero_step)
                                               V1
Raw data: ASVs                               5840
Raw data: Samples                             527
Sequencing depth filt: ASVs                  5790
Prevalence filt: ASVs                           0
NearZeroVar filt: ASVs                        444
Filt data: ASVs                               444
Filt data: Samples                            511
Filt data: Patients                           250
Filt data: Patients.1                           0
Filtered samples                               16
Matrices                    Cecum;Rectum;CD;CA;SI
healthy                                       161
non-rPSC                                      263
rPSC                                           87
pre_ltx                                         0
post_ltx                                        0
ETOH                                            0

3.2 Alpha diversity

path = "../../../results/merged_sites/main_analysis/Q2/alpha_diversity"

Calculation

# Construct MPSE object
alpha_colon_metadata$Sample <- alpha_colon_metadata$SampleID
colon_mpse <- as.MPSE(construct_phyloseq(alpha_colon_asv_tab,
                                         alpha_colon_taxa_tab,
                                         alpha_colon_metadata))

colon_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)

# Calculate alpha diversity - rarefied counts
colon_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)
alpha_data <- data.frame(SampleID=colon_mpse$Sample.x,
                         Observe=colon_mpse$Observe,
                         Shannon=colon_mpse$Shannon,
                         Simpson=colon_mpse$Simpson,
                         Pielou=colon_mpse$Pielou,
                         Group=colon_mpse$Group,
                         Country=colon_mpse$Country,
                         Patient=colon_mpse$Patient)

write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
          row.names = FALSE)

3.2.1 Plots

p_boxplot_alpha <- alpha_diversity_countries(alpha_data)
Using SampleID, Group, Country, Patient as id variables
Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Using SampleID, Group, Country, Patient as id variables

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha

# see the results
p_boxplot_alpha

3.2.2 Linear Models

path = "../../../results/merged_sites/main_analysis/Q2/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))

Richness

results_model <- pairwise.lmer(
  formula = "Observe ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_observe <- results_model[[1]]
  results_model_observe_detailed <- results_model[[2]]
} else {
  results_model_observe <- results_model
  results_model_observe_detailed <- NA
}

# save the results
pc_observed[[segment]] <- results_model_observe
# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")
Raw results of linear model of richness estimation.
Estimate Std..Error df t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -17.252 12.469 156.883 -1.384 0.168 0.192
non-rPSC vs rPSC - CZ vs NO -17.941 10.644 153.303 -1.686 0.094 0.192
non-rPSC vs GrouprPSC:CountryNO 3.483 21.371 152.108 0.163 0.871 0.871
healthy vs GrouprPSC -40.307 11.924 125.516 -3.380 0.001 0.009 **
healthy vs rPSC - CZ vs NO 14.783 10.544 126.613 1.402 0.163 0.192
healthy vs GrouprPSC:CountryNO -29.232 19.769 123.028 -1.479 0.142 0.192
healthy vs Groupnon-rPSC -22.976 8.905 211.354 -2.580 0.011 0.047 *
healthy vs non-rPSC - CZ vs NO 14.961 10.892 212.956 1.374 0.171 0.192
healthy vs Groupnon-rPSC:CountryNO -32.911 14.701 208.353 -2.239 0.026 0.079
knitr::kable(results_model_observe_detailed,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 171.439 6.808 137.314 25.182 0.000 NA
Groupnon-rPSC -22.949 8.722 135.888 -2.631 0.009 0.019
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_NO 186.365 8.682 75.472 21.467 0 NA
Groupnon-rPSC -55.850 12.120 73.619 -4.608 0 0
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 171.468 6.275 89.628 27.325 0.000 NA
CountryNO 14.904 9.836 89.468 1.515 0.133 0.133
Estimate Std. Error df t value Pr(>|t|) p.adjusted
non-rPSC_CZ 148.477 5.932 120.279 25.028 0.000 NA
CountryNO -17.950 10.519 117.183 -1.706 0.091 0.121

Shannon

results_model <- pairwise.lmer(
  formula = "Shannon ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_shannon <- results_model[[1]]
  results_model_shannon_detailed <- results_model[[2]]
} else {
  results_model_shannon <- results_model
  results_model_shannon_detailed <- NA
}

# save the results
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)
# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")
Raw results of linear model of Shannon estimation.
Estimate Std..Error df t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -0.146 0.161 154.485 -0.907 0.366 0.549
non-rPSC vs rPSC - CZ vs NO -0.355 0.138 151.788 -2.581 0.011 0.051
non-rPSC vs GrouprPSC:CountryNO 0.008 0.277 150.856 0.030 0.976 0.976
healthy vs GrouprPSC -0.330 0.128 126.201 -2.572 0.011 0.051
healthy vs rPSC - CZ vs NO 0.017 0.113 127.217 0.146 0.884 0.976
healthy vs GrouprPSC:CountryNO -0.364 0.213 123.907 -1.709 0.090 0.162
healthy vs Groupnon-rPSC -0.182 0.101 207.051 -1.795 0.074 0.162
healthy vs non-rPSC - CZ vs NO 0.018 0.124 208.617 0.145 0.884 0.976
healthy vs Groupnon-rPSC:CountryNO -0.375 0.168 204.129 -2.240 0.026 0.078
knitr::kable(results_model_shannon_detailed,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 3.683 0.065 77.802 56.300 0.000 NA
GrouprPSC -0.330 0.116 76.679 -2.846 0.006 0.011
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_NO 3.700 0.099 48.693 37.40 0.000 NA
GrouprPSC -0.693 0.193 47.827 -3.59 0.001 0.003
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 3.683 0.054 89.061 68.505 0.000 NA
CountryNO 0.019 0.084 88.828 0.231 0.818 0.818
Estimate Std. Error df t value Pr(>|t|) p.adjusted
rPSC_CZ 3.354 0.153 36.206 21.939 0.000 NA
CountryNO -0.347 0.261 35.912 -1.332 0.191 0.255
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 3.684 0.081 133.503 45.582 0.000 NA
Groupnon-rPSC -0.184 0.104 132.547 -1.772 0.079 0.105
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_NO 3.702 0.093 77.069 39.853 0 NA
Groupnon-rPSC -0.559 0.129 74.201 -4.324 0 0
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 3.683 0.054 89.061 68.505 0.000 NA
CountryNO 0.019 0.084 88.828 0.231 0.818 0.818
Estimate Std. Error df t value Pr(>|t|) p.adjusted
non-rPSC_CZ 3.501 0.075 117.089 46.588 0.000 NA
CountryNO -0.357 0.133 114.586 -2.673 0.009 0.017

Simpson

results_model <- pairwise.lmer(
  formula = "Simpson ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_simpson <- results_model[[1]]
  results_model_simpson_detailed <- results_model[[2]]
} else {
  results_model_simpson <- results_model
  results_model_simpson_detailed <- NA
}

# save the results
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)
# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")
Raw results of linear model of Simpson estimation.
Estimate Std..Error df t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -0.005 0.029 153.608 -0.169 0.866 0.866
non-rPSC vs rPSC - CZ vs NO -0.032 0.024 151.251 -1.296 0.197 0.443
non-rPSC vs GrouprPSC:CountryNO -0.027 0.049 150.427 -0.554 0.581 0.806
healthy vs GrouprPSC -0.031 0.020 127.660 -1.523 0.130 0.391
healthy vs rPSC - CZ vs NO -0.007 0.018 128.411 -0.402 0.688 0.806
healthy vs GrouprPSC:CountryNO -0.052 0.034 125.982 -1.540 0.126 0.391
healthy vs Groupnon-rPSC -0.026 0.016 201.924 -1.600 0.111 0.391
healthy vs non-rPSC - CZ vs NO -0.007 0.020 203.469 -0.363 0.717 0.806
healthy vs Groupnon-rPSC:CountryNO -0.025 0.026 199.046 -0.952 0.342 0.616
knitr::kable(results_model_simpson_detailed,digits = 3,
caption = "Raw results of independent country analysis")

Table: Raw results of independent country analysis

Pielou

results_model <- pairwise.lmer(
  formula = "Pielou ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_pielou <- results_model[[1]]
  results_model_pielou_detailed <- results_model[[2]]
} else {
  results_model_pielou <- results_model
  results_model_pielou_detailed <- NA
}

# save the results
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)
# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")
Raw results of linear model of Pielou estimation.
Estimate Std..Error df t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -0.007 0.026 153.863 -0.292 0.771 0.843
non-rPSC vs rPSC - CZ vs NO -0.052 0.022 150.735 -2.372 0.019 0.171
non-rPSC vs GrouprPSC:CountryNO -0.009 0.044 149.674 -0.198 0.843 0.843
healthy vs GrouprPSC -0.023 0.020 127.665 -1.183 0.239 0.538
healthy vs rPSC - CZ vs NO -0.008 0.017 129.013 -0.457 0.648 0.843
healthy vs GrouprPSC:CountryNO -0.052 0.033 124.565 -1.605 0.111 0.333
healthy vs Groupnon-rPSC -0.015 0.015 204.651 -1.006 0.316 0.568
healthy vs non-rPSC - CZ vs NO -0.008 0.019 206.473 -0.416 0.678 0.843
healthy vs Groupnon-rPSC:CountryNO -0.044 0.025 201.171 -1.745 0.083 0.333
knitr::kable(results_model_pielou_detailed,digits = 3,
caption = "Raw results of independent country analysis")
Raw results of independent country analysis
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 0.720 0.013 132.894 56.00 0.000 NA
Groupnon-rPSC -0.016 0.016 132.038 -0.97 0.334 0.445
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_NO 0.713 0.013 79.537 55.174 0.000 NA
Groupnon-rPSC -0.060 0.018 74.995 -3.379 0.001 0.005
Estimate Std. Error df t value Pr(>|t|) p.adjusted
healthy_CZ 0.720 0.007 89.185 99.650 0.000 NA
CountryNO -0.008 0.011 88.856 -0.667 0.506 0.506
Estimate Std. Error df t value Pr(>|t|) p.adjusted
non-rPSC_CZ 0.705 0.012 115.555 60.129 0.000 NA
CountryNO -0.052 0.021 112.732 -2.499 0.014 0.028

3.2.3 Saving results

alpha_list <- list(
  Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
  Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
  Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
  Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
                   
write.xlsx(alpha_list, 
           file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))

3.3 Beta diversity

Calculating Aitchison distance (euclidean distance on clr-transformed data), both at ASV and genus level.

3.3.1 Main analysis

Genus level, Aitchison distance

level="genus"

# needed path
path = "../../../results/merged_sites/main_analysis/Q2/beta_diversity"

Aggregation, filtering

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level=level,
                             names=TRUE)

filt_data <- filtering_steps(genus_data[[1]],
                             genus_data[[2]],
                             colon_metadata,
                            seq_depth_threshold=10000)
Removing 5 ASV(s)
filt_colon_genus_tab <- filt_data[[1]]
filt_colon_genus_taxa <- filt_data[[2]]
filt_colon_genus_metadata <- filt_data[[3]]
3.3.1.0.1 PERMANOVA
pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
                           covariate = filt_colon_genus_metadata$Country, 
                           patients = filt_colon_genus_metadata$Patient,
                           sim.method = "robust.aitchison", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
                          covariate = filt_colon_genus_metadata$Country, 
                          interaction = TRUE, 
                          patients = filt_colon_genus_metadata$Patient,
                          sim.method = "robust.aitchison", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <-rbind(pp_factor,pp_cov,pp_fac.cov)
# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")
PERMANOVA, GROUP separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 419.973 2.475 0.007 0.415 0.415
rPSC vs healthy 1 1473.966 8.988 0.035 0.001 0.002 **
non-rPSC vs healthy 1 1742.331 10.401 0.024 0.001 0.002 **
knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")
PERMANOVA, COUNTRY separation
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC , Country 1 1556.933 9.177 0.026 0.001 0.001 ***
rPSC vs healthy , Country 1 883.284 5.386 0.021 0.001 0.001 ***
non-rPSC vs healthy , Country 1 1558.672 9.304 0.021 0.001 0.001 ***
knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")
PERMANOVA, INTERACTION GROUP:Country
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC : Country 1 318.175 1.880 0.005 0.970 0.970
rPSC vs healthy : Country 1 319.594 1.956 0.008 0.820 0.970
non-rPSC vs healthy : Country 1 407.191 2.439 0.006 0.163 0.489

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

if (length(interaction_sig)>0){
 for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_genus_metadata$Group,
                      covariate = filt_colon_genus_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      patients = filt_colon_genus_metadata$Patient)
  print(result_list)
} 
}
3.3.1.0.2 Plots
p <- pca_plot_custom(filt_colon_genus_tab,
                                 filt_colon_genus_taxa,
                                 filt_colon_genus_metadata,
                                 show_boxplots = TRUE,
                                 variable = "Group", size=2, 
                                 show_legend=FALSE)

# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p

# see the results
p

3.3.1.1 Saving results

write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]], 
           file = file.path(path,
           paste0("beta_diversity_results_", segment,".xlsx")))

3.4 Univariate Analysis

3.4.1 Main - Genus level

level="genus"

# needed paths
path = "../../../results/merged_sites/main_analysis/Q2/univariate_analysis"
path_maaslin=file.path("../../../intermediate_files/maaslin/Q2",level)

# variables
raw_linda_results_genus[[segment]] <- list()
linda_results_genus[[segment]] <- list()

# country and interaction problems
list_country_union <- list()
list_venns <- list()

# workbook for final df
wb <- createWorkbook()

# PSC effect
rpsc_effect <- list()

3.4.1.1 Genus level

level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

colon_genus_asv_taxa_tab <- create_asv_taxa_table(colon_genus_tab,
                                                  colon_genus_taxa_tab)
3.4.1.1.1 rPSC vs non-rPSC
3.4.1.1.1.1 linDA
group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
taxa_to_test <- read.xlsx("../../../results/merged_sites/main_analysis/Q1/univariate_analysis/significant_taxa_colon.xlsx", sheet=" genus healthy vs pre_ltx")$SeqID

colon_genus_tab_2 <-  colon_genus_tab %>% dplyr::filter(SeqID %in% taxa_to_test)
colon_genus_taxa_tab_2 <-  colon_genus_taxa_tab %>% dplyr::filter(SeqID %in% taxa_to_test)

# prepare the data
linda_data <- binomial_prep(colon_genus_tab_2,
                            colon_genus_taxa_tab_2,
                            colon_metadata,
                            group, usage="linDA",filtering = FALSE)

filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  231  samples and  33  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")


for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

3.4.1.1.1.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("NO vs CZ")

volcano <- ggarrange(volcano1,volcano2, ncol=2)

volcano

3.4.1.1.1.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

3.4.1.1.1.4 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
3.4.1.1.2 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "non-rPSC"
[1] "rPSC"
[1] "non-rPSC"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
3.4.1.1.3 Prevalence testing
uni_df$prevalence_diff <- abs(uni_df$`PREVALENCE_ rPSC` - uni_df$`PREVALENCE_ non-rPSC`)

#uni_df %<>% dplyr::filter(prevalence_diff >=0.2)
#filt_colon_uni_data <- filt_colon_uni_data[uni_df$SeqID,]
filt_colon_uni_data <- ifelse(filt_colon_uni_data>0,"YES","NO")
filt_colon_uni_data %<>% t() %>% as.data.frame()
#filt_colon_uni_taxa %<>% dplyr::filter(SeqID %in% uni_df$SeqID)
filt_colon_uni_data <- merge(filt_colon_uni_data %>% rownames_to_column("SampleID"),filt_colon_metadata,by="SampleID",all.x=TRUE) %>% column_to_rownames("SampleID") %>% distinct(Patient, .keep_all = TRUE) 

p_values <- c()
for (taxon in uni_df$SeqID){
data <- filt_colon_uni_data %>%
  group_by(!!sym(taxon), Group) %>%
  count() %>%
  group_by(!!sym(taxon), Group) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  pivot_wider(names_from = Group, values_from = n) %>%
  column_to_rownames(taxon) %>%
  as.matrix()

data
chi_res <- chisq.test(data)
chi_res$expected
#p_values <- c(p_values,chi_res$p.value)

test <- fisher.test(data)
test$p.value
p_values <- c(p_values,test$p.value)
}
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
p_values_adjusted <- p.adjust(p_values,method = "BH")
names(p_values_adjusted) <- uni_df$SeqID
names(p_values) <- uni_df$SeqID
p_values_adjusted[which(p_values_adjusted<0.05)]
 Bilophila 
0.01992028 
asv_tab <- as.data.frame(rrarefy(
    linda_data[[1]] %>% t(), 
    sample=10000)) %>% t() %>% as.data.frame()

asv_tab <- ifelse(asv_tab>10,"YES","NO")
asv_tab %<>% t() %>% as.data.frame()

set.seed(123)
asv_tab <- merge(asv_tab %>% rownames_to_column("SampleID"),filt_colon_metadata,by="SampleID",all.x=TRUE) %>% column_to_rownames("SampleID") %>% distinct(Patient, .keep_all = TRUE) 

p_values <- c()
for (taxon in uni_df$SeqID){
data <- asv_tab %>%
  group_by(!!sym(taxon), Group) %>%
  count() %>%
  group_by(!!sym(taxon), Group) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  pivot_wider(names_from = Group, values_from = n) %>%
  column_to_rownames(taxon) %>%
  as.matrix()

data2 <- asv_tab %>%
  group_by(!!sym(taxon), Group,Country) %>%
  count() 


data[is.na(data)] <- 0
chi_res <- chisq.test(data)
chi_res$expected
test <- fisher.test(data)
test$p.value
chi_res$p.value
p_values <- c(p_values,test$p.value)
}
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
p_values_adjusted <- p.adjust(p_values,method = "BH")
names(p_values_adjusted) <- uni_df$SeqID
names(p_values) <- uni_df$SeqID
p_values_adjusted[which(p_values_adjusted<0.05)]
 Bilophila 
0.04557538 
df <- data.frame(SeqID=uni_df$SeqID,
                 p=p_values,
                 p_adj=p_values_adjusted)
prevalence_df <- uni_df[,c("SeqID","PREVALENCE_ non-rPSC","PREVALENCE_ rPSC")]
prevalence_df <- merge(prevalence_df,df,by="SeqID",all=TRUE)
write.xlsx(prevalence_df,file.path(path,paste0("supplements_prevalence_",segment,".xlsx")))

3.4.1.2 rPSC vs healthy

group <- c("healthy","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
3.4.1.2.1 linDA
# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")
Removing 98 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  248  samples and  178  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

3.4.1.2.1.1 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("NO vs CZ")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

3.4.1.2.1.2 Group - Intersection
intersection_results <- group_intersection(group, 
                                           list_intersections, 
                                           list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

3.4.1.2.1.3 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
3.4.1.2.2 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "rPSC"
[1] "healthy"
[1] "rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
3.4.1.2.3 non-rPSC vs healthy
group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
3.4.1.2.3.1 linDA
# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,group,
                            usage="linDA")
Removing 39 ASV(s)
Removing 6 ASV(s)
filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')
0  features are filtered!
The filtered data has  424  samples and  159  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.
linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}
# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

3.4.1.2.3.2 MaAsLin2
volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("NO vs CZ")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

3.4.1.2.3.3 Group - Intersection
intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

3.4.1.2.3.4 Interaction effect
list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]
NULL
## results for czech cohort
list_interaction_significant[[2]]
[1] NA
## results for norwegian cohort
list_interaction_significant[[3]]
[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)
3.4.1.2.4 Basic statistics
uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)
[1] "healthy"
[1] "non-rPSC"
[1] "healthy"
[1] "non-rPSC"
uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)
3.4.1.2.5 Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$colon[grepl(level,names(uni_statistics$colon))],
                                      colon_taxa_tab) + xlab("") + ylab("")
min_clr -2.080021 
max_clr 6.901023 
min_log -4.181218 
max_log 3.872081 
dotheatmap_linda

Horizontal bar plot

p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))
Using SeqID as id variables
p_prevalence_final <- ggarrange(p_prevalence,
                                ggplot() + theme_minimal(),
                                nrow = 2,heights = c(1,0.085))
p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),p_prevalence_final,ncol=2,widths = c(1,0.3))
p

dot_heatmap_colon <- p

3.4.1.3 Saving results

# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
             overwrite = TRUE)

# PSC effect
write.xlsx(rpsc_effect[[paste(segment,level)]],file.path(path,paste0("rpsc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa

write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("significant_taxa_",segment,".xlsx")))

3.5 Machine learning

path = "../../../results/merged_sites/main_analysis/Q2/models"

3.5.1 ElasticNet

model="enet"

3.5.1.1 Genus level

level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]
3.5.1.1.1 rPSC vs non-rPSC - all
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])
model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=500,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.00000000
lambda                          0.03550553
auc                             0.99707181
auc_czech                       0.99495342
auc_no                          1.00000000
auc_optimism_corrected          0.62839790
auc_optimism_corrected_CIL      0.49485088
auc_optimism_corrected_CIU      0.76009870
accuracy                        0.98571429
accuracy_czech                         NaN
accuracy_no                     0.98581560
accuracy_optimism_corrected     0.70988684
accuracy_optimism_corrected_CIL 0.61932014
accuracy_optimism_corrected_CIU 0.80479243
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 262   1
   1   4  83

$czech
    Predicted
True   0   1
   0 160   1
   1   2  46

$no
    Predicted
True   0   1
   0 102   0
   1   2  37
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

roc_c

3.5.1.1.2 rPSC vs non-rPSC - subset
group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2]," subset")
taxa_to_test <- read.xlsx("../../../results/merged_sites/main_analysis/Q1/univariate_analysis/significant_taxa_colon.xlsx", sheet=" genus healthy vs pre_ltx")

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                taxa_to_test,
                                                usage="ml_clr",patient = TRUE)
Removing 28 ASV(s)
Removing 5 ASV(s)
# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=500,
                              reuse=TRUE,clust_var = "Patient",
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()
                                      [,1]
alpha                           0.80000000
lambda                          0.03321672
auc                             0.83344259
auc_czech                       0.78312629
auc_no                          0.92383107
auc_optimism_corrected          0.65974941
auc_optimism_corrected_CIL      0.49897616
auc_optimism_corrected_CIU      0.79007241
accuracy                        0.79714286
accuracy_czech                         NaN
accuracy_no                     0.82269504
accuracy_optimism_corrected     0.73877807
accuracy_optimism_corrected_CIL 0.63177895
accuracy_optimism_corrected_CIU 0.82893728
enet_model$conf_matrices
$original
    Predicted
True   0   1
   0 258   5
   1  66  21

$czech
    Predicted
True   0   1
   0 156   5
   1  41   7

$no
    Predicted
True   0   1
   0 102   0
   1  25  14
enet_model$plot
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

roc_c

3.5.1.2 Saving results

models_summ_df_colon <- do.call(rbind, 
  models_summ[grep(segment,names(models_summ),value = TRUE)])

write.csv(models_summ_df_colon,file.path(path,paste0("elastic_net_",segment,".csv")))

3.6 Results overview

3.6.0.1 Alpha diversity

Linear models

knitr::kable(pc_observed[[segment]],
             digits = 3,
             caption = "Results of linear model testing ASV Richness")
Results of linear model testing ASV Richness
Estimate Std..Error df t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -17.252 12.469 156.883 -1.384 0.168 0.192
non-rPSC vs rPSC - CZ vs NO -17.941 10.644 153.303 -1.686 0.094 0.192
non-rPSC vs GrouprPSC:CountryNO 3.483 21.371 152.108 0.163 0.871 0.871
healthy vs GrouprPSC -40.307 11.924 125.516 -3.380 0.001 0.009 **
healthy vs rPSC - CZ vs NO 14.783 10.544 126.613 1.402 0.163 0.192
healthy vs GrouprPSC:CountryNO -29.232 19.769 123.028 -1.479 0.142 0.192
healthy vs Groupnon-rPSC -22.976 8.905 211.354 -2.580 0.011 0.047 *
healthy vs non-rPSC - CZ vs NO 14.961 10.892 212.956 1.374 0.171 0.192
healthy vs Groupnon-rPSC:CountryNO -32.911 14.701 208.353 -2.239 0.026 0.079
knitr::kable(pc_shannon[[segment]],
             digits = 3,
             caption = "Results of linear model testing Shannon index")
Results of linear model testing Shannon index
Estimate Std..Error df t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -0.146 0.161 154.485 -0.907 0.366 0.549
non-rPSC vs rPSC - CZ vs NO -0.355 0.138 151.788 -2.581 0.011 0.051
non-rPSC vs GrouprPSC:CountryNO 0.008 0.277 150.856 0.030 0.976 0.976
healthy vs GrouprPSC -0.330 0.128 126.201 -2.572 0.011 0.051
healthy vs rPSC - CZ vs NO 0.017 0.113 127.217 0.146 0.884 0.976
healthy vs GrouprPSC:CountryNO -0.364 0.213 123.907 -1.709 0.090 0.162
healthy vs Groupnon-rPSC -0.182 0.101 207.051 -1.795 0.074 0.162
healthy vs non-rPSC - CZ vs NO 0.018 0.124 208.617 0.145 0.884 0.976
healthy vs Groupnon-rPSC:CountryNO -0.375 0.168 204.129 -2.240 0.026 0.078
knitr::kable(pc_simpson[[segment]],
             digits = 3,
             caption = "Results of linear model testing Simpson index")
Results of linear model testing Simpson index
Estimate Std..Error df t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -0.005 0.029 153.608 -0.169 0.866 0.866
non-rPSC vs rPSC - CZ vs NO -0.032 0.024 151.251 -1.296 0.197 0.443
non-rPSC vs GrouprPSC:CountryNO -0.027 0.049 150.427 -0.554 0.581 0.806
healthy vs GrouprPSC -0.031 0.020 127.660 -1.523 0.130 0.391
healthy vs rPSC - CZ vs NO -0.007 0.018 128.411 -0.402 0.688 0.806
healthy vs GrouprPSC:CountryNO -0.052 0.034 125.982 -1.540 0.126 0.391
healthy vs Groupnon-rPSC -0.026 0.016 201.924 -1.600 0.111 0.391
healthy vs non-rPSC - CZ vs NO -0.007 0.020 203.469 -0.363 0.717 0.806
healthy vs Groupnon-rPSC:CountryNO -0.025 0.026 199.046 -0.952 0.342 0.616
knitr::kable(pc_pielou[[segment]],
             digits = 3,
             caption = "Results of linear model testing Pielou index")
Results of linear model testing Pielou index
Estimate Std..Error df t.value Pr…t.. p.adj sig
non-rPSC vs GrouprPSC -0.007 0.026 153.863 -0.292 0.771 0.843
non-rPSC vs rPSC - CZ vs NO -0.052 0.022 150.735 -2.372 0.019 0.171
non-rPSC vs GrouprPSC:CountryNO -0.009 0.044 149.674 -0.198 0.843 0.843
healthy vs GrouprPSC -0.023 0.020 127.665 -1.183 0.239 0.538
healthy vs rPSC - CZ vs NO -0.008 0.017 129.013 -0.457 0.648 0.843
healthy vs GrouprPSC:CountryNO -0.052 0.033 124.565 -1.605 0.111 0.333
healthy vs Groupnon-rPSC -0.015 0.015 204.651 -1.006 0.316 0.568
healthy vs non-rPSC - CZ vs NO -0.008 0.019 206.473 -0.416 0.678 0.843
healthy vs Groupnon-rPSC:CountryNO -0.044 0.025 201.171 -1.745 0.083 0.333

Plots

alpha_div_plots[[paste(segment,"Country")]]

3.6.0.2 Beta diversity

Main results

knitr::kable(pairwise_aitchison_raw[[paste("genus", segment)]],
             digits = 3,
             caption = "Results of PERMANOVA - robust aitchison distance")
Results of PERMANOVA - robust aitchison distance
pairs Df SumsOfSqs F.Model R2 p.value p.adjusted sig
rPSC vs non-rPSC 1 419.973 2.475 0.007 0.415 0.415
rPSC vs healthy 1 1473.966 8.988 0.035 0.001 0.002 **
non-rPSC vs healthy 1 1742.331 10.401 0.024 0.001 0.002 **
rPSC vs non-rPSC , Country 1 1556.933 9.177 0.026 0.001 0.001 ***
rPSC vs healthy , Country 1 883.284 5.386 0.021 0.001 0.001 ***
non-rPSC vs healthy , Country 1 1558.672 9.304 0.021 0.001 0.001 ***
rPSC vs non-rPSC : Country 1 318.175 1.880 0.005 0.970 0.970
rPSC vs healthy : Country 1 319.594 1.956 0.008 0.820 0.970
non-rPSC vs healthy : Country 1 407.191 2.439 0.006 0.163 0.489

PCA

pca_plots_list[[paste(segment,"genus custom")]]

3.6.0.3 Univariate analysis

Number of significant taxa

knitr::kable(as.data.frame(lapply(list_intersections,nrow)) %>% t() %>% 
  `colnames<-`("Count") )
Count
terminal_ileum.genus.non.rPSC.vs.rPSC 0
terminal_ileum.genus.healthy.vs.rPSC 34
terminal_ileum.genus.healthy.vs.non.rPSC 25
colon.genus.non.rPSC.vs.rPSC 0
colon.genus.healthy.vs.rPSC 56
colon.genus.healthy.vs.non.rPSC 30

3.6.0.4 Machine learning

Main models

knitr::kable(models_summ_df_colon %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
             digits=3,caption="Elastic net results")
Elastic net results
alpha lambda auc_optimism_corrected auc_optimism_corrected_CIL auc_optimism_corrected_CIU
rPSC vs non-rPSC genus colon 0.0 0.036 0.628 0.495 0.76
rPSC vs non-rPSC subset genus colon 0.8 0.033 0.660 0.499 0.79

ROC - Genus level

p <- roc_curve_all_custom(roc_cs[3:4],Q="Q2",
                     model_name="enet_model",legend=FALSE)

4 Paper-ready visualizations

Alpha diversity

p_A <- alpha_div_plots$`terminal_ileum Country` +
  ggtitle("Terminal ileum")+
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)) 

p_B <-  alpha_div_plots$`colon Country` +
  ggtitle("Colon") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)) 

Q2_alpha <- ggarrange(p_A,ggplot() + theme_minimal(),p_B,nrow=1, ncol=3,
                      widths = c(1,0.1,1))
Q2_alpha

Beta diversity

pca_ti <- pca_plots_list$`terminal_ileum genus custom` 
pca_colon <- pca_plots_list$`colon genus custom` 

genus_Q2_beta <- ggarrange(pca_ti,
                           ggplot() + theme_minimal(),
                           pca_colon,ncol=3,
                           widths = c(1,0.1,1))
genus_Q2_beta

Alpha + Beta diversity

alpha_beta <- ggarrange(Q2_alpha,genus_Q2_beta,
                        ncol = 1,nrow=2,labels = c("A","B"))
alpha_beta

Elastic net

models_to_plot <- c("knn_model","rf_model","gbm_model","enet_model")
names(models_to_plot) <- c("kNN","RF","GBoost","ENet")

models_to_plot <- c("enet_model")
rocs_list <- list()
rocs_list[["enet_model"]] <- roc_cs
#names(models_to_plot) <- c("kNN","RF","GBoost","ENet")

# ILEUM
plot_list_ileum <- list()
for (model_name in models_to_plot) {
  plot_list_ileum[[model_name]] <- 
    roc_curve_all_custom(rocs_list[[model_name]][c(1:2)],
                       Q="Q2",
                       model_name=model_name,legend = FALSE) + 
    #ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) + 
    theme(plot.title = element_text(face = "bold",size = 10)) 
}

roc_curve_ti <- ggarrange(plotlist = plot_list_ileum)

# COLON

plot_list_colon <- list()
for (model_name in models_to_plot) {
  plot_list_colon[[model_name]] <- 
    roc_curve_all_custom(rocs_list[[model_name]][c(3:4)],
                       Q="Q2",
                       model_name=model_name,legend = FALSE) + 
    #ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) + 
    theme(plot.title = element_text(face = "bold",size = 10))  
}

roc_curve_colon <- ggarrange(plotlist = plot_list_colon)


roc_curve_plot <- ggarrange(roc_curve_ti,
                            ggplot() + theme_minimal(),
                            roc_curve_colon,
                            ncol=3, widths = c(1,0.1,1))
roc_curve_plot

4.1 Figure 4

alpha_beta_elastic <- ggarrange(Q2_alpha,genus_Q2_beta,roc_curve_plot,
                        ncol = 1,nrow=3,labels = LETTERS,heights = c(1,1,0.8))
alpha_beta_elastic

alpha_beta_elastic <- ggarrange(alpha_beta_elastic,ggplot() + theme_minimal(),ncol=2,
                                widths = c(1,0.15))

alpha_beta_elastic

4.2 Supplementary Figure - DAA

p_ileum <- dot_heatmap_ileum + 
  ggtitle("Terminal ileum") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
        legend.position = "none")


p_colon <- dot_heatmap_colon  +
  ggtitle("Colon") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
        legend.position = "none")

heatmap_plot <- ggarrange(p_ileum,
                          p_colon,
                          ncol = 2,labels=c("A","B"))
heatmap_plot

5 Session info

sessionInfo()
R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 26100)

Matrix products: default


locale:
[1] LC_COLLATE=Czech_Czechia.utf8  LC_CTYPE=Czech_Czechia.utf8   
[3] LC_MONETARY=Czech_Czechia.utf8 LC_NUMERIC=C                  
[5] LC_TIME=Czech_Czechia.utf8    

time zone: Europe/Prague
tzcode source: internal

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] mice_3.17.0              picante_1.8.2            ape_5.8-1               
 [4] lubridate_1.9.3          forcats_1.0.0            stringr_1.5.1           
 [7] tidyverse_2.0.0          kableExtra_1.4.0         tidyr_1.3.1             
[10] gbm_2.2.2                doParallel_1.0.17        iterators_1.0.14        
[13] foreach_1.5.2            ranger_0.17.0            ggvenn_0.1.15           
[16] Maaslin2_1.16.0          purrr_1.0.2              pROC_1.18.5             
[19] glmnet_4.1-8             MicrobiomeStat_1.2       caret_6.0-94            
[22] openxlsx_4.2.8           magrittr_2.0.3           emmeans_1.11.2-8        
[25] lmerTest_3.1-3           robustlmm_3.3-1          lme4_1.1-37             
[28] Matrix_1.6-5             mgcv_1.9-1               nlme_3.1-164            
[31] pheatmap_1.0.13          reshape2_1.4.4           vegan_2.6-10            
[34] lattice_0.22-6           permute_0.9-8            ggplotify_0.1.2         
[37] ggrepel_0.9.6            ggpubr_0.6.1             MicrobiotaProcess_1.14.1
[40] phyloseq_1.46.0          ggplot2_3.5.2            tibble_3.2.1            
[43] dplyr_1.1.4              cowplot_1.2.0            readr_2.1.5             
[46] igraph_2.1.4             ccrepe_1.38.1            data.table_1.15.4       

loaded via a namespace (and not attached):
  [1] fs_1.6.5                    matrixStats_1.5.0          
  [3] bitops_1.0-9                RColorBrewer_1.1-3         
  [5] numDeriv_2016.8-1.1         tools_4.3.1                
  [7] backports_1.4.1             R6_2.6.1                   
  [9] lazyeval_0.2.2              jomo_2.7-6                 
 [11] rhdf5filters_1.14.1         withr_3.0.2                
 [13] gridExtra_2.3               cli_3.6.2                  
 [15] Biobase_2.62.0              logging_0.10-108           
 [17] biglm_0.9-3                 sandwich_3.1-1             
 [19] labeling_0.4.3              mvtnorm_1.3-3              
 [21] robustbase_0.99-4-1         pbapply_1.7-4              
 [23] systemfonts_1.2.2           yulab.utils_0.2.1          
 [25] svglite_2.1.3               parallelly_1.38.0          
 [27] rstudioapi_0.17.1           generics_0.1.4             
 [29] gridGraphics_0.5-1          shape_1.4.6.1              
 [31] car_3.1-3                   zip_2.3.2                  
 [33] biomformat_1.30.0           S4Vectors_0.40.2           
 [35] abind_1.4-8                 infotheo_1.2.0.1           
 [37] lifecycle_1.0.4             multcomp_1.4-28            
 [39] yaml_2.3.10                 carData_3.0-5              
 [41] SummarizedExperiment_1.32.0 Rtsne_0.17                 
 [43] rhdf5_2.46.1                recipes_1.3.1              
 [45] SparseArray_1.2.4           grid_4.3.1                 
 [47] mitml_0.4-5                 crayon_1.5.3               
 [49] pillar_1.11.0               knitr_1.50                 
 [51] optparse_1.7.5              GenomicRanges_1.54.1       
 [53] statip_0.2.3                boot_1.3-32                
 [55] estimability_1.5.1          future.apply_1.11.3        
 [57] codetools_0.2-20            pan_1.9                    
 [59] glue_1.7.0                  ggfun_0.2.0                
 [61] vctrs_0.6.5                 treeio_1.26.0              
 [63] Rdpack_2.6.4                gtable_0.3.6               
 [65] maditr_0.8.6                gower_1.0.1                
 [67] xfun_0.52                   rbibutils_2.3              
 [69] S4Arrays_1.2.1              prodlim_2024.06.25         
 [71] libcoin_1.0-10              coda_0.19-4.1              
 [73] reformulas_0.4.1            pcaPP_2.0-5                
 [75] modeest_2.4.0               survival_3.5-8             
 [77] timeDate_4041.110           hardhat_1.4.2              
 [79] lava_1.8.1                  statmod_1.5.0              
 [81] TH.data_1.1-4               ipred_0.9-15               
 [83] ggtree_3.10.1               GenomeInfoDb_1.38.8        
 [85] fBasics_4041.97             rpart_4.1.23               
 [87] colorspace_2.1-1            BiocGenerics_0.48.1        
 [89] DBI_1.2.3                   nnet_7.3-19                
 [91] ade4_1.7-23                 tidyselect_1.2.1           
 [93] timeSeries_4041.111         compiler_4.3.1             
 [95] microbiome_1.24.0           xml2_1.3.8                 
 [97] DelayedArray_0.28.0         scales_1.4.0               
 [99] DEoptimR_1.1-4              spatial_7.3-18             
[101] rappdirs_0.3.3              digest_0.6.35              
[103] minqa_1.2.8                 rmarkdown_2.29             
[105] XVector_0.42.0              htmltools_0.5.8.1          
[107] pkgconfig_2.0.3             MatrixGenerics_1.14.0      
[109] stabledist_0.7-2            fastmap_1.2.0              
[111] rlang_1.1.3                 htmlwidgets_1.6.4          
[113] farver_2.1.2                zoo_1.8-13                 
[115] jsonlite_2.0.0              ModelMetrics_1.2.2.2       
[117] RCurl_1.98-1.17             modeltools_0.2-24          
[119] Formula_1.2-5               GenomeInfoDbData_1.2.11    
[121] patchwork_1.3.2             Rhdf5lib_1.24.2            
[123] Rcpp_1.0.12                 ggnewscale_0.5.2           
[125] fastGHQuad_1.0.1            stringi_1.8.3              
[127] ggstar_1.0.4                stable_1.1.6               
[129] zlibbioc_1.48.2             MASS_7.3-60.0.1            
[131] plyr_1.8.9                  listenv_0.9.1              
[133] Biostrings_2.70.3           splines_4.3.1              
[135] hash_2.2.6.3                multtest_2.58.0            
[137] hms_1.1.3                   ggtreeExtra_1.12.0         
[139] ggsignif_0.6.4              stats4_4.3.1               
[141] rmutil_1.1.10               evaluate_1.0.5             
[143] nloptr_2.2.1                tzdb_0.5.0                 
[145] getopt_1.20.4               future_1.34.0              
[147] clue_0.3-66                 coin_1.4-3                 
[149] broom_1.0.9                 xtable_1.8-4               
[151] tidytree_0.4.6              rstatix_0.7.2              
[153] viridisLite_0.4.2           class_7.3-22               
[155] aplot_0.2.8                 IRanges_2.36.0             
[157] cluster_2.1.8.1             timechange_0.3.0           
[159] globals_0.18.0