Geography-independent mucosal microbiota alterations in primary sclerosing cholangitis persist after liver transplantation

2. Hypothesis: PSC recurrence is associated with a specific composition of the gut microbiome

1 Introduction

1.1 About

rPSC vs non-rPSC vs Healthy analysis on merged data

Alpha diversity –> group effect, country effect, interaction effect
Beta diversity – PERMANOVA, PCA -> group effect, country effect, interaction effect
DAA ->
- Group effect – linDA + MaAsLin2 intersection
- Country effect – taxa with significant interaction effect were excluded based on individual post-hoc analysis. Taxa had to have the same direction in both countries
ML models, trained and validated using out-of-sample bootstrap (500 resamplings):
o ENET
o kNN
o GBoost
o RF

Pairwise comparisons:

rPSC vs non-rPSC –by this comparison, we will find the effect of recurrence (whether the microbial composition is associated with it)

non-rPSC vs Healthy – by this comparison, we will find if the non-rPSC samples are „closer“ to the healthy samples then rPSC samples in terms of the microbial composition. In other words, does non-rPSC samples have „healthy microbiome“?

rPSC vs Healthy – by this comparison, we will find how much are the rPSC samples different from healthy samples.

However, the rPSC and non-rPSC samples are not different at all (see results). Therefore, we can do EXCLUSIVE LEFT JOIN of differentially abundant taxa from the analyses (rPSC vs Healthy, non-rPSC vs Healthy)

Importing libraries and custom functions built for this analysis

source("custom_functions.R")

1.2 Data Import

Importing ASV, taxa and metadata tables for both Czech and Norway samples.

Czech

path = "../../../../data/analysis_ready_data/ikem/"
asv_tab_ikem <- as.data.frame(fread(file.path(path,"asv_table_ikem.csv"),
                                    check.names = FALSE))
taxa_tab_ikem <- as.data.frame(fread(file.path(path,"taxa_table_ikem.csv"),
                                     check.names = FALSE))
metadata_ikem <- as.data.frame(fread(file.path(path,"metadata_ikem.csv"),
                                     check.names = FALSE))

Norway

path = "../../../../data/analysis_ready_data/norway/"
asv_tab_norway <- as.data.frame(fread(file.path(path,"asv_table_norway.csv"),
                                    check.names = FALSE))
taxa_tab_norway <- as.data.frame(fread(file.path(path,"taxa_table_norway.csv"),
                                    check.names = FALSE))
metadata_norway <- as.data.frame(fread(file.path(path,"metadata_norway.csv"),
                                    check.names = FALSE))

1.3 Merging data

Merging two countries to create whole dataset

asv_tab <- merge(asv_tab_ikem,asv_tab_norway,by="SeqID",all=TRUE)
taxa_tab <- merging_taxa_tables(taxa_tab_ikem,taxa_tab_norway)

Merging at ASV level

Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment

Merging two countries based on the different matrices - Ileum, Colon.

Terminal ileum

ileum_data <- merging_data(asv_tab_1=asv_tab_ikem,
                           asv_tab_2=asv_tab_norway,
                           taxa_tab_1=taxa_tab_ikem,
                           taxa_tab_2=taxa_tab_norway,
                           metadata_1=metadata_ikem,
                           metadata_2=metadata_norway,
                           segment="TI",Q="Q2")

Removing 1498 ASV(s)

Removing 1834 ASV(s)

Merging at ASV level

Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment

Removing 641 ASV(s)

ileum_asv_tab <- ileum_data[[1]]
ileum_taxa_tab <- ileum_data[[2]]
ileum_metadata <- ileum_data[[3]]

Colon

colon_data <- merging_data(asv_tab_1=asv_tab_ikem,
                           asv_tab_2=asv_tab_norway,
                           taxa_tab_1=taxa_tab_ikem,
                           taxa_tab_2=taxa_tab_norway,
                           metadata_1=metadata_ikem,
                           metadata_2=metadata_norway,
                           segment="colon",Q="Q2")

Removing 739 ASV(s)

Removing 266 ASV(s)

Merging at ASV level

Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment

Removing 1096 ASV(s)

colon_asv_tab <- colon_data[[1]]
colon_taxa_tab <- colon_data[[2]]
colon_metadata <- colon_data[[3]]

2 Data Analysis - Terminal ileum

segment="terminal_ileum"

2.1 Filtering

Rules:

sequencing depth > 10000
nearZeroVar() with default settings

Rarefaction Curve

path="../../../intermediate_files/rarecurves"
seq_depth_threshold <- 10000

ps <- construct_phyloseq(ileum_asv_tab,ileum_taxa_tab,ileum_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_ileum.Rdata"))

load(file.path(path,"rarefaction_ileum.Rdata"))

prare <- ggrarecurve(obj=rareres,
                      factorNames="Country",
                      indexNames=c("Observe")) + 
  theme_bw() +
  theme(axis.text=element_text(size=8), panel.grid=element_blank(),
        strip.background = element_rect(colour=NA,fill="grey"),
        strip.text.x = element_text(face="bold")) + 
  geom_vline(xintercept = seq_depth_threshold, 
             linetype="dashed", 
             color = "red") + 
  xlim(0, 20000)

The color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)

prare

Library size

read_counts(ileum_asv_tab, line = c(5000,10000))

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

2.1.1 Sequencing depth

data_filt <- seq_depth_filtering(ileum_asv_tab,
                                 ileum_taxa_tab,
                                 ileum_metadata,
                                 seq_depth_threshold = 10000)

Removing 104 ASV(s)

filt_ileum_asv_tab <- data_filt[[1]]; alpha_ileum_asv_tab <- filt_ileum_asv_tab
filt_ileum_taxa_tab <- data_filt[[2]]; alpha_ileum_taxa_tab <- filt_ileum_taxa_tab
filt_ileum_metadata <- data_filt[[3]]; alpha_ileum_metadata <- filt_ileum_metadata

seq_step <- dim(filt_ileum_asv_tab)[1]

Library size

read_counts(filt_ileum_asv_tab,line = c(5000,10000))

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

2.1.2 NearZeroVar

data_filt <- nearzerovar_filtering(filt_ileum_asv_tab, 
                                   filt_ileum_taxa_tab,
                                   filt_ileum_metadata)

filt_ileum_asv_tab <- data_filt[[1]]
filt_ileum_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_ileum_asv_tab)[1]

Library size

read_counts(filt_ileum_asv_tab,line = c(5000,10000))

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

2.1.3 Final Counts

final_counts_filtering(ileum_asv_tab,
                       filt_ileum_asv_tab,
                       filt_ileum_metadata,
                       seq_step, 0, nearzero_step) %>% `colnames<-`("Count")

                            Count
Raw data: ASVs               4491
Raw data: Samples             220
Sequencing depth filt: ASVs  4387
Prevalence filt: ASVs           0
NearZeroVar filt: ASVs        575
Filt data: ASVs               575
Filt data: Samples            211
Filt data: Patients           211
Filt data: Patients.1           0
Filtered samples                9
Matrices                       TI
healthy                        73
non-rPSC                      105
rPSC                           33
pre_ltx                         0
post_ltx                        0
ETOH                            0

2.2 Alpha diversity

path = "../../../results/merged_sites/main_analysis/Q2/alpha_diversity"

Calculation

# Construct MPSE object
alpha_ileum_metadata$Sample <- alpha_ileum_metadata$SampleID
ileum_mpse <- as.MPSE(construct_phyloseq(alpha_ileum_asv_tab,
                                         alpha_ileum_taxa_tab,
                                         alpha_ileum_metadata))

ileum_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)

# Calculate alpha diversity - rarefied counts
ileum_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)

alpha_div_plots <- list()

# preparing data frame
alpha_data <- data.frame(SampleID=ileum_mpse$Sample.x,
                         Observe=ileum_mpse$Observe,
                         Shannon=ileum_mpse$Shannon,
                         Simpson=ileum_mpse$Simpson,
                         Pielou=ileum_mpse$Pielou,
                         Group=ileum_mpse$Group,
                         Country=ileum_mpse$Country,
                         Patient=ileum_mpse$Patient)

write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
          row.names = FALSE)

2.2.1 Plots

p_boxplot_alpha <- alpha_diversity_countries(alpha_data)

Using SampleID, Group, Country, Patient as id variables

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Using SampleID, Group, Country, Patient as id variables

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha

# see the results
p_boxplot_alpha

2.2.2 Linear models

path = "../../../results/merged_sites/main_analysis/Q2/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))

Richness

results_model <- pairwise.lm(formula = "Observe ~ Group * Country",
                             factors=alpha_data$Group,
                             data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_observe <- results_model[[1]]
  results_model_observe_emeans <- results_model[[2]]
} else {
  results_model_observe <- results_model
  results_model_observe_emeans <- NA
}

# save the results
pc_observed <- list(); 
pc_observed[[segment]] <- results_model_observe

# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")

Raw results of linear model of richness estimation.
	Estimate	Std..Error	t.value	Pr…t..	p.adj	sig
non-rPSC vs GrouprPSC	-12.199	11.972	-1.019	0.310	0.349
non-rPSC , rPSC - CZ vs NO	-21.486	11.452	-1.876	0.063	0.141
non-rPSC vs GrouprPSC:CountryNO	9.347	23.088	0.405	0.686	0.686
healthy vs GrouprPSC	-37.741	12.284	-3.072	0.003	0.025	*
healthy , rPSC - CZ vs NO	11.203	10.973	1.021	0.310	0.349
healthy vs GrouprPSC:CountryNO	-23.342	21.355	-1.093	0.277	0.349
healthy vs Groupnon-rPSC	-25.542	9.322	-2.740	0.007	0.031	*
healthy , non-rPSC - CZ vs NO	11.203	10.933	1.025	0.307	0.349
healthy vs Groupnon-rPSC:CountryNO	-32.689	15.108	-2.164	0.032	0.096

knitr::kable(results_model_observe_emeans,digits = 3,
caption = "Raw results of independent country analysis")

Raw results of independent country analysis
contrast	Country	estimate	SE	df	t.ratio	p.value
(non-rPSC) - healthy	CZ	-25.542	9.322	174	-2.740	0.007
(non-rPSC) - healthy	NO	-58.231	11.889	174	-4.898	0.000

Shannon

results_model <- pairwise.lm(formula = "Shannon ~ Group * Country",
                             factors=alpha_data$Group,
                             data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_shannon <- results_model[[1]]
  results_model_shannon_emeans <- results_model[[2]]
} else {
  results_model_shannon <- results_model
  results_model_shannon_emeans <- NA
}

# save the results
pc_shannon <- list(); 
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)

# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")

Raw results of linear model of Shannon estimation.
	Estimate	Std..Error	t.value	Pr…t..	p.adj
non-rPSC vs GrouprPSC	-0.102	0.146	-0.696	0.487	0.731
non-rPSC , rPSC - CZ vs NO	-0.349	0.140	-2.500	0.014	0.123
non-rPSC vs GrouprPSC:CountryNO	-0.006	0.281	-0.021	0.983	0.983
healthy vs GrouprPSC	-0.210	0.140	-1.501	0.137	0.321
healthy , rPSC - CZ vs NO	0.005	0.125	0.043	0.966	0.983
healthy vs GrouprPSC:CountryNO	-0.360	0.244	-1.478	0.142	0.321
healthy vs Groupnon-rPSC	-0.109	0.115	-0.948	0.344	0.620
healthy , non-rPSC - CZ vs NO	0.005	0.135	0.040	0.968	0.983
healthy vs Groupnon-rPSC:CountryNO	-0.354	0.186	-1.905	0.058	0.263

knitr::kable(results_model_shannon_emeans,digits = 3,
caption = "Raw results of independent country analysis")

Raw results of independent country analysis
contrast	Country	estimate	SE	df	t.ratio	p.value
(non-rPSC) - healthy	CZ	-0.109	0.115	174	-0.948	0.344
(non-rPSC) - healthy	NO	-0.463	0.146	174	-3.164	0.002

Simpson

results_model <- pairwise.lm(formula = "Simpson ~ Group * Country",
                                     factors=alpha_data$Group,
                                     data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_simpson <- results_model[[1]]
  results_model_simpson_emeans <- results_model[[2]]
} else {
  results_model_simpson <- results_model
  results_model_simpson_emeans <- NA
}

# save the results
pc_simpson <- list(); 
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)

# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")

Raw results of linear model of Simpson estimation.
	Estimate	Std..Error	t.value	Pr…t..	p.adj
non-rPSC vs GrouprPSC	0.000	0.023	0.019	0.985	0.985
non-rPSC , rPSC - CZ vs NO	-0.034	0.022	-1.573	0.118	0.531
non-rPSC vs GrouprPSC:CountryNO	-0.029	0.043	-0.661	0.510	0.767
healthy vs GrouprPSC	-0.009	0.018	-0.500	0.618	0.767
healthy , rPSC - CZ vs NO	-0.008	0.016	-0.501	0.617	0.767
healthy vs GrouprPSC:CountryNO	-0.055	0.031	-1.779	0.078	0.531
healthy vs Groupnon-rPSC	-0.009	0.016	-0.564	0.574	0.767
healthy , non-rPSC - CZ vs NO	-0.008	0.019	-0.410	0.682	0.767
healthy vs Groupnon-rPSC:CountryNO	-0.026	0.027	-0.975	0.331	0.767

knitr::kable(results_model_simpson_emeans,digits = 3,
caption = "Raw results of independent country analysis")

Raw results of independent country analysis
contrast	Country	estimate	SE	df	t.ratio	p.value
rPSC - healthy	CZ	-0.009	0.018	102	-0.500	0.618
rPSC - healthy	NO	-0.064	0.025	102	-2.526	0.013

Pielou

results_model <- pairwise.lm(formula = "Pielou ~ Group * Country",
                                     factors=alpha_data$Group,
                                     data=alpha_data)

# check interaction

if (!is.data.frame(results_model)){
  results_model_pielou <- results_model[[1]]
  results_model_pielou_emeans <- results_model[[2]]
} else {
  results_model_pielou <- results_model
  results_model_pielou_emeans <- NA
}

# save the results
pc_pielou <- list(); 
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)

# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")

Raw results of linear model of Pielou estimation.
	Estimate	Std..Error	t.value	Pr…t..	p.adj
non-rPSC vs GrouprPSC	-0.004	0.022	-0.184	0.854	0.946
non-rPSC , rPSC - CZ vs NO	-0.047	0.021	-2.260	0.025	0.229
non-rPSC vs GrouprPSC:CountryNO	-0.005	0.042	-0.108	0.914	0.946
healthy vs GrouprPSC	-0.001	0.021	-0.067	0.946	0.946
healthy , rPSC - CZ vs NO	-0.009	0.018	-0.463	0.645	0.946
healthy vs GrouprPSC:CountryNO	-0.043	0.036	-1.195	0.235	0.705
healthy vs Groupnon-rPSC	0.003	0.018	0.146	0.884	0.946
healthy , non-rPSC - CZ vs NO	-0.009	0.021	-0.409	0.683	0.946
healthy vs Groupnon-rPSC:CountryNO	-0.038	0.029	-1.329	0.185	0.705

knitr::kable(results_model_pielou_emeans,digits = 3,
caption = "Raw results of independent country analysis")

Table: Raw results of independent country analysis

2.2.3 Saving results

alpha_list <- list(
  Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
  Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
  Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
  Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
                   
write.xlsx(alpha_list, 
           file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))

2.3 Beta diversity

Calculating Aitchison distance (euclidean distance on clr-transformed data), both at ASV and genus level.

2.3.1 Main analysis

Genus level, Aitchison distance

level="genus"
path = "../../../results/merged_sites/main_analysis/Q2/beta_diversity"

pairwise_aitchison_raw <- list()
pca_plots_list <- list()

Aggregation, filtering

# Aggregation
genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level=level,
                             names=TRUE)
# Filtration
filt_data <- filtering_steps(genus_data[[1]],
                             genus_data[[2]],
                             ileum_metadata,
                             seq_depth_threshold=10000)

Removing 2 ASV(s)

filt_ileum_genus_tab <- filt_data[[1]]
filt_ileum_genus_taxa <- filt_data[[2]]
filt_ileum_metadata <- filt_data[[3]]

2.3.1.0.1 PERMANOVA

pairwise_df <- filt_ileum_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,
                           filt_ileum_metadata$Group,
                           covariate = filt_ileum_metadata$Country, sim.method = "robust.aitchison", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_ileum_metadata$Group,covariate = filt_ileum_metadata$Country, interaction = TRUE, sim.method = "robust.aitchison", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <- rbind(pp_factor,pp_cov,pp_fac.cov)

# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")

PERMANOVA, GROUP separation
pairs	Df	SumsOfSqs	F.Model	R2	p.value	p.adjusted	sig
rPSC vs non-rPSC	1	215.579	1.208	0.009	0.107	0.107
rPSC vs healthy	1	561.696	3.345	0.030	0.001	0.002	**
non-rPSC vs healthy	1	760.242	4.431	0.024	0.001	0.002	**

knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")

PERMANOVA, COUNTRY separation
pairs	Df	SumsOfSqs	F.Model	R2	p.value	p.adjusted	sig
rPSC vs non-rPSC , Country	1	601.929	3.372	0.024	0.001	0.001	***
rPSC vs healthy , Country	1	496.841	2.959	0.027	0.001	0.001	***
non-rPSC vs healthy , Country	1	624.474	3.640	0.020	0.001	0.001	***

knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")

PERMANOVA, INTERACTION GROUP:Country
pairs	Df	SumsOfSqs	F.Model	R2	p.value	p.adjusted
rPSC vs non-rPSC : Country	1	156.176	0.874	0.006	0.788	0.788
rPSC vs healthy : Country	1	164.743	0.981	0.009	0.464	0.696
non-rPSC vs healthy : Country	1	209.954	1.225	0.007	0.089	0.267

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

if (length(interaction_sig)>0){
  for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_ileum_metadata$Group,
                      covariate = filt_ileum_metadata$Country, 
                      group1 = group1,
                      group2 = group2)
  print(result_list)
}
}

2.3.1.0.2 Plots

p <- pca_plot_custom(filt_ileum_genus_tab,
                                 filt_ileum_genus_taxa,
                                 filt_ileum_metadata,
                                 show_boxplots = TRUE,
                                 variable = "Group", size=3, show_legend=FALSE)

# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p

# see the results
p

2.3.1.1 Saving results

write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]], 
           file = file.path(path,
           paste0("beta_diversity_results_", segment,".xlsx")))

2.4 Univariate Analysis

2.4.1 Main analysis

level="genus"

# needed paths
path = "../../../results/merged_sites/main_analysis/Q2/univariate_analysis"
path_maaslin=file.path("../../../intermediate_files/maaslin/Q2",level)

# variables
raw_linda_results_genus <- list();
raw_linda_results_genus[[segment]] <- list()
linda_results_genus <- list(); 
linda_results_genus[[segment]] <- list()

# country and interaction problems
list_country_union <- list()
list_intersections <- list()
list_venns <- list()
uni_statistics <- list()

# workbook for final df
wb <- createWorkbook()

# rPSC effect
rpsc_effect <- list()

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

ileum_genus_asv_taxa_tab <- create_asv_taxa_table(ileum_genus_tab,
                                                  ileum_genus_taxa_tab)

2.4.1.1 rPSC vs non-rPSC

group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

2.4.1.1.1 linDA

taxa_to_test <- read.xlsx("../../../results/merged_sites/main_analysis/Q1/univariate_analysis/significant_taxa_terminal_ileum.xlsx", sheet=" genus healthy vs pre_ltx")$SeqID

ileum_genus_tab_2 <- ileum_genus_tab %>% dplyr::filter(SeqID %in% taxa_to_test)
ileum_genus_taxa_tab_2 <- ileum_genus_taxa_tab %>% dplyr::filter(SeqID %in% taxa_to_test)

# prepare the data
linda_data <- binomial_prep(ileum_genus_tab_2,
                            ileum_genus_taxa_tab_2,
                            ileum_metadata,
                            group, usage="linDA", filtering = FALSE)

filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')

0  features are filtered!
The filtered data has  56  samples and  25  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano

2.4.1.1.2 MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("NO vs CZ")

volcano <- ggarrange(volcano1,volcano2, ncol=2)

volcano

2.4.1.1.3 Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

2.4.1.1.4 Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

NULL

## results for czech cohort
list_interaction_significant[[2]]

[1] NA

## results for norwegian cohort
list_interaction_significant[[3]]

[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

2.4.1.1.5 Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

[1] "non-rPSC"
[1] "rPSC"
[1] "non-rPSC"
[1] "rPSC"

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

2.4.1.1.6 Prevalence testing

uni_df$prevalence_diff <- abs(uni_df$`PREVALENCE_ rPSC` - uni_df$`PREVALENCE_ non-rPSC`)

#uni_df %<>% dplyr::filter(prevalence_diff >=0.2)
#filt_colon_uni_data <- filt_colon_uni_data[uni_df$SeqID,]
filt_ileum_uni_data <- ifelse(filt_ileum_uni_data>0,"YES","NO")
filt_ileum_uni_data %<>% t() %>% as.data.frame()
#filt_colon_uni_taxa %<>% dplyr::filter(SeqID %in% uni_df$SeqID)

filt_ileum_uni_data <- merge(filt_ileum_uni_data %>% rownames_to_column("SampleID"),filt_ileum_metadata,by="SampleID",all.x=TRUE) %>% column_to_rownames("SampleID")

p_values <- c()
for (taxon in uni_df$SeqID){
data <- filt_ileum_uni_data %>%
  group_by(!!sym(taxon), Group) %>%
  count() %>%
  group_by(!!sym(taxon), Group) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  pivot_wider(names_from = Group, values_from = n) %>%
  column_to_rownames(taxon) %>%
  as.matrix()

if (nrow(data)>1){
  data[is.na(data)] <- 0
chi_res <- chisq.test(data)
chi_res$expected
#p_values <- c(p_values,chi_res$p.value)

test <- fisher.test(data)
test$p.value
p_values <- c(p_values,test$p.value)
} else p_values <- c(p_values,NA)

}

Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect

p_values_adjusted <- p.adjust(p_values,method = "BH")
names(p_values_adjusted) <- uni_df$SeqID
names(p_values) <- uni_df$SeqID
p_values_adjusted[which(p_values_adjusted<0.05)]

named numeric(0)

asv_tab <- as.data.frame(rrarefy(
    linda_data[[1]] %>% t(), 
    sample=10000)) %>% t() %>% as.data.frame()

asv_tab <- ifelse(asv_tab>10,"YES","NO")
asv_tab %<>% t() %>% as.data.frame()

asv_tab <- merge(asv_tab %>% rownames_to_column("SampleID"),filt_ileum_metadata,by="SampleID",all.x=TRUE) %>% column_to_rownames("SampleID")

p_values <- c()
for (taxon in uni_df$SeqID){
data <- asv_tab %>%
  group_by(!!sym(taxon), Group) %>%
  count() %>%
  group_by(!!sym(taxon), Group) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  pivot_wider(names_from = Group, values_from = n) %>%
  column_to_rownames(taxon) %>%
  as.matrix()

if (nrow(data)>1){
data[is.na(data)] <- 0
chi_res <- chisq.test(data)
chi_res$expected
test <- fisher.test(data)
test$p.value
chi_res$p.value
p_values <- c(p_values,test$p.value)
} else p_values <- c(p_values,NA)
}

Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect

p_values_adjusted <- p.adjust(p_values,method = "BH")
names(p_values_adjusted) <- uni_df$SeqID
names(p_values) <- uni_df$SeqID
p_values_adjusted[which(p_values_adjusted<0.05)]

named numeric(0)

df <- data.frame(SeqID=uni_df$SeqID,
                 p=p_values,
                 p_adj=p_values_adjusted)

prevalence_df <- uni_df[,c("SeqID","PREVALENCE_ non-rPSC","PREVALENCE_ rPSC")]
prevalence_df <- merge(prevalence_df,df,by="SeqID",all=TRUE)
write.xlsx(prevalence_df,file.path(path,paste0("supplements_prevalence_",segment,".xlsx")))

2.4.1.2 rPSC vs healthy

group <- c("healthy","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

2.4.1.2.1 linDA

# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")

Removing 67 ASV(s)

Removing 4 ASV(s)

filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')

0  features are filtered!
The filtered data has  106  samples and  207  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

2.4.1.2.2 MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("NO vs CZ")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

2.4.1.2.3 Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

2.4.1.2.4 Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

NULL

## results for czech cohort
list_interaction_significant[[2]]

[1] NA

## results for norwegian cohort
list_interaction_significant[[3]]

[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

2.4.1.2.5 Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

[1] "healthy"
[1] "rPSC"
[1] "healthy"
[1] "rPSC"

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

2.4.1.3 non-rPSC vs healthy

group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

2.4.1.3.1 linDA

# prepare the data
linda_data <- binomial_prep(ileum_genus_tab,
                            ileum_genus_taxa_tab,
                            ileum_metadata,
                            group, usage="linDA")

Removing 24 ASV(s)

Removing 2 ASV(s)

filt_ileum_uni_data <- linda_data[[1]]
filt_ileum_uni_taxa <- linda_data[[2]]
filt_ileum_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_ileum_uni_data, 
                   filt_ileum_uni_metadata, 
                   formula = '~ Group * Country')

0  features are filtered!
The filtered data has  178  samples and  179  features will be tested!
Imputation approach is used.
Fit linear models ...
Completed.

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_ileum_uni_data,
                filt_ileum_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_ileum_uni_data,
             filt_ileum_uni_taxa)
}

# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, 
                                group1, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_ileum_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_ileum_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)
volcano

2.4.1.3.2 MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa) + 
            ggtitle(comparison_title)

volcano2 <- volcano_plot_maaslin(fit_data,filt_ileum_uni_taxa,variable="Country") + 
            ggtitle("NO vs CZ")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

2.4.1.3.3 Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

2.4.1.3.4 Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_ileum_uni_data,
                                          filt_ileum_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

NULL

## results for czech cohort
list_interaction_significant[[2]]

[1] NA

## results for norwegian cohort
list_interaction_significant[[3]]

[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

2.4.1.3.5 Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

[1] "healthy"
[1] "non-rPSC"
[1] "healthy"
[1] "non-rPSC"

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

2.4.1.4 Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,ileum_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(
  list_heatmap,                               uni_statistics$terminal_ileum[grepl(level,names(uni_statistics$terminal_ileum))],
                                      ileum_taxa_tab) + xlab("") + ylab("")

min_clr -1.727234 
max_clr 7.029952 
min_log -4.215042 
max_log 3.331439

dotheatmap_linda

Horizontal bar plot

p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))

Using SeqID as id variables

p_prevalence_final <- ggarrange(p_prevalence,
                                ggplot() + theme_minimal(),
                                nrow = 2,heights = c(1,0.085))

p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),p_prevalence_final,ncol=2,widths = c(1,0.3))

p

dot_heatmap_ileum <- p

2.4.1.5 rPSC effect

pre_LTx vs Healthy and Post_LTx vs Healthy intersection

A <- list_intersections[[paste(segment,level,"healthy vs rPSC")]]
B <- list_intersections[[paste(segment,level,"healthy vs non-rPSC")]]
df <- A[!(A$SeqID %in% B$SeqID),]


rpsc_effect[[paste(segment,level)]] <- df
  
# see the results
rpsc_effect[[paste(segment,level)]]

                            SeqID
6                     Collinsella
8                   Adlercreutzia
15                    Bacteroides
16                    Barnesiella
29                Parabacteroides
32                      Bilophila
41                     Holdemania
75   Lachnospiraceae_ND3007_group
76  Lachnospiraceae_NK4A136_group
84                     Sellimonas
99                 Butyricicoccus
134      Family_XIII_AD3011_group
155                   Pseudomonas
169            Lacticaseibacillus
188               Veillonellaceae
199                    Klebsiella
                                                                                                                          Taxonomy
6                        k__Bacteria;p__Actinobacteriota;c__Coriobacteriia;o__Coriobacteriales;f__Coriobacteriaceae;g__Collinsella
8                        k__Bacteria;p__Actinobacteriota;c__Coriobacteriia;o__Coriobacteriales;f__Eggerthellaceae;g__Adlercreutzia
15                                    k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Bacteroidaceae;g__Bacteroides
16                                   k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Barnesiellaceae;g__Barnesiella
29                                k__Bacteria;p__Bacteroidota;c__Bacteroidia;o__Bacteroidales;f__Tannerellaceae;g__Parabacteroides
32                   k__Bacteria;p__Desulfobacterota;c__Desulfovibrionia;o__Desulfovibrionales;f__Desulfovibrionaceae;g__Bilophila
41                                 k__Bacteria;p__Firmicutes;c__Bacilli;o__Erysipelotrichales;f__Erysipelotrichaceae;g__Holdemania
75                    k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_ND3007_group
76                   k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Lachnospiraceae_NK4A136_group
84                                      k__Bacteria;p__Firmicutes;c__Clostridia;o__Lachnospirales;f__Lachnospiraceae;g__Sellimonas
99                               k__Bacteria;p__Firmicutes;c__Clostridia;o__Oscillospirales;f__Butyricicoccaceae;g__Butyricicoccus
134 k__Bacteria;p__Firmicutes;c__Clostridia;o__Peptostreptococcales-Tissierellales;f__Anaerovoracaceae;g__Family_XIII_AD3011_group
155                     k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Pseudomonadales;f__Pseudomonadaceae;g__Pseudomonas
169                              k__Bacteria;p__Firmicutes;c__Bacilli;o__Lactobacillales;f__Lactobacillaceae;g__Lacticaseibacillus
188             k__Bacteria;p__Firmicutes;c__Negativicutes;o__Veillonellales-Selenomonadales;f__Veillonellaceae;g__Veillonellaceae
199                   k__Bacteria;p__Proteobacteria;c__Gammaproteobacteria;o__Enterobacterales;f__Enterobacteriaceae;g__Klebsiella
    log2FoldChange      p_value         padj
6        -2.349468 1.092155e-02 0.0443286425
8        -1.638315 7.387504e-03 0.0332437679
15       -2.119974 3.932211e-04 0.0040698386
16       -4.215042 1.576685e-05 0.0004079671
29       -3.049283 2.871549e-05 0.0005403732
32       -2.837220 4.305534e-05 0.0007427046
41       -1.686545 1.351704e-03 0.0099929559
75       -2.216009 4.785012e-03 0.0260657213
76       -2.596492 1.644208e-04 0.0020100837
84        1.773614 1.044921e-02 0.0441425798
99       -1.751985 3.166657e-03 0.0204843095
134      -2.011614 1.298386e-04 0.0019186055
155       1.025212 8.916234e-03 0.0384512596
169       1.281555 1.747899e-04 0.0020100837
188       1.244317 1.665793e-04 0.0020100837
199       2.461632 4.570063e-03 0.0260223513

2.4.1.6 Saving results

# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
             overwrite = TRUE)

# PSC effect
write.xlsx(rpsc_effect[[paste(segment,level)]],file.path(path,paste0("rpsc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa

write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("significant_taxa_",segment,".xlsx")))

2.5 Machine learning

path = "../../../results/merged_sites/main_analysis/Q2/models"

2.5.1 ElasticNet

model="enet"

2.5.1.1 Genus level

level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level = level)

ileum_genus_tab <- genus_data[[1]]
ileum_genus_taxa_tab <- genus_data[[2]]

2.5.1.1.1 rPSC vs non-rPSC - all

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep(ileum_genus_tab,
                                     ileum_genus_taxa_tab,
                                     ileum_metadata,
                                     group, 
                                     usage="ml_clr")

Removing 55 ASV(s)

Removing 2 ASV(s)

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=500,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ <- list()
models_cm <- list()
roc_cs <- list()
betas <- list()

models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()

                                     [,1]
alpha                           0.4000000
lambda                          0.3119645
auc                             0.6432900
auc_czech                       0.6567842
auc_no                          0.6069959
auc_optimism_corrected          0.5736934
auc_optimism_corrected_CIL      0.3905857
auc_optimism_corrected_CIU      0.7322617
accuracy                        0.7608696
accuracy_czech                        NaN
accuracy_no                     0.7500000
accuracy_optimism_corrected     0.7353858
accuracy_optimism_corrected_CIL 0.6288683
accuracy_optimism_corrected_CIU 0.8277572

enet_model$conf_matrices

$original
    0  
0 105 0
1  33 0

$czech
   0  
0 78 0
1 24 0

$no
   0  
0 27 0
1  9 0

enet_model$plot

`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

roc_c

2.5.1.1.2 rPSC vs non-rPSC - subset

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2]," subset")

taxa_to_test <- read.xlsx("../../../results/merged_sites/main_analysis/Q1/univariate_analysis/significant_taxa_terminal_ileum.xlsx", sheet=" genus healthy vs pre_ltx")

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_ileum_uni_data <- binomial_prep_psc_effect(ileum_genus_tab,
                                                ileum_genus_taxa_tab, 
                                                ileum_metadata,
                                                taxa_to_test,
                                                usage="ml_clr")

Removing 55 ASV(s)

Removing 2 ASV(s)

# fit the model
enet_model <- glmnet_binomial(filt_ileum_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=500,
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()

                                     [,1]
alpha                           0.2000000
lambda                          0.5017248
auc                             0.5160173
auc_czech                       0.5224359
auc_no                          0.5000000
auc_optimism_corrected          0.5599297
auc_optimism_corrected_CIL      0.3863553
auc_optimism_corrected_CIU      0.7232201
accuracy                        0.7608696
accuracy_czech                        NaN
accuracy_no                     0.7500000
accuracy_optimism_corrected     0.7539422
accuracy_optimism_corrected_CIL 0.6525954
accuracy_optimism_corrected_CIU 0.8442308

enet_model$conf_matrices

$original
    0  
0 105 0
1  33 0

$czech
   0  
0 78 0
1 24 0

$no
   0  
0 27 0
1  9 0

enet_model$plot

`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

roc_c

2.5.1.2 Saving results

models_summ_df_ileum <- do.call(rbind, 
  models_summ[grep(segment,names(models_summ),value = TRUE)])

write.csv(models_summ_df_ileum,file.path(path,paste0("elastic_net_",segment,".csv")))

2.6 Results overview

2.6.0.1 Alpha diversity

Models

knitr::kable(pc_observed[[segment]],
             digits = 3,
             caption = "Results of linear model testing ASV Richness")

Results of linear model testing ASV Richness
	Estimate	Std..Error	t.value	Pr…t..	p.adj	sig
non-rPSC vs GrouprPSC	-12.199	11.972	-1.019	0.310	0.349
non-rPSC , rPSC - CZ vs NO	-21.486	11.452	-1.876	0.063	0.141
non-rPSC vs GrouprPSC:CountryNO	9.347	23.088	0.405	0.686	0.686
healthy vs GrouprPSC	-37.741	12.284	-3.072	0.003	0.025	*
healthy , rPSC - CZ vs NO	11.203	10.973	1.021	0.310	0.349
healthy vs GrouprPSC:CountryNO	-23.342	21.355	-1.093	0.277	0.349
healthy vs Groupnon-rPSC	-25.542	9.322	-2.740	0.007	0.031	*
healthy , non-rPSC - CZ vs NO	11.203	10.933	1.025	0.307	0.349
healthy vs Groupnon-rPSC:CountryNO	-32.689	15.108	-2.164	0.032	0.096

knitr::kable(pc_shannon[[segment]],
             digits = 3,
             caption = "Results of linear model testing Shannon index")

Results of linear model testing Shannon index
	Estimate	Std..Error	t.value	Pr…t..	p.adj
non-rPSC vs GrouprPSC	-0.102	0.146	-0.696	0.487	0.731
non-rPSC , rPSC - CZ vs NO	-0.349	0.140	-2.500	0.014	0.123
non-rPSC vs GrouprPSC:CountryNO	-0.006	0.281	-0.021	0.983	0.983
healthy vs GrouprPSC	-0.210	0.140	-1.501	0.137	0.321
healthy , rPSC - CZ vs NO	0.005	0.125	0.043	0.966	0.983
healthy vs GrouprPSC:CountryNO	-0.360	0.244	-1.478	0.142	0.321
healthy vs Groupnon-rPSC	-0.109	0.115	-0.948	0.344	0.620
healthy , non-rPSC - CZ vs NO	0.005	0.135	0.040	0.968	0.983
healthy vs Groupnon-rPSC:CountryNO	-0.354	0.186	-1.905	0.058	0.263

knitr::kable(pc_simpson[[segment]],
             digits = 3,
             caption = "Results of linear model testing Simpson index")

Results of linear model testing Simpson index
	Estimate	Std..Error	t.value	Pr…t..	p.adj
non-rPSC vs GrouprPSC	0.000	0.023	0.019	0.985	0.985
non-rPSC , rPSC - CZ vs NO	-0.034	0.022	-1.573	0.118	0.531
non-rPSC vs GrouprPSC:CountryNO	-0.029	0.043	-0.661	0.510	0.767
healthy vs GrouprPSC	-0.009	0.018	-0.500	0.618	0.767
healthy , rPSC - CZ vs NO	-0.008	0.016	-0.501	0.617	0.767
healthy vs GrouprPSC:CountryNO	-0.055	0.031	-1.779	0.078	0.531
healthy vs Groupnon-rPSC	-0.009	0.016	-0.564	0.574	0.767
healthy , non-rPSC - CZ vs NO	-0.008	0.019	-0.410	0.682	0.767
healthy vs Groupnon-rPSC:CountryNO	-0.026	0.027	-0.975	0.331	0.767

knitr::kable(pc_pielou[[segment]],
             digits = 3,
             caption = "Results of linear model testing Pielou index")

Results of linear model testing Pielou index
	Estimate	Std..Error	t.value	Pr…t..	p.adj
non-rPSC vs GrouprPSC	-0.004	0.022	-0.184	0.854	0.946
non-rPSC , rPSC - CZ vs NO	-0.047	0.021	-2.260	0.025	0.229
non-rPSC vs GrouprPSC:CountryNO	-0.005	0.042	-0.108	0.914	0.946
healthy vs GrouprPSC	-0.001	0.021	-0.067	0.946	0.946
healthy , rPSC - CZ vs NO	-0.009	0.018	-0.463	0.645	0.946
healthy vs GrouprPSC:CountryNO	-0.043	0.036	-1.195	0.235	0.705
healthy vs Groupnon-rPSC	0.003	0.018	0.146	0.884	0.946
healthy , non-rPSC - CZ vs NO	-0.009	0.021	-0.409	0.683	0.946
healthy vs Groupnon-rPSC:CountryNO	-0.038	0.029	-1.329	0.185	0.705

Plots

alpha_div_plots[[paste(segment,"Country")]]

2.6.0.2 Beta diversity

Main results

knitr::kable(pairwise_aitchison_raw[[paste("genus", segment)]],
             digits = 3,
             caption = "Results of PERMANOVA - robust aitchison distance")

Results of PERMANOVA - robust aitchison distance
pairs	Df	SumsOfSqs	F.Model	R2	p.value	p.adjusted	sig
rPSC vs non-rPSC	1	215.579	1.208	0.009	0.107	0.107
rPSC vs healthy	1	561.696	3.345	0.030	0.001	0.002	**
non-rPSC vs healthy	1	760.242	4.431	0.024	0.001	0.002	**
rPSC vs non-rPSC , Country	1	601.929	3.372	0.024	0.001	0.001	***
rPSC vs healthy , Country	1	496.841	2.959	0.027	0.001	0.001	***
non-rPSC vs healthy , Country	1	624.474	3.640	0.020	0.001	0.001	***
rPSC vs non-rPSC : Country	1	156.176	0.874	0.006	0.788	0.788
rPSC vs healthy : Country	1	164.743	0.981	0.009	0.464	0.696
non-rPSC vs healthy : Country	1	209.954	1.225	0.007	0.089	0.267

PCA

pca_plots_list[[paste(segment,"genus custom")]]

2.6.0.3 Univariate analysis

Number of significant taxa

knitr::kable(cbind(as.data.frame(lapply(list_intersections,nrow)),
      as.data.frame(lapply(rpsc_effect,nrow))) %>% t() %>% 
  `colnames<-`("Count") %>% 
  `rownames<-`(c(names(list_intersections),"PSC effect Genus")),caption="Number of significant taxa")

Number of significant taxa
	Count
terminal_ileum genus non-rPSC vs rPSC	0
terminal_ileum genus healthy vs rPSC	34
terminal_ileum genus healthy vs non-rPSC	25
PSC effect Genus	16

2.6.0.4 Machine learning

Main models

knitr::kable(models_summ_df_ileum %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
             digits=2,caption="Elastic net results")

Elastic net results
	alpha	lambda	auc_optimism_corrected	auc_optimism_corrected_CIL	auc_optimism_corrected_CIU
rPSC vs non-rPSC genus terminal_ileum	0.4	0.31	0.57	0.39	0.73
rPSC vs non-rPSC subset genus terminal_ileum	0.2	0.50	0.56	0.39	0.72

ROC - Genus level

p <- roc_curve_all_custom(roc_cs[c(1:2)],Q="Q2",
                     model_name="enet_model",legend = FALSE)

p

3 Data Analysis - Colon

segment="colon"

3.1 Filtering

Rules: - sequencing depth > 10000

nearZeroVar() with default settings

Rarefaction Curve

path="../../../intermediate_files/rarecurves"
seq_depth_threshold <- 10000

ps <- construct_phyloseq(colon_asv_tab,colon_taxa_tab,colon_metadata)
rareres <- get_rarecurve(obj=ps, chunks=500)
save(rareres,file = file.path(path,"rarefaction_colon.Rdata"))

load(file.path(path,"rarefaction_colon.Rdata"))
seq_depth_threshold <- 10000
prare <- ggrarecurve(obj=rareres,
                      factorNames="Country",
                      indexNames=c("Observe")) + 
        theme_bw()+
        theme(axis.text=element_text(size=8), 
              panel.grid=element_blank(),
              strip.background = element_rect(colour=NA,fill="grey"),
              strip.text.x = element_text(face="bold")) + 
        geom_vline(xintercept = seq_depth_threshold, 
                   linetype="dashed", color = "red") + 
        xlim(0, 20000)

The color has been set automatically, you can reset it manually by adding scale_color_manual(values=yourcolors)

prare

Library size

read_counts(colon_asv_tab, line = c(5000,10000))

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

3.1.1 Sequencing depth

data_filt <- seq_depth_filtering(colon_asv_tab,
                                 colon_taxa_tab,
                                 colon_metadata,
                                 seq_depth_threshold = 10000)

Removing 50 ASV(s)

filt_colon_asv_tab <- data_filt[[1]]; alpha_colon_asv_tab <- filt_colon_asv_tab
filt_colon_taxa_tab <- data_filt[[2]]; alpha_colon_taxa_tab <- filt_colon_taxa_tab
filt_colon_metadata <- data_filt[[3]]; alpha_colon_metadata <- filt_colon_metadata

seq_step <- dim(filt_colon_asv_tab)[1]

Library size

read_counts(filt_colon_asv_tab,line = c(10000))

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

3.1.2 NearZeroVar

data_filt <- nearzerovar_filtering(filt_colon_asv_tab,
                                   filt_colon_taxa_tab,
                                   filt_colon_metadata)

filt_colon_asv_tab <- data_filt[[1]]
filt_colon_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_colon_asv_tab)[1]

Library size

read_counts(filt_colon_asv_tab,line = c(5000,10000))

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

Check zero depth

data_filt <- check_zero_depth(filt_colon_asv_tab, 
                              filt_colon_taxa_tab, 
                              filt_colon_metadata)

filt_colon_asv_tab <- data_filt[[1]]; 
filt_colon_taxa_tab <- data_filt[[2]]; 
filt_colon_metadata <- data_filt[[3]];

Library size

read_counts(filt_colon_asv_tab,line = c(5000,10000))

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

3.1.3 Final Counts

final_counts_filtering(colon_asv_tab,
                       filt_colon_asv_tab,
                       filt_colon_metadata,
                       seq_step, 0, nearzero_step)

                                               V1
Raw data: ASVs                               5840
Raw data: Samples                             527
Sequencing depth filt: ASVs                  5790
Prevalence filt: ASVs                           0
NearZeroVar filt: ASVs                        444
Filt data: ASVs                               444
Filt data: Samples                            511
Filt data: Patients                           250
Filt data: Patients.1                           0
Filtered samples                               16
Matrices                    Cecum;Rectum;CD;CA;SI
healthy                                       161
non-rPSC                                      263
rPSC                                           87
pre_ltx                                         0
post_ltx                                        0
ETOH                                            0

3.2 Alpha diversity

path = "../../../results/merged_sites/main_analysis/Q2/alpha_diversity"

Calculation

# Construct MPSE object
alpha_colon_metadata$Sample <- alpha_colon_metadata$SampleID
colon_mpse <- as.MPSE(construct_phyloseq(alpha_colon_asv_tab,
                                         alpha_colon_taxa_tab,
                                         alpha_colon_metadata))

colon_mpse %<>% mp_rrarefy(raresize = 10000,seed = 123)

# Calculate alpha diversity - rarefied counts
colon_mpse %<>% mp_cal_alpha(.abundance=RareAbundance, force=TRUE)

alpha_data <- data.frame(SampleID=colon_mpse$Sample.x,
                         Observe=colon_mpse$Observe,
                         Shannon=colon_mpse$Shannon,
                         Simpson=colon_mpse$Simpson,
                         Pielou=colon_mpse$Pielou,
                         Group=colon_mpse$Group,
                         Country=colon_mpse$Country,
                         Patient=colon_mpse$Patient)

write.csv(alpha_data,file.path(path,paste0("alpha_indices_",segment,".csv")),
          row.names = FALSE)

3.2.1 Plots

p_boxplot_alpha <- alpha_diversity_countries(alpha_data)

Using SampleID, Group, Country, Patient as id variables

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.
Using SampleID, Group, Country, Patient as id variables

Scale for y is already present.
Adding another scale for y, which will replace the existing scale.

# save the results
alpha_div_plots[[paste(segment,"Country")]] <- p_boxplot_alpha

# see the results
p_boxplot_alpha

3.2.2 Linear Models

path = "../../../results/merged_sites/main_analysis/Q2/alpha_diversity"
alpha_data <- read.csv(file.path(path,paste0("alpha_indices_",segment,".csv")))

Richness

results_model <- pairwise.lmer(
  formula = "Observe ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_observe <- results_model[[1]]
  results_model_observe_detailed <- results_model[[2]]
} else {
  results_model_observe <- results_model
  results_model_observe_detailed <- NA
}

# save the results
pc_observed[[segment]] <- results_model_observe

# see the results
knitr::kable(results_model_observe,digits = 3,
caption = "Raw results of linear model of richness estimation.")

Raw results of linear model of richness estimation.
	Estimate	Std..Error	df	t.value	Pr…t..	p.adj	sig
non-rPSC vs GrouprPSC	-17.252	12.469	156.883	-1.384	0.168	0.192
non-rPSC vs rPSC - CZ vs NO	-17.941	10.644	153.303	-1.686	0.094	0.192
non-rPSC vs GrouprPSC:CountryNO	3.483	21.371	152.108	0.163	0.871	0.871
healthy vs GrouprPSC	-40.307	11.924	125.516	-3.380	0.001	0.009	**
healthy vs rPSC - CZ vs NO	14.783	10.544	126.613	1.402	0.163	0.192
healthy vs GrouprPSC:CountryNO	-29.232	19.769	123.028	-1.479	0.142	0.192
healthy vs Groupnon-rPSC	-22.976	8.905	211.354	-2.580	0.011	0.047	*
healthy vs non-rPSC - CZ vs NO	14.961	10.892	212.956	1.374	0.171	0.192
healthy vs Groupnon-rPSC:CountryNO	-32.911	14.701	208.353	-2.239	0.026	0.079

knitr::kable(results_model_observe_detailed,digits = 3,
caption = "Raw results of independent country analysis")

Raw results of independent country analysis

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_CZ	171.439	6.808	137.314	25.182	0.000	NA
Groupnon-rPSC	-22.949	8.722	135.888	-2.631	0.009	0.019

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_NO	186.365	8.682	75.472	21.467	0	NA
Groupnon-rPSC	-55.850	12.120	73.619	-4.608	0	0

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_CZ	171.468	6.275	89.628	27.325	0.000	NA
CountryNO	14.904	9.836	89.468	1.515	0.133	0.133

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
non-rPSC_CZ	148.477	5.932	120.279	25.028	0.000	NA
CountryNO	-17.950	10.519	117.183	-1.706	0.091	0.121

Shannon

results_model <- pairwise.lmer(
  formula = "Shannon ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_shannon <- results_model[[1]]
  results_model_shannon_detailed <- results_model[[2]]
} else {
  results_model_shannon <- results_model
  results_model_shannon_detailed <- NA
}

# save the results
pc_shannon[[segment]] <- as.data.frame(results_model_shannon)

# see the results
knitr::kable(results_model_shannon,digits = 3,
caption = "Raw results of linear model of Shannon estimation.")

Raw results of linear model of Shannon estimation.
	Estimate	Std..Error	df	t.value	Pr…t..	p.adj
non-rPSC vs GrouprPSC	-0.146	0.161	154.485	-0.907	0.366	0.549
non-rPSC vs rPSC - CZ vs NO	-0.355	0.138	151.788	-2.581	0.011	0.051
non-rPSC vs GrouprPSC:CountryNO	0.008	0.277	150.856	0.030	0.976	0.976
healthy vs GrouprPSC	-0.330	0.128	126.201	-2.572	0.011	0.051
healthy vs rPSC - CZ vs NO	0.017	0.113	127.217	0.146	0.884	0.976
healthy vs GrouprPSC:CountryNO	-0.364	0.213	123.907	-1.709	0.090	0.162
healthy vs Groupnon-rPSC	-0.182	0.101	207.051	-1.795	0.074	0.162
healthy vs non-rPSC - CZ vs NO	0.018	0.124	208.617	0.145	0.884	0.976
healthy vs Groupnon-rPSC:CountryNO	-0.375	0.168	204.129	-2.240	0.026	0.078

knitr::kable(results_model_shannon_detailed,digits = 3,
caption = "Raw results of independent country analysis")

Raw results of independent country analysis

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_CZ	3.683	0.065	77.802	56.300	0.000	NA
GrouprPSC	-0.330	0.116	76.679	-2.846	0.006	0.011

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_NO	3.700	0.099	48.693	37.40	0.000	NA
GrouprPSC	-0.693	0.193	47.827	-3.59	0.001	0.003

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_CZ	3.683	0.054	89.061	68.505	0.000	NA
CountryNO	0.019	0.084	88.828	0.231	0.818	0.818

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
rPSC_CZ	3.354	0.153	36.206	21.939	0.000	NA
CountryNO	-0.347	0.261	35.912	-1.332	0.191	0.255

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_CZ	3.684	0.081	133.503	45.582	0.000	NA
Groupnon-rPSC	-0.184	0.104	132.547	-1.772	0.079	0.105

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_NO	3.702	0.093	77.069	39.853	0	NA
Groupnon-rPSC	-0.559	0.129	74.201	-4.324	0	0

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_CZ	3.683	0.054	89.061	68.505	0.000	NA
CountryNO	0.019	0.084	88.828	0.231	0.818	0.818

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
non-rPSC_CZ	3.501	0.075	117.089	46.588	0.000	NA
CountryNO	-0.357	0.133	114.586	-2.673	0.009	0.017

Simpson

results_model <- pairwise.lmer(
  formula = "Simpson ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_simpson <- results_model[[1]]
  results_model_simpson_detailed <- results_model[[2]]
} else {
  results_model_simpson <- results_model
  results_model_simpson_detailed <- NA
}

# save the results
pc_simpson[[segment]] <- as.data.frame(results_model_simpson)

# see the results
knitr::kable(results_model_simpson,digits = 3,
caption = "Raw results of linear model of Simpson estimation.")

Raw results of linear model of Simpson estimation.
	Estimate	Std..Error	df	t.value	Pr…t..	p.adj
non-rPSC vs GrouprPSC	-0.005	0.029	153.608	-0.169	0.866	0.866
non-rPSC vs rPSC - CZ vs NO	-0.032	0.024	151.251	-1.296	0.197	0.443
non-rPSC vs GrouprPSC:CountryNO	-0.027	0.049	150.427	-0.554	0.581	0.806
healthy vs GrouprPSC	-0.031	0.020	127.660	-1.523	0.130	0.391
healthy vs rPSC - CZ vs NO	-0.007	0.018	128.411	-0.402	0.688	0.806
healthy vs GrouprPSC:CountryNO	-0.052	0.034	125.982	-1.540	0.126	0.391
healthy vs Groupnon-rPSC	-0.026	0.016	201.924	-1.600	0.111	0.391
healthy vs non-rPSC - CZ vs NO	-0.007	0.020	203.469	-0.363	0.717	0.806
healthy vs Groupnon-rPSC:CountryNO	-0.025	0.026	199.046	-0.952	0.342	0.616

knitr::kable(results_model_simpson_detailed,digits = 3,
caption = "Raw results of independent country analysis")

Table: Raw results of independent country analysis

Pielou

results_model <- pairwise.lmer(
  formula = "Pielou ~ Group * Country + (1|Patient)",
  factors=alpha_data$Group,
  data=alpha_data)

# check interaction
if (!is.data.frame(results_model)){
  results_model_pielou <- results_model[[1]]
  results_model_pielou_detailed <- results_model[[2]]
} else {
  results_model_pielou <- results_model
  results_model_pielou_detailed <- NA
}

# save the results
pc_pielou[[segment]] <- as.data.frame(results_model_pielou)

# see the results
knitr::kable(results_model_pielou,digits = 3,
caption = "Raw results of linear model of Pielou estimation.")

Raw results of linear model of Pielou estimation.
	Estimate	Std..Error	df	t.value	Pr…t..	p.adj
non-rPSC vs GrouprPSC	-0.007	0.026	153.863	-0.292	0.771	0.843
non-rPSC vs rPSC - CZ vs NO	-0.052	0.022	150.735	-2.372	0.019	0.171
non-rPSC vs GrouprPSC:CountryNO	-0.009	0.044	149.674	-0.198	0.843	0.843
healthy vs GrouprPSC	-0.023	0.020	127.665	-1.183	0.239	0.538
healthy vs rPSC - CZ vs NO	-0.008	0.017	129.013	-0.457	0.648	0.843
healthy vs GrouprPSC:CountryNO	-0.052	0.033	124.565	-1.605	0.111	0.333
healthy vs Groupnon-rPSC	-0.015	0.015	204.651	-1.006	0.316	0.568
healthy vs non-rPSC - CZ vs NO	-0.008	0.019	206.473	-0.416	0.678	0.843
healthy vs Groupnon-rPSC:CountryNO	-0.044	0.025	201.171	-1.745	0.083	0.333

knitr::kable(results_model_pielou_detailed,digits = 3,
caption = "Raw results of independent country analysis")

Raw results of independent country analysis

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_CZ	0.720	0.013	132.894	56.00	0.000	NA
Groupnon-rPSC	-0.016	0.016	132.038	-0.97	0.334	0.445

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_NO	0.713	0.013	79.537	55.174	0.000	NA
Groupnon-rPSC	-0.060	0.018	74.995	-3.379	0.001	0.005

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
healthy_CZ	0.720	0.007	89.185	99.650	0.000	NA
CountryNO	-0.008	0.011	88.856	-0.667	0.506	0.506

	Estimate	Std. Error	df	t value	Pr(>\|t\|)	p.adjusted
non-rPSC_CZ	0.705	0.012	115.555	60.129	0.000	NA
CountryNO	-0.052	0.021	112.732	-2.499	0.014	0.028

3.2.3 Saving results

alpha_list <- list(
  Richness=pc_observed[[segment]] %>% rownames_to_column("Comparison"),
  Shannon=pc_shannon[[segment]] %>% rownames_to_column("Comparison"),
  Simpson=pc_simpson[[segment]] %>% rownames_to_column("Comparison"),
  Pielou=pc_pielou[[segment]] %>% rownames_to_column("Comparison"))
                   
write.xlsx(alpha_list, 
           file = file.path(path,paste0("alpha_diversity_results_",segment,".xlsx")))

3.3 Beta diversity

Calculating Aitchison distance (euclidean distance on clr-transformed data), both at ASV and genus level.

3.3.1 Main analysis

Genus level, Aitchison distance

level="genus"

# needed path
path = "../../../results/merged_sites/main_analysis/Q2/beta_diversity"

Aggregation, filtering

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level=level,
                             names=TRUE)

filt_data <- filtering_steps(genus_data[[1]],
                             genus_data[[2]],
                             colon_metadata,
                            seq_depth_threshold=10000)

Removing 5 ASV(s)

filt_colon_genus_tab <- filt_data[[1]]
filt_colon_genus_taxa <- filt_data[[2]]
filt_colon_genus_metadata <- filt_data[[3]]

3.3.1.0.1 PERMANOVA

pairwise_df <- filt_colon_genus_tab %>% column_to_rownames("SeqID") %>% t()

# main effect
pp_main <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
                           covariate = filt_colon_genus_metadata$Country, 
                           patients = filt_colon_genus_metadata$Patient,
                           sim.method = "robust.aitchison", p.adjust.m="BH")

# interaction
pp_int <- pairwise.adonis(pairwise_df,filt_colon_genus_metadata$Group,
                          covariate = filt_colon_genus_metadata$Country, 
                          interaction = TRUE, 
                          patients = filt_colon_genus_metadata$Patient,
                          sim.method = "robust.aitchison", p.adjust.m="BH")

# tidy the results
pp_factor <- pp_main[[1]]
pp_cov <- pp_main[[2]]
pp_fac.cov <- pp_int[[3]]

cols <- c("pairs","Df","SumsOfSqs", "F.Model","R2","p.value", "p.adjusted", "sig")
colnames(pp_factor) <- cols; colnames(pp_cov) <- cols; colnames(pp_fac.cov) <- cols; 

# save raw results
pairwise_aitchison_raw[[paste(level, segment)]] <-rbind(pp_factor,pp_cov,pp_fac.cov)

# see the results
knitr::kable(pp_factor,digits = 3,caption = "PERMANOVA, GROUP separation")

PERMANOVA, GROUP separation
pairs	Df	SumsOfSqs	F.Model	R2	p.value	p.adjusted	sig
rPSC vs non-rPSC	1	419.973	2.475	0.007	0.415	0.415
rPSC vs healthy	1	1473.966	8.988	0.035	0.001	0.002	**
non-rPSC vs healthy	1	1742.331	10.401	0.024	0.001	0.002	**

knitr::kable(pp_cov,digits = 3,caption = "PERMANOVA, COUNTRY separation")

PERMANOVA, COUNTRY separation
pairs	Df	SumsOfSqs	F.Model	R2	p.value	p.adjusted	sig
rPSC vs non-rPSC , Country	1	1556.933	9.177	0.026	0.001	0.001	***
rPSC vs healthy , Country	1	883.284	5.386	0.021	0.001	0.001	***
non-rPSC vs healthy , Country	1	1558.672	9.304	0.021	0.001	0.001	***

knitr::kable(pp_fac.cov,digits = 3,caption = "PERMANOVA, INTERACTION GROUP:Country")

PERMANOVA, INTERACTION GROUP:Country
pairs	Df	SumsOfSqs	F.Model	R2	p.value	p.adjusted
rPSC vs non-rPSC : Country	1	318.175	1.880	0.005	0.970	0.970
rPSC vs healthy : Country	1	319.594	1.956	0.008	0.820	0.970
non-rPSC vs healthy : Country	1	407.191	2.439	0.006	0.163	0.489

Interaction check

interaction_sig <- pp_fac.cov$pairs[pp_fac.cov$p.adjusted < 0.05]

if (length(interaction_sig)>0){
 for (i in 1:length(interaction_sig)){
  group1 <- unlist(strsplit(interaction_sig[i],split = " vs "))[1]
  group2 <- unlist(strsplit(interaction_sig[i],split = " vs "))[2]
  group2 <- unlist(strsplit(group2,split = " : "))[1]
  
  result_list <- adonis_postanalysis(x=pairwise_df,
                      factors = filt_colon_genus_metadata$Group,
                      covariate = filt_colon_genus_metadata$Country, 
                      group1 = group1,
                      group2 = group2,
                      patients = filt_colon_genus_metadata$Patient)
  print(result_list)
} 
}

3.3.1.0.2 Plots

p <- pca_plot_custom(filt_colon_genus_tab,
                                 filt_colon_genus_taxa,
                                 filt_colon_genus_metadata,
                                 show_boxplots = TRUE,
                                 variable = "Group", size=2, 
                                 show_legend=FALSE)

# save the results
pca_plots_list[[paste(segment,level,"custom")]] <- p

# see the results
p

3.3.1.1 Saving results

write.xlsx(pairwise_aitchison_raw[[paste(level, segment)]], 
           file = file.path(path,
           paste0("beta_diversity_results_", segment,".xlsx")))

3.4 Univariate Analysis

3.4.1 Main - Genus level

level="genus"

# needed paths
path = "../../../results/merged_sites/main_analysis/Q2/univariate_analysis"
path_maaslin=file.path("../../../intermediate_files/maaslin/Q2",level)

# variables
raw_linda_results_genus[[segment]] <- list()
linda_results_genus[[segment]] <- list()

# country and interaction problems
list_country_union <- list()
list_venns <- list()

# workbook for final df
wb <- createWorkbook()

# PSC effect
rpsc_effect <- list()

3.4.1.1 Genus level

level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

colon_genus_asv_taxa_tab <- create_asv_taxa_table(colon_genus_tab,
                                                  colon_genus_taxa_tab)

3.4.1.1.1 rPSC vs non-rPSC

3.4.1.1.1.1 linDA

group <- c("non-rPSC","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

taxa_to_test <- read.xlsx("../../../results/merged_sites/main_analysis/Q1/univariate_analysis/significant_taxa_colon.xlsx", sheet=" genus healthy vs pre_ltx")$SeqID

colon_genus_tab_2 <-  colon_genus_tab %>% dplyr::filter(SeqID %in% taxa_to_test)
colon_genus_taxa_tab_2 <-  colon_genus_taxa_tab %>% dplyr::filter(SeqID %in% taxa_to_test)

# prepare the data
linda_data <- binomial_prep(colon_genus_tab_2,
                            colon_genus_taxa_tab_2,
                            colon_metadata,
                            group, usage="linDA",filtering = FALSE)

filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')

0  features are filtered!
The filtered data has  231  samples and  33  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")


for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}

# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

3.4.1.1.1.2 MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("NO vs CZ")

volcano <- ggarrange(volcano1,volcano2, ncol=2)

volcano

3.4.1.1.1.3 Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

3.4.1.1.1.4 Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

NULL

## results for czech cohort
list_interaction_significant[[2]]

[1] NA

## results for norwegian cohort
list_interaction_significant[[3]]

[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

3.4.1.1.2 Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

[1] "non-rPSC"
[1] "rPSC"
[1] "non-rPSC"
[1] "rPSC"

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

3.4.1.1.3 Prevalence testing

uni_df$prevalence_diff <- abs(uni_df$`PREVALENCE_ rPSC` - uni_df$`PREVALENCE_ non-rPSC`)

#uni_df %<>% dplyr::filter(prevalence_diff >=0.2)
#filt_colon_uni_data <- filt_colon_uni_data[uni_df$SeqID,]
filt_colon_uni_data <- ifelse(filt_colon_uni_data>0,"YES","NO")
filt_colon_uni_data %<>% t() %>% as.data.frame()
#filt_colon_uni_taxa %<>% dplyr::filter(SeqID %in% uni_df$SeqID)
filt_colon_uni_data <- merge(filt_colon_uni_data %>% rownames_to_column("SampleID"),filt_colon_metadata,by="SampleID",all.x=TRUE) %>% column_to_rownames("SampleID") %>% distinct(Patient, .keep_all = TRUE) 

p_values <- c()
for (taxon in uni_df$SeqID){
data <- filt_colon_uni_data %>%
  group_by(!!sym(taxon), Group) %>%
  count() %>%
  group_by(!!sym(taxon), Group) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  pivot_wider(names_from = Group, values_from = n) %>%
  column_to_rownames(taxon) %>%
  as.matrix()

data
chi_res <- chisq.test(data)
chi_res$expected
#p_values <- c(p_values,chi_res$p.value)

test <- fisher.test(data)
test$p.value
p_values <- c(p_values,test$p.value)
}

Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect

p_values_adjusted <- p.adjust(p_values,method = "BH")
names(p_values_adjusted) <- uni_df$SeqID
names(p_values) <- uni_df$SeqID
p_values_adjusted[which(p_values_adjusted<0.05)]

 Bilophila 
0.01992028

asv_tab <- as.data.frame(rrarefy(
    linda_data[[1]] %>% t(), 
    sample=10000)) %>% t() %>% as.data.frame()

asv_tab <- ifelse(asv_tab>10,"YES","NO")
asv_tab %<>% t() %>% as.data.frame()

set.seed(123)
asv_tab <- merge(asv_tab %>% rownames_to_column("SampleID"),filt_colon_metadata,by="SampleID",all.x=TRUE) %>% column_to_rownames("SampleID") %>% distinct(Patient, .keep_all = TRUE) 

p_values <- c()
for (taxon in uni_df$SeqID){
data <- asv_tab %>%
  group_by(!!sym(taxon), Group) %>%
  count() %>%
  group_by(!!sym(taxon), Group) %>%
  summarise(n = sum(n), .groups = "drop") %>%
  pivot_wider(names_from = Group, values_from = n) %>%
  column_to_rownames(taxon) %>%
  as.matrix()

data2 <- asv_tab %>%
  group_by(!!sym(taxon), Group,Country) %>%
  count() 


data[is.na(data)] <- 0
chi_res <- chisq.test(data)
chi_res$expected
test <- fisher.test(data)
test$p.value
chi_res$p.value
p_values <- c(p_values,test$p.value)
}

Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect
Warning in chisq.test(data): Chi-squared approximation may be incorrect

p_values_adjusted <- p.adjust(p_values,method = "BH")
names(p_values_adjusted) <- uni_df$SeqID
names(p_values) <- uni_df$SeqID
p_values_adjusted[which(p_values_adjusted<0.05)]

 Bilophila 
0.04557538

df <- data.frame(SeqID=uni_df$SeqID,
                 p=p_values,
                 p_adj=p_values_adjusted)

prevalence_df <- uni_df[,c("SeqID","PREVALENCE_ non-rPSC","PREVALENCE_ rPSC")]
prevalence_df <- merge(prevalence_df,df,by="SeqID",all=TRUE)
write.xlsx(prevalence_df,file.path(path,paste0("supplements_prevalence_",segment,".xlsx")))

3.4.1.2 rPSC vs healthy

group <- c("healthy","rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

3.4.1.2.1 linDA

# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,
                            group, usage="linDA")

Removing 98 ASV(s)

filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')

0  features are filtered!
The filtered data has  248  samples and  178  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}

# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

3.4.1.2.1.1 MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("NO vs CZ")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

3.4.1.2.1.2 Group - Intersection

intersection_results <- group_intersection(group, 
                                           list_intersections, 
                                           list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

3.4.1.2.1.3 Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

NULL

## results for czech cohort
list_interaction_significant[[2]]

[1] NA

## results for norwegian cohort
list_interaction_significant[[3]]

[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

3.4.1.2.2 Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

[1] "healthy"
[1] "rPSC"
[1] "healthy"
[1] "rPSC"

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

3.4.1.2.3 non-rPSC vs healthy

group <- c("healthy","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

3.4.1.2.3.1 linDA

# prepare the data
linda_data <- binomial_prep(colon_genus_tab,
                            colon_genus_taxa_tab,
                            colon_metadata,group,
                            usage="linDA")

Removing 39 ASV(s)

Removing 6 ASV(s)

filt_colon_uni_data <- linda_data[[1]]
filt_colon_uni_taxa <- linda_data[[2]]
filt_colon_uni_metadata <- linda_data[[3]]

# fit the model
linda.obj <- linda(filt_colon_uni_data,
                   filt_colon_uni_metadata,
                   formula = '~ Group * Country + (1|Patient)')

0  features are filtered!
The filtered data has  424  samples and  159  features will be tested!
Imputation approach is used.
Fit linear mixed effects models ...
Completed.

linda.output <- linda.obj$output
linda.output <- linda_renaming(linda.output, group)

# save the results
group1 <- paste0(group[1], " vs ","Group",group[2])
group2 <- paste0(group[1], " , ",group[2], " - CZ vs NO") 
group3 <- paste0(group[1], " vs ","Group",group[2], ":CountryNO")

for (grp in c(group1,group2,group3)){
  raw_linda_results_genus[[segment]][[grp]] <- 
    rawlinda.df(linda.output,
                grp,
                filt_colon_uni_data,
                filt_colon_uni_taxa)
  
  linda_results_genus[[segment]][[grp]] <- 
    linda.df(linda.output,
             grp,
             filt_colon_uni_data,
             filt_colon_uni_taxa)
}

# volcano plot
volcano_1 <- volcano_plot_linda(linda.output, group1, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle(paste(group,collapse=" vs "))

volcano_2  <- volcano_plot_linda(linda.output, group2, 
                                taxa_table = filt_colon_uni_taxa) + 
            ggtitle("NO vs CZ") 

volcano_3 <- volcano_plot_linda(linda.output, group3, 
                                taxa_table = filt_colon_uni_taxa) +
            ggtitle("Interaction effect")

volcano <- ggarrange(volcano_1,volcano_2,volcano_3, ncol=3)

# see the plot
volcano

3.4.1.2.3.2 MaAsLin2

volcano1 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa) + 
            ggtitle(paste(group[1], "vs", group[2]))

volcano2 <- volcano_plot_maaslin(fit_data,filt_colon_uni_taxa,variable="Country") + 
            ggtitle("NO vs CZ")

volcano <- ggarrange(volcano1,volcano2, ncol=2)
volcano

3.4.1.2.3.3 Group - Intersection

intersection_results <- group_intersection(group, list_intersections, list_venns,
                                           linda.output, fit_data,
                                           raw_linda_results_genus, 
                                           segment = segment,
                                           level=level)

list_intersections <- intersection_results[[1]]
list_venns <- intersection_results[[2]]
venn <- intersection_results[[3]]

# show the results
venn

3.4.1.2.3.4 Interaction effect

list_interaction_significant <- country_interaction(group,
                                                    linda.output, 
                                                    list_intersections,
                                          filt_colon_uni_data,
                                          filt_colon_uni_metadata,
                                          segment=segment,
                                          level=level)

# see the result
## significant interaction effect
list_interaction_significant[[1]]

NULL

## results for czech cohort
list_interaction_significant[[2]]

[1] NA

## results for norwegian cohort
list_interaction_significant[[3]]

[1] NA

Removing problematic taxa

list_intersections <- removing_interaction_problems(group,
                                                    list_interaction_significant,
                                                    list_intersections,
                                                    segment=segment,
                                                    level=level)

3.4.1.2.4 Basic statistics

uni_df <-  merge(basic_univariate_statistics(linda_data,group),
                 raw_linda_results_genus[[segment]][[group1]],
                 by="SeqID",all=TRUE)

[1] "healthy"
[1] "non-rPSC"
[1] "healthy"
[1] "non-rPSC"

uni_df[["final_sig"]] <- uni_df$SeqID %in% list_intersections[[paste(segment,level,comparison_name)]][["SeqID"]]
uni_statistics[[segment]][[paste(level,comparison_name)]] <- uni_df

# for comparison
new_name <- comparison_name
addWorksheet(wb, sheetName = new_name)
writeData(wb, sheet = new_name, uni_df, rowNames=FALSE)

3.4.1.2.5 Visualization

Heatmap visualizing the linDA’s logFoldChange for taxa with p < 0.1.

list_heatmap <- list_intersections[grep(paste(segment,level),
                                  names(list_intersections),value=TRUE)]

p_heatmap_linda <- heatmap_linda(list_heatmap,colon_taxa_tab)
p_heatmap_linda

Dot heatmap

dotheatmap_linda <- dot_heatmap_linda(list_heatmap,
                                      uni_statistics$colon[grepl(level,names(uni_statistics$colon))],
                                      colon_taxa_tab) + xlab("") + ylab("")

min_clr -2.080021 
max_clr 6.901023 
min_log -4.181218 
max_log 3.872081

dotheatmap_linda

Horizontal bar plot

p_prevalence <- horizontal_barplot(wb,taxa=levels(dotheatmap_linda$data$SeqID))

Using SeqID as id variables

p_prevalence_final <- ggarrange(p_prevalence,
                                ggplot() + theme_minimal(),
                                nrow = 2,heights = c(1,0.085))
p <- ggarrange(dotheatmap_linda + theme(legend.position = "none"),p_prevalence_final,ncol=2,widths = c(1,0.3))
p

dot_heatmap_colon <- p

3.4.1.3 Saving results

# ALL DATA
saveWorkbook(wb,file.path(path,paste0("uni_analysis_wb_",segment,".xlsx")),
             overwrite = TRUE)

# PSC effect
write.xlsx(rpsc_effect[[paste(segment,level)]],file.path(path,paste0("rpsc_effect_",segment,".xlsx")))

# SIGNIFICANT taxa

write.xlsx(list_intersections[grepl(segment,names(list_intersections))] %>%
            `names<-`(gsub(segment, "", names(
              list_intersections[grepl(segment,names(list_intersections))]))),
           file.path(path,paste0("significant_taxa_",segment,".xlsx")))

3.5 Machine learning

path = "../../../results/merged_sites/main_analysis/Q2/models"

3.5.1 ElasticNet

model="enet"

3.5.1.1 Genus level

level="genus"

Aggregate taxa

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level = level)

colon_genus_tab <- genus_data[[1]]
colon_genus_taxa_tab <- genus_data[[2]]

3.5.1.1.1 rPSC vs non-rPSC - all

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2])

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep(colon_genus_tab,
                                     colon_genus_taxa_tab,
                                     colon_metadata,
                                     group, 
                                     usage="ml_clr",
                                     patient = TRUE)

Removing 28 ASV(s)

Removing 5 ASV(s)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=500,
                              clust_var="Patient",
                              reuse=TRUE,
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()

                                      [,1]
alpha                           0.00000000
lambda                          0.03550553
auc                             0.99707181
auc_czech                       0.99495342
auc_no                          1.00000000
auc_optimism_corrected          0.62839790
auc_optimism_corrected_CIL      0.49485088
auc_optimism_corrected_CIU      0.76009870
accuracy                        0.98571429
accuracy_czech                         NaN
accuracy_no                     0.98581560
accuracy_optimism_corrected     0.70988684
accuracy_optimism_corrected_CIL 0.61932014
accuracy_optimism_corrected_CIU 0.80479243

enet_model$conf_matrices

$original
    Predicted
True   0   1
   0 262   1
   1   4  83

$czech
    Predicted
True   0   1
   0 160   1
   1   2  46

$no
    Predicted
True   0   1
   0 102   0
   1   2  37

enet_model$plot

`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

roc_c

3.5.1.1.2 rPSC vs non-rPSC - subset

group <- c("rPSC","non-rPSC")
comparison_name <- paste0(group[1], " vs ",group[2]," subset")

taxa_to_test <- read.xlsx("../../../results/merged_sites/main_analysis/Q1/univariate_analysis/significant_taxa_colon.xlsx", sheet=" genus healthy vs pre_ltx")

model_name <- paste(comparison_name,level,segment)

# prepare the data
filt_colon_uni_data <- binomial_prep_psc_effect(colon_genus_tab,
                                                colon_genus_taxa_tab, 
                                                colon_metadata,
                                                taxa_to_test,
                                                usage="ml_clr",patient = TRUE)

Removing 28 ASV(s)

Removing 5 ASV(s)

# fit the model
enet_model <- glmnet_binomial(filt_colon_uni_data,
                              sample_method = "atypboot",
                              outcome="Group", 
                              N=500,
                              reuse=TRUE,clust_var = "Patient",
                              file=model_name,
                              Q="Q2")

# ROC
roc_c <- roc_curve(enet_model, group)

# save the results
models_summ[[model_name]] <- enet_model$model_summary
models_cm[[model_name]] <- enet_model$conf_matrices$original
roc_cs[[model_name]] <- enet_model$kfold_rocobjs
betas[[model_name]] <- as.matrix(enet_model$betas)

# see the results
enet_model$model_summary %>% t()

                                      [,1]
alpha                           0.80000000
lambda                          0.03321672
auc                             0.83344259
auc_czech                       0.78312629
auc_no                          0.92383107
auc_optimism_corrected          0.65974941
auc_optimism_corrected_CIL      0.49897616
auc_optimism_corrected_CIU      0.79007241
accuracy                        0.79714286
accuracy_czech                         NaN
accuracy_no                     0.82269504
accuracy_optimism_corrected     0.73877807
accuracy_optimism_corrected_CIL 0.63177895
accuracy_optimism_corrected_CIU 0.82893728

enet_model$conf_matrices

$original
    Predicted
True   0   1
   0 258   5
   1  66  21

$czech
    Predicted
True   0   1
   0 156   5
   1  41   7

$no
    Predicted
True   0   1
   0 102   0
   1  25  14

enet_model$plot

`geom_smooth()` using formula = 'y ~ x'

`geom_smooth()` using formula = 'y ~ x'

roc_c

3.5.1.2 Saving results

models_summ_df_colon <- do.call(rbind, 
  models_summ[grep(segment,names(models_summ),value = TRUE)])

write.csv(models_summ_df_colon,file.path(path,paste0("elastic_net_",segment,".csv")))

3.6 Results overview

3.6.0.1 Alpha diversity

Linear models

knitr::kable(pc_observed[[segment]],
             digits = 3,
             caption = "Results of linear model testing ASV Richness")

Results of linear model testing ASV Richness
	Estimate	Std..Error	df	t.value	Pr…t..	p.adj	sig
non-rPSC vs GrouprPSC	-17.252	12.469	156.883	-1.384	0.168	0.192
non-rPSC vs rPSC - CZ vs NO	-17.941	10.644	153.303	-1.686	0.094	0.192
non-rPSC vs GrouprPSC:CountryNO	3.483	21.371	152.108	0.163	0.871	0.871
healthy vs GrouprPSC	-40.307	11.924	125.516	-3.380	0.001	0.009	**
healthy vs rPSC - CZ vs NO	14.783	10.544	126.613	1.402	0.163	0.192
healthy vs GrouprPSC:CountryNO	-29.232	19.769	123.028	-1.479	0.142	0.192
healthy vs Groupnon-rPSC	-22.976	8.905	211.354	-2.580	0.011	0.047	*
healthy vs non-rPSC - CZ vs NO	14.961	10.892	212.956	1.374	0.171	0.192
healthy vs Groupnon-rPSC:CountryNO	-32.911	14.701	208.353	-2.239	0.026	0.079

knitr::kable(pc_shannon[[segment]],
             digits = 3,
             caption = "Results of linear model testing Shannon index")

Results of linear model testing Shannon index
	Estimate	Std..Error	df	t.value	Pr…t..	p.adj
non-rPSC vs GrouprPSC	-0.146	0.161	154.485	-0.907	0.366	0.549
non-rPSC vs rPSC - CZ vs NO	-0.355	0.138	151.788	-2.581	0.011	0.051
non-rPSC vs GrouprPSC:CountryNO	0.008	0.277	150.856	0.030	0.976	0.976
healthy vs GrouprPSC	-0.330	0.128	126.201	-2.572	0.011	0.051
healthy vs rPSC - CZ vs NO	0.017	0.113	127.217	0.146	0.884	0.976
healthy vs GrouprPSC:CountryNO	-0.364	0.213	123.907	-1.709	0.090	0.162
healthy vs Groupnon-rPSC	-0.182	0.101	207.051	-1.795	0.074	0.162
healthy vs non-rPSC - CZ vs NO	0.018	0.124	208.617	0.145	0.884	0.976
healthy vs Groupnon-rPSC:CountryNO	-0.375	0.168	204.129	-2.240	0.026	0.078

knitr::kable(pc_simpson[[segment]],
             digits = 3,
             caption = "Results of linear model testing Simpson index")

Results of linear model testing Simpson index
	Estimate	Std..Error	df	t.value	Pr…t..	p.adj
non-rPSC vs GrouprPSC	-0.005	0.029	153.608	-0.169	0.866	0.866
non-rPSC vs rPSC - CZ vs NO	-0.032	0.024	151.251	-1.296	0.197	0.443
non-rPSC vs GrouprPSC:CountryNO	-0.027	0.049	150.427	-0.554	0.581	0.806
healthy vs GrouprPSC	-0.031	0.020	127.660	-1.523	0.130	0.391
healthy vs rPSC - CZ vs NO	-0.007	0.018	128.411	-0.402	0.688	0.806
healthy vs GrouprPSC:CountryNO	-0.052	0.034	125.982	-1.540	0.126	0.391
healthy vs Groupnon-rPSC	-0.026	0.016	201.924	-1.600	0.111	0.391
healthy vs non-rPSC - CZ vs NO	-0.007	0.020	203.469	-0.363	0.717	0.806
healthy vs Groupnon-rPSC:CountryNO	-0.025	0.026	199.046	-0.952	0.342	0.616

knitr::kable(pc_pielou[[segment]],
             digits = 3,
             caption = "Results of linear model testing Pielou index")

Results of linear model testing Pielou index
	Estimate	Std..Error	df	t.value	Pr…t..	p.adj
non-rPSC vs GrouprPSC	-0.007	0.026	153.863	-0.292	0.771	0.843
non-rPSC vs rPSC - CZ vs NO	-0.052	0.022	150.735	-2.372	0.019	0.171
non-rPSC vs GrouprPSC:CountryNO	-0.009	0.044	149.674	-0.198	0.843	0.843
healthy vs GrouprPSC	-0.023	0.020	127.665	-1.183	0.239	0.538
healthy vs rPSC - CZ vs NO	-0.008	0.017	129.013	-0.457	0.648	0.843
healthy vs GrouprPSC:CountryNO	-0.052	0.033	124.565	-1.605	0.111	0.333
healthy vs Groupnon-rPSC	-0.015	0.015	204.651	-1.006	0.316	0.568
healthy vs non-rPSC - CZ vs NO	-0.008	0.019	206.473	-0.416	0.678	0.843
healthy vs Groupnon-rPSC:CountryNO	-0.044	0.025	201.171	-1.745	0.083	0.333

Plots

alpha_div_plots[[paste(segment,"Country")]]

3.6.0.2 Beta diversity

Main results

knitr::kable(pairwise_aitchison_raw[[paste("genus", segment)]],
             digits = 3,
             caption = "Results of PERMANOVA - robust aitchison distance")

Results of PERMANOVA - robust aitchison distance
pairs	Df	SumsOfSqs	F.Model	R2	p.value	p.adjusted	sig
rPSC vs non-rPSC	1	419.973	2.475	0.007	0.415	0.415
rPSC vs healthy	1	1473.966	8.988	0.035	0.001	0.002	**
non-rPSC vs healthy	1	1742.331	10.401	0.024	0.001	0.002	**
rPSC vs non-rPSC , Country	1	1556.933	9.177	0.026	0.001	0.001	***
rPSC vs healthy , Country	1	883.284	5.386	0.021	0.001	0.001	***
non-rPSC vs healthy , Country	1	1558.672	9.304	0.021	0.001	0.001	***
rPSC vs non-rPSC : Country	1	318.175	1.880	0.005	0.970	0.970
rPSC vs healthy : Country	1	319.594	1.956	0.008	0.820	0.970
non-rPSC vs healthy : Country	1	407.191	2.439	0.006	0.163	0.489

PCA

pca_plots_list[[paste(segment,"genus custom")]]

3.6.0.3 Univariate analysis

Number of significant taxa

knitr::kable(as.data.frame(lapply(list_intersections,nrow)) %>% t() %>% 
  `colnames<-`("Count") )

	Count
terminal_ileum.genus.non.rPSC.vs.rPSC	0
terminal_ileum.genus.healthy.vs.rPSC	34
terminal_ileum.genus.healthy.vs.non.rPSC	25
colon.genus.non.rPSC.vs.rPSC	0
colon.genus.healthy.vs.rPSC	56
colon.genus.healthy.vs.non.rPSC	30

3.6.0.4 Machine learning

Main models

knitr::kable(models_summ_df_colon %>% dplyr::select(
"alpha","lambda",
"auc_optimism_corrected",
"auc_optimism_corrected_CIL",
"auc_optimism_corrected_CIU"),
             digits=3,caption="Elastic net results")

Elastic net results
	alpha	lambda	auc_optimism_corrected	auc_optimism_corrected_CIL	auc_optimism_corrected_CIU
rPSC vs non-rPSC genus colon	0.0	0.036	0.628	0.495	0.76
rPSC vs non-rPSC subset genus colon	0.8	0.033	0.660	0.499	0.79

ROC - Genus level

p <- roc_curve_all_custom(roc_cs[3:4],Q="Q2",
                     model_name="enet_model",legend=FALSE)

4 Paper-ready visualizations

Alpha diversity

p_A <- alpha_div_plots$`terminal_ileum Country` +
  ggtitle("Terminal ileum")+
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)) 

p_B <-  alpha_div_plots$`colon Country` +
  ggtitle("Colon") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)) 

Q2_alpha <- ggarrange(p_A,ggplot() + theme_minimal(),p_B,nrow=1, ncol=3,
                      widths = c(1,0.1,1))
Q2_alpha

Beta diversity

pca_ti <- pca_plots_list$`terminal_ileum genus custom` 
pca_colon <- pca_plots_list$`colon genus custom` 

genus_Q2_beta <- ggarrange(pca_ti,
                           ggplot() + theme_minimal(),
                           pca_colon,ncol=3,
                           widths = c(1,0.1,1))
genus_Q2_beta

Alpha + Beta diversity

alpha_beta <- ggarrange(Q2_alpha,genus_Q2_beta,
                        ncol = 1,nrow=2,labels = c("A","B"))
alpha_beta

Elastic net

models_to_plot <- c("knn_model","rf_model","gbm_model","enet_model")
names(models_to_plot) <- c("kNN","RF","GBoost","ENet")

models_to_plot <- c("enet_model")
rocs_list <- list()
rocs_list[["enet_model"]] <- roc_cs
#names(models_to_plot) <- c("kNN","RF","GBoost","ENet")

# ILEUM
plot_list_ileum <- list()
for (model_name in models_to_plot) {
  plot_list_ileum[[model_name]] <- 
    roc_curve_all_custom(rocs_list[[model_name]][c(1:2)],
                       Q="Q2",
                       model_name=model_name,legend = FALSE) + 
    #ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) + 
    theme(plot.title = element_text(face = "bold",size = 10)) 
}

roc_curve_ti <- ggarrange(plotlist = plot_list_ileum)

# COLON

plot_list_colon <- list()
for (model_name in models_to_plot) {
  plot_list_colon[[model_name]] <- 
    roc_curve_all_custom(rocs_list[[model_name]][c(3:4)],
                       Q="Q2",
                       model_name=model_name,legend = FALSE) + 
    #ggtitle(names(models_to_plot)[which(model_name==models_to_plot)]) + 
    theme(plot.title = element_text(face = "bold",size = 10))  
}

roc_curve_colon <- ggarrange(plotlist = plot_list_colon)


roc_curve_plot <- ggarrange(roc_curve_ti,
                            ggplot() + theme_minimal(),
                            roc_curve_colon,
                            ncol=3, widths = c(1,0.1,1))
roc_curve_plot

4.1 Figure 4

alpha_beta_elastic <- ggarrange(Q2_alpha,genus_Q2_beta,roc_curve_plot,
                        ncol = 1,nrow=3,labels = LETTERS,heights = c(1,1,0.8))
alpha_beta_elastic

alpha_beta_elastic <- ggarrange(alpha_beta_elastic,ggplot() + theme_minimal(),ncol=2,
                                widths = c(1,0.15))

alpha_beta_elastic

4.2 Supplementary Figure - DAA

p_ileum <- dot_heatmap_ileum + 
  ggtitle("Terminal ileum") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
        legend.position = "none")


p_colon <- dot_heatmap_colon  +
  ggtitle("Colon") +
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15),
        legend.position = "none")

heatmap_plot <- ggarrange(p_ileum,
                          p_colon,
                          ncol = 2,labels=c("A","B"))
heatmap_plot

5 Session info

sessionInfo()

R version 4.3.1 (2023-06-16 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 11 x64 (build 26100)

Matrix products: default


locale:
[1] LC_COLLATE=Czech_Czechia.utf8  LC_CTYPE=Czech_Czechia.utf8   
[3] LC_MONETARY=Czech_Czechia.utf8 LC_NUMERIC=C                  
[5] LC_TIME=Czech_Czechia.utf8    

time zone: Europe/Prague
tzcode source: internal

attached base packages:
[1] parallel  stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] mice_3.17.0              picante_1.8.2            ape_5.8-1               
 [4] lubridate_1.9.3          forcats_1.0.0            stringr_1.5.1           
 [7] tidyverse_2.0.0          kableExtra_1.4.0         tidyr_1.3.1             
[10] gbm_2.2.2                doParallel_1.0.17        iterators_1.0.14        
[13] foreach_1.5.2            ranger_0.17.0            ggvenn_0.1.15           
[16] Maaslin2_1.16.0          purrr_1.0.2              pROC_1.18.5             
[19] glmnet_4.1-8             MicrobiomeStat_1.2       caret_6.0-94            
[22] openxlsx_4.2.8           magrittr_2.0.3           emmeans_1.11.2-8        
[25] lmerTest_3.1-3           robustlmm_3.3-1          lme4_1.1-37             
[28] Matrix_1.6-5             mgcv_1.9-1               nlme_3.1-164            
[31] pheatmap_1.0.13          reshape2_1.4.4           vegan_2.6-10            
[34] lattice_0.22-6           permute_0.9-8            ggplotify_0.1.2         
[37] ggrepel_0.9.6            ggpubr_0.6.1             MicrobiotaProcess_1.14.1
[40] phyloseq_1.46.0          ggplot2_3.5.2            tibble_3.2.1            
[43] dplyr_1.1.4              cowplot_1.2.0            readr_2.1.5             
[46] igraph_2.1.4             ccrepe_1.38.1            data.table_1.15.4       

loaded via a namespace (and not attached):
  [1] fs_1.6.5                    matrixStats_1.5.0          
  [3] bitops_1.0-9                RColorBrewer_1.1-3         
  [5] numDeriv_2016.8-1.1         tools_4.3.1                
  [7] backports_1.4.1             R6_2.6.1                   
  [9] lazyeval_0.2.2              jomo_2.7-6                 
 [11] rhdf5filters_1.14.1         withr_3.0.2                
 [13] gridExtra_2.3               cli_3.6.2                  
 [15] Biobase_2.62.0              logging_0.10-108           
 [17] biglm_0.9-3                 sandwich_3.1-1             
 [19] labeling_0.4.3              mvtnorm_1.3-3              
 [21] robustbase_0.99-4-1         pbapply_1.7-4              
 [23] systemfonts_1.2.2           yulab.utils_0.2.1          
 [25] svglite_2.1.3               parallelly_1.38.0          
 [27] rstudioapi_0.17.1           generics_0.1.4             
 [29] gridGraphics_0.5-1          shape_1.4.6.1              
 [31] car_3.1-3                   zip_2.3.2                  
 [33] biomformat_1.30.0           S4Vectors_0.40.2           
 [35] abind_1.4-8                 infotheo_1.2.0.1           
 [37] lifecycle_1.0.4             multcomp_1.4-28            
 [39] yaml_2.3.10                 carData_3.0-5              
 [41] SummarizedExperiment_1.32.0 Rtsne_0.17                 
 [43] rhdf5_2.46.1                recipes_1.3.1              
 [45] SparseArray_1.2.4           grid_4.3.1                 
 [47] mitml_0.4-5                 crayon_1.5.3               
 [49] pillar_1.11.0               knitr_1.50                 
 [51] optparse_1.7.5              GenomicRanges_1.54.1       
 [53] statip_0.2.3                boot_1.3-32                
 [55] estimability_1.5.1          future.apply_1.11.3        
 [57] codetools_0.2-20            pan_1.9                    
 [59] glue_1.7.0                  ggfun_0.2.0                
 [61] vctrs_0.6.5                 treeio_1.26.0              
 [63] Rdpack_2.6.4                gtable_0.3.6               
 [65] maditr_0.8.6                gower_1.0.1                
 [67] xfun_0.52                   rbibutils_2.3              
 [69] S4Arrays_1.2.1              prodlim_2024.06.25         
 [71] libcoin_1.0-10              coda_0.19-4.1              
 [73] reformulas_0.4.1            pcaPP_2.0-5                
 [75] modeest_2.4.0               survival_3.5-8             
 [77] timeDate_4041.110           hardhat_1.4.2              
 [79] lava_1.8.1                  statmod_1.5.0              
 [81] TH.data_1.1-4               ipred_0.9-15               
 [83] ggtree_3.10.1               GenomeInfoDb_1.38.8        
 [85] fBasics_4041.97             rpart_4.1.23               
 [87] colorspace_2.1-1            BiocGenerics_0.48.1        
 [89] DBI_1.2.3                   nnet_7.3-19                
 [91] ade4_1.7-23                 tidyselect_1.2.1           
 [93] timeSeries_4041.111         compiler_4.3.1             
 [95] microbiome_1.24.0           xml2_1.3.8                 
 [97] DelayedArray_0.28.0         scales_1.4.0               
 [99] DEoptimR_1.1-4              spatial_7.3-18             
[101] rappdirs_0.3.3              digest_0.6.35              
[103] minqa_1.2.8                 rmarkdown_2.29             
[105] XVector_0.42.0              htmltools_0.5.8.1          
[107] pkgconfig_2.0.3             MatrixGenerics_1.14.0      
[109] stabledist_0.7-2            fastmap_1.2.0              
[111] rlang_1.1.3                 htmlwidgets_1.6.4          
[113] farver_2.1.2                zoo_1.8-13                 
[115] jsonlite_2.0.0              ModelMetrics_1.2.2.2       
[117] RCurl_1.98-1.17             modeltools_0.2-24          
[119] Formula_1.2-5               GenomeInfoDbData_1.2.11    
[121] patchwork_1.3.2             Rhdf5lib_1.24.2            
[123] Rcpp_1.0.12                 ggnewscale_0.5.2           
[125] fastGHQuad_1.0.1            stringi_1.8.3              
[127] ggstar_1.0.4                stable_1.1.6               
[129] zlibbioc_1.48.2             MASS_7.3-60.0.1            
[131] plyr_1.8.9                  listenv_0.9.1              
[133] Biostrings_2.70.3           splines_4.3.1              
[135] hash_2.2.6.3                multtest_2.58.0            
[137] hms_1.1.3                   ggtreeExtra_1.12.0         
[139] ggsignif_0.6.4              stats4_4.3.1               
[141] rmutil_1.1.10               evaluate_1.0.5             
[143] nloptr_2.2.1                tzdb_0.5.0                 
[145] getopt_1.20.4               future_1.34.0              
[147] clue_0.3-66                 coin_1.4-3                 
[149] broom_1.0.9                 xtable_1.8-4               
[151] tidytree_0.4.6              rstatix_0.7.2              
[153] viridisLite_0.4.2           class_7.3-22               
[155] aplot_0.2.8                 IRanges_2.36.0             
[157] cluster_2.1.8.1             timechange_0.3.0           
[159] globals_0.18.0