Mucosal microbiota alterations in primary sclerosis cholangitis persist after liver transplantation and are associated with clinical features independently of geography

Clinical analysis

1 Introduction

Following a comprehensive analysis of the microbiome data, we extended our investigation to clinical parameters, incorporating the previously computed Microbial Dysbiosis Index (MDI). Our analysis focused on evaluating the relationships between the MDI, individual microbial taxa, and relevant clinical variables.

Specifically, this script includes multiple statistical and visualization approaches to explore these associations. First, we present MDI distributions across different groups using boxplots and ROC curves to evaluate the discriminatory power of the MDI.

Additionally, we analyze the correlation between alpha diversity and MDI to assess whether overall microbiome diversity is linked to dysbiosis severity. These relationships are further explored through correlation analyses, where we visualize associations between MDI and clinical variables using heatmaps. Finally, we assess the correlation between specific PSC-associated bacterial taxa and clinical parameters, highlighting potential microbial biomarkers for disease progression and severity.

Importing libraries and custom functions built for this analysis

source("custom_functions.R")

2 Import clinical data

2.1 CZ

# clinical metadata
metadata_clinical <- read.csv("../../data/clinical_data/clinical_metadata_cz.csv")
metadata_clinical$PatientID <- as.character(metadata_clinical$PatientID)

# DYSBIOSIS
metadata_dysbiosis <- read.csv("../../data/clinical_data/dysbiosis_metadata.csv") %>%
  dplyr::filter(Country=="CZ")

# ALPHA DIVERSITY
metadata_alpha_ileum <- read.csv(
  "../results/Q1/alpha_diversity/alpha_indices_terminal_ileum.csv") %>%
  dplyr::filter(Country=="CZ")

metadata_alpha_colon <- read.csv(
  "../results/Q1/alpha_diversity/alpha_indices_colon.csv") %>%
  dplyr::filter(Country=="CZ")

metadata_alpha <- rbind(metadata_alpha_ileum,metadata_alpha_colon) %>%
  mutate(PatientID=Patient) %>%
  dplyr::select(-c(Patient, Group))

# MERGING
metadata_cz <- full_join(metadata_clinical, metadata_dysbiosis, by=c("SampleID","Matrix","PatientID","Group","Country"))

metadata_cz <- full_join(metadata_cz, metadata_alpha, by=c("SampleID","PatientID","Country"))

metadata_cz$Group <- factor(metadata_cz$Group,levels = c("healthy","pre_ltx","non-rPSC","rPSC"))

2.2 NO

# clinical metadata
metadata_clinical <- read.csv("../../data/clinical_data/clinical_metadata_no.csv")
metadata_clinical %<>% mutate(PatientID=subjectid,
                              Matrix=segment) %>%
  dplyr::select(-subjectid,-segment)
metadata_clinical$PatientID <- as.character(paste0("NO_",metadata_clinical$PatientID))

# DYSBIOSIS
metadata_dysbiosis <- read.csv("../../data/clinical_data/dysbiosis_metadata.csv") %>%
  dplyr::filter(Country=="NO")  %>%
  dplyr::select(-c(Patient))

# ALPHA DIVERSITY
metadata_alpha_ileum <- read.csv(
  "../results/Q1/alpha_diversity/alpha_indices_terminal_ileum.csv") %>%
  dplyr::filter(Country=="NO")

metadata_alpha_colon <- read.csv(
  "../results/Q1/alpha_diversity/alpha_indices_colon.csv") %>%
  dplyr::filter(Country=="NO")

metadata_alpha <- rbind(metadata_alpha_ileum,metadata_alpha_colon) %>%
  mutate(PatientID=Patient) %>%
  dplyr::select(-c(Patient, Group))

# MERGING
metadata_no <- full_join(metadata_clinical, metadata_dysbiosis, by=c("SampleID","Matrix","PatientID","Group","Country"))

metadata_no <- full_join(metadata_no, metadata_alpha, by=c("SampleID","PatientID","Country"))

metadata_no$Group <- factor(metadata_no$Group,levels = c("healthy","pre_ltx","non-rPSC","rPSC"))

2.3 Merging metadata

metadata_cz %<>% dplyr::mutate(Calprotectin=Fecal.calprotectin,
                             AOM=AOM_score,
                             APRI=APRI_score,
                             FIB4=FIB4_score) %>%
  dplyr::select(SampleID,Matrix,PatientID,Group,Country,Bilirubin,ALP,Calprotectin,
                MAYO_PSC,AOM,APRI,FIB4,Platelets,AST,Creatinine,
                Albumin,ALT,PSC_IBD,GGT,INR,Albumin,Nancy_max,eMAYO,MAYO_dai,
                dys_unfiltered_asv,dys_unfiltered_genus,
                dys_filtered_asv,dys_filtered_genus,
                Observe,Shannon,Simpson,Pielou)

metadata_no %<>% dplyr::mutate(Platelets=TRC,
                             Creatinine=Kreatinin,
                             MAYO_PSC=Mayo_score,
                             AST=ASAT/60,
                             ALT=ALAT/60,
                             PSC_IBD=IBD,
                             Bilirubin=Bilirubin*17.1,
                             ALP=ALP/60,
                             Albumin=Albumin*10) %>%
  dplyr::mutate(PSC_IBD = case_when(
      PSC_IBD == "no_ibd" ~ "0",
      PSC_IBD == "ibd" ~ "1",
      TRUE ~ Group  # Keep other values as is
    )) %>%
  dplyr::select(SampleID,Matrix,PatientID,Group,Country,Bilirubin,ALP,Calprotectin,
                MAYO_PSC,AOM,APRI,FIB4,Platelets,AST,Creatinine,
                Albumin,ALT,PSC_IBD,
                dys_unfiltered_asv,dys_unfiltered_genus,
                dys_filtered_asv,dys_filtered_genus,
                Observe,Shannon,Simpson,Pielou) 

metadata_final <- merge(metadata_cz,metadata_no,all = TRUE)

variables <- c("PatientID", "Group", "Country",
               "Bilirubin", "Albumin",
               "ALP", "Platelets",
               "Calprotectin",
               "MAYO_PSC","AOM", 
               "APRI","INR",
               "FIB4",
               "Creatinine", 
               "ALT","AST","GGT",
               "Nancy_max","eMAYO","MAYO_dai","PSC_IBD")

metadata_final$Calprotectin[metadata_final$Group=="healthy"] <- NA
metadata_final$Calprotectin[metadata_final$Calprotectin==">6000"] <- 6000
metadata_final$Calprotectin <- as.numeric(metadata_final$Calprotectin)
  
metadata_ileum <- metadata_final %>% subset(Matrix=="TI") %>% as.data.frame()
metadata_colon <- metadata_final %>% subset(Matrix!="TI") %>% as.data.frame()

metadata_for_boxplots <- metadata_final[,variables] %>%
  group_by(PatientID) %>%
  distinct(PatientID, .keep_all = TRUE) %>%
  as.data.frame()

3 Import microbiome data

3.1 Data Import

Importing ASV, taxa and metadata tables for both Czech and Norway samples.

Czech

path = "../../data/analysis_ready_data/ikem/"
asv_tab_ikem <- as.data.frame(fread(file.path(path,"asv_table_ikem.csv"),
                                    check.names = FALSE))
taxa_tab_ikem <- as.data.frame(fread(file.path(path,"taxa_table_ikem.csv"),
                                     check.names = FALSE))
metadata_ikem <- as.data.frame(fread(file.path(path,"metadata_ikem.csv"),
                                     check.names = FALSE))

Norway

path = "../../data/analysis_ready_data/norway/"
asv_tab_norway <- as.data.frame(fread(file.path(path,"asv_table_norway.csv"),
                                    check.names = FALSE))
taxa_tab_norway <- as.data.frame(fread(file.path(path,"taxa_table_norway.csv"),
                                     check.names = FALSE))
metadata_norway <- as.data.frame(fread(file.path(path,"metadata_norway.csv"),
                                     check.names = FALSE))

3.2 Spliting to segments

Terminal ileum

ileum_data <- merging_data(asv_tab_1=asv_tab_ikem,
                           asv_tab_2=asv_tab_norway,
                           taxa_tab_1=taxa_tab_ikem,
                           taxa_tab_2=taxa_tab_norway,
                           metadata_1=metadata_ikem,
                           metadata_2=metadata_norway,
                           segment="TI",Q="clinical")

Removing 1498 ASV(s)

Removing 1834 ASV(s)

Merging at ASV level

Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment

ileum_asv_tab <- ileum_data[[1]]
ileum_taxa_tab <- ileum_data[[2]]
ileum_metadata <- ileum_data[[3]]

Colon

colon_data <- merging_data(asv_tab_1=asv_tab_ikem,
                           asv_tab_2=asv_tab_norway,
                           taxa_tab_1=taxa_tab_ikem,
                           taxa_tab_2=taxa_tab_norway,
                           metadata_1=metadata_ikem,
                           metadata_2=metadata_norway,
                           segment="colon",Q="clinical")

Removing 739 ASV(s)

Removing 266 ASV(s)

Merging at ASV level

Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment

colon_asv_tab <- colon_data[[1]]
colon_taxa_tab <- colon_data[[2]]
colon_metadata <- colon_data[[3]]

3.3 Filtering

Rules: - sequencing depth > 10000 - nearZeroVar() with default settings

Terminal ileum

Sequencing depth

data_filt <- seq_depth_filtering(ileum_asv_tab,
                                 ileum_taxa_tab,
                                 ileum_metadata,
                                 seq_depth_threshold = 10000)

Removing 131 ASV(s)

filt_ileum_asv_tab <- data_filt[[1]]; alpha_ileum_asv_tab <- filt_ileum_asv_tab
filt_ileum_taxa_tab <- data_filt[[2]]; alpha_ileum_taxa_tab <- filt_ileum_taxa_tab
filt_ileum_metadata <- data_filt[[3]]; alpha_ileum_metadata <- filt_ileum_metadata

seq_step <- dim(filt_ileum_asv_tab)[1]

NearZeroVar

data_filt <- nearzerovar_filtering(filt_ileum_asv_tab, 
                                   filt_ileum_taxa_tab,
                                   filt_ileum_metadata)

filt_ileum_asv_tab <- data_filt[[1]]
filt_ileum_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_ileum_asv_tab)[1]

Colon

Sequencing depth

data_filt <- seq_depth_filtering(colon_asv_tab,
                                 colon_taxa_tab,
                                 colon_metadata,
                                 seq_depth_threshold = 10000)

Removing 75 ASV(s)

filt_colon_asv_tab <- data_filt[[1]]; alpha_colon_asv_tab <- filt_colon_asv_tab
filt_colon_taxa_tab <- data_filt[[2]]; alpha_colon_taxa_tab <- filt_colon_taxa_tab
filt_colon_metadata <- data_filt[[3]]; alpha_colon_metadata <- filt_colon_metadata

seq_step <- dim(filt_colon_asv_tab)[1]

NearZeroVar

data_filt <- nearzerovar_filtering(filt_colon_asv_tab, 
                                   filt_colon_taxa_tab,
                                   filt_colon_metadata)

filt_colon_asv_tab <- data_filt[[1]]
filt_colon_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_colon_asv_tab)[1]

Check zero depth

data_filt <- check_zero_depth(filt_colon_asv_tab, 
                              filt_colon_taxa_tab, 
                              filt_colon_metadata)

filt_colon_asv_tab <- data_filt[[1]]; 
filt_colon_taxa_tab <- data_filt[[2]]; 
filt_colon_metadata <- data_filt[[3]];

4 MDI

4.1 Terminal ileum

4.1.1 Boxplots

4.1.1.1 PSC groups

metadata_ileum_melted <- melt(metadata_ileum %>% 
                                dplyr::select("Group", 
                                              "dys_unfiltered_asv",
                                              "dys_unfiltered_genus",
                                              "dys_filtered_asv",
                                              "dys_filtered_genus"))

Using Group as id variables

p_il <- ggplot(metadata_ileum_melted) + 
  geom_boxplot(aes(x=Group, y=value),outliers = FALSE) + 
  geom_jitter(width = 0.2,height = 0,aes(x=Group, y=value, color=Group), 
              size=2) +
  facet_wrap(~variable, ncol = 4,scales = "free") + 
  theme_minimal() + 
    theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
          axis.ticks.x = element_line(size=0.3,color = "black"),
          axis.ticks.y = element_line(size=0.3,color="black"),
          axis.ticks.length = unit(4,"pt"),
          panel.grid = element_blank()) + 
  scale_fill_manual(values=c("#309f87","#f9c675","#F08080","#A00000")) + 
  scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000"))

p_il

FILTERED GENUS

dys_limit <- max(metadata_ileum$dys_filtered_genus,na.rm = TRUE) + 1.1*max(metadata_ileum$dys_filtered_genus,na.rm = TRUE)
dys_min_limit <-  min(metadata_ileum$dys_filtered_genus)

p_il_dys_genus <- ggplot(metadata_ileum %>%
                            mutate(`MDI`=dys_filtered_genus)) + 
  geom_boxplot(aes(x=Group, y=`MDI`),outliers = FALSE) + 
  geom_jitter(width = 0.3,height = 0,
              aes(x=Group, y=`MDI`, color=Group,shape=Country),
              size=1) +
  theme_minimal() + 
    theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
          axis.ticks.x = element_line(size=0.3,color = "black"),
          axis.ticks.y = element_line(size=0.3,color="black"),
          axis.ticks.length = unit(4,"pt"),
          panel.grid = element_blank()) + 
  ylim(dys_min_limit,dys_limit) + 
  scale_fill_manual(values=c("#309f87","#f9c675","#F08080","#A00000")) + 
  scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000")) + 
  theme(axis.text.x = element_text(angle = 45,vjust = 0.5))

p_il_dys_genus

4.1.1.2 IBD

metadata_ileum_melted <- melt(metadata_ileum %>% 
                                dplyr::filter(PSC_IBD %in% c(0,1)) %>%
                                dplyr::select("PSC_IBD", 
                                              "dys_unfiltered_asv",
                                              "dys_unfiltered_genus",
                                              "dys_filtered_asv",
                                              "dys_filtered_genus"))

Using PSC_IBD as id variables

p_il <- ggplot(metadata_ileum_melted) + 
  geom_boxplot(aes(x=PSC_IBD, y=value),outliers = FALSE) + 
  geom_jitter(width = 0.2,height = 0,aes(x=PSC_IBD, y=value, color=PSC_IBD), 
              size=2) +
  facet_wrap(~variable, ncol = 4,scales = "free") + 
  theme_minimal() + 
    theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
          axis.ticks.x = element_line(size=0.3,color = "black"),
          axis.ticks.y = element_line(size=0.3,color="black"),
          axis.ticks.length = unit(4,"pt"),
          panel.grid = element_blank()) + 
  scale_fill_manual(values=c("#A06A2C", "#B2182B")) + 
  scale_color_manual(values=c("#A06A2C", "#B2182B"))

p_il

FILTERED GENUS

metadata_ileum_for_plot <- metadata_ileum %>% 
                                dplyr::filter(PSC_IBD %in% c(0,1)) %>%
  mutate(Group=case_when(PSC_IBD == 1 ~ "ibd",
                       PSC_IBD == 0 ~ "no_ibd"))

metadata_ileum_for_plot$Group <- factor(metadata_ileum_for_plot$Group,levels = c("no_ibd","ibd"))

dys_limit <- max(metadata_ileum_for_plot$dys_filtered_genus,na.rm = TRUE) + 0.6*max(metadata_ileum_for_plot$dys_filtered_genus,na.rm = TRUE)
dys_min_limit <-  min(metadata_ileum_for_plot$dys_filtered_genus)

p_il_dys_genus_ibd <- ggplot(metadata_ileum_for_plot %>%
                           mutate(`MDI`=dys_filtered_genus)) + 
  geom_boxplot(aes(x=Group, y=`MDI`),
               outliers = FALSE) + 
  geom_jitter(width = 0.3,height = 0,
              aes(x=Group, y=`MDI`, color=Group,shape=Country),
              size=1.5) +
  theme_minimal() + 
    theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
          axis.ticks.x = element_line(size=0.3,color = "black"),
          axis.ticks.y = element_line(size=0.3,color="black"),
          axis.ticks.length = unit(4,"pt"),
          panel.grid = element_blank()) + 
  ylim(dys_min_limit,dys_limit) + 
  scale_fill_manual(values=c("#A06A2C", "#B2182B")) + 
  scale_color_manual(values=c("#A06A2C", "#B2182B")) + 
  theme(axis.text.x = element_text(angle = 45,vjust = 0.5))

p_il_dys_genus_ibd

4.1.2 ROCs

comparisons <- combn(unique(metadata_ileum$Group), 2, simplify = TRUE)

comparisons <- comparisons[,c(3,2,1,6,5,4)]
  
auc_df <- data.frame(Comparison=rep(NA,ncol(comparisons)),
                     AUC=rep(NA,ncol(comparisons)))

roc_curve_list <- list()
roc_curve_data_list <- list()

for (i in 1:ncol(comparisons)){
  name <- paste0(comparisons[1,i]," vs ", comparisons[2,i])
  metadata_sub <- subset(metadata_ileum,metadata_ileum$Group %in% comparisons[,i])
  metadata_sub$Group <- factor(metadata_sub$Group,
                               levels=c(comparisons[1,i],comparisons[2,i]))
  roc_curve_data <- roc(metadata_sub$Group, metadata_sub$dys_unfiltered_genus) 
  auc_value <- auc(roc_curve_data)  # Compute AUC
  auc_df[i,] <- c(name,auc_value)
  roc_curve_list[[name]] <- ggroc(roc_curve_data) + 
    theme_minimal() + 
    theme(legend.position = "none") + 
    ggtitle(paste0(name,' (AUC = ', round(auc_value,2))) + 
    theme(plot.title = element_text(size = 7))
  roc_curve_data_list[[name]] <- roc_curve_data
}

Setting levels: control = non-rPSC, case = healthy

Setting direction: controls > cases

Setting levels: control = non-rPSC, case = pre_ltx

Setting direction: controls > cases

Setting levels: control = non-rPSC, case = rPSC

Setting direction: controls < cases

Setting levels: control = pre_ltx, case = healthy

Setting direction: controls > cases

Setting levels: control = rPSC, case = healthy

Setting direction: controls > cases

Setting levels: control = rPSC, case = pre_ltx

Setting direction: controls > cases

auc_df

           Comparison               AUC
1 non-rPSC vs healthy 0.817806603773585
2 non-rPSC vs pre_ltx 0.470678225395207
3    non-rPSC vs rPSC 0.589067702552719
4  pre_ltx vs healthy            0.8375
5     rPSC vs healthy            0.8875
6     rPSC vs pre_ltx 0.552861685214626

roc_ileum <- roc_curve_all(roc_curve_data_list)
roc_ileum

4.1.3 Linear model

PSC groups

results_model <- pairwise.lm(
  formula = "dys_filtered_genus ~ Group * Country",
  factors=metadata_ileum$Group,
  data=metadata_ileum)

knitr::kable(results_model[[1]][,c("Estimate", "p.adj","sig")],
             digits=3,caption="Results of linear modeling")

Results of linear modeling
	Estimate	p.adj	sig
healthy vs Groupnon-rPSC	17.097	0.000	***
healthy , non-rPSC - CZ vs NO	-0.803	0.781
healthy vs Groupnon-rPSC:CountryNO	-1.684	0.781
healthy vs GrouprPSC	22.812	0.000	***
healthy , rPSC - CZ vs NO	-0.803	0.781
healthy vs GrouprPSC:CountryNO	-4.288	0.652
healthy vs Grouppre_ltx	25.244	0.000	***
healthy , pre_ltx - CZ vs NO	-0.803	0.781
healthy vs Grouppre_ltx:CountryNO	-9.217	0.080
non-rPSC vs GrouprPSC	5.715	0.191
non-rPSC , rPSC - CZ vs NO	-2.488	0.705
non-rPSC vs GrouprPSC:CountryNO	-2.603	0.781
pre_ltx vs Groupnon-rPSC	-8.148	0.056
pre_ltx , non-rPSC - CZ vs NO	-10.021	0.033	*
pre_ltx vs Groupnon-rPSC:CountryNO	7.533	0.259
pre_ltx vs GrouprPSC	-2.433	0.781
pre_ltx , rPSC - CZ vs NO	-10.021	0.039	*
pre_ltx vs GrouprPSC:CountryNO	4.930	0.737

IBD

results_model <- pairwise.lm(
  formula = "dys_filtered_genus ~ Group * Country",
  factors=metadata_ileum_for_plot$Group,
  data=metadata_ileum_for_plot)

knitr::kable(results_model[[1]],
             digits=3,caption="Results of linear modeling")

Results of linear modeling
	Estimate	Std. Error	t value	Pr(>\|t\|)
no_ibd vs Groupibd	4.144	4.022	1.030	0.304
no_ibd , ibd - CZ vs NO	-7.866	5.111	-1.539	0.125
no_ibd vs Groupibd:CountryNO	4.921	5.603	0.878	0.381

4.2 Colon

4.2.1 Boxplots

4.2.1.1 PSC_groups

colon_metadata_melted <- melt(metadata_colon %>% 
                                dplyr::select("Group", 
                                              "dys_unfiltered_asv",
                                              "dys_unfiltered_genus",
                                              "dys_filtered_asv",
                                              "dys_filtered_genus"))

Using Group as id variables

p_col <- ggplot(colon_metadata_melted) + 
  geom_boxplot(aes(x=Group, y=value),outliers = FALSE) + 
  geom_jitter(width = 0.2,height = 0,
              aes(x=Group, y=value, color=Group),
              size=2) +
  facet_wrap(~variable, ncol = 4,scales = "free") + 
  theme_minimal() + 
 theme(panel.border = element_rect(color = "black", fill = NA, size = 0)) +
  scale_fill_manual(values=c("#309f87","#f9c675","#F08080","#A00000")) + 
  scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000"))

p_col

FILTERED GENUS

dys_limit <- max(metadata_colon$dys_filtered_genus,na.rm = TRUE) + 1.1*max(metadata_colon$dys_filtered_genus,na.rm = TRUE)
dys_min_limit <-  min(metadata_colon$dys_filtered_genus)

p_col_dys_genus <- ggplot(metadata_colon %>%
                            mutate(`MDI`=dys_filtered_genus)) + 
  geom_boxplot(aes(x=Group, y=`MDI`),outliers = FALSE) + 
  geom_jitter(width = 0.3,height = 0,
              aes(x=Group, y=`MDI`, color=Group,shape=Country),
              size=1) +
  theme_minimal() + 
    theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
          axis.ticks.x = element_line(size=0.3,color = "black"),
          axis.ticks.y = element_line(size=0.3,color="black"),
          axis.ticks.length = unit(4,"pt"),
          panel.grid = element_blank()) + 
  ylim(dys_min_limit,dys_limit) + 
  scale_fill_manual(values=c("#309f87","#f9c675","#F08080","#A00000")) + 
  scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000")) + 
  theme(axis.text.x = element_text(angle = 45,vjust = 0.5))

p_col_dys_genus

4.2.1.2 IBD

metadata_colon_melted <- melt(metadata_colon %>% 
                                dplyr::filter(PSC_IBD %in% c(0,1)) %>%
                                dplyr::select("PSC_IBD", 
                                              "dys_unfiltered_asv",
                                              "dys_unfiltered_genus",
                                              "dys_filtered_asv",
                                              "dys_filtered_genus"))

Using PSC_IBD as id variables

p_col <- ggplot(metadata_colon_melted) + 
  geom_boxplot(aes(x=PSC_IBD, y=value),outliers = FALSE) + 
  geom_jitter(width = 0.2,height = 0,aes(x=PSC_IBD, y=value, color=PSC_IBD), 
              size=2) +
  facet_wrap(~variable, ncol = 4,scales = "free") + 
  theme_minimal() + 
    theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
          axis.ticks.x = element_line(size=0.3,color = "black"),
          axis.ticks.y = element_line(size=0.3,color="black"),
          axis.ticks.length = unit(4,"pt"),
          panel.grid = element_blank()) + 
  scale_fill_manual(values=c("#A06A2C", "#B2182B")) + 
  scale_color_manual(values=c("#A06A2C", "#B2182B"))

p_col

FILTERED GENUS

metadata_colon_for_plot <- metadata_colon %>% 
                                dplyr::filter(PSC_IBD %in% c(0,1)) %>%
  mutate(Group=case_when(PSC_IBD == 1 ~ "ibd",
                       PSC_IBD == 0 ~ "no_ibd"))

metadata_colon_for_plot$Group <- factor(metadata_colon_for_plot$Group,levels = c("no_ibd","ibd"))

dys_limit <- max(metadata_colon_for_plot$dys_filtered_genus,na.rm = TRUE) + 0.7*max(metadata_colon_for_plot$dys_filtered_genus,na.rm = TRUE)
dys_min_limit <-  min(metadata_colon_for_plot$dys_filtered_genus)

p_col_dys_genus_ibd <- ggplot(metadata_colon_for_plot %>%
                           mutate(`MDI`=dys_filtered_genus)) + 
  geom_boxplot(aes(x=Group, y=`MDI`),
               outliers = FALSE) + 
  geom_jitter(width = 0.3,height = 0,
              aes(x=Group, y=`MDI`, color=Group,shape=Country),
              size=1.5) +
  theme_minimal() + 
    theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
          axis.ticks.x = element_line(size=0.3,color = "black"),
          axis.ticks.y = element_line(size=0.3,color="black"),
          axis.ticks.length = unit(4,"pt"),
          panel.grid = element_blank()) + 
  ylim(dys_min_limit,dys_limit) + 
  scale_fill_manual(values=c("#A06A2C", "#B2182B")) + 
  scale_color_manual(values=c("#A06A2C", "#B2182B")) + 
  theme(axis.text.x = element_text(angle = 45,vjust = 0.5))

p_col_dys_genus_ibd

4.2.2 ROCs

metadata_colon_auc <- metadata_colon %>%
  group_by(PatientID) %>%
  distinct(PatientID, .keep_all = TRUE) %>%
  as.data.frame()

comparisons <- combn(unique(metadata_colon_auc$Group), 2, simplify = TRUE)
comparisons <- comparisons[,c(3,2,1,6,5,4)]

auc_df <- data.frame(Comparison=rep(NA,ncol(comparisons)),
                     AUC=rep(NA,ncol(comparisons)))

roc_curve_list <- list()
roc_curve_data_list <- list()

for (i in 1:ncol(comparisons)){
  name <- paste0(comparisons[1,i]," vs ", comparisons[2,i])
  metadata_sub <- subset(metadata_colon_auc,metadata_colon_auc$Group %in% comparisons[,i])
  metadata_sub$Group <- factor(metadata_sub$Group,
                               levels=c(comparisons[1,i],comparisons[2,i]))
  roc_curve_data <- roc(metadata_sub$Group, metadata_sub$dys_unfiltered_genus) 
  auc_value <- auc(roc_curve_data)  # Compute AUC
  auc_df[i,] <- c(name,auc_value)
  roc_curve_list[[name]] <- ggroc(roc_curve_data) + 
    theme_minimal() + 
    theme(legend.position = "none") + 
    ggtitle(paste0(name,' (AUC = ', round(auc_value,2),")")) + 
    theme(plot.title = element_text(size = 7))
  roc_curve_data_list[[name]] <- roc_curve_data
}

Setting levels: control = non-rPSC, case = healthy

Setting direction: controls > cases

Setting levels: control = non-rPSC, case = pre_ltx

Setting direction: controls < cases

Setting levels: control = non-rPSC, case = rPSC

Setting direction: controls < cases

Setting levels: control = pre_ltx, case = healthy

Setting direction: controls > cases

Setting levels: control = rPSC, case = healthy

Setting direction: controls > cases

Setting levels: control = rPSC, case = pre_ltx

Setting direction: controls > cases

auc_df

           Comparison               AUC
1 non-rPSC vs healthy 0.849832952347459
2 non-rPSC vs pre_ltx 0.560759106213652
3    non-rPSC vs rPSC 0.675293605915615
4  pre_ltx vs healthy 0.877167060677699
5     rPSC vs healthy 0.924132138857783
6     rPSC vs pre_ltx   0.6135477582846

roc_colon <- roc_curve_all(roc_curve_data_list)
roc_colon

4.2.3 Linear model

results_model <- pairwise.lmer(
  formula = "dys_filtered_genus ~ Group * Country + (1|PatientID)",
  factors=metadata_colon$Group,
  data=metadata_colon)

knitr::kable(results_model[[1]][,c("Estimate", "p.adj","sig")],
             digits=3,caption="Results of linear mixed-effect modeling")

Results of linear mixed-effect modeling
	Estimate	p.adj	sig
healthy vs Groupnon-rPSC	25.818	0.000	***
healthy vs non-rPSC - CZ vs NO	-0.432	0.955
healthy vs Groupnon-rPSC:CountryNO	-0.240	0.956
healthy vs GrouprPSC	40.128	0.000	***
healthy vs rPSC - CZ vs NO	-0.426	0.955
healthy vs GrouprPSC:CountryNO	-6.892	0.362
healthy vs Grouppre_ltx	37.540	0.000	***
healthy vs pre_ltx - CZ vs NO	-0.420	0.955
healthy vs Grouppre_ltx:CountryNO	-11.185	0.058
non-rPSC vs GrouprPSC	14.312	0.002	**
non-rPSC vs rPSC - CZ vs NO	-0.667	0.955
non-rPSC vs GrouprPSC:CountryNO	-6.662	0.528
pre_ltx vs Groupnon-rPSC	-11.706	0.016	*
pre_ltx vs non-rPSC - CZ vs NO	-11.601	0.016	*
pre_ltx vs Groupnon-rPSC:CountryNO	10.936	0.089
pre_ltx vs GrouprPSC	2.640	0.853
pre_ltx vs rPSC - CZ vs NO	-11.582	0.019	*
pre_ltx vs GrouprPSC:CountryNO	4.228	0.853

IBD

results_model <- pairwise.lmer(
  formula = "dys_filtered_genus ~ Group * Country + (1|PatientID)",
  factors=metadata_colon_for_plot$Group,
  data=metadata_colon_for_plot)

knitr::kable(results_model[[1]],
             digits=3,caption="Results of linear modeling")

Results of linear modeling
	Estimate	Std..Error	df	t.value	Pr…t..	p.adj
no_ibd vs Groupibd	5.771	4.110	437.349	1.404	0.161	0.161
no_ibd vs ibd - CZ vs NO	-3.164	5.250	331.780	-0.603	0.547	0.547
no_ibd vs Groupibd:CountryNO	-0.886	5.687	340.946	-0.156	0.876	0.876

library(pwr)

Warning: package 'pwr' was built under R version 4.3.3

# Fit  linear model
lmm <- lmer(dys_filtered_genus ~ Group * Country + (1|PatientID), metadata_colon_for_plot)

# Extract the fixed effect estimate for X
beta <- fixef(lmm)["Groupibd"]

# Extract residual standard deviation
sd_resid <- sigma(lmm)

# Compute Cohen's d
cohen_d <- beta / sd_resid
cohen_d

Groupibd 
1.001531

# Define parameters
u <- 2
v <- length(metadata_ileum_for_plot$SampleID) - u - 1  # Residual degrees of freedom (Total sample size - predictors - 1)
f2 <- 1 # Cohen’s f² effect size (0.02 = small, 0.15 = medium, 0.35 = large)
alpha <- 0.05  # Significance level
power <- pwr.f2.test(u = u, v = v, f2 = f2, sig.level = alpha)
power


     Multiple regression power calculation 

              u = 2
              v = 211
             f2 = 1
      sig.level = 0.05
          power = 1

4.3 Saving results

5 Alpha diversity

# needed variables
alpha_div_plots <- list()
alpha_boxplots <- list()
corrs <- list()

5.1 Terminal ileum

ileum_metadata_alpha_final <- metadata_ileum %>% 
  dplyr::select("SampleID", "PatientID","Group","Country",
                "Observe","Shannon","Simpson","Pielou",
                "dys_filtered_asv", "dys_filtered_genus")

variables <- c("Observe","Shannon","Simpson","Pielou")

for (clinical_variable in variables){
  
  # ILEUM
  # boxplot
  p <- clinical_boxplot(ileum_metadata_alpha_final,
                      variable=clinical_variable)

  alpha_boxplots[[paste0("ileum_",clinical_variable)]] <- p
  
  # correlation
  ## ASV
  level="ASV"
  corr <- clinical_correlation(
    ileum_metadata_alpha_final,
    clinical_variable,
    level)
  
  corrs[[paste0("ileum_asv_",clinical_variable)]] <- corr
  
  p <- clinical_scatter(corr,
                        ileum_metadata_alpha_final,
                        clinical_variable,
                        level)
  
  alpha_div_plots[[paste0("ileum_asv_",clinical_variable)]] <- p

  ## Genus
  level="genus"
  corr <- clinical_correlation(ileum_metadata_alpha_final,
                               clinical_variable,
                               level)
  corrs[[paste0("ileum_genus_",clinical_variable)]] <- corr
  
  p <- clinical_scatter(corr,
                        ileum_metadata_alpha_final,
                        clinical_variable,
                        level)
  
  alpha_div_plots[[paste0("ileum_genus_",clinical_variable)]] <- p
}

5.1.1 Boxplot

alpha_boxplots_ileum <- ggarrange(
  plotlist = alpha_boxplots[grepl("ileum",names(alpha_boxplots))],
  ncol = 4,
  common.legend = TRUE, 
  legend = "right")

alpha_boxplots_ileum

5.2 Colon

colon_metadata_alpha_final <- metadata_colon %>% 
  dplyr::select("SampleID", "PatientID", "Group","Country",
                "Observe","Shannon","Simpson","Pielou",
                "dys_filtered_asv","dys_filtered_genus")

variables <- c("Observe","Shannon","Simpson","Pielou")

for (clinical_variable in variables){

  # COLUMN
  p <- clinical_boxplot(colon_metadata_alpha_final,
                        variable=clinical_variable)

  alpha_boxplots[[paste0("colon_",clinical_variable)]] <- p
  
  # correlation
  ## ASV
  level="ASV"
  corr <- clinical_correlation(colon_metadata_alpha_final,
                               clinical_variable,
                               level,
                               segment="colon")
  
  corrs[[paste0("colon_asv_",clinical_variable)]] <- corr
  
  p <- clinical_scatter(corr,
                        colon_metadata_alpha_final,
                        clinical_variable,
                        level)
  
  alpha_div_plots[[paste0("colon_asv_",clinical_variable)]] <- p

  ## Genus
  level="genus"
  corr <- clinical_correlation(colon_metadata_alpha_final,
                               clinical_variable,
                               level,
                               segment="colon")
  
  corrs[[paste0("colon_genus_",clinical_variable)]] <- corr
  
  p <- clinical_scatter(corr,
                        colon_metadata_alpha_final,
                        clinical_variable,
                        level)
  
  alpha_div_plots[[paste0("colon_genus_",clinical_variable)]] <- p
}

5.2.1 Boxplot

alpha_boxplots_colon <- ggarrange(
  plotlist = alpha_boxplots[grepl("colon",names(alpha_boxplots))],
  ncol = 4,
  common.legend = TRUE, 
  legend = "right")

alpha_boxplots_colon

5.3 Scatterplots

ASV

alpha_dys_asv <- ggarrange(
  plotlist = alpha_div_plots[grepl("asv",names(alpha_div_plots))],
  ncol=4,
  nrow=2,
  common.legend = TRUE,
  legend="right")

`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

alpha_dys_asv

Genus

alpha_dys_genus <- ggarrange(
  plotlist = alpha_div_plots[grepl("genus",names(alpha_div_plots))],
  ncol=4,
  nrow=2,
  common.legend = TRUE,
  legend="right")

`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

alpha_dys_genus

5.4 Linear models

5.4.1 Richness

Ileum

results_model <- pairwise.lm(formula = "Observe ~ Group * Country",
                                     factors=metadata_ileum$Group,
                                     data=metadata_ileum)

knitr::kable(results_model[[1]][,c("Estimate", "p.adj","sig")],
             digits=3,caption="Results of linear modeling")

Results of linear modeling
	Estimate	p.adj	sig
healthy vs Groupnon-rPSC	-25.110	0.045	*
healthy , non-rPSC - CZ vs NO	11.219	0.346
healthy vs Groupnon-rPSC:CountryNO	-31.594	0.112
healthy vs GrouprPSC	-37.475	0.025	*
healthy , rPSC - CZ vs NO	11.219	0.346
healthy vs GrouprPSC:CountryNO	-22.969	0.346
healthy vs Grouppre_ltx	-53.642	0.000	***
healthy , pre_ltx - CZ vs NO	11.219	0.346
healthy vs Grouppre_ltx:CountryNO	6.313	0.710
non-rPSC vs GrouprPSC	-12.365	0.346
non-rPSC , rPSC - CZ vs NO	-20.375	0.200
non-rPSC vs GrouprPSC:CountryNO	8.625	0.710
pre_ltx vs Groupnon-rPSC	28.532	0.061
pre_ltx , non-rPSC - CZ vs NO	17.533	0.346
pre_ltx vs Groupnon-rPSC:CountryNO	-37.907	0.081
pre_ltx vs GrouprPSC	16.167	0.346
pre_ltx , rPSC - CZ vs NO	17.533	0.346
pre_ltx vs GrouprPSC:CountryNO	-29.283	0.346

Colon

results_model <- pairwise.lmer(
  formula = "Observe ~ Group * Country + (1|Patient)",
  factors=metadata_colon$Group,
  data=metadata_colon %>% mutate(Patient=PatientID))

knitr::kable(results_model[[1]][,c("Estimate", "p.adj","sig")],
             digits=3,caption="Results of linear modeling")

Results of linear modeling
	Estimate	p.adj	sig
healthy vs Groupnon-rPSC	-23.300	0.045	*
healthy vs non-rPSC - CZ vs NO	14.430	0.284
healthy vs Groupnon-rPSC:CountryNO	-32.063	0.113
healthy vs GrouprPSC	-40.512	0.009	**
healthy vs rPSC - CZ vs NO	14.241	0.284
healthy vs GrouprPSC:CountryNO	-28.963	0.284
healthy vs Grouppre_ltx	-56.230	0.000	***
healthy vs pre_ltx - CZ vs NO	14.193	0.284
healthy vs Grouppre_ltx:CountryNO	-7.489	0.659
non-rPSC vs GrouprPSC	-17.130	0.284
non-rPSC vs rPSC - CZ vs NO	-17.626	0.284
non-rPSC vs GrouprPSC:CountryNO	2.894	0.893
pre_ltx vs Groupnon-rPSC	32.865	0.040	*
pre_ltx vs non-rPSC - CZ vs NO	6.762	0.647
pre_ltx vs Groupnon-rPSC:CountryNO	-24.384	0.284
pre_ltx vs GrouprPSC	15.660	0.396
pre_ltx vs rPSC - CZ vs NO	6.652	0.647
pre_ltx vs GrouprPSC:CountryNO	-21.362	0.405

5.4.2 Shannon

Ileum

results_model <- pairwise.lm(formula = "Shannon ~ Group * Country",
                                     factors=metadata_ileum$Group,
                                     data=metadata_ileum)

knitr::kable(results_model[[1]][,c("Estimate", "p.adj","sig")],
             digits=3,caption="Results of linear modeling")

Results of linear modeling
	Estimate	p.adj	sig
healthy vs Groupnon-rPSC	-0.109	0.576
healthy , non-rPSC - CZ vs NO	0.007	0.994
healthy vs Groupnon-rPSC:CountryNO	-0.356	0.205
healthy vs GrouprPSC	-0.212	0.289
healthy , rPSC - CZ vs NO	0.007	0.994
healthy vs GrouprPSC:CountryNO	-0.358	0.289
healthy vs Grouppre_ltx	-0.505	0.034	*
healthy , pre_ltx - CZ vs NO	0.007	0.994
healthy vs Grouppre_ltx:CountryNO	0.144	0.634
non-rPSC vs GrouprPSC	-0.103	0.634
non-rPSC , rPSC - CZ vs NO	-0.350	0.080
non-rPSC vs GrouprPSC:CountryNO	-0.002	0.994
pre_ltx vs Groupnon-rPSC	0.396	0.080
pre_ltx , non-rPSC - CZ vs NO	0.150	0.576
pre_ltx vs Groupnon-rPSC:CountryNO	-0.500	0.115
pre_ltx vs GrouprPSC	0.293	0.289
pre_ltx , rPSC - CZ vs NO	0.150	0.576
pre_ltx vs GrouprPSC:CountryNO	-0.502	0.289

Colon

results_model <- pairwise.lmer(
  formula = "Shannon ~ Group * Country + (1|Patient)",
  factors=metadata_colon$Group,
  data=metadata_colon %>% mutate(Patient=PatientID))

knitr::kable(results_model[[1]][,c("Estimate", "p.adj","sig")],
             digits=3,caption="Results of linear modeling")

Results of linear modeling
	Estimate	p.adj	sig
healthy vs Groupnon-rPSC	-0.184	0.182
healthy vs non-rPSC - CZ vs NO	0.016	0.961
healthy vs Groupnon-rPSC:CountryNO	-0.378	0.091
healthy vs GrouprPSC	-0.334	0.061
healthy vs rPSC - CZ vs NO	0.014	0.961
healthy vs GrouprPSC:CountryNO	-0.361	0.207
healthy vs Grouppre_ltx	-0.585	0.001	**
healthy vs pre_ltx - CZ vs NO	0.015	0.961
healthy vs Grouppre_ltx:CountryNO	0.081	0.852
non-rPSC vs GrouprPSC	-0.148	0.591
non-rPSC vs rPSC - CZ vs NO	-0.360	0.061
non-rPSC vs GrouprPSC:CountryNO	0.013	0.962
pre_ltx vs Groupnon-rPSC	0.399	0.061
pre_ltx vs non-rPSC - CZ vs NO	0.095	0.790
pre_ltx vs Groupnon-rPSC:CountryNO	-0.455	0.095
pre_ltx vs GrouprPSC	0.251	0.407
pre_ltx vs rPSC - CZ vs NO	0.095	0.790
pre_ltx vs GrouprPSC:CountryNO	-0.442	0.283

6 FIGURE 5

terminal_ileum_list <- list(
  ileum_genus_mdi_boxplot=p_il_dys_genus + 
    theme(legend.position = "none") + 
    ggtitle("Terminal ileum") +
    theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)),
  ileum_genus_observe=                          alpha_div_plots[grepl("ileum_genus",names(alpha_div_plots))][[2]] +
    theme(legend.position = "none")#+ ylab("Richness")
  )
  
p_terminal_ileum <- ggarrange(
  plotlist = terminal_ileum_list,
  ncol=1,
  nrow=2,heights = c(1.2,0.8),labels = c("A","B"))

`geom_smooth()` using formula = 'y ~ x'

colon_list <- list(
  colon_genus_mdi_boxplot=p_col_dys_genus + 
    theme(legend.position = "none") +
  ggtitle("Colon") + 
    theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)),
  colon_genus_observe=                          alpha_div_plots[grepl("colon_genus",names(alpha_div_plots))][[2]] + 
    theme(legend.position = "none") #+ ylab("Shannon")
  )
  
p_colon <- ggarrange(
  plotlist = colon_list,
  ncol=1,
  nrow=2,heights = c(1.2,0.8))

`geom_smooth()` using formula = 'y ~ x'

p <- ggarrange(p_terminal_ileum,p_colon,ncol=2,common.legend = TRUE,
  legend="right")

p <- ggarrange(p,ggplot() + theme_minimal(),ncol=2,
                                widths = c(1,0.15))

7 Clinical variables comparison

7.1 PSC groups

variables <- c("log_Bilirubin","Albumin","log_ALP","Platelets","log_Calprotectin","AOM","log_APRI","INR","log_FIB4","log_Creatinine","log_ALT","log_AST","log_GGT","Nancy_max","eMAYO","MAYO_dai")

metadata_for_boxplots_psc <- metadata_for_boxplots %>% 
  dplyr::mutate(Group = case_when(
      Group == "pre_ltx" ~ "pre_LTx",
      Group == "non-rPSC" ~ "post_LTx",
      Group == "rPSC" ~ "post_LTx",
      TRUE ~ Group  # Keep other values as is
    )) %>%
  dplyr::select(-PSC_IBD)
  
metadata_for_boxplots_psc$Group <- factor(
    metadata_for_boxplots_psc$Group,levels=c("healthy","pre_LTx","post_LTx"))
  
test_df <- data.frame(`Variable`=NA,
                      `healthy_vs_pre_LTx`=NA,
                      `healthy_vs_post_LTx`=NA,
                      `pre_LTx_vs_post_LTx`=NA,check.names = FALSE)

boxplots_plots <- list()

i <- 1

for (clinical_variable in variables){
  if (grepl("log_",clinical_variable)) {
   metadata_for_boxplots_psc[,clinical_variable] <- log(
     as.numeric(metadata_for_boxplots_psc[,gsub("log_","",clinical_variable)])
     )
  }
  
  df <- metadata_for_boxplots_psc %>% dplyr::select("Group",clinical_variable)
  df <- df %>% drop_na()
  
  p <- clinical_boxplot(metadata_for_boxplots_psc,variable=clinical_variable)

  boxplots_plots[[clinical_variable]] <- p
  
  p_test <- pairwise.wilcox.test(
    df[,clinical_variable],
    df$Group,
    p.adjust.method = "BH")$p.value
  
  if (all(dim(p_test)>0)){
    if (all(colnames(p_test) == c("healthy","pre_LTx")) &
      all(rownames(p_test) == c("pre_LTx","post_LTx"))){
      
      test_df[i,] <- c(clinical_variable,
                     round(p_test[1,1],3),
                     round(p_test[2,1],3),
                     round(p_test[2,2],3))
    
  } else if (all(colnames(p_test) == c("pre_LTx")) &
      all(rownames(p_test) == c("post_LTx"))) {
    
      test_df[i,] <- c(clinical_variable,
                       NA,NA,
                       round(p_test[1,1],3))
      
    } else{ cat(clinical_variable)}
  } else{ cat(clinical_variable)}
  
  i <- i +1
  
}

test_df[,-1] <- as.numeric(as.matrix(test_df[,-1]))
significance_df <- test_df[,-1]
significance_df[test_df[,-1] <= 0.05] <-'*'
significance_df[test_df[,-1] <= 0.01] <-'**'
significance_df[test_df[,-1] <= 0.001] <-'***'
significance_df$Variable <- test_df$Variable
write.xlsx(significance_df,"../results/clinical_significance.xlsx")

significance_df %<>% column_to_rownames("Variable")

7.1.1 Test

knitr::kable(significance_df,
             digits=3,caption="Pairwise wilcox test of clinical variables")

Pairwise wilcox test of clinical variables
	healthy_vs_pre_LTx	healthy_vs_post_LTx	pre_LTx_vs_post_LTx
log_Bilirubin	**	0.798	***
Albumin	***	***	***
log_ALP	***	*	***
Platelets	0.318	***	***
log_Calprotectin	NA	NA	*
AOM	NA	NA	0.726
log_APRI	***	***	***
INR	0.968	0.968	0.638
log_FIB4	**	***	0.873
log_Creatinine	***	**	***
log_ALT	***	0.855	***
log_AST	***	0.384	***
log_GGT	***	0.197	***
Nancy_max	NA	NA	0.785
eMAYO	NA	NA	0.515
MAYO_dai	NA	NA	0.188

7.1.2 Boxplots

for (i in 1:length(boxplots_plots)){
  name <- names(boxplots_plots)[i]
  pl <- boxplots_plots[[i]]
  pl <- pl + ggtitle(name)
  boxplots_plots[[i]] <- pl
}
bx <- ggarrange(plotlist = boxplots_plots, ncol = 4,nrow=4)

bx

7.2 IBD

variables <- c("log_Bilirubin","Albumin","log_ALP","Platelets","log_Calprotectin","AOM","log_APRI","INR","log_FIB4","log_Creatinine","log_ALT","log_AST","log_GGT","Nancy_max","eMAYO","MAYO_dai")

test_df <- data.frame(`Variable`=NA,
                      `ibd_vs_noibd`=NA)

boxplots_plots <- list()
metadata_for_boxplots_ibd <- metadata_for_boxplots %>% dplyr::filter(PSC_IBD %in% c(0,1))

metadata_for_boxplots_ibd$Group

  [1] non-rPSC rPSC     non-rPSC non-rPSC non-rPSC rPSC     non-rPSC pre_ltx 
  [9] pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx  non-rPSC pre_ltx  pre_ltx 
 [17] pre_ltx  non-rPSC pre_ltx  pre_ltx  rPSC     pre_ltx  non-rPSC non-rPSC
 [25] rPSC     pre_ltx  pre_ltx  pre_ltx  non-rPSC non-rPSC rPSC     pre_ltx 
 [33] non-rPSC non-rPSC non-rPSC pre_ltx  non-rPSC pre_ltx  non-rPSC pre_ltx 
 [41] pre_ltx  non-rPSC pre_ltx  rPSC     non-rPSC pre_ltx  pre_ltx  pre_ltx 
 [49] pre_ltx  non-rPSC pre_ltx  pre_ltx  non-rPSC pre_ltx  pre_ltx  pre_ltx 
 [57] non-rPSC pre_ltx  rPSC     pre_ltx  rPSC     non-rPSC pre_ltx  pre_ltx 
 [65] non-rPSC non-rPSC pre_ltx  pre_ltx  non-rPSC pre_ltx  non-rPSC non-rPSC
 [73] pre_ltx  non-rPSC non-rPSC non-rPSC pre_ltx  pre_ltx  pre_ltx  non-rPSC
 [81] pre_ltx  pre_ltx  pre_ltx  pre_ltx  non-rPSC pre_ltx  pre_ltx  pre_ltx 
 [89] non-rPSC pre_ltx  pre_ltx  pre_ltx  pre_ltx  non-rPSC rPSC     rPSC    
 [97] pre_ltx  pre_ltx  pre_ltx  non-rPSC non-rPSC pre_ltx  pre_ltx  pre_ltx 
[105] pre_ltx  pre_ltx  pre_ltx  non-rPSC pre_ltx  pre_ltx  pre_ltx  rPSC    
[113] rPSC     pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx 
[121] pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx 
[129] pre_ltx  pre_ltx  pre_ltx  rPSC     pre_ltx  pre_ltx  non-rPSC non-rPSC
[137] non-rPSC non-rPSC non-rPSC rPSC     non-rPSC non-rPSC rPSC     non-rPSC
[145] non-rPSC non-rPSC non-rPSC non-rPSC rPSC     non-rPSC rPSC     non-rPSC
[153] non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC rPSC     non-rPSC non-rPSC
[161] non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC
[169] non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC
[177] rPSC     non-rPSC non-rPSC rPSC     non-rPSC non-rPSC non-rPSC rPSC    
[185] non-rPSC rPSC     non-rPSC non-rPSC non-rPSC non-rPSC rPSC     non-rPSC
[193] non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC rPSC     rPSC    
[201] non-rPSC rPSC     non-rPSC non-rPSC rPSC     non-rPSC rPSC     non-rPSC
[209] non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC rPSC     non-rPSC non-rPSC
[217] non-rPSC non-rPSC rPSC     non-rPSC non-rPSC rPSC     rPSC     non-rPSC
[225] non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC rPSC     rPSC     non-rPSC
[233] non-rPSC non-rPSC non-rPSC non-rPSC rPSC     rPSC     non-rPSC non-rPSC
[241] non-rPSC pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx 
[249] rPSC     pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx 
[257] pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx 
[265] pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx  pre_ltx 
Levels: healthy pre_ltx non-rPSC rPSC

i <- 1

for (clinical_variable in variables){
  if (grepl("log_",clinical_variable)) {
   metadata_for_boxplots_ibd[,clinical_variable] <- log(
     as.numeric(metadata_for_boxplots_ibd[,gsub("log_","",clinical_variable)])
     )
  }
  
  df <- metadata_for_boxplots_ibd %>% dplyr::select("PSC_IBD",clinical_variable)
  
  df$PSC_IBD <- factor(df$PSC_IBD,levels=c("0","1"))
  df <- df %>% drop_na()
  
  p_test <- pairwise.wilcox.test(
    df[,clinical_variable], 
    df$PSC_IBD,
    p.adjust.method = "BH")$p.value
  
  if (length(p_test)==0) test_df[i,] <- c(clinical_variable,NA)
  else if ((colnames(p_test)==0) & (rownames(p_test)==1)){
    test_df[i,] <- c(clinical_variable,
                     round(p_test[1,1],3))
  } else{
    cat(clinical_variable)
  }

  # boxplot
  p <- clinical_boxplot(metadata_for_boxplots_ibd,
                      variable=clinical_variable)

  boxplots_plots[[clinical_variable]] <- p
  
  i <- i +1
  
}

test_df[,-1] <- as.numeric(as.matrix(test_df[,-1]))
significance_df <- test_df[,-1]
significance_df[test_df[,-1] <= 0.05] <-'*'
significance_df[test_df[,-1] <= 0.01] <-'**'
significance_df[test_df[,-1] <= 0.001] <-'***'
significance_df %<>% as.data.frame() 
significance_df$Variable <- test_df$Variable
colnames(significance_df) <- c("ibd_vs_noibd","Variable")

write.xlsx(significance_df,"../results/clinical_significance_ibd.xlsx")

significance_df %<>% column_to_rownames("Variable")

7.2.1 Test

knitr::kable(significance_df,
             digits=3,caption="Pairwise wilcox test of clinical variables")

Pairwise wilcox test of clinical variables
	ibd_vs_noibd
log_Bilirubin	0.154
Albumin	0.396
log_ALP	0.968
Platelets	0.412
log_Calprotectin	***
AOM	*
log_APRI	0.087
INR	0.298
log_FIB4	**
log_Creatinine	**
log_ALT	0.159
log_AST	0.201
log_GGT	0.592
Nancy_max	NA
eMAYO	NA
MAYO_dai	NA

7.2.2 Boxplots

for (i in 1:length(boxplots_plots)){
  name <- names(boxplots_plots)[i]
  pl <- boxplots_plots[[i]]
  pl <- pl + ggtitle(name)
  boxplots_plots[[i]] <- pl
}
bx <- ggarrange(plotlist = boxplots_plots, ncol = 4,nrow=4)

bx

8 MDI and clinical variables

variables <- c("ALP","log_ALP",
               "GGT","log_GGT",
               "ALT","log_ALT",
               "AST","log_AST",
               "Bilirubin","log_Bilirubin",
               "INR","log_INR",
               "Albumin",
               "Platelets",
               "Creatinine","log_Creatinine",
               "FIB4","log_FIB4",
               "APRI","log_APRI",
               "MAYO_PSC",
               "AOM",
               "Calprotectin","log_Calprotectin",
               "Nancy_max","eMAYO","MAYO_dai")

corrs <- list()
clinical_plots <- list()

for (clinical_variable in variables){
  
  if (grepl("log_",clinical_variable)) {
   metadata_for_boxplots[,clinical_variable] <- log(as.numeric(metadata_for_boxplots[,gsub("log_","",clinical_variable)]))
   
   metadata_ileum[,clinical_variable] <- log(as.numeric(metadata_ileum[,gsub("log_","",clinical_variable)]))
   metadata_colon[,clinical_variable] <- log(as.numeric(metadata_colon[,gsub("log_","",clinical_variable)]))
  }
  
  # ILEUM
  # correlation
  
  ## ASV
  level="ASV"
  corr <- clinical_correlation(metadata_ileum,
                               clinical_variable,
                               level)
  
  corrs[[paste0("ileum_asv_",clinical_variable)]] <- corr
  
  p <- clinical_scatter(corr,
                        metadata_ileum,
                        clinical_variable,
                        level)
  clinical_plots[[paste0("ileum_asv_",clinical_variable)]] <- p

  ## Genus
  level="genus"
  corr <- clinical_correlation(metadata_ileum,
                               clinical_variable,
                               level)
  corrs[[paste0("ileum_genus_",clinical_variable)]] <- corr
  
  p <- clinical_scatter(corr,
                        metadata_ileum,
                        clinical_variable,
                        level)
  clinical_plots[[paste0("ileum_genus_",clinical_variable)]] <- p
  
  # correlation
  ## ASV
  level="ASV"
  corr <- clinical_correlation(metadata_colon,
                               clinical_variable,
                               level,
                               segment="colon")
  
  corrs[[paste0("colon_asv_",clinical_variable)]] <- corr
  
  p <- clinical_scatter(corr,
                        metadata_colon,
                        clinical_variable,
                        level)
  
  clinical_plots[[paste0("colon_asv_",clinical_variable)]] <- p

  ## Genus
  level="genus"
  corr <- clinical_correlation(metadata_colon,
                               clinical_variable,
                               level,
                               segment="colon")
  
  corrs[[paste0("colon_genus_",clinical_variable)]] <- corr
  
  p <- clinical_scatter(corr,
                        metadata_colon,
                        clinical_variable,
                        level)
  
  clinical_plots[[paste0("colon_genus_",clinical_variable)]] <- p
}

8.1 MDI ~ clinical variable

for (i in 1:length(clinical_plots)){
  name <- names(clinical_plots)[i]
  pl <- clinical_plots[[i]]
  pl <- pl + ggtitle(name)
  clinical_plots[[i]] <- pl
}

cl_asv <- ggarrange(
  plotlist = clinical_plots[grepl("asv",names(clinical_plots))],
  ncol=6,nrow=9)

`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

cl_genus <- ggarrange(plotlist = clinical_plots[grepl("genus",names(clinical_plots))],
          ncol=6,nrow=9)

`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'

cl_genus

8.2 Heatmap

Genus level

corrs_ileum <- corrs[grepl("ileum_genus",names(corrs))]
corrs_colon <- corrs[grepl("colon_genus",names(corrs))]

prepared_list <- prepare_for_heatmap(corrs_ileum,corrs_colon)
p_df_sig_mdi <- prepared_list[[1]]
r_df_mdi <- prepared_list[[2]]

r_df_melt_mdi <- melt(r_df_mdi %>% rownames_to_column("SeqID"))

Using SeqID as id variables

p_df_melt_mdi <- melt(p_df_sig_mdi %>% rownames_to_column("SeqID"),
                      id.vars = c("SeqID"))

r_df_melt_mdi$SeqID <- factor(r_df_melt_mdi$SeqID,
                              levels=unique(r_df_melt_mdi$SeqID))

p_df_melt_mdi$SeqID <- factor(p_df_melt_mdi$SeqID,
                              levels=unique(p_df_melt_mdi$SeqID))

r_df_melt_mdi$variable <- factor(r_df_melt_mdi$variable,
                              levels=rev(unique(r_df_melt_mdi$variable)))

p_df_melt_mdi$variable <- factor(p_df_melt_mdi$variable,
                              levels=rev(unique(p_df_melt_mdi$variable)))

p_mdi <- ggplot() + 
  geom_tile(data=r_df_melt_mdi, aes(SeqID, variable, fill= value))  + 
  theme_minimal() + 
  scale_fill_gradient2(name = "Rho", low = "blue", mid = "white", high = "red", midpoint = 0,limits = c(-0.7, 0.7)) + 
  geom_text(data=p_df_melt_mdi,aes(x=SeqID,y=variable,label=value,vjust=0.6)) + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
  xlab("") + ylab("")  + 
  ggtitle("MDI")+ 
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15))

p_mdi

9 Bacteria and clinical variables

9.1 Terminal ileum

level="genus"

Aggregation, filtering

# Aggregation
genus_data <- aggregate_taxa(ileum_asv_tab,
                             ileum_taxa_tab,
                             taxonomic_level=level,
                             names=TRUE)
# Filtration
filt_data <- filtering_steps(genus_data[[1]],
                             genus_data[[2]],
                             ileum_metadata,
                             seq_depth_threshold=10000)

Removing 5 ASV(s)

filt_ileum_genus_tab <- filt_data[[1]]
filt_ileum_genus_taxa <- filt_data[[2]]
filt_ileum_metadata_genus <- filt_data[[3]]

PSC effect taxa

psc_effect_ileum_genus <- read.xlsx(
  "../results/Q1/univariate_analysis/supplements_psc_effect_terminal_ileum.xlsx",
  sheet = "terminal_ileum genus")

psc_taxa <- psc_effect_ileum_genus$SeqID
increased <- psc_effect_ileum_genus$ASV[psc_effect_ileum_genus$log2FoldChange >0]
decreased <- psc_effect_ileum_genus$ASV[psc_effect_ileum_genus$log2FoldChange <0]

Preparing datasets

my_table <- filt_ileum_genus_tab %>% column_to_rownames("SeqID")
data_clr <- vegan::decostand(my_table,method = "clr", MARGIN = 2,pseudocount=0.5) %>% as.matrix()

data_clr <- data_clr[psc_taxa,] %>% t() %>% as.data.frame() %>% rownames_to_column("SampleID")

metadata_with_abundances <- data_clr %>% full_join(metadata_ileum,by="SampleID")

# significant variables
variables <- c("ALP",
               "APRI", 
               "Albumin",
               "Platelets")

corrs_ileum <- corrs[grepl("ileum_genus",names(corrs))]
corrs_colon <- corrs[grepl("colon_genus",names(corrs))]

prepared_list <- prepare_for_heatmap(corrs_ileum,corrs_colon)
p_df_sig_mdi <- prepared_list[[1]]

variables <- colnames(p_df_sig_mdi)[!apply(p_df_sig_mdi,2,function(x) {
  all(x == "")})]

corrs <- list()
clinical_plots <- list()

for (clinical_variable in variables){
  for (taxon in psc_taxa){
    if (grepl("log_",clinical_variable)) {
   metadata_with_abundances[,clinical_variable] <- log(as.numeric(metadata_with_abundances[,gsub("log_","",clinical_variable)]))
  }
  
  # ILEUM
  # correlation
  corr <- clinical_correlation_abundances(metadata_with_abundances,clinical_variable,taxon,level="genus")
  corrs[[paste0("ileum_genus_",clinical_variable,"_", taxon)]] <- corr
  if (corr$P <0.05){
    p <- clinical_scatter_abundances(corr,metadata_with_abundances,clinical_variable,taxon,level="genus",size = 3)
  clinical_plots[[paste0("ileum_genus_",clinical_variable,"_",taxon)]] <- p
  } else print(corr$P)
  }
  
}

[1] 0.68971
[1] 0.5727108
[1] 0.06684104
[1] 0.07115727
[1] 0.6939827
[1] 0.1949693
[1] 0.9010701
[1] 0.8492244
[1] 0.06406532
[1] 0.7152049
[1] 0.2833256
[1] 0.3875891
[1] 0.1593069
[1] 0.1583775
[1] 0.3674669
[1] 0.2368698
[1] 0.2485435
[1] 0.07856963
[1] 0.3431773
[1] 0.3279462
[1] 0.05899516
[1] 0.3401515
[1] 0.6083045
[1] 0.2541031
[1] 0.1143359
[1] 0.0968635
[1] 0.05785104
[1] 0.5824842
[1] 0.990239
[1] 0.2349355
[1] 0.1710323
[1] 0.6455714
[1] 0.2189091
[1] 0.2266506
[1] 0.9466934
[1] 0.3278282
[1] 0.1386096
[1] 0.1180884

prepared_list <- subprepare_for_heatmap(corrs,MDI = FALSE)
p_df_sig <- prepared_list[[1]]
r_df <- prepared_list[[2]]

r_df_melt <- melt(r_df %>% rownames_to_column("SeqID"))

Using SeqID as id variables

p_df_melt <- melt(p_df_sig %>% rownames_to_column("SeqID"),id.vars = c("SeqID"))

r_df_melt$variable <- factor(r_df_melt$variable, 
                             levels=rev(unique(r_df_melt$variable)))
p_df_melt$variable <- factor(p_df_melt$variable,
                             levels =rev(unique(p_df_melt$variable)))

r_df_melt$SeqID <- factor(r_df_melt$SeqID, levels = unique(r_df_melt$SeqID))
p_df_melt$SeqID <- factor(p_df_melt$SeqID, levels = unique(p_df_melt$SeqID))

p_ileum <- ggplot() + 
  geom_tile(data=r_df_melt, aes(variable, SeqID, fill= value))  + 
  theme_minimal() + 
  scale_fill_gradient2(name = "Rho", low = "blue", mid = "white", high = "red", midpoint = 0,limits = c(-0.5, 0.5)) + 
  geom_text(data=p_df_melt,aes(x=variable,y=SeqID,label=value,vjust=0.6)) + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
  xlab("") + ylab("")  + 
  ggtitle("Terminal ileum")+ 
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15))

p_ileum

9.2 Colon

Aggregation, filtering

genus_data <- aggregate_taxa(colon_asv_tab,
                             colon_taxa_tab,
                             taxonomic_level="Genus",
                             names=TRUE)

# Filtration
filt_data <- filtering_steps(genus_data[[1]],
                             genus_data[[2]],
                             colon_metadata,
                             seq_depth_threshold=10000)

Removing 10 ASV(s)

filt_colon_genus_tab <- filt_data[[1]]
filt_colom_genus_taxa <- filt_data[[2]]
filt_colon_metadata_genus <- filt_data[[3]]

psc_effect_colon_genus <- read.xlsx("../results/Q1/univariate_analysis/supplements_psc_effect_colon.xlsx",
                                    sheet = "colon genus")

psc_taxa <- psc_effect_colon_genus$SeqID
increased <- psc_effect_colon_genus$SeqID[psc_effect_colon_genus$log2FoldChange >0]
decreased <- psc_effect_colon_genus$SeqID[psc_effect_colon_genus$log2FoldChange <0]

my_table <- filt_colon_genus_tab %>% column_to_rownames("SeqID")
data_clr <- vegan::decostand(my_table,method = "clr", MARGIN = 2,pseudocount=0.5) %>% as.matrix()
data_clr <- data_clr[psc_taxa,] %>% t() %>% as.data.frame() %>% rownames_to_column("SampleID")

metadata_with_abundances <- data_clr %>% full_join(metadata_colon,by="SampleID")

variables <- colnames(p_df_sig_mdi)[!apply(p_df_sig_mdi,2,function(x) {
  all(x == "")})]

corrs <- list()
clinical_plots <- list()

for (clinical_variable in variables){
  for (taxon in psc_taxa){
    if (grepl("log_",clinical_variable)) {
   metadata_with_abundances[,clinical_variable] <- log(as.numeric(metadata_with_abundances[,gsub("log_","",clinical_variable)]))
  }
  
  # ILEUM
  # correlation
  corr <- clinical_correlation_abundances(metadata_with_abundances,
                                          clinical_variable,
                                          taxon,level="genus",
                                          rename_p = TRUE)
  
  
  if ( (corr$P == "< 0.05") | 
       (corr$P == "< 0.01") | 
       (corr$P == "< 0.001") | 
       corr$P < 0.05){
    p <- clinical_scatter_abundances(corr,
                                     metadata_with_abundances,
                                     clinical_variable,
                                     taxon,
                                     level="genus",
                                     size = 3)
    
    clinical_plots[[paste0("colon_genus_",clinical_variable,"_",taxon)]] <- p
    
    # once again, withou renaming
    corr <- clinical_correlation_abundances(metadata_with_abundances,
                                          clinical_variable,
                                          taxon,level="genus")
    
    corrs[[paste0("colon_genus_",clinical_variable,"_", taxon)]] <- corr
      
  } else print(corr$P)
}
  

}

clinical_plots_colon <- clinical_plots
corrs_colon <- corrs

prepared_list <- subprepare_for_heatmap(corrs,MDI = FALSE)
p_df_sig <- prepared_list[[1]]
r_df <- prepared_list[[2]]

r_df_melt <- melt(r_df %>% rownames_to_column("SeqID"))

Using SeqID as id variables

p_df_melt <- melt(p_df_sig %>% rownames_to_column("SeqID"),id.vars = c("SeqID"))

r_df_melt$variable <- factor(r_df_melt$variable, 
                             levels=rev(unique(r_df_melt$variable)))
p_df_melt$variable <- factor(p_df_melt$variable,
                             levels =rev(unique(p_df_melt$variable)))

r_df_melt$SeqID <- factor(r_df_melt$SeqID, levels = unique(r_df_melt$SeqID))
p_df_melt$SeqID <- factor(p_df_melt$SeqID, levels = unique(p_df_melt$SeqID))

p_colon <- ggplot() + 
  geom_tile(data=r_df_melt, aes(variable, SeqID, fill= value))  + 
  theme_minimal() + 
  scale_fill_gradient2(name = "Rho", low = "blue", mid = "white", high = "red", midpoint = 0,limits = c(-0.5, 0.5)) + 
  geom_text(data=p_df_melt,aes(x=variable,y=SeqID,label=value,vjust=0.6)) + 
    theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
  xlab("") + ylab("")  + 
  ggtitle("Colon")+ 
  theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15))

p_colon

9.3 FIGURE 6

mdi_heatmap <- ggarrange(p_mdi,p_ileum,p_colon,ncol = 3,widths = c(0.7,1,1))
mdi_heatmap

10 Calprotectin ~ Nancy_max

cor_res <- cor.test(
  metadata_for_boxplots$Calprotectin,
  metadata_for_boxplots$Nancy_max)

cor_res


    Pearson's product-moment correlation

data:  metadata_for_boxplots$Calprotectin and metadata_for_boxplots$Nancy_max
t = 3.0164, df = 90, p-value = 0.003325
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 0.1046851 0.4781488
sample estimates:
      cor 
0.3030057

p_1 <- ggplot(metadata_for_boxplots) + 
  geom_jitter(
      aes(x=Nancy_max, y=Calprotectin, color=Group),
      height = 0,width = 0.2) + 
  geom_text(aes(x=3.2, y=5700),
            label=gsub("\\\\n", "\n", labels),hjust = 0) + 
  geom_smooth(aes(x=Nancy_max, y=Calprotectin), 
              method=lm, se=TRUE) +
  theme_minimal() + 
  theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
        axis.ticks.x = element_line(size=0.3,color = "black"),
        axis.ticks.y = element_line(size=0.3,color="black"),
        axis.ticks.length = unit(4,"pt"),
        panel.grid = element_blank()) + 
  scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000"))

p_1

Warning in geom_text(aes(x = 3.2, y = 5700), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
  a single row.

`geom_smooth()` using formula = 'y ~ x'

Warning: Removed 278 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 278 rows containing missing values or values outside the scale range
(`geom_point()`).

11 Calprotectin ~ ALP

cor_res <- cor.test(
  metadata_for_boxplots$Calprotectin,
  metadata_for_boxplots$ALP)

cor_res


    Pearson's product-moment correlation

data:  metadata_for_boxplots$Calprotectin and metadata_for_boxplots$ALP
t = -0.45614, df = 236, p-value = 0.6487
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.15625130  0.09785193
sample estimates:
        cor 
-0.02967918

p_2 <- ggplot(metadata_for_boxplots) + 
    geom_jitter(aes(x=ALP, y=Calprotectin, color=Group),height = 0,width = 0) + 
  geom_text(aes(x=14.34, y=5700),
              label=gsub("\\\\n", "\n", labels),hjust = 0) + 
    geom_smooth(aes(x=ALP, y=Calprotectin),method=lm, se=TRUE) +
    theme_minimal() + 
    theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
          axis.ticks.x = element_line(size=0.3,color = "black"),
          axis.ticks.y = element_line(size=0.3,color="black"),
          axis.ticks.length = unit(4,"pt"),
          panel.grid = element_blank()) + 
  scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000"))

p_2

Warning in geom_text(aes(x = 14.34, y = 5700), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
  a single row.

`geom_smooth()` using formula = 'y ~ x'

Warning: Removed 132 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 132 rows containing missing values or values outside the scale range
(`geom_point()`).

12 Calprotectin ~ GGT

cor_res <- cor.test(
  metadata_for_boxplots$Calprotectin,
  metadata_for_boxplots$GGT)

cor_res


    Pearson's product-moment correlation

data:  metadata_for_boxplots$Calprotectin and metadata_for_boxplots$GGT
t = -0.17189, df = 108, p-value = 0.8638
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.2031507  0.1712334
sample estimates:
        cor 
-0.01653834

p_3 <- ggplot(metadata_for_boxplots) + 
  geom_jitter(aes(x=GGT, y=Calprotectin, color=Group),
                height = 0,width = 0) + 
  geom_text(aes(x=12.18, y=5700),
            label=gsub("\\\\n", "\n", labels),hjust = 0) + 
  geom_smooth(aes(x=GGT, y=Calprotectin),method=lm, se=TRUE) +
  theme_minimal() + 
  theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
        axis.ticks.x = element_line(size=0.3,color = "black"),
        axis.ticks.y = element_line(size=0.3,color="black"),
        axis.ticks.length = unit(4,"pt"),
        panel.grid = element_blank()) + 
  scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000"))

p_3

Warning in geom_text(aes(x = 12.18, y = 5700), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
  a single row.

`geom_smooth()` using formula = 'y ~ x'

Warning: Removed 260 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 260 rows containing missing values or values outside the scale range
(`geom_point()`).

13 Nancy_max ~ ALP

cor_res <- cor.test(
  metadata_for_boxplots$ALP,
  metadata_for_boxplots$Nancy_max)

cor_res


    Pearson's product-moment correlation

data:  metadata_for_boxplots$ALP and metadata_for_boxplots$Nancy_max
t = -0.5031, df = 111, p-value = 0.6159
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.2303974  0.1382504
sample estimates:
        cor 
-0.04769762

p_4 <- ggplot(metadata_for_boxplots) + 
  geom_jitter(aes(x=Nancy_max, y=ALP, color=Group),
                height = 0,width = 0.2) + 
  geom_text(aes(x=3.2, y=16.9),
              label=gsub("\\\\n", "\n", labels),hjust = 0) + 
    geom_smooth(aes(x=Nancy_max, y=ALP),method=lm, se=TRUE) +
    theme_minimal() + 
    theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
          axis.ticks.x = element_line(size=0.3,color = "black"),
          axis.ticks.y = element_line(size=0.3,color="black"),
          axis.ticks.length = unit(4,"pt"),
          panel.grid = element_blank()) + 
  scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000"))

p_4

Warning in geom_text(aes(x = 3.2, y = 16.9), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
  a single row.

`geom_smooth()` using formula = 'y ~ x'

Warning: Removed 257 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 257 rows containing missing values or values outside the scale range
(`geom_point()`).

14 Nancy_max ~ GGT

cor_res <- cor.test(metadata_for_boxplots$GGT,
                    metadata_for_boxplots$Nancy_max)
cor_res


    Pearson's product-moment correlation

data:  metadata_for_boxplots$GGT and metadata_for_boxplots$Nancy_max
t = 0.52563, df = 110, p-value = 0.6002
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
 -0.1367723  0.2334414
sample estimates:
      cor 
0.0500538

p_5 <- ggplot(metadata_for_boxplots) + 
    geom_jitter(aes(x=Nancy_max, y=GGT, color=Group),height = 0,width = 0.2) + 
  geom_text(aes(x=3.2, y=14.45),
              label=gsub("\\\\n", "\n", labels),hjust = 0) + 
    geom_smooth(aes(x=Nancy_max, y=GGT),method=lm, se=TRUE) +
    theme_minimal() + 
    theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
          axis.ticks.x = element_line(size=0.3,color = "black"),
          axis.ticks.y = element_line(size=0.3,color="black"),
          axis.ticks.length = unit(4,"pt"),
          panel.grid = element_blank()) + 
  scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000"))

p_5

Warning in geom_text(aes(x = 3.2, y = 14.45), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
  a single row.

`geom_smooth()` using formula = 'y ~ x'

Warning: Removed 258 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 258 rows containing missing values or values outside the scale range
(`geom_point()`).

p <- ggarrange(plotlist=list(p_1,p_2,p_3,p_4,p_5),ncol=2,nrow=3)

Warning in geom_text(aes(x = 3.2, y = 5700), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
  a single row.

`geom_smooth()` using formula = 'y ~ x'

Warning: Removed 278 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 278 rows containing missing values or values outside the scale range
(`geom_point()`).

Warning in geom_text(aes(x = 14.34, y = 5700), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
  a single row.

`geom_smooth()` using formula = 'y ~ x'

Warning: Removed 132 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 132 rows containing missing values or values outside the scale range
(`geom_point()`).

Warning in geom_text(aes(x = 12.18, y = 5700), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
  a single row.

`geom_smooth()` using formula = 'y ~ x'

Warning: Removed 260 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 260 rows containing missing values or values outside the scale range
(`geom_point()`).

Warning in geom_text(aes(x = 3.2, y = 16.9), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
  a single row.

`geom_smooth()` using formula = 'y ~ x'

Warning: Removed 257 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 257 rows containing missing values or values outside the scale range
(`geom_point()`).

Warning in geom_text(aes(x = 3.2, y = 14.45), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
  a single row.

`geom_smooth()` using formula = 'y ~ x'

Warning: Removed 258 rows containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 258 rows containing missing values or values outside the scale range
(`geom_point()`).

pdf("../figures/clinical/calprotectin_nancymax.pdf",width = 10,height = 10)
p
dev.off()

png 
  2