source("custom_functions.R")Mucosal microbiota alterations in primary sclerosis cholangitis persist after liver transplantation and are associated with clinical features independently of geography
Clinical analysis
1 Introduction
Following a comprehensive analysis of the microbiome data, we extended our investigation to clinical parameters, incorporating the previously computed Microbial Dysbiosis Index (MDI). Our analysis focused on evaluating the relationships between the MDI, individual microbial taxa, and relevant clinical variables.
Specifically, this script includes multiple statistical and visualization approaches to explore these associations. First, we present MDI distributions across different groups using boxplots and ROC curves to evaluate the discriminatory power of the MDI.
Additionally, we analyze the correlation between alpha diversity and MDI to assess whether overall microbiome diversity is linked to dysbiosis severity. These relationships are further explored through correlation analyses, where we visualize associations between MDI and clinical variables using heatmaps. Finally, we assess the correlation between specific PSC-associated bacterial taxa and clinical parameters, highlighting potential microbial biomarkers for disease progression and severity.
Importing libraries and custom functions built for this analysis
2 Import clinical data
2.1 CZ
# clinical metadata
metadata_clinical <- read.csv("../../data/clinical_data/clinical_metadata_cz.csv")
metadata_clinical$PatientID <- as.character(metadata_clinical$PatientID)
# DYSBIOSIS
metadata_dysbiosis <- read.csv("../../data/clinical_data/dysbiosis_metadata.csv") %>%
dplyr::filter(Country=="CZ")
# ALPHA DIVERSITY
metadata_alpha_ileum <- read.csv(
"../results/Q1/alpha_diversity/alpha_indices_terminal_ileum.csv") %>%
dplyr::filter(Country=="CZ")
metadata_alpha_colon <- read.csv(
"../results/Q1/alpha_diversity/alpha_indices_colon.csv") %>%
dplyr::filter(Country=="CZ")
metadata_alpha <- rbind(metadata_alpha_ileum,metadata_alpha_colon) %>%
mutate(PatientID=Patient) %>%
dplyr::select(-c(Patient, Group))
# MERGING
metadata_cz <- full_join(metadata_clinical, metadata_dysbiosis, by=c("SampleID","Matrix","PatientID","Group","Country"))
metadata_cz <- full_join(metadata_cz, metadata_alpha, by=c("SampleID","PatientID","Country"))
metadata_cz$Group <- factor(metadata_cz$Group,levels = c("healthy","pre_ltx","non-rPSC","rPSC"))2.2 NO
# clinical metadata
metadata_clinical <- read.csv("../../data/clinical_data/clinical_metadata_no.csv")
metadata_clinical %<>% mutate(PatientID=subjectid,
Matrix=segment) %>%
dplyr::select(-subjectid,-segment)
metadata_clinical$PatientID <- as.character(paste0("NO_",metadata_clinical$PatientID))
# DYSBIOSIS
metadata_dysbiosis <- read.csv("../../data/clinical_data/dysbiosis_metadata.csv") %>%
dplyr::filter(Country=="NO") %>%
dplyr::select(-c(Patient))
# ALPHA DIVERSITY
metadata_alpha_ileum <- read.csv(
"../results/Q1/alpha_diversity/alpha_indices_terminal_ileum.csv") %>%
dplyr::filter(Country=="NO")
metadata_alpha_colon <- read.csv(
"../results/Q1/alpha_diversity/alpha_indices_colon.csv") %>%
dplyr::filter(Country=="NO")
metadata_alpha <- rbind(metadata_alpha_ileum,metadata_alpha_colon) %>%
mutate(PatientID=Patient) %>%
dplyr::select(-c(Patient, Group))
# MERGING
metadata_no <- full_join(metadata_clinical, metadata_dysbiosis, by=c("SampleID","Matrix","PatientID","Group","Country"))
metadata_no <- full_join(metadata_no, metadata_alpha, by=c("SampleID","PatientID","Country"))
metadata_no$Group <- factor(metadata_no$Group,levels = c("healthy","pre_ltx","non-rPSC","rPSC"))2.3 Merging metadata
metadata_cz %<>% dplyr::mutate(Calprotectin=Fecal.calprotectin,
AOM=AOM_score,
APRI=APRI_score,
FIB4=FIB4_score) %>%
dplyr::select(SampleID,Matrix,PatientID,Group,Country,Bilirubin,ALP,Calprotectin,
MAYO_PSC,AOM,APRI,FIB4,Platelets,AST,Creatinine,
Albumin,ALT,PSC_IBD,GGT,INR,Albumin,Nancy_max,eMAYO,MAYO_dai,
dys_unfiltered_asv,dys_unfiltered_genus,
dys_filtered_asv,dys_filtered_genus,
Observe,Shannon,Simpson,Pielou)
metadata_no %<>% dplyr::mutate(Platelets=TRC,
Creatinine=Kreatinin,
MAYO_PSC=Mayo_score,
AST=ASAT/60,
ALT=ALAT/60,
PSC_IBD=IBD,
Bilirubin=Bilirubin*17.1,
ALP=ALP/60,
Albumin=Albumin*10) %>%
dplyr::mutate(PSC_IBD = case_when(
PSC_IBD == "no_ibd" ~ "0",
PSC_IBD == "ibd" ~ "1",
TRUE ~ Group # Keep other values as is
)) %>%
dplyr::select(SampleID,Matrix,PatientID,Group,Country,Bilirubin,ALP,Calprotectin,
MAYO_PSC,AOM,APRI,FIB4,Platelets,AST,Creatinine,
Albumin,ALT,PSC_IBD,
dys_unfiltered_asv,dys_unfiltered_genus,
dys_filtered_asv,dys_filtered_genus,
Observe,Shannon,Simpson,Pielou)
metadata_final <- merge(metadata_cz,metadata_no,all = TRUE)variables <- c("PatientID", "Group", "Country",
"Bilirubin", "Albumin",
"ALP", "Platelets",
"Calprotectin",
"MAYO_PSC","AOM",
"APRI","INR",
"FIB4",
"Creatinine",
"ALT","AST","GGT",
"Nancy_max","eMAYO","MAYO_dai","PSC_IBD")metadata_final$Calprotectin[metadata_final$Group=="healthy"] <- NA
metadata_final$Calprotectin[metadata_final$Calprotectin==">6000"] <- 6000
metadata_final$Calprotectin <- as.numeric(metadata_final$Calprotectin)
metadata_ileum <- metadata_final %>% subset(Matrix=="TI") %>% as.data.frame()
metadata_colon <- metadata_final %>% subset(Matrix!="TI") %>% as.data.frame()
metadata_for_boxplots <- metadata_final[,variables] %>%
group_by(PatientID) %>%
distinct(PatientID, .keep_all = TRUE) %>%
as.data.frame()3 Import microbiome data
3.1 Data Import
Importing ASV, taxa and metadata tables for both Czech and Norway samples.
Czech
path = "../../data/analysis_ready_data/ikem/"
asv_tab_ikem <- as.data.frame(fread(file.path(path,"asv_table_ikem.csv"),
check.names = FALSE))
taxa_tab_ikem <- as.data.frame(fread(file.path(path,"taxa_table_ikem.csv"),
check.names = FALSE))
metadata_ikem <- as.data.frame(fread(file.path(path,"metadata_ikem.csv"),
check.names = FALSE))Norway
path = "../../data/analysis_ready_data/norway/"
asv_tab_norway <- as.data.frame(fread(file.path(path,"asv_table_norway.csv"),
check.names = FALSE))
taxa_tab_norway <- as.data.frame(fread(file.path(path,"taxa_table_norway.csv"),
check.names = FALSE))
metadata_norway <- as.data.frame(fread(file.path(path,"metadata_norway.csv"),
check.names = FALSE))3.2 Spliting to segments
Terminal ileum
ileum_data <- merging_data(asv_tab_1=asv_tab_ikem,
asv_tab_2=asv_tab_norway,
taxa_tab_1=taxa_tab_ikem,
taxa_tab_2=taxa_tab_norway,
metadata_1=metadata_ikem,
metadata_2=metadata_norway,
segment="TI",Q="clinical")Removing 1498 ASV(s)
Removing 1834 ASV(s)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment
ileum_asv_tab <- ileum_data[[1]]
ileum_taxa_tab <- ileum_data[[2]]
ileum_metadata <- ileum_data[[3]]Colon
colon_data <- merging_data(asv_tab_1=asv_tab_ikem,
asv_tab_2=asv_tab_norway,
taxa_tab_1=taxa_tab_ikem,
taxa_tab_2=taxa_tab_norway,
metadata_1=metadata_ikem,
metadata_2=metadata_norway,
segment="colon",Q="clinical")Removing 739 ASV(s)
Removing 266 ASV(s)
Merging at ASV level
Finding inconsistencies in taxonomy, trying to keep the ones that have better taxonomy assignment
colon_asv_tab <- colon_data[[1]]
colon_taxa_tab <- colon_data[[2]]
colon_metadata <- colon_data[[3]]3.3 Filtering
Rules: - sequencing depth > 10000 - nearZeroVar() with default settings
Terminal ileum
Sequencing depth
data_filt <- seq_depth_filtering(ileum_asv_tab,
ileum_taxa_tab,
ileum_metadata,
seq_depth_threshold = 10000)Removing 131 ASV(s)
filt_ileum_asv_tab <- data_filt[[1]]; alpha_ileum_asv_tab <- filt_ileum_asv_tab
filt_ileum_taxa_tab <- data_filt[[2]]; alpha_ileum_taxa_tab <- filt_ileum_taxa_tab
filt_ileum_metadata <- data_filt[[3]]; alpha_ileum_metadata <- filt_ileum_metadata
seq_step <- dim(filt_ileum_asv_tab)[1]NearZeroVar
data_filt <- nearzerovar_filtering(filt_ileum_asv_tab,
filt_ileum_taxa_tab,
filt_ileum_metadata)
filt_ileum_asv_tab <- data_filt[[1]]
filt_ileum_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_ileum_asv_tab)[1]Colon
Sequencing depth
data_filt <- seq_depth_filtering(colon_asv_tab,
colon_taxa_tab,
colon_metadata,
seq_depth_threshold = 10000)Removing 75 ASV(s)
filt_colon_asv_tab <- data_filt[[1]]; alpha_colon_asv_tab <- filt_colon_asv_tab
filt_colon_taxa_tab <- data_filt[[2]]; alpha_colon_taxa_tab <- filt_colon_taxa_tab
filt_colon_metadata <- data_filt[[3]]; alpha_colon_metadata <- filt_colon_metadata
seq_step <- dim(filt_colon_asv_tab)[1]NearZeroVar
data_filt <- nearzerovar_filtering(filt_colon_asv_tab,
filt_colon_taxa_tab,
filt_colon_metadata)
filt_colon_asv_tab <- data_filt[[1]]
filt_colon_taxa_tab <- data_filt[[2]]
nearzero_step <- dim(filt_colon_asv_tab)[1]Check zero depth
data_filt <- check_zero_depth(filt_colon_asv_tab,
filt_colon_taxa_tab,
filt_colon_metadata)
filt_colon_asv_tab <- data_filt[[1]];
filt_colon_taxa_tab <- data_filt[[2]];
filt_colon_metadata <- data_filt[[3]]; 4 MDI
4.1 Terminal ileum
4.1.1 Boxplots
4.1.1.1 PSC groups
metadata_ileum_melted <- melt(metadata_ileum %>%
dplyr::select("Group",
"dys_unfiltered_asv",
"dys_unfiltered_genus",
"dys_filtered_asv",
"dys_filtered_genus"))Using Group as id variables
p_il <- ggplot(metadata_ileum_melted) +
geom_boxplot(aes(x=Group, y=value),outliers = FALSE) +
geom_jitter(width = 0.2,height = 0,aes(x=Group, y=value, color=Group),
size=2) +
facet_wrap(~variable, ncol = 4,scales = "free") +
theme_minimal() +
theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
axis.ticks.x = element_line(size=0.3,color = "black"),
axis.ticks.y = element_line(size=0.3,color="black"),
axis.ticks.length = unit(4,"pt"),
panel.grid = element_blank()) +
scale_fill_manual(values=c("#309f87","#f9c675","#F08080","#A00000")) +
scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000"))
p_il FILTERED GENUS
dys_limit <- max(metadata_ileum$dys_filtered_genus,na.rm = TRUE) + 1.1*max(metadata_ileum$dys_filtered_genus,na.rm = TRUE)
dys_min_limit <- min(metadata_ileum$dys_filtered_genus)
p_il_dys_genus <- ggplot(metadata_ileum %>%
mutate(`MDI`=dys_filtered_genus)) +
geom_boxplot(aes(x=Group, y=`MDI`),outliers = FALSE) +
geom_jitter(width = 0.3,height = 0,
aes(x=Group, y=`MDI`, color=Group,shape=Country),
size=1) +
theme_minimal() +
theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
axis.ticks.x = element_line(size=0.3,color = "black"),
axis.ticks.y = element_line(size=0.3,color="black"),
axis.ticks.length = unit(4,"pt"),
panel.grid = element_blank()) +
ylim(dys_min_limit,dys_limit) +
scale_fill_manual(values=c("#309f87","#f9c675","#F08080","#A00000")) +
scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000")) +
theme(axis.text.x = element_text(angle = 45,vjust = 0.5))
p_il_dys_genus4.1.1.2 IBD
metadata_ileum_melted <- melt(metadata_ileum %>%
dplyr::filter(PSC_IBD %in% c(0,1)) %>%
dplyr::select("PSC_IBD",
"dys_unfiltered_asv",
"dys_unfiltered_genus",
"dys_filtered_asv",
"dys_filtered_genus")) Using PSC_IBD as id variables
p_il <- ggplot(metadata_ileum_melted) +
geom_boxplot(aes(x=PSC_IBD, y=value),outliers = FALSE) +
geom_jitter(width = 0.2,height = 0,aes(x=PSC_IBD, y=value, color=PSC_IBD),
size=2) +
facet_wrap(~variable, ncol = 4,scales = "free") +
theme_minimal() +
theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
axis.ticks.x = element_line(size=0.3,color = "black"),
axis.ticks.y = element_line(size=0.3,color="black"),
axis.ticks.length = unit(4,"pt"),
panel.grid = element_blank()) +
scale_fill_manual(values=c("#A06A2C", "#B2182B")) +
scale_color_manual(values=c("#A06A2C", "#B2182B"))
p_il FILTERED GENUS
metadata_ileum_for_plot <- metadata_ileum %>%
dplyr::filter(PSC_IBD %in% c(0,1)) %>%
mutate(Group=case_when(PSC_IBD == 1 ~ "ibd",
PSC_IBD == 0 ~ "no_ibd"))
metadata_ileum_for_plot$Group <- factor(metadata_ileum_for_plot$Group,levels = c("no_ibd","ibd"))
dys_limit <- max(metadata_ileum_for_plot$dys_filtered_genus,na.rm = TRUE) + 0.6*max(metadata_ileum_for_plot$dys_filtered_genus,na.rm = TRUE)
dys_min_limit <- min(metadata_ileum_for_plot$dys_filtered_genus)
p_il_dys_genus_ibd <- ggplot(metadata_ileum_for_plot %>%
mutate(`MDI`=dys_filtered_genus)) +
geom_boxplot(aes(x=Group, y=`MDI`),
outliers = FALSE) +
geom_jitter(width = 0.3,height = 0,
aes(x=Group, y=`MDI`, color=Group,shape=Country),
size=1.5) +
theme_minimal() +
theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
axis.ticks.x = element_line(size=0.3,color = "black"),
axis.ticks.y = element_line(size=0.3,color="black"),
axis.ticks.length = unit(4,"pt"),
panel.grid = element_blank()) +
ylim(dys_min_limit,dys_limit) +
scale_fill_manual(values=c("#A06A2C", "#B2182B")) +
scale_color_manual(values=c("#A06A2C", "#B2182B")) +
theme(axis.text.x = element_text(angle = 45,vjust = 0.5))
p_il_dys_genus_ibd4.1.2 ROCs
comparisons <- combn(unique(metadata_ileum$Group), 2, simplify = TRUE)
comparisons <- comparisons[,c(3,2,1,6,5,4)]
auc_df <- data.frame(Comparison=rep(NA,ncol(comparisons)),
AUC=rep(NA,ncol(comparisons)))
roc_curve_list <- list()
roc_curve_data_list <- list()
for (i in 1:ncol(comparisons)){
name <- paste0(comparisons[1,i]," vs ", comparisons[2,i])
metadata_sub <- subset(metadata_ileum,metadata_ileum$Group %in% comparisons[,i])
metadata_sub$Group <- factor(metadata_sub$Group,
levels=c(comparisons[1,i],comparisons[2,i]))
roc_curve_data <- roc(metadata_sub$Group, metadata_sub$dys_unfiltered_genus)
auc_value <- auc(roc_curve_data) # Compute AUC
auc_df[i,] <- c(name,auc_value)
roc_curve_list[[name]] <- ggroc(roc_curve_data) +
theme_minimal() +
theme(legend.position = "none") +
ggtitle(paste0(name,' (AUC = ', round(auc_value,2))) +
theme(plot.title = element_text(size = 7))
roc_curve_data_list[[name]] <- roc_curve_data
}Setting levels: control = non-rPSC, case = healthy
Setting direction: controls > cases
Setting levels: control = non-rPSC, case = pre_ltx
Setting direction: controls > cases
Setting levels: control = non-rPSC, case = rPSC
Setting direction: controls < cases
Setting levels: control = pre_ltx, case = healthy
Setting direction: controls > cases
Setting levels: control = rPSC, case = healthy
Setting direction: controls > cases
Setting levels: control = rPSC, case = pre_ltx
Setting direction: controls > cases
auc_df Comparison AUC
1 non-rPSC vs healthy 0.817806603773585
2 non-rPSC vs pre_ltx 0.470678225395207
3 non-rPSC vs rPSC 0.589067702552719
4 pre_ltx vs healthy 0.8375
5 rPSC vs healthy 0.8875
6 rPSC vs pre_ltx 0.552861685214626
roc_ileum <- roc_curve_all(roc_curve_data_list)
roc_ileum4.1.3 Linear model
PSC groups
results_model <- pairwise.lm(
formula = "dys_filtered_genus ~ Group * Country",
factors=metadata_ileum$Group,
data=metadata_ileum)
knitr::kable(results_model[[1]][,c("Estimate", "p.adj","sig")],
digits=3,caption="Results of linear modeling")| Estimate | p.adj | sig | |
|---|---|---|---|
| healthy vs Groupnon-rPSC | 17.097 | 0.000 | *** |
| healthy , non-rPSC - CZ vs NO | -0.803 | 0.781 | |
| healthy vs Groupnon-rPSC:CountryNO | -1.684 | 0.781 | |
| healthy vs GrouprPSC | 22.812 | 0.000 | *** |
| healthy , rPSC - CZ vs NO | -0.803 | 0.781 | |
| healthy vs GrouprPSC:CountryNO | -4.288 | 0.652 | |
| healthy vs Grouppre_ltx | 25.244 | 0.000 | *** |
| healthy , pre_ltx - CZ vs NO | -0.803 | 0.781 | |
| healthy vs Grouppre_ltx:CountryNO | -9.217 | 0.080 | |
| non-rPSC vs GrouprPSC | 5.715 | 0.191 | |
| non-rPSC , rPSC - CZ vs NO | -2.488 | 0.705 | |
| non-rPSC vs GrouprPSC:CountryNO | -2.603 | 0.781 | |
| pre_ltx vs Groupnon-rPSC | -8.148 | 0.056 | |
| pre_ltx , non-rPSC - CZ vs NO | -10.021 | 0.033 | * |
| pre_ltx vs Groupnon-rPSC:CountryNO | 7.533 | 0.259 | |
| pre_ltx vs GrouprPSC | -2.433 | 0.781 | |
| pre_ltx , rPSC - CZ vs NO | -10.021 | 0.039 | * |
| pre_ltx vs GrouprPSC:CountryNO | 4.930 | 0.737 |
IBD
results_model <- pairwise.lm(
formula = "dys_filtered_genus ~ Group * Country",
factors=metadata_ileum_for_plot$Group,
data=metadata_ileum_for_plot)
knitr::kable(results_model[[1]],
digits=3,caption="Results of linear modeling")| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| no_ibd vs Groupibd | 4.144 | 4.022 | 1.030 | 0.304 |
| no_ibd , ibd - CZ vs NO | -7.866 | 5.111 | -1.539 | 0.125 |
| no_ibd vs Groupibd:CountryNO | 4.921 | 5.603 | 0.878 | 0.381 |
4.2 Colon
4.2.1 Boxplots
4.2.1.1 PSC_groups
colon_metadata_melted <- melt(metadata_colon %>%
dplyr::select("Group",
"dys_unfiltered_asv",
"dys_unfiltered_genus",
"dys_filtered_asv",
"dys_filtered_genus"))Using Group as id variables
p_col <- ggplot(colon_metadata_melted) +
geom_boxplot(aes(x=Group, y=value),outliers = FALSE) +
geom_jitter(width = 0.2,height = 0,
aes(x=Group, y=value, color=Group),
size=2) +
facet_wrap(~variable, ncol = 4,scales = "free") +
theme_minimal() +
theme(panel.border = element_rect(color = "black", fill = NA, size = 0)) +
scale_fill_manual(values=c("#309f87","#f9c675","#F08080","#A00000")) +
scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000"))
p_col FILTERED GENUS
dys_limit <- max(metadata_colon$dys_filtered_genus,na.rm = TRUE) + 1.1*max(metadata_colon$dys_filtered_genus,na.rm = TRUE)
dys_min_limit <- min(metadata_colon$dys_filtered_genus)
p_col_dys_genus <- ggplot(metadata_colon %>%
mutate(`MDI`=dys_filtered_genus)) +
geom_boxplot(aes(x=Group, y=`MDI`),outliers = FALSE) +
geom_jitter(width = 0.3,height = 0,
aes(x=Group, y=`MDI`, color=Group,shape=Country),
size=1) +
theme_minimal() +
theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
axis.ticks.x = element_line(size=0.3,color = "black"),
axis.ticks.y = element_line(size=0.3,color="black"),
axis.ticks.length = unit(4,"pt"),
panel.grid = element_blank()) +
ylim(dys_min_limit,dys_limit) +
scale_fill_manual(values=c("#309f87","#f9c675","#F08080","#A00000")) +
scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000")) +
theme(axis.text.x = element_text(angle = 45,vjust = 0.5))
p_col_dys_genus4.2.1.2 IBD
metadata_colon_melted <- melt(metadata_colon %>%
dplyr::filter(PSC_IBD %in% c(0,1)) %>%
dplyr::select("PSC_IBD",
"dys_unfiltered_asv",
"dys_unfiltered_genus",
"dys_filtered_asv",
"dys_filtered_genus")) Using PSC_IBD as id variables
p_col <- ggplot(metadata_colon_melted) +
geom_boxplot(aes(x=PSC_IBD, y=value),outliers = FALSE) +
geom_jitter(width = 0.2,height = 0,aes(x=PSC_IBD, y=value, color=PSC_IBD),
size=2) +
facet_wrap(~variable, ncol = 4,scales = "free") +
theme_minimal() +
theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
axis.ticks.x = element_line(size=0.3,color = "black"),
axis.ticks.y = element_line(size=0.3,color="black"),
axis.ticks.length = unit(4,"pt"),
panel.grid = element_blank()) +
scale_fill_manual(values=c("#A06A2C", "#B2182B")) +
scale_color_manual(values=c("#A06A2C", "#B2182B"))
p_col FILTERED GENUS
metadata_colon_for_plot <- metadata_colon %>%
dplyr::filter(PSC_IBD %in% c(0,1)) %>%
mutate(Group=case_when(PSC_IBD == 1 ~ "ibd",
PSC_IBD == 0 ~ "no_ibd"))
metadata_colon_for_plot$Group <- factor(metadata_colon_for_plot$Group,levels = c("no_ibd","ibd"))
dys_limit <- max(metadata_colon_for_plot$dys_filtered_genus,na.rm = TRUE) + 0.7*max(metadata_colon_for_plot$dys_filtered_genus,na.rm = TRUE)
dys_min_limit <- min(metadata_colon_for_plot$dys_filtered_genus)
p_col_dys_genus_ibd <- ggplot(metadata_colon_for_plot %>%
mutate(`MDI`=dys_filtered_genus)) +
geom_boxplot(aes(x=Group, y=`MDI`),
outliers = FALSE) +
geom_jitter(width = 0.3,height = 0,
aes(x=Group, y=`MDI`, color=Group,shape=Country),
size=1.5) +
theme_minimal() +
theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
axis.ticks.x = element_line(size=0.3,color = "black"),
axis.ticks.y = element_line(size=0.3,color="black"),
axis.ticks.length = unit(4,"pt"),
panel.grid = element_blank()) +
ylim(dys_min_limit,dys_limit) +
scale_fill_manual(values=c("#A06A2C", "#B2182B")) +
scale_color_manual(values=c("#A06A2C", "#B2182B")) +
theme(axis.text.x = element_text(angle = 45,vjust = 0.5))
p_col_dys_genus_ibd4.2.2 ROCs
metadata_colon_auc <- metadata_colon %>%
group_by(PatientID) %>%
distinct(PatientID, .keep_all = TRUE) %>%
as.data.frame()
comparisons <- combn(unique(metadata_colon_auc$Group), 2, simplify = TRUE)
comparisons <- comparisons[,c(3,2,1,6,5,4)]
auc_df <- data.frame(Comparison=rep(NA,ncol(comparisons)),
AUC=rep(NA,ncol(comparisons)))
roc_curve_list <- list()
roc_curve_data_list <- list()
for (i in 1:ncol(comparisons)){
name <- paste0(comparisons[1,i]," vs ", comparisons[2,i])
metadata_sub <- subset(metadata_colon_auc,metadata_colon_auc$Group %in% comparisons[,i])
metadata_sub$Group <- factor(metadata_sub$Group,
levels=c(comparisons[1,i],comparisons[2,i]))
roc_curve_data <- roc(metadata_sub$Group, metadata_sub$dys_unfiltered_genus)
auc_value <- auc(roc_curve_data) # Compute AUC
auc_df[i,] <- c(name,auc_value)
roc_curve_list[[name]] <- ggroc(roc_curve_data) +
theme_minimal() +
theme(legend.position = "none") +
ggtitle(paste0(name,' (AUC = ', round(auc_value,2),")")) +
theme(plot.title = element_text(size = 7))
roc_curve_data_list[[name]] <- roc_curve_data
}Setting levels: control = non-rPSC, case = healthy
Setting direction: controls > cases
Setting levels: control = non-rPSC, case = pre_ltx
Setting direction: controls < cases
Setting levels: control = non-rPSC, case = rPSC
Setting direction: controls < cases
Setting levels: control = pre_ltx, case = healthy
Setting direction: controls > cases
Setting levels: control = rPSC, case = healthy
Setting direction: controls > cases
Setting levels: control = rPSC, case = pre_ltx
Setting direction: controls > cases
auc_df Comparison AUC
1 non-rPSC vs healthy 0.849832952347459
2 non-rPSC vs pre_ltx 0.560759106213652
3 non-rPSC vs rPSC 0.675293605915615
4 pre_ltx vs healthy 0.877167060677699
5 rPSC vs healthy 0.924132138857783
6 rPSC vs pre_ltx 0.6135477582846
roc_colon <- roc_curve_all(roc_curve_data_list)
roc_colon4.2.3 Linear model
results_model <- pairwise.lmer(
formula = "dys_filtered_genus ~ Group * Country + (1|PatientID)",
factors=metadata_colon$Group,
data=metadata_colon)
knitr::kable(results_model[[1]][,c("Estimate", "p.adj","sig")],
digits=3,caption="Results of linear mixed-effect modeling")| Estimate | p.adj | sig | |
|---|---|---|---|
| healthy vs Groupnon-rPSC | 25.818 | 0.000 | *** |
| healthy vs non-rPSC - CZ vs NO | -0.432 | 0.955 | |
| healthy vs Groupnon-rPSC:CountryNO | -0.240 | 0.956 | |
| healthy vs GrouprPSC | 40.128 | 0.000 | *** |
| healthy vs rPSC - CZ vs NO | -0.426 | 0.955 | |
| healthy vs GrouprPSC:CountryNO | -6.892 | 0.362 | |
| healthy vs Grouppre_ltx | 37.540 | 0.000 | *** |
| healthy vs pre_ltx - CZ vs NO | -0.420 | 0.955 | |
| healthy vs Grouppre_ltx:CountryNO | -11.185 | 0.058 | |
| non-rPSC vs GrouprPSC | 14.312 | 0.002 | ** |
| non-rPSC vs rPSC - CZ vs NO | -0.667 | 0.955 | |
| non-rPSC vs GrouprPSC:CountryNO | -6.662 | 0.528 | |
| pre_ltx vs Groupnon-rPSC | -11.706 | 0.016 | * |
| pre_ltx vs non-rPSC - CZ vs NO | -11.601 | 0.016 | * |
| pre_ltx vs Groupnon-rPSC:CountryNO | 10.936 | 0.089 | |
| pre_ltx vs GrouprPSC | 2.640 | 0.853 | |
| pre_ltx vs rPSC - CZ vs NO | -11.582 | 0.019 | * |
| pre_ltx vs GrouprPSC:CountryNO | 4.228 | 0.853 |
IBD
results_model <- pairwise.lmer(
formula = "dys_filtered_genus ~ Group * Country + (1|PatientID)",
factors=metadata_colon_for_plot$Group,
data=metadata_colon_for_plot)
knitr::kable(results_model[[1]],
digits=3,caption="Results of linear modeling")| Estimate | Std..Error | df | t.value | Pr…t.. | p.adj | sig | |
|---|---|---|---|---|---|---|---|
| no_ibd vs Groupibd | 5.771 | 4.110 | 437.349 | 1.404 | 0.161 | 0.161 | |
| no_ibd vs ibd - CZ vs NO | -3.164 | 5.250 | 331.780 | -0.603 | 0.547 | 0.547 | |
| no_ibd vs Groupibd:CountryNO | -0.886 | 5.687 | 340.946 | -0.156 | 0.876 | 0.876 |
library(pwr)Warning: package 'pwr' was built under R version 4.3.3
# Fit linear model
lmm <- lmer(dys_filtered_genus ~ Group * Country + (1|PatientID), metadata_colon_for_plot)
# Extract the fixed effect estimate for X
beta <- fixef(lmm)["Groupibd"]
# Extract residual standard deviation
sd_resid <- sigma(lmm)
# Compute Cohen's d
cohen_d <- beta / sd_resid
cohen_dGroupibd
1.001531
# Define parameters
u <- 2
v <- length(metadata_ileum_for_plot$SampleID) - u - 1 # Residual degrees of freedom (Total sample size - predictors - 1)
f2 <- 1 # Cohen’s f² effect size (0.02 = small, 0.15 = medium, 0.35 = large)
alpha <- 0.05 # Significance level
power <- pwr.f2.test(u = u, v = v, f2 = f2, sig.level = alpha)
power
Multiple regression power calculation
u = 2
v = 211
f2 = 1
sig.level = 0.05
power = 1
4.3 Saving results
5 Alpha diversity
# needed variables
alpha_div_plots <- list()
alpha_boxplots <- list()
corrs <- list()5.1 Terminal ileum
ileum_metadata_alpha_final <- metadata_ileum %>%
dplyr::select("SampleID", "PatientID","Group","Country",
"Observe","Shannon","Simpson","Pielou",
"dys_filtered_asv", "dys_filtered_genus")variables <- c("Observe","Shannon","Simpson","Pielou")
for (clinical_variable in variables){
# ILEUM
# boxplot
p <- clinical_boxplot(ileum_metadata_alpha_final,
variable=clinical_variable)
alpha_boxplots[[paste0("ileum_",clinical_variable)]] <- p
# correlation
## ASV
level="ASV"
corr <- clinical_correlation(
ileum_metadata_alpha_final,
clinical_variable,
level)
corrs[[paste0("ileum_asv_",clinical_variable)]] <- corr
p <- clinical_scatter(corr,
ileum_metadata_alpha_final,
clinical_variable,
level)
alpha_div_plots[[paste0("ileum_asv_",clinical_variable)]] <- p
## Genus
level="genus"
corr <- clinical_correlation(ileum_metadata_alpha_final,
clinical_variable,
level)
corrs[[paste0("ileum_genus_",clinical_variable)]] <- corr
p <- clinical_scatter(corr,
ileum_metadata_alpha_final,
clinical_variable,
level)
alpha_div_plots[[paste0("ileum_genus_",clinical_variable)]] <- p
}5.1.1 Boxplot
alpha_boxplots_ileum <- ggarrange(
plotlist = alpha_boxplots[grepl("ileum",names(alpha_boxplots))],
ncol = 4,
common.legend = TRUE,
legend = "right")
alpha_boxplots_ileum5.2 Colon
colon_metadata_alpha_final <- metadata_colon %>%
dplyr::select("SampleID", "PatientID", "Group","Country",
"Observe","Shannon","Simpson","Pielou",
"dys_filtered_asv","dys_filtered_genus")variables <- c("Observe","Shannon","Simpson","Pielou")
for (clinical_variable in variables){
# COLUMN
p <- clinical_boxplot(colon_metadata_alpha_final,
variable=clinical_variable)
alpha_boxplots[[paste0("colon_",clinical_variable)]] <- p
# correlation
## ASV
level="ASV"
corr <- clinical_correlation(colon_metadata_alpha_final,
clinical_variable,
level,
segment="colon")
corrs[[paste0("colon_asv_",clinical_variable)]] <- corr
p <- clinical_scatter(corr,
colon_metadata_alpha_final,
clinical_variable,
level)
alpha_div_plots[[paste0("colon_asv_",clinical_variable)]] <- p
## Genus
level="genus"
corr <- clinical_correlation(colon_metadata_alpha_final,
clinical_variable,
level,
segment="colon")
corrs[[paste0("colon_genus_",clinical_variable)]] <- corr
p <- clinical_scatter(corr,
colon_metadata_alpha_final,
clinical_variable,
level)
alpha_div_plots[[paste0("colon_genus_",clinical_variable)]] <- p
}5.2.1 Boxplot
alpha_boxplots_colon <- ggarrange(
plotlist = alpha_boxplots[grepl("colon",names(alpha_boxplots))],
ncol = 4,
common.legend = TRUE,
legend = "right")
alpha_boxplots_colon5.3 Scatterplots
ASV
alpha_dys_asv <- ggarrange(
plotlist = alpha_div_plots[grepl("asv",names(alpha_div_plots))],
ncol=4,
nrow=2,
common.legend = TRUE,
legend="right")`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
alpha_dys_asvGenus
alpha_dys_genus <- ggarrange(
plotlist = alpha_div_plots[grepl("genus",names(alpha_div_plots))],
ncol=4,
nrow=2,
common.legend = TRUE,
legend="right")`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
alpha_dys_genus5.4 Linear models
5.4.1 Richness
Ileum
results_model <- pairwise.lm(formula = "Observe ~ Group * Country",
factors=metadata_ileum$Group,
data=metadata_ileum)
knitr::kable(results_model[[1]][,c("Estimate", "p.adj","sig")],
digits=3,caption="Results of linear modeling")| Estimate | p.adj | sig | |
|---|---|---|---|
| healthy vs Groupnon-rPSC | -25.110 | 0.045 | * |
| healthy , non-rPSC - CZ vs NO | 11.219 | 0.346 | |
| healthy vs Groupnon-rPSC:CountryNO | -31.594 | 0.112 | |
| healthy vs GrouprPSC | -37.475 | 0.025 | * |
| healthy , rPSC - CZ vs NO | 11.219 | 0.346 | |
| healthy vs GrouprPSC:CountryNO | -22.969 | 0.346 | |
| healthy vs Grouppre_ltx | -53.642 | 0.000 | *** |
| healthy , pre_ltx - CZ vs NO | 11.219 | 0.346 | |
| healthy vs Grouppre_ltx:CountryNO | 6.313 | 0.710 | |
| non-rPSC vs GrouprPSC | -12.365 | 0.346 | |
| non-rPSC , rPSC - CZ vs NO | -20.375 | 0.200 | |
| non-rPSC vs GrouprPSC:CountryNO | 8.625 | 0.710 | |
| pre_ltx vs Groupnon-rPSC | 28.532 | 0.061 | |
| pre_ltx , non-rPSC - CZ vs NO | 17.533 | 0.346 | |
| pre_ltx vs Groupnon-rPSC:CountryNO | -37.907 | 0.081 | |
| pre_ltx vs GrouprPSC | 16.167 | 0.346 | |
| pre_ltx , rPSC - CZ vs NO | 17.533 | 0.346 | |
| pre_ltx vs GrouprPSC:CountryNO | -29.283 | 0.346 |
Colon
results_model <- pairwise.lmer(
formula = "Observe ~ Group * Country + (1|Patient)",
factors=metadata_colon$Group,
data=metadata_colon %>% mutate(Patient=PatientID))
knitr::kable(results_model[[1]][,c("Estimate", "p.adj","sig")],
digits=3,caption="Results of linear modeling")| Estimate | p.adj | sig | |
|---|---|---|---|
| healthy vs Groupnon-rPSC | -23.300 | 0.045 | * |
| healthy vs non-rPSC - CZ vs NO | 14.430 | 0.284 | |
| healthy vs Groupnon-rPSC:CountryNO | -32.063 | 0.113 | |
| healthy vs GrouprPSC | -40.512 | 0.009 | ** |
| healthy vs rPSC - CZ vs NO | 14.241 | 0.284 | |
| healthy vs GrouprPSC:CountryNO | -28.963 | 0.284 | |
| healthy vs Grouppre_ltx | -56.230 | 0.000 | *** |
| healthy vs pre_ltx - CZ vs NO | 14.193 | 0.284 | |
| healthy vs Grouppre_ltx:CountryNO | -7.489 | 0.659 | |
| non-rPSC vs GrouprPSC | -17.130 | 0.284 | |
| non-rPSC vs rPSC - CZ vs NO | -17.626 | 0.284 | |
| non-rPSC vs GrouprPSC:CountryNO | 2.894 | 0.893 | |
| pre_ltx vs Groupnon-rPSC | 32.865 | 0.040 | * |
| pre_ltx vs non-rPSC - CZ vs NO | 6.762 | 0.647 | |
| pre_ltx vs Groupnon-rPSC:CountryNO | -24.384 | 0.284 | |
| pre_ltx vs GrouprPSC | 15.660 | 0.396 | |
| pre_ltx vs rPSC - CZ vs NO | 6.652 | 0.647 | |
| pre_ltx vs GrouprPSC:CountryNO | -21.362 | 0.405 |
5.4.2 Shannon
Ileum
results_model <- pairwise.lm(formula = "Shannon ~ Group * Country",
factors=metadata_ileum$Group,
data=metadata_ileum)
knitr::kable(results_model[[1]][,c("Estimate", "p.adj","sig")],
digits=3,caption="Results of linear modeling")| Estimate | p.adj | sig | |
|---|---|---|---|
| healthy vs Groupnon-rPSC | -0.109 | 0.576 | |
| healthy , non-rPSC - CZ vs NO | 0.007 | 0.994 | |
| healthy vs Groupnon-rPSC:CountryNO | -0.356 | 0.205 | |
| healthy vs GrouprPSC | -0.212 | 0.289 | |
| healthy , rPSC - CZ vs NO | 0.007 | 0.994 | |
| healthy vs GrouprPSC:CountryNO | -0.358 | 0.289 | |
| healthy vs Grouppre_ltx | -0.505 | 0.034 | * |
| healthy , pre_ltx - CZ vs NO | 0.007 | 0.994 | |
| healthy vs Grouppre_ltx:CountryNO | 0.144 | 0.634 | |
| non-rPSC vs GrouprPSC | -0.103 | 0.634 | |
| non-rPSC , rPSC - CZ vs NO | -0.350 | 0.080 | |
| non-rPSC vs GrouprPSC:CountryNO | -0.002 | 0.994 | |
| pre_ltx vs Groupnon-rPSC | 0.396 | 0.080 | |
| pre_ltx , non-rPSC - CZ vs NO | 0.150 | 0.576 | |
| pre_ltx vs Groupnon-rPSC:CountryNO | -0.500 | 0.115 | |
| pre_ltx vs GrouprPSC | 0.293 | 0.289 | |
| pre_ltx , rPSC - CZ vs NO | 0.150 | 0.576 | |
| pre_ltx vs GrouprPSC:CountryNO | -0.502 | 0.289 |
Colon
results_model <- pairwise.lmer(
formula = "Shannon ~ Group * Country + (1|Patient)",
factors=metadata_colon$Group,
data=metadata_colon %>% mutate(Patient=PatientID))
knitr::kable(results_model[[1]][,c("Estimate", "p.adj","sig")],
digits=3,caption="Results of linear modeling")| Estimate | p.adj | sig | |
|---|---|---|---|
| healthy vs Groupnon-rPSC | -0.184 | 0.182 | |
| healthy vs non-rPSC - CZ vs NO | 0.016 | 0.961 | |
| healthy vs Groupnon-rPSC:CountryNO | -0.378 | 0.091 | |
| healthy vs GrouprPSC | -0.334 | 0.061 | |
| healthy vs rPSC - CZ vs NO | 0.014 | 0.961 | |
| healthy vs GrouprPSC:CountryNO | -0.361 | 0.207 | |
| healthy vs Grouppre_ltx | -0.585 | 0.001 | ** |
| healthy vs pre_ltx - CZ vs NO | 0.015 | 0.961 | |
| healthy vs Grouppre_ltx:CountryNO | 0.081 | 0.852 | |
| non-rPSC vs GrouprPSC | -0.148 | 0.591 | |
| non-rPSC vs rPSC - CZ vs NO | -0.360 | 0.061 | |
| non-rPSC vs GrouprPSC:CountryNO | 0.013 | 0.962 | |
| pre_ltx vs Groupnon-rPSC | 0.399 | 0.061 | |
| pre_ltx vs non-rPSC - CZ vs NO | 0.095 | 0.790 | |
| pre_ltx vs Groupnon-rPSC:CountryNO | -0.455 | 0.095 | |
| pre_ltx vs GrouprPSC | 0.251 | 0.407 | |
| pre_ltx vs rPSC - CZ vs NO | 0.095 | 0.790 | |
| pre_ltx vs GrouprPSC:CountryNO | -0.442 | 0.283 |
6 FIGURE 5
terminal_ileum_list <- list(
ileum_genus_mdi_boxplot=p_il_dys_genus +
theme(legend.position = "none") +
ggtitle("Terminal ileum") +
theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)),
ileum_genus_observe= alpha_div_plots[grepl("ileum_genus",names(alpha_div_plots))][[2]] +
theme(legend.position = "none")#+ ylab("Richness")
)
p_terminal_ileum <- ggarrange(
plotlist = terminal_ileum_list,
ncol=1,
nrow=2,heights = c(1.2,0.8),labels = c("A","B"))`geom_smooth()` using formula = 'y ~ x'
colon_list <- list(
colon_genus_mdi_boxplot=p_col_dys_genus +
theme(legend.position = "none") +
ggtitle("Colon") +
theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15)),
colon_genus_observe= alpha_div_plots[grepl("colon_genus",names(alpha_div_plots))][[2]] +
theme(legend.position = "none") #+ ylab("Shannon")
)
p_colon <- ggarrange(
plotlist = colon_list,
ncol=1,
nrow=2,heights = c(1.2,0.8))`geom_smooth()` using formula = 'y ~ x'
p <- ggarrange(p_terminal_ileum,p_colon,ncol=2,common.legend = TRUE,
legend="right")
p <- ggarrange(p,ggplot() + theme_minimal(),ncol=2,
widths = c(1,0.15))p7 Clinical variables comparison
7.1 PSC groups
variables <- c("log_Bilirubin","Albumin","log_ALP","Platelets","log_Calprotectin","AOM","log_APRI","INR","log_FIB4","log_Creatinine","log_ALT","log_AST","log_GGT","Nancy_max","eMAYO","MAYO_dai")metadata_for_boxplots_psc <- metadata_for_boxplots %>%
dplyr::mutate(Group = case_when(
Group == "pre_ltx" ~ "pre_LTx",
Group == "non-rPSC" ~ "post_LTx",
Group == "rPSC" ~ "post_LTx",
TRUE ~ Group # Keep other values as is
)) %>%
dplyr::select(-PSC_IBD)
metadata_for_boxplots_psc$Group <- factor(
metadata_for_boxplots_psc$Group,levels=c("healthy","pre_LTx","post_LTx"))
test_df <- data.frame(`Variable`=NA,
`healthy_vs_pre_LTx`=NA,
`healthy_vs_post_LTx`=NA,
`pre_LTx_vs_post_LTx`=NA,check.names = FALSE)
boxplots_plots <- list()
i <- 1
for (clinical_variable in variables){
if (grepl("log_",clinical_variable)) {
metadata_for_boxplots_psc[,clinical_variable] <- log(
as.numeric(metadata_for_boxplots_psc[,gsub("log_","",clinical_variable)])
)
}
df <- metadata_for_boxplots_psc %>% dplyr::select("Group",clinical_variable)
df <- df %>% drop_na()
p <- clinical_boxplot(metadata_for_boxplots_psc,variable=clinical_variable)
boxplots_plots[[clinical_variable]] <- p
p_test <- pairwise.wilcox.test(
df[,clinical_variable],
df$Group,
p.adjust.method = "BH")$p.value
if (all(dim(p_test)>0)){
if (all(colnames(p_test) == c("healthy","pre_LTx")) &
all(rownames(p_test) == c("pre_LTx","post_LTx"))){
test_df[i,] <- c(clinical_variable,
round(p_test[1,1],3),
round(p_test[2,1],3),
round(p_test[2,2],3))
} else if (all(colnames(p_test) == c("pre_LTx")) &
all(rownames(p_test) == c("post_LTx"))) {
test_df[i,] <- c(clinical_variable,
NA,NA,
round(p_test[1,1],3))
} else{ cat(clinical_variable)}
} else{ cat(clinical_variable)}
i <- i +1
}
test_df[,-1] <- as.numeric(as.matrix(test_df[,-1]))
significance_df <- test_df[,-1]
significance_df[test_df[,-1] <= 0.05] <-'*'
significance_df[test_df[,-1] <= 0.01] <-'**'
significance_df[test_df[,-1] <= 0.001] <-'***'
significance_df$Variable <- test_df$Variable
write.xlsx(significance_df,"../results/clinical_significance.xlsx")
significance_df %<>% column_to_rownames("Variable")7.1.1 Test
knitr::kable(significance_df,
digits=3,caption="Pairwise wilcox test of clinical variables")| healthy_vs_pre_LTx | healthy_vs_post_LTx | pre_LTx_vs_post_LTx | |
|---|---|---|---|
| log_Bilirubin | ** | 0.798 | *** |
| Albumin | *** | *** | *** |
| log_ALP | *** | * | *** |
| Platelets | 0.318 | *** | *** |
| log_Calprotectin | NA | NA | * |
| AOM | NA | NA | 0.726 |
| log_APRI | *** | *** | *** |
| INR | 0.968 | 0.968 | 0.638 |
| log_FIB4 | ** | *** | 0.873 |
| log_Creatinine | *** | ** | *** |
| log_ALT | *** | 0.855 | *** |
| log_AST | *** | 0.384 | *** |
| log_GGT | *** | 0.197 | *** |
| Nancy_max | NA | NA | 0.785 |
| eMAYO | NA | NA | 0.515 |
| MAYO_dai | NA | NA | 0.188 |
7.1.2 Boxplots
for (i in 1:length(boxplots_plots)){
name <- names(boxplots_plots)[i]
pl <- boxplots_plots[[i]]
pl <- pl + ggtitle(name)
boxplots_plots[[i]] <- pl
}
bx <- ggarrange(plotlist = boxplots_plots, ncol = 4,nrow=4)bx7.2 IBD
variables <- c("log_Bilirubin","Albumin","log_ALP","Platelets","log_Calprotectin","AOM","log_APRI","INR","log_FIB4","log_Creatinine","log_ALT","log_AST","log_GGT","Nancy_max","eMAYO","MAYO_dai")test_df <- data.frame(`Variable`=NA,
`ibd_vs_noibd`=NA)
boxplots_plots <- list()
metadata_for_boxplots_ibd <- metadata_for_boxplots %>% dplyr::filter(PSC_IBD %in% c(0,1))
metadata_for_boxplots_ibd$Group [1] non-rPSC rPSC non-rPSC non-rPSC non-rPSC rPSC non-rPSC pre_ltx
[9] pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx non-rPSC pre_ltx pre_ltx
[17] pre_ltx non-rPSC pre_ltx pre_ltx rPSC pre_ltx non-rPSC non-rPSC
[25] rPSC pre_ltx pre_ltx pre_ltx non-rPSC non-rPSC rPSC pre_ltx
[33] non-rPSC non-rPSC non-rPSC pre_ltx non-rPSC pre_ltx non-rPSC pre_ltx
[41] pre_ltx non-rPSC pre_ltx rPSC non-rPSC pre_ltx pre_ltx pre_ltx
[49] pre_ltx non-rPSC pre_ltx pre_ltx non-rPSC pre_ltx pre_ltx pre_ltx
[57] non-rPSC pre_ltx rPSC pre_ltx rPSC non-rPSC pre_ltx pre_ltx
[65] non-rPSC non-rPSC pre_ltx pre_ltx non-rPSC pre_ltx non-rPSC non-rPSC
[73] pre_ltx non-rPSC non-rPSC non-rPSC pre_ltx pre_ltx pre_ltx non-rPSC
[81] pre_ltx pre_ltx pre_ltx pre_ltx non-rPSC pre_ltx pre_ltx pre_ltx
[89] non-rPSC pre_ltx pre_ltx pre_ltx pre_ltx non-rPSC rPSC rPSC
[97] pre_ltx pre_ltx pre_ltx non-rPSC non-rPSC pre_ltx pre_ltx pre_ltx
[105] pre_ltx pre_ltx pre_ltx non-rPSC pre_ltx pre_ltx pre_ltx rPSC
[113] rPSC pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx
[121] pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx
[129] pre_ltx pre_ltx pre_ltx rPSC pre_ltx pre_ltx non-rPSC non-rPSC
[137] non-rPSC non-rPSC non-rPSC rPSC non-rPSC non-rPSC rPSC non-rPSC
[145] non-rPSC non-rPSC non-rPSC non-rPSC rPSC non-rPSC rPSC non-rPSC
[153] non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC rPSC non-rPSC non-rPSC
[161] non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC
[169] non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC
[177] rPSC non-rPSC non-rPSC rPSC non-rPSC non-rPSC non-rPSC rPSC
[185] non-rPSC rPSC non-rPSC non-rPSC non-rPSC non-rPSC rPSC non-rPSC
[193] non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC rPSC rPSC
[201] non-rPSC rPSC non-rPSC non-rPSC rPSC non-rPSC rPSC non-rPSC
[209] non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC rPSC non-rPSC non-rPSC
[217] non-rPSC non-rPSC rPSC non-rPSC non-rPSC rPSC rPSC non-rPSC
[225] non-rPSC non-rPSC non-rPSC non-rPSC non-rPSC rPSC rPSC non-rPSC
[233] non-rPSC non-rPSC non-rPSC non-rPSC rPSC rPSC non-rPSC non-rPSC
[241] non-rPSC pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx
[249] rPSC pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx
[257] pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx
[265] pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx pre_ltx
Levels: healthy pre_ltx non-rPSC rPSC
i <- 1
for (clinical_variable in variables){
if (grepl("log_",clinical_variable)) {
metadata_for_boxplots_ibd[,clinical_variable] <- log(
as.numeric(metadata_for_boxplots_ibd[,gsub("log_","",clinical_variable)])
)
}
df <- metadata_for_boxplots_ibd %>% dplyr::select("PSC_IBD",clinical_variable)
df$PSC_IBD <- factor(df$PSC_IBD,levels=c("0","1"))
df <- df %>% drop_na()
p_test <- pairwise.wilcox.test(
df[,clinical_variable],
df$PSC_IBD,
p.adjust.method = "BH")$p.value
if (length(p_test)==0) test_df[i,] <- c(clinical_variable,NA)
else if ((colnames(p_test)==0) & (rownames(p_test)==1)){
test_df[i,] <- c(clinical_variable,
round(p_test[1,1],3))
} else{
cat(clinical_variable)
}
# boxplot
p <- clinical_boxplot(metadata_for_boxplots_ibd,
variable=clinical_variable)
boxplots_plots[[clinical_variable]] <- p
i <- i +1
}
test_df[,-1] <- as.numeric(as.matrix(test_df[,-1]))
significance_df <- test_df[,-1]
significance_df[test_df[,-1] <= 0.05] <-'*'
significance_df[test_df[,-1] <= 0.01] <-'**'
significance_df[test_df[,-1] <= 0.001] <-'***'
significance_df %<>% as.data.frame()
significance_df$Variable <- test_df$Variable
colnames(significance_df) <- c("ibd_vs_noibd","Variable")
write.xlsx(significance_df,"../results/clinical_significance_ibd.xlsx")
significance_df %<>% column_to_rownames("Variable")7.2.1 Test
knitr::kable(significance_df,
digits=3,caption="Pairwise wilcox test of clinical variables")| ibd_vs_noibd | |
|---|---|
| log_Bilirubin | 0.154 |
| Albumin | 0.396 |
| log_ALP | 0.968 |
| Platelets | 0.412 |
| log_Calprotectin | *** |
| AOM | * |
| log_APRI | 0.087 |
| INR | 0.298 |
| log_FIB4 | ** |
| log_Creatinine | ** |
| log_ALT | 0.159 |
| log_AST | 0.201 |
| log_GGT | 0.592 |
| Nancy_max | NA |
| eMAYO | NA |
| MAYO_dai | NA |
7.2.2 Boxplots
for (i in 1:length(boxplots_plots)){
name <- names(boxplots_plots)[i]
pl <- boxplots_plots[[i]]
pl <- pl + ggtitle(name)
boxplots_plots[[i]] <- pl
}
bx <- ggarrange(plotlist = boxplots_plots, ncol = 4,nrow=4)bx8 MDI and clinical variables
variables <- c("ALP","log_ALP",
"GGT","log_GGT",
"ALT","log_ALT",
"AST","log_AST",
"Bilirubin","log_Bilirubin",
"INR","log_INR",
"Albumin",
"Platelets",
"Creatinine","log_Creatinine",
"FIB4","log_FIB4",
"APRI","log_APRI",
"MAYO_PSC",
"AOM",
"Calprotectin","log_Calprotectin",
"Nancy_max","eMAYO","MAYO_dai")corrs <- list()
clinical_plots <- list()
for (clinical_variable in variables){
if (grepl("log_",clinical_variable)) {
metadata_for_boxplots[,clinical_variable] <- log(as.numeric(metadata_for_boxplots[,gsub("log_","",clinical_variable)]))
metadata_ileum[,clinical_variable] <- log(as.numeric(metadata_ileum[,gsub("log_","",clinical_variable)]))
metadata_colon[,clinical_variable] <- log(as.numeric(metadata_colon[,gsub("log_","",clinical_variable)]))
}
# ILEUM
# correlation
## ASV
level="ASV"
corr <- clinical_correlation(metadata_ileum,
clinical_variable,
level)
corrs[[paste0("ileum_asv_",clinical_variable)]] <- corr
p <- clinical_scatter(corr,
metadata_ileum,
clinical_variable,
level)
clinical_plots[[paste0("ileum_asv_",clinical_variable)]] <- p
## Genus
level="genus"
corr <- clinical_correlation(metadata_ileum,
clinical_variable,
level)
corrs[[paste0("ileum_genus_",clinical_variable)]] <- corr
p <- clinical_scatter(corr,
metadata_ileum,
clinical_variable,
level)
clinical_plots[[paste0("ileum_genus_",clinical_variable)]] <- p
# correlation
## ASV
level="ASV"
corr <- clinical_correlation(metadata_colon,
clinical_variable,
level,
segment="colon")
corrs[[paste0("colon_asv_",clinical_variable)]] <- corr
p <- clinical_scatter(corr,
metadata_colon,
clinical_variable,
level)
clinical_plots[[paste0("colon_asv_",clinical_variable)]] <- p
## Genus
level="genus"
corr <- clinical_correlation(metadata_colon,
clinical_variable,
level,
segment="colon")
corrs[[paste0("colon_genus_",clinical_variable)]] <- corr
p <- clinical_scatter(corr,
metadata_colon,
clinical_variable,
level)
clinical_plots[[paste0("colon_genus_",clinical_variable)]] <- p
}8.1 MDI ~ clinical variable
for (i in 1:length(clinical_plots)){
name <- names(clinical_plots)[i]
pl <- clinical_plots[[i]]
pl <- pl + ggtitle(name)
clinical_plots[[i]] <- pl
}
cl_asv <- ggarrange(
plotlist = clinical_plots[grepl("asv",names(clinical_plots))],
ncol=6,nrow=9)`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
cl_genus <- ggarrange(plotlist = clinical_plots[grepl("genus",names(clinical_plots))],
ncol=6,nrow=9)`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
`geom_smooth()` using formula = 'y ~ x'
cl_genus8.2 Heatmap
Genus level
corrs_ileum <- corrs[grepl("ileum_genus",names(corrs))]
corrs_colon <- corrs[grepl("colon_genus",names(corrs))]prepared_list <- prepare_for_heatmap(corrs_ileum,corrs_colon)
p_df_sig_mdi <- prepared_list[[1]]
r_df_mdi <- prepared_list[[2]]r_df_melt_mdi <- melt(r_df_mdi %>% rownames_to_column("SeqID"))Using SeqID as id variables
p_df_melt_mdi <- melt(p_df_sig_mdi %>% rownames_to_column("SeqID"),
id.vars = c("SeqID"))
r_df_melt_mdi$SeqID <- factor(r_df_melt_mdi$SeqID,
levels=unique(r_df_melt_mdi$SeqID))
p_df_melt_mdi$SeqID <- factor(p_df_melt_mdi$SeqID,
levels=unique(p_df_melt_mdi$SeqID))
r_df_melt_mdi$variable <- factor(r_df_melt_mdi$variable,
levels=rev(unique(r_df_melt_mdi$variable)))
p_df_melt_mdi$variable <- factor(p_df_melt_mdi$variable,
levels=rev(unique(p_df_melt_mdi$variable)))
p_mdi <- ggplot() +
geom_tile(data=r_df_melt_mdi, aes(SeqID, variable, fill= value)) +
theme_minimal() +
scale_fill_gradient2(name = "Rho", low = "blue", mid = "white", high = "red", midpoint = 0,limits = c(-0.7, 0.7)) +
geom_text(data=p_df_melt_mdi,aes(x=SeqID,y=variable,label=value,vjust=0.6)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("") + ylab("") +
ggtitle("MDI")+
theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15))
p_mdi9 Bacteria and clinical variables
9.1 Terminal ileum
level="genus"Aggregation, filtering
# Aggregation
genus_data <- aggregate_taxa(ileum_asv_tab,
ileum_taxa_tab,
taxonomic_level=level,
names=TRUE)
# Filtration
filt_data <- filtering_steps(genus_data[[1]],
genus_data[[2]],
ileum_metadata,
seq_depth_threshold=10000)Removing 5 ASV(s)
filt_ileum_genus_tab <- filt_data[[1]]
filt_ileum_genus_taxa <- filt_data[[2]]
filt_ileum_metadata_genus <- filt_data[[3]]PSC effect taxa
psc_effect_ileum_genus <- read.xlsx(
"../results/Q1/univariate_analysis/supplements_psc_effect_terminal_ileum.xlsx",
sheet = "terminal_ileum genus")
psc_taxa <- psc_effect_ileum_genus$SeqID
increased <- psc_effect_ileum_genus$ASV[psc_effect_ileum_genus$log2FoldChange >0]
decreased <- psc_effect_ileum_genus$ASV[psc_effect_ileum_genus$log2FoldChange <0]Preparing datasets
my_table <- filt_ileum_genus_tab %>% column_to_rownames("SeqID")
data_clr <- vegan::decostand(my_table,method = "clr", MARGIN = 2,pseudocount=0.5) %>% as.matrix()
data_clr <- data_clr[psc_taxa,] %>% t() %>% as.data.frame() %>% rownames_to_column("SampleID")
metadata_with_abundances <- data_clr %>% full_join(metadata_ileum,by="SampleID")# significant variables
variables <- c("ALP",
"APRI",
"Albumin",
"Platelets")corrs_ileum <- corrs[grepl("ileum_genus",names(corrs))]
corrs_colon <- corrs[grepl("colon_genus",names(corrs))]prepared_list <- prepare_for_heatmap(corrs_ileum,corrs_colon)
p_df_sig_mdi <- prepared_list[[1]]
variables <- colnames(p_df_sig_mdi)[!apply(p_df_sig_mdi,2,function(x) {
all(x == "")})]corrs <- list()
clinical_plots <- list()
for (clinical_variable in variables){
for (taxon in psc_taxa){
if (grepl("log_",clinical_variable)) {
metadata_with_abundances[,clinical_variable] <- log(as.numeric(metadata_with_abundances[,gsub("log_","",clinical_variable)]))
}
# ILEUM
# correlation
corr <- clinical_correlation_abundances(metadata_with_abundances,clinical_variable,taxon,level="genus")
corrs[[paste0("ileum_genus_",clinical_variable,"_", taxon)]] <- corr
if (corr$P <0.05){
p <- clinical_scatter_abundances(corr,metadata_with_abundances,clinical_variable,taxon,level="genus",size = 3)
clinical_plots[[paste0("ileum_genus_",clinical_variable,"_",taxon)]] <- p
} else print(corr$P)
}
}[1] 0.68971
[1] 0.5727108
[1] 0.06684104
[1] 0.07115727
[1] 0.6939827
[1] 0.1949693
[1] 0.9010701
[1] 0.8492244
[1] 0.06406532
[1] 0.7152049
[1] 0.2833256
[1] 0.3875891
[1] 0.1593069
[1] 0.1583775
[1] 0.3674669
[1] 0.2368698
[1] 0.2485435
[1] 0.07856963
[1] 0.3431773
[1] 0.3279462
[1] 0.05899516
[1] 0.3401515
[1] 0.6083045
[1] 0.2541031
[1] 0.1143359
[1] 0.0968635
[1] 0.05785104
[1] 0.5824842
[1] 0.990239
[1] 0.2349355
[1] 0.1710323
[1] 0.6455714
[1] 0.2189091
[1] 0.2266506
[1] 0.9466934
[1] 0.3278282
[1] 0.1386096
[1] 0.1180884
prepared_list <- subprepare_for_heatmap(corrs,MDI = FALSE)
p_df_sig <- prepared_list[[1]]
r_df <- prepared_list[[2]]r_df_melt <- melt(r_df %>% rownames_to_column("SeqID"))Using SeqID as id variables
p_df_melt <- melt(p_df_sig %>% rownames_to_column("SeqID"),id.vars = c("SeqID"))
r_df_melt$variable <- factor(r_df_melt$variable,
levels=rev(unique(r_df_melt$variable)))
p_df_melt$variable <- factor(p_df_melt$variable,
levels =rev(unique(p_df_melt$variable)))
r_df_melt$SeqID <- factor(r_df_melt$SeqID, levels = unique(r_df_melt$SeqID))
p_df_melt$SeqID <- factor(p_df_melt$SeqID, levels = unique(p_df_melt$SeqID))
p_ileum <- ggplot() +
geom_tile(data=r_df_melt, aes(variable, SeqID, fill= value)) +
theme_minimal() +
scale_fill_gradient2(name = "Rho", low = "blue", mid = "white", high = "red", midpoint = 0,limits = c(-0.5, 0.5)) +
geom_text(data=p_df_melt,aes(x=variable,y=SeqID,label=value,vjust=0.6)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("") + ylab("") +
ggtitle("Terminal ileum")+
theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15))
p_ileum9.2 Colon
Aggregation, filtering
genus_data <- aggregate_taxa(colon_asv_tab,
colon_taxa_tab,
taxonomic_level="Genus",
names=TRUE)
# Filtration
filt_data <- filtering_steps(genus_data[[1]],
genus_data[[2]],
colon_metadata,
seq_depth_threshold=10000)Removing 10 ASV(s)
filt_colon_genus_tab <- filt_data[[1]]
filt_colom_genus_taxa <- filt_data[[2]]
filt_colon_metadata_genus <- filt_data[[3]]psc_effect_colon_genus <- read.xlsx("../results/Q1/univariate_analysis/supplements_psc_effect_colon.xlsx",
sheet = "colon genus")
psc_taxa <- psc_effect_colon_genus$SeqID
increased <- psc_effect_colon_genus$SeqID[psc_effect_colon_genus$log2FoldChange >0]
decreased <- psc_effect_colon_genus$SeqID[psc_effect_colon_genus$log2FoldChange <0]my_table <- filt_colon_genus_tab %>% column_to_rownames("SeqID")
data_clr <- vegan::decostand(my_table,method = "clr", MARGIN = 2,pseudocount=0.5) %>% as.matrix()
data_clr <- data_clr[psc_taxa,] %>% t() %>% as.data.frame() %>% rownames_to_column("SampleID")
metadata_with_abundances <- data_clr %>% full_join(metadata_colon,by="SampleID")variables <- colnames(p_df_sig_mdi)[!apply(p_df_sig_mdi,2,function(x) {
all(x == "")})]corrs <- list()
clinical_plots <- list()
for (clinical_variable in variables){
for (taxon in psc_taxa){
if (grepl("log_",clinical_variable)) {
metadata_with_abundances[,clinical_variable] <- log(as.numeric(metadata_with_abundances[,gsub("log_","",clinical_variable)]))
}
# ILEUM
# correlation
corr <- clinical_correlation_abundances(metadata_with_abundances,
clinical_variable,
taxon,level="genus",
rename_p = TRUE)
if ( (corr$P == "< 0.05") |
(corr$P == "< 0.01") |
(corr$P == "< 0.001") |
corr$P < 0.05){
p <- clinical_scatter_abundances(corr,
metadata_with_abundances,
clinical_variable,
taxon,
level="genus",
size = 3)
clinical_plots[[paste0("colon_genus_",clinical_variable,"_",taxon)]] <- p
# once again, withou renaming
corr <- clinical_correlation_abundances(metadata_with_abundances,
clinical_variable,
taxon,level="genus")
corrs[[paste0("colon_genus_",clinical_variable,"_", taxon)]] <- corr
} else print(corr$P)
}
}clinical_plots_colon <- clinical_plots
corrs_colon <- corrs
prepared_list <- subprepare_for_heatmap(corrs,MDI = FALSE)
p_df_sig <- prepared_list[[1]]
r_df <- prepared_list[[2]]r_df_melt <- melt(r_df %>% rownames_to_column("SeqID"))Using SeqID as id variables
p_df_melt <- melt(p_df_sig %>% rownames_to_column("SeqID"),id.vars = c("SeqID"))
r_df_melt$variable <- factor(r_df_melt$variable,
levels=rev(unique(r_df_melt$variable)))
p_df_melt$variable <- factor(p_df_melt$variable,
levels =rev(unique(p_df_melt$variable)))
r_df_melt$SeqID <- factor(r_df_melt$SeqID, levels = unique(r_df_melt$SeqID))
p_df_melt$SeqID <- factor(p_df_melt$SeqID, levels = unique(p_df_melt$SeqID))
p_colon <- ggplot() +
geom_tile(data=r_df_melt, aes(variable, SeqID, fill= value)) +
theme_minimal() +
scale_fill_gradient2(name = "Rho", low = "blue", mid = "white", high = "red", midpoint = 0,limits = c(-0.5, 0.5)) +
geom_text(data=p_df_melt,aes(x=variable,y=SeqID,label=value,vjust=0.6)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
xlab("") + ylab("") +
ggtitle("Colon")+
theme(plot.title = element_text(hjust=0.5,face = "bold",size = 15))
p_colon9.3 FIGURE 6
mdi_heatmap <- ggarrange(p_mdi,p_ileum,p_colon,ncol = 3,widths = c(0.7,1,1))
mdi_heatmap10 Calprotectin ~ Nancy_max
cor_res <- cor.test(
metadata_for_boxplots$Calprotectin,
metadata_for_boxplots$Nancy_max)
cor_res
Pearson's product-moment correlation
data: metadata_for_boxplots$Calprotectin and metadata_for_boxplots$Nancy_max
t = 3.0164, df = 90, p-value = 0.003325
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
0.1046851 0.4781488
sample estimates:
cor
0.3030057
p_1 <- ggplot(metadata_for_boxplots) +
geom_jitter(
aes(x=Nancy_max, y=Calprotectin, color=Group),
height = 0,width = 0.2) +
geom_text(aes(x=3.2, y=5700),
label=gsub("\\\\n", "\n", labels),hjust = 0) +
geom_smooth(aes(x=Nancy_max, y=Calprotectin),
method=lm, se=TRUE) +
theme_minimal() +
theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
axis.ticks.x = element_line(size=0.3,color = "black"),
axis.ticks.y = element_line(size=0.3,color="black"),
axis.ticks.length = unit(4,"pt"),
panel.grid = element_blank()) +
scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000"))
p_1Warning in geom_text(aes(x = 3.2, y = 5700), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
a single row.
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 278 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 278 rows containing missing values or values outside the scale range
(`geom_point()`).
11 Calprotectin ~ ALP
cor_res <- cor.test(
metadata_for_boxplots$Calprotectin,
metadata_for_boxplots$ALP)
cor_res
Pearson's product-moment correlation
data: metadata_for_boxplots$Calprotectin and metadata_for_boxplots$ALP
t = -0.45614, df = 236, p-value = 0.6487
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.15625130 0.09785193
sample estimates:
cor
-0.02967918
p_2 <- ggplot(metadata_for_boxplots) +
geom_jitter(aes(x=ALP, y=Calprotectin, color=Group),height = 0,width = 0) +
geom_text(aes(x=14.34, y=5700),
label=gsub("\\\\n", "\n", labels),hjust = 0) +
geom_smooth(aes(x=ALP, y=Calprotectin),method=lm, se=TRUE) +
theme_minimal() +
theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
axis.ticks.x = element_line(size=0.3,color = "black"),
axis.ticks.y = element_line(size=0.3,color="black"),
axis.ticks.length = unit(4,"pt"),
panel.grid = element_blank()) +
scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000"))
p_2Warning in geom_text(aes(x = 14.34, y = 5700), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
a single row.
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 132 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 132 rows containing missing values or values outside the scale range
(`geom_point()`).
12 Calprotectin ~ GGT
cor_res <- cor.test(
metadata_for_boxplots$Calprotectin,
metadata_for_boxplots$GGT)
cor_res
Pearson's product-moment correlation
data: metadata_for_boxplots$Calprotectin and metadata_for_boxplots$GGT
t = -0.17189, df = 108, p-value = 0.8638
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.2031507 0.1712334
sample estimates:
cor
-0.01653834
p_3 <- ggplot(metadata_for_boxplots) +
geom_jitter(aes(x=GGT, y=Calprotectin, color=Group),
height = 0,width = 0) +
geom_text(aes(x=12.18, y=5700),
label=gsub("\\\\n", "\n", labels),hjust = 0) +
geom_smooth(aes(x=GGT, y=Calprotectin),method=lm, se=TRUE) +
theme_minimal() +
theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
axis.ticks.x = element_line(size=0.3,color = "black"),
axis.ticks.y = element_line(size=0.3,color="black"),
axis.ticks.length = unit(4,"pt"),
panel.grid = element_blank()) +
scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000"))
p_3Warning in geom_text(aes(x = 12.18, y = 5700), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
a single row.
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 260 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 260 rows containing missing values or values outside the scale range
(`geom_point()`).
13 Nancy_max ~ ALP
cor_res <- cor.test(
metadata_for_boxplots$ALP,
metadata_for_boxplots$Nancy_max)
cor_res
Pearson's product-moment correlation
data: metadata_for_boxplots$ALP and metadata_for_boxplots$Nancy_max
t = -0.5031, df = 111, p-value = 0.6159
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.2303974 0.1382504
sample estimates:
cor
-0.04769762
p_4 <- ggplot(metadata_for_boxplots) +
geom_jitter(aes(x=Nancy_max, y=ALP, color=Group),
height = 0,width = 0.2) +
geom_text(aes(x=3.2, y=16.9),
label=gsub("\\\\n", "\n", labels),hjust = 0) +
geom_smooth(aes(x=Nancy_max, y=ALP),method=lm, se=TRUE) +
theme_minimal() +
theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
axis.ticks.x = element_line(size=0.3,color = "black"),
axis.ticks.y = element_line(size=0.3,color="black"),
axis.ticks.length = unit(4,"pt"),
panel.grid = element_blank()) +
scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000"))
p_4Warning in geom_text(aes(x = 3.2, y = 16.9), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
a single row.
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 257 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 257 rows containing missing values or values outside the scale range
(`geom_point()`).
14 Nancy_max ~ GGT
cor_res <- cor.test(metadata_for_boxplots$GGT,
metadata_for_boxplots$Nancy_max)
cor_res
Pearson's product-moment correlation
data: metadata_for_boxplots$GGT and metadata_for_boxplots$Nancy_max
t = 0.52563, df = 110, p-value = 0.6002
alternative hypothesis: true correlation is not equal to 0
95 percent confidence interval:
-0.1367723 0.2334414
sample estimates:
cor
0.0500538
p_5 <- ggplot(metadata_for_boxplots) +
geom_jitter(aes(x=Nancy_max, y=GGT, color=Group),height = 0,width = 0.2) +
geom_text(aes(x=3.2, y=14.45),
label=gsub("\\\\n", "\n", labels),hjust = 0) +
geom_smooth(aes(x=Nancy_max, y=GGT),method=lm, se=TRUE) +
theme_minimal() +
theme(panel.border = element_rect(color = "black", fill = NA, size = 0),
axis.ticks.x = element_line(size=0.3,color = "black"),
axis.ticks.y = element_line(size=0.3,color="black"),
axis.ticks.length = unit(4,"pt"),
panel.grid = element_blank()) +
scale_color_manual(values=c("#309f87","#f9c675","#F08080","#A00000"))
p_5Warning in geom_text(aes(x = 3.2, y = 14.45), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
a single row.
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 258 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 258 rows containing missing values or values outside the scale range
(`geom_point()`).
p <- ggarrange(plotlist=list(p_1,p_2,p_3,p_4,p_5),ncol=2,nrow=3)Warning in geom_text(aes(x = 3.2, y = 5700), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
a single row.
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 278 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 278 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning in geom_text(aes(x = 14.34, y = 5700), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
a single row.
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 132 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 132 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning in geom_text(aes(x = 12.18, y = 5700), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
a single row.
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 260 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 260 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning in geom_text(aes(x = 3.2, y = 16.9), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
a single row.
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 257 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 257 rows containing missing values or values outside the scale range
(`geom_point()`).
Warning in geom_text(aes(x = 3.2, y = 14.45), label = gsub("\\\\n", "\n", : All aesthetics have length 1, but the data has 370 rows.
ℹ Please consider using `annotate()` or provide this layer with data containing
a single row.
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 258 rows containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 258 rows containing missing values or values outside the scale range
(`geom_point()`).
ppdf("../figures/clinical/calprotectin_nancymax.pdf",width = 10,height = 10)
p
dev.off()png
2