Drost HG, Gabel A, Grosse I, Quint M. Evidence for Active Maintenance of Phylotranscriptomic Hourglass Patterns in Animal and Plant Embryogenesis. Mol. Biol. Evol. (2015) 32 (5) 1221-1231. doi:10.1093/molbev/msv012.
A new version of the Phylostratigraphy algorithm implemented by Alexander Gabel can be found at: https://github.com/AlexGa/Phylostratigraphy.
We recommend to use the new version for performing custom phylostratigraphy due to easier applicability to any genome of interest. The older version implemented by Alexander is useful to repreduce the results shown in this study, but was not implemented to be applicable to any genome of interest.
Proceed with the old version used to generate the phylostratigraphic maps in the publication:
Phylostratigraphy was introduced by Domazet-Lošo et al. in 2007 to trace the evolutionary origin of protein coding genes. It was performed by using the Perl script createPsMap.pl (implemented (cc) by Alexander Gabel). The resulting phylostratigraphic map stores the phylostratum in the first column and the corresponding gene id in the second column.
For creating the phylostratigraphic map the following steps have to be done:
-
Make sure that BLAST (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/release/2.2.21/) is installed on your machine.
-
Download the sequence database phyloBlastDB_Drost_Gabel_Grosse_Quint.fa used for BLAST searches and unpack it (
tar xfvj phyloBlastDB_Drost_Gabel_Grosse_Quint.fa.tbz). -
Make sure that the header of your FASTA-files (e.g. Athaliana_167_protein.fa) fullfills the following specification:
>GeneID | [organism_name] | [taxonomy]
Notice, the taxonomy begins after the node "Cellular organisms" e.g.
>NP_146894.1 | [Aeropyrum pernix] | [Archaea; Crenarchaeota; Thermoprotei; Desulfurococcales; Desulfurococcaceae; Aeropyrum]
or
>YP_001514406.1 | [Acaryochloris marina MBIC11017] | [Bacteria; Cyanobacteria; Oscillatoriophycideae; Chroococcales; Acaryochloris; Acaryochloris marina]
or
>ATCG00500.1|PACid:19637947 | [Arabidopsis thaliana] | [Eukaryota; Viridiplantae; Streptophyta; Streptophytina; Embryophyta; Tracheophyta; Euphyllophyta; Spermatophyta; Magnoliophyta; eudicotyledons; core eudicotyledons; rosids; malvids; Brassicales; Brassicaceae; Camelineae; Arabidopsis]
- Use the following command to start the Perl script
perl createPsMap.pl -i Athaliana_167_protein_with_new_Header.fa -d phyloBlastDB_Drost_Gabel_Grosse_Quint.fa -p BLAST_Athaliana
-r athaliana_blast_results -t 30 -a 64
Arguments:
-i,--input input file name in FASTA format
-d,--database BLAST sequence database name
-p,--prefix Prefix for generated mysql-files containing BLAST results
-r,--resultTable mysql table name
-t,--threshold threshold for sequence length (Default 30 amino acids)
-a threads for BLAST searches
-e,--evalue e-value threshold for BLAST
Divergence Stratigraphy is the process of quantifying the selection pressure (in terms of amino acid sequence divergence) acting on orthologous genes between closely related species. The resulting sequence divergence map (short divergence map), stores the divergence stratum in the first column and the query_id of inferred orthologous genes in the second column (Quint et al., 2012; Drost et al., 2015).
Following steps are performed to obtain a standard divergence map based on the divergence_stratigraphy() function:
-
Orthology Inference using BLAST reciprocal best hit ("RBH") based on blastp
-
Pairwise global amino acid alignments of orthologous genes using the Needleman-Wunsch algorithm
-
Codon alignments of orthologous genes using PAL2NAL
-
dNdS estimation using Comeron's method (1995)
-
Assigning estimated dNdS values to divergence strata (deciles of all dNdS values)
When using the divergence_stratigraphy() function implemented in orthologr it is assumed that you have BLAST installed on your machine.
# install package 'orthologr' from: https://github.com/HajkD/orthologr
install.packages("devtools") # note for wondows installation see https://github.com/HajkD/orthologr for details
devtools::install_github("HajkD/orthologr", build_vignettes = TRUE, dependencies = TRUE)
library(orthologr)
install.packages("myTAI")
library(myTAI)
install.packages("gdata")
library(gdata)
# install the Biostrings package from Bioconductor
source("http://bioconductor.org/biocLite.R")
biocLite("Biostrings")
To perform Divergence Stratigraphy using orthologr users need the following prerequisites
- a CDS file covering all protein coding genes of the query organism of interest
- a CDS file covering all protein coding genes of the subject organism of interest
First, the CDS sequences for all protein coding genes of all query and subject organisms need to be obtained.
The CDS retrieval can be done using a Terminal or by manual downloading the files
- ftp://ftp.ensembl.org/pub/release-77/fasta/danio_rerio/cds/Danio_rerio.Zv9.cds.all.fa.gz
- ftp://ftp.ensembl.org/pub/release-77/fasta/takifugu_rubripes/cds/Takifugu_rubripes.FUGU4.cds.all.fa.gz
- ftp://ftp.ensembl.org/pub/release-77/fasta/astyanax_mexicanus/cds/Astyanax_mexicanus.AstMex102.cds.all.fa.gz
- ftp://ftp.ensembl.org/pub/release-77/fasta/xiphophorus_maculatus/cds/Xiphophorus_maculatus.Xipmac4.4.2.cds.all.fa.gz
- ftp://ftp.ensembl.org/pub/release-77/fasta/gadus_morhua/cds/Gadus_morhua.gadMor1.cds.all.fa.gz
- ftp://ftp.flybase.net/genomes/Drosophila_melanogaster/dmel_r5.53_FB2013_05/fasta/dmel-all-CDS-r5.53.fasta.gz
- ftp://ftp.flybase.net/genomes/Drosophila_simulans/dsim_r1.4_FB2012_03/fasta/dsim-all-CDS-r1.4.fasta.gz
- ftp://ftp.flybase.net/genomes/Drosophila_yakuba/dyak_r1.3_FB2011_08/fasta/dyak-all-CDS-r1.3.fasta.gz
- ftp://ftp.flybase.net/genomes/Drosophila_persimilis/dper_r1.3_FB2014_03/fasta/dper-all-CDS-r1.3.fasta.gz
- ftp://ftp.flybase.net/genomes/Drosophila_virilis/dvir_r1.2_FB2012_01/fasta/dvir-all-CDS-r1.2.fasta.gz
- ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Athaliana/annotation/Athaliana_167_cds.fa.gz
- ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Alyrata/annotation/Alyrata_107_cds.fa.gz
- ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Cpapaya/annotation/Cpapaya_113_cds.fa.gz
- ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Crubella/annotation/Crubella_183_cds.fa.gz
- ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Brapa/annotation/Brapa_197_cds.fa.gz
This is an example shell script how to download the CDS files listed above:
# download CDS file of A. thaliana
curl ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Athaliana/annotation/Athaliana_167_cds.fa.gz -o Athaliana_167_cds.fa.gz
# unzip the fasta file
gunzip -d Athaliana_167_cds.fa.gz
# download CDS file of A. lyrata
curl ftp://ftp.jgi-psf.org/pub/compgen/phytozome/v9.0/Alyrata/annotation/Alyrata_107_cds.fa.gz
-o Alyrata_107_cds.fa.gz
# unzip the fasta file
gunzip -d Alyrata_107_cds.fa.gz
When the download is finished you need to unzip the files and then start R to perform the following analyses. Make sure you have all fasta files stored in the current working directory of your R session.
When using the divergence_stratigraphy() function, you can specify the comp_cores
argument to perform parallel computations on a multicore machine.
library(orthologr)
### For Danio rerio
# compute the divergence map of D. rerio vs. T. rubripes
Drerio_vs_Trubripes_DM <- divergence_stratigraphy(
query_file = "Danio_rerio.Zv9.cds.all.fa",
subject_file = "Takifugu_rubripes.FUGU4.cds.all.fa",
eval = "1E-5",
ortho_detection = "RBH",
comp_cores = 1,
quiet = TRUE,
clean_folders = TRUE )
# compute the divergence map of D. rerio vs. A. mexicanus
Drerio_vs_Amexicanus_DM <- divergence_stratigraphy(
query_file = "Danio_rerio.Zv9.cds.all.fa",
subject_file = "Astyanax_mexicanus.AstMex102.cds.all.fa",
eval = "1E-5",
ortho_detection = "RBH",
comp_cores = 1,
quiet = TRUE,
clean_folders = TRUE )
# compute the divergence map of D. rerio vs. X. maculatus
Drerio_vs_Xmaculatus_DM <- divergence_stratigraphy(
query_file = "Danio_rerio.Zv9.cds.all.fa",
subject_file = "Xiphophorus_maculatus.Xipmac4.4.2.cds.all.fa",
eval = "1E-5",
ortho_detection = "RBH",
comp_cores = 1,
quiet = TRUE,
clean_folders = TRUE )
# compute the divergence map of D. rerio vs. G. morhua
Drerio_vs_Gmorhua_DM <- divergence_stratigraphy(
query_file = "Danio_rerio.Zv9.cds.all.fa",
subject_file = "Gadus_morhua.gadMor1.cds.all.fa",
eval = "1E-5",
ortho_detection = "RBH",
comp_cores = 1,
quiet = TRUE,
clean_folders = TRUE )
### For Drosophila melanogaster
# compute the divergence map of D. melanogaster vs. D. simulans
Dmel_vs_Dsim_DM <- divergence_stratigraphy(
query_file = "dmel-all-CDS-r5.53.fasta",
subject_file = "dsim-all-CDS-r1.4.fasta",
eval = "1E-5",
ortho_detection = "RBH",
comp_cores = 1,
quiet = TRUE,
clean_folders = TRUE )
# compute the divergence map of D. melanogaster vs. D. persimilis
Dmel_vs_Dper_DM <- divergence_stratigraphy(
query_file = "dmel-all-CDS-r5.53.fasta",
subject_file = "dper-all-CDS-r1.3.fasta",
eval = "1E-5",
ortho_detection = "RBH",
comp_cores = 1,
quiet = TRUE,
clean_folders = TRUE )
# compute the divergence map of D. melanogaster vs. D. yakuba
Dmel_vs_Dyak_DM <- divergence_stratigraphy(
query_file = "dmel-all-CDS-r5.53.fasta",
subject_file = "dyak-all-CDS-r1.3.fasta",
eval = "1E-5",
ortho_detection = "RBH",
comp_cores = 1,
quiet = TRUE,
clean_folders = TRUE )
# compute the divergence map of D. melanogaster vs. D. virilis
Dmel_vs_Dvir_DM <- divergence_stratigraphy(
query_file = "dmel-all-CDS-r5.53.fasta",
subject_file = "dvir-all-CDS-r1.2.fasta",
eval = "1E-5",
ortho_detection = "RBH",
comp_cores = 1,
quiet = TRUE,
clean_folders = TRUE )
### For Arabidopsis thaliana
# compute the divergence map of A. thaliana vs. A. lyrata
Ath_vs_Aly_DM <- divergence_stratigraphy(
query_file = "Athaliana_167_cds.fa",
subject_file = "Alyrata_107_cds.fa",
eval = "1E-5",
ortho_detection = "RBH",
comp_cores = 1,
quiet = TRUE,
clean_folders = TRUE )
# compute the divergence map of A. thaliana vs. C. rubella
Ath_vs_Crubella_DM <- divergence_stratigraphy(
query_file = "Athaliana_167_cds.fa",
subject_file = "Crubella_183_cds.fa",
eval = "1E-5",
ortho_detection = "RBH",
comp_cores = 1,
quiet = TRUE,
clean_folders = TRUE )
# compute the divergence map of A. thaliana vs. B. rapa
Ath_vs_Brapa_DM <- divergence_stratigraphy(
query_file = "Athaliana_167_cds.fa",
subject_file = "Brapa_197_cds.fa",
eval = "1E-5",
ortho_detection = "RBH",
comp_cores = 1,
quiet = TRUE,
clean_folders = TRUE )
# compute the divergence map of A. thaliana vs. C. papaya
Ath_vs_Cpapaya_DM <- divergence_stratigraphy(
query_file = "Athaliana_167_cds.fa",
subject_file = "Cpapaya_113_cds.fa",
eval = "1E-5",
ortho_detection = "RBH",
comp_cores = 1,
quiet = TRUE,
clean_folders = TRUE )
It is now assumed that the Divergence Map of interest and the corresponding gene expression data set
are joined. For this purpose the MatchMap() function implemented in the myTAI package can be used.
See ?myTAI::MatchMap for details. However, the MatchMap() function can only deal with identical gene ids
present in the Phylo/Divergence-Maps and the corresponding gene expression set.
After performing Phylostratigraphy and Divergence Stratigraphy PhyloExpressionSets and DivergenceExpressionSets can be obtained by matching the corresponding phylostratigraphic maps and divergence maps of Danio rerio, Drosophila melanogaster, and Arabidopsis thaliana with the corresponding transcriptome data sets covering the embryogenesis of Danio rerio, Drosophila melanogaster, and Arabidopsis thaliana (see Methods in Drost et al., 2015 for details)
The following data sets (Supplementary table S3 and S5) can be downloaded here : Supplementary table S3.xls and Supplementary table S5.xls.
Please note, that for D. melanogaster we retrieved already processed gene expression levels from http://www.stat.ucla.edu/~jingyi.li/software-and-data.html .
library(gdata)
### read data sets
## read PhyloExpressionSets
Drerio_PhyloExpressionSet <- read.xls("Supplementary table S3.xls",sheet = 1)
Dmelanogaster_PhyloExpressionSet <- read.xls("Supplementary table S3.xls",sheet = 2)
Athaliana_PhyloExpressionSet <- read.xls("Supplementary table S3.xls",sheet = 3)
## read DivergenceExpressionSets
# Danio rerio
# D. rerio vs. A. mexicanus
Drerio_vs_Amex_DivergenceExpressionSet <- read.xls("Supplementary table S5.xls",sheet = 9)
# D. rerio vs. F. rubripes
Drerio_vs_Frubripes_DivergenceExpressionSet <- read.xls("Supplementary table S5.xls",sheet = 10)
# D. rerio vs. X. maculatus
Drerio_vs_Xmac_DivergenceExpressionSet <- read.xls("Supplementary table S5.xls",sheet = 11)
# D. rerio vs. G. morhua
Drerio_vs_Gmor_DivergenceExpressionSet <- read.xls("Supplementary table S5.xls",sheet = 12)
# Drosophila melanogaster
# D. melanogaster vs. D. simulans
Dmel_Dsim_DivergenceExpressionSet <- read.xls("Supplementary table S5.xls",sheet = 1)
# D. melanogaster vs. D. yakuba
Dmel_Dyak_DivergenceExpressionSet <- read.xls("Supplementary table S5.xls",sheet = 2)
# D. melanogaster vs. D. persimilis
Dmel_Dper_DivergenceExpressionSet <- read.xls("Supplementary table S5.xls",sheet = 3)
# D. melanogaster vs. D. virilis
Dmel_Dvir_DivergenceExpressionSet <- read.xls("Supplementary table S5.xls",sheet = 4)
# Arabidopsis thaliana
# A. thaliana vs A. lyrata
Ath_Aly_DivergenceExpressionSet <- read.xls("Supplementary table S5.xls",sheet = 5)
# A thaliana vs. T. halophila
Ath_Brapa_DivergenceExpressionSet <- read.xls("Supplementary table S5.xls",sheet = 8)
# A thaliana vs. C. rubella
Ath_Crub_DivergenceExpressionSet <- read.xls("Supplementary table S5.xls",sheet = 7)
# A thaliana vs. C. papaya
Ath_Cpapaya_DivergenceExpressionSet <- read.xls("Supplementary table S5.xls",sheet = 6)
First load the following packages into the work space:
library(myTAI)
svg("Fig2.svg",width = 16.9,height = 5)
par(mfrow = c(1,3))
par(mar = c(1.5, 0.5, 0.5, 0.1))
par(mai = c(1.4,0.6,0.5,0.1))
par(mgp = c(8,1,0))
PlotPattern(Drerio_PhyloExpressionSet, TestStatistic = "ReductiveHourglassTest",
permutations = 10000, modules = list(early = 1:18, mid = 19:36, late = 37:40),
shaded.area = TRUE, p.value = TRUE, y.ticks = 5, type = "l", lwd = 6, col = "darkblue",
ylab = "", xlab = "Ontogeny", las = 3, cex.lab = 1.5, cex.axis = 1.5)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("A")),bty = "n",cex = 1.5,inset = c(-0.08,-0.15))
box()
title(ylab = "TAI", mgp = c(3,0.5,0), cex.lab = 1.5)
PlotPattern(Dmelanogaster_PhyloExpressionSet, TestStatistic = "ReductiveHourglassTest",
permutations = 10000, modules = list(early = 1:3, mid = 4:5, late = 6:12),
shaded.area = TRUE, p.value = TRUE, y.ticks = 5, type = "l", lwd = 6, col = "magenta",
ylab = "", xlab = "Ontogeny", las = 3, cex.lab = 1.5, cex.axis = 1.5)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("B")),bty = "n",cex = 1.5,inset = c(-0.08,-0.15))
box()
title(ylab = "TAI", mgp = c(3,0.5,0), cex.lab = 1.5)
PlotPattern(Athaliana_PhyloExpressionSet, TestStatistic = "ReductiveHourglassTest",
permutations = 10000, modules = list(early = 1:2, mid = 3:5, late = 6:7),
shaded.area = TRUE, p.value = TRUE, y.ticks = 5, type = "l", lwd = 6, col = "darkgreen",
ylab = "", xlab = "Ontogeny", las = 3, cex.lab = 1.5, cex.axis = 1.5)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("C")),bty = "n",cex = 1.5,inset = c(-0.08,-0.15))
box()
title(ylab = "TAI", mgp = c(3,0.5,0), cex.lab = 1.5)
dev.off()
svg("Fig3.svg",width = 16.9,height = 5)
par(mfrow = c(1,3))
par(mar = c(1.5, 0.5, 0.5, 0.1))
par(mai = c(1.4,0.6,0.5,0.1))
par(mgp = c(8,1,0))
PlotCorrelation(Drerio_PhyloExpressionSet, Drerio_vs_Amex_DivergenceExpressionSet ,
method = "kendall", main.text = "")
PlotCorrelation(Dmelanogaster_PhyloExpressionSet, Dmel_Dsim_DivergenceExpressionSet,
method = "kendall", main.text = "")
PlotCorrelation(Athaliana_PhyloExpressionSet, Ath_Aly_DivergenceExpressionSet,
method = "kendall", main.text = "")
dev.off()svg("Fig4.svg",width = 16.9,height = 5)
par(mfrow = c(1,3))
par(mar = c(1.5, 0.5, 0.5, 0.1))
par(mai = c(1.4,0.6,0.5,0.1))
par(mgp = c(8,1,0))
PlotPattern(Drerio_vs_Amex_DivergenceExpressionSet[ , 1:42] , TestStatistic = "ReductiveHourglassTest",
permutations = 10000, modules = list(early = 1:18, mid = 19:36, late = 37:40),
shaded.area = TRUE, p.value = FALSE, y.ticks = 5, type = "l", lwd = 6, col = "darkblue",
ylab = "", xlab = "Ontogeny", las = 3, cex.lab = 1.5, cex.axis = 1.5)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("A")),bty = "n",cex = 1.5,inset = c(-0.08,-0.15))
box()
title(ylab = "TDI", mgp = c(3,0.5,0), cex.lab = 1.5)
cat(paste0("Drerio_vs_Frubripes_DivergenceExpressionSet : ",nrow(Drerio_vs_Frubripes_DivergenceExpressionSet), " genes.")
cat("\n")
PlotPattern(Dmel_Dsim_DivergenceExpressionSet, TestStatistic = "ReductiveHourglassTest",
permutations = 10000, modules = list(early = 1:3, mid = 4:5, late = 6:12),
shaded.area = TRUE, p.value = TRUE, y.ticks = 5, type = "l", lwd = 6, col = "magenta",
ylab = "", xlab = "Ontogeny", las = 3, cex.lab = 1.5, cex.axis = 1.5)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("B")),bty = "n",cex = 1.5,inset = c(-0.08,-0.15))
box()
title(ylab = "TDI", mgp = c(3,0.5,0), cex.lab = 1.5)
cat(paste0("Dmel_Dsim_DivergenceExpressionSet : ",nrow(Dmel_Dsim_DivergenceExpressionSet), " genes.")
cat("\n")
PlotPattern(Ath_Aly_DivergenceExpressionSet, TestStatistic = "ReductiveHourglassTest",
permutations = 10000, modules = list(early = 1:2, mid = 3:5, late = 6:7),
shaded.area = TRUE, p.value = TRUE, y.ticks = 5, type = "l", lwd = 6, col = "darkgreen",
ylab = "", xlab = "Ontogeny", las = 3, cex.lab = 1.5, cex.axis = 1.5)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("C")),bty = "n",cex = 1.5,inset = c(-0.08,-0.15))
box()
title(ylab = "TDI", mgp = c(3,0.5,0), cex.lab = 1.5)
cat(paste0("Ath_Aly_DivergenceExpressionSet : ",nrow(Ath_Aly_DivergenceExpressionSet), " genes.")
cat("\n")
dev.off()
svg("S2.svg",width = 16.9,height = 5)
par(mfrow = c(1,3))
par(mar = c(1.5, 0.5, 0.5, 0.5))
par(mai = c(1.4,0.6,0.5,0.5))
par(mgp = c(5,1,0))
PlotPattern(Drerio_vs_Frubripes_DivergenceExpressionSet[ , 1:42] , TestStatistic = "ReductiveHourglassTest",
permutations = 10000, modules = list(early = 1:18, mid = 19:36, late = 37:40),
shaded.area = TRUE, p.value = TRUE, y.ticks = 5, type = "l", lwd = 6, col = "darkblue",
ylab = "", xlab = "Ontogeny", main = "D. rerio vs T. rubripes", las = 3, cex.lab = 1.5, cex.axis = 1)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("A")),bty = "n",cex = 1.5,inset = c(-0.08,-0.15))
box()
title(ylab = "TDI", mgp = c(2,0.5,0), cex.lab = 1.5)
cat(paste0("Drerio_vs_Amex_DivergenceExpressionSet : ",nrow(Drerio_vs_Amex_DivergenceExpressionSet), " genes."))
cat("\n")
PlotPattern(Drerio_vs_Xmac_DivergenceExpressionSet[ , 1:42] , TestStatistic = "ReductiveHourglassTest",
permutations = 10000, modules = list(early = 1:18, mid = 19:36, late = 37:40),
shaded.area = TRUE, p.value = TRUE, y.ticks = 5, type = "l", lwd = 6, col = "darkblue",
ylab = "", xlab = "Ontogeny", main = "D. rerio vs X. maculatus", las = 3, cex.lab = 1.5, cex.axis = 1)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("B")),bty = "n",cex = 1.5,inset = c(-0.08,-0.15))
box()
title(ylab = "TDI", mgp = c(2,0.5,0), cex.lab = 1.5)
cat(paste0("Drerio_vs_Xmac_DivergenceExpressionSet : ",nrow(Drerio_vs_Xmac_DivergenceExpressionSet), " genes."))
cat("\n")
PlotPattern(Drerio_vs_Gmor_DivergenceExpressionSet[ , 1:42] , TestStatistic = "ReductiveHourglassTest",
permutations = 10000, modules = list(early = 1:18, mid = 19:36, late = 37:40),
shaded.area = TRUE, p.value = TRUE, y.ticks = 5, type = "l", lwd = 6, col = "darkblue",
ylab = "", xlab = "Ontogeny", main = "D. rerio vs G. morhua", las = 3, cex.lab = 1.4, cex.axis = 1)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("D")),bty = "n",cex = 1.5,inset = c(-0.08,-0.15))
box()
title(ylab = "TDI", mgp = c(2,0.5,0), cex.lab = 1.5)
cat(paste0("Drerio_vs_Gmor_DivergenceExpressionSet : ",nrow(Drerio_vs_Gmor_DivergenceExpressionSet), " genes."))
dev.off()
svg("S3.svg",width = 16.9,height = 5)
par(mfrow = c(1,3))
par(mar = c(1.5, 0.5, 0.5, 0.1))
par(mai = c(1.4,0.6,0.5,0.1))
par(mgp = c(8,1,0))
PlotPattern(Dmel_Dyakuba_DivergenceExpressionSet[ , 1:14] , TestStatistic = "ReductiveHourglassTest",
permutations = 10000, modules = list(early = 1:3, mid = 4:5, late = 6:12),
shaded.area = TRUE, p.value = TRUE, y.ticks = 5, type = "l", lwd = 6, col = "magenta",
ylab = "", xlab = "Ontogeny", main = "D. melanogaster vs D. yakuba",las = 3, cex.lab = 1.5, cex.axis = 1.5)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("A")),bty = "n",cex = 1.5,inset = c(-0.08,-0.15))
box()
title(ylab = "TDI", mgp = c(3,0.5,0), cex.lab = 1.5)
cat(paste0("Dmel_Dyakuba_DivergenceExpressionSet : ",nrow(Dmel_Dyakuba_DivergenceExpressionSet), " genes."))
cat("\n")
PlotPattern(Dmel_Dper_DivergenceExpressionSet[ , 1:14] , TestStatistic = "ReductiveHourglassTest",
permutations = 10000, modules = list(early = 1:3, mid = 4:5, late = 6:12),
shaded.area = TRUE, p.value = TRUE, y.ticks = 5, type = "l", lwd = 6, col = "magenta",
ylab = "", xlab = "Ontogeny", main = "D. melanogaster vs D. persimilis", las = 3, cex.lab = 1.5, cex.axis = 1.5)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("B")),bty = "n",cex = 1.5,inset = c(-0.08,-0.15))
box()
title(ylab = "TDI", mgp = c(3,0.5,0), cex.lab = 1.5)
cat(paste0("Dmel_Dper_DivergenceExpressionSet : ",nrow(Dmel_Dper_DivergenceExpressionSet), " genes."))
cat("\n")
PlotPattern(Dmel_Dvir_DivergenceExpressionSet[ , 1:14] , TestStatistic = "ReductiveHourglassTest",
permutations = 10000, modules = list(early = 1:3, mid = 4:5, late = 6:12),
shaded.area = TRUE, p.value = TRUE, y.ticks = 5, type = "l", lwd = 6, col = "magenta",
ylab = "", xlab = "Ontogeny", main = "D. melanogaster vs D. virilis", las = 3, cex.lab = 1.5, cex.axis = 1.5)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("C")),bty = "n",cex = 1.5,inset = c(-0.08,-0.15))
box()
title(ylab = "TDI", mgp = c(3,0.5,0), cex.lab = 1.5)
cat(paste0("Dmel_Dvir_DivergenceExpressionSet : ",nrow(Dmel_Dvir_DivergenceExpressionSet), " genes."))
cat("\n")
dev.off()svg("S4.svg",width = 16.9,height = 5)
par(mfrow = c(1,3))
par(mar = c(1.5, 0.5, 0.5, 0.1))
par(mai = c(1.4,0.6,0.5,0.1))
par(mgp = c(8,1,0))
PlotPattern(Ath_Crub_DivergenceExpressionSet , TestStatistic = "ReductiveHourglassTest",
permutations = 10000, modules = list(early = 1:2, mid = 3:5, late = 6:7),
shaded.area = TRUE, p.value = TRUE, y.ticks = 5, type = "l", lwd = 6, col = "darkgreen",
ylab = "", xlab = "Ontogeny", main = "A. thaliana vs C. rubella", las = 3, cex.lab = 1.5, cex.axis = 1.5)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("A")),bty = "n",cex = 1.5,inset = c(-0.08,-0.15))
box()
title(ylab = "TDI", mgp = c(3,0.5,0), cex.lab = 1.5)
cat(paste0("Ath_Crub_DivergenceExpressionSet : ",nrow(Ath_Crub_DivergenceExpressionSet), " genes."))
cat("\n")
PlotPattern(Ath_Brapa_DivergenceExpressionSet , TestStatistic = "ReductiveHourglassTest",
permutations = 10000, modules = list(early = 1:2, mid = 3:5, late = 6:7),
shaded.area = TRUE, p.value = TRUE, y.ticks = 5, type = "l", lwd = 6, col = "darkgreen",
ylab = "", xlab = "Ontogeny", main = "A. thaliana vs B. rapa", las = 3, cex.lab = 1.5, cex.axis = 1.5)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("B")),bty = "n",cex = 1.5,inset = c(-0.08,-0.15))
box()
title(ylab = "TDI", mgp = c(3,0.5,0), cex.lab = 1.5)
cat(paste0("Ath_Brapa_DivergenceExpressionSet : ",nrow(Ath_Brapa_DivergenceExpressionSet), " genes."))
cat("\n")
PlotPattern(Ath_Cpapaya_DivergenceExpressionSet , TestStatistic = "ReductiveHourglassTest",
permutations = 10000, modules = list(early = 1:2, mid = 3:5, late = 6:7),
shaded.area = TRUE, p.value = TRUE, y.ticks = 5, type = "l", lwd = 6, col = "darkgreen",
ylab = "", xlab = "Ontogeny", main = "A. thaliana vs C. papaya", las = 3, cex.lab = 1.5, cex.axis = 1.5)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("C")),bty = "n",cex = 1.5,inset = c(-0.08,-0.15))
box()
title(ylab = "TDI", mgp = c(3,0.5,0), cex.lab = 1.5)
cat(paste0("Ath_Cpapaya_DivergenceExpressionSet : ",nrow(Ath_Cpapaya_DivergenceExpressionSet), " genes."))
cat("\n")
dev.off()
svg("S5.svg",width = 16.9,height = 5)
par(mfrow = c(1,3))
par(mar = c(1.5, 0.5, 0.5, 0.1))
par(mai = c(1.4,0.6,0.5,0.1))
par(mgp = c(3,1,0))
PlotCorrelation(Drerio_PhyloExpressionSet, Drerio_vs_Frubripes_DivergenceExpressionSet ,
method = "kendall", main.text = "",cex.axis = 1.7,cex.lab = 1.7,xlab = "Phylostratum",
ylab = "Divergencestratum",cex.main = 1.7)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("A")),bty = "n",cex = 1.7,inset = c(-0.08,-0.15))
box()
PlotCorrelation(Drerio_PhyloExpressionSet, Drerio_vs_Xmac_DivergenceExpressionSet,
method = "kendall", main.text = "",cex.axis = 1.7,cex.lab = 1.7,xlab = "Phylostratum",
ylab = "Divergencestratum",cex.main = 1.7)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("B")),bty = "n",cex = 1.7,inset = c(-0.08,-0.15))
box()
PlotCorrelation(Drerio_PhyloExpressionSet, Drerio_vs_Gmor_DivergenceExpressionSet,
method = "kendall", main.text = "",cex.axis = 1.7,cex.lab = 1.7,xlab = "Phylostratum",
ylab = "Divergencestratum",cex.main = 1.7)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("C")),bty = "n",cex = 1.7,inset = c(-0.08,-0.15))
box()
dev.off()
svg("S6.svg",width = 16.9,height = 5)
par(mfrow = c(1,3))
par(mar = c(1.5, 0.5, 0.5, 0.1))
par(mai = c(1.4,0.6,0.5,0.1))
par(mgp = c(3,1,0))
PlotCorrelation(Dmelanogaster_PhyloExpressionSet, Dmel_Dyakuba_DivergenceExpressionSet ,
method = "kendall", main.text = "",cex.axis = 1.7,cex.lab = 1.7,xlab = "Phylostratum",
ylab = "Divergencestratum",cex.main = 1.7)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("A")),bty = "n",cex = 1.7,inset = c(-0.08,-0.15))
box()
PlotCorrelation(Dmelanogaster_PhyloExpressionSet, Dmel_Dper_DivergenceExpressionSet,
method = "kendall", main.text = "",cex.axis = 1.7,cex.lab = 1.7,xlab = "Phylostratum",
ylab = "Divergencestratum",cex.main = 1.7)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("B")),bty = "n",cex = 1.7,inset = c(-0.08,-0.15))
box()
PlotCorrelation(Dmelanogaster_PhyloExpressionSet, Dmel_Dvir_DivergenceExpressionSet,
method = "kendall", main.text = "",cex.axis = 1.7,cex.lab = 1.7,xlab = "Phylostratum",
ylab = "Divergencestratum",cex.main = 1.7)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("C")),bty = "n",cex = 1.7,inset = c(-0.08,-0.15))
box()
dev.off()
svg("S7.svg",width = 16.9,height = 5)
par(mfrow = c(1,3))
par(mar = c(1.5, 0.5, 0.5, 0.1))
par(mai = c(1.4,0.6,0.5,0.1))
par(mgp = c(3,1,0))
PlotCorrelation(Athaliana_PhyloExpressionSet, Ath_Crub_DivergenceExpressionSet ,
method = "kendall", main.text = "",cex.axis = 1.7,cex.lab = 1.7,xlab = "Phylostratum",
ylab = "Divergencestratum",cex.main = 1.7)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("A")),bty = "n",cex = 1.7,inset = c(-0.08,-0.15))
box()
PlotCorrelation(Athaliana_PhyloExpressionSet, Ath_Brapa_DivergenceExpressionSet,
method = "kendall", main.text = "",cex.axis = 1.7,cex.lab = 1.7,xlab = "Phylostratum",
ylab = "Divergencestratum",cex.main = 1.7)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("B")),bty = "n",cex = 1.7,inset = c(-0.08,-0.15))
box()
PlotCorrelation(Athaliana_PhyloExpressionSet, Ath_Cpapaya_DivergenceExpressionSet,
method = "kendall", main.text = "",cex.axis = 1.7,cex.lab = 1.7,xlab = "Phylostratum",
ylab = "Divergencestratum",cex.main = 1.7)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("C")),bty = "n",cex = 1.7,inset = c(-0.08,-0.15))
box()
dev.off()
# read embryo defective genes
Ath <- read.csv("EmbryoDefective_Expression.csv", sep = ";", header = TRUE)
Ath_MeanProfile <- colMeans(Ath[ , 2:8])
std_error <- function(x){sd(x)/sqrt(length(x))}
svg("S8A.svg",width = 8,height = 6)
plot(Ath_MeanProfile,
type = "l", lwd = 6,cex.axis = 1.3, cex.lab = 1.3, ylim = c(2600,4600),
xaxt = "n",xlab = "Ontogeny", ylab = "Expression Level")
axis(1, 1:7, names(Ath[ , 2:8]),cex.axis = 1.3, cex.lab = 1.3)
lines(Ath_MeanProfile + apply(Ath[ , 2:8],2,std_error), col = "lightgrey", lwd = 6)
lines(Ath_MeanProfile - apply(Ath[ , 2:8],2,std_error), col = "lightgrey", lwd = 6)
legend("bottom",legend = c("Mean EDG Expression","std. error"), fill = c("black","lightgrey"), bty = "n", cex = 2)
par(xpd = TRUE)
legend("topleft",legend = expression(bold("A")),bty = "n",cex = 2,inset = c(-0.08,-0.15))
dev.off()
Performing Statistical Tests
# install.packages("dunn.test")
library(dunn.test)
# perform a Kruskal-Wallis Rank Sum Test
kruskal.test(Ath[ , 2:8])
# perform Dunn's test of multiple comparisons using rank sums
dunn.test(Ath[ , 2:8], method = "bh")
svg("S8.svg",width = 8,height = 5)
PlotPattern(Drerio_vs_Amex_DivergenceExpressionSet, TestStatistic = "ReductiveHourglassTest",
modules = list(early = 1:18, mid = 19:36, late = 37:61), shaded.area = TRUE,
type = "l", lwd = 9,xlab = "Ontogeny", ylab = "TDI", main = "D. rerio vs A. mexicanus")
dev.off()This figure shall illustrate how the test statistic for the Reductive Hourglass Test is build.
Here we only show an example for Athaliana_PhyloExpressionSet, but the same procedure can analogously be
applied to all other PhyloExpressionSets and DivergenceExpressionSets.
library(myTAI)
rhScore_real <- rhScore(TAI(Athaliana_PhyloExpressionSet),
method = "min",scoringMethod = "mean-mean",
early = 1:2, mid = 3:5, late = 6:7)
rhScores <- apply(bootMatrix(Athaliana_PhyloExpressionSet, 10000), 1 , rhScore,
method = "min",scoringMethod = "mean-mean",
early = 1:2, mid = 3:5, late = 6:7)
norm_MME <- fitdistrplus::fitdist(rhScores,distr = "norm",method = "mme")
# estimate mean:
mean_est <- norm_MME$estimate[1]
# estimate sd:
sd_est <- norm_MME$estimate[2]
svg("S9.svg",width = 9,height = 5)
norm_distr <- function(x){ return(dnorm(x = x,mean = mean_est,sd = sd_est)) }
# plot the density function and the histogram of rhScores
curve(norm_distr,
xlim = c(min(rhScores),max(c(rhScores,rhScore_real))),
col = "steelblue",lwd = 5,xlab = "scores", ylab="Frequency")
# plot the histogram of rhScores
hist(rhScores,prob = TRUE,add = TRUE, breaks = 100,main = "10,000 permutations")
rug(rhScores)
# plot a red line at the position where we can find the real variance
abline(v = rhScore_real, lwd = 5, col = "red")
dev.off()