Our lab focuses on the design and application of bioinformatics algorithms to elucidate global epigenetic mechanisms in normal development and in diseases such as cancer. Our areas of expertise include 1) DNA methylation using Bisulfite-seq; 2) Epigenetic regulation using ChIP-seq; 3) Alternative Polyadenylation (APA); 4) Non-coding RNA; 5) Nucleosome organization using MNase-seq. Since establishing our own bioinformatics lab in early 2008, we have (as of December 2018):
- Published 131 peer-reviewed papers through solid methodology development and extensive collaborative research, including 20 senior-author papers in Nature and Cell series.
- Been well-funded with total external funding >$1.0 million per year, including 4 PI grants from NIH and Texas CPRIT: NIH R01HG007538 (2013-2019), R01CA193466 (2015-2020), R01CA228140 (2019-2024) and U54CA217297 (2017-2022).
- Mentored our first 7 postdoc trainees to tenure-track faculty positions at prestigious research institutions in the US (6) and China (1).
1) DNA methylation using Bisulfite-seq. Our lab developed some of the earliest and most widely used bioinformatics software to analyze whole-genome bisulfite sequencing (WGBS) data, including the first WGBS mapping program BSMAP (>500 citations) and MOABS for model-based differential methylation analysis. Furthermore, we are among the first to report the sparse conserved single under-methylated CpG (scUMC) and the DNA Methylation Canyon as two novel epigenetic features in the genome. Recently, our pan-cancer analysis of WGBS data, followed by dCas9-mediated methylation editing, revealed gene-body canyon hyper-methylation as a novel epigenetic mechanism for oncogene activation. In collaboration with the Goodell lab at Baylor, we are among the first to study de novo DNA methyltransferase 3A (Dnmt3a) using WGBS in normal and malignant hematopoietic stem cells (HSCs).
2) Epigenetic Regulation using ChIP-seq. Our lab developed some of the most widely cited bioinformatics methods to analyze ChIP-seq data, including MACS for Model-based Analysis of ChIP-seq (>5,800 citations) and MACE for Model-based Analysis of ChIP-exo with single-nucleotide resolution. Using MACS, we recently discovered Broad H3K4me3 (wider than 4 kb) as a novel epigenetic signature for tumor suppressor genes, such as TP53 and PTEN. In collaboration with several experimental biologists, we used ChIP-seq to gain novel biological insights into the genome-wide functions of several important epigenetic regulators, including AR, Atoh1, FoxA1, NSD2, SIRT7, ZMYND11, YEATS, MeCP2, ZMYND8, SIRT6 and ENL.
3) Alternative Polyadenylation (APA). We developed DaPars, the first bioinformatics algorithm for Dynamic Analyses of Alternative Polyadenylation directly from widely used RNA-seq data. In collaboration with Eric Wagner, we used DaPars to identify CFIm25, a master APA regulator, as a glioblastoma (GBM) tumor suppressor. Furthermore, our recent re-analysis of TCGA breast cancer data suggests that the major role of 3ʹ-UTR shortening in tumorigenesis is to repress tumor suppressor genes in trans by disrupting competing-endogenous RNA (ceRNA) crosstalk.
4) Non-coding RNAs using RNA-seq. Our lab developed some of the most widely used bioinformatics methods to analyze RNA-seq data, including the RNA-seq quality control program RSeQC (>600 citations) and the first alignment-free coding-potential assessment tool CPAT (>500 citations). In collaboration with several experimental biologists, we reported hundreds of non-coding RNAs that are important for hematopoietic stem cell (HSC) self-renewal and lineage commitment, and found a high percentage of sequence reads in introns, leading to loss of function through nonsense-mediated decay, in castration-resistant prostate cancer (CRPC) bone marrow biopsy specimens.
5) Nucleosome Organization using MNase-seq. Our lab developed a novel bioinformatics pipeline, DANPOS, to better understand how nucleosomes are removed to allow transcription under different environmental conditions. We used DANPOS to study nucleosome dynamics in various cellular functions and disease processes, such as nucleosome fragility, embryonic stem cell (ESC) differentiation, aging, and promoter nucleosomes in previously reported nucleosome-free regions.