My research interests lie in using large-scale genomics technologies, together with computational and statistical tools, to systematically study medical and population genetics and genomics. In medical genetics, I am particularly interested in understanding the genetic etiology of common complex human diseases. In population genetics, I am interested in reconstructing human evolutionary history from the genetic signatures it has left in the human genome.
Development of 1000 Genomes data processing tools
Reliable identification of genetic variants in re-sequencing data is the central goal of the 1000 Genomes Project, and it is particularly crucial for the exon-targeted sequencing effort. In the full-scale 1000 Genomes Project, the exonic regions are planned to be sequenced at much higher coverage than the rest of the genome. This creates both an opportunity and a challenge for bioinformatics pipeline development, because no software has yet been designed specifically for processing high-coverage targeted sequencing data. We will build on our extensive experience analyzing the high-coverage data from the 1000 Genomes Pilot 3 project. Our aim is to develop an integrated data processing pipeline and a set of quality metrics for identifying genomic variants for downstream analysis.
Medical genetics of common complex diseases
Common complex diseases such as cardiovascular disease, cerebrovascular disease, cancer, and diabetes account for most mortality and morbidity in modern societies. Studies have suggested that genetic variants strongly influence susceptibility to these diseases. Recent large-scale association studies have uncovered many susceptibility regions for most common diseases. These efforts are paving the way to pinpoint the causal genetic variants precisely and to understand the pathogenesis of complex diseases.
Population genomics: signatures of recent positive selection in the human genome
In modern terms, natural selection operates on genetic variation. That variation both provides evidence for the mechanism of natural selection and supplies the raw material on which it acts. Selection pressure acts on individual phenotypes, but the ultimate targets of selection are variants in the DNA.
HapMap 3 global in description of human variation
..."This map provides an important tool for future genome-wide association studies of diseases that allows scientists to look for both common and rare variations that may be associated with disease or response to drugs," said Dr. Fuli Yu, assistant professor in the Baylor College of Medicine Human Genome Sequencing Center. The Center played a major role in the sequencing studies that are cornerstones of the report that appears in the current issue of the journal Nature...
1000 Genomes project releases pilot data
HOUSTON -- (June 21, 2010) -- The completion of three pilot projects designed to determine how best to build an extremely detailed map of human genetic variation begins a new chapter in the international project called 1,000 Genomes, said the director of the Baylor College of Medicine Human Genome Sequencing Center, a major contributor to the effort.
"Mapping all the shared normal variation in human populations is a critical step to interpreting medically actionable genetic changes," said Dr. Richard Gibbs, also a professor in the department of molecular and human genetics at BCM.
... "We also developed new methods to target variation in genes, and showed that this approach gave maximum information about this important class of human variation," said Dr. Fuli Yu, an assistant professor in the BCM Human Genome Sequencing Center and coordinator of the study. ...
This Week in Genome Research, December 23, 2009
... Meanwhile, a group of researchers from the Baylor College of Medicine, Rice University, and Washington University report that they have come up with a way to sift through large amounts of high-throughput re-sequencing data and pick out genetic variants without getting duped by sequencing errors. Their computational tool — called Atlas-SNP2 — takes into account sequence context in training datasets to help distinguish between errors and authentic SNPs with a less than 10 percent false-positive error rate and a false-negative error rate of five percent or so. ...