


With the anticipated deluge of data from various personal genome projects, all drawing from a large human population, it will not be practical for a single institution or company to handle all of the data at the pre-consumption level. One of the models that the Personal Genome Project (PGP) has embraced is the open source model, in which various interested parties, whether individuals, academic labs, or for-profit entities, contribute to laying down an integrated and highly networked infrastructure for independently collecting, analyzing, and distributing genomics data to end users (i.e. patients, researchers, pharmaceutical companies).

While the main aim of PGP is not to dictate or advocate a specific model of open and limited genomics information distribution to patients, researchers and biotech industries, we are starting to explore multiple ways for different groups and entities, both non-profit and for-profit, to collaborate on incentivizing the release, sharing and consumption of such data. Rather than merely being a portal to genomic sequences and functional genomics data, PGP encourages such collaborative efforts in order to promote new research, new business models, new ways of consuming genomics and health data, and new scientific and computational technologies.

GenomeQuest (GQ) has established itself as one of the leading commercial solutions for bioinformatics, especially in the field of next-generation sequencing. As PGP scales up beyond 10 volunteers, it will be important to find out how established commercial entities such as GenomeQuest can contribute to laying down the genomics information highway among researchers and beyond. While the contribution from the open source community will be invaluable, a highly integrated and targeted use of the information derived from PGP and similar initiatives may be more pertinent for the application-oriented biotech industry and medicine.

The PGP group, headed by John Aach and assisted by Jay Lee, Sasha Wait and Jason Bobe, and the GQ group, Phil Robidoux and Richard Resnick, met in November 2008 for an introductory meeting. It was followed by several conversations regarding the present and anticipated needs and capabilities of both PGP and GQ. PGP had released 8 of its 10 preliminary exome data sets to the public in October 2008, and was preparing methods for releasing the rest of its data to the scientific community and to the public in early 2009. To see how the GQ group could contribute, the two groups initiated a pilot study looking at a subset of PGP volunteers.



  1. Assess the GQ platform and compare it to PGP-generated sequence assemblies and annotations using publicly released PGP exome data.
  2. Assess the GQ platform for scalability and data management solutions pending Step 1.
  3. Assess the GQ platform for other distribution methods for complementary high-level genomics information, including software and web tools development pending Step 2.

Pilot Study

The purpose of this pilot is to demonstrate the feasibility of a working relationship between the PGP team and the GQ team, and to demonstrate the capabilities of the GQ platform. Participants interested in this pilot are encouraged to interact through the OpenWetWare wiki, so that messages reach everyone on both teams and our progress is logged. With this first pilot, the GQ and PGP groups will work out the best combination of sensitivity and specificity parameters to be used in alignment, assembly and SNP detection. For this, we will focus on PGP2 only, and extend to the rest of the PGP10 later as the PGP data release pipeline is finalized.
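
As a rough illustration of what tuning for sensitivity and specificity involves, the sketch below compares two sets of SNP calls over a shared set of assayed positions. It is only a sketch under stated assumptions: the function name `snp_concordance`, the inputs, and the choice to treat one call set as the reference are hypothetical and are not taken from the PGP or GQ pipelines.

```python
def snp_concordance(reference_calls, test_calls, assayed_positions):
    """Compare two SNP call sets over the positions both pipelines assayed.

    All names here are hypothetical. Treating one call set as the
    reference is an assumption made purely for illustration.
    """
    assayed = set(assayed_positions)
    ref = set(reference_calls) & assayed
    test = set(test_calls) & assayed

    tp = len(ref & test)            # called by both pipelines
    fn = len(ref - test)            # in the reference set only
    fp = len(test - ref)            # in the test set only
    tn = len(assayed - ref - test)  # called by neither

    # Guard against empty denominators (no reference calls, or no negatives).
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return {"tp": tp, "fn": fn, "fp": fp, "tn": tn,
            "sensitivity": sensitivity, "specificity": specificity}


# Toy example: ten assayed positions, three reference SNPs, three test SNPs.
result = snp_concordance(reference_calls={2, 4, 6},
                         test_calls={2, 4, 7},
                         assayed_positions=range(1, 11))
print(result)
```

Rerunning such a comparison for each candidate parameter setting would give one sensitivity/specificity pair per setting, which is one possible way to compare the settings.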

  1. PGP Deliverables: Illumina files containing the sequence information, and PGP MAQ pipeline-generated results, are available for download (here).
  2. Publications on exon capture and supplementary information regarding sequencing library preparation, oligo sequences and analysis methods: Nature Methods 2007, Supplementary Information, Probe Table.