ISEEM Progress March 2010

From OpenWetWare



March 2010 Progress Report (Abbreviated format)

Eisen Lab


  • Taxon-specific marker gene project
    • Established a protocol to automatically identify phylogenetic marker candidates for different taxa
      • Generated gene families using BLAST and MCL clustering algorithms
      • Built phylogenetic trees for each taxon
      • Evaluated universality and evenness
      • Built HMM profiles for each taxon
    • Applied protocol to build HMM profiles for 5189 families
    • Identified 62 gene markers for archaea and 324 gene markers for subsets of bacteria (covering at least 4 phyla)
    • Assessed ~300 marker candidates for use in phylogenomic and metagenomic studies, based on:
      • distribution of each marker across all sequenced genomes
      • phylogenetic tree topology
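The "universality and evenness" evaluation in the protocol above can be illustrated with a minimal sketch. The report does not define either measure, so the definitions here are assumptions: universality is taken as the fraction of genomes in a taxon that carry at least one copy of a gene family, and evenness as the fraction of those genomes that carry exactly one copy. The function name and input layout are hypothetical.

```python
from collections import Counter


def universality_and_evenness(family_hits, genomes):
    """Score one gene family against the genomes of a taxon.

    family_hits: list of genome IDs, one entry per gene copy in the family.
    genomes: set of all genome IDs in the taxon.
    Definitions are assumptions for illustration, not the report's own.
    """
    if not genomes:
        raise ValueError("genomes must not be empty")
    # Count gene copies per genome, ignoring hits outside the taxon.
    copies = Counter(g for g in family_hits if g in genomes)
    present = [g for g in genomes if copies[g] > 0]
    # Universality: share of genomes with at least one copy.
    universality = len(present) / len(genomes)
    # Evenness: share of carrier genomes with exactly one copy.
    single_copy = sum(1 for g in present if copies[g] == 1)
    evenness = single_copy / len(present) if present else 0.0
    return universality, evenness


genomes = {"A", "B", "C", "D"}
hits = ["A", "B", "B", "C"]  # genome B carries two copies, D carries none
u, e = universality_and_evenness(hits, genomes)
print(u, e)
```

A family that scores high on both measures is present in nearly every genome of the taxon and is nearly always single-copy, which is the usual profile sought in a phylogenetic marker.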


  • BioTorrents project
    • In response to reviewers' comments on the manuscript, added several new features to the BioTorrents website:
      • License classification for each torrent
      • Dynamic RSS feeds for all search results
      • Linking of different versions of software and data sets
    • Manuscript was revised and accepted for publication in PLoS One


  • Protein database
    • Modified AMPHORA analysis
      • Wrote a script to conduct a modified AMPHORA analysis using a newer version of HMMer (rc1), Zorro (masking), and multiple threads.
      • Ran script to perform AMPHORA-like analysis on 'GOS all peptide' data using 100 Archaeal and Bacterial marker genes from Dongying's work.
      • Investigated unexpected results.
    • BLAST analysis
      • Started an all-versus-all BLAST (still running) on the IMG database for the sequences that did not match any of the 100 marker gene families
  • Looked through the old and new (beta 2.0) versions of CAMERA to get familiar with their interface and workflow tools.


  • In vitro simulation project
    • Developed responses (with Jenna Morgan) to reviewers' comments on the paper
    • Addressed figure and file formatting issues
  • BEAST recombination project
    • Worked on BEAST models of recombination (with Erik Bloomquist) to understand the extent to which AMPHORA protein markers are vertically inherited
    • Connected with graph visualization experts at UC Davis to identify nice ways of exploring the results
  • GPU programming
    • Worked on a GPU phylogenetics computing library (with Marc Suchard) to speed up Bayesian and ML phylogenetics under amino acid and codon models

Green Lab


  • Phylogenetic diversity analysis
    • Analyzed phylogenetic diversity in GOS data set based on additional bacterial and archaeal marker genes
  • Picante method/software
    • Manuscript accepted for publication in Bioinformatics pending minor revision (revision in progress)
    • Added tutorials to package/website in response to reviewer/user comments
  • Predicting trait values from metagenomic data
    • Developed tools to use phylogenetic information to predict trait values for metagenomic reads given reference data set
    • Applying method to predict 16S copy number for OTUs (as proof of concept)
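The report does not describe how phylogenetic information is used to predict trait values. As a rough illustration only, the sketch below uses inverse-distance weighting: a read's trait value (for example, 16S copy number) is estimated as the weighted mean of reference trait values, with closer relatives weighted more heavily. This is a swapped-in, generic technique, not the authors' method; the function name, the `eps` parameter, and the example values are all hypothetical.

```python
def predict_trait(distances, traits, eps=1e-9):
    """Inverse-distance-weighted trait estimate for one query read.

    distances: {reference_id: phylogenetic distance from the query}
    traits: {reference_id: known trait value, e.g. 16S copy number}
    eps: small constant that avoids division by zero (an assumption).
    """
    # Only references with a known trait value contribute.
    weights = {r: 1.0 / (d + eps) for r, d in distances.items() if r in traits}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no reference taxa with known trait values")
    return sum(w * traits[r] for r, w in weights.items()) / total


# Made-up example values, not results from the project.
distances = {"ref1": 0.1, "ref2": 0.3}
traits = {"ref1": 1, "ref2": 7}
prediction = predict_trait(distances, traits)
print(round(prediction, 3))
```

A model-based approach (for example, ancestral state reconstruction along the tree) would account for branch structure rather than pairwise distance alone; the weighting scheme here is chosen only because it is short and easy to check by hand.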


  • Field theory project
    • Adapted our theoretical framework to take into account temperature variation, as a prototype for more general environmental variation.
  • OTU pipeline project
    • Re-running the clustering step of the pipeline with different clustering methods to gauge the impact of the algorithm on the OTUs produced.
  • Range estimation project
    • Reviewed the mathematics literature on Stochastic Loewner evolution, a theory of 'range' boundaries on a two-dimensional landscape that fluctuate over time. (This may offer a dual way to characterize neutral or niche-based community assembly by looking at the boundaries of species ranges.)
    • Considered possible advantages of this approach over our current model, which looks at where each individual is on a landscape.
    • Aim to link this work with Josh and Katie's approach.

Pollard Lab


  • OTU pipeline project
    • Led efforts to conduct a rigorous statistical assessment of the pipeline in collaboration with Sam and Josh
    • Developed software adaptors for OTU pipeline to enable analysis of 25 simulation data sets generated by Sam
    • Compared between-read (phylogeny-based) distances generated by our pipeline on different datasets. Identified a significant positive correlation between simulation and control data.
    • Tested various sources of OTU identification error, including sequence length and sequence conservation (via alignment position).
    • Submitted an abstract on the OTU pipeline project to the annual meeting of the International Society for Microbial Ecology.


  • OTU pipeline project
    • Simulated datasets of 16S rRNA reads for testing the performance of the OTU pipeline.
    • Wrote programs in R implementing metrics of tree and distance matrix similarity to compare phylogenies with full-length rRNA sequences versus shorter metagenomic reads from these sequences. Experimented with several normalization factors for different distance measures.
    • Plotted some per-read measures of error to enable a more fine-grained analysis of sources of error in the pipeline.
    • Analyzed pipeline performance to see what kinds of measures of correction may be useful additions to the algorithm.
  • Assessment of phylogenetic methods on protein marker genes
    • Reviewed initial analyses of simulated data sets for protein marker genes.
    • Redesigned simulation parameter settings for a larger, more complete data set. Data generation is in progress.
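The comparison of distance matrices from full-length sequences versus shorter reads can be sketched as follows. The original programs were written in R and their metrics are not listed in the report, so this Python sketch shows two generic choices as assumptions: the Pearson correlation of the pairwise distances, and the mean absolute difference after scaling each matrix by its largest distance (one possible normalization factor). All function names and example matrices are hypothetical.

```python
from itertools import combinations
from math import sqrt


def upper_triangle(matrix):
    """Return the off-diagonal upper-triangle entries of a square matrix."""
    n = len(matrix)
    return [matrix[i][j] for i, j in combinations(range(n), 2)]


def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def normalized_mean_abs_diff(x, y):
    """Mean absolute difference after scaling each list by its maximum."""
    xs = [a / max(x) for a in x]
    ys = [b / max(y) for b in y]
    return sum(abs(a - b) for a, b in zip(xs, ys)) / len(xs)


# Made-up matrices: read-based distances are exactly half the full-length ones.
full_length = [[0.0, 0.2, 0.6], [0.2, 0.0, 0.8], [0.6, 0.8, 0.0]]
short_reads = [[0.0, 0.1, 0.3], [0.1, 0.0, 0.4], [0.3, 0.4, 0.0]]

d_full = upper_triangle(full_length)
d_reads = upper_triangle(short_reads)
r = pearson(d_full, d_reads)
nmad = normalized_mean_abs_diff(d_full, d_reads)
print(r, nmad)
```

Because the read-based distances in this toy example are a uniform rescaling of the full-length distances, both measures report perfect agreement. This is why some normalization is needed before comparing raw distances from sequences of different lengths.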


  • OTU pipeline project
    • Developed a method, based on false positive and false negative rates, to assess the reliability of the clustering method used in the pipeline. (Errors in clustering happen when pairs of reads are incorrectly grouped together or incorrectly separated from each other.)
  • Range estimation project
    • Developed mathematical proofs to show that our estimators of range shapes are unbiased. This theoretical work complements numerical simulations that suggest that the estimates are not biased.
    • Submitted abstracts about the range estimation project to the annual meetings of the Ecological Society of America and the International Society for Microbial Ecology.
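The clustering assessment described under the OTU pipeline project above counts errors over pairs of reads: a false positive is a pair grouped together that belongs apart, and a false negative is a pair separated that belongs together. A minimal sketch of that pairwise bookkeeping follows. The function name, the input layout, and the choice of denominators are assumptions, not the project's actual implementation.

```python
from itertools import combinations


def pairwise_error_rates(true_labels, predicted_labels):
    """Pairwise false positive and false negative rates for a clustering.

    true_labels: {read_id: true group}
    predicted_labels: {read_id: cluster assigned by the pipeline}
    """
    tp = fp = tn = fn = 0
    for a, b in combinations(sorted(true_labels), 2):
        same_true = true_labels[a] == true_labels[b]
        same_pred = predicted_labels[a] == predicted_labels[b]
        if same_pred and same_true:
            tp += 1  # correctly grouped together
        elif same_pred and not same_true:
            fp += 1  # incorrectly grouped together
        elif not same_pred and same_true:
            fn += 1  # incorrectly separated
        else:
            tn += 1  # correctly separated
    # Rates are taken over pairs that truly differ (FPR) or truly match (FNR).
    fpr = fp / (fp + tn) if (fp + tn) else 0.0
    fnr = fn / (fn + tp) if (fn + tp) else 0.0
    return fpr, fnr


# Made-up example: four reads from two true groups.
truth = {"r1": "X", "r2": "X", "r3": "Y", "r4": "Y"}
clusters = {"r1": 1, "r2": 1, "r3": 1, "r4": 2}
fpr, fnr = pairwise_error_rates(truth, clusters)
print(fpr, fnr)
```

In the example, read r3 is pulled into the wrong cluster, which creates two false-positive pairs (r1-r3, r2-r3) and one false-negative pair (r3-r4).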

Wu Lab


  • Ecotype Simulation project
    • Performed analysis on Pelagibacter 16S reads from the GOS dataset. The Clade Sequence Diversity curve for these data was much more typical of the curves generated in prior analyses of Bacillus and other taxa using PCR data.
    • Generated preliminary estimates of (i) the number of ecotypes, (ii) rates of ecotype formation, and (iii) periodic selection in the Pelagibacter taxon based on 16S data.
    • Explored modifications of Ecotype Simulation to allow effective analysis of distantly related taxa without switching to highly conserved genes.