Moore Notes 11 10 14

From OpenWetWare

Group Call

  • Participants: Katie, Jonathan, Tom, Ladan, Sarah, Stacia, Patrick, Josh, Stephen
  • Eukaryote markers
    • Guillaume collecting model data sets (e.g., Holly Bik's 18S and metagenomic data from the same samples)
    • Comparison of different marker sets that we already have (they largely do not overlap)
    • Stephen's MicrobeCensus works pretty well with the 30 markers we already have
    • JE following up with Jason Stajich
  • ShotMAP
    • Draft polished except for the MetaHIT section (goes next to Katie, then Jonathan)
    • Tom and Stephen working on finalizing figures
  • Novel gene families (SFAMs)
    • Phylogenetically diverse
    • Multiple members
    • Stephen's quality filtering
      • Need to check for ncRNAs
    • No PFAM-A annotation
      • Other annotations to check: PFAM-B, plus annotations in tables from UniProt
    • Similar analysis was done for GEBA (at the end, structural threading was used to check for similarity to known structures)
      • Many submitted to structural genomics initiative
    • JE: check for conserved sequences that are not real proteins (perhaps annotated as a protein)
      • Relates to ncRNA issue
      • Automated annotation of proteins in genomes is making this worse now (especially in high GC genomes, since fewer stops)
      • Can even be deeply conserved across domains (e.g., the reverse strand of the gyrase B protein-coding gene)
    • Tom: see if they share an edge with an annotated SFAM
    • Check if they are:
      • Expressed (will get rid of the fake genes on opposite strands)
      • Only in very high GC genomes
      • Showing a high non-synonymous (vs. synonymous) substitution rate, or non-conservative (vs. conservative) substitutions in the protein sequence
    • Also looking at presence in metagenomes
      • Tom: also looking at co-occurrence across environments for annotated vs. unannotated
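
The point above about high GC genomes having fewer stops can be illustrated with a rough calculation. This is only a sketch under a strong assumption that was not discussed on the call: bases are drawn independently, with P(G) = P(C) = gc/2 and P(A) = P(T) = (1 - gc)/2. The function names are hypothetical. Under that model, all three stop codons (TAA, TAG, TGA) are AT-rich, so they become rarer as GC content rises and spurious open reading frames get longer.

```python
def stop_codon_probability(gc: float) -> float:
    """Probability that a random codon is TAA, TAG, or TGA.

    Assumes independent bases with P(G) = P(C) = gc/2 and
    P(A) = P(T) = (1 - gc)/2 (a simplification, not a real genome model).
    """
    if not 0.0 <= gc <= 1.0:
        raise ValueError("gc must be between 0 and 1")
    p_at = (1.0 - gc) / 2.0  # probability of A (same for T)
    p_gc = gc / 2.0          # probability of G (same for C)
    p_taa = p_at * p_at * p_at
    p_tag = p_at * p_at * p_gc
    p_tga = p_at * p_gc * p_at
    return p_taa + p_tag + p_tga


def expected_codons_to_stop(gc: float) -> float:
    """Mean number of codons until the first stop in a random frame."""
    p = stop_codon_probability(gc)
    return float("inf") if p == 0.0 else 1.0 / p


if __name__ == "__main__":
    for gc in (0.3, 0.5, 0.7):
        print(f"GC={gc:.1f}  P(stop)={stop_codon_probability(gc):.4f}  "
              f"mean codons to stop={expected_codons_to_stop(gc):.1f}")
```

Under this model a random frame at 50% GC hits a stop about every 21 codons, while at 70% GC it runs for about 52 codons, which is consistent with the concern that automated gene callers produce more fake proteins in high GC genomes.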
  • exRNA project
    • microbial RNA in human RNA-seq samples
    • 18S hits (despite ribosome depletion; the probes may have been human-specific)
      • Conserved region (subsequence), but the full-length best hit is Pseudomonas syringae
    • Tom: as in KhoeSan project, look for poorly QC-ed host reads
    • Tom: beware of human DNA in assembled bacterial genomes
    • How to get ALL genomes?
      • IMG - but no bulk download
      • PATRIC - bacterial only
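
Tom's suggestion to look for poorly QC-ed host reads could be prototyped with a simple shared k-mer screen. The sketch below is an illustration only, not the method used in the KhoeSan project or on the exRNA data: the function names, the value of `k`, and the `min_fraction` threshold are all assumptions, and a real screen would use an indexed aligner against the full human reference rather than in-memory sets.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def kmers(seq: str, k: int) -> set:
    """Return the set of all length-k substrings of seq (upper-cased)."""
    seq = seq.upper()
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def host_kmer_fraction(read: str, host_kmers: set, k: int) -> float:
    """Fraction of the read's distinct k-mers that also occur in the host."""
    read_kmers = kmers(read, k)
    if not read_kmers:
        return 0.0  # read shorter than k: nothing to compare
    return len(read_kmers & host_kmers) / len(read_kmers)


def flag_host_like(reads: dict, host_seqs: list, k: int = 21,
                   min_fraction: float = 0.5) -> list:
    """Return names of reads that share >= min_fraction of k-mers with the host.

    Both strands of each host sequence are indexed. k and min_fraction
    are illustrative defaults, not tuned values.
    """
    host_kmers = set()
    for host in host_seqs:
        host_kmers |= kmers(host, k)
        host_kmers |= kmers(reverse_complement(host), k)
    return [name for name, seq in reads.items()
            if host_kmer_fraction(seq, host_kmers, k) >= min_fraction]
```

The same screen could be pointed at assembled bacterial contigs instead of reads to look for stretches of human DNA, though the threshold would likely need to be applied per window rather than per contig.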
  • Next calls:
    • Dec 1 (Patrick)
    • Dec 15 (Tom)