Moore Notes 11 10 14
Group Call
- Participants: Katie, Jonathan, Tom, Ladan, Sarah, Stacia, Patrick, Josh, Stephen
- Eukaryote markers
- Guillaume collecting model data sets (e.g., Holly Bik's 18S + metagenomic on same samples)
- Comparison of different marker sets that we already have (they largely do not overlap)
- Stephen's MicrobeCensus works pretty well with the 30 markers we already have
- JE following up with Jason Stajich
- Shotmap
- Draft polished except for MetaHIT section (goes to Katie next, then Jonathan)
- Tom and Stephen working on finalizing figures
- Novel gene families (SFAMs)
- Phylogenetically diverse
- Multiple members
- Stephen's quality filtering
- Need to check for ncRNAs
- No PFAM-A annotation
- Other annotations: PFAM-B, other annotations in tables from UniProt
- Similar analysis was done for GEBA (at end did structural threading to see if similar to known structures)
- Many submitted to structural genomics initiative
- JE: check for conserved sequences that are not real proteins (perhaps annotated as a protein)
- Relates to ncRNA issue
- Automated annotation of proteins in genomes is making this worse now (especially in high GC genomes, since fewer stops)
- Can even be deeply conserved across domains (e.g., reverse strand of the gyrase B gene)
- Tom: see if they share an edge with an annotated SFAM
- Check if they are:
- Expressed (will get rid of the fake genes on opposite strands)
- Only in very high GC genomes
- High non-synonymous substitution rate (vs. synonymous) or conservative vs. non-conservative on protein sequence
- Also looking at presence in metagenomes
- Tom: also looking at co-occurrence across environments for annotated vs. unannotated
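A rough illustration of the "fewer stops in high GC genomes" point raised under the SFAM discussion. This is only a sketch under a simplifying assumption that was not part of the call: bases are drawn independently, with A and T equally frequent and G and C equally frequent. The function names are made up for this example. Since all three stop codons (TAA, TAG, TGA) are AT-rich, their expected frequency falls as GC content rises, so random open reading frames on the wrong strand get longer.

```python
def stop_codon_probability(gc_fraction):
    """Chance that a random codon is TAA, TAG, or TGA.

    Assumes independent bases with A = T and G = C frequencies
    (an illustrative null model, not a property of any real genome).
    """
    if not 0.0 <= gc_fraction <= 1.0:
        raise ValueError("gc_fraction must be between 0 and 1")
    at = (1.0 - gc_fraction) / 2.0  # frequency of A, and also of T
    gc = gc_fraction / 2.0          # frequency of G, and also of C
    # TAA contributes at^3; TAG and TGA each contribute at^2 * gc.
    return at ** 3 + 2.0 * at * at * gc


def expected_codons_to_first_stop(gc_fraction):
    """Mean number of codons read up to and including the first stop."""
    p = stop_codon_probability(gc_fraction)
    return float("inf") if p == 0.0 else 1.0 / p


if __name__ == "__main__":
    for gc in (0.35, 0.50, 0.70):
        print(
            f"GC={gc:.2f}  P(stop)={stop_codon_probability(gc):.4f}  "
            f"mean codons to stop={expected_codons_to_first_stop(gc):.1f}"
        )
```

Under this model the stop probability is 3/64 at 50% GC, and the mean run before a stop grows from about 21 codons at 50% GC to about 52 codons at 70% GC. That fits the concern that spurious ORFs are more likely to be called in high GC genomes, though real genomes have codon bias and strand asymmetries that this ignores.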
- exRNA project
- microbial RNA in human RNA-seq samples
- 18S hits (despite ribosome depletion - maybe probes were human specific)
- Conserved region (subsequence), but full length best hit is Pseudomonas syringae
- Tom: as in KhoeSan project, look for poorly QC-ed host reads
- Try prinseq (avg score of 20+, trimmed at bases that fall below 25)
- https://github.com/sharpton/meta-qc
- Tom: beware of human DNA in assembled bacterial genomes
- How to get ALL genomes?
- IMG - but no bulk download
- PATRIC - bacterial only
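To make the read QC thresholds from the exRNA discussion concrete (average score of 20+, trimming at bases below 25), here is a minimal sketch of that kind of filter. It is not prinseq and does not call it; the function names are hypothetical, and it assumes Phred+33 quality encoding, trimming from the 3' end only, and trimming before the mean-quality check. None of those details were specified on the call.

```python
def phred_scores(quality_string, offset=33):
    """Decode a FASTQ quality string into integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_string]


def trim_and_filter(seq, quals, min_mean=20, trim_below=25):
    """Trim low-quality 3' bases, then drop reads with a low mean score.

    Returns (trimmed_seq, trimmed_quals), or None if the read is discarded.
    The order of operations (trim first, then filter) is an assumption.
    """
    if len(seq) != len(quals):
        raise ValueError("sequence and quality lengths differ")
    end = len(quals)
    # Walk back from the 3' end while the last base is below the trim threshold.
    while end > 0 and quals[end - 1] < trim_below:
        end -= 1
    if end == 0:
        return None  # every base was trimmed away
    kept = quals[:end]
    if sum(kept) / end < min_mean:
        return None  # mean quality of what remains is too low
    return seq[:end], kept


if __name__ == "__main__":
    read = "ACGTAC"
    quals = phred_scores("IIII+-")  # 'I' decodes to 40, '+' to 10, '-' to 12
    print(trim_and_filter(read, quals))
```

Reads that fail this kind of filter in a human RNA-seq sample are the ones worth checking first when a hit looks microbial, since poorly QC-ed host reads can produce spurious matches.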
- Next calls:
- Dec 1 (Patrick)
- Dec 15 (Tom)