Moore Notes 12 4 13
From OpenWetWare
Group Call
- Participants: Katie, Jonathan, Tom, Stephen, Guillaume, Dongying, Ladan, Patrick
- Jonathan: Microbial ecology in N Atlantic
- Healy Hamilton (ex-Berkeley/Cal Academy)
- Now at a foundation that specializes in biodiversity databases
- NOAA call for proposals for biodiversity databases in N Atlantic
- She will try to convince NOAA to collect microbial samples on the N Atlantic ships
- Tom/Guillaume: Talked to Morgan about PICRUSt
- Thinks that PICRUSt's failures are probably due to gaps in the reference database
- Compare Shotmap vs. MG-RAST (SEED converted to KEGG) to confirm that Shotmap abundances agree with MG-RAST
- Will collapse KEGG OGs into pathways
- Will also check the software version
- Josh: Any news on Tara Oceans?
- Stephen: Relative abundance of plasmids across samples
- Could they explain differences in average genome coverage?
- JE: Yes, because copy number of plasmids can be very high (varies a lot)
- JE: Plasmids are very hard to identify, though there are a couple of gene markers
- Plasmid-encoded partitioning factors (parA, parB) tend to be phylogenetically clustered, so they can be identified
- In some cases, there are genes that consistently occur on plasmids
- Some people look for assembled contigs that circularize (requires high coverage)
- SN: What about mapping reads to plasmids in IMG?
- JE: Thinks HMP did this
- JE: Similar to the problem of finding free phage DNA
- Patrick: Correcting functional signals for taxonomic abundance
- To what extent are differences in gene family abundance driven by taxonomic variation?
- JE: PICRUSt gives the expected distribution of functional genes for a given taxonomic composition
- Ladan: slides
- Extreme set clustering of halophile protein families
- Predicting GO annotations of genes based on clustering with annotated genes
- Comparison of extreme sets vs. MCL, with two different rules for assigning GO terms to nodes within a cluster
- How to determine precision and recall?
- KP: Treat each annotation of a GO term to a gene (yes vs. no) as an independent test
- TS/KP: Could partition genes based on characteristics (e.g., number of GO terms, extreme set properties) and check if performance differs across partitions
- Stephen: Try extreme sets on a benchmark where MCL has already been validated
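A minimal sketch, in Python, of collapsing KEGG OG abundances into pathway abundances (Tom/Guillaume's plan for the Shotmap vs. MG-RAST comparison). It assumes a KO-to-pathway membership table is already in hand. The IDs (`KO_a`, `path_X`, ...) and the function name `collapse_to_pathways` are hypothetical. Crediting a multi-pathway KO's full abundance to every pathway is an assumption, not necessarily what Shotmap or MG-RAST do.

```python
from collections import defaultdict


def collapse_to_pathways(ko_abundance, ko_to_pathways):
    """Sum KO-level abundances into pathway-level abundances.

    ko_abundance: dict KO ID -> abundance in one sample.
    ko_to_pathways: dict KO ID -> iterable of pathway IDs it belongs to.
    A KO in several pathways contributes its full abundance to each one;
    KOs with no pathway mapping are dropped.
    """
    pathway_abundance = defaultdict(float)
    for ko, abundance in ko_abundance.items():
        for pathway in ko_to_pathways.get(ko, ()):
            pathway_abundance[pathway] += abundance
    return dict(pathway_abundance)


# Hypothetical IDs and membership table, for illustration only.
sample = {"KO_a": 5.0, "KO_b": 3.0, "KO_c": 1.0}
membership = {"KO_a": ["path_X", "path_Y"], "KO_b": ["path_X"]}
print(collapse_to_pathways(sample, membership))
# {'path_X': 8.0, 'path_Y': 5.0}
```

Because a KO shared by several pathways is counted in each one, pathway totals can exceed the total KO abundance. If pathway abundances need to sum to the sample total, an alternative is to divide each KO's abundance evenly among its pathways. Whichever rule is chosen, it should be the same for both tools so the comparison is fair.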
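A sketch of the taxonomy correction Patrick raised, following JE's description of what PICRUSt provides. The expected gene family profile is the taxon abundances weighted by per-genome gene copy numbers. The residual (observed minus expected relative abundance) is then the part of the functional signal that taxonomic composition does not explain. This is not PICRUSt's code. The copy-number table, the function names, and the choice to compare relative abundances are assumptions.

```python
def expected_gene_profile(taxon_abundance, gene_copies):
    """Expected relative abundance of each gene family from taxonomy alone.

    taxon_abundance: list of relative abundances, one per taxon.
    gene_copies: list of rows, one per taxon; each row gives gene copies
        per genome for every gene family (same column order in all rows).
    """
    n_families = len(gene_copies[0])
    # Copies of family j contributed by all taxa, weighted by abundance.
    expected = [
        sum(a * row[j] for a, row in zip(taxon_abundance, gene_copies))
        for j in range(n_families)
    ]
    total = sum(expected)
    return [e / total for e in expected]


def taxonomy_corrected_signal(observed_counts, taxon_abundance, gene_copies):
    """Observed minus expected relative abundance for each gene family.

    Positive values mean the family is more abundant than the taxonomic
    composition alone predicts; negative values mean less abundant.
    """
    total = sum(observed_counts)
    observed = [c / total for c in observed_counts]
    expected = expected_gene_profile(taxon_abundance, gene_copies)
    return [o - e for o, e in zip(observed, expected)]


# Two hypothetical taxa, three gene families.
taxa = [0.75, 0.25]
copies = [[1, 0, 2],
          [1, 2, 0]]
print(expected_gene_profile(taxa, copies))                 # roughly [1/3, 1/6, 1/2]
print(taxonomy_corrected_signal([2, 2, 2], taxa, copies))  # roughly [0, 1/6, -1/6]
```

If the taxon abundances come from 16S read counts, they would first need to be divided by each taxon's 16S copy number. PICRUSt performs that normalization before predicting gene content.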
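A sketch of KP's suggestion for precision and recall in Ladan's GO prediction. Each (gene, GO term) assignment counts as one yes/no call. True positives are predicted terms that match the reference, false positives are extra predicted terms, and false negatives are missed reference terms. The per-partition function follows the TS/KP idea of checking whether performance differs across groups of genes. The function names, the `partition_of` key, and the gene and GO labels are hypothetical. The reference is assumed to be held-out known annotations.

```python
from collections import defaultdict


def pair_counts(predicted, reference):
    """Count TP, FP, FN over (gene, GO term) pairs.

    predicted, reference: dicts mapping gene -> set of GO terms.
    Each (gene, term) assignment is scored as one independent yes/no call.
    """
    tp = fp = fn = 0
    for gene in set(predicted) | set(reference):
        pred = predicted.get(gene, set())
        ref = reference.get(gene, set())
        tp += len(pred & ref)  # predicted and in reference
        fp += len(pred - ref)  # predicted but not in reference
        fn += len(ref - pred)  # in reference but missed
    return tp, fp, fn


def precision_recall(tp, fp, fn):
    """Precision and recall; 0.0 when a denominator is zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def precision_recall_by_partition(predicted, reference, partition_of):
    """Precision and recall computed separately for each group of genes.

    partition_of: function gene -> partition key (hypothetical, e.g. the
    number of reference GO terms, or a property of the gene's extreme set).
    """
    groups = defaultdict(list)
    for gene in set(predicted) | set(reference):
        groups[partition_of(gene)].append(gene)
    results = {}
    for key, genes in groups.items():
        sub_pred = {g: predicted.get(g, set()) for g in genes}
        sub_ref = {g: reference.get(g, set()) for g in genes}
        results[key] = precision_recall(*pair_counts(sub_pred, sub_ref))
    return results


# Toy data with hypothetical gene IDs and GO labels.
predicted = {"g1": {"GO:A", "GO:B"}, "g2": {"GO:C"}}
reference = {"g1": {"GO:A"}, "g2": {"GO:C", "GO:D"}}

print(precision_recall(*pair_counts(predicted, reference)))
by_size = precision_recall_by_partition(
    predicted, reference, lambda g: len(reference.get(g, ())))
print(sorted(by_size.items()))
# [(1, (0.5, 1.0)), (2, (1.0, 0.5))]
```

Pooled over both genes, precision and recall are both 2/3. Split by the number of reference GO terms, the one-term gene has high recall but low precision and the two-term gene shows the opposite. This kind of difference would be hidden by the pooled numbers.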