Moore Notes 1 21 15
From OpenWetWare
				
				
				
				
Discussion of TARA Oceans data
- Participants: Katie, Josh, Tom, Guillaume, Stephen
 
- Stephen:
- Data embargo issue
 - Updated summary (slides)
 
 
- Analysis discussion:
- What do we want to do with the data?
- Start with aims of proposal
 
 - How to preprocess?
- They will likely release eggNOG abundances
 - They may map reads to assemblies (gene catalog)
 - Do we need something more/different?
- Database
 - Classification thresholds
 - Average genome size (AGS) normalization
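As a rough illustration of the AGS normalization item above, here is a minimal sketch in Python. It assumes one common formulation: divide each protein family's read count by the number of genome equivalents in the library (total sequenced bp divided by the estimated average genome size). The notes do not say which formulation will be used, and the function names, family IDs, and numbers are hypothetical.

```python
from typing import Dict


def genome_equivalents(total_bp: float, avg_genome_size_bp: float) -> float:
    """Number of genome equivalents sequenced: total bp / average genome size."""
    if avg_genome_size_bp <= 0:
        raise ValueError("average genome size must be positive")
    return total_bp / avg_genome_size_bp


def ags_normalize(
    family_counts: Dict[str, float],
    total_bp: float,
    avg_genome_size_bp: float,
) -> Dict[str, float]:
    """Express each family's count per genome equivalent (assumed formulation)."""
    ge = genome_equivalents(total_bp, avg_genome_size_bp)
    return {family: count / ge for family, count in family_counts.items()}


# Hypothetical library: 1 Gbp sequenced, estimated AGS of 2 Mbp.
example = ags_normalize({"COG0001": 50, "COG0002": 10}, 1e9, 2e6)
```

Counts normalized this way are comparable across samples whose communities differ in average genome size. Accounting for gene length as well would be a separate step that this sketch leaves out.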
 
 
 - DIAMOND vs. RAPSearch2
- Do a quick comparison (correlation) of bit scores
- If highly correlated, can use previously identified thresholds
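The quick bit-score comparison could be sketched as follows. This assumes both tools were run on the same reads against the same database and that both outputs are in BLAST-style tabular (m8) format, with the bit score in the 12th column and comment lines starting with `#`. The function names are hypothetical, and Pearson correlation is only one reasonable choice of statistic.

```python
import math
from typing import Dict, Iterable, List, Tuple

Key = Tuple[str, str]  # (query read ID, subject sequence ID)


def read_bit_scores(lines: Iterable[str]) -> Dict[Key, float]:
    """Parse m8-style lines, keeping the best bit score per (query, subject)."""
    scores: Dict[Key, float] = {}
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment headers
        fields = line.rstrip("\n").split("\t")
        key = (fields[0], fields[1])
        bit = float(fields[11])  # 12th column: bit score (assumed layout)
        scores[key] = max(bit, scores.get(key, float("-inf")))
    return scores


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length lists with at least 2 values")
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for constant input")
    return sxy / math.sqrt(sxx * syy)


def bit_score_correlation(a: Dict[Key, float], b: Dict[Key, float]) -> float:
    """Correlate bit scores over the alignments reported by both tools."""
    shared = sorted(set(a) & set(b))
    return pearson([a[k] for k in shared], [b[k] for k in shared])
```

Only alignments reported by both tools enter the correlation, so the fraction of shared hits is worth reporting alongside the coefficient.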
 
 
 - Many (667) samples to process
- Prioritize the prokaryote size fraction, then protists, then viruses
 - Prioritize open ocean (all?), surface waters (approximately 216 samples)
 - Start with metagenomes
 
 - Josh will look at ecological variability using multivariate environmental similarity surface (MESS) plots
- Can we do global predictions?
 - Are there samples we would drop and therefore do not need to run through read classification?
 
 - How much QC is needed?
- Stephen: QC probably hasn't been done, but it is also not necessary
 - Better to keep track of quality and use that info downstream
 - Illumina looks better than 454
 - Could QC one library and compare protein family abundances pre and post QC
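The single-library QC check suggested above might look like the sketch below. It assumes protein family counts are already available for the same library before and after QC. The function names and the choice of total variation distance as a summary are illustrative assumptions, not something decided in the meeting.

```python
from typing import Dict


def relative_abundance(counts: Dict[str, float]) -> Dict[str, float]:
    """Convert raw family counts to fractions that sum to 1."""
    total = sum(counts.values())
    if total <= 0:
        raise ValueError("total count must be positive")
    return {family: c / total for family, c in counts.items()}


def abundance_shift(pre: Dict[str, float], post: Dict[str, float]) -> Dict[str, float]:
    """Per-family change in relative abundance (post-QC minus pre-QC)."""
    p, q = relative_abundance(pre), relative_abundance(post)
    families = sorted(set(p) | set(q))
    return {f: q.get(f, 0.0) - p.get(f, 0.0) for f in families}


def total_variation_distance(pre: Dict[str, float], post: Dict[str, float]) -> float:
    """Half the summed absolute shift: 0 means identical profiles, 1 means disjoint."""
    return 0.5 * sum(abs(d) for d in abundance_shift(pre, post).values())
```

A distance near zero would support skipping QC and tracking quality downstream instead. Large shifts in particular families would point to where low-quality reads bias classification.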
 
 - Are the size fractions reliable?
 - Stephen will start AGS analyses right away
 