Moore Notes 3 4 09

Group Conference Call

  • Data sharing
    • genbeo (behind a firewall; need to sftp; a few terabytes)
    • put a note on the wiki with readme info
    • include non-metagenomic datasets, maybe in a subdirectory
  • CollectiveX or other conversation system
    • how to have huge discussions? Let's try it.
  • MetaSim
    • problems with the command line
    • how many species to include, and how related (random?)
    • FASTA file vs. alignments to AMPHORA profiles (folks want alignments)
  • ComboDb vs. AMPHORA
    • different sets of sequences (e.g. for rpoB)
    • almost all sequences in ComboDb are in AMPHORA profiles
      • ask Martin where he got the sequences
      • would be nice if we can link between ComboDb and AMPHORA with a unique id
  • tree building
    • pplacer discussion
    • computational time for ML tree estimation
      • on an 8-core computer, full ML with 700 seqs takes 9 hours without bootstrapping
      • using the reference seq tree as a guide tree cuts this to 1-2 hours
      • 100s of sequences are OK, 1000s are not
    • Steve's results: reads only vs. reads with reference sequences (in two ways)
      • reference tree helps to parcel out reads; without it the reads get lumped together
      • could be a problem for clades where we have reads but no reference seqs
  • possible visit of Josh to Eugene
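
Sketch for the ComboDb vs. AMPHORA item above: one possible way to link the two sets with a shared id is to hash each sequence after removing gap characters, so that a raw FASTA entry and its aligned copy get the same key. This is only an illustration. The function names (parse_fasta, sequence_key, link_by_sequence) are hypothetical, and using a SHA-1 digest of the ungapped, uppercased sequence as the id is an assumption, not something agreed on the call. It also assumes both sets can be exported as FASTA text.

```python
import hashlib


def parse_fasta(text):
    """Return a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].strip()
            records[header] = ""
        elif header is not None:
            # Sequence lines may be wrapped, so append to the current record.
            records[header] += line
    return records


def sequence_key(seq):
    """Hash a sequence after dropping gap characters and case differences."""
    cleaned = seq.replace("-", "").replace(".", "").upper()
    return hashlib.sha1(cleaned.encode("ascii")).hexdigest()


def link_by_sequence(set_a, set_b):
    """Map each header in set_a to the headers in set_b with the same ungapped sequence."""
    index_b = {}
    for header, seq in set_b.items():
        index_b.setdefault(sequence_key(seq), []).append(header)
    links = {}
    for header, seq in set_a.items():
        matches = index_b.get(sequence_key(seq))
        if matches:
            links[header] = matches
    return links
```

Headers in the first set that are missing from the result have no identical sequence in the second set, which would be the ones to ask Martin about. Exact matching will miss sequences that differ by even one residue or were trimmed differently, so it gives a lower bound on the overlap.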