Moore Notes 9/16/09
- Steve asked about the next steps for the GOS phylogenetic diversity analyses
- can we add the 100 additional gene families?
- HMM data for the additional gene families should be available from Dongying
- these include Archaea but not Eukarya
- could we remove the BLAST vs. E. coli step from the process? (Sam has code)
- Sam's simulation data for rpoB and one other gene are now available on edhar; try them out
- the 100 new gene families haven't been tested much
- they were selected for universality and evenness, not for congruence with the genome phylogeny
- the first 31 AMPHORA marker genes were chosen based on universality and on the theoretical expectation that they should be decent phylogenetic markers
- little work has been done to test whether the AMPHORA genes give the same phylogenetic signal
- TODO: quantify how closely the phylogenies from the different gene families agree with each other and with the genome tree
- Tom is working on a pipeline that takes protein families that are not phylogenetic markers and finds and aligns reads from those families in metagenomic data
- The two aims are to detect gene family novelty in metagenomic data and to evaluate communities at the level of gene family diversity
- The general approach is similar to AMPHORA, but the focus is on a very different set of gene families that are not universal and vary widely in copy number, etc.
- Example paper - DeLong data phylogenetic diversity analyses. Some ideas for the analyses came from group discussions. Jonathan suggested running the analysis on combined gene families. Srijak ran AMPHORA on the data. Jess B. ran DOTUR on the CAMERA 16S data. Steve ran all the other analyses. Steve and Jess are writing the paper.
- by our list of criteria, would taking part in discussions be enough to merit authorship?
- sometimes a conversation could be important enough to merit authorship
- deciding on authorship should be an ongoing conversation that may change as a paper develops and people's contributions change
- seeing a draft or outline of the paper should help people decide whether they should be an author, so let's circulate it. This will give people a chance to give feedback and make decisions about authorship.
- Example paper - James's theory paper linking beta diversity to species-area curves
- This paper has lots of theory work but will also involve GOS 16S OTU data analysis
- Tom has done a lot of the OTU processing work
- Anyone who is an author will probably need to do some work editing and developing the paper
- Example paper - Josh and Katie are also working on a paper using the GOS 16S OTU data and beta diversity/theory development
- Should this be merged with the paper James is describing or remain separate?
- The papers are complementary but might be distinct enough to merit two separate papers using the same data
- It will be important for people to maintain separate identities, even if the ideas in the papers are linked
- We don't necessarily want every paper from the project to have a huge author list; smaller lists make individual scientific contributions easier to identify
- Involving people in the writing/development of the paper should help make it clear who should be a coauthor
- Example paper - metagenomic data simulation pipeline
- Still at an early stage. Some people have contributed ideas or code at different stages of the process, and it's hard to say at this point whether that would constitute authorship. We're not sure whether the simulator itself could be a paper (e.g., a methods/software paper) or whether it should be applied to a problem (e.g., evaluating different phylogenetic methods).
- The Moore Foundation is interested in us writing up methods as papers, especially when they're going to be implemented by CAMERA
- Most of the contributions to this project so far have been discussions, or code that wasn't incorporated in the end. "Discussions are part of a collaborative project and don't merit authorship." Code written for the project, even if it isn't used in the final version, could merit authorship depending on the overall scope of the paper and how much the code contributed to the project's development. If someone's contribution is borderline, involving them in further development of the paper could make it clearer whether they deserve to be an author.
- The general consensus is that circulating early drafts will give people a chance to contribute and make it clearer who will be an author on which papers. This will also let us get different perspectives on the papers. Not everyone will be an author on every paper.
- The iSEEM report is due in two weeks. Jess will copy the template from the last report and add a section on draft papers and authorship discussions. Everyone should insert their work into the template. Lots of detail is good, and figures are good. We can link to PDFs of draft papers in the report.
- Dongying is working on writing up the additional gene family HMM work
- Need to work on the comparative phylogenetic analysis of the different gene families
- Phylogenetic distances between the individual gene trees and the concatenated gene family tree have been quantified and used to rank the metrics. This should be developed further: how else could it be done?
- Should we circulate what has been done so far? The data simulation pipeline could be used to evaluate this as well; Sam just needs the data.
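The notes don't say which distance was used to compare individual gene trees with the concatenated tree. As a minimal sketch, assuming the Robinson-Foulds (RF) distance between unrooted topologies, the comparison and ranking could look like this. The nested-tuple tree format, the function names, and the toy trees are all hypothetical; real gene trees would first be parsed from Newick with a phylogenetics library, and trees from gene families with missing taxa would need to be pruned to their shared leaves before comparison.

```python
from typing import Dict, FrozenSet, List, Set, Tuple, Union

# Hypothetical toy format: a tree is a leaf name (str) or a tuple of subtrees.
Tree = Union[str, tuple]


def leaf_set(tree: Tree) -> FrozenSet[str]:
    """Return all leaf names under `tree`."""
    if isinstance(tree, str):
        return frozenset([tree])
    return frozenset().union(*(leaf_set(child) for child in tree))


def splits(tree: Tree) -> Set[FrozenSet[str]]:
    """Nontrivial bipartitions induced by internal edges, ignoring the root.

    Each split is stored as the side that does NOT contain a fixed reference
    leaf, so one unrooted topology yields the same set however it is rooted.
    """
    all_leaves = leaf_set(tree)
    ref = min(all_leaves)
    found: Set[FrozenSet[str]] = set()

    def walk(node: Tree) -> FrozenSet[str]:
        if isinstance(node, str):
            return frozenset([node])
        below = frozenset().union(*(walk(child) for child in node))
        side = all_leaves - below if ref in below else below
        # Keep only splits with at least two leaves on each side.
        if 2 <= len(side) <= len(all_leaves) - 2:
            found.add(side)
        return below

    walk(tree)
    return found


def robinson_foulds(t1: Tree, t2: Tree) -> int:
    """Number of splits present in exactly one of the two trees."""
    if leaf_set(t1) != leaf_set(t2):
        raise ValueError("trees must have the same leaf set")
    return len(splits(t1) ^ splits(t2))


def normalized_rf(t1: Tree, t2: Tree) -> float:
    """RF distance divided by the total number of splits in both trees (0 to 1)."""
    total = len(splits(t1)) + len(splits(t2))
    return robinson_foulds(t1, t2) / total if total else 0.0


def rank_gene_trees(reference: Tree, gene_trees: Dict[str, Tree]) -> List[Tuple[str, int]]:
    """Rank gene trees from most to least congruent with the reference tree."""
    scored = [(name, robinson_foulds(reference, t)) for name, t in gene_trees.items()]
    return sorted(scored, key=lambda pair: (pair[1], pair[0]))


if __name__ == "__main__":
    # Toy trees only; "gene_x" and "gene_y" are placeholder names, not real data.
    concatenated = ((("A", "B"), "C"), ("D", "E"))
    gene_trees = {
        "gene_x": ((("A", "C"), "B"), ("D", "E")),
        "gene_y": ((("D", "E"), "C"), ("A", "B")),  # same unrooted topology, rerooted
    }
    for name, dist in rank_gene_trees(concatenated, gene_trees):
        print(name, dist, normalized_rf(concatenated, gene_trees[name]))
```

RF only counts topological disagreements and weights every conflicting split equally, ignoring branch lengths and support values, so it may be worth checking a ranking like this against a branch-length-aware or quartet-based distance.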
- Jess and Jonathan have been talking with a potential postdoc whose interests are a great fit with iSEEM work and the Green/Eisen/Pollard labs
- He wants to build complete transcription factor networks for organisms in metagenomic communities. This might be tricky, since we don't know which organisms are in these data sets.
- It could be useful for him to talk to people in the group; he may get in touch