Moore Notes 11 10 09

From OpenWetWare

James' update

  • Looking at neutral theory of biodiversity, makes assumptions about processes driving biodiversity (simple starting point)
    • Key points:
      • Many species at same trophic level
      • May be sessile/motile
      • Dispersal/movement
      • Birth, death, speciation
    • When the above are combined with a spatial landscape and stochasticity, this simple model becomes difficult to solve
    • Key results in PNAS pub (released today!)
  • Paper makes predictions about beta diversity and species area relationship (SAR)
    • SAR has a triphasic shape: at small scales, species increase linearly; at large scales, the range becomes very small for a species (linear, but more slowly). In the middle there's a more slowly increasing region.
    • Paper derives what this model looks like and then describes the parameters of distance decay.
  • Next step is to take OTU from metagenomic data and use model to compare prediction of increase in taxa over area
    • JAE: Assuming model is correct, could see how diff definitions of OTU best fit the model
      • Good point, the model is so simple it definitely is not correct, want to know how params not often discussed affect model
    • Sam: how does this model relate to macro orgs?
      • Data evaluated falls into canonical range of SAR
    • Josh: question about exponent/power law relationship
      • Gradient is very slowly changing in this region, no exact power law. Even though you might observe power law shape with data from nature, may not be a true power law (tjs - fuzzy, missed a few details here)
    • JAE: do you think params you use could be fundamentally different in microbes than other orgs or are you assuming that they apply equally across orgs?
      • Typically you do find diff exps in power laws between microbes and macros. If this analysis holds up, then this indicates that the difference is real.
    • Josh: If you apply micro estimates to macro orgs, they generally lead to much lower estimates than expected
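
The point about the slowly changing gradient can be illustrated with a short Python sketch. It assumes hypothetical lists of sampled areas and species counts (not data from the meeting) and computes the local log-log slope between consecutive samples. An exact power law gives the same exponent at every scale, while a curved SAR gives exponents that drift with area. The function name `local_exponents` is made up for illustration.

```python
import math


def local_exponents(areas, species):
    """Slope of log(S) against log(A) between consecutive samples.

    `areas` must be strictly increasing and all values must be positive.
    """
    if len(areas) != len(species) or len(areas) < 2:
        raise ValueError("need two or more matching (area, species) samples")
    slopes = []
    for i in range(len(areas) - 1):
        d_log_s = math.log(species[i + 1]) - math.log(species[i])
        d_log_a = math.log(areas[i + 1]) - math.log(areas[i])
        slopes.append(d_log_s / d_log_a)
    return slopes


# Hypothetical data: an exact power law S = 5 * A**0.25 (illustrative only).
areas = [1, 10, 100, 1000]
species = [5 * a ** 0.25 for a in areas]
print(local_exponents(areas, species))  # every slope is approximately 0.25
```

Running the same function on real counts would show whether the exponent is roughly constant or slowly varying over the sampled range.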

Josh's update

  • General question is whether microbial diversity scales with spatial scale in a manner similar to macro orgs. It could be that micros are cosmopolitan compared to macros, or they may be similar (heterogeneous). One way to study this is through the SAR
    • As we sample in larger areas, if we add species quickly, then endemism, etc is high.
  • SAR: people sample from regions of different sizes; hard to do with microbes because cannot census microbes within 1km area.
    • Thus, we want to infer how similarity between communities decays over distance
      • Use to make inferences of SAR
    • Developed method to do this, requires knowing functional form of distance decay relationship
      • Model suggests it is quadratic
      • Gather data, plot what relationship looks like, identify y-intercept and from this one can make inferences about exponent in SAR
    • Using macros to start - census of plots enables robust perspective of how method performs
      • Found our model fits very well to 6 of these macro datasets, should be reasonable for micros
      • Slope and intercept application of SAR to these datasets has also been conducted
    • The SARs are definitely not power laws over the whole range; log-log plots show curves. Over a broad range the exponent can take different values, and this is the triphasic model that James discussed
    • JAE: What data did you use?
      • 3 are tropical forests, 2 are amphibians and mammals in western hemisphere, 1 is California plants. For example, one of the tropical forests is a 1km square area where every plant has been surveyed. From this we can sub survey.
    • JAE: How does "all plants" connect to metagenomic datasets?
      • We used all plants that are there - a means of validation of the data
      • JAE: for metagenomic data you could analyze the entire dataset or constrain to look at one branch of the tree, trying to understand the scope of the test dataset that works with this method
      • Tropical plant is all land plants > 1 m in size (similar limitations/exhaustion on other datasets)
      • JAE: Do people see significantly diff patterns when they look at different taxa
      • The exponents don't vary too much, mostly around the canonical range of 0.2-0.25, though the scale may vary. For the tropical vegetation plot, it was all woody plants (a guild) - orgs at same trophic level - might be differences if looking at a specific family of plant
      • JAE: Just worried about how to apply to metagenomic sample of water - you have potentially every taxa in there (bac and arch), may not be part of exact same community. Might be useful to run analysis with different subpartitions (just bac, just proteo-bac, etc) to help diagnose complications
      • The fits are typically quadratic - see progress report that demonstrates fit. Lots of different ways to calc similarity (Sørensen, etc) and all of the predictions are upheld well by the data
      • A parameter is estimated first before the fit. The results do differ between communities
      • JAE: The ocean data will be the biggest dataset, diversity will be high. With hot springs, can sample same biofilm at diff temps. Ocean is highly mixed, many communities. There may be structure put into the forest studies.
      • Constraining analysis to different taxon groups should help account for this
    • Distance decay relationships were calculated by uniformly randomly dropping plots down on sampling region. Do again, and for each pair count species in each plot and the distance between them. If plots overlap, distance is set equal to the size of the plot
      • What would happen if plots were a lot bigger (100m on a side) - might better simulate the metagenomic situation where communities are intermixed? 100m is just a suggestion, might try bigger and smaller.
      • Could also uniformly randomly sample taxa from a plot rather than full census - this is high on the agenda (to test what happens if data is missing)
        • Prelim analysis suggests this model is robust to sampling error
    • James and Josh have lots to collaborate on, Josh will go up there in two weeks to initiate more direct collaborations
      • Monday 23 and Tuesday 24
    • James: do you find the power law is slowly varying with scale even if power law isn't exact? What does SAR look like when plotted?
      • The method is limited to estimating slope of SAR at relatively small areas. It increases steeply initially and then the rate at which the slope changes decreases. Beyond that it's difficult to make estimates. I guess we could say I'm seeing first two parts of triphasic, hard to tell beyond that.
    • James and Josh are making different measurements for decay of similarity
      • James: abundance based (prob that two individuals separated by distance r belong to the same taxon) - susceptible to 16S biases that could skew abundance based metrics
      • Josh: not abundance based, but complementary
    • JAE: Could parameters in model be used to study incredibly close samples - what is scale you're aiming at
      • Rob Knight paper used 454-16S PCR to study microbes in human body
      • The assumption of model is that ranges of taxa can be represented by polygons. So long as true, should give good predictions.
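
The plot-dropping procedure for distance decay described above can be sketched in Python. This is a minimal illustration, not Josh's actual code: it assumes a census stored as hypothetical `(x, y, species)` tuples, square plots, Sørensen similarity on presence/absence, and it applies the overlap rule as recorded in these notes (overlapping plots are assigned a distance equal to the plot side). All function names are made up for illustration.

```python
import math
import random


def sorensen(a, b):
    """Sørensen similarity of two species sets: 2|A and B| / (|A| + |B|).

    Undefined (raises ZeroDivisionError) when both sets are empty.
    """
    return 2 * len(a & b) / (len(a) + len(b))


def species_in_plot(individuals, x0, y0, side):
    """Species with one or more individuals inside the square plot."""
    return {sp for x, y, sp in individuals
            if x0 <= x < x0 + side and y0 <= y < y0 + side}


def sample_distance_decay(individuals, region_side, plot_side, n_pairs, rng):
    """Drop pairs of plots uniformly at random; return (distance, similarity)."""
    results = []
    for _ in range(n_pairs):
        # Lower-left corners, chosen so each plot lies inside the region.
        ax = rng.uniform(0, region_side - plot_side)
        ay = rng.uniform(0, region_side - plot_side)
        bx = rng.uniform(0, region_side - plot_side)
        by = rng.uniform(0, region_side - plot_side)
        a = species_in_plot(individuals, ax, ay, plot_side)
        b = species_in_plot(individuals, bx, by, plot_side)
        if not a and not b:
            continue  # similarity is undefined when both plots are empty
        dist = math.hypot(ax - bx, ay - by)
        if abs(ax - bx) < plot_side and abs(ay - by) < plot_side:
            dist = plot_side  # overlap rule as recorded in the notes
        results.append((dist, sorensen(a, b)))
    return results


# Hypothetical census: 2000 individuals of 30 species in a 1000 m square.
# Species are placed at random here, so no real decay is expected.
rng = random.Random(0)
census = [(rng.uniform(0, 1000), rng.uniform(0, 1000), rng.randrange(30))
          for _ in range(2000)]
pairs = sample_distance_decay(census, region_side=1000, plot_side=100,
                              n_pairs=50, rng=rng)
```

Changing `plot_side` is one way to try the bigger and smaller plots suggested above, and subsampling `census` before calling the function would mimic an incomplete census.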

Other stuff

  • JAE was at JGI scientific advisory board meeting yesterday - more places are trying to do metagenomics with Illumina sequences
    • Read length: up to 120 bp at reasonable quality (maybe 200bp), but that's maximum
    • More reads for less money seems to be in vogue
    • For simulations, 100bp seems good
  • 454 Titanium is ~500 bp, simulate at 400bp because mixed DNA samples bring avg. read length down
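
A minimal Python sketch of drawing simulated reads at the two lengths mentioned above (100 bp and 400 bp). It makes strong simplifying assumptions that are not from the meeting: reads have a fixed length, start positions are uniform, and there are no sequencing errors or quality scores. The `simulate_reads` helper and the random test genome are hypothetical.

```python
import random


def simulate_reads(genome, read_len, n_reads, rng):
    """Return n_reads error-free substrings of length read_len from genome."""
    if read_len > len(genome):
        raise ValueError("read length exceeds genome length")
    reads = []
    for _ in range(n_reads):
        # Uniform start position such that the read fits inside the genome.
        start = rng.randrange(len(genome) - read_len + 1)
        reads.append(genome[start:start + read_len])
    return reads


# Hypothetical 5 kb random sequence standing in for a real genome.
rng = random.Random(0)
genome = "".join(rng.choice("ACGT") for _ in range(5000))
short_reads = simulate_reads(genome, read_len=100, n_reads=10, rng=rng)
long_reads = simulate_reads(genome, read_len=400, n_reads=10, rng=rng)
```

A more realistic simulation would draw read lengths from a distribution and add platform-specific errors.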