# Moore Notes 2 2 10

From OpenWetWare

Jump to navigationJump to search
Group Call

- Reports
- Quarterly OK?
- Not sure yet

- Annual due in 1 month
- Template added to wiki by Katie http://iseem.private.openwetware.org/wiki/Main_Page#Progress_Reports

- Quarterly OK?

- Emails to the group can be sent through the Google Group

- Distance-decay figures based on 16S OTUs
- Plots of F(r) (probability that individuals from two communities are from the same OTU) versus log(r) (log of great circle distance separating sites). Plots1 Plots2 Plots3 Plots4
- Compare to Condit paper

- Tried for a variety of OTU cutoffs and excluding some of the samples.
- These are quite messy, there isn't a clear relationship.
- JG notes these aren't much messier than most distance decay relationships she has seen.
- How were the OTUs clustered (cf. Sogin paper) - could this have some effect?
- What about sensitivity of these methods to OTU richness, could we try a different metric of community similarity?
- The Sorenson distance vs. spatial distance plots aren't that much cleaner

- Plots of F(r) (probability that individuals from two communities are from the same OTU) versus log(r) (log of great circle distance separating sites). Plots1 Plots2 Plots3 Plots4

- How could we improve these distance decay figures?
- A lot of it is limitations of the data set
- One particular limitation is the sparse spatial sampling - we have very few sites separated by relatively small spatial distances, for example.
- What about the idea of using PCR-based data. What are the issues we need to consider to integrate metagenomic and PCR-based data?
- Tom & Josh working on a related analysis - for PCR data we are comfortable comparing pairwise sequence distances, not so much for metagenomic data. But PCR missing rare taxa. Could we combine these data sets somehow?
- JE suggests what if we start by looking at just PCR data. Two issues with metagenomic data. One is limited number of distance combinations possible in the data sets. Second is small number of rRNA in many metagenomic samples. Could we increase sample size using AMPHORA? Alternatively, for now, the theory is important and it's easier to apply it to PCR data.
- TS notes that many of the available PCR data sets are fairly shallow sampling of communities. JE points out though that a single plate of PCR analysis would give more sequences than some of the GOS samples with <10 16S reads identified.
- Sogin census of marine life 16S 454 reads for marine samples - where are these data?
- More generally can we look in GreenGenes, RDP, and CAMERA for environmental rRNA PCR data? GenBank PopSet [1] was formerly used as a place to deposit rRNA data, not sure if it's being used any more. JE will look into this - Pollard group have been looking around and haven't found many available data.
- Stoneking saliva microbiome data set is available - multiple spatial locations and multiple people at each location.

- Theory discussion
- Can we use observed patterns (i.e. of perimeter/area of species ranges) to infer processes that gave rise to those patterns?
- How is our estimation of the perimeters of species ranges affected by the scale length (length of the ruler we use to measure ranges)?
- Josh circulated figures descrbining this. Plots
- We're using distance decay relationships to estimate parameters such as species range area, perimeter, etc. How well do we estimate the true underlying pattern under a variety of simulation parameters?
- i.e. the estimators do a good job of estimating area. The smallest distance you have between plots is analagous to the length of the ruler you have to measure perimeter. If the ruler is longer than the indentedness of the perimeter or the angularity of the polygons, you'll do a bad job of estimating the true perimeter. Plots
- This suggests that sample design and the spatial layout of plots in empirical data sets will influence our ability to infer processes/patterns from these data sets. i.e. with the GOS data set, you might not be able to estimate certain parameters such as perimeter and angularity of species ranges precisely given the limitations of the data.

- James & Josh working on linking their theory to work with empirical data. i.e. Connect this theory about our ability to infer the shape of these polygons with our ability to estimate importance of different ecological processes.
- Mechanistic assembly theory for boundaries (vs. individuals)
- How to make microbiologists care about the difference between polygons with different amounts of edge angularity? For example, explain the consequences of sample design on our ability to infer pattern/process. The length scale/size of the ruler you're using to measure thing would be expected to influence the patterns we observe.
- Explain this in terms of spatial scaling and how we design ecological studies - i.e. the idea that the world as perceived by an elephant and virus are very different and to study them you'd want to use a ruler of appropriate size
- It would be interesting to talk about these results in light of the ongoing discussions about the processes that give rise to species ranges with different sizes, shapes, etc.
- How should we design a study to measure microbial ranges?

- Niche space (vs. geographic space)
**Main conclusions:**The GOS data set is going to impose limits on what can be inferred, and James & Josh need to make sure their theoretical work is using the same length scale.- Next step is to continue to develop & link James & Josh's theoretical work. Can you start with a single spatial dimension and then expand to 2D?

- Conflict next week, skip PI call. Will return to regular group call on Feb. 16 (Katie will be late).

- Some useful sources of rRNA PCR data
- NCBI PopSet is probably the best - see for example http://www.ncbi.nlm.nih.gov/popset/?term=rrna
- In there I found an interesting new paper: http://www.nature.com/ismej/journal/vaop/ncurrent/full/ismej2009134a.html