Moore Notes 10 6 10

From OpenWetWare

Group Call

  • MaxEnt Niche Mapping Project (Josh leading discussion)

The general idea is to model the distribution of microbial species. We do this with MaxEnt: we relate taxon presence to environmental metadata (temperature, salinity, etc.) and then predict whether a taxon is present at an unsampled site, given the known metadata for that site. This lets us infer, from prior observations, the presence of a taxon at a location where no biological sample has been taken.

Sequence data come from three sources: GOS shotgun samples, the SILVA/MegX database, and MICROBIS (ICOMM data). Together they provide a fairly large number of samples from across the ocean, although sampling density is not uniform. All sequences of interest are 16S rRNA. It is unclear whether the data are limited to Bacteria or to any particular phylogenetic lineage (this may be another source of sampling bias, analogous to museum bias).

Environmental layer data come from Reddey et al. and the World Ocean Atlas. Layers include mean surface temperature (half-degree resolution), primary productivity, mean dissolved O2, mean salinity, and the annual standard deviation of sea surface temperature. The literature offers no rigorous or systematic way to select layers.
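Linking each sequence's sampling location to the environmental layers requires assigning samples to grid cells. The following is a minimal sketch, assuming a regular 0.5° latitude/longitude grid stored as nested lists, with rows running south from 90°N and columns east from 180°W. The grid orientation and the names `cell_index` and `lookup_layers` are hypothetical, not the project's actual code.

```python
# Assumed layout: global grid at 0.5-degree resolution,
# 360 rows (90N -> 90S) x 720 columns (180W -> 180E).
RES_DEG = 0.5
N_ROWS = int(180 / RES_DEG)  # 360
N_COLS = int(360 / RES_DEG)  # 720


def cell_index(lat: float, lon: float) -> tuple[int, int]:
    """Map a sample's latitude/longitude to a (row, col) cell on the grid."""
    if not (-90.0 <= lat <= 90.0 and -180.0 <= lon <= 180.0):
        raise ValueError("coordinates out of range")
    # Clamp so that lat = -90 and lon = 180 fall in the last row/column.
    row = min(int((90.0 - lat) / RES_DEG), N_ROWS - 1)
    col = min(int((lon + 180.0) / RES_DEG), N_COLS - 1)
    return row, col


def lookup_layers(layers: dict[str, list[list[float]]],
                  lat: float, lon: float) -> dict[str, float]:
    """Return each layer's value in the cell containing (lat, lon)."""
    row, col = cell_index(lat, lon)
    return {name: grid[row][col] for name, grid in layers.items()}
```

For example, `cell_index(0.25, 0.25)` returns `(179, 360)`: the cell just north of the equator and just east of the prime meridian.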

Good question from Jess: some samples have environmental metadata, similar to the layers, collected at the time of sampling. Can we use this information? Answer: to do the mapping, we need a high-resolution grid of layers across the globe. We could use the data collected at sampling, but the remote data map better onto the global data we're using for inference. In addition, most sequences don't have metadata associated with them.

?: Why use both the SD and the mean for some layers but not for others? A: This is based on the Reddey et al. paper, which identifies "good" layers through a thorough analysis. Also, the Tittensor (?) paper identifies layers that should be good predictors of diversity.

?: Can we use the data to decide which layers would be best? A: Some sort of model selection approach would be great, but no such method has yet been applied in the literature. There are some relatively simple statistical methods that could be adopted. Josh has run several analyses using different subsets of layers, and qualitatively the results appear similar.
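One of the "relatively simple statistical methods" could be a collinearity screen: drop any layer that is strongly correlated with a layer already kept. This is a minimal sketch of that idea, not a model-selection procedure the group has adopted. It assumes layer values have been sampled into an array of shape (cells, layers), and the 0.7 cutoff and the name `screen_collinear_layers` are arbitrary illustrative choices.

```python
import numpy as np


def screen_collinear_layers(values: np.ndarray, names: list[str],
                            max_abs_r: float = 0.7) -> list[str]:
    """Greedily keep layers whose |Pearson r| with every already-kept layer
    is at most max_abs_r.

    values: array of shape (n_cells, n_layers); rows containing NaN are ignored.
    names: layer names in column order, listed in priority order
           (an earlier layer wins over a later, correlated one).
    """
    complete = values[~np.isnan(values).any(axis=1)]
    r = np.corrcoef(complete, rowvar=False)  # (n_layers, n_layers) correlation matrix
    kept: list[int] = []
    for j in range(len(names)):
        # Keep layer j only if it is not too correlated with any kept layer.
        if all(abs(r[j, k]) <= max_abs_r for k in kept):
            kept.append(j)
    return [names[j] for j in kept]
```

Because the screen is greedy, the result depends on the order of `names`, so layers considered most biologically meaningful should be listed first.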

The species distribution model is MaxEnt: it approximates the niche by choosing the maximum-entropy distribution over niche space that is consistent with the observed presences. It seems to work really well with sparse, presence-only data such as museum records. What happens with a niche value for which we have no data? It is probably given a 0; on our maps we should grey out such locations. Sequences are taxonomically clustered using RDP, and for now we are building maps for taxonomic Orders. Each map represents the distribution of a particular Order.
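To make the maximum-entropy idea concrete, here is a stripped-down sketch. It fits a Gibbs distribution p(x) proportional to exp(w · f(x)) over all grid cells with complete layer data (the "background"). The fit uses gradient ascent on the mean log-likelihood of the presence cells, which drives the model's expected feature values toward their means at the presence cells. It assumes only linear features on standardized layers and a fixed L2 penalty. The actual Maxent software uses richer feature classes and its own regularization, so this is an illustration, not a substitute. Cells with any missing layer value get NaN, so they can be greyed out rather than assigned 0. The name `fit_maxent` and its parameters are hypothetical.

```python
import numpy as np


def fit_maxent(features: np.ndarray, presence_idx: np.ndarray,
               l2: float = 0.01, lr: float = 0.1, n_iter: int = 2000) -> np.ndarray:
    """Fit p(x) proportional to exp(w . f(x)) over background cells.

    features: (n_cells, n_features); rows with NaN are excluded from the
              background and receive NaN in the output (grey on a map).
    presence_idx: integer row indices of cells where the taxon was observed.
    Returns per-cell probabilities summing to 1 over the valid cells.
    """
    valid = ~np.isnan(features).any(axis=1)
    X = features[valid]
    mu, sd = X.mean(axis=0), X.std(axis=0)
    sd[sd == 0] = 1.0
    X = (X - mu) / sd  # standardize each layer
    # Translate presence indices into positions among the valid rows,
    # dropping presences that fall in cells with missing layers.
    pos = np.cumsum(valid) - 1
    pres = pos[presence_idx[valid[presence_idx]]]
    target = X[pres].mean(axis=0)  # empirical feature means at presence cells

    def gibbs(w: np.ndarray) -> np.ndarray:
        s = X @ w
        p = np.exp(s - s.max())  # subtract max for numerical stability
        return p / p.sum()

    w = np.zeros(X.shape[1])
    for _ in range(n_iter):
        # Gradient of the L2-penalized mean log-likelihood of presences.
        grad = target - gibbs(w) @ X - l2 * w
        w += lr * grad

    out = np.full(features.shape[0], np.nan)
    out[valid] = gibbs(w)
    return out
```

For example, with a single temperature-like layer and presences only in the warmest cells, the fitted probabilities increase with temperature, and a cell whose layer value is missing comes back as NaN.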

We are also building maps of alpha diversity across Orders. Richness is high in coastal regions; removing coastal samples might help confirm that this isn't some sort of overfitting. Several alpha diversity maps built from different subsets of the data all show similar qualitative patterns, which are also consistent with diversity patterns in macroorganisms. Josh has also plotted mean ordinal richness versus latitude, which shows three peaks. We might want to plot sampling density versus latitude to see whether the two are correlated, which could reveal whether sampling bias is driving the observation. The error bars appear consistent across latitudes, suggesting that sampling is similar across latitudes.
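The proposed check of sampling density against latitude can be made quantitative by binning samples into latitude bands and correlating the per-band sample count with per-band mean richness. This is a minimal sketch, assuming one richness value per sample. The 10° band width and the function names are arbitrary, hypothetical choices. With only a handful of bands, Pearson r is a rough diagnostic rather than a formal test.

```python
import numpy as np


def richness_and_effort_by_band(lats: np.ndarray, richness: np.ndarray,
                                band_deg: float = 10.0):
    """Bin samples into latitude bands.

    Returns band centres, sample counts per band, and mean richness per band
    (NaN where a band has no samples).
    """
    edges = np.arange(-90.0, 90.0 + band_deg, band_deg)
    n_bands = len(edges) - 1
    # digitize gives i with edges[i-1] <= lat < edges[i]; clip puts lat = 90 in the top band.
    band = np.clip(np.digitize(lats, edges) - 1, 0, n_bands - 1)
    counts = np.bincount(band, minlength=n_bands)
    sums = np.bincount(band, weights=richness, minlength=n_bands)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_rich = sums / counts  # 0/0 -> NaN for empty bands
    centres = edges[:-1] + band_deg / 2
    return centres, counts, mean_rich


def effort_richness_correlation(counts: np.ndarray, mean_rich: np.ndarray) -> float:
    """Pearson r between sample count and mean richness over occupied bands."""
    ok = counts > 0
    return float(np.corrcoef(counts[ok], mean_rich[ok])[0, 1])
```

A strong positive r would suggest that the richness peaks partly track where sampling effort is concentrated.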

We are evaluating patterns of beta diversity as well: for each cell, beta diversity is averaged across the adjacent grid cells. Beta diversity is highest near the poles. Because the distance between adjacent cells changes with latitude, we will have to correct for this.
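The neighbour-averaging step might look like the sketch below. It assumes each cell holds the set of Orders predicted present, uses Jaccard dissimilarity as the beta diversity measure, and averages over the four rook neighbours with longitude wrapping around. All of these are illustrative assumptions, and the names are hypothetical. The last function shows why the latitude correction is needed. On a fixed-degree grid, the east-west distance between adjacent cell centres shrinks with cos(latitude), which here means spherical Earth, radius 6371 km. How to apply the correction is left open.

```python
import math


def jaccard_dissimilarity(a: set, b: set) -> float:
    """1 - |A & B| / |A | B|; defined as 0 when both cells are empty."""
    union = a | b
    if not union:
        return 0.0
    return 1.0 - len(a & b) / len(union)


def mean_neighbor_beta(grid: list[list[set]]) -> list[list[float]]:
    """For each cell, average Jaccard dissimilarity to its 4 rook neighbors.

    Longitude (columns) wraps around; the top and bottom rows have only
    three neighbors.
    """
    n_rows, n_cols = len(grid), len(grid[0])
    out = [[0.0] * n_cols for _ in range(n_rows)]
    for r in range(n_rows):
        for c in range(n_cols):
            neighbors = [(r, (c - 1) % n_cols), (r, (c + 1) % n_cols)]
            if r > 0:
                neighbors.append((r - 1, c))
            if r < n_rows - 1:
                neighbors.append((r + 1, c))
            vals = [jaccard_dissimilarity(grid[r][c], grid[rr][cc])
                    for rr, cc in neighbors]
            out[r][c] = sum(vals) / len(vals)
    return out


def ew_neighbor_distance_km(lat_deg: float, res_deg: float = 0.5) -> float:
    """Approximate east-west distance between adjacent cell centres."""
    return 2 * math.pi * 6371.0 * math.cos(math.radians(lat_deg)) * res_deg / 360.0
```

At the equator, adjacent 0.5° cells are about 55.6 km apart east-west. At 60° latitude they are half that, so raw neighbour dissimilarities at high latitudes compare communities that are physically closer together.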

We ended with a discussion of Rapoport's rule: taxa at high latitudes have larger ranges than taxa at low latitudes. Mean range area was mapped for each cell, and the map looks very similar to the beta diversity map. It is hard to tell from the map whether there is a latitude-specific pattern. Mean range breadth, however, shows a clear distinction between high and low latitudes; this metric is often used to evaluate Rapoport's rule.
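Both per-cell metrics can be computed from a presence/absence matrix. This sketch assumes the Order maps have been thresholded into a boolean (cells × taxa) matrix and that cells are square in degrees. Range area is the summed spherical area of each taxon's occupied cells. Range breadth is its northernmost minus southernmost occupied latitude. Each cell then averages these over the taxa present in it. The function name and inputs are hypothetical.

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0


def per_cell_mean_range(occ: np.ndarray, cell_lat: np.ndarray, res_deg: float = 0.5):
    """occ: boolean (n_cells, n_taxa) presence matrix.
    cell_lat: latitude (degrees) of each cell centre.

    Returns, per cell, the mean range area (km^2) and the mean latitudinal
    range breadth (degrees) of the taxa present there (NaN for empty cells).
    """
    occ = occ.astype(bool)
    dlon = np.radians(res_deg)
    lat = np.radians(cell_lat)
    half = dlon / 2
    # Area of a cell on a sphere: R^2 * dlon * (sin(lat_top) - sin(lat_bottom)).
    cell_area = EARTH_RADIUS_KM ** 2 * dlon * (np.sin(lat + half) - np.sin(lat - half))
    # Each taxon's range area: total area of the cells it occupies.
    range_area = occ.T.astype(float) @ cell_area
    # Each taxon's latitudinal breadth; taxa absent everywhere get 0 (never averaged).
    present_anywhere = occ.any(axis=0)
    lat_max = np.where(occ, cell_lat[:, None], -np.inf).max(axis=0)
    lat_min = np.where(occ, cell_lat[:, None], np.inf).min(axis=0)
    breadth = np.where(present_anywhere, lat_max - lat_min, 0.0)
    # Average over the taxa present in each cell.
    n_present = occ.sum(axis=1)
    occ_f = occ.astype(float)
    with np.errstate(invalid="ignore", divide="ignore"):
        mean_area = (occ_f @ range_area) / n_present
        mean_breadth = (occ_f @ breadth) / n_present
    return mean_area, mean_breadth
```

Plotting `mean_breadth` against cell latitude, rather than mapping it, is one way to see the high- versus low-latitude contrast directly.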

Next steps: verify that the observed patterns are real and not artifacts of the data, polish some of the statistics, and write the paper!