Motivation and Background

Revision as of 09:14, 29 July 2014

Modeling species distributions in parks has considerable value for the scientific community, but for the National Park Service's (NPS) stewardship of park resources it is critical.
- Species occurrences recorded only as points are of limited use to park managers, because managers cannot infer what lies between the points.
Knowing, with some estimated probability, where species occur in large natural areas is essential to taking actions to protect them. These actions include monitoring, stewardship of rare species, responding when a species is suddenly found to be at risk, and modeling future scenarios that place species in jeopardy.
Natural systems and native species at Great Smoky Mountains National Park (GRSM) currently face many threats.
- The biological complexity, interacting stressors, and limited agency resources at the Smokies make it imperative to know where actions will be most effective.

Maxent is a method for generating predicted species distributions from a set of occurrence records and environmental variables known at those locations (and across the study area).
- The predicted distribution is constrained so that the expected value of each environmental variable under it is close to that variable's empirical average at the occurrence locations.
- Among all distributions that satisfy these constraints, Maxent selects the one with maximum entropy, which is the one closest to uniform over the study area.
- This avoids over-fitting: the chosen model imposes no structure beyond what the constraints from the environmental variables at presence locations require.
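The constrained problem described above can be written compactly. The notation here is ours, for illustration. X is the set of grid cells in the study area, f_1 to f_n are the environmental variables (features), and x_1 to x_m are the presence locations. Each β_j ≥ 0 is a regularization width that loosens its constraint; with every β_j = 0 the expectations must match the presence averages exactly. The maximum entropy solution has the exponential (Gibbs) form, and its weights λ can equivalently be found by minimizing an L1-regularized negative log-likelihood of the presence points.

```latex
\begin{aligned}
&\max_{p}\; H(p) = -\sum_{x \in X} p(x)\ln p(x)
  \quad \text{s.t.} \quad \sum_{x \in X} p(x) = 1,\; p(x) \ge 0, \\
&\qquad \Bigl|\sum_{x \in X} p(x)\, f_j(x) - \frac{1}{m}\sum_{i=1}^{m} f_j(x_i)\Bigr| \le \beta_j,
  \quad j = 1,\dots,n, \\[4pt]
&p_\lambda(x) = \frac{\exp\bigl(\sum_{j} \lambda_j f_j(x)\bigr)}{Z_\lambda},
  \qquad Z_\lambda = \sum_{x \in X} \exp\Bigl(\sum_{j} \lambda_j f_j(x)\Bigr), \\[4pt]
&\hat{\lambda} = \arg\min_{\lambda}\; -\frac{1}{m}\sum_{i=1}^{m} \ln p_\lambda(x_i) + \sum_{j} \beta_j \lvert\lambda_j\rvert .
\end{aligned}
```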
The maximum entropy approach has been used extensively in physics and economics applications.
- Maxent is one of many options for generating predicted species distributions from environmental variables at species presence sites (others include GARP, GLMs, and GAMs), but it has several advantages. Following Phillips et al. (2006), Maxent:

requires only presence data, not presence/absence data;
can use both continuous and categorical variables;
has an efficient optimization procedure;
has a concise probabilistic definition;
avoids over-fitting through regularization;
can address sampling bias formally;
produces continuous output (not just yes/no); and
is generative rather than discriminative, which can make it better suited to small sample sizes.
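To make several of these points concrete, here is a toy sketch in Python with NumPy. It uses presence locations only and treats every grid cell as background. It fits exponential-form weights by proximal gradient ascent on an L1-regularized presence log-likelihood, and it returns a continuous value for every cell. The function name `fit_maxent`, its parameters, and the synthetic data are hypothetical. This is not the Maxent software's implementation, which differs in its feature classes and optimizer; it only illustrates the same kind of objective.

```python
import numpy as np


def fit_maxent(features, presence_idx, beta=0.0, lr=0.5, n_iter=3000):
    """Toy maximum-entropy fit over a finite set of grid cells (hypothetical helper).

    features:     (n_cells, n_vars) array of environmental values, one row per cell.
    presence_idx: indices of cells where the species was recorded (presence only).
    beta:         L1 regularization width; 0 means feature means must match exactly.
    Returns (p, lam): a probability for every cell and the fitted weights.
    """
    target = features[presence_idx].mean(axis=0)  # empirical feature means at presences
    lam = np.zeros(features.shape[1])
    for _ in range(n_iter):
        scores = features @ lam
        p = np.exp(scores - scores.max())  # Gibbs form, shifted for numerical stability
        p /= p.sum()
        grad = target - p @ features  # gradient of the mean presence log-likelihood
        step = lam + lr * grad
        # Soft-thresholding: the proximal step for the L1 penalty beta * |lam|.
        lam = np.sign(step) * np.maximum(np.abs(step) - lr * beta, 0.0)
    scores = features @ lam
    p = np.exp(scores - scores.max())
    return p / p.sum(), lam


# Synthetic example: one environmental gradient, species recorded only at the high end.
cells = np.linspace(-1.0, 1.0, 21).reshape(-1, 1)
presences = np.arange(15, 21)
p, lam = fit_maxent(cells, presences)
print(round(float(p @ cells[:, 0]), 3))  # prints 0.75, the mean at the presence cells
```

With a large `beta` (for example `fit_maxent(cells, presences, beta=1.0)`), the weight shrinks to zero and every cell receives the same value. This shows the trade-off that regularization controls between fitting the presence data and staying close to uniform.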

There is some criticism of using Maxent for species distribution modelling. Specifically, Maxent considers only presence data instead of both presence and absence data. As a result, capture (detection) probabilities are not explicitly included in the model. This is nearly anathema in wildlife biology, where predictions based on mark-recapture studies have been the norm for years.
There are at least three practical answers to this criticism:

The first is to be explicit about what the probabilities produced by Maxent represent.

Rather than modelling the probability of occurrence directly, Maxent models how environmental conditions at occurrence locations differ from those at randomly selected (background) locations. Its output is therefore a relative measure for each location.
The difference from true occurrence prediction is subtle, and in many cases probably does not matter.

Second, outside of animal studies, presence-only data (rather than presence/absence or multiple-observer data) are the norm. We know of no published plant data in which multiple observers were used to assess a species' observation probability. Longitudinal studies are common, but they are not used in the way that mark-recapture studies are used with animals.
Finally, because of the advantages outlined above, Maxent is the easiest model to implement for the large number of species that must be modeled in GRSM.

Developing an in-house model that has all of Maxent's advantages and also uses presence/absence data would be extremely costly.
Support for presence/absence data will likely be included in future versions of Maxent. At that point the prediction surfaces can easily be recalculated without the cost of developing an in-house solution.