GSMNP:Notebook/Maxent/Motivation and Background
Motivation and Background
- Modeling species distributions in parks is valuable to the scientific community in many ways, but for stewardship of park resources by the NPS it is critical.
- Species occurrences recorded only as points are of limited use to park managers, who cannot infer what lies between the points.
- Knowing with some probability where species are in large natural areas is essential to taking actions to protect them, including monitoring, stewardship of rare species, reacting to a species that is suddenly found to be at-risk, and modeling future scenarios that place species in jeopardy.
- Currently there are many threats to natural systems and native species at Great Smoky Mountains National Park.
- The biological complexity, interacting stressors, and limited agency resources at the Smokies make it imperative to know where actions will be most effective.
- Maxent is a method for generating predictive distributions given a set of occurrence data and known environmental variables at those locations.
- The predicted distribution is constrained so that the expected value of each environmental variable under it is close to that variable's empirical average at the occurrence locations.
- Among all models that satisfy these constraints, the maximum entropy model is the one that imposes nothing beyond the constraints themselves
- (i.e., it avoids over-fitting by choosing the least constrained model consistent with the environmental variables at presence locations).
- Maxent has been used extensively in physics and economics applications.
- Advantages of Maxent:
- requires only presence data, not presence/absence data,
- can use both continuous and categorical variables,
- has an efficient optimization,
- has a concise probabilistic definition,
- avoids over-fitting through regularization,
- can address sampling bias formally,
- produces continuous output (not just yes/no), and
- is generative rather than discriminative, which makes it better suited to small sample sizes.
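The constraint-matching idea described above can be sketched in a few lines of Python. This is a minimal illustration, not the Maxent software's implementation. It relies on the standard result that the maximum entropy distribution under feature-average constraints has the form p(x) proportional to exp(lam . f(x)), and it fits the weights by plain gradient ascent. It omits regularization and sampling-bias correction. The 10 x 10 landscape, the two variables ("elev", "moist"), the presence cells, and the function names are all made up for the example.

```python
import numpy as np


def gibbs(features, lam):
    """Return p(x) proportional to exp(features[x] @ lam), normalized over all cells."""
    z = features @ lam
    z = z - z.max()  # subtract the max for numerical stability
    w = np.exp(z)
    return w / w.sum()


def fit_maxent(features, presence_idx, steps=2000, lr=0.5):
    """Fit the weights lam by gradient ascent on the mean log-likelihood.

    features: (n_cells, n_features) environmental variables, one row per cell.
    presence_idx: indices of cells with recorded occurrences.
    """
    target = features[presence_idx].mean(axis=0)  # empirical averages at presences
    lam = np.zeros(features.shape[1])
    for _ in range(steps):
        expected = gibbs(features, lam) @ features  # model averages over the landscape
        lam += lr * (target - expected)  # gradient: empirical minus expected
    return lam, gibbs(features, lam), target


# Hypothetical 10 x 10 landscape with two made-up variables scaled to [-1, 1].
axis = np.linspace(-1.0, 1.0, 10)
elev, moist = np.meshgrid(axis, axis)
features = np.column_stack([elev.ravel(), moist.ravel()])

# Hypothetical presence cells, clustered toward high elevation and moisture.
presence_idx = np.array([47, 55, 57, 66, 68, 78, 79, 88])

lam, p, target = fit_maxent(features, presence_idx)
print("empirical averages:", target)
print("model averages:    ", p @ features)
print("weights:           ", lam)
```

After fitting, the model averages match the empirical averages at the presence cells, and p assigns every cell in the landscape a continuous value rather than a yes/no prediction. With real data the features would typically include many more variables and transformations of them, and the constraints would be relaxed through regularization.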
Strengths & Weaknesses
- Maxent has drawn some criticism as a tool for species distribution modeling. Specifically, Maxent considers only presence data rather than both presence and absence data. As a result, capture probabilities are not explicitly included in the model. This is nearly anathema in wildlife biology, where predictions based on mark-recapture studies have been the norm for years.
- There are at least three practical answers to this criticism:
- The first is to be explicit about the prediction probabilities that Maxent produces.
- Rather than modeling the probability of an occurrence, Maxent models the probability that an occurrence at a given location is different from a randomly selected location.
- The difference from true occurrence prediction is subtle, and in many cases probably does not matter.
- Second, outside of animal studies, presence data, not presence/absence data or multiple observer data, is the norm.
- We know of no published data on plants where multiple observers were used to assess the observation probability of a species. Longitudinal studies are common, but they are not used in the same way that mark-recapture studies are used with animals.
- Finally, because of the advantages outlined above, Maxent is the easiest model to implement for the large number of species that must be modeled in GRSM.
- Developing an in-house model that has all the advantages of Maxent and also supports presence/absence data would be extremely costly.
- It is likely that support for presence/absence data will be included in future versions of Maxent, at which point the prediction surfaces can easily be recalculated without the cost of developing an in-house solution.