# GSMNP:Notebook/MaxEnt

## Revision as of 23:08, 6 August 2014

## Project Description

• A project notebook on the use of Maximum Entropy Species Distribution Modelling within Great Smoky Mountains National Park. It is based in part on a document by R. Todd Jobe and Benjamin Zank, "Modelling species distributions for the Great Smoky Mountains National Park using Maxent" (file: Jobe 2008 MaxEnt.pdf).

## Introduction

The goal of this document is to help managers and researchers at Great Smoky Mountains National Park (GRSM) model species distributions using maximum entropy (maxent) methods. It serves as a reference for the maxent software (Phillips and Dudík 2008), a widely used program for modeling species distributions.

In the following sections, we provide help for:

This document is designed to supplement, not replace, the help files contained in the maxent software. It is strongly recommended that users also read Phillips et al. (2006), Phillips and Dudík (2008), and tutorial.doc, which is packaged with the maxent software. Also, this document is structured for users working on a Windows system that has an installation of ArcGIS (ESRI 2006), though maxent can be run equally well on other operating systems (2) and with other GIS software.

A brief "Motivation and Background" section discusses the rationale for using maxent in GRSM.

## Getting the Software

There are many different software packages that can fit models using maximum entropy methods. In this document, however, we focus on the package most commonly used by biologists, Maxent.

• The program is written in Java. This makes it cross-platform, which means that the code runs equally well on Unix, Macintosh and Windows operating systems.
• Most computer systems come with the Java Runtime Environment pre-installed, or it gets downloaded in the course of ordinary Internet use.

Installing Java

• Java is a computer programming language and is not always pre-installed on a personal computer.

To see if Java is installed: Image:Fig1.tiff

• Open a command line interface. This varies depending on your operating system.
• Windows
• Start --> Run --> cmd
• Mac OS
• Go --> Applications --> Utilities --> Terminal
• type: java -version
• If the above command returns an error, then Java is not properly installed. It can be downloaded from http://java.com.
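If you prefer to script this check, the sketch below does the same thing from Python. It is a minimal sketch, assuming Python 3.7 or later; `java_version` is a hypothetical helper name, not part of Java or maxent. Note that `java -version` prints its report to standard error rather than standard output.

```python
import shutil
import subprocess


def java_version():
    """Return the first line reported by `java -version`, or None if Java is missing."""
    if shutil.which("java") is None:
        return None  # no `java` executable found on the PATH
    # `java -version` writes its report to stderr, not stdout.
    result = subprocess.run(["java", "-version"], capture_output=True, text=True)
    report = result.stderr or result.stdout
    lines = report.splitlines()
    return lines[0] if lines else None


if __name__ == "__main__":
    version = java_version()
    print(version if version else "Java not found; download it from http://java.com")
```

A return value of None corresponds to the error case described above.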

Installing / Running Maxent Java Application (Graphical User Interface)

The main files to consider once maxent has been downloaded from the website are:

1. maxent.jar
2. maxent.bat

• maxent.jar is the Java executable. It can be called from the command line using the java command: java -jar maxent.jar, but in Windows, it is simpler to launch the .jar file by clicking the .bat file (discussed immediately below).
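The same `java -jar maxent.jar` launch can be scripted. This is a minimal sketch, assuming maxent.jar sits in the workspace folder; `maxent_command` is a hypothetical helper, and the optional `-Xmx` heap setting is a standard Java option added here for illustration, not a requirement stated in this notebook.

```python
from pathlib import Path


def maxent_command(workspace, heap_mb=None):
    """Build the `java -jar maxent.jar` command for a maxent.jar in workspace.

    heap_mb optionally sets the Java maximum heap size (-Xmx) in megabytes;
    it is an illustrative extra, not something maxent requires.
    """
    cmd = ["java"]
    if heap_mb is not None:
        cmd.append(f"-Xmx{heap_mb}m")
    cmd += ["-jar", str(Path(workspace) / "maxent.jar")]
    return cmd


if __name__ == "__main__":
    print(maxent_command("grsm_analysis", heap_mb=512))
```

Passing the returned list to `subprocess.run` would start the graphical interface, much as double-clicking maxent.bat does.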

Windows

• The maxent.bat file is a Windows batch file.
• Double-clicking it in Windows starts the maxent.jar executable.
• Both the .jar and .bat files are small.
• When performing an analysis, it makes sense to just copy these two files into a single workspace (typically a newly created folder/directory for the given project) to hold both the input data and outputs (3.2).

Help Files

• The maxent software contains a considerable amount of help documentation available from the user interface.
• There is also an excellent tutorial provided at the website where maxent is downloaded.
• It is strongly recommended that users go through the tutorial prior to using maxent on real data.

Unix Image:Fig2.tiff

## Preparing the Data

Maxent requires precise formatting of the species occurrence data and the environmental data. Further, the spatial attributes of all data must be identical. This section is meant to guide users through the preliminary decisions about species and environmental variables that must be made, and then help users convert their data into formats appropriate for analysis in maxent.

### Preliminary Decisions

Some decisions made up front will alter how every other part of the analysis proceeds. Species and environmental layers must be selected that conform to certain geographic requirements, and the spatial attributes of all these layers must be defined.

### Choose Species

Maxent can build models for multiple species at one time. The species to be modelled must have geolocated occurrences. It is advantageous if the precision of these geolocations is also known, because environmental maps can then be adjusted to match it. If any temporally sensitive environmental data are included (e.g. temperature for a particular year, or fire history), then the species observation dates must coincide with dates for which the environmental data are valid.

### Choose Environmental Variables

The predictions of any model will be improved if the selected environmental layers reflect the ecology of the organism. For many species, however, these associations may not be known beforehand. Including every available remotely sensed variable is another option, and maxent provides estimates of the importance of each environmental variable included in the model (5.2). Maxent also provides a tuning parameter that adjusts the degree of over-fitting (4). So the kitchen-sink approach to variable inclusion works better in maxent than in other modeling approaches. At a bare minimum, species respond broadly to gradients of temperature and moisture. Three variables that approximate these gradients in GRSM are elevation, topographic convergence index, and hillshade (Jobe 2006).

### Choose a Projection

You must choose a projection that matches precisely among all data types, including the same datum. Data layers for GRSM are typically projected as Universal Transverse Mercator (UTM) zone 17, with either the NAD27 or the WGS84 datum. WGS84 is preferred, but the choice of datum and projection does not matter as long as the occurrence data and all of the environmental layers use exactly the same ones. Projecting digital elevation models (DEMs) is not recommended if any other environmental layer is derived from them (e.g. slope, hillshade, hydrological models), because the resampling required for projection introduces striations in the derived layers. It is best practice to project all other layers to match the projection of the DEM. Alternatively, derive layers from the DEM in its original projection and then reproject all the grids. In ArcGIS you can use ArcToolbox to project both rasters and features. To project all layers to a common projection, use the batch project option:

• Start ArcGIS
• Load all unprojected grids into the document.
• ArcToolbox$\Rightarrow$Data Management Tools$\Rightarrow$Projections and Transformations$\Rightarrow$(Right-click) Project Raster$\Rightarrow$Batch...
• Highlight each raster in the workspace and drag them to the field Input Raster.
• For the first raster (double-click) Output coordinate system
• Select a coordinate system from the box using an imported grid or browsing for a projection.
• Copy and paste the resulting value into each row of Output coordinate system.
• Repeat for Geographic Transformation if necessary.

At the end you should have a new set of environmental layers, all sharing the same projection.

### Prepare a Workspace

It is simpler to create one folder for a given analysis. Here, we term this the workspace. The files maxent.bat and maxent.jar should be copied into this workspace. Also, two sub-folders should be created in the workspace: grid, which will hold the prepared ArcGrid binary environmental layers, and ascii, which will hold the prepared ESRI ASCII environmental layers.
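The workspace layout described above can also be created with a short script. This is a minimal sketch, assuming Python 3 and a folder holding the downloaded maxent files; it also creates the output folder described under Last Steps. `prepare_workspace` and the folder names in the demo are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path


def prepare_workspace(workspace, maxent_dir=None):
    """Create a workspace with grid/, ascii/ and output/ sub-folders.

    If maxent_dir is given, maxent.jar and maxent.bat are copied from it;
    files missing there are skipped.
    """
    ws = Path(workspace)
    for sub in ("grid", "ascii", "output"):
        (ws / sub).mkdir(parents=True, exist_ok=True)
    if maxent_dir is not None:
        for name in ("maxent.jar", "maxent.bat"):
            src = Path(maxent_dir) / name
            if src.is_file():
                shutil.copy2(src, ws / name)
    return ws


if __name__ == "__main__":
    # Demonstrate in a throwaway temporary directory.
    with tempfile.TemporaryDirectory() as tmp:
        ws = prepare_workspace(Path(tmp) / "grsm_analysis")
        print(sorted(p.name for p in ws.iterdir()))  # ['ascii', 'grid', 'output']
```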

### Prepare the Environmental Layers

The environmental layers set the geographic extent of the analysis window in the maxent software. So it is best to prepare these layers before the species occurrence data, because some of the occurrences may lie outside this window and will have to be removed accordingly (3.4).

Maxent expects environmental data to be in ESRI ASCII grid format (AAIGrid). These grids can contain either continuous or categorical data. If a grid is categorical, each category must be coded as an integer value. Environmental layers must share the same extent, the same grain, and the same mask (i.e. NODATA cells). In short, each layer must be identical except for the values contained in the data cells.
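Whether two ASCII grids share extent and grain can be read off their short headers (ncols, nrows, xllcorner or xllcenter, yllcorner or yllcenter, cellsize, and optionally NODATA_value). The sketch below uses hypothetical helpers, not part of maxent or ArcGIS: it parses a header and compares two of them, converting a cell-centre origin to a corner. The demo coordinates are made up. It cannot check the third requirement, a matching NODATA mask, because that depends on the cell values rather than the header.

```python
# Keys allowed in an ESRI ASCII grid header (compared case-insensitively).
HEADER_KEYS = {"ncols", "nrows", "xllcorner", "yllcorner",
               "xllcenter", "yllcenter", "cellsize", "nodata_value"}


def parse_header(text):
    """Return the header of an ESRI ASCII grid as a dict with lower-case keys."""
    header = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 2 or parts[0].lower() not in HEADER_KEYS:
            break  # first row of cell values reached
        header[parts[0].lower()] = float(parts[1])
    return header


def lower_left(header):
    """Lower-left corner of the grid, converting a cell-centre origin if needed."""
    if "xllcorner" in header:
        return header["xllcorner"], header["yllcorner"]
    half = header["cellsize"] / 2.0
    return header["xllcenter"] - half, header["yllcenter"] - half


def same_geometry(a, b, tol=1e-6):
    """True if two headers describe the same extent and the same grain."""
    if int(a["ncols"]) != int(b["ncols"]) or int(a["nrows"]) != int(b["nrows"]):
        return False
    if abs(a["cellsize"] - b["cellsize"]) > tol:
        return False
    (ax, ay), (bx, by) = lower_left(a), lower_left(b)
    return abs(ax - bx) <= tol and abs(ay - by) <= tol


if __name__ == "__main__":
    g1 = ("ncols 3\nnrows 2\nxllcorner 270000\nyllcorner 3930000\n"
          "cellsize 30\nNODATA_value -9999\n1 2 3\n4 5 6\n")
    g2 = ("ncols 3\nnrows 2\nxllcenter 270015\nyllcenter 3930015\n"
          "cellsize 30\n1 2 3\n4 5 6\n")
    print(same_geometry(parse_header(g1), parse_header(g2)))  # True
```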

The name of each environmental layer should be fewer than 13 characters long. Optionally, categorical layers can be named with a common prefix (e.g. c_). If maxent is ever run from the command line, these layers can then be switched from continuous (the default) to categorical based on their prefix using the command option togglelayertype.
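Both naming conventions can be checked before any grids are exported. A minimal sketch, assuming layer names are given without their file extension and that c_ is the chosen categorical prefix; `check_layer_names` and the demo layer names are hypothetical.

```python
MAX_NAME_LENGTH = 12  # "fewer than 13 characters"


def check_layer_names(names, categorical_prefix="c_"):
    """Return (too_long, categorical) for a list of layer names.

    too_long lists names with 13 or more characters; categorical lists the
    names carrying the prefix, i.e. the layers that togglelayertype would
    switch to categorical if it were given this prefix.
    """
    too_long = [n for n in names if len(n) > MAX_NAME_LENGTH]
    categorical = [n for n in names if n.startswith(categorical_prefix)]
    return too_long, categorical


if __name__ == "__main__":
    layers = ["elev", "tci", "hillshade", "c_landcover", "topoconvergence"]
    print(check_layer_names(layers))  # (['topoconvergence'], ['c_landcover'])
```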

There are many ways to ensure that the environmental layers have matching spatial attributes, but here I present a method that uses the Spatial Analyst toolbar in ArcMap. I assume that the environmental layers are already in the standard ArcInfo binary grid format and that they have the same projection (3.1.3). If some environmental layers are stored as polygon shapefiles, then they must be converted to ArcInfo binary grids from: Spatial Analyst$\Rightarrow$Convert$\Rightarrow$Features to Raster... (details for starting Spatial Analyst are given below). The cell size for the output grid can be chosen beforehand; otherwise, it should be the largest cell size among the environmental layers already stored as grids.
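The "Intersection of Inputs" extent and "Maximum of Inputs" cell size chosen in step 3 below amount to simple arithmetic on the grid headers. This is a sketch under the assumption that each header is a dict with corner coordinates (xllcorner, yllcorner); ArcGIS additionally snaps the output to a cell grid, which is not modelled here. `analysis_window` and the demo layers are hypothetical.

```python
def analysis_window(headers):
    """Return ((left, bottom, right, top), cell_size) for a list of grid headers.

    The extent is the intersection of all grid extents ("Intersection of
    Inputs"); the cell size is the largest input cell size ("Maximum of Inputs").
    Each header is a dict with ncols, nrows, xllcorner, yllcorner and cellsize.
    """
    left = max(h["xllcorner"] for h in headers)
    bottom = max(h["yllcorner"] for h in headers)
    right = min(h["xllcorner"] + h["ncols"] * h["cellsize"] for h in headers)
    top = min(h["yllcorner"] + h["nrows"] * h["cellsize"] for h in headers)
    if left >= right or bottom >= top:
        raise ValueError("the grids do not overlap")
    cell_size = max(h["cellsize"] for h in headers)
    return (left, bottom, right, top), cell_size


if __name__ == "__main__":
    dem = {"ncols": 10, "nrows": 10, "xllcorner": 0, "yllcorner": 0, "cellsize": 30}
    soils = {"ncols": 5, "nrows": 20, "xllcorner": 60, "yllcorner": 30, "cellsize": 90}
    print(analysis_window([dem, soils]))  # ((60, 30, 300, 300), 90)
```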

1. Begin by loading all the environmental grids as layers in ArcMap.
2. Make sure the Spatial Analyst toolbar is available. If not:
• Tools$\Rightarrow$Extensions$\Rightarrow$Spatial Analyst (check to activate)
• View$\Rightarrow$Toolbars$\Rightarrow$Spatial Analyst
3. Set the analysis environment of Spatial Analyst
• Spatial Analyst$\Rightarrow$Options...
• General
• Working Directory: Path to Analysis Workspace\grid
• Mask: <None>
• Extent
• Analysis Extent: Intersection of Inputs
• Cell Size
• Analysis Cell Size: Maximum of Inputs, or a predefined cell size that is greater than or equal to the largest cell size in your grids.
4. Create an analysis mask using the current environment in Spatial Analyst. This mask will align the NODATA cells for each of the output environmental layers.
• Open up Raster Calculator: Spatial Analyst $\Rightarrow$Raster Calculator...
• Create a mask that is the intersection of the defined cells in each grid using the following statement:
mask = [grid1] | [grid2] | [grid3] | ...

where grid1, grid2, ... are the layer names of the environmental grids (Note: The square brackets should be typed). This statement takes advantage of the fact that, when performing operations on a series of grids, even a single NODATA value in a layer will cause the output of that cell to be NODATA. This statement assumes that for a defined cell, at least one grid has a non-zero value. The resulting grid mask is stored in the grid directory of the analysis workspace and has a value of 1 where the mask is defined and NODATA elsewhere.

• Set the newly created mask grid to be the mask for the Spatial Analyst environment at: Spatial Analyst$\Rightarrow$Options... $\Rightarrow$General$\Rightarrow$Analysis Mask: mask
5. Duplicate your environmental layers into grids that have the correct spatial attributes.
• Spatial Analyst$\Rightarrow$Raster Calculator...
• Create new grids with the same name as the original grids in the working directory:
grid1 = [grid1]
grid2 = [grid2]
...
Raster Calculator does not actually replace the original grids. Instead, grids of the same name, with the appropriate spatial attributes, are created in the working directory.
• Remove the old grids from the ArcMap data frame.
6. Convert the newly created grids into ASCII grids
• Activate the ArcToolbox Raster to ASCII tool in Batch mode: ArcToolbox$\Rightarrow$Conversion Tools$\Rightarrow$From Raster$\Rightarrow$(Right Click) Raster to ASCII$\Rightarrow$Batch
• The Raster to ASCII batch window has two fields: Input raster and Output ASCII raster file.
• Drag the grid layers from ArcMap to Input raster.
• Rename the default values of Output ASCII raster file to grids of the same name, but in the ascii folder:

| | Input raster | Output ASCII raster file |
|---|---|---|
| 1 | grid1 | Path to workspace\ascii\grid1 |
| 2 | grid2 | Path to workspace\ascii\grid2 |
| ... | ... | ... |

After following these steps, the ascii folder in the analysis workspace will have all of the grids necessary for analysis in maxent. ArcMap should not be closed at this point, however, because the binary grids will still be needed.
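The logic of steps 4 to 6 can be illustrated on small in-memory grids in plain Python. This is a hypothetical sketch, not what Spatial Analyst runs internally. It assumes the grids already share extent and cell size, which Spatial Analyst handles and which is not reproduced here, and that -9999 marks NODATA. `or_mask` emulates the Raster Calculator statement `mask = [grid1] | [grid2] | ...`, `apply_mask` mimics using the result as the analysis mask, and `to_ascii_grid` writes text in the ESRI ASCII grid layout.

```python
NODATA = -9999  # assumed NODATA value shared by all grids


def or_mask(grids, nodata=NODATA):
    """Emulate `mask = [grid1] | [grid2] | ...` cell by cell.

    Any NODATA input gives NODATA; otherwise the cell is 1 if at least
    one grid is non-zero and 0 if every grid is zero.
    """
    nrows, ncols = len(grids[0]), len(grids[0][0])
    mask = []
    for r in range(nrows):
        row = []
        for c in range(ncols):
            values = [g[r][c] for g in grids]
            if any(v == nodata for v in values):
                row.append(nodata)
            else:
                row.append(1 if any(v != 0 for v in values) else 0)
        mask.append(row)
    return mask


def apply_mask(grid, mask, nodata=NODATA):
    """Copy a grid, setting every cell that is NODATA in the mask to NODATA."""
    return [[nodata if m == nodata else v for v, m in zip(grow, mrow)]
            for grow, mrow in zip(grid, mask)]


def to_ascii_grid(grid, xllcorner, yllcorner, cellsize, nodata=NODATA):
    """Serialize a grid (first row = northernmost) as ESRI ASCII grid text."""
    lines = [
        f"ncols {len(grid[0])}",
        f"nrows {len(grid)}",
        f"xllcorner {xllcorner}",
        f"yllcorner {yllcorner}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    lines += [" ".join(str(v) for v in row) for row in grid]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    elev = [[1, -9999], [0, 5]]
    tci = [[2, 3], [0, 0]]
    mask = or_mask([elev, tci])
    print(mask)  # [[1, -9999], [0, 1]]
    print(to_ascii_grid(apply_mask(tci, mask), 0, 0, 30), end="")
```

In the demo, the lower-left cell is defined in both grids but zero in both, so the OR statement yields 0 rather than 1. This is exactly the case excluded by the assumption that at least one grid has a non-zero value in every defined cell.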

### Prepare the Species Occurrence Data

Here, I assume that all species occurrence data have been projected to match the environmental layers (3.1.3), that the data exist as a point shapefile, and that one field of the shapefile contains the species name.

1. Clip occurrences to the maximum extent. Occurrences cannot have geolocations outside of the environmental layers, so the occurrence data must be clipped to the mask of the environmental layers.
• Convert the mask grid to a polygon:

Spatial Analyst$\Rightarrow$Convert$\Rightarrow$Raster to Features

• Input raster: Path to Workspace\grid\mask
• Field: VALUE
• Output geometry type: Polygon
• Generalize lines: unchecked
• Output features: Path to Workspace\plyMask
• Clip the occurrence data using the new mask
• ArcToolbox$\Rightarrow$Analysis Tools$\Rightarrow$Extract$\Rightarrow$Clip
• Input Features: Path to Occurrence Shapefile
• Clip Features: Path to Workspace\plyMask.shp
• Output Feature Class: Path to Workspace\pntOccurrences.shp
2. Add XY coordinate fields to the attribute table of the occurrence data, if they do not already exist.

ArcToolbox$\Rightarrow$Data Management Tools$\Rightarrow$Add XY Coordinates

• Input Features: Path to Workspace\pntOccurrences.shp
3. Export the species occurrences attribute table as a .dbf file.
• Add pntOccurrences.shp to ArcMap as a layer.
• Right-click pntOccurrences$\Rightarrow$Open Attribute Table
• Options$\Rightarrow$Export
• Export: All Records
• Output table: Path to Workspace\tblOccurrences.dbf
4. Convert the .dbf file to a .csv.
• Open tblOccurrences.dbf in Microsoft Excel.
• Delete all fields except for the Species Name, X, and Y.
• Ensure the fields are ordered: Species Name,X,Y.
• Save the file as a .csv: pntOccurrences.csv

The end result of preparing the species occurrence data should be a comma-separated values (csv) file, pntOccurrences.csv, with a header row followed by one row per occurrence in three fields: species, x, and y. This is the file that will be input to maxent.
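The Excel step can also be scripted once the attribute records are available as dictionaries, for example after reading tblOccurrences.dbf with a DBF reader of your choice (not shown). This is a minimal sketch: POINT_X and POINT_Y are the field names the Add XY Coordinates tool creates, while SPECIES is a hypothetical name for your species field. It writes a header line, as in the sample files distributed with the maxent tutorial.

```python
import csv
import io


def write_occurrences(records, out, species_field="SPECIES",
                      x_field="POINT_X", y_field="POINT_Y"):
    """Write a header plus species, x, y rows (in that order) to a file object."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["species", "x", "y"])  # header line
    for rec in records:
        # Every other attribute field (FID, etc.) is dropped.
        writer.writerow([rec[species_field], rec[x_field], rec[y_field]])


if __name__ == "__main__":
    demo = [{"FID": 0, "SPECIES": "species_a", "POINT_X": 273450.0, "POINT_Y": 3942100.0}]
    buf = io.StringIO()
    write_occurrences(demo, buf)
    print(buf.getvalue(), end="")
```

For a real file, pass `open("pntOccurrences.csv", "w", newline="")` instead of the in-memory buffer.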

### Last Steps

An output folder must be created in the workspace to hold the results from the Maxent model (a convenient folder name is output).

Optionally, you may also generate a samples-with-data (SWD) file for the species and the environment. Details of this format are given in the maxent tutorial, but in short, adding the environmental values at the sample points to the species occurrence file saves model run time. Maxent fits the relationship between occurrences and environment using a random sample of 10,000 background points; generating these points and their environmental values yourself in ArcGIS lets maxent skip this step during the model run. The procedure for generating SWD files for the observations and the environmental data is as follows:
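If you generate the background points outside ArcGIS, a simple approach is to pick distinct cell centres where the mask is defined. This is a simplified, hypothetical stand-in for Create Random Points, which places points anywhere inside the constraining polygon rather than at cell centres. It assumes the mask is an in-memory list of rows, northernmost row first (the ESRI ASCII order), with -9999 as NODATA.

```python
import random


def random_background(mask, header, n=10000, nodata=-9999, seed=None):
    """Pick up to n distinct cell centres where the mask is defined.

    mask is a list of rows, northernmost row first;
    header is a dict with xllcorner, yllcorner and cellsize.
    """
    rng = random.Random(seed)
    nrows = len(mask)
    defined = [(r, c) for r, row in enumerate(mask)
               for c, value in enumerate(row) if value != nodata]
    chosen = rng.sample(defined, min(n, len(defined)))  # without replacement
    size = header["cellsize"]
    points = []
    for r, c in chosen:
        x = header["xllcorner"] + (c + 0.5) * size
        # Row 0 is the top row, so count rows up from the bottom edge.
        y = header["yllcorner"] + (nrows - r - 0.5) * size
        points.append((x, y))
    return points


if __name__ == "__main__":
    mask = [[1, -9999], [1, 1]]
    hdr = {"xllcorner": 0.0, "yllcorner": 0.0, "cellsize": 10.0}
    print(sorted(random_background(mask, hdr, n=10, seed=1)))
    # [(5.0, 5.0), (5.0, 15.0), (15.0, 5.0)]
```

If the mask has fewer defined cells than requested, every defined cell is returned once.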

2. Run the tool Intersect Point Data
• Point file to intersect:Your species vector layer
• Raster: Select all environmental layers
3. Export the species vector layer as a *.dbf and then as a *.csv file as described in 3.4.
4. Generate a point shapefile containing 10,000 random points within the mask layer from ArcToolbox: Data Management Tools$\Rightarrow$Feature Class$\Rightarrow$Create Random Points
• Output Location : path to workspace
• Output Point Feature Class : environ
• Constraining Feature Class : mask
• Number of Points : Long, 10000

5. Add XY coordinates to the environ layer as in 3.4.
6. Extract the environmental data to the environ layer using the Intersect Point Data tool as above.
7. Export the environ shapefile as a *.dbf and then as a *.csv file as described in 3.4.

The end result of these steps will be two files, species.csv and environ.csv. These can be loaded as the species and environmental files, respectively, in the Maxent GUI or specified at the command line (4). Maxent will still need to use the contents of the ASCII folder for generating prediction layers if that option is selected.
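The Intersect Point Data step amounts to looking up, for each point, the cell it falls in on every environmental grid. This is a minimal sketch, assuming in-memory grids (top row first) that all share one header, as produced under Prepare the Environmental Layers. `cell_value` and `swd_rows` are hypothetical helpers. Rows follow the SWD layout of species, x, y, then one column per variable. The label used for background rows (here "background") is an arbitrary choice, and this sketch drops points that fall on NODATA in any layer.

```python
import csv
import sys


def cell_value(grid, header, x, y, nodata=-9999):
    """Value of the cell containing (x, y); NODATA if the point is off the grid."""
    size = header["cellsize"]
    nrows, ncols = len(grid), len(grid[0])
    col = int((x - header["xllcorner"]) // size)
    row = nrows - 1 - int((y - header["yllcorner"]) // size)  # row 0 is the top
    if 0 <= row < nrows and 0 <= col < ncols:
        return grid[row][col]
    return nodata


def swd_rows(label, points, layers, header, nodata=-9999):
    """Yield SWD rows: a header, then label, x, y and one value per layer.

    layers maps layer name -> grid; all grids share the same header.
    """
    names = sorted(layers)
    yield ["species", "x", "y"] + names
    for x, y in points:
        values = [cell_value(layers[name], header, x, y, nodata) for name in names]
        if nodata in values:
            continue  # skip points on NODATA cells
        yield [label, x, y] + values


if __name__ == "__main__":
    layers = {"elev": [[100, 200], [300, 400]], "tci": [[1, 2], [3, -9999]]}
    hdr = {"xllcorner": 0.0, "yllcorner": 0.0, "cellsize": 10.0}
    points = [(5.0, 15.0), (15.0, 5.0)]
    csv.writer(sys.stdout).writerows(swd_rows("background", points, layers, hdr))
```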

## Running the Model

• Information concerning maximum entropy species distribution modeling in Great Smoky Mountains National Park.
• Collection of works concerning Maximum Entropy models:
• Phillips, S.J., Anderson, R.P., Schapire, R.E. (2006) Maximum entropy modeling of species geographic distributions. Ecological Modelling, 190, 231-259.
• Phillips, S.J., Dudík, M., Schapire, R.E. (2004) A maximum entropy approach to species distribution modeling. Proceedings of the Twenty-First International Conference on Machine Learning, 655-662.
• Phillips, S.J., Dudík, M. (2008) Modelling of species distributions with Maxent: new extensions and a comprehensive evaluation. Ecography, 31, 161-175.
• Phillips, S.J., Dudík, M., Elith, J., Graham, C.H., Lehmann, A., Leathwick, J., Ferrier, S. (2009) Sample selection bias and presence-only distribution models: implications for background and pseudo-absence data. Ecological Applications, 19, 181-197.
• http://www.nics.tennessee.edu/faq

## Notes

• This project is currently under development as part of Spring 2014 practicum work for Tanner Jessel.
• There is some related information at http://mountainsol.wordpress.com
