Species Distribution Models (SDMs), also known as Ecological Niche Models (ENMs), predict the presence and absence of species by interpolating identified relationships between collection data, stored in natural history museums and herbaria, and environmental data. Chapter 1 demonstrated how to prepare the collection data, and Chapter 2 how to create the environmental data. In this chapter we identify the relationships between species presence records and environmental data with a distribution modelling application, and subsequently interpolate these relationships to the research area of interest, here Borneo.
We start by downloading the MaxEnt application from http://www.cs.princeton.edu/~schapire/maxent/. You can also download the MaxEnt tutorial from this website. Besides MaxEnt there are many other algorithms and applications, such as GARP (http://nhm.ku.edu/desktopgarp/), BioMapper (http://www2.unil.ch/biomapper/), and Generalized Dissimilarity Modelling (http://www.biomaps.net.au/gdm/). Most modelling algorithms are also available in R (R Development Core Team 2014). MaxEnt is a Java application, so Java needs to be installed on your computer (http://www.java.com). You open MaxEnt by clicking the Maxent.bat file (Fig. 15).
MaxEnt uses the maximum entropy algorithm, which is defined as follows:
MaxEnt, or the maximum entropy method for species’ distribution modelling, estimates the most uniform distribution (‘maximum entropy’) across the study area, given the constraint that the expected value of each environmental predictor variable under this estimated distribution matches its empirical average (average values for the set of species’ presence records) (Phillips et al. 2006).
Figure 15. Maxent 3.2.19 interface.
Click the button ‘Settings’ and check the option ‘Remove duplicate presence records’, set the ‘Random test percentage’ to zero, and set the ‘Max number of background points’ to 10,000 (Fig. 16). Make sure there are more background points than presence records.
Figure 16. The ‘Maximum Entropy Parameter’ setting dialog from Maxent.
The most widely applied method to validate SDMs is the Area Under the Curve (AUC) of the Receiver Operating Characteristic (ROC) curve (Fielding and Bell 1997, McPherson et al. 2004, Raes and ter Steege 2007). The advantage of the AUC value over other measures of model accuracy (e.g. Cohen’s kappa, sensitivity, specificity) is that it is a) threshold independent, and b) prevalence insensitive. Threshold independence means that the continuous MaxEnt values, running from 0 to 1, do not have to be converted to discrete presence/absence values. There are several techniques to set thresholds (Liu et al. 2005), but none is required to calculate the AUC value.
Prevalence is the proportion of the data representing species’ presence, or presences / (presences + absences). The fact that the AUC value is relatively insensitive to prevalence is of special relevance because when absences are lacking, which is often the case, they are replaced by pseudo-absences, or background points. A sufficiently large sample of pseudo-absences is needed to provide a reasonable representation of the environmental variation exhibited by the geographical area of interest, typically 1,000-10,000 points. These large numbers of pseudo-absences automatically result in low prevalence values. The number of records by which a species is represented in herbaria and natural history museums ranges from 1 to 150-200 records. Even when a species is represented by 200 unique presence-only records and 1000 pseudo-absences are used, prevalence is only 16.7% (200/1200).
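The prevalence arithmetic above can be reproduced with a few lines of Python. This is only an illustrative sketch; the function name `prevalence` is made up for this example and is not part of MaxEnt.

```python
def prevalence(n_presences: int, n_absences: int) -> float:
    """Proportion of all records that are presences."""
    total = n_presences + n_absences
    if total == 0:
        raise ValueError("at least one record is needed")
    return n_presences / total


# The example from the text: 200 presence records and 1,000 pseudo-absences.
print(f"{prevalence(200, 1000):.1%}")    # 16.7%

# With 10,000 background points the prevalence drops even further.
print(f"{prevalence(200, 10000):.1%}")   # 2.0%
```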
AUC values range from 0 to 1, with a value of 0.5 indicating model accuracy not better than random, and a value of 1.0 indicating perfect model fit (Fielding and Bell 1997). An AUC value can be interpreted as the probability that, when a presence site (a site where a species is recorded as present) and an absence site (a site where a species is recorded as absent) are drawn at random from the population, the presence site has a higher predicted value than the absence site (Phillips et al. 2006). SDMs with an AUC value above 0.7 are generally considered reliable, and those with a value above 0.8 good.
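The probabilistic interpretation of the AUC can be made concrete with a small Python sketch that compares every presence site with every absence site. The function name `auc` and the predicted values are hypothetical and chosen for illustration only; counting ties as half a win is the usual convention, not something specific to MaxEnt.

```python
def auc(presence_scores, absence_scores):
    """Probability that a randomly drawn presence site has a higher
    predicted value than a randomly drawn absence site (ties count half)."""
    wins = 0.0
    for p in presence_scores:
        for a in absence_scores:
            if p > a:
                wins += 1.0
            elif p == a:
                wins += 0.5
    return wins / (len(presence_scores) * len(absence_scores))


# Hypothetical predicted values at four presence and four absence sites.
presence = [0.9, 0.8, 0.6, 0.4]
absence = [0.7, 0.3, 0.2, 0.1]
print(auc(presence, absence))  # 0.875: 14 of the 16 pairs are ranked correctly
```

A model that gives every site the same value scores 0.5 with this function, which corresponds to the random expectation mentioned above.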
A major drawback of using pseudo-absences instead of true absences, however, is that the maximum achievable AUC value, indicating perfect model fit, is no longer 1 but 1-a/2 (where a is the fraction of the geographical area of interest covered by a species’ true distribution, which typically is not known). Nevertheless, random prediction still corresponds to an AUC value of 0.5. As a result, standard thresholds of AUC values indicating SDM accuracy (e.g. the threshold of AUC>0.7 that is often used) do NOT apply (Raes and ter Steege 2007). Be very cautious, therefore, when SDMs based on presence-only data are validated with AUC values. This problem can be solved by testing against a null model. The procedure is described in detail in Raes and ter Steege (2007), but it is beyond the scope of this practical.
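The 1-a/2 ceiling can be checked numerically. The Python sketch below assumes a hypothetical perfect model that predicts 1 inside the true distribution and 0 outside it, and a hypothetical value a = 0.25; neither comes from the Macaranga example. Because pseudo-absences are drawn from the whole area, a fraction a of them falls inside the true distribution and ties with the presence sites.

```python
def auc(presence_scores, background_scores):
    """Pairwise AUC: share of presence/background pairs in which the
    presence site scores higher (ties count half)."""
    wins = 0.0
    for p in presence_scores:
        for b in background_scores:
            if p > b:
                wins += 1.0
            elif p == b:
                wins += 0.5
    return wins / (len(presence_scores) * len(background_scores))


def max_auc_with_background(a):
    """Maximum achievable AUC when absences are replaced by background points."""
    return 1 - a / 2


a = 0.25                 # assumed fraction of the area truly occupied
n_background = 1000      # pseudo-absences drawn from the whole area
n_inside = int(a * n_background)

# A perfect model: 1.0 inside the true distribution, 0.0 outside.
presence_scores = [1.0] * 50
background_scores = [1.0] * n_inside + [0.0] * (n_background - n_inside)

print(auc(presence_scores, background_scores))  # 0.875
print(max_auc_with_background(a))               # 0.875
```

Even this perfect model cannot exceed 0.875, which illustrates why a fixed cut-off such as AUC>0.7 has no fixed meaning for presence-only data.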
Open the Macaranga_auriculata.html file in the folder ‘Maxent Results’. This file contains a table with a list of thresholds.
Among the most widely used ones are: ‘10 percentile training presence’, ‘Equal training sensitivity and specificity’, and ‘Maximum training sensitivity plus specificity’. Although you have made every effort to georeference your specimens accurately, it is wise to be cautious about the identifications and georeferencing performed by other people. I therefore suggest being conservative and using the ‘10 percentile training presence’ threshold, which allows 10% of your specimens to fall outside the predicted area. The corresponding ‘Logistic threshold’ is 0.296 and results in a ‘Fractional predicted area’ of 0.281 (Fig. 18).
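To illustrate what such a threshold does, the Python sketch below converts a small grid of continuous values into presence/absence and computes the fraction of the area predicted as present. The grid values, the presence scores and the NODATA marker of -9999 are assumptions made up for this example; only the threshold of 0.296 is taken from the text. The percentile function is a simplified reading of the ‘10 percentile training presence’ rule and is not necessarily how Maxent computes it internally.

```python
NODATA = -9999.0  # assumed NODATA marker for cells outside the study area


def ten_percentile_threshold(presence_scores):
    """Cut-off that leaves (at most) 10% of the presence records below it."""
    ranked = sorted(presence_scores)
    n_omitted = len(ranked) // 10   # number of records allowed to fall outside
    return ranked[n_omitted]


def apply_threshold(grid, threshold):
    """Convert continuous values to 1 (present) or 0 (absent); keep NODATA."""
    return [[NODATA if v == NODATA else (1 if v >= threshold else 0)
             for v in row] for row in grid]


def fractional_predicted_area(binary_grid):
    """Share of the valid cells that is predicted as present."""
    cells = [v for row in binary_grid for v in row if v != NODATA]
    return sum(cells) / len(cells)


# Hypothetical predicted values at ten training presence records.
scores = [0.15, 0.30, 0.35, 0.42, 0.50, 0.55, 0.61, 0.70, 0.82, 0.90]
print(ten_percentile_threshold(scores))  # 0.3: one record (10%) falls below it

# Hypothetical 3 x 4 grid of continuous values, thresholded at 0.296.
grid = [[0.05, 0.20, 0.31, NODATA],
        [0.10, 0.296, 0.45, 0.80],
        [NODATA, 0.12, 0.25, 0.60]]
binary = apply_threshold(grid, 0.296)
print(fractional_predicted_area(binary))  # 0.5
```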
Figure 17. The ROC curve with the reported AUC value of 0.900 for M. auriculata.
Figure 18. Commonly used thresholds.
PointLocalities.div
Macaranga_auriculata.asc
Figure 19. Maxent model of M. auriculata.
Figure 20. Thresholded Maxent model of M. auriculata.
With the instructions described above you can now develop your own species distribution model.
Choose your species and go to the GBIF portal gbif.org.
Go to the Data option and select ‘Explore species’. You can select the higher taxon of interest to make a pre-selection below the search bar (e.g. Mammals, Flowering Plants).
In the example you see the data for Viscum album.
Your report should have: