The following R markdown file performs the steps needed to calculate niche overlap between the extant Ungulates. Here we implement two different approaches to calculate the niche per species:
The following libraries and scripts need to be loaded:
library(raster, quietly = T)
library(knitr, quietly = T)
library(maxent, quietly = T)
library(rJava, quietly = T)
library(maptools, quietly = T)
library(dismo, quietly = T)
library(sp, quietly = T)
library(ape, quietly = T)
library(rgdal, quietly=T)
library(dplyr, quietly=T)
# our local code. the goal is that this will eventually be properly portable
# code that lives inside the ./script folder of the repository
source("../script/MaxEnt_function.R")
In addition, here we define the global variables that we will reuse throughout the code:
# this is the location of the root directory of the git repository in the local
# file system. normally you would be running the code from within rstudio, using
# a project that is initialized within that directory, so the working directory
# should automatically be set.
REPO_HOME <- paste(getwd(), "/../", sep = "")
The environmental data used in this research are based on climatic variables, topography, and soil characteristics. Climatic information about the present was extracted from the widely used Bioclim dataset, which includes 19 bioclimatic layers. The datasets contain information such as ‘precipitation in the driest quarter’ or ‘maximum temperatures of the coldest month’ and are constructed based on monthly remote sensing data between 1950 and 2000 (Hijmans et al., 2005, Title et al., 2018). The dataset can directly be downloaded with the getData() function from the raster
package. It is possible to adjust the spatial resolution res
to 30 arc seconds, 2.5, 5, and 10 arc minutes.
# This function from the raster package loads bioclimatic layers. It first
# attempts to do this from files in data/GIS/wc5, but if these are not found it
# will attempt to download them and cache them locally. Since the files are
# small enough to commit to github we will normally have these files already.
# This could change if we move to a resolution that is finer than 5 arcminutes,
# as the layer files will then grow above 15MB.
gis.layers <- raster::getData(
"worldclim",
var = "bio",
res = 5,
path = paste(REPO_HOME, "/data/GIS", sep = ""),
download = T
)
The other environmental datasets that we used are the new ENVIREM variables that give additional climatic information to the Bioclim datasets. We used median elevation variables from the Harmonized World Soil Database (HWSD), which are based on NASA’s Shuttle Radar Topographic Mission to calculate worldwide slope and aspect. We used indirect height measures such as slope and aspect because the height variables are directly correlated with the temperature Bioclim datasets. To capture soil characteristics, we used organic carbon, pH CaCL, bulk density and clay percentage datasets obtained from the land-atmosphere interaction research group at Sun Yat-sen University.
# The location of the TIFF file with the stacked layers described directly above
files.names <- list.files(paste(REPO_HOME, "/data/GIS/5_deg", sep = ""))
# Turn the file names into layer names: strip the prefix (which might include
# the resolution) and strip the file extension
gis.layers.names <- files.names
gis.layers.names <- gsub('current_5arcmin_','',gis.layers.names)
gis.layers.names <- gsub('.tif','',gis.layers.names)
# Combine the layer names with those we've already read from BIOCLIM
gis.layers.names <- c(names(gis.layers),gis.layers.names)
# Iterate over files
for (i in 1:length(files.names)) {
# Stack with previously read layers
gis.layers <- stack(
gis.layers,
# Read as raster
raster(
# Construct file name
paste(REPO_HOME, "/data/GIS/5_deg/", files.names[i], sep = "")
)
)
}
# Apply all names
names(gis.layers) <- gis.layers.names
rm(gis.layers.names, files.names, i)
plot(gis.layers)
In a project that includes many taxa, such as this one, it is helpful if we can work incrementally by having separate input and output files per taxon. To set this up we first read in the list of taxa from /data/filtered/taxa.txt
and create corresponding output directories:
# read listing
taxa <- scan(paste(REPO_HOME, "/data/filtered/taxa.txt", sep = ""), sep = "\n", what = character())
# make output directories, if needed
for (i in 1:length(taxa)) {
taxon.dir.name <- sprintf("%s/results/per_species/%s/", REPO_HOME, taxa[i])
if (!dir.exists(taxon.dir.name)) {
dir.create(taxon.dir.name, recursive = T)
}
}
rm(i, taxon.dir.name)
When calculating the species distribution models a dataframe containing all the occurence points is needed, which is created and filled below:
# create an empty dataframe
occurrences.df = data.frame(
matrix(
vector(), 0, 4,
dimnames=list(
c(),
c("gbif_id", "taxon_name", "decimal_latitude", "decimal_longitude")
)
),
stringsAsFactors=F
)
# loop to add the occurence data to the empty dataframe
for ( i in 1:length(taxa) ) {
csv.file <- sprintf("%s/data/filtered/%s.csv", REPO_HOME, taxa[i])
csv <- read.csv(csv.file, header = T)
occurrences.df <- rbind(occurrences.df, csv)
}
# finalize data frame: parse longitudes and latitudes
occurrences.df$decimal_longitude <- as.numeric(as.character(occurrences.df$decimal_longitude))
occurrences.df$decimal_latitude <- as.numeric(as.character(occurrences.df$decimal_latitude))
rm(csv.file,csv,i)
Below the raw occurrences are plotted on a map for each species, which goes to /results/per_species/<taxon>/occurrences.png
:
data(wrld_simpl)
# example of the occurrence point plots
taxon <- taxa[68]
taxon.name <- gsub( "_", " ", taxon)
# select the longitude and latitude of the focal taxon from the data frame
taxon.occurrences <- dplyr::filter(occurrences.df, taxon_name == taxon.name)
lon <- dplyr::select(taxon.occurrences, decimal_longitude)$decimal_longitude
lat <- dplyr::select(taxon.occurrences, decimal_latitude)$decimal_latitude
# plot a simple map ±5 degrees long and lat around the occurrences
plot(
wrld_simpl,
axes = TRUE,
xlim = c(min(lon) - 5, max(lon) + 5),
ylim = c(min(lat) - 5, max(lat) + 5),
col = "gray93",
main = sprintf("Filtered occurrences for %s", taxon.name)
)
points(lon, lat, col = "red", pch = 20, cex = 0.75)
box()
# loop through all the occurrence files and save them in their repositories
for (i in 1:length(taxa)) {
# construct the name of the occurrences file for the current taxon
taxon <- taxa[i]
occ.map.file <- sprintf("%s/results/per_species/%s/occurrences.png", REPO_HOME, taxon)
# generate the map
if (!file.exists(occ.map.file)) {
taxon.name <- gsub( "_", " ", taxon)
# select the longitude and latitude of the focal taxon from the data frame
taxon.occurrences <- dplyr::filter(occurrences.df, taxon_name == taxon.name)
lon <- dplyr::select(taxon.occurrences, decimal_longitude)$decimal_longitude
lat <- dplyr::select(taxon.occurrences, decimal_latitude)$decimal_latitude
# plot a simple map ±5 degrees long and lat around the occurrences
png(file = occ.map.file)
plot(
wrld_simpl,
axes = TRUE,
xlim = c(min(lon) - 5, max(lon) + 5),
ylim = c(min(lat) - 5, max(lat) + 5),
col = "gray93",
main = sprintf("Filtered occurrences for %s", taxon.name)
)
points(lon, lat, col = "red", pch = 20, cex = 0.75)
box()
dev.off()
# clean up
rm(taxon.name,taxon.occurrences,lat,lon)
}
rm(occ.map.file, taxon)
}
rm(i, wrld_simpl)
For this study we use the dismo::maxent
algorithm, which we invoke from the Maxent_function
as implemented in this script.
In the following section we run the Maxent analysis for every taxon. To assess the validity of the model we follow the method of Raes and Ter Steege. This approach seeks to solve the problem that we do not have true absences (because absence of evidence is no evidence of absence). We do this by considering the area within the buffer around the true occurrences of the focal species, and then sample from among the localities within that buffer where occurrences for other species have been recorded, but not for the focal species. This way we avoid sampling pseudo-absences in locations that are, for example, not reachable, i.e. observer biases.
By doing the resampling a sufficient number of times we can establish a null distribution and assess whether the true AUC fits better than the AUCs generated given pseudo-absences. If the true AUC fits better, it is written to a file valid_maxent_model.rda
, otherwise to a file invalid_maxent_model.rda
.
# containers for models with low/high accuracy
list_species_model_low_accuracy <- list()
list_species_model_high_accuracy <- list()
# iterate over n species
for (i in 1: length(taxa) )
{
name <- taxa[i]
valid.model.file <- sprintf(
"%s/results/per_species/%s/valid_maxent_model.rda", REPO_HOME, name)
invalid.model.file <- sprintf(
"%s/results/per_species/%s/invalid_maxent_model.rda", REPO_HOME, name)
# check if we have the model cached
if (!file.exists(valid.model.file) && !file.exists(invalid.model.file)) {
taxon.name <- gsub('_', ' ', name)
message(taxon.name)
taxon.occurrences <- dplyr::filter(occurrences.df, taxon_name == taxon.name)
# Run maxent:
# - buffer environment to ± 9 degrees lat/long relative to the occurrences
# - remove colinear layers
# - run dismo::maxent()
species_model <- Maxent_function(taxon.occurrences, gis.layers)
# select species model from the output list
spmod <- species_model[[1]]
# select trainingAUC
trainingAUC <- spmod@results[[5,1]]
## null model
# - sample occurence points within the buffered environment
# - run the maxent model 100 times using the selected layers
visited_areas <- dplyr::filter(occurrences.df, taxon_name != taxon.name)
modelenvironment <- species_model[[2]] # selected layers
polygonextent <- species_model[[4]] # buffered environment
AUCvalues <- nullModel_without_spatial(
visited_areas,
modelenvironment,
polygonextent,
nrow(taxon.occurrences),
rep = 100
)
auc <- sort(unlist(AUCvalues), decreasing = FALSE)
# select 95th number in the list
confidence.interval<- auc[95]
# populate two stacks, one with the models with
# low accuracy and one with high accuracy
if (trainingAUC > confidence.interval) {
model <- species_model[[1]]
list_species_model_high_accuracy[[name]] <- model
save(model, file = valid.model.file)
}
else {
model <- species_model[[1]]
list_species_model_low_accuracy[[name]] <- model
save(model, file = invalid.model.file)
}
# valid model was previously constructed, load it
} else if (file.exists(valid.model.file)) {
env <- new.env()
nm <- load(valid.model.file, env)[[1]]
list_species_model_high_accuracy[[name]] <- env[[nm]]
# invalid model was previously constructed, load it
} else if (file.exists(invalid.model.file)) {
env <- new.env()
nm <- load(invalid.model.file, env)[[1]]
list_species_model_low_accuracy[[name]] <- env[[nm]]
}
}
rm(env,i,name,nm,invalid.model.file,valid.model.file)
Having constructed the models, we now record their scores in a spreadsheet:
# combine two lists of valid and invalid models
AUC.csv <- paste(REPO_HOME, "/results/maxent/model_summaries/AUCvalues.csv", sep = "")
if (!file.exists(AUC.csv)) {
output_AUC_valid <- data.frame(
matrix(
ncol = 3,
nrow = length(list_species_model_high_accuracy)
)
)
colnames(output_AUC_valid) <- c("taxon", "trainingAUC", "validation")
# iterate over accurate models and add to data frame
for (i in 1:length(list_species_model_high_accuracy)) {
open_species_model <- list_species_model_high_accuracy[[i]]
name <- names(list_species_model_high_accuracy[i])
name_underscore <- gsub("_", " ", name)
trainingAUC <- open_species_model@results[[5, 1]]
output_AUC_valid[i, ] <- c(name_underscore, trainingAUC, "valid")
}
# initialize data frame
output_AUC_invalid <- data.frame(
matrix(
ncol = 3,
nrow = length(list_species_model_low_accuracy)
)
)
colnames(output_AUC_invalid) <- c("taxon", "trainingAUC", "validation")
# iterate over inaccurate models and add to data frame
for (i in 1:length(list_species_model_low_accuracy)) {
open_species_model <- list_species_model_low_accuracy[[i]]
name <- names(list_species_model_low_accuracy[i])
name_underscore <- gsub("_", " ", name)
trainingAUC <- open_species_model@results[[5, 1]]
output_AUC_invalid[i, ] <- c(name_underscore, trainingAUC, "invalid")
}
# concatenate data frames and write to CSV
combined_auc <- rbind(output_AUC_invalid, output_AUC_valid)
write.csv(combined_auc, file = AUC.csv)
}
For the accurate models we then write the respective importance of the selected variables to a figure, and the same for the response curves to these variables:
# plot variable importance
maxent.model <- list_species_model_high_accuracy[[10]]
taxon.name <- names(list_species_model_high_accuracy[10])
plot.title <- sprintf("Variable importance for %s", taxon.name)
plot(maxent.model, main = plot.title)
# plot response curves
response(maxent.model)
# iterate over accurate models
for (i in 1:length(list_species_model_high_accuracy)) {
maxent.model <- list_species_model_high_accuracy[[i]]
taxon.name <- names(list_species_model_high_accuracy[i])
plot.title <- sprintf("Variable importance for %s", taxon.name)
# construct file name for variable importance
variable.map.file <- sprintf(
"%s/results/per_species/%s/valid_maxent_variable_importance.png",
REPO_HOME,
taxon.name
)
if (!file.exists(variable.map.file)) {
png(file = variable.map.file)
plot(maxent.model, main = plot.title)
dev.off()
}
# construct file for variable responses
response.map.file <- sprintf(
"%s/results/per_species/%s/valid_maxent_response_curve.png",
REPO_HOME,
taxon.name
)
if (!file.exists(response.map.file)) {
png(file = response.map.file, res = 80, units = "px")
response(maxent.model)
dev.off()
}
}
The outcome of the species distribution models can be used to create a prediction of the areas that are suitable for the species. We can do this world- wide, which allows us to calculate (potential) niche overlap even for species that natively live on different continents. However, the projections that we thus generate disregard barriers to dispersal and are therefore quite unnatural.
For more plausible projections we restrict the extent in which we project to the biomes in which the occurrences were recorded to begin with.
# combine all prediction rasters, either by doing the projection or by reading from a file
prediction_rasters <- stack()
crs.string <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
# the stacking of all the suitability rasters uses a lot of memory. You can change 1: lenth(taxa) to for example the first 20 species 1:20. When calculating the Schoener's distance in {calc_dist} all the suitability rasters need to be stacked (use 1:length(taxa)) and therefore it is recommended to use a computer with sufficient memory.
for ( i in 1:20
#length(taxa)
) {
taxon.name <- gsub( '_', ' ', taxa[i] )
# file name: load if exists, create otherwise
valid.model.prediction <- sprintf(
"%s/results/per_species/%s/valid_maxent_prediction.rda",
REPO_HOME,
taxa[i]
)
if ( file.exists(valid.model.prediction) ) {
# load file
message(sprintf("loading prediction for taxon %s from file %s", taxon.name, valid.model.prediction))
env <- new.env()
nm <- load(valid.model.prediction, envir = env)[[1]]
prediction_rasters <- stack(prediction_rasters, env[[nm]])
}
else {
message(sprintf("computing prediction for taxon %s, will write to file %s", taxon.name, valid.model.prediction))
maxent.model <- list_species_model_high_accuracy[[i]]
# select non correlated layers from model environment in the world environment
noncorrelated <- names(maxent.model@presence)
subsetworldEnv <- subset(gis.layers, noncorrelated)
# make projection of the whole earth based on the maxent SDM
species.pred <- dismo::predict(maxent.model, subsetworldEnv)
# create occurrence points for focal taxon
points <- dplyr::filter( occurrences.df, taxon_name == taxon.name )
points <- dplyr::select( points, decimal_longitude, decimal_latitude )
coordinates(points) <- ~ decimal_longitude + decimal_latitude
proj4string(points) <- CRS(crs.string)
# select realms in which occurrence points are recorded
#polys.sub <- realms[!is.na(sp::over(realms, sp::geometry(points))), ]
#polys.sub <- gUnaryUnion(polys.sub)
# set all the values outside the polygon to 0
#maskedprediction <- mask(
#species.pred,
#polys.sub,
#updatevalue=0,
#updateNA=TRUE
# )
# remove the 10% lowest prediction values under the occurence points
projection.values <- extract(species.pred, points, method="simple")
sort.projection.values <- sort(unlist(projection.values), decreasing = FALSE)
percentage.number <- round((0.1* dplyr::count(as.data.frame(sort.projection.values))), digits = 0)
threshold.value <- sort.projection.values[percentage.number[1,1]]
species.pred[species.pred < threshold.value] <- 0
# save file, stack projection contents
save(species.pred, file = valid.model.prediction)
prediction_rasters <- stack(prediction_rasters, species.pred)
}
}
# if you only stack the first 20 used taxa[1:20]
names(prediction_rasters) <- taxa[1:20]
taxa.prediction.rasters<-names(prediction_rasters)
crs.string <- "+proj=longlat +ellps=WGS84 +towgs84=0,0,0,0,0,0,0 +no_defs"
realms <- readOGR(paste(REPO_HOME, "/data/GIS/Realms/newRealms.shp", sep = ""))
## OGR data source with driver: ESRI Shapefile
## Source: "C:\Users\elkeh\Documents\Stage_Naturalis\Scripts_R\trait-geo-diverse-ungulates\Data\GIS\Realms\newRealms.shp", layer: "newRealms"
## with 11 features
## It has 5 fields
realms <- spTransform(realms, CRS(crs.string))
plot(realms)
for ( i in 1: length(taxa.prediction.rasters) ) {
# construct restricted projection file name
prediction.restricted.file <- sprintf(
"%s/results/per_species/%s/valid_maxent_prediction_restricted.rda",
REPO_HOME,
taxa.prediction.rasters[i]
)
if (!file.exists(prediction.restricted.file)) {
projection.raster<- prediction_rasters[[i]]
taxa.name<- gsub("_", " ", taxa.prediction.rasters[i])
# create occurrence points for focal taxon
points <- dplyr::filter( occurrences.df, taxon_name == taxa.name )
points <- dplyr::select( points, decimal_longitude, decimal_latitude )
coordinates(points) <- ~ decimal_longitude + decimal_latitude
proj4string(points) <- CRS(crs.string)
#select realms in which occurrence points are recorded
polys.sub <- realms[!is.na(sp::over(realms, sp::geometry(points))), ]
polys.sub <- gUnaryUnion(polys.sub)
# set all the values outside the polygon to 0
maskedprediction <- mask(
projection.raster,
polys.sub,
updatevalue=0,
updateNA=TRUE )
save(maskedprediction, file = prediction.restricted.file)
}
# construct prediction map file name
prediction.map.file <- sprintf(
"%s/results/per_species/%s/prediction_map.png",
REPO_HOME,
taxa.prediction.rasters[i]
)
if (!file.exists(prediction.map.file)) {
png(file = prediction.map.file)
plot(maskedprediction)
dev.off()
}
# construction prediction map with occurrences file name
prediction.occurence.map.file <- sprintf(
"%s/results/per_species/%s/prediction_occurence_map.png",
REPO_HOME,
taxa.prediction.rasters[i]
)
if (!file.exists(prediction.occurence.map.file)) {
# get occurrences of focal taxon, superimpose on prediction map
taxa.name<- gsub("_", " ", taxa.prediction.rasters[i])
png(file = prediction.occurence.map.file)
plot(maskedprediction)
points(
points$decimal_longitude,
points$decimal_latitude,
col = "red",
pch = 20,
cex = 0.2
)
box()
dev.off()
}
}
# location of a triangular, inverse distance matrix file
for ( i in 1:length(taxa)) {
name <- taxa[i]
readme.file <- sprintf('%s/results/per_species/%s/README.md', REPO_HOME, taxa[i])
if(!file.exists(readme.file)) {
taxa_remove<- gsub( "_", " ", name)
template<-
'# {{taxa_name}}
## Distribution of occurrence points
The following map shows the distribution of the filtered
[occurrences](../../../data/filtered/{{taxa_underscore}}.csv) for {{taxa_name}}
used in [the Maxent model](valid_maxent_model.rda).
![](occurrences.png)
## Variable importance
The variable importance graph shows the relative importance of the abiotic
raster layers in the model
![](valid_maxent_variable_importance.png)
## Response curves
The response curve graphs show the modeled response of {{taxa_name}} to the
selected abiotic raster layers.
![](valid_maxent_response_curve.png)
## Model predictions
To be able to compute the potential distribution overlap, worldwide, between
all pairs of species, a model [prediction](valid_maxent_prediction.Rda) is
available. However, for prediction maps that take into account large-scale
biogeographic barriers to dispersal, we use a
[restricted prediction](valid_maxent_prediction_restricted.rda) that limits
the projection to the
[zoogeographical realm(s)](../../../data/GIS/Realms/newRealms.shp) in which the
raw occurrences are located. The maps below are based on this restricted
projection.
The first map shows the predicted suitable areas on earth based on the niche
preferences for {{taxa_name}} calculated in the Maxent model. The second map
shows the suitable area map with the original occurrence points superimposed.
![](prediction_map.png)
![](prediction_occurence_map.png)
'
data <- list( taxa_name = taxa_remove, taxa_underscore = name )
text <- whisker.render(template, data)
writeLines(text, readme.file)
}
}
To assess whether the predicted suitable niche spaces differ per species we calculate their niche overlap. We use Schoener’s D to calculate niche overlap since it has been suggested to be the best suited index for maxent SDM outputs (Rödder & Engler, 2011). The index ranges from 0 which is no overlap to 1 which is a complete overlap and is based on the model prediction maps.
# location of a triangular, inverse distance matrix file
overlap.file <- paste(REPO_HOME, "/results/maxent/comparative/schoener/overlap.csv", sep = "")
if (!file.exists(overlap.file)) {
# file doesn't exist, calculate Schoener's Distance and store file
overlap <- calc.niche.overlap(prediction_rasters, stat = "D", maxent.args)
write.table(overlap, file = overlap.file, sep = ",", quote = F)
} else {
# file exists, read it
overlap <- read.table(overlap.file, sep = ",")
}
Having calculated the Schoener’s D distances, we can now use these to make a symmetrical distance matrix, and perform a neighbor joining clustering to visualize the distances in niche space.
dendrogram.file <- paste(REPO_HOME, "/results/maxent/comparative/schoener/inverse_overlap_nj.tree", sep = "")
if (!file.exists(dendrogram.file)) {
# create a distance matrix, i.e. the inverse of the overlap
dmatrix <- as.dist(1 - overlap)
# plot an unrooted dendrogram
tree <- nj(dmatrix)
write.tree(tree, file = dendrogram.file)
} else {
tree <- read.tree(file = dendrogram.file)
}