Background

The Netherlands Biodiversity API

The Netherlands Biodiversity API (NBA) facilitates access to the Natural History Collection at Naturalis Biodiversity Center. In addition to museum specimen records and metadata, it provides access to taxonomic classification and nomenclature, geographical information, and multimedia files. Built on the Elasticsearch engine, the NBA supports searching collection and biodiversity data in near real-time. Furthermore, by incorporating information from taxonomic databases, the NBA can be used for taxonomic name resolution. Persistent Uniform Resource Locators (PURLs) ensure that each specimen accessible via the NBA is represented by a citable, unambiguous web reference. Access to our data is provided via a RESTful interface and several clients such as the BioPortal, a web application for browsing biodiversity data that is served by the NBA. For more information about the NBA, please see our detailed documentation.

R access

The R programming language is a common tool in scientific research and is increasingly adopted in biodiversity research. To ease access to the NBA for researchers, we developed this R client.

Full client vs wrapper functions

nbaR aims to be a full client of the NBA API, meaning that it implements all endpoints and the entire NBA object model. The client thus supports every query the API allows. Complex objects returned by the API, such as Specimen or Taxon objects, are implemented as R6 classes. This also includes the objects used for querying (QuerySpec and QueryCondition, see also here).

For many queries, the full functionality of the NBA won’t be required. The package therefore offers a wrapper function for each endpoint that does not use R6 classes but common R data structures, such as list and data.frame. However, querying capabilities are limited for these wrappers. Below we will show how to set up some simple queries using the wrapper functions.

Quick start: querying the NBA using wrapper functions

The data in the NBA consists of four main data types (see NBA docs):

  • Specimen
  • Taxon
  • Multimedia
  • Geo

Wrapper functions start with the data type in lower case followed by an underscore (specimen_*, taxon_*, etc.). There is a wrapper function for each endpoint (see here for all endpoints); camelCase naming is replaced by snake_case. The NBA endpoint getDistinctValues for specimen data, for instance, is called by the function specimen_get_distinct_values.
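
The naming rule can be illustrated with a few lines of base R. The helper wrapper_name below is hypothetical and not part of nbaR; it only mimics the camelCase to snake_case convention described above.

```r
# Hypothetical helper (not part of nbaR): derive a wrapper function name
# from a data type and a camelCase endpoint name.
wrapper_name <- function(data_type, endpoint) {
  # Put an underscore in front of every upper-case letter, then lower-case all
  snake <- tolower(gsub("([A-Z])", "_\\1", endpoint))
  paste(tolower(data_type), snake, sep = "_")
}

wrapper_name("Specimen", "getDistinctValues")
## [1] "specimen_get_distinct_values"
```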

Specimen services provide the interface to the Naturalis collection and to species occurrences (see here), whereas Taxon services provide data from taxonomic checklists (see here). Multimedia services give access to photos, videos and sound data (see here); Geo services store polygon data for geographical regions and nature reserves (see here).

Querying specimen records

Suppose we want to look up specimens of the genus Mola (sunfish). To find out which fields of the NBA we can query, we can use the function specimen_get_paths() (see ?specimen_get_paths for documentation).
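
A minimal sketch of such a call, assuming nbaR is installed and the NBA service is reachable over the network. Showing only the first six entries with head() is a choice made here for brevity.

```r
library(nbaR)

# Retrieve all queryable field paths for specimen records
paths <- specimen_get_paths()

# Show only the first few paths
head(paths)
```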

## [1] "sourceSystem.code" "sourceSystem.name" "sourceSystemId"   
## [4] "recordURI"         "unitID"            "unitGUID"

Note that paths of nested objects are separated by a dot (.). To search for a specific genus, we can query the field identifications.scientificName.genusOrMonomial. The specimen_query function lets us query specific fields, with the query parameters given as a named list (a named vector also works):
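
A sketch of such a query, assuming the named list can be passed as the first argument of specimen_query and that the NBA service is reachable. It reports the number of hits and the columns of the returned data frame.

```r
library(nbaR)

# Query parameters as a named list: field path = value to match
query <- list("identifications.scientificName.genusOrMonomial" = "Mola")

# Assumption: the named list is accepted as the first (positional) argument
res <- specimen_query(query)

nrow(res)      # number of records returned
colnames(res)  # top-level fields of each record
```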

## [1] 4
##  [1] "sourceSystem"             "sourceSystemId"          
##  [3] "id"                       "unitID"                  
##  [5] "unitGUID"                 "sourceInstitutionID"     
##  [7] "sourceID"                 "owner"                   
##  [9] "licenseType"              "license"                 
## [11] "recordBasis"              "kindOfUnit"              
## [13] "collectionType"           "preparationType"         
## [15] "fromCaptivity"            "objectPublic"            
## [17] "multiMediaPublic"         "identifications"         
## [19] "title"                    "numberOfSpecimen"        
## [21] "gatheringEvent"           "associatedMultiMediaUris"
## [23] "theme"

The return type can be either list or data.frame (the default). Note that nested structures in the data frame are represented as list columns. One example is the field associatedMultiMediaUris, which lists, if given, all links to multimedia resources for the specimens:
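
A sketch of how the list column might be inspected, under the same assumptions as before (nbaR installed, NBA reachable, query list passed as first argument). The per-specimen count at the end is an addition for illustration.

```r
library(nbaR)

res <- specimen_query(list("identifications.scientificName.genusOrMonomial" = "Mola"))

# List column: one element per specimen, NULL where no media are linked
res$associatedMultiMediaUris

# Number of linked multimedia items per specimen (0 for NULL entries)
vapply(res$associatedMultiMediaUris, NROW, integer(1))
```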

## [[1]]
## NULL
## 
## [[2]]
##                                                        accessUri
## 1 http://medialib.naturalis.nl/file/id/RMNH.ART.175/format/large
##       format        variant
## 1 image/jpeg MEDIUM_QUALITY
## 
## [[3]]
## NULL
## 
## [[4]]
##                                                                   accessUri
## 1 http://medialib.naturalis.nl/file/id/RMNH.PISC.D.2059_HL_1_3/format/large
## 2   http://medialib.naturalis.nl/file/id/RMNH.PISC.D.2059_HL_1/format/large
## 3   http://medialib.naturalis.nl/file/id/RMNH.PISC.D.2059_HL_2/format/large
##       format        variant
## 1 image/jpeg MEDIUM_QUALITY
## 2 image/jpeg MEDIUM_QUALITY
## 3 image/jpeg MEDIUM_QUALITY

Querying taxon records

Taxonomic information can be retrieved using the taxon_ functions. Taxon records come from two sources, the Dutch species register (Nederlands Soortregister, NSR) and the Catalogue of Life (COL).

To see how many records come from each source, we can query for all distinct values (and their counts) of a specific field (see taxon_get_paths for all fields in the taxon data):

taxon_get_distinct_values("sourceSystem.name")
## $`Species 2000 - Catalogue Of Life`
## [1] 1962192
## 
## $`Naturalis - Dutch Species Register`
## [1] 47646

taxon_get_distinct_values("sourceSystem.code")
## $COL
## [1] 1962192
## 
## $NSR
## [1] 47646

To query, for instance, all species of the genus Mola listed in the Catalogue of Life, we can use the wrapper function taxon_query:
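
A sketch of such a query. The field names acceptedName.genusOrMonomial and acceptedName.specificEpithet are assumptions not stated in this text, so the example first checks them against taxon_get_paths(). How the nested acceptedName field appears in the returned data frame is likewise assumed.

```r
library(nbaR)

# Assumed field names; verify them against the available paths first
query <- list("acceptedName.genusOrMonomial" = "Mola",
              "sourceSystem.code" = "COL")
stopifnot(all(names(query) %in% taxon_get_paths()))

res <- taxon_query(query)

# Assumption: acceptedName is a nested data frame with a specificEpithet column
res$acceptedName$specificEpithet
```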

## [1] "mola"    "tecta"   "ramsayi"

Let’s see if we can find vernacular (common) names for the species Mola ramsayi:
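
One way this lookup might be written, again with assumed field names (acceptedName.*) and an assumed layout of the result, in which vernacularNames is a list column holding one data frame per taxon with the columns name and language.

```r
library(nbaR)

# Assumed field names for genus and species epithet
query <- list("acceptedName.genusOrMonomial" = "Mola",
              "acceptedName.specificEpithet" = "ramsayi",
              "sourceSystem.code" = "COL")
res <- taxon_query(query)

# Assumption: first hit carries a data frame of vernacular names
res$vernacularNames[[1]][, c("name", "language")]
```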

##                   name         language
## 1           拉氏翻车鲀 Mandarin Chinese
## 2 Australisk klumpfisk          Swedish
## 3     Suidelike sonvis        Afrikaans
## 4     Southern sunfish          English
## 5           拉氏翻車魨 Mandarin Chinese
## 6        Lõuna-kuukala         Estonian

Geo queries

The Geo data type in the NBA holds polygon data for countries, Dutch municipalities, Dutch nature reserves, etc. For more information, please refer to the API documentation. To retrieve, for example, the polygon for a country, encoded in the GeoJSON format, we can query as follows:
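
A minimal sketch of a Geo query. Both the wrapper name geo_query (following the naming convention above) and the field name locality are assumptions here, as is the choice of the Netherlands as the example country. The result is only inspected at the top level rather than relying on a particular field holding the polygon.

```r
library(nbaR)

# Assumption: Geo records can be matched by name via a field called "locality"
res <- geo_query(list("locality" = "Netherlands"))

# Inspect which fields came back before extracting the polygon data
str(res, max.level = 1)
```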

Multimedia queries

Multimedia items accessible via the NBA include items captured from physical specimens (e.g. photos and videos) but also from human observations (e.g. recordings of bird sounds).

As an example, we will retrieve records that represent sounds that were recorded in the country Cape Verde. The sound data accessible via the NBA is stored in the Xeno-Canto database, hosted at the Naturalis Biodiversity Center. The field sourceSystem.code for these records is XC; the country of occurrence is stored in the field gatheringEvents.country.
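
A sketch of this query using the two fields named above. It assumes that the wrapper is called multimedia_query (following the naming convention) and that the record links are found in a column called recordURI.

```r
library(nbaR)

# Both conditions must hold (wrappers combine conditions with AND)
query <- list("sourceSystem.code" = "XC",
              "gatheringEvents.country" = "Cape Verde")
res <- multimedia_query(query)

# Assumption: the link to each record is stored in recordURI
res$recordURI
```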

##  [1] "http://data.biodiversitydata.nl/xeno-canto/observation/XC405139"
##  [2] "http://data.biodiversitydata.nl/xeno-canto/observation/XC156912"
##  [3] "http://data.biodiversitydata.nl/xeno-canto/observation/XC156924"
##  [4] "http://data.biodiversitydata.nl/xeno-canto/observation/XC405556"
##  [5] "http://data.biodiversitydata.nl/xeno-canto/observation/XC156923"
##  [6] "http://data.biodiversitydata.nl/xeno-canto/observation/XC405139"
##  [7] "http://data.biodiversitydata.nl/xeno-canto/observation/XC405560"
##  [8] "http://data.biodiversitydata.nl/xeno-canto/observation/XC405566"
##  [9] "http://data.biodiversitydata.nl/xeno-canto/observation/XC405560"
## [10] "http://data.biodiversitydata.nl/xeno-canto/observation/XC405141"

Limitations of wrapper functions

It is important to note that querying power is limited when using the wrapper functions. They correspond to basic, human-readable NBA queries (see here).

  • Size of result set: Following the NBA default, wrapper functions only return the first 10 hits of a query.
  • Operators: Only full matches (operator EQUALS) are considered in wrapper query functions. Partial matching is only available in the full API client.
  • Logical conjunctions: If multiple query conditions are given, wrapper functions only allow a simple AND conjunction. For more complex logical query constructs including OR operators or negations, the full API client must be used.
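
For comparison, a query that goes beyond these limits might look as follows with the full client. This is a sketch under assumptions: the constructor arguments (field, operator, value, conditions, size), the operator name STARTS_WITH, the SpecimenClient class with its query method, and the location of the hits in the response are not described in this text and should be checked against the full client documentation.

```r
library(nbaR)

# Partial match instead of EQUALS (assumed operator name)
qc <- QueryCondition$new(
  field = "identifications.scientificName.genusOrMonomial",
  operator = "STARTS_WITH",
  value = "Mol"
)

# Ask for more than the default 10 hits (assumed argument name "size")
qs <- QuerySpec$new(conditions = list(qc), size = 50)

sc <- SpecimenClient$new()
res <- sc$query(querySpec = qs)

# Assumption: hits are stored in the resultSet of the response content
length(res$content$resultSet)
```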

The wrappers are thus designed for easy access for simple queries. In many situations it might be necessary to use the full API client which offers (almost) the entire functionality of the NBA API. Detailed documentation for the full client can be found here.