The following slide show provides an overview of species concepts and the application of species delimitation techniques to natural history collection specimens:
Species delimitation - species limits and character evolution
Ratnasingham S & Hebert PDN 2013. A DNA-Based Registry for All Animal Species: The Barcode Index Number (BIN) System. PLoS ONE 8(7): e66213 doi:10.1371/journal.pone.0066213
Correspondence between species present in eight datasets and OTUs recognized through single linkage clustering with sequence divergence thresholds ranging from 0.1–6.0%.
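To make threshold-based OTUs concrete, here is a minimal Python sketch of plain single linkage clustering on uncorrected p-distances, assuming the sequences are already aligned. It is not the BIN algorithm itself, which refines single linkage further. The function names and toy sequences are hypothetical. Note how chaining works: two sequences can share an OTU even when they are further apart than the threshold, as long as intermediate sequences link them.

```python
from itertools import combinations


def p_distance(a: str, b: str) -> float:
    """Uncorrected p-distance between two aligned sequences.

    Sites where either sequence has a gap ('-') or an ambiguous 'N' are skipped.
    """
    compared = differences = 0
    for x, y in zip(a.upper(), b.upper()):
        if x in "-N" or y in "-N":
            continue
        compared += 1
        if x != y:
            differences += 1
    # No comparable sites: treat the pair as maximally distant so it never links.
    return differences / compared if compared else 1.0


def single_linkage(seqs: dict[str, str], threshold: float) -> list[set[str]]:
    """Cluster sequences so that any chain of pairwise distances <= threshold joins them."""
    parent = {name: name for name in seqs}

    def find(name: str) -> str:
        # Union-find root lookup with path halving.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(seqs, 2):
        if p_distance(seqs[a], seqs[b]) <= threshold:
            parent[find(a)] = find(b)

    clusters: dict[str, set[str]] = {}
    for name in seqs:
        clusters.setdefault(find(name), set()).add(name)
    return sorted(clusters.values(), key=min)


# Hypothetical toy alignment: A2 differs from A1 at one site, A3 from A2 at another.
A1 = "ACGT" * 5
A2 = A1[:-1] + "A"
A3 = "T" + A2[1:]
toy = {"A1": A1, "A2": A2, "A3": A3, "B1": "G" * 20}

print(single_linkage(toy, 0.06))  # A1 and A3 are 10% apart, but chaining via A2 merges them
print(single_linkage(toy, 0.04))  # every sequence ends up in its own OTU
```

Sweeping the threshold (as in the 0.1–6.0% range above) and comparing the resulting OTUs with the named species is the kind of comparison the caption describes.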
Puillandre N, Lambert A, Brouillet S & Achaz G 2012. ABGD, Automatic Barcode Gap Discovery for primary species delimitation. Molecular Ecology 21(8): 1864–1877 doi:10.1111/j.1365-294X.2011.05239.x
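The "barcode gap" idea behind ABGD can be illustrated with a deliberately naive Python sketch: sort the pairwise distances and report the widest gap between consecutive values. ABGD itself is more elaborate (for example, it explores a range of prior intraspecific divergence values), so this only illustrates the underlying intuition. The function name and the distance values are hypothetical.

```python
def widest_gap(distances: list[float]) -> tuple[float, float]:
    """Return the (lower, upper) distances bounding the widest gap in the sorted values."""
    d = sorted(distances)
    if len(d) < 2:
        raise ValueError("need at least two distances")
    # Index of the consecutive pair with the largest difference.
    i = max(range(len(d) - 1), key=lambda k: d[k + 1] - d[k])
    return d[i], d[i + 1]


# Hypothetical pairwise p-distances: small (within-species) values
# followed by larger (between-species) values.
pairwise = [0.001, 0.004, 0.006, 0.045, 0.050, 0.052]
low, high = widest_gap(pairwise)
print(f"gap between {low} and {high}; a threshold could sit near {(low + high) / 2:.4f}")
```

On real data the within- and between-species distributions often overlap, which is exactly why a simple "largest gap" rule is not enough and methods like ABGD exist.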
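# download, unpack and compile ABGD
$ curl -L -O http://wwwabi.snv.jussieu.fr/public/abgd/last.tgz
$ tar xzvf last.tgz
$ cd Abgd
$ make
# add the current directory (which now contains the abgd binary) to the search path
$ export PATH="${PATH}:$(pwd)"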
We should now have an executable called abgd on the $PATH. It accepts aligned FASTA as input, so let's analyze one of the alignments we already have:
# inside w1d3 folder
$ mkdir Danaus_ABGD
# -o: existing directory for the result files; -a: write all partitions, not only the graph files
$ abgd -o Danaus_ABGD -a Danaus.mafft.fas
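Because abgd expects aligned FASTA, it can be worth checking an input file before running it. The minimal Python sketch below parses FASTA text and checks that all sequences have the same length; equal lengths are only a necessary (not sufficient) sign of an alignment. The function names and the in-memory example are hypothetical.

```python
import io
from typing import Iterable


def read_fasta(lines: Iterable[str]) -> dict[str, str]:
    """Parse FASTA text into {header: sequence}; a repeated header overwrites the earlier record."""
    chunks: dict[str, list[str]] = {}
    header = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:]
            chunks[header] = []
        elif header is not None:
            chunks[header].append(line)
    return {h: "".join(parts) for h, parts in chunks.items()}


def looks_aligned(records: dict[str, str]) -> bool:
    """True if every sequence has the same length (gap characters included)."""
    return len({len(seq) for seq in records.values()}) <= 1


# Hypothetical example: seqB is wrapped over two lines, seqC is too short.
example = io.StringIO(">seqA\nACGT-A\n>seqB\nAC\nGTTA\n>seqC\nACG\n")
records = read_fasta(example)
print(records)
print("aligned:", looks_aligned(records))
```

For the real file, pass an open file handle instead, e.g. with open("Danaus.mafft.fas") as fh: records = read_fasta(fh).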
The resulting files include graphs showing the barcode gap inflection point:
Fujisawa T & Barraclough TG 2013. Delimiting Species Using Single-Locus Data and the Generalized Mixed Yule Coalescent Approach: A Revised Method and Evaluation on Simulated Data Sets. Systematic Biology 62(5): 707–724 doi:10.1093/sysbio/syt033
The GMYC analysis can be performed through a web service; for the Danaus consensus tree it yields the following clusters:
These clusters are distributed across clades near the tips of the tree:
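How many distinct taxonomic names do we have in the alignment? We count them as the number of lines reported by wc -l:
# take the FASTA headers, keep the text before the first '-' (the taxon name), then count unique names
$ grep '>' Danaus.mafft.fas | cut -f 1 -d '-' | sort | uniq | wc -l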
15
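The shell pipeline only reports how many names there are. A small Python sketch of the same logic (keep the header text before the first '-', as cut -f 1 -d '-' does) can also show how many sequences each name has, which helps when comparing names with the clusters from GMYC or bPTP. The function name and example headers are hypothetical.

```python
from collections import Counter
from typing import Iterable


def taxon_counts(lines: Iterable[str], sep: str = "-") -> Counter:
    """Count sequences per taxon name, taking the header text before the first `sep`."""
    counts: Counter = Counter()
    for line in lines:
        if line.startswith(">"):
            counts[line[1:].split(sep, 1)[0].strip()] += 1
    return counts


# Hypothetical headers in the "Name-identifier" layout the cut command relies on.
headers = [
    ">Danaus_plexippus-SEQ001",
    ">Danaus_plexippus-SEQ002",
    ">Danaus_chrysippus-SEQ003",
]
counts = taxon_counts(headers)
print(len(counts))  # number of distinct names, the equivalent of the wc -l result
print(counts.most_common())
```

On the real alignment, taxon_counts(open("Danaus.mafft.fas")) reads the file line by line; unlike grep '>', it only looks at lines that start with '>'.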
Zhang J, Kapli P, Pavlidis P & Stamatakis A 2013. A general species delimitation method with applications to phylogenetic placements. Bioinformatics 29(22): 2869–2876 doi:10.1093/bioinformatics/btt499
Running bPTP on the same tree as used for GMYC, via the bPTP web server, yields a maximum likelihood estimate (MLE) of 16 species, with potential for (far) greater splitting:
Accptance rate: 0.69020000000000004
Merge: 49798
Split: 50202
Estimated number of species is between 14 and 135
Mean: 78.03
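To get a feel for how spread out an estimate like this is, a list of sampled species counts can be summarized with a mean and an equal-tailed interval rather than only a minimum and maximum. The Python sketch below assumes you already have the sampled counts as a list of integers (extracting them from bPTP's output files is not covered here); the function name and sample values are hypothetical.

```python
import math
from statistics import mean


def summarize_counts(samples: list[int], mass: float = 0.95) -> dict:
    """Mean, min, max and an equal-tailed interval holding roughly `mass` of the samples."""
    if not samples:
        raise ValueError("no samples")
    s = sorted(samples)
    n = len(s)
    tail = (1.0 - mass) / 2.0
    # Simple index-based quantiles (no interpolation), rounded outward.
    lo = s[math.floor(tail * (n - 1))]
    hi = s[math.ceil((1.0 - tail) * (n - 1))]
    return {"mean": mean(s), "min": s[0], "max": s[-1], "interval": (lo, hi)}


# Hypothetical samples: most mass near 16-17 species, with a long tail of over-split states.
samples = [16] * 50 + [17] * 30 + [20] * 15 + [60] * 4 + [120]
print(summarize_counts(samples))
```

A long tail like this can pull the mean far above the bulk of the samples, which is one reason to look at an interval alongside the mean and range.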
Mutanen M et al. 2016. Species-Level Para- and Polyphyly in DNA Barcode Gene Trees: Strong Operational Bias in European Lepidoptera. Systematic Biology 65(6): 1024–1040 doi:10.1093/sysbio/syw044
So how are the putative species from BOLD actually entangled? We can check each taxon for monophyly with the Monophylizer web service:
$ curl \
-F "infile=@BEAST/Danaus.consensus.trees.nwk" \
-F "format=newick" \
-F "separator=-" \
-F "astsv=true" \
-F "cgi=true" \
http://monophylizer.naturalis.nl/cgi-bin/monophylizer.pl > Danaus.monophyly.tsv
This produces a spreadsheet that identifies exact matches (i.e. monophyletic species) and cases of entanglement among species (i.e. paraphyletic or polyphyletic species).
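The core monophyly test can be sketched in a few lines of Python, assuming a rooted tree (such as a BEAST consensus tree) represented as nested tuples, with tip labels of the form Name-identifier to match the separator=- setting above. This is not Monophylizer's implementation: it only answers "monophyletic or not", and telling paraphyly from polyphyly needs additional rules that are not reproduced here. Newick parsing is omitted, and the function names and toy tree are hypothetical.

```python
from typing import Iterator, Union

# A tree is either a tip label (str) or a tuple of subtrees.
Tree = Union[str, tuple]


def tips(tree: Tree) -> list[str]:
    """All tip labels in the clade rooted at this node."""
    if isinstance(tree, str):
        return [tree]
    return [label for child in tree for label in tips(child)]


def clades(tree: Tree) -> Iterator[Tree]:
    """Yield every node of the tree; each node defines a clade."""
    yield tree
    if not isinstance(tree, str):
        for child in tree:
            yield from clades(child)


def is_monophyletic(tree: Tree, name: str, sep: str = "-") -> bool:
    """True if some clade contains all tips of `name` and nothing else."""
    target = {t for t in tips(tree) if t.split(sep, 1)[0] == name}
    if not target:
        raise ValueError(f"no tips labelled {name!r}")
    return any(set(tips(node)) == target for node in clades(tree))


# Hypothetical rooted tree: species B is split across two parts of the tree.
toy_tree = ((("A-1", "A-2"), "B-1"), ("B-2", "C-1"))
for species in ("A", "B", "C"):
    print(species, is_monophyletic(toy_tree, species))
```

Note that a species represented by a single sequence is trivially monophyletic under this test, so singletons say little about entanglement.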