Are we experiencing the sixth great extinction?
What is the current pace of extinction? Is it accelerating? How does it compare to background extinction rates?
- Accessing data from a RESTful API
- Error handling
- JSON data format
- Regular expressions
- Working with missing values
Getting started (based on live code session)
CURL and REST
Use the httr package to make a single API query against the following endpoint:
Examine the response and the content of the response. Can you tell if the call was successful? What was the return type object? Can you parse the return into an R object? Can you represent the return data as a data.frame?
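One way to approach these questions is sketched below, assuming the httr and jsonlite packages are installed. The helper names `query_api` and `result_table` are hypothetical, `url` stands for the endpoint given above (including your API token), and the assumption that the parsed body carries its records in a field called `result` should be checked against the actual response.

```r
# Hypothetical helper: query one endpoint and parse its JSON body.
# `url` is the full endpoint URL, including your API token.
query_api <- function(url) {
  resp <- httr::GET(url)
  # Was the call successful? A status code of 200 means OK.
  message("Status: ", httr::status_code(resp))
  # What type of content came back? (for example "application/json")
  message("Type: ", httr::http_type(resp))
  # Turn 4xx/5xx responses into R errors instead of continuing silently
  httr::stop_for_status(resp)
  # Read the body as text, then parse the JSON into R lists / data.frames
  jsonlite::fromJSON(httr::content(resp, as = "text", encoding = "UTF-8"))
}

# Hypothetical helper: pull the records out of the parsed body as a data.frame.
# Assumes the records live in a field named `result`.
result_table <- function(parsed) {
  as.data.frame(parsed$result, stringsAsFactors = FALSE)
}
```

Calling `str()` on the value returned by `query_api(url)` is a quick way to see which fields are actually present before relying on `result_table()`.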
Working with Regular Expressions
One of the entries in the response contains a field that may contain some information on when the species went extinct. Identify the appropriate column and extract this information using a regular expression, as discussed in the live code exercise.
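As a small illustration of the idea, the sketch below pulls the first four-digit year out of a sentence using base R. The sentence is made up for the example and is not taken from the IUCN data, and the pattern (years from 1000 to 2099) is only one reasonable choice.

```r
# Made-up sentence for illustration only; not taken from the IUCN data
txt <- "The species was last recorded in 1894 and is now considered extinct."

# A four-digit number from 1000 to 2099, bounded by non-word characters
pattern <- "\\b(1[0-9]{3}|20[0-9]{2})\\b"

# regexpr() locates the first match; regmatches() extracts the matched text
m <- regmatches(txt, regexpr(pattern, txt, perl = TRUE))
year <- as.integer(m)
```

If you prefer stringr, `stringr::str_extract(txt, pattern)` performs the same extraction and returns `NA` when nothing matches.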
Calculating Extinction Rates: Putting it all together
First, to know what queries to make to the IUCN REST API, we need a list of extinct species names. This information can be downloaded from the IUCN website, but unfortunately this is not easily automated. Thus we’ll download the data file using a copy already prepared for the course:
extinct = read_csv("https://espm-157.github.io/extinction-module/extinct.csv")
Parsed with column specification:
cols(
  .default = col_character(),
  `Species ID` = col_integer(),
  `Red List criteria version` = col_double(),
  `Year assessed` = col_integer()
)
See spec(...) for full column specifications.
# A tibble: 834 x 23
   `Species ID` Kingdom  Phylum       Class         Order
          <int> <chr>    <chr>        <chr>         <chr>
 1        44072 PLANTAE  TRACHEOPHYTA MAGNOLIOPSIDA ROSALES
 2       195373 PLANTAE  TRACHEOPHYTA MAGNOLIOPSIDA EUPHORBIALES
 3        37854 PLANTAE  TRACHEOPHYTA MAGNOLIOPSIDA EUPHORBIALES
 4       199821 PLANTAE  TRACHEOPHYTA MAGNOLIOPSIDA EUPHORBIALES
 5           82 ANIMALIA ARTHROPODA   INSECTA       EPHEMEROPTERA
 6          167 ANIMALIA MOLLUSCA     GASTROPODA    STYLOMMATOPHORA
 7          170 ANIMALIA MOLLUSCA     GASTROPODA    STYLOMMATOPHORA
 8          173 ANIMALIA MOLLUSCA     GASTROPODA    STYLOMMATOPHORA
 9          174 ANIMALIA MOLLUSCA     GASTROPODA    STYLOMMATOPHORA
10          179 ANIMALIA MOLLUSCA     GASTROPODA    STYLOMMATOPHORA
# ... with 824 more rows, and 18 more variables: Family <chr>,
#   Genus <chr>, Species <chr>, Authority <chr>, `Infraspecific
#   rank` <chr>, `Infraspecific name` <chr>, `Infraspecific
#   authority` <chr>, `Stock/subpopulation` <chr>, Synonyms <chr>, `Common
#   names (Eng)` <chr>, `Common names (Fre)` <chr>, `Common names
#   (Spa)` <chr>, `Red List status` <chr>, `Red List criteria` <chr>, `Red
#   List criteria version` <dbl>, `Year assessed` <int>, `Population
#   trend` <chr>, Petitioned <chr>
Write a function to extract the rationale for the extinction for all extinct species in the data set (see above file)
Test your function on a subset of the data before attempting the full data set. Use our dplyr pipe syntax to iterate over your function.
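Because a single bad species name or a dropped connection should not abort a run over hundreds of species, it may help to wrap each request in error handling. The sketch below shows the pattern in base R. `get_rationale` and `fake_fetch` are hypothetical names, and `fake_fetch` is a stand-in that returns made-up text so the example runs without network access. In practice `fetch` would be your own function that queries the API for one species.

```r
# Hypothetical helper: `fetch` is any function that takes a species name and
# returns its rationale text (for example, a wrapper around an httr::GET call).
get_rationale <- function(name, fetch) {
  tryCatch(
    fetch(name),
    # On any error (bad name, network failure), return NA instead of stopping
    error = function(e) NA_character_
  )
}

# Stand-in fetcher so the sketch runs offline; names and text are made up
fake_fetch <- function(name) {
  if (name == "Unknown species") stop("not found")
  paste("Rationale for", name)
}

# Try a small subset first, as suggested above
species <- c("Species a", "Unknown species", "Species b")
rationales <- vapply(species, get_rationale, character(1),
                     fetch = fake_fetch, USE.NAMES = FALSE)
```

The failed lookup shows up as `NA` in `rationales` rather than stopping the loop. In a tidyverse pipeline, `purrr::map_chr(species, get_rationale, fetch = fake_fetch)` would be an equivalent way to iterate.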
Now create a function that can extract the date from the rationale, and include this function in your data analysis pipeline.
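A function for this step needs to work on a whole column at once and return a missing value where no date is found. The following is a minimal base R sketch under those assumptions. The name `extract_year`, the year range in the pattern, and the example strings are all illustrative.

```r
# Hypothetical helper: returns the first four-digit year (1000-2099) found in
# each string, or NA when a string is missing or contains no such number.
extract_year <- function(text) {
  pattern <- "\\b(1[0-9]{3}|20[0-9]{2})\\b"
  hit <- regexpr(pattern, text, perl = TRUE)
  out <- rep(NA_integer_, length(text))
  # Positions with a real match (regexpr gives -1 for no match, NA for NA input)
  found <- !is.na(hit) & hit > 0
  out[found] <- as.integer(regmatches(text, hit))
  out
}

# Made-up strings for illustration only
extract_year(c("Last seen in 1894.", "No date given", NA))
```

Inside a dplyr pipeline this could be applied as `mutate(year = extract_year(rationale))`, where `rationale` is whatever you named the column holding the text.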
Histogram of Extinction Dates
We can get a sense for the tempo of extinctions by plotting extinctions since 1500 in 25-year interval bins.
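The binning can be sketched with base graphics as follows. The years are made up for illustration and should be replaced with the extracted extinction dates. Bins are closed on the left here, so 1900 falls in the 1900 to 1925 bin.

```r
# Made-up years for illustration only; substitute the extracted extinction years
years <- c(1520, 1681, 1768, 1850, 1894, 1901, 1914, 1936, 1952, 1989)

# Bin edges every 25 years, from 1500 to the first edge at or past this year
this_year <- as.integer(format(Sys.Date(), "%Y"))
breaks <- seq(1500, ceiling(this_year / 25) * 25, by = 25)

# Missing and pre-1500 values fall outside the bins, so drop them first
keep <- years[!is.na(years) & years >= 1500]

# right = FALSE makes each bin include its left edge
h <- hist(keep, breaks = breaks, right = FALSE,
          main = "Extinctions since 1500",
          xlab = "Year", ylab = "Number of extinctions")
```

With ggplot2, `geom_histogram(binwidth = 25, boundary = 1500)` should give a comparable plot.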
Question 1: Extinctions by group
A. Compute the number of extinctions from 1500 to 1900 and from 1900 to present for each of the following taxonomic groups:
Compare your estimates to Table 1 of Ceballos et al (2015).
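One possible way to tabulate the two periods is sketched below in base R. The rows are made up for illustration, the `class` column stands in for whichever taxonomic column defines your groups, and treating 1900 as part of the later period is an assumption you may wish to change.

```r
# Made-up rows for illustration only; substitute your own table of
# taxonomic group and extracted extinction year
d <- data.frame(
  class = c("MAMMALIA", "AVES", "AVES", "AMPHIBIA", "MAMMALIA"),
  year  = c(1768, 1850, 1914, 1989, NA)
)

# Label each year with its period; right = FALSE puts 1900 in the later period
d$period <- cut(d$year, breaks = c(1500, 1900, Inf), right = FALSE,
                labels = c("1500-1900", "1900-present"))

# Cross-tabulate group by period; rows with a missing year are dropped
counts <- table(d$class, d$period)
counts
```

The dplyr equivalent would be along the lines of `count(d, class, period)`.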
Question 2: Weighting by number of species
The number of species going extinct per century in a given taxonomic group will be influenced by how many species are present in the group to begin with. (As an obvious example, the number of vertebrate extinctions will always be at least as high as the number of mammal extinctions, since mammals are vertebrates.) Overall, these species totals do not change greatly over a period of a few hundred years, so we were able to make the relative comparisons between the roughly pre-industrial and post-industrial periods above.
As discussed by Tony Barnosky in the introductory video (or in the Ceballos et al (2015) paper), if we want to compare these extinction rates against the long-term paleontological record, it is necessary to weight the rates by the total number of species. That is, we compute the number of extinctions per million species-years (MSY; equivalently, the number of extinctions per 10,000 species per 100 years).
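The arithmetic behind this rate can be written as a one-line function. The name `extinctions_per_msy` and its argument names are hypothetical. The worked call uses round numbers chosen to match the "per 10,000 species per 100 years" phrasing, not real counts.

```r
# Hypothetical helper: extinctions per million species-years.
#   n_extinct: number of extinctions observed in the period
#   n_species: number of species in the group
#   n_years:   length of the period in years
extinctions_per_msy <- function(n_extinct, n_species, n_years) {
  n_extinct / (n_species * n_years) * 1e6
}

# 2 extinctions among 10,000 species over 100 years:
# 10,000 species x 100 years = 1 million species-years, so the rate is 2
extinctions_per_msy(2, 10000, 100)
```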
- First, we will compute how many species are present in each of the taxonomic groups. To do so, we need a table that has not only extinct species, but all assessed species. We will once again query this information from the IUCN API.
This is going to involve a lot of data – more than the API can serve in a single chunk. Instead, the API breaks the returns up into groups of 10,000 species per page (see API docs: http://apiv3.iucnredlist.org/api/v3/docs#species). Luckily, the API also tells us the total number of species:
The code below queries the first page. How many pages will we need to get all the data? Modify the example below to collect all of the data into a single data.frame. Note the use of append to add data to an existing data.frame with matching column labels.
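The paging logic can be sketched independently of the API itself. In the base R example below, `n_pages`, `collect_pages`, and `stub_page` are hypothetical names. `stub_page` returns made-up rows so the sketch runs offline, and numbering pages from 0 is an assumption to check against the API docs.

```r
# Hypothetical helper: number of pages needed to hold `total` records
n_pages <- function(total, per_page = 10000) ceiling(total / per_page)

# Hypothetical collector: `fetch_page` is any function that takes a page
# number and returns that page as a data.frame. Pages are assumed to be
# numbered from 0.
collect_pages <- function(total, fetch_page, per_page = 10000) {
  pages <- lapply(seq_len(n_pages(total, per_page)) - 1, fetch_page)
  # Stack the pages; this requires matching column names on every page
  do.call(rbind, pages)
}

# Stand-in for a real page request; returns three made-up rows per page
stub_page <- function(p) data.frame(page = p, id = 1:3)

all_rows <- collect_pages(25000, stub_page)
```

A real `fetch_page` would query the API for one page and return its records. `dplyr::bind_rows(pages)` is an alternative to `do.call(rbind, pages)` that tolerates columns missing from some pages.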
- Based on the complete data, write queries that count the number of species in each group. Then use these numbers to compute MSY, the number of extinctions per 10,000 species per 100 years, for each of the groups listed in Question 1. How do your estimates compare to the overall historical average of about 2 MSY?
Question 3: Improving our algorithm
In parsing the data with regular expressions, we encountered certain data that resulted in missing values. Identify and investigate the strings for which we were not able to extract a date value.
- Why did the date extraction fail?
- Can you deduce an approximate date by examining the text?
- Can you modify the regular expression to reduce the number of missing values?
- How do these missing values impact our overall estimate of the extinction rate? (In which direction, and by approximately what amount?)
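To start on these questions, it can help to list the strings that a pattern fails on and then see how many a looser pattern recovers. The sketch below uses made-up strings and two illustrative patterns. Your own pattern and failure cases may differ.

```r
# Made-up strings for illustration only
rationale <- c("Last collected in 1894.",
               "Not recorded since the 1890s.",
               "Known only from the type specimen.")

# Strict pattern: a four-digit year (1000-2099) standing alone as a word
strict <- "\\b(1[0-9]{3}|20[0-9]{2})\\b"
# Looser pattern: also accept a decade written with a trailing "s" ("1890s")
loose <- "\\b(1[0-9]{3}|20[0-9]{2})(?=s?\\b)"

# Which strings does each pattern fail to match?
failed_strict <- !grepl(strict, rationale, perl = TRUE)
failed_loose  <- !grepl(loose, rationale, perl = TRUE)

rationale[failed_strict]  # read these by hand to see why extraction failed
sum(failed_strict)        # missing values under the strict pattern
sum(failed_loose)         # missing values under the looser pattern
```

Here the strict pattern misses "1890s" because the trailing "s" is a word character, so there is no word boundary after the digits. A decade recovered this way is only an approximate date, which is worth keeping in mind when it enters the rate estimate.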
Question 4: Looking forward (bonus)
Plot the MSY rates in intervals of 50 years for each of the groups as a line plot (compare to Figure 1a of Ceballos et al (2015)). Compute the slope of these curves to forecast the extinction rate in 2100.
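A simple way to get a slope and a forecast is a straight-line fit, sketched below for a single group. The rates are a made-up straight line used only so the example runs. Treating the trend as linear is itself an assumption, and the real curves may call for something else.

```r
# Made-up values for illustration only (an exact straight line);
# substitute the MSY rates you computed for each 50-year interval
rates <- data.frame(year = seq(1550, 2000, by = 50))
rates$msy <- 0.01 * (rates$year - 1500)

# Line plot of rate against time
plot(rates$year, rates$msy, type = "l",
     xlab = "Year", ylab = "Extinctions per MSY")

# Least-squares line: the coefficient on `year` is the slope
fit <- lm(msy ~ year, data = rates)
slope <- coef(fit)[["year"]]

# Extrapolate the fitted line to 2100
forecast_2100 <- predict(fit, newdata = data.frame(year = 2100))
```

For several groups, the same fit can be repeated per group, for example by splitting the table by group and applying the `lm()` call to each piece.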