Code and data associated with: Searching the web builds fuller picture of arachnid trade

Restricted Availability

Date

2021-12-04

Persistent identifier of the Data Catalogue metadata

Editor

Journal title

Journal volume

Publisher

Publication Type

dataset

Peer Review Status

Repositories

Access rights

Open

ISBN

ISSN

Description

Data and code used in the paper: Searching the web builds fuller picture of arachnid trade. Throughout the methods we have indicated the stage of analysis at which each data component was used and the code script connected to it. We have numbered the code and data supplements to reflect, as closely as possible, the order in which data generation and summary were undertaken. The following provides additional details on each of the data files.

Data S1 - Website data: lang = language of the search engine used (for ad hoc websites, the language was described after discovery); engine = the search engine used; page = the search results page on which the website appeared; searchdate = search date in YYYY-mm-dd HH:MM:SS; link = link to the webpage, redacted to protect website identity; reviewdate = date the website was reviewed for arachnids being sold and for search strategy; sells = whether the website sells arachnids (1 == sells); allow = whether the site allows automated searching, i.e., does not explicitly forbid it (1 == allows; NA when the search method was not fully automated, e.g., single page); type = the type of website (e.g., trade, classified ads); order = whether arachnids were organised in a particular way; target = a refined target URL from which to start the search; method = the search method chosen, see methods for details; refine = any refinement or filter that could constrain the scope of the website to be searched; spages = the number of pages required to cycle through to cover the entire stock (separated by ; if multiple cycles were needed or multiple single pages could be easily collected); prelimCheck = whether the website passed initial checks for arachnid selling; notes = any details that might need special attention during searches; webID = code used for subsequent data summary.

Data S2 - Raw keyword search outputs: species keywords.
sp = the modern species or genus that a keyword is associated with; page = the number of the page the keyword was detected on; keyw = the exact keyword that was detected; spORgen = whether the keyword was a species binomial or just a genus; termsSurrounding = the words surrounding a genus keyword detection (only applies to Data S3); webID = the website ID.

Data S3 - Raw keyword search outputs: genus keywords. sp = the modern species or genus that a keyword is associated with; page = the number of the page the keyword was detected on; keyw = the exact keyword that was detected; spORgen = whether the keyword was a species binomial or just a genus; termsSurrounding = the words surrounding a genus keyword detection (multiple detections separated by ;); webID = the website ID.

Data S4 - Raw keyword search outputs: temporal sample. sp = the modern species or genus that a keyword is associated with; page = the number of the page the keyword was detected on; keyw = the exact keyword that was detected; spORgen = whether the keyword was a species binomial or just a genus; termsSurrounding = the words surrounding a genus keyword detection (multiple detections separated by ;); webID = the website ID; timestamp.parse = the timestamp extracted from the archived web page; year = a simplified timestamp including only the year.

Data S5 - LEMIS data used. An arachnid-filtered version of refs. 74,75.

Data S6 - CITES trade database data used (ref. 76).

Data S7 - CITES appendices data used (ref. 77).

Data S8 - IUCN Red List data used (ref. 78).

Data S9 - Compiled final dataset, with data deriving from WSC, Scorpion files, ITIS, WAM and the data collection process.
speciesId = a numeric code, one per species; clade = the clade the species belongs to; family = the family the species belongs to; genus = the genus of the species; species = the species epithet; author = the species authority name; year = the species authority year; parentheses = whether parentheses are needed with the authority; distribution = WSC original distribution descriptions; invalid = whether the species is considered valid; source = the species source, either World Spider Catalogue, Scorpion files, ITIS or WAM; accName = the species binomial being used as our accepted name; allNames = the accepted species binomial and all synonyms; allGenera = the accepted genus, and all other genera the species has belonged to at one point;
onlineTradeSnap = whether the species was detected via a match to the accName in the snapshot data; onlineTradeSnap_Any = whether the species was detected via any synonym in the snapshot data; onlineTradeSnap_genus = whether the genus was detected via a match to the genus in the snapshot data; onlineTradeSnap_genusAny = whether the genus was detected via any synonym in the snapshot data;
onlineTradeTemp = whether the species was detected via a match to the accName in the temporal data; onlineTradeTemp_Any = whether the species was detected via any synonym in the temporal data; onlineTradeTemp_genus = whether the genus was detected via a match to the genus in the temporal data; onlineTradeTemp_genusAny = whether the genus was detected via any synonym in the temporal data;
onlineTradeEither = whether the species was detected via a match to the accName in the temporal data or snapshot data; onlineTradeEither_Any = whether the species was detected via any synonym in the temporal data or snapshot data;
LEMIStrade = whether the species was detected via a match to the accName in the LEMIS data; LEMIStrade_Any = whether the species was detected via any synonym in the LEMIS data; LEMIStrade_genus = whether the genus was detected via a match to the genus in the LEMIS data; LEMIStrade_genusAny = whether the genus was detected via any synonym in the LEMIS data;
CITEStrade = whether the species was detected via a match to the accName in the CITES trade database data; CITEStrade_Any = whether the species was detected via any synonym in the CITES trade database data; CITEStrade_genus = whether the genus was detected via a match to the genus in the CITES trade database data; CITEStrade_genusAny = whether the genus was detected via any synonym in the CITES trade database data;
CITESapp = the CITES appendix the species is listed under using an exact match to the accName; CITESapp_Any = the CITES appendix the species is listed under using any match to any of the species’ synonyms; redlist = the IUCN Red List category the species is listed under using an exact match to the accName; redlist_Any = the IUCN Red List category the species is listed under using any match to any of the species’ synonyms;
extactMatchTraded = whether the species is detected in any of the trade sources via a match to the accName; anyMatchTraded = whether the species is detected in any of the trade sources via a match to any of the species’ synonyms.

Data S10 - Forum listings of “What species are you currently keeping” from an online forum, posted between 9th September 2021 and 9th October 2021, to provide an idea of online discussions. Each user with a separate list is provided in a separate tab. Morph_collector is the same as poster1, but the potential cryptic species or morphs are noted separately to make them clearer.

Data S11 - Distribution information for spiders. Only two columns are used in summaries: accName = the accepted name used throughout summaries; NAME = the name of the country the spider occurs in.

Data S12 - Distribution information for scorpions. species = the accepted name used throughout summaries; NAME = the name of the country the scorpion occurs in.
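As an illustration of how the Data S9 trade columns relate to the two summary columns, the sketch below recomputes an "exact match" and an "any match" traded flag from the per-source columns. It is written in Python rather than the R of the supplementary scripts and is not the authors' code. The choice of source columns, the 1/TRUE-style encoding of detections, and the example rows (placeholder names and invented values) are all assumptions made for illustration.

```python
# Assumption: these per-source columns are the ones that feed the summary flags.
EXACT_COLUMNS = ["onlineTradeEither", "LEMIStrade", "CITEStrade"]
ANY_COLUMNS = ["onlineTradeEither_Any", "LEMIStrade_Any", "CITEStrade_Any"]


def is_detected(value):
    """Return True for 1/True/"TRUE"-style values (assumed encoding), else False."""
    if value is None:
        return False
    return str(value).strip().lower() in {"1", "true", "t", "yes"}


def traded(row, columns):
    """A species counts as traded if any of the given source columns flags it."""
    return any(is_detected(row.get(column)) for column in columns)


# Hypothetical rows; names and values are placeholders, not real records.
rows = [
    {"accName": "Genus_a species_a", "onlineTradeEither": 1, "LEMIStrade": 0,
     "CITEStrade": 0, "onlineTradeEither_Any": 1, "LEMIStrade_Any": 0,
     "CITEStrade_Any": 0},
    {"accName": "Genus_b species_b", "onlineTradeEither": 0, "LEMIStrade": 0,
     "CITEStrade": 0, "onlineTradeEither_Any": 0, "LEMIStrade_Any": 1,
     "CITEStrade_Any": 0},
    {"accName": "Genus_c species_c", "onlineTradeEither": 0, "LEMIStrade": 0,
     "CITEStrade": 0, "onlineTradeEither_Any": 0, "LEMIStrade_Any": 0,
     "CITEStrade_Any": 0},
]

summary = [
    (row["accName"], traded(row, EXACT_COLUMNS), traded(row, ANY_COLUMNS))
    for row in rows
]
```

Comparing such recomputed flags against extactMatchTraded and anyMatchTraded in the real file would be one way to check that the encoding assumed here matches the dataset.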
Code S1 - Search URL Extract.R
Code S2 - Retrieve web data.R
Code S3 - Temporal Classified Ads.R
Code S4 - Keyword Generation.R
Code S5 - Keyword Search.R
Code S6 - LEMIS filter and summary.R
Code S7 - Compiling results.R
Code S8 - Summary Figures.R
Code S9 - Temporal Figures.R
Code S10 - New description figure.R
Code S11 - Term exploration.R
Code S12 - LEMIS summary and mapping.R
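Several fields described for the data files need light parsing before use: searchdate follows the YYYY-mm-dd HH:MM:SS pattern, and spages and termsSurrounding can hold multiple values separated by ;. The following is a minimal Python sketch of that parsing, not taken from the R scripts listed above. The function names, the treatment of "NA" as missing, and the example values are assumptions for illustration.

```python
from datetime import datetime


def parse_searchdate(text):
    """Parse a searchdate string in the documented YYYY-mm-dd HH:MM:SS format."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S")


def split_multi(text):
    """Split a ;-separated field into trimmed parts; missing values give []."""
    if text is None or text.strip() in {"", "NA"}:  # "NA" as missing is assumed
        return []
    return [part.strip() for part in text.split(";") if part.strip()]


# Invented example values, shaped like the documented fields.
when = parse_searchdate("2021-03-15 14:30:00")
page_cycles = split_multi("12;3")
contexts = split_multi("first context words; second context words")
```

Keeping the split values as strings avoids assuming that every spages entry is a plain integer.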

Keyword (yso)

Publication Series

Location of the original dataset