Bioinformatics pipeline for "Wind is a primary driver of fungal dispersal across a mainland-island system"
================

This pipeline is a pre-release version of the OptimOTU pipeline, which is
implemented in R using the [{targets}](https://books.ropensci.org/targets/)
pipeline tool. It is probably difficult to run it outside of the
[CSC](https://www.csc.fi) Puhti computing cluster where it was developed and
executed for this project, and it is included primarily for reference purposes
to accompany the paper.

In principle the steps to execute would be:

1)  Unzip the archive (if you are reading this, you have probably
    already accomplished this step).

    ```sh
    unzip konnevesi_pipeline.zip
    cd konnevesi
    ```

2)  Install all prerequisites listed in `conda/GSSP.yaml`, either using
    [conda](https://conda.io) or by other means. If using conda,
    activate the conda environment.

    ```sh
    conda env create -f conda/GSSP.yaml
    conda activate GSSP
    ```

3)  Download ProtaxFungi and extract it in the main konnevesi directory (or a
    sister directory).

    ```sh
    tar -xzf protaxFungi.tar.gz
    ```

4)  Download the raw sequence files from ENA project
    [PRJEB76596](https://www.ebi.ac.uk/ena/browser/view/PRJEB76596) into
    `sequences/01_raw/KN_22`.

5)  Download [USEARCH](https://www.drive5.com/usearch/) version 11.0.667
    and unpack it in `bin/`.

    ```sh
    wget -nd -P bin/ https://www.drive5.com/downloads/usearch11.0.667_i86linux32.gz
    gunzip bin/usearch11.0.667_i86linux32.gz
    chmod +x bin/usearch11.0.667_i86linux32
    ```

6)  Download [reference data for the UNITE sh_matching
    pipeline](https://files.plutof.ut.ee/public/orig/9C/FD/9CFD7C58956E5331F1497853359E874DEB639B17B04DB264C8828D04FA964A8F.zip)
    and unzip it in `data/sh_matching_data`.

    ```sh
    wget https://files.plutof.ut.ee/public/orig/9C/FD/9CFD7C58956E5331F1497853359E874DEB639B17B04DB264C8828D04FA964A8F.zip
    unzip -j 9CFD7C58956E5331F1497853359E874DEB639B17B04DB264C8828D04FA964A8F.zip \
        data/shs_out.txt data/sanger_refs_sh.fasta -d data/sh_matching_data
    rm 9CFD7C58956E5331F1497853359E874DEB639B17B04DB264C8828D04FA964A8F.zip
    ```

7)  Run the pipeline in R (for execution on a single computer):

    ```r
    library(targets)
    tar_make()
    ```

Alternatively, `sbatch run_node.sh` submits the pipeline as a single job
using 8 cores to a SLURM cluster. `sbatch run_clustermq.sh` and
`sbatch run_crew.sh` submit a "master" job using a single core, which
will itself submit additional "worker" jobs using 8 cores each. The
clustermq backend submits a fixed number of workers as an array job,
while the crew backend dynamically submits workers depending on the
pipeline's needs. Sketches of both kinds of script follow below.

Adapting these to run on a different cluster would involve modifying
`run_node.sh`, `run_clustermq.sh`, and/or `run_crew.sh`, as well as
potentially loading environment modules, a conda environment, etc. For
clustermq or crew execution, a new template like `slurm/puhti_clustermq.tmpl`
and `slurm/puhti_crew.tmpl` would also be required, as well as modifications
to `run_clustermq.R` or `run_crew.R`.
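
The batch scripts themselves are not reproduced in this README, but a minimal
single-node submission script in the spirit of `run_node.sh` might look like
the sketch below. This is an illustration only: the resource requests and the
environment activation lines are assumptions, not the values used on Puhti,
and a real Puhti script would also need `--account` and `--partition` lines.

```sh
#!/bin/bash
# Hypothetical single-node submission script, analogous to run_node.sh.
# All resource requests are placeholders; adjust for your cluster.
#SBATCH --job-name=konnevesi
#SBATCH --cpus-per-task=8      # the single-job mode described above uses 8 cores
#SBATCH --mem=32G              # placeholder; adjust to your data volume
#SBATCH --time=72:00:00        # placeholder

# Activate the conda environment created in step 2 (adjust if conda is
# provided as an environment module on your cluster).
source "$(conda info --base)/etc/profile.d/conda.sh"
conda activate GSSP

# Run the full pipeline with {targets} on this single node.
Rscript -e 'targets::tar_make()'
```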
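For the clustermq backend, a worker template such as `slurm/puhti_clustermq.tmpl`
follows the mustache-style placeholder format documented by the
[clustermq](https://mschubert.github.io/clustermq/) package. The actual Puhti
template is not reproduced here; the sketch below is adapted from the generic
SLURM template in the clustermq documentation, and everything outside the
`{{ }}` placeholders is an assumption to be adjusted for your cluster.

```sh
#!/bin/sh
# Hypothetical clustermq worker template for SLURM, based on the generic
# example in the clustermq documentation (not the actual puhti_clustermq.tmpl).
#SBATCH --job-name={{ job_name }}
#SBATCH --output={{ log_file | /dev/null }}
#SBATCH --error={{ log_file | /dev/null }}
#SBATCH --mem-per-cpu={{ memory | 4096 }}
#SBATCH --array=1-{{ n_jobs }}
#SBATCH --cpus-per-task={{ cores | 1 }}

# Each array task starts one worker process that connects back to the
# single-core "master" job submitted by run_clustermq.sh.
CMQ_AUTH={{ auth }} R --no-save --no-restore -e 'clustermq:::worker("{{ master }}")'
```

Note how the `--array=1-{{ n_jobs }}` line reflects the fixed worker count of
the clustermq backend, in contrast to crew, which launches and retires workers
dynamically as the pipeline's needs change.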