RUNNING TAPIRS

INSTALL AND SET UP BEFORE RUNNING

These instructions assume that Tapirs has already been installed and set up.

DRY RUN TAPIRS

Make sure you are in the top-level directory containing the snakefile, then type snakemake -npr or snakemake -s snakefile --printshellcmds -n -k to dry-run the workflow.

If all has gone well, Snakemake will list the jobs it needs to perform without any complaint. If not (which is common), you will need to diagnose and fix what are usually minor issues; the problem-solving documentation may help. Some errors only appear in the real run, not the dry run. These often concern the format of data files, because a dry run does not check file contents.
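The dry run can be wrapped in a small shell function that first checks you are in the right directory. This is a minimal POSIX sh sketch: the function name tapirs_dry_run is hypothetical and not part of Tapirs, and it simply runs the second dry-run command given above.

```sh
# Hypothetical helper (not part of Tapirs): dry-run the workflow from the
# top-level directory, refusing to run if no snakefile is present.
tapirs_dry_run() {
  for sf in snakefile Snakefile; do
    if [ -f "$sf" ]; then
      # --printshellcmds: show each job's shell command
      # -n: dry run, nothing is executed
      # -k: --keep-going, as in the command in the text
      snakemake -s "$sf" --printshellcmds -n -k
      return
    fi
  done
  echo "error: no snakefile in $PWD; cd to the Tapirs top-level directory first" >&2
  return 1
}
```

Call tapirs_dry_run from the directory you would normally run Snakemake in; it exits with Snakemake's own status, so a non-zero result means there is something to fix.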

RUN TAPIRS

Run Tapirs with the command snakemake --printshellcmds --cores 4. The number of cores can be set to suit your machine; even basic machines should have 4.

Tapirs should now run: processing the data, assigning taxonomy with BLAST and Kraken2, and writing reports.

When it finishes, ask Snakemake to write a report with the command:

snakemake --report reports/snakemake_report.html
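The run and report steps can be chained so that the report is only written when the workflow succeeds. This is a minimal POSIX sh sketch: tapirs_run is a hypothetical helper, and the core detection (nproc on Linux, sysctl on macOS, otherwise the 4 cores suggested above) is an assumption about your platform, not something Tapirs does itself.

```sh
# Hypothetical helper (not part of Tapirs): run the workflow, then write the
# Snakemake HTML report only if the run succeeded.
# Usage: tapirs_run [cores]
tapirs_run() {
  cores="${1:-}"
  if [ -z "$cores" ]; then
    # Detect the CPU count; fall back to 4 if neither tool is available.
    cores=$(nproc 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 4)
  fi
  # && means the report step is skipped if the workflow fails.
  snakemake --printshellcmds --cores "$cores" &&
    snakemake --report reports/snakemake_report.html
}
```

For example, tapirs_run 8 runs on 8 cores, while tapirs_run with no argument uses the detected core count.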

EXCLUDE ANALYSES

If you wish to run Tapirs without invoking one of the analysis programs (e.g. Kraken2 or BLAST), you can specify this in the config file.

REMOVING FILES FROM PREVIOUS RUNS

Snakemake can clean up files it has previously created. This is useful if you want to remove reports and intermediate results from previous runs before starting a new run. The Snakemake docs have an FAQ on cleaning files; in short, try snakemake some_target --delete-all-output --dry-run. The --dry-run flag shows what would be removed without deleting anything. If the list looks right, rerun the command without --dry-run.

We strongly recommend performing a --dry-run first, as --delete-all-output is as dangerous to your results as it sounds.
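The recommended "dry run first" habit can be built into a helper that previews the deletion and asks for confirmation before removing anything. This is a minimal POSIX sh sketch: tapirs_clean is a hypothetical helper, not part of Tapirs or Snakemake, and any arguments (such as some_target) are passed straight through to Snakemake.

```sh
# Hypothetical helper (not part of Tapirs): preview, confirm, then delete
# all output files Snakemake previously created.
# Usage: tapirs_clean [some_target ...]
tapirs_clean() {
  # Step 1: dry run, listing what --delete-all-output would remove.
  snakemake "$@" --delete-all-output --dry-run || return
  # Step 2: ask before doing anything destructive (default is No).
  printf 'Delete all of the files listed above? [y/N] ' >&2
  read -r answer
  case "$answer" in
    [yY]) snakemake "$@" --delete-all-output ;;
    *) echo "Nothing deleted." >&2 ;;
  esac
}
```

Anything other than y or Y at the prompt, including just pressing Enter, leaves your files untouched.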

REFERENCES

Altschul, S. F. et al. (1990) ‘Basic local alignment search tool’, Journal of molecular biology, 215(3), pp. 403–410. doi: 10.1016/S0022-2836(05)80360-2

Chen, S. et al. (2018) ‘fastp: an ultra-fast all-in-one FASTQ preprocessor’, Bioinformatics, 34(17), pp. i884–i890. doi: 10.1093/bioinformatics/bty560

McDonald, D. et al. (2012) ‘The Biological Observation Matrix (BIOM) format or: how I learned to stop worrying and love the ome-ome’, GigaScience, 1, p. 7. doi: 10.1186/2047-217X-1-7

Ondov, B. D., Bergman, N. H. and Phillippy, A. M. (2011) ‘Interactive metagenomic visualization in a Web browser’, BMC bioinformatics, 12, p. 385. doi: 10.1186/1471-2105-12-385

Rognes, T. et al. (2016) ‘VSEARCH: a versatile open source tool for metagenomics’, PeerJ, 4, p. e2584. doi: 10.7717/peerj.2584

Wood, D. E., Lu, J. and Langmead, B. (2019) ‘Improved metagenomic analysis with Kraken 2’, Genome biology, 20(1), p. 257. doi: 10.1186/s13059-019-1891-0