INSTALLATION
Note
These documents assume a Unix-like system such as OSX or Linux. On MS Windows it may be best to work in a Linux virtual machine.
Although you can install and run Tapirs without too many steps, you will need some very basic knowledge of the command line. A basic knowledge of Snakemake will help you to modify and configure Tapirs. Snakemake is a relatively easy workflow manager to learn, but we recommend that you familiarise yourself with it, perhaps by working through its tutorial.
Install miniconda
Conda (miniconda, Anaconda) is a package and environment manager and is required here to install software and their dependencies. If you choose to install Anaconda (instead of miniconda) it will also install a lot of scientific software packages. Miniconda is much more lightweight and is our recommended option.
Follow the installation instructions for Miniconda for your operating system.
- If unsure on OSX choose "Miniconda3 MacOSX 64-bit pkg" (or similar name ending in pkg) as this gives a typical Apple install gui.
git clone Tapirs
Apple OSX and Linux should both come with git already installed. At the command line type git --version and you should see the version number. If instead it reports command not found: git or similar then git is not installed. You can go to the git website for installation advice, or it may be slightly easier to run conda install git at the command line.
If you have git installed then clone the repository to your local machine with git clone https://github.com/EvoHull/Tapirs.git
You could alternatively download the repository from the Tapirs github repository using the green "Clone or download" button. Then expand the zip file and navigate into the directory.
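The git check described above can also be scripted. This is a minimal sketch, assuming a hypothetical helper named have_tool (it is not part of Tapirs) that reports whether a command is on your PATH:

```shell
# have_tool NAME: succeed if NAME can be found on the PATH.
# (have_tool is a hypothetical helper for illustration, not part of Tapirs.)
have_tool() {
    command -v "$1" >/dev/null 2>&1
}

if have_tool git; then
    # Print the installed version, as the manual check does
    git --version
    echo "git found; clone with: git clone https://github.com/EvoHull/Tapirs.git"
else
    echo "git not found; try: conda install git"
fi
```

The sketch only reports what to do next; it does not clone or install anything itself.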
Create a working environment
It is best practice to install the software you require for a specific software project (e.g. Tapirs) in a dedicated 'environment'. This environment will contain only the software you choose for this project, which should help to avoid software conflicts. Conda is the tool for creating and using software environments.
When snakemake runs it can be told to create separate environments to optimally run each rule (a "sequence-quality-control" environment, a "blast" environment, a "kraken2" environment). Currently however the software requirements for Tapirs are relatively straightforward and we have everything in a single "tapirs" environment. The software required in this environment is specified in the workflow/envs/env.yaml file (we also have an alternative tapirs.yaml for development). Clearly specifying software in this way is an important component of reproducibility.
The first time you run Tapirs it can take a while (>10 minutes) to download software and create the environment; subsequent runs will not require this step. We recommend that you create and populate this environment before you start running Tapirs.
Install all software from the env.yaml list
Although you can install software packages one at a time, this is not very efficient. Instead we have created a list of the required software in a text file called env.yaml in the workflow/envs/ directory. Conda can be told to create or update your environment by reading the list of software packages in this file.
Install all required software now:
conda env create --file workflow/envs/env.yaml
This will take a few minutes to install all the required software and its dependencies. If in future you modify the environment file you can update the environment with:
conda env update -f workflow/envs/env.yaml
Accept the defaults (Yes) for any install questions you are asked.
You will need to make sure that the 'tapirs' environment is active, or else the required software will not be available to be run by snakemake.
conda activate tapirs
If you are unsure what environments you have, and which is active, you can run:
conda info --envs
If you get errors when running Tapirs suggesting that some "software-name" is unknown, it is most likely an issue with environments. Start by checking that the "tapirs" environment is active.
The Snakemake workflow manager software was listed in the env.yaml file and has already been installed if you have carried out the instructions above. You could test this with a snakemake --help command. If you get an error such as command not found: snakemake it is likely that the tapirs environment is not active, try: conda activate tapirs
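Checking for the active environment can be scripted as well. The sketch below relies on the CONDA_DEFAULT_ENV variable, which conda sets to the name of the active environment when you run conda activate. The helper name env_is_active is hypothetical and not part of Tapirs:

```shell
# env_is_active NAME: succeed if the currently active conda environment is NAME.
# conda exports CONDA_DEFAULT_ENV when an environment is activated.
# (env_is_active is a hypothetical helper for illustration, not part of Tapirs.)
env_is_active() {
    [ "${CONDA_DEFAULT_ENV:-}" = "$1" ]
}

if env_is_active tapirs; then
    echo "tapirs environment is active"
else
    echo "tapirs environment is not active; run: conda activate tapirs"
fi
```

A check like this could be placed at the top of any script that calls snakemake, so that a missing environment is reported before the workflow starts.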
Testing the Installation
You should now have installed all the software required for your analysis. You will also require some data and reference databases. Below we give instructions for running Tapirs with test data and databases in order to verify the installation. Subsequently you will remove the outputs of the test and repeat with your own data and databases.
Install the taxonomy data
The three commands below will download and expand the taxonomy data
wget ftp://ftp.ncbi.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.zip
unzip new_taxdump.zip -d resources/databases/new_taxdump
rm new_taxdump.zip
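wget is not installed on every system (OSX, for example, ships curl but not wget). A minimal sketch of a fallback, assuming a hypothetical helper named fetch that is not part of Tapirs:

```shell
# fetch URL: download URL into the current directory with wget or curl.
# (fetch is a hypothetical helper for illustration, not part of Tapirs.)
fetch() {
    if command -v wget >/dev/null 2>&1; then
        wget "$1"
    elif command -v curl >/dev/null 2>&1; then
        # -O saves the file under its remote name, as wget does by default
        curl -O "$1"
    else
        echo "neither wget nor curl found; install one of them first" >&2
        return 1
    fi
}

# Usage, followed by the same unzip and rm steps as above:
# fetch ftp://ftp.ncbi.nih.gov/pub/taxonomy/new_taxdump/new_taxdump.zip
```

Either tool leaves new_taxdump.zip in the current directory, so the unzip and rm commands are unchanged.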
Dry run Tapirs
We have provided test data
- a list of samples in config/hull_test.tsv
- blast and kraken2 databases in resources/databases/
- sequence data in resources/libraries
Dry run Tapirs with:
snakemake -npr
If you don't get any obvious errors but rather a (yellow) list of jobs to be carried out then you should be ready to go. Run the analysis with:
snakemake --cores 4
You can omit the number to run it on all available cores. This will begin the analysis of the test data, which should take only a couple of minutes to complete. Again examine the output for obvious errors (usually in red text), but everything is likely fine if it completes with something like:
Finished job 0.
29 of 29 steps (100%) done
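The dry run and the real run can be chained so that the analysis only starts when the dry run succeeds. A minimal sketch, assuming a hypothetical wrapper named run_tapirs (not part of Tapirs) that defaults to the 4 cores used above:

```shell
# run_tapirs [CORES]: dry run first, then run for real only if the dry run succeeds.
# (run_tapirs is a hypothetical wrapper for illustration, not part of Tapirs.)
run_tapirs() {
    cores="${1:-4}"
    if snakemake -npr; then
        snakemake --cores "$cores"
    else
        echo "dry run failed; fix the reported errors before running" >&2
        return 1
    fi
}

# Usage, from the Tapirs directory with the tapirs environment active:
# run_tapirs 4
```

The wrapper returns a non-zero status when the dry run fails, so it can be used safely inside larger scripts.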
Remove test output and prepare to run your own data
The output from any snakemake run can be cleaned up with --delete-all-output, but we recommend a dry run first to see what it will delete:
snakemake --delete-all-output -n
It should list files in the results directory, and you can delete them all with
snakemake --delete-all-output
Instructions on which files are required to run your own data are provided on the Tapirs setup page.