Genomics_WS2023

Installing Linux

You may either dual boot your existing system or install Linux in a virtual machine using VirtualBox

  • Dual booting Windows and Linux - Tutorial - A bit complicated for beginners, but worth the hassle
  • Installing Ubuntu on VirtualBox - Video - This will be slow. Do not run memory-intensive programs.

If you want to install Anaconda instead of Mambaforge, the above video has instructions.

To install Mambaforge, continue with this document.

Mambaforge

An alternative to Anaconda

  • Download and run the install_mamba.sh script

    # Make the `.sh` file executable, if required
    $ chmod +x install_mamba.sh
    # Run the file
    $ ./install_mamba.sh
  • This downloads the Mambaforge installer, runs it, and adds a .condarc file to your home directory

  • Follow the on-screen instructions during installation

  • Restart the terminal

Check the installation

$ mamba env list
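If the command above fails with "command not found", the shell has probably not picked up the new installation yet. A minimal sketch for checking this is shown below; `have_cmd` is a hypothetical helper name and is not part of install_mamba.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of install_mamba.sh):
# succeeds if the named command can be found by the current shell.
have_cmd() {
    command -v "$1" >/dev/null 2>&1
}

# Report whether mamba is visible in this shell session.
if have_cmd mamba; then
    echo "mamba found at: $(command -v mamba)"
else
    echo "mamba not found; restart the terminal and try again"
fi
```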

Installing Packages

We will be installing 21 packages along with their dependencies in 16 environments

  • Download install_chk.txt - a list of environments to check against during installation
  • Download install_tools.sh
  • Move both files to your home directory and run install_tools.sh

If you want to install the packages manually, open install_tools.sh in a text editor (such as gedit) to see the commands that create each environment and install its tools. For example, to install fastqc and bbduk (which is part of the bbmap package) in the qc environment, run the following command:

$ mamba create -n qc -c bioconda fastqc bbmap -y

Or you can create the environment first and install the tools later

$ mamba create -n qc
# Activate the qc environment
$ mamba activate qc
(qc)$ mamba install -c bioconda fastqc bbmap
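After either route, it can help to confirm that the tools are visible once the environment is active. The sketch below assumes the bioconda packages put `fastqc` and `bbduk.sh` on the PATH; `missing_tools` is a hypothetical helper name, not something provided by install_tools.sh.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every command name from the arguments
# that cannot be found on PATH, one per line.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Run this inside the activated environment (after `mamba activate qc`).
missing=$(missing_tools fastqc bbduk.sh)
if [ -z "$missing" ]; then
    echo "all tools found"
else
    # $missing is left unquoted on purpose so the names print on one line
    echo "missing tools:" $missing
fi
```

Run outside the qc environment, this will normally report both tools as missing, which is expected.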

You can search for the available packages at this site

Remember to use mamba instead of conda if you have installed Mambaforge.

CheckM installation takes longer, so it is done separately.

  • Open another terminal window
  • Download install_checkm.sh
  • Move install_checkm.sh to your home directory and run it
  • Do not run CheckM unless you have at least 40 GB of swap space
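To see how much swap your system has before running CheckM, you can read the `SwapTotal` line of `/proc/meminfo`. The following is a rough sketch; the helper names `swap_gib` and `enough_swap` are hypothetical, and it treats the 40 GB figure above as 40 GiB.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print total swap in whole GiB, read from a
# meminfo-style file (defaults to /proc/meminfo on Linux).
swap_gib() {
    local meminfo="${1:-/proc/meminfo}"
    # SwapTotal is reported in kB; 1 GiB = 1048576 kB.
    awk '/^SwapTotal:/ { print int($2 / 1048576) }' "$meminfo"
}

# Hypothetical helper: succeeds only if swap is at least $1 GiB.
# An optional second argument names the meminfo file to read.
enough_swap() {
    local have
    have=$(swap_gib "${2:-/proc/meminfo}")
    [ "${have:-0}" -ge "$1" ]
}

if [ -r /proc/meminfo ]; then
    if enough_swap 40; then
        echo "swap OK: $(swap_gib) GiB"
    else
        echo "only $(swap_gib) GiB of swap; do not run CheckM yet"
    fi
fi
```

The commands `free -h` and `swapon --show` report the same information interactively.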

Check the installations
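One possible way to do this is to compare the output of `mamba env list` with install_chk.txt. The sketch below is only an illustration: it assumes install_chk.txt holds one environment name per line (its real format may differ) and that `mamba env list` prints the conda-style listing with the environment name in the first column. `missing_envs` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every name in the checklist file ($1,
# assumed to hold one environment name per line) that does not appear
# in the `mamba env list` style text supplied on standard input.
missing_envs() {
    local checklist="$1"
    local have name
    # First column of every non-comment, non-blank line of the listing.
    have=$(awk '!/^#/ && NF { print $1 }')
    while read -r name _ || [ -n "$name" ]; do
        [ -z "$name" ] && continue
        # -F fixed string, -x whole line, -q quiet
        printf '%s\n' "$have" | grep -Fxq -- "$name" || echo "$name"
    done < "$checklist"
}

# Only runs when both mamba and the checklist are available.
if command -v mamba >/dev/null 2>&1 && [ -f "$HOME/install_chk.txt" ]; then
    mamba env list | missing_envs "$HOME/install_chk.txt"
fi
```

No output means every environment in the checklist was found.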


Other Optional Packages

$ sudo apt install rename
$ sudo apt install pigz

Datasets

Genomes

  1. Helicobacter pylori

  2. Lactobacillus apis

Amplicons

  1. UMV
  2. Indian

Directory structure

Download ws_jul2023.zip and extract it to your home directory

ws_jul2023
|- ampliseq
   |- raw_reads
   |- manifest_1
   |- manifest_2
   |- metadata_1.tsv
   |- metadata_all.tsv
   |- silva138_AB_V4_classifier.qza
|- bga
   |- raw_reads
   |- ref_genome
   |- bb_adapters.fa
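To confirm that the archive was extracted as expected, the layout above can be checked with a short script. This is a sketch that assumes the folder sits at `~/ws_jul2023`; `missing_paths` is a hypothetical helper name, and the path list is copied from the tree above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print each expected path that does not exist
# under the given workshop directory (default: ~/ws_jul2023).
missing_paths() {
    local root="${1:-$HOME/ws_jul2023}"
    local rel
    for rel in \
        ampliseq/raw_reads ampliseq/manifest_1 ampliseq/manifest_2 \
        ampliseq/metadata_1.tsv ampliseq/metadata_all.tsv \
        ampliseq/silva138_AB_V4_classifier.qza \
        bga/raw_reads bga/ref_genome bga/bb_adapters.fa
    do
        # -e accepts files and directories alike
        [ -e "$root/$rel" ] || echo "$rel"
    done
}

# Prints nothing when the layout is complete.
missing_paths "$HOME/ws_jul2023"
```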

Basic Linux Commands

Try this document or search online for Bash tutorials




Scripts made by: Abhishek Khatri
Document prepared by: Anwesh Maile