Day 1 - Bacterial Genome Analysis
Raw reads QC
Check the quality with
FastQC
Navigate to
bga
directory and activateqc
envOpen FastQC GUI. Analyze and save the reports
Check the Basic statistics, Per base quality, Sequence length distribution, Overrepresented sequences and Adapter content sections
Make a directory
1_raw_qc
and save results to it
BBDuk
Copied adapters.fa from
~/mambaforge/envs/qc/opt/bbmap-39.01-0/resources/adapters.fa
to current directory as bb_adapters.fa
Run
bbduk.sh
(qc)$ mkdir 2_bb_out (qc)$ bbduk.sh -Xmx4g in1=raw_reads/hpy_a45_1.fastq.gz \ in2=raw_reads/hpy_a45_2.fastq.gz \ out1=2_bb_out/hpy_a45_1.fastq.gz \ out2=2_bb_out/hpy_a45_2.fastq.gz \ ref=bb_adapters.fa \ k=23 mink=7 ktrim=r hdist=1 qtrim=r trimq=20 minlen=100 tpe tbo
Expected result
Input: 1741880 reads 436749684 bases. QTrimmed: 1521272 reads (87.34%) 103531426 bases (23.70%) KTrimmed: 394948 reads (22.67%) 13236420 bases (3.03%) Trimmed by overlap: 8601 reads (0.49%) 88509 bases (0.02%) Total Removed: 170588 reads (9.79%) 116856355 bases (26.76%) Result: 1571292 reads (90.21%) 319893329 bases (73.24%)
Open FastQC GUI. Analyze and save the reports
Trimmomatic (Alternative)
Run
trimmomatic
. You can use the same adapters file(qc)$ mkdir trim_out (qc)$ cd trim_out (qc)$ trimmomatic PE -phred33 \ ../raw_reads/hpy_a45_1.fastq.gz \ ../raw_reads/hpy_a45_2.fastq.gz \ hpy_a45_1_paired.fq.gz hpy_a45_1_unpaired.fq.gz \ hpy_a45_2_paired.fq.gz hpy_a45_2_unpaired.fq.gz \ ILLUMINACLIP:../bb_adapters.fa:2:30:10 \ SLIDINGWINDOW:4:20 MINLEN:100
Expected result
Input Read Pairs: 870940 Both Surviving: 5 99798 (68.87%) Forward Only Surviving: 160897 (18.47%) Reverse Only Surviving: 28249 (3.24%) Dropped: 81996 (9.41%)
Open FastQC GUI. Analyze and save the reports
After filtering the raw reads, you can choose either of the following methods depending on the availability of a reference genome or intra-species variations.
Basically, if you have a reference genome and do not expect much variation from it, then the reads can be mapped to the reference. Else, de novo assembly is preferred.
Mapping to a Reference Genome
Index the reference genome sequence
Align Reads separately
Create SAM file and convert it to BAM
Sort and index BAM file
Creating a consensus
Convert to fasta & change the identifier
Exporting unmapped reads (Optional)
Check the rRNA Genes (Optional)
(bam2fastq)$ mamba activate annotator (annotator)$ barrnap -o cons_rrna.fa < hpy_a45_consensus.fa > cons_rrna.gff
Pre-requisites | QC & mapping | de novo | Ampliseq
````