MrHAMER2

Overview

MrHAMER2 is a pipeline for generating of high accuracy single molecule Nanopore reads. The pipeline accepts FASTQ-format sequence files as input and outputs a multi-FASTQ of corrected non-chimeric reads, aligned reads and QC stats.

This is a heavily customized and enhanced version of https://github.com/nanoporetech/pipeline-umi-amplicon from Oxford Nanopore Technologies.

The pre-print is available at https://www.biorxiv.org/content/10.1101/2024.12.19.629526

Features

The pipeline performs the following steps:

Reads are mapped to reference genome using minimap2
Separate into amplicons
Extract UMI sequences for all reads
Cluster UMI sequences per amplicon using vsearch and compute high accuracy consensus reads
Remove PCR-derived chimeras using Chimera_Buster (optional)

Getting Started

This pipeline has only been tested on Ubuntu 20.04.5 LTS.

Requirements

The following software packages must be installed prior to running:

miniconda3 - please refer to installation instructions.

Installation

After installing miniconda3, install the pipeline as follows:

git clone https://github.com/gallardo-seq/MrHAMER2

Change to directory:

cd MrHAMER2

Create conda environment with all dependencies:

conda env create -f environment.yml

Activate environment:

conda activate MrHAMER2

Install Auxiliary Python Scripts

cd lib && pip install . && cd ..

Copy modified Medaka files provided by MrHAMER2 to conda environment:

cp medaka_mod/* $CONDA_PREFIX/lib/python3.8/site-packages/medaka

Deactivate environment:

conda deactivate

Testing

To test if the installation was successful run:

conda activate MrHAMER2
snakemake -j 4 --configfile config.yml

Usage

In order for the pipeline to run, the following files/folders must be in the working directory:

File	Location in MrHAMER2 directory
Snakefile	./Snakefile
config.yml	./config.yml

Input

To run the pipeline the following input files are required:

Input	Description
Long-read sequences	Folder containing FASTQ files or a single concatenated FASTQ file.
Reference genome	FASTA file containing the reference genome.
Target BED file	A BED file containing the chromosome, start and end coordinate and the name of all amplicons.
Medaka Model file	A tar.gz file that provides the model for Medaka processing. NOTE: The medaka model file for R10.4 chemistry with e8.2 pores at 400 bps speed is provided. For more information on Medaka models, see medaka model info. If a different model is needed, please download from medaka model download.

Config File

Before starting the pipeline, confirm the following varibles are correct in the config.yml file:

Input	Description
sample_name	The name that the output folders/files will use.
input_fastq	The location of the long-read sequences fastq file.
reference_fasta	The location of the reference genome fasta file.
targets_bed	The location of the target BED file.
You can also adjust the optional parameters listed below:
Parameter	Description
-------	-------------
chimera_buster_on	If set to true, the Chimera_Buster module will be run at the end of the pipeline and will output a fastq file of nonchimeric corrected reads.
chimera_buster_MT	Designates the maximum number of mismatched/indel bases allowed when comparing UMIs. Changing this value will change the specificity and throughput of the Chimera_Buster module. Default is 1.
chimera_buster_HS	When set to true, chimeric reads are rechecked to account for any clustering issues earlier in the pipeline. WARNING: this part of the code is slow and is only recommended for low input samples. Default is False.
minimap2_param	This sets the parameters that Minimap2 will use throughout the pipeline. Please refer to the manual for Minimap2 linked above for more details on the availble parameters.
umi_errors	Designates the maximum differences between UMI in read and UMI pattern.
min_reads_per_cluster	Minimum number of reads required for a consensus read.
max_reads_per_cluster	Maximum number of 1D reads used for a consensus read.
min_overlap	Minimum overlap with target region.
balance_strands	If set to True, the pipeline inforces balancing of forward and reverse 1D reads in clusters.
medaka_model	Location of Medaka model used to compute consensus reads.
fwd_context	Sequence of tail on 5' end of the forward UMI primer.
rev_context	Sequence of tail on 5' end of the reverse UMI primer.
fwd_umi	Pattern of the UMI in the forward UMI primer.
rev_umi	Pattern of the UMI in the reverse UMI primer.
min_length	Minimum combined UMI length.
max_length	Maximum combined UMI length.

Running the pipeline

Once the config.yml is updated with the correct information and the MrHAMER2 environment is activated, simply run the following command:

snakemake -j 32 --configfile config.yml

-j specifies how many threads will be used by the pipeline.

Outputs

The main output file created by the pipeline is:

Output	Description
{sample_name}_{target}_final.fastq	A multi-fastq of all corrected non-chimeric reads.

After the a pipeline analysis has completed, these files can be found at {sample_name}/{sample_name}_{target}_final.fastq e.g. MrHAMER2_test/MrHAMER2_test_Env_final.fastq.

Help

For issues or bugs, please report them on the issues page.

Name		Name	Last commit message	Last commit date
Latest commit History 13 Commits
lib		lib
medaka_mod		medaka_mod
test_files		test_files
LICENSE.txt		LICENSE.txt
MrHAMER2-logo.png		MrHAMER2-logo.png
README.md		README.md
Snakefile		Snakefile
config.yml		config.yml
environment.yml		environment.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

MrHAMER2

Overview

Features

Getting Started

Requirements

Installation

Testing

Usage

Input

Config File

Running the pipeline

Outputs

Help

License

About

Releases

Packages

Languages

License

gallardo-seq/MrHAMER2

Folders and files

Latest commit

History

Repository files navigation

MrHAMER2

Overview

Features

Getting Started

Requirements

Installation

Testing

Usage

Input

Config File

Running the pipeline

Outputs

Help

License

About

Resources

License

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages