metamage is a workflow for taxonomic classification, assembly, binning and annotation of short-read host-associated metagenomics datasets.

    graph TD;
        reads[(Short-read paired-end metagenomics data)]-->hostread(Trimming and host read removal)
        hostread-->|Reads| tax(Taxonomic classification with Kaiju)
        hostread-->|Reads| assem(Assembly with MEGAHIT)
        assem-.->|Assembled contigs| metaq(MetaQuast evaluation)
        assem-->|Assembled contigs| func(Functional annotation)
        assem-->|Assembled contigs| binprep(Binning preparation)
        hostread-->|Reads| binprep
        assem-->|Assembled contigs| bin(Binning with MetaBAT2)
        binprep-->|Depth file| bin
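
The edges of the diagram can also be written down as a dependency table, which makes it easy to check the order in which steps can run. The following is a minimal sketch, not part of metamage itself: the step names are hypothetical labels chosen for this example, and only the dependencies are taken from the diagram.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps whose output it consumes.
# Step names are hypothetical labels; the edges follow the diagram.
DEPENDENCIES = {
    "trim_and_remove_host": set(),
    "kaiju_classification": {"trim_and_remove_host"},
    "megahit_assembly": {"trim_and_remove_host"},
    "metaquast_evaluation": {"megahit_assembly"},
    "functional_annotation": {"megahit_assembly"},
    "binning_preparation": {"megahit_assembly", "trim_and_remove_host"},
    "metabat2_binning": {"megahit_assembly", "binning_preparation"},
}


def execution_order(dependencies):
    """Return one valid order in which the steps could be run."""
    return list(TopologicalSorter(dependencies).static_order())


if __name__ == "__main__":
    for step in execution_order(DEPENDENCIES):
        print(step)
```

Several orders are valid, since taxonomic classification, assembly evaluation and functional annotation do not depend on each other; the sketch returns just one of them.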

It's composed of:

Read pre-processing and host read removal

  • fastp for read trimming and other general pre-processing 1
  • Bowtie2 for mapping to the host genome and extracting unaligned reads 2
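
To illustrate what this stage involves, the sketch below assembles command lines for fastp, bowtie2-build and bowtie2. It is an assumption about how such a step could be driven, not the invocation metamage actually uses: the function name, file names and default thread count are hypothetical, and only the directory names follow the output tree in this README.

```python
import os
from pathlib import Path


def preprocessing_commands(sample, reads_1, reads_2, host_fasta, out_dir, threads=4):
    """Build (but do not run) trimming and host-removal commands.

    All file names below are hypothetical; adjust them to your data.
    """
    out = Path(out_dir)
    fastp_dir = out / "fastp_results"
    trimmed_1 = fastp_dir / f"{sample}_trimmed_1.fastq.gz"
    trimmed_2 = fastp_dir / f"{sample}_trimmed_2.fastq.gz"
    index_prefix = out / f"{sample}_bt_idx" / "host"
    # bowtie2 replaces '%' with 1 and 2 for the two mates.
    unaligned = out / f"{sample}_bt_unaligned" / f"{sample}_unaligned_%.fastq.gz"

    fastp = [
        "fastp",
        "-i", reads_1, "-I", reads_2,
        "-o", str(trimmed_1), "-O", str(trimmed_2),
        "--json", str(fastp_dir / f"{sample}.json"),
        "--html", str(fastp_dir / f"{sample}.html"),
        "--thread", str(threads),
    ]
    build_index = ["bowtie2-build", host_fasta, str(index_prefix)]
    remove_host = [
        "bowtie2",
        "-p", str(threads),
        "-x", str(index_prefix),
        "-1", str(trimmed_1), "-2", str(trimmed_2),
        # Keep only pairs that fail to align concordantly to the host.
        "--un-conc-gz", str(unaligned),
        # The alignments themselves are not needed here.
        "-S", os.devnull,
    ]
    return [fastp, build_index, remove_host]
```

Each returned list can be passed to `subprocess.run(cmd, check=True)` once the output directories exist.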


Functional annotation

  • Macrel for predicting Antimicrobial Peptide (AMP)-like sequences from contigs 4
  • fARGene for identifying Antimicrobial Resistance Genes (ARGs) from contigs 5
  • GECCO for predicting biosynthetic gene clusters (BGCs) from contigs 6
  • Prodigal for protein-coding gene prediction from contigs 7
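
All four tools take the assembled contigs as input, so the annotation stage can be pictured as one FASTA file fanned out to four commands. The sketch below is an illustration under assumptions rather than metamage's own code: the function name, the output file names and the choice of the `class_a` fARGene model are hypothetical, and only the result directory names come from the output tree in this README.

```python
from pathlib import Path


def annotation_commands(contigs, out_dir, fargene_model="class_a"):
    """Build (but do not run) one command per annotation tool.

    The fARGene model and the Prodigal output file names are
    assumptions made for this example.
    """
    out = Path(out_dir)
    prodigal_dir = out / "prodigal_results"
    return {
        "macrel": [
            "macrel", "contigs",
            "--fasta", contigs,
            "--output", str(out / "macrel_results"),
        ],
        "fargene": [
            "fargene",
            "-i", contigs,
            "--hmm-model", fargene_model,
            "-o", str(out / "fargene_results"),
        ],
        "gecco": [
            "gecco", "run",
            "--genome", contigs,
            "-o", str(out / "gecco_results"),
        ],
        "prodigal": [
            "prodigal",
            "-p", "meta",  # metagenomic mode, suited to mixed contigs
            "-i", contigs,
            "-a", str(prodigal_dir / "proteins.faa"),
            "-d", str(prodigal_dir / "genes.fna"),
            "-o", str(prodigal_dir / "genes.gff"),
            "-f", "gff",
        ],
    }
```

Prodigal writes to files rather than to a directory, so `prodigal_results` would need to exist before that command is run.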


Taxonomic classification of reads

  • Kaiju for taxonomic classification 10
  • KronaTools for visualizing taxonomic classification results

Output tree

  • |metamage
    • |{sample_name}
      • |{sample_name}_bt_idx - Host genome Bowtie2 index
      • |{sample_name}_bt_unaligned - Reads that didn't align to the host genome
      • |fastp_results - Results from trimming with fastp
      • |kaiju
      • |MEGAHIT
      • |MetaQuast - Assembly evaluation report
      • |{sample_name}_assembly_idx - Bowtie2 index built from the assembled contigs
      • |{sample_name}_assembly_sorted.bam - Reads aligned to assembly contigs
      • |METABAT
      • |fargene_results
      • |gecco_results
      • |macrel_results
      • |prodigal_results
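
A layout like this lends itself to a quick completeness check after a run. The following is a minimal sketch, assuming the tree above is complete and that `metamage_dir` points at the top-level `metamage` directory; both function names are hypothetical and not part of the workflow.

```python
from pathlib import Path


def expected_outputs(metamage_dir, sample):
    """List the per-sample paths named in the output tree."""
    sample_dir = Path(metamage_dir) / sample
    names = [
        f"{sample}_bt_idx",
        f"{sample}_bt_unaligned",
        "fastp_results",
        "kaiju",
        "MEGAHIT",
        "MetaQuast",
        f"{sample}_assembly_idx",
        f"{sample}_assembly_sorted.bam",
        "METABAT",
        "fargene_results",
        "gecco_results",
        "macrel_results",
        "prodigal_results",
    ]
    return [sample_dir / name for name in names]


def missing_outputs(metamage_dir, sample):
    """Return the expected paths that do not exist on disk."""
    return [p for p in expected_outputs(metamage_dir, sample) if not p.exists()]
```

For example, `missing_outputs("metamage", "sampleA")` returns an empty list only when every entry of the tree is present for `sampleA`.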

Where to get the data?

  • Kaiju indexes can be generated from a reference database, but pre-built ones are also available in the sidebar of the Kaiju website.

  • Reference host genomes can be acquired from a variety of databases, for example Ensembl.


