diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml index 69c6579..fd92937 100644 --- a/.github/workflows/ci.yml +++ b/.github/workflows/ci.yml @@ -75,10 +75,10 @@ jobs: - name: Check out pipeline code uses: actions/checkout@0ad4b8fadaa221de15dcec353f45205ec38ea70b # v4 - - name: Set up Nextflow + - name: Install Nextflow uses: nf-core/setup-nextflow@v2 with: - version: "${{ matrix.NXF_VER }}" + version: "${{ matrix.NXF_VER }}" - name: Set up Apptainer if: matrix.profile == 'singularity' diff --git a/CITATIONS.md b/CITATIONS.md index 04d8ed4..a783d75 100644 --- a/CITATIONS.md +++ b/CITATIONS.md @@ -10,13 +10,41 @@ ## Pipeline tools -- [FastQC](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) +- [BaSiCPy](https://basicpy.readthedocs.io/en/stable/) -> Andrews, S. (2010). FastQC: A Quality Control Tool for High Throughput Sequence Data [Online]. + > Peng, T., Thorn, K., Schroeder, T., Wang, L., Theis, F. J., Marr, C., Navab, N. A BaSiC Tool for Background and Shading Correction of Optical Microscopy Images. Nature Communications 2017 June 08; 8(1):14836. doi: [10.1038/ncomms14836](http://doi.org/10.1038/ncomms14836). + +- [ASHLAR](https://labsyspharm.github.io/ashlar/) + + > Muhlich, J. L., Chen, Y., Yapp, C., Russell, D., Santagata, S., Sorger, P. K. Stitching and registering highly multiplexed whole-slide images of tissues and tumors using ASHLAR. Bioinformatics 2022 October; 38(19):4613–4621. doi: [10.1093/bioinformatics/btac544](https://doi.org/10.1093/bioinformatics/btac544). + +- [Backsub](https://github.com/SchapiroLabor/Background_subtraction) + + > Schapiro, D., Sokolov, A., Yapp, C. et al. MCMICRO: a scalable, modular image-processing pipeline for multiplexed tissue imaging. Nat Methods 2022; 19:311–315. doi: [10.1038/s41592-021-01308-y](https://doi.org/10.1038/s41592-021-01308-y) + +- [Coreograph](https://github.com/HMS-IDAC/UNetCoreograph) + + > Schapiro, D., Sokolov, A., Yapp, C. et al.
MCMICRO: a scalable, modular image-processing pipeline for multiplexed tissue imaging. Nat Methods 2022; 19:311–315. doi: [10.1038/s41592-021-01308-y](https://doi.org/10.1038/s41592-021-01308-y) + +- [Cellpose](https://cellpose.readthedocs.io/en/latest/index.html) + + > Pachitariu, M., Stringer, C. Cellpose 2.0: how to train your own model. Nat Methods 2022; 19:1634–1641. doi: [10.1038/s41592-022-01663-4](https://doi.org/10.1038/s41592-022-01663-4) + +- [Mesmer](https://deepcell.readthedocs.io/en/master/) + + > Greenwald, N.F., Miller, G., Moen, E. et al. Whole-cell segmentation of tissue images with human-level performance using large-scale data annotation and deep learning. Nat Biotechnol 2022; 40:555–565. doi: [10.1038/s41587-021-01094-0](https://doi.org/10.1038/s41587-021-01094-0) + +- [MCQuant](https://github.com/labsyspharm/quantification) + + > Schapiro, D., Sokolov, A., Yapp, C. et al. MCMICRO: a scalable, modular image-processing pipeline for multiplexed tissue imaging. Nat Methods 2022; 19:311–315. doi: [10.1038/s41592-021-01308-y](https://doi.org/10.1038/s41592-021-01308-y) + +- [SciMap](https://scimap.xyz/) + + > Nirmal et al. SCIMAP: A Python Toolkit for Integrated Spatial Analysis of Multiplexed Imaging Data. Journal of Open Source Software 2024; 9(97):6604, doi: [10.21105/joss.06604](https://doi.org/10.21105/joss.06604) - [MultiQC](https://pubmed.ncbi.nlm.nih.gov/27312411/) -> Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. + > Ewels P, Magnusson M, Lundin S, Käller M. MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics. 2016 Oct 1;32(19):3047-8. doi: 10.1093/bioinformatics/btw354. Epub 2016 Jun 16. PubMed PMID: 27312411; PubMed Central PMCID: PMC5039924. 
## Software packaging/containerisation tools @@ -39,3 +67,7 @@ - [Singularity](https://pubmed.ncbi.nlm.nih.gov/28494014/) > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675. + +## Test Data + +> Schapiro, D., Sokolov, A., Yapp, C. et al. MCMICRO: a scalable, modular image-processing pipeline for multiplexed tissue imaging. Nat Methods 2022; 19:311–315. doi: [10.1038/s41592-021-01308-y](https://doi.org/10.1038/s41592-021-01308-y) diff --git a/README.md b/README.md index 49db8b2..f419bed 100644 --- a/README.md +++ b/README.md @@ -26,66 +26,76 @@ If you want to run the original MCMICRO pipeline outside of nf-core, please see . - +The nf-core/mcmicro pipeline is an end-to-end processing pipeline that transforms multi-channel whole-slide images into single-cell data. It takes samplesheet and markersheet files as input and performs registration, segmentation and quantification. Multiple segmentation modules are available and can be run in parallel. The pipeline can also optionally perform background and shading correction and background subtraction, and it supports tissue microarrays. It returns a pre-segmentation image file, a segmentation mask image, and a cell x feature array spreadsheet. - - +![nf-core/mcmicro metro diagram](assets/mcmicro_metro.png) - +`markersheet.csv`: - +```csv +channel_number,cycle_number,marker_name +1,1,DNA 1 +2,1,Na/K ATPase +3,1,CD3 +4,1,CD45RO +``` - +Each row of the markersheet represents a single channel in the associated sample image. The first column `channel_number` is an identifier for the respective channel. The second column `cycle_number` corresponds to the cycle number of the image and it must match the `cycle_number` in the supplied samplesheet. The third column `marker_name` is the name of the marker for the given channel and cycle.
- - -> [!WARNING] -> Please provide pipeline parameters via the CLI or Nextflow `-params-file` option. Custom config files including those provided by the `-c` Nextflow option can be used to provide any configuration _**except for parameters**_; see [docs](https://nf-co.re/docs/usage/getting_started/configuration#custom-configuration-files). +``` - +For more details and further functionality, please refer to the [usage documentation](https://nf-co.re/mcmicro/usage) and the [parameter documentation](https://nf-co.re/mcmicro/parameters). - +[output documentation](https://nf-co.re/mcmicro/output). - +We thank the following people for their assistance in the development of this pipeline: - +- [heylf](https://github.com/heylf) +- [Florian Wuennemann](https://github.com/FloWuenne) +- [Phil Ewels](https://github.com/ewels) +- [Adam Taylor](https://github.com/adamjtaylor) ## Contributions and Support @@ -98,8 +108,6 @@ For further information or help, don't hesitate to get in touch on the [Slack `# - - If you use nf-core/mcmicro for your analysis, please cite it using the following article: [Schapiro et al. 2022 Nat. Methods](https://www.nature.com/articles/s41592-021-01308-y) An extensive list of references for the tools used by the pipeline can be found in the [`CITATIONS.md`](CITATIONS.md) file. 
diff --git a/assets/markers-test_full.csv b/assets/markers-test_full.csv new file mode 100644 index 0000000..0dc5030 --- /dev/null +++ b/assets/markers-test_full.csv @@ -0,0 +1,9 @@ +channel_number,cycle_number,marker_name +1,1,DNA_6 +2,1,ELANE +3,1,CD57 +4,1,CD45 +5,2,DNA_7 +6,2,ELANE7 +7,2,CD577 +8,2,CD457 diff --git a/assets/mcmicro_metro.png b/assets/mcmicro_metro.png new file mode 100644 index 0000000..adc9c39 Binary files /dev/null and b/assets/mcmicro_metro.png differ diff --git a/assets/methods_description_template.yml b/assets/methods_description_template.yml index 400d6d9..0560b18 100644 --- a/assets/methods_description_template.yml +++ b/assets/methods_description_template.yml @@ -3,8 +3,6 @@ description: "Suggested text and references to use when describing pipeline usag section_name: "nf-core/mcmicro Methods Description" section_href: "https://github.com/nf-core/mcmicro" plot_type: "html" -## TODO nf-core: Update the HTML below to your preferred methods description, e.g. add publication citation for this pipeline -## You inject any metadata in the Nextflow '${workflow}' object data: |

Methods

Data was processed using nf-core/mcmicro v${workflow.manifest.version} ${doi_text} of the nf-core collection of workflows (Ewels et al., 2020), utilising reproducible software environments from the Bioconda (Grüning et al., 2018) and Biocontainers (da Veiga Leprevost et al., 2017) projects.

diff --git a/assets/samplesheet-test_full.csv b/assets/samplesheet-test_full.csv new file mode 100644 index 0000000..4e71dd0 --- /dev/null +++ b/assets/samplesheet-test_full.csv @@ -0,0 +1,5 @@ +sample,cycle_number,channel_count,image_tiles +TEST1,1,4,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/imaging/ome-tiff/cycif-tonsil-cycle1.ome.tif +TEST1,2,4,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/imaging/ome-tiff/cycif-tonsil-cycle2.ome.tif +TEST2,1,4,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/imaging/ome-tiff/cycif-tonsil-cycle2.ome.tif +TEST2,2,4,https://raw.githubusercontent.com/nf-core/test-datasets/modules/data/imaging/ome-tiff/cycif-tonsil-cycle3.ome.tif diff --git a/conf/test.config b/conf/test.config index efb6960..b54f011 100644 --- a/conf/test.config +++ b/conf/test.config @@ -37,4 +37,8 @@ process { memory = "6.GB" cpus = 2 } + withName: ".*:CELLPOSE"{ + memory = "12.GB" + cpus = 4 + } } diff --git a/conf/test_full.config b/conf/test_full.config index cddd1a7..be8607a 100644 --- a/conf/test_full.config +++ b/conf/test_full.config @@ -17,8 +17,29 @@ params { // Input data for full size test // TODO nf-core: Specify the paths to your full test data ( on nf-core/test-datasets or directly in repositories, e.g. 
SRA) // TODO nf-core: Give any required params for the test so that command line flags are not needed - input = params.pipelines_testdata_base_path + 'viralrecon/samplesheet/samplesheet_full_illumina_amplicon.csv' - // Genome references - genome = 'R64-1-1' + max_cpus = 2 + max_memory = '6.GB' + max_time = '6.h' + + // Input data + input_cycle = "${projectDir}/assets/samplesheet-test_full.csv" + marker_sheet = "${projectDir}/assets/markers-test_full.csv" + + // TODO: Add samplesheet and markersheet to testdata repo then switch to this format: + //input_cycle = params.pipelines_testdata_base_path + 'mcmicro/samplesheet/???.csv' + + illumination = "basicpy" + segmentation = "mesmer,cellpose" +} + +process { + withName: ".*:DEEPCELL_MESMER" { + memory = "6.GB" + cpus = 2 + } + withName: 'CELLPOSE'{ + memory = "12.GB" + cpus = 4 + } } diff --git a/docs/output.md b/docs/output.md index ddad7d8..9377779 100644 --- a/docs/output.md +++ b/docs/output.md @@ -6,31 +6,150 @@ This document describes the output produced by the pipeline. Most of the plots a The directories listed below will be created in the results directory after the pipeline has finished. All paths are relative to the top-level results directory. 
- - ## Pipeline overview The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes data using the following steps: -- [FastQC](#fastqc) - Raw read QC -- [MultiQC](#multiqc) - Aggregate report describing results and QC from the whole pipeline -- [Pipeline information](#pipeline-information) - Report metrics generated during the workflow execution +- [Directory Structure](#directory-structure) +- [Illumination Correction](#illumination-correction) + - [BaSiCPy](#basicpy) +- [Registration](#registration) + - [ASHLAR](#ashlar) +- [Background Subtraction](#background-subtraction) + - [Backsub](#backsub) +- [TMA Core Separation](#tma-core-separation) + - [Coreograph](#coreograph) +- [Segmentation](#segmentation) + - [Mesmer](#mesmer) + - [Cellpose](#cellpose) +- [Quantification](#quantification) +- [MultiQC](#multiqc) +- [Pipeline information](#pipeline-information) + +### Directory Structure + +``` +{outdir} +├── backsub +├── illumination_correction +│   └── basicpy +├── multiqc +│   ├── multiqc_data +│   ├── multiqc_plots +│   └── multiqc_report.html +├── pipeline_info +├── quantification +│   └── mcquant +│   └── {segmentation module} +├── registration +│   └── ashlar +├── segmentation +│   └── {segmentation module} +└── tma_dearray + └── masks + +``` + +### Illumination Correction + +#### BaSiCPy + +[BaSiCPy](https://nf-co.re/modules/basicpy/) is a python package for background and shading correction of optical microscopy images. It is developed based on the Matlab version of BaSiC tool with major improvements in the algorithm. + +
+Output files -### FastQC +- {sample_name}-dfp.tif : Tiff fields for dark field illumination correction +- {sample_name}-ffp.tif : Tiff fields for flat field illumination correction -
+
+ +### Registration + +#### ASHLAR + +[ASHLAR](https://nf-co.re/modules/ashlar/) combines multi-tile microscopy images into a high-dimensional mosaic image. + +
Output files -- `fastqc/` - - `*_fastqc.html`: FastQC report containing quality metrics. - - `*_fastqc.zip`: Zip archive containing the FastQC report, tab-delimited data file and plot images. +- {sample_name}.ome.tif : A pyramidal, tiled OME-TIFF file created from input images.
-[FastQC](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/) gives general quality metrics about your sequenced reads. It provides information about the quality score distribution across your reads, per base sequence content (%A/T/G/C), adapter contamination and overrepresented sequences. For further reading and documentation see the [FastQC help pages](http://www.bioinformatics.babraham.ac.uk/projects/fastqc/Help/). +### Background Subtraction + +#### Backsub + +[Backsub](https://nf-co.re/modules/backsub/) performs a pixel-by-pixel channel subtraction scaled by exposure times of pre-stitched tif images. + +
+Output files + +- markers_bs.csv : Marker file adjusted to match the background corrected image +- .backsub.ome.tif : Background corrected pyramidal ome.tif + +
+ +### TMA Core Separation + +#### Coreograph + +[Coreograph](https://nf-co.re/modules/coreograph/) uses UNet, a deep learning model, to identify complete/incomplete tissue cores on a tissue microarray. It has been trained on 9 TMA slides of different sizes and tissue types. + +
+Output files + +- {core_number}.tif : Individual cropped tissue core images +- centroidsY-X.txt : A text file listing centroids of each core in format Y, X +- masks/{core_number}\_mask.tif : Binary mask image for each tissue core +- TMA_MAP.tif : A TMA map showing core number labels and mask outlines + +
+ +### Segmentation + +#### Cellpose + +[Cellpose](https://nf-co.re/modules/cellpose/) segments cells in images + +
+Output files + +- {sample_name}.ome_cp_masks.tif : labelled mask output from cellpose in tif format + +
+ +#### Mesmer + +[Mesmer](https://nf-co.re/modules/deepcell_mesmer/) segmentation for whole-cell + +
+Output files + +- mask\_{sample_name}.tif : File containing the mask. + +
+ +### Quantification + +#### Mcquant + +[Mcquant](https://nf-co.re/modules/mcquant/) extracts single-cell data given a multi-channel image and a segmentation mask. + +
+Output files + +- \*.csv : Single-cell feature table for all selected segmenters, for each segmented cell compartment. + +
+ +### Quality Control ### MultiQC +Aggregate report describing results and QC from the whole pipeline +
Output files @@ -47,6 +166,8 @@ Results generated by MultiQC collate pipeline QC from supported tools e.g. FastQ ### Pipeline information +Report metrics generated during the workflow execution +
Output files diff --git a/docs/usage.md b/docs/usage.md index 0872e06..dea55fc 100644 --- a/docs/usage.md +++ b/docs/usage.md @@ -6,8 +6,6 @@ ## Introduction - - ## Samplesheet input You will need to create a samplesheet with information about the samples you would like to analyse before running the pipeline. We currently accept 2 formats for the input samplesheets. One format is one row per sample and the other is one row per sample per cycle. Use the parameter `input_sample` for one row per sample or the parameter `input_cycle` for one row per sample per cycle, to specify its location. It has to be a comma-separated file with a header row and either two (input_sample) or four (input_cycle) columns as shown in the examples below. @@ -24,7 +22,7 @@ You will need to create a samplesheet with information about the samples you wou ### Samplesheet with one row per sample per cycle -The `sample` identifier must be the same for multiple cycles of the same sample. All the files from the same sample will be run in a single run of Ashlar in the cycle order that they appear in the samplesheet. If illumination correction is requested using Basicpy each cycle will be corrected separately. +The `sample` identifier must be the same for multiple cycles of the same sample. All the files from the same sample will be run in a single run of ashlar in the cycle order that they appear in the samplesheet. If illumination correction is requested using basicpy, each cycle will be corrected separately. ```csv title="samplesheet_cycle.csv" sample,cycle_number,channel_count,image_tiles @@ -33,18 +31,18 @@ TEST1,2,10,/path/to/image/cycif-tonsil-cycle2.ome.tif TEST1,3,10,/path/to/image/cycif-tonsil-cycle3.ome.tif ``` -| Column | Description | -| --------------- | --------------------------------------------------------------------------- | -| `sample` | Custom sample name. | -| `cycle_number` | Integer giving the cycle for the file in the current row. 
| -| `channel_count` | Integer giving the total number of channels in the file in the current row. | -| `image_tiles` | Full path to the input image file. | +| Column | Description | +| --------------- | ----------------------------------------------------------------------------- | +| `sample` | Custom sample name. | +| `cycle_number` | Integer value of the cycle for the file in the current row. | +| `channel_count` | Integer value of the total number of channels in the file in the current row. | +| `image_tiles` | Full path or URL to the input image file. | An [example one row per sample per cycle samplesheet](../assets/samplesheet_1_row_sample_cycle.csv) has been provided with the pipeline. ### Samplesheet with one row per sample -All per-cycle image files in the `image_directory` for a given sample will be run in a single run of Ashlar. If illumination correction is requested using Basicpy each cycle will be corrected separately. +This is similar to the above case, except that each row contains only the `sample` name and a column giving the directory where all the files for that sample are located. All per-cycle image files in the `image_directory` for a given sample will be run in a single run of ashlar. If illumination correction is requested using basicpy, each cycle will be corrected separately. ```csv title="samplesheet_sample.csv" sample,image_directory @@ -58,9 +56,39 @@ TEST1,/path/to/image/directory An [example one row per sample samplesheet](../assets/samplesheet_1_row_sample.csv) has been provided with the pipeline. +## Markersheet input + +Each row of the markersheet represents a single channel in the associated sample image. The columns `channel_number`, `cycle_number` and `marker_name` are required.
+ +```csv +channel_number,cycle_number,marker_name +1,1,DNA 1 +2,1,Na/K ATPase +3,1,CD3 +4,1,CD45RO +``` + +| Column | Description | +| ---------------- | --------------------------------------------------- | +| `channel_number` | Integer identifier for the respective channel. | +| `cycle_number` | Integer identifier for the image cycle. | +| `marker_name` | Name of the marker for the given channel and cycle. | + +:::note +`cycle_number` must match the `cycle_number` in the supplied samplesheet. +::: + +### optional markersheet columns + +| Column | Description | +| ----------------------- | ---------------------------------------------- | +| `filter` | Microscope filter common name. | +| `excitation_wavelength` | Excitation wavelength for this channel, in nm. | +| `emission_wavelength` | Emission wavelength for this channel, in nm. | + ## Running the pipeline -# One row per sample per cycle +### One row per sample per cycle The typical command for running the one row per sample per cycle pipeline is as follows: @@ -68,7 +96,7 @@ The typical command for running the one row per sample per cycle pipeline is as nextflow run nf-core/mcmicro --input_cycle ./samplesheet_cycle.csv --outdir ./results --marker_sheet markers.csv -profile docker ``` -# One row per sample +### One row per sample The typical command for running the one row per sample pipeline is as follows: @@ -103,15 +131,43 @@ nextflow run nf-core/mcmicro -profile docker -params-file params.yaml with: -```yaml title="params.yaml" -input: './samplesheet.csv' -outdir: './results/' -genome: 'GRCh37' -<...> +```yaml +input_cycle: "samplesheet_cycle.csv" +outdir: "./output" +marker_sheet: "markers.csv" ``` You can also generate such `YAML`/`JSON` files via [nf-core/launch](https://nf-co.re/launch). +### Pipeline stages and associated input parameters + +#### Illumination Correction + +Illumination correction can optionally be performed before registration. 
It is triggered by the `--illumination` flag, which can currently only be followed by the option `basicpy`. We plan on supporting other modules for illumination correction in the future. +When `basicpy` is selected, the nf-core module basicpy is run on the input image(s). Basicpy is a Python package for background and shading correction of optical microscopy images. More information about it can be found on the [basicpy nf-core module website](https://nf-co.re/modules/basicpy/). + +#### Registration + +Registration is a required step of the pipeline and the only module currently supported is ashlar. Ashlar is a software package that combines multi-tile microscopy images into a high-dimensional mosaic image. More information about ashlar can be found on the [ashlar website](https://labsyspharm.github.io/ashlar/). We plan to support other modules for registration in the future. + +#### Background Subtraction + +This is an optional step that occurs immediately following registration. It is triggered by the `--backsub` flag. When this flag is set, the module backsub is run on the output from the registration step. The backsub module performs pixel-by-pixel channel subtraction scaled by exposure times of pre-stitched tif images. More information about it can be found on the [backsub nf-core module website](https://nf-co.re/modules/backsub/). + +#### TMA Core Separation + +This is an optional step that occurs immediately following background subtraction if that optional step was run, or after registration if it was not. It is triggered by the `--tma_dearray` flag. When this flag is set, the coreograph module is run on the output from either the background subtraction step or the registration step if background subtraction was not performed. Coreograph separates the input image into a set of images for each of the cores. It uses UNet, a deep learning model, to identify complete/incomplete tissue cores on a tissue microarray.
It has been trained on 9 TMA slides of different sizes and tissue types. More information about it can be found on the [coreograph nf-core module website](https://nf-co.re/modules/coreograph/). + +#### Segmentation + +This is a required step that follows the TMA Core Separation step. The workflow will run the deepcell_mesmer module by default, but other options are available by using the `--segmentation` flag. The flag should be followed by a single segmentation module name or a comma-separated list of names to run multiple segmentation modules in parallel. The options currently supported are `mesmer` and `cellpose`. More information about each of these modules can be found on their respective nf-core module websites: [deepcell_mesmer](https://nf-co.re/modules/deepcell_mesmer/) and [cellpose](https://nf-co.re/modules/cellpose/). + +When `cellpose` is selected as a segmentation method, you may also provide a pretrained model to the cellpose module by using the `--cellpose_model` flag followed by a full path or URL to the model file. + +#### Quantification + +This is a required step that follows segmentation. The workflow currently runs the mcquant module by default. Other quantification modules will be added as options in the future. Mcquant extracts single-cell data given a multi-channel image and a segmentation mask. More information about mcquant can be found on the [mcquant nf-core module website](https://nf-co.re/modules/mcquant/). + ### Updating the pipeline When you run the above command, Nextflow automatically pulls the pipeline code from GitHub and stores it as a cached version. When running the pipeline after this, it will always use the cached version if available - even if the pipeline has been updated since.
To make sure that you're running the latest version of the pipeline, make sure that you regularly update the cached version of the pipeline: diff --git a/subworkflows/local/utils_nfcore_mcmicro_pipeline/main.nf b/subworkflows/local/utils_nfcore_mcmicro_pipeline/main.nf index 39d01f4..da74f48 100644 --- a/subworkflows/local/utils_nfcore_mcmicro_pipeline/main.nf +++ b/subworkflows/local/utils_nfcore_mcmicro_pipeline/main.nf @@ -75,7 +75,6 @@ workflow PIPELINE_INITIALISATION { // Create channel from input file provided through params.input_cycle or .input_sample // if (input_cycle) { - // TODO: Validate that cycle_number is 1..N, in order, for all samples. ch_samplesheet = Channel.fromList(samplesheetToList(params.input_cycle, "${projectDir}/assets/schema_input_cycle.json")) .map{ sample, cycle_number, channel_count, image_tiles, dfp, ffp -> @@ -253,6 +252,19 @@ def validateInputSamplesheetMarkersheet ( samples, markers ) { if (marker_cycles.unique(false) != sample_cycles.unique(false) ) { error("cycle_number values must match between sample and marker sheets") } + + // TODO: should the following test be in a separate validateInputSamplesheet() function? + + def channel_cycle_map = samples.collect{ meta, image_tiles, dfp, ffp -> [meta.id,meta.cycle_number] }.groupBy{ it[0] } + channel_cycle_map.each { entry -> + last_val = -1 + entry.value.collect{ it[1] }.each{ curr_val -> + if (last_val != -1 && (curr_val > (last_val + 1) || curr_val <= last_val)) { + error("cycle_number values must be increasing with no gaps") + } + last_val = curr_val + } + } } def expandSampleRow( row ) { @@ -273,12 +285,15 @@ def expandSampleRow( row ) { // Generate methods description for MultiQC // def toolCitationText() { - // TODO nf-core: Optionally add in-text citation tools to this list. // Can use ternary operators to dynamically construct based conditions, e.g. params["run_xyz"] ? "Tool (Foo et al. 
2023)" : "", // Uncomment function in methodsDescriptionText to render in MultiQC report def citation_text = [ "Tools used in the workflow included:", - "FastQC (Andrews 2010),", + params["illumination"] ? "Basicpy (Peng et al. 2017)," : "", + "Ashlar (Muhlich et al. 2022),", + params["segmentation"].contains("cellpose") ? "Cellpose (Stringer et al. 2021)," : "", + params["segmentation"].contains("mesmer") ? "Mesmer (Van Valen et al. 2016)," : "", + "MCQuant (Schapiro et al. 2022),", "MultiQC (Ewels et al. 2016)", "." ].join(' ').trim() @@ -287,11 +302,14 @@ def toolCitationText() { } def toolBibliographyText() { - // TODO nf-core: Optionally add bibliographic entries to this list. // Can use ternary operators to dynamically construct based conditions, e.g. params["run_xyz"] ? "<li>Author (2023) Pub name, Journal, DOI</li>" : "", // Uncomment function in methodsDescriptionText to render in MultiQC report def reference_text = [ - "<li>Andrews S, (2010) FastQC, URL: https://www.bioinformatics.babraham.ac.uk/projects/fastqc/).</li>", + params["illumination"] ? "<li>Peng, T., Thorn, K., Schroeder, T., Wang, L., Theis, F.J., Marr*, C., Navab*, N. (2017). A BaSiC Tool for Background and Shading Correction of Optical Microscopy Images. Nature Communications 8(14836). doi: 10.1038/ncomms14836</li>" : "", + "<li>Muhlich, J.L., Chen, Y., Yapp, C., Russell, D., Santagata, S., Sorger, P.K. (2022) Stitching and registering highly multiplexed whole-slide images of tissues and tumors using ASHLAR, Bioinformatics 38(19), 4613–4621. doi: 10.1093/bioinformatics/btac544</li>", + params["segmentation"].contains("cellpose") ? "<li>Stringer, C., Wang, T., Michaelos, M., & Pachitariu, M. (2021). Cellpose: a generalist algorithm for cellular segmentation. Nature methods, 18(1), 100-106.</li>" : "", + params["segmentation"].contains("mesmer") ? "<li>Van Valen, D.A., Kudo, T., Lane, K.M., Macklin, D.N., Quach, N.T., DeFelice, M.M., Maayan, I., Tanouchi, Y., Ashley, E.A., Covert, M.W. (2016). Deep Learning Automates the Quantitative Analysis of Individual Cells in Live-Cell Imaging Experiments. PLOS Computational Biology 12(11), doi: 10.1371/journal.pcbi.1005177.</li>" : "", + "<li>Schapiro, D., Sokolov, A., Yapp, C. et al. MCMICRO: a scalable, modular image-processing pipeline for multiplexed tissue imaging. Nat Methods 19, 311–315 (2022). doi: 10.1038/s41592-021-01308-y</li>", "<li>Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: summarize analysis results for multiple tools and samples in a single report. Bioinformatics , 32(19), 3047–3048. doi: /10.1093/bioinformatics/btw354</li>" ].join(' ').trim() @@ -322,9 +340,8 @@ def methodsDescriptionText( mqc_methods_yaml ) { meta["tool_citations"] = "" meta["tool_bibliography"] = "" - // TODO nf-core: Only uncomment below if logic in toolCitationText/toolBibliographyText has been filled! - // meta["tool_citations"] = toolCitationText().replaceAll(", \\.", ".").replaceAll("\\. \\.", ".").replaceAll(", \\.", ".") - // meta["tool_bibliography"] = toolBibliographyText() + meta["tool_citations"] = toolCitationText().replaceAll(", \\.", ".").replaceAll("\\. \\.", ".").replaceAll(", \\.", ".") + meta["tool_bibliography"] = toolBibliographyText() def methods_text = mqc_methods_yaml.text diff --git a/subworkflows/local/utils_nfcore_mcmicro_pipeline/tests/initialisation.nf.test b/subworkflows/local/utils_nfcore_mcmicro_pipeline/tests/initialisation.nf.test index 8c9b415..a1edf0e 100644 --- a/subworkflows/local/utils_nfcore_mcmicro_pipeline/tests/initialisation.nf.test +++ b/subworkflows/local/utils_nfcore_mcmicro_pipeline/tests/initialisation.nf.test @@ -14,7 +14,7 @@ nextflow_workflow { } workflow { """ - input = [false, false, false, false, [], '$outputDir', params.input_cycle, [], params.marker_sheet] + input = [false, false, false, [], '$outputDir', params.input_cycle, [], params.marker_sheet] """ } } @@ -34,7 +34,7 @@ nextflow_workflow { } workflow { """ - input = [false, false, false, false, [], '$outputDir', params.input_cycle, [], params.marker_sheet] + input = [false, false, false, [], '$outputDir', params.input_cycle, [], params.marker_sheet] """ } } diff --git a/workflows/mcmicro.nf b/workflows/mcmicro.nf index d1167bf..dece512 100644 --- a/workflows/mcmicro.nf +++ b/workflows/mcmicro.nf @@ -18,8 +18,8 @@ include { BACKSUB } from '../modules/nf-core/backsub/main' include { CELLPOSE } from '../modules/nf-core/cellpose/main' include { COREOGRAPH } from '../modules/nf-core/coreograph/main' include { DEEPCELL_MESMER } from '../modules/nf-core/deepcell/mesmer/main' -include { MCQUANT } from
'../modules/nf-core/mcquant/main' include { SCIMAP_MCMICRO } from '../modules/nf-core/scimap/mcmicro/main' +include { MCQUANT } from '../modules/nf-core/mcquant/main' /* ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~