Multiclass_Disease_Classification

This project is designed to develop, deploy, and maintain a machine learning model for multiclass disease classification of chest X-rays. It utilizes a Machine Learning Operations (MLOps) framework to ensure seamless development, deployment, and continuous monitoring of the model. The project follows best practices for reproducibility, modularity, and scalability.

Introduction

Thoracic diseases, such as pneumonia, emphysema, and cardiomegaly, are significant health concerns globally, affecting millions of people each year. In the United States alone, respiratory diseases contribute to a high percentage of hospitalizations and healthcare costs. Accurate diagnosis of these conditions is crucial for effective treatment, yet traditional diagnostic methods rely heavily on manual analysis of medical images like chest X-rays. This process can be time-consuming and subject to human error, especially in areas with limited access to trained radiologists.

Motivated by these challenges, we chose this topic for our MLOps project. By leveraging the principles of MLOps, we have developed an end-to-end machine learning pipeline for automated thoracic disease classification using chest X-rays. Our solution is built to enhance the accuracy and efficiency of non-invasive diagnostic methods, aiming to assist healthcare providers with reliable, timely, and scalable diagnostic support.

By automating the process of classifying multiple thoracic diseases from chest X-ray images, this approach has the potential to alleviate the burden on healthcare professionals, reduce diagnostic errors, and provide faster results, especially in underserved areas where radiologists may not always be available.

This project integrates best practices in MLOps to ensure that our model is not only accurate but also easy to maintain and deploy, making it suitable for real-world clinical use.

Dataset Information

The project utilizes the NIH ChestX-ray14 dataset, one of the largest publicly available chest X-ray image datasets for research purposes. The dataset is provided by the National Institutes of Health (NIH) Clinical Center and consists of 112,120 frontal-view chest X-ray images from 30,805 unique patients. The dataset is annotated with labels for 14 different thoracic disease categories, including:

DataCard

Dataset Name: NIH ChestX-ray14
Source: National Institutes of Health (NIH) Clinical Center
Link to Dataset: NIH Clinical Center ChestX-ray Dataset
Domain: Medical Imaging (Radiology)
Task: Multiclass disease classification from chest X-ray images

Project Workflow

This is the basic Project Flow and this will be updated after the final system architecture is designed

Prerequisites

1. Softwares to be installed

Before you start, please make sure that the following are already installed in your machine, if not please install them.

Git
Docker
Airflow
DVC
Python 3
Pip

2. Dependencies installation

pip install -r requirements.txt

3. Clone the repositry

git clone https://github.com/SatwikReddySripathi/Multiclass_Disease_Classification.git

Git Repo and Project Structure

Project Structure

├── .dvc
│   ├── config                # Configuration file for DVC (Data Version Control)
│   ├── .gitignore            # Specifies files/folders related to DVC to be ignored by Git
├── data
│   ├── training              # Directory containing preprocessed training data
│   │   ├── ....
│   ├── validation            # Directory containing preprocessed validation data
│   │   ├── ....
│   ├── testing           # Directory containing preprocessed testing data
│   │   ├── ....
├── frontend
│   ├── app.py                # Main script for the frontend application (e.g., user interface)
│   ├── dockerfile            # Dockerfile to build the Docker image for the frontend
│   ├── requirements.txt      # Python dependencies required for the frontend
│   ├── kubernetes
│   │   ├── deployment.yaml   # Kubernetes deployment configuration for the frontend
│   │   ├── namespace.yaml    # Kubernetes namespace configuration for organizing resources
│   │   ├── service.yaml      # Kubernetes service configuration for exposing the frontend
├── backend
│   ├── model.py              # Defines the model architecture for training
│   ├── train.py              # Script for training the machine learning model
│   ├── evaluate.py           # Script for evaluating the model's performance
│   ├── predict.py            # Script for making predictions using the trained model
│   ├── requirements.txt      # Python dependencies required for the backend
├── src
│   ├── preprocessing
│   │   ├── image_processing.py # Functions for preprocessing raw images (e.g., resizing, normalization)
│   ├── utils
│   │   ├── metrics.py          # Utility functions for calculating evaluation metrics (e.g., accuracy, precision)
│   │   ├── logger.py           # Provides functions for logging messages, warnings, and errors
│   ├── config
│   │   ├── config.yaml         # YAML file containing configuration settings for the project
│   ├── datapipeline.py         # Script handling the end-to-end data pipeline (loading, preprocessing, transforming)
│   └── keys
│       ├── keyfile.json        # Contains sensitive information (e.g., API keys, credentials)
├── logs
|   ├── ...                    # Placeholder for log files generated during model training, evaluation, etc.
├── assets                     # Stores images, visualizations, and graphs.
├── notebooks
|   ├── EDA.ipynb              # Jupyter notebook for Exploratory Data Analysis (EDA) of the dataset
├── .dvcignore                 # Specifies files/folders that should be ignored by DVC
├── .gitignore                 # Specifies files/folders that should be ignored by Git
├── data.dvc                   # DVC file tracking large data files in the `data` directory
├── dockerfile                 # Dockerfile to build the Docker image for the main application
├── docker-compose.yaml        # Docker Compose file to set up and run multi-container Docker applications
├── entrypoint.sh              # Shell script defining startup commands for the Docker container
├── requirements.txt           # Python dependencies required for running the project locally

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multiclass_Disease_Classification

Table of Contents

Introduction

Dataset Information

DataCard

Disease Categories - Labels

Project Workflow

Prerequisites

1. Softwares to be installed

2. Dependencies installation

3. Clone the repositry

Git Repo and Project Structure

Data Storage and Model Registry

Pipeline

Application Interface

Monitoring Dashboard

Contributors

Acknowledgments

About

Releases

Packages

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
.dvc		.dvc
assets		assets
backend		backend
dags		dags
frontend		frontend
notebooks		notebooks
src		src
.dvcignore		.dvcignore
.gitignore		.gitignore
README.md		README.md
data.dvc		data.dvc
docker-compose.yaml		docker-compose.yaml
dockerfile		dockerfile
entrypoint.sh		entrypoint.sh
requirements.txt		requirements.txt

DhanushAkula/Multiclass_Disease_Classification

Folders and files

Latest commit

History

Repository files navigation

Multiclass_Disease_Classification

Table of Contents

Introduction

Dataset Information

DataCard

Disease Categories - Labels

Project Workflow

Prerequisites

1. Softwares to be installed

2. Dependencies installation

3. Clone the repositry

Git Repo and Project Structure

Data Storage and Model Registry

Pipeline

Application Interface

Monitoring Dashboard

Contributors

Acknowledgments

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages