
Skin Cancer Classification using Neural Networks

📌 Project Overview

This project focused on developing a deep-learning model for skin cancer classification using the HAM10000 dataset. The dataset contains images of skin lesions categorized into seven types of skin cancer. The goal was to build a model that could classify unseen images while handling data imbalances and computational constraints.

This was a group project developed as part of the Deep Learning course.


📂 Dataset Overview

The dataset consists of 10,015 labeled images, each annotated with the following metadata:

  • Age
  • Sex
  • Location of the lesion
  • Diagnosis (dx), the target variable with seven classes:
    • Actinic keratoses and intraepithelial carcinoma / Bowen's disease (akiec)
    • Basal cell carcinoma (bcc)
    • Benign keratosis-like lesions (bkl)
    • Dermatofibroma (df)
    • Melanoma (mel)
    • Melanocytic nevi (nv)
    • Vascular lesions (vasc)

🔬 Methodology

1️⃣ Data Exploration

  • Analyzed the metadata to identify missing values and the class distribution (see the sketch after this list).
  • Noted a strong class imbalance, which motivated the weighted F1-score as the evaluation metric.
  • Decided to use only images for training, excluding metadata.
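
A minimal sketch of this exploration step, assuming the HAM10000 metadata is available as HAM10000_metadata.csv with a dx column (the file name is an assumption, not necessarily the path used in the notebooks):

```python
import pandas as pd

# Load the HAM10000 metadata (file name assumed)
meta = pd.read_csv("HAM10000_metadata.csv")

# Missing values per column
print(meta.isna().sum())

# Absolute and relative class frequencies of the target variable 'dx'
counts = meta["dx"].value_counts()
print(counts)
print((counts / counts.sum()).round(3))  # relative frequencies make the imbalance obvious
```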

2️⃣ Image Preprocessing

  • Resized images from 600x450 to 150x112 to optimize computational efficiency.
  • Applied label encoding to the target variable.
  • Normalized pixel values to a range between 0 and 1.
  • Converted images to grayscale and tested hair-removal techniques.
  • Applied sharpening and histogram equalization for contrast enhancement (a sketch of these steps follows the list).
  • Due to suboptimal results, preprocessing was ultimately not used in the final model.
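
The sketch below shows one possible version of the tested steps using OpenCV. The hair-removal approach (black-hat filtering plus inpainting) and the kernel sizes are assumptions chosen for illustration, not necessarily the exact techniques in 2_ImagePreProcessing.ipynb:

```python
import cv2
import numpy as np

def preprocess(path):
    """Illustrative version of the tested preprocessing steps (not the final pipeline)."""
    img = cv2.imread(path)                      # BGR image, originally 600x450
    img = cv2.resize(img, (150, 112))           # downscale to 150x112 (width, height)

    # Grayscale conversion
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)

    # Hair removal via black-hat filtering + inpainting (one common technique; an assumption here)
    kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (9, 9))
    blackhat = cv2.morphologyEx(gray, cv2.MORPH_BLACKHAT, kernel)
    _, mask = cv2.threshold(blackhat, 10, 255, cv2.THRESH_BINARY)
    hairless = cv2.inpaint(img, mask, 3, cv2.INPAINT_TELEA)

    # Sharpening with a simple convolution kernel
    sharpen = np.array([[0, -1, 0], [-1, 5, -1], [0, -1, 0]], dtype=np.float32)
    sharp = cv2.filter2D(hairless, -1, sharpen)

    # Histogram equalization on the grayscale result for contrast enhancement
    equalized = cv2.equalizeHist(cv2.cvtColor(sharp, cv2.COLOR_BGR2GRAY))

    # Normalize pixel values to [0, 1]
    return equalized.astype("float32") / 255.0
```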

3️⃣ Model Development

  • Implemented a Convolutional Neural Network (CNN) using TensorFlow/Keras.
  • Used ImageDataGenerator for data augmentation (rotation, flipping, shifting), but found no improvement in performance.
  • Tuned hyperparameters with Keras Tuner's Hyperband search (see the sketch after this list).

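A minimal sketch of the augmentation and Hyperband setup, assuming RGB inputs resized to 112x150; the augmentation parameters and search ranges are illustrative, not the exact values used:

```python
import keras_tuner as kt
from tensorflow import keras
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Augmentation in the spirit of the tested settings (exact parameters are assumptions)
datagen = ImageDataGenerator(rotation_range=20,
                             horizontal_flip=True,
                             vertical_flip=True,
                             width_shift_range=0.1,
                             height_shift_range=0.1)

def build_model(hp):
    """Hypermodel for Keras Tuner; the searched ranges are illustrative."""
    model = keras.Sequential([
        keras.layers.Conv2D(hp.Int("filters", 16, 80, step=16), (3, 3),
                            activation="relu", input_shape=(112, 150, 3)),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(hp.Int("units", 32, 512, step=32), activation="relu"),
        keras.layers.Dropout(hp.Float("dropout", 0.1, 0.5, step=0.1)),
        keras.layers.Dense(7, activation="softmax"),
    ])
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

tuner = kt.Hyperband(build_model, objective="val_accuracy",
                     max_epochs=20, directory="tuning", project_name="skin_cancer")
# tuner.search(X_train, y_train, validation_data=(X_val, y_val))
```
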
4️⃣ Model Evaluation

  • Used stratified k-fold cross-validation to handle the class imbalance (see the sketch after this list).
  • Achieved a weighted F1-score of 0.74.
  • Confusion matrix analysis showed that the model performed well on majority classes (e.g., 'nv') but struggled with rare ones (e.g., 'df').
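
A sketch of this evaluation loop with scikit-learn's StratifiedKFold; X, y, and build_cnn are placeholder names for the image array, the encoded labels, and a factory that builds the CNN described in the next section:

```python
import numpy as np
from sklearn.metrics import confusion_matrix, f1_score
from sklearn.model_selection import StratifiedKFold

skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
fold_scores = []

for train_idx, val_idx in skf.split(X, y):
    model = build_cnn()                              # hypothetical model factory
    model.fit(X[train_idx], y[train_idx], epochs=10, batch_size=32, verbose=0)
    preds = np.argmax(model.predict(X[val_idx]), axis=1)
    fold_scores.append(f1_score(y[val_idx], preds, average="weighted"))
    print(confusion_matrix(y[val_idx], preds))       # per-fold view of class-level errors

print("Mean weighted F1:", np.mean(fold_scores))
```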

🔥 Best CNN Model Architecture

  • 3 Convolutional layers (20, 60, 80 filters) with ReLU activation and max pooling.
  • Flattened the output and added 3 dense layers:
    • 352 neurons, ReLU, Dropout (30%)
    • 256 neurons, ReLU, Dropout (10%)
    • 32 neurons, ReLU, Dropout (30%)
  • Final softmax layer with 7 neurons for multi-class classification.
  • Optimized with the Adam optimizer and sparse_categorical_crossentropy loss (a Keras sketch follows the list).

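A Keras sketch of this architecture; the 3x3 kernel size, 2x2 pooling, and the (112, 150, 3) input shape are assumptions, while the layer widths and dropout rates follow the description above:

```python
from tensorflow import keras
from tensorflow.keras import layers

# Kernel size, pooling, and input shape are assumptions; layer widths match the description.
model = keras.Sequential([
    layers.Conv2D(20, (3, 3), activation="relu", input_shape=(112, 150, 3)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(60, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(80, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(352, activation="relu"),
    layers.Dropout(0.30),
    layers.Dense(256, activation="relu"),
    layers.Dropout(0.10),
    layers.Dense(32, activation="relu"),
    layers.Dropout(0.30),
    layers.Dense(7, activation="softmax"),   # one output per lesion class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
```
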
🎯 Results & Challenges

✅ Key Achievements

The model achieved a weighted F1-score of 0.74 under stratified k-fold cross-validation.

Successfully optimized a deep-learning model despite limited computational resources.


❌ Challenges Faced

Computational Limitations: Required image resizing due to RAM constraints.

Class Imbalance: The model performed noticeably worse on rare classes such as 'df'.

Preprocessing Trade-offs: Despite testing, preprocessing did not significantly improve results.


⚙️ Installation & Usage

🔧 Prerequisites

  • Python 3.x
  • TensorFlow/Keras
  • NumPy, Pandas, Matplotlib, Seaborn, OpenCV
  • scikit-learn (for label encoding and stratified k-fold validation)

🚀 Running the Model

  1. Clone this repository:
    git clone https://github.com/MGN19/deep-learning-skin-cancer.git
  2. Run the notebooks in order:
    • 1_Explore.ipynb
    • 2_ImagePreProcessing.ipynb
    • 3_Model.ipynb
    • 4_ImgGen.ipynb
    • 5_GridSearch.ipynb

📚 Lessons Learned & Future Improvements

As this was our first image classification project, we gained valuable insights throughout the process. With additional experience, we have identified several areas where improvements could be made:

  • More In-Depth Data Exploration: While we performed an initial exploratory analysis, a more detailed investigation of feature distributions, correlations, and the remaining metadata could have informed better modeling decisions.
  • Data Leakage in Preprocessing: Preprocessing (such as image normalization and transformations) was applied before splitting the data into training and validation sets. This led to data leakage, as preprocessing should be applied after the split.
  • Additional Preprocessing Techniques: Although several preprocessing techniques were tested, more advanced methods (such as advanced augmentation, denoising, or color space transformations) could have been explored further.
  • Data Leakage in Hyperparameter Tuning: The validation data was not properly separated during hyperparameter tuning, which introduced further leakage. Using sklearn’s PredefinedSplit before the search would have ensured that the validation data remained truly unseen while tuning (see the sketch below).

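A minimal sketch of that idea with scikit-learn's PredefinedSplit; X and y are placeholder names for the image array and the encoded labels:

```python
import numpy as np
from sklearn.model_selection import PredefinedSplit, train_test_split

# Split once, up front, stratified on the labels.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42)

# -1 marks samples that are only ever used for training; 0 marks the fixed validation fold.
test_fold = np.concatenate([np.full(len(X_train), -1), np.zeros(len(X_val))])
split = PredefinedSplit(test_fold)

X_search = np.concatenate([X_train, X_val])
y_search = np.concatenate([y_train, y_val])

# Any sklearn-style search (e.g. GridSearchCV around a Keras wrapper) can then take cv=split,
# so every hyperparameter configuration is validated on the same held-out data.
for train_idx, val_idx in split.split():
    print(len(train_idx), len(val_idx))
```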