SpamShield

An Email/SMS spam classifier built with PySpark's Machine Learning library and deployed on Streamlit

SpamShield is an Email/SMS spam classifier built using PySpark's Machine Learning library. It predicts whether a given message is spam or not. The model uses the Naive Bayes algorithm, and each message is preprocessed by a custom PySpark Pipeline before it is classified.

Technologies

Technology	Description
Python	An interpreted, high-level programming language used for general-purpose programming.
PySpark	The Python API for Apache Spark, an open-source framework used for big data processing and analysis.
FastAPI	A modern, fast (high-performance) web framework for building APIs with Python.
Streamlit	An open-source Python library used to build interactive web applications.
Vercel	A cloud platform used for deploying and scaling web applications.

Demo

A demo of this application is here - SpamShield

Features

Prediction: The application can predict whether a given message is spam or not.
Custom PySpark Pipeline: The message is preprocessed using a custom PySpark Pipeline to ensure that it is correctly formatted and cleaned before it is fed into the model for prediction.
Naive Bayes Algorithm: The model is built using the Naive Bayes algorithm, a simple yet effective method for text classification.
User Interface: The application has a simple and user-friendly interface that allows users to enter a message and get a prediction with just a click of a button.
Easy to Use: The application can be easily installed and used on any machine that has Python and the required packages installed.
Efficient: PySpark runs on Apache Spark, which distributes data processing across cores or cluster nodes, so the application can scale to large volumes of messages. This makes it suitable for businesses and individuals who receive a high volume of messages.
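
To give a feel for how Naive Bayes classifies text, here is a minimal pure-Python sketch of a multinomial Naive Bayes classifier with Laplace smoothing. It is not the PySpark implementation used by this project; the function names (`tokenize`, `train`, `predict`) and the toy messages are hypothetical and were invented for illustration only.

```python
import math
from collections import Counter, defaultdict


def tokenize(message):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return message.lower().split()


def train(examples, smoothing=1.0):
    """Fit a multinomial Naive Bayes model from (label, message) pairs."""
    doc_counts = Counter()               # number of messages per label
    word_counts = defaultdict(Counter)   # word frequencies per label
    for label, message in examples:
        doc_counts[label] += 1
        word_counts[label].update(tokenize(message))
    vocabulary = {w for counts in word_counts.values() for w in counts}
    return {
        "doc_counts": doc_counts,
        "word_counts": word_counts,
        "vocabulary": vocabulary,
        "smoothing": smoothing,
    }


def predict(model, message):
    """Return the label with the highest log posterior for the message."""
    total_docs = sum(model["doc_counts"].values())
    vocab_size = len(model["vocabulary"])
    alpha = model["smoothing"]
    scores = {}
    for label, n_docs in model["doc_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior of the label
        for word in tokenize(message):
            if word not in model["vocabulary"]:
                continue  # ignore words never seen during training
            # Laplace-smoothed log likelihood of the word given the label
            score += math.log(
                (counts[word] + alpha) / (total_words + alpha * vocab_size)
            )
        scores[label] = score
    return max(scores, key=scores.get)


# Hypothetical toy data, invented for illustration only.
TOY_EXAMPLES = [
    ("spam", "win a free prize now"),
    ("spam", "free cash claim now"),
    ("ham", "are we meeting for lunch today"),
    ("ham", "see you at the meeting"),
]

model = train(TOY_EXAMPLES)
print(predict(model, "claim your free prize"))
print(predict(model, "lunch meeting today"))
```

The classifier picks the label whose prior and per-word likelihoods give the highest combined log score. PySpark's `NaiveBayes` estimator follows the same idea but operates on feature vectors produced by the preprocessing pipeline.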

Requirements

Python 3.6 or higher
findspark
pyspark
streamlit
numpy

Installation

To install the required packages, run the following command:

pip install -r requirements.txt

Usage

To run the application, navigate to the root directory of the project in the terminal and run the following command:

streamlit run app.py

This starts a local server and opens the application in your web browser at http://localhost:8501/.

Model

The classifier is a Naive Bayes model trained on messages preprocessed by a custom PySpark Pipeline. The trained model and the pipeline are saved as separate files and loaded into the application at runtime.
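
Loading the two saved artifacts at runtime might look roughly like the sketch below. It rests on several assumptions that this README does not confirm: that the pipeline was saved as a fitted `PipelineModel`, that the classifier was saved as a `NaiveBayesModel`, that they live in directories named `spam_cleaner_pipeline` and `spam_classifier_model`, that the pipeline reads its input from a column called `text` and needs no label column, and that prediction index `1.0` means spam. The helper names (`load_artifacts`, `classify`, `label_for`) are hypothetical.

```python
# Assumed locations of the saved artifacts (hypothetical defaults).
PIPELINE_PATH = "spam_cleaner_pipeline"
MODEL_PATH = "spam_classifier_model"


def label_for(prediction, spam_index=1.0):
    """Map a numeric prediction to a readable label.

    Assumption: the label indexer assigned 1.0 to spam.
    """
    return "spam" if prediction == spam_index else "not spam"


def load_artifacts(pipeline_path=PIPELINE_PATH, model_path=MODEL_PATH):
    """Start (or reuse) a Spark session and load the pipeline and model."""
    # Imported lazily so that importing this module does not require Spark.
    from pyspark.sql import SparkSession
    from pyspark.ml import PipelineModel
    from pyspark.ml.classification import NaiveBayesModel

    spark = SparkSession.builder.appName("SpamShield").getOrCreate()
    cleaner = PipelineModel.load(pipeline_path)      # fitted preprocessing stages
    classifier = NaiveBayesModel.load(model_path)    # trained Naive Bayes model
    return spark, cleaner, classifier


def classify(message, spark, cleaner, classifier, text_col="text"):
    """Preprocess one message and return 'spam' or 'not spam'."""
    # Assumption: the pipeline expects its raw input in `text_col`.
    df = spark.createDataFrame([(message,)], [text_col])
    cleaned = cleaner.transform(df)
    prediction = classifier.transform(cleaned).select("prediction").first()[0]
    return label_for(prediction)


if __name__ == "__main__":
    spark, cleaner, classifier = load_artifacts()
    print(classify("Congratulations, you won a free prize!", spark, cleaner, classifier))
```

Keeping the pipeline and the model as separate artifacts means the preprocessing stages can be reused or retrained independently of the classifier, at the cost of loading two objects at startup.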

Conclusion

The custom PySpark Pipeline ensures that the message is preprocessed correctly before it is fed into the model for prediction. This application can be useful for individuals and businesses to filter out spam messages from their inbox and focus on important messages.

ajosegun/SpamShield
