SpamShield is an email/SMS spam classifier built with PySpark's machine learning library (MLlib). It predicts whether a given message is spam or not. The model uses the Naive Bayes algorithm, and each message is preprocessed by a custom PySpark Pipeline before classification.
Technology | Description |
---|---|
Python | An interpreted, high-level programming language used for general-purpose programming. |
PySpark | The Python API for Apache Spark, an open-source engine for large-scale data processing and analysis. |
FastAPI | A modern, fast (high-performance) web framework for building APIs with Python. |
Streamlit | An open-source Python library used to build interactive web applications. |
Vercel | A cloud platform used for deploying and scaling web applications. |
A demo of this application is available here: SpamShield.
- Prediction: The application predicts whether a given message is spam or not.
- Custom PySpark Pipeline: Each message is cleaned and formatted by a custom PySpark Pipeline before it is passed to the model for prediction.
- Naive Bayes Algorithm: The model uses the Naive Bayes algorithm, a simple yet effective choice for text classification.
- User Interface: A simple, user-friendly interface lets users enter a message and get a prediction with a single click.
- Easy to Use: The application can be installed and run on any machine that has Python and the required packages.
- Efficient: Because PySpark can distribute data processing across a cluster, the same pipeline can scale to large volumes of messages, which makes it suitable for businesses and individuals who receive many messages.
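To illustrate what the Naive Bayes step computes, here is a minimal pure-Python sketch of multinomial Naive Bayes with additive (Laplace) smoothing. It is not the project's code: the project builds its model with PySpark's ML library, whose `NaiveBayes` estimator defaults to `modelType="multinomial"` and `smoothing=1.0`. The function names `train_naive_bayes` and `predict` and the toy messages are hypothetical, and the messages are assumed to be tokenized already.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(docs, labels, smoothing=1.0):
    """Fit a multinomial Naive Bayes model on tokenized documents.

    Returns log class priors and per-class log likelihoods for every
    vocabulary word. Additive smoothing gives words never seen with a
    class a small non-zero probability instead of zero.
    """
    vocab = {word for doc in docs for word in doc}
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)  # label -> word -> count
    for doc, label in zip(docs, labels):
        word_counts[label].update(doc)

    log_priors = {c: math.log(n / len(labels)) for c, n in class_counts.items()}
    log_likelihoods = {}
    for c in class_counts:
        total = sum(word_counts[c].values())
        denom = total + smoothing * len(vocab)
        log_likelihoods[c] = {
            w: math.log((word_counts[c][w] + smoothing) / denom) for w in vocab
        }
    return log_priors, log_likelihoods


def predict(model, doc):
    """Return the class with the highest log posterior.

    Words outside the training vocabulary are ignored.
    """
    log_priors, log_likelihoods = model
    scores = {
        c: prior + sum(log_likelihoods[c][w] for w in doc if w in log_likelihoods[c])
        for c, prior in log_priors.items()
    }
    return max(scores, key=scores.get)


# Hypothetical toy data: tokenized messages labelled "spam" or "ham".
train_docs = [
    ["win", "free", "prize", "now"],
    ["free", "cash", "claim", "now"],
    ["are", "we", "meeting", "today"],
    ["see", "you", "at", "lunch", "today"],
]
train_labels = ["spam", "spam", "ham", "ham"]

model = train_naive_bayes(train_docs, train_labels)
print(predict(model, ["claim", "your", "free", "prize"]))  # spam
print(predict(model, ["lunch", "today"]))  # ham
```

Working in log space turns the product of many small word probabilities into a sum, which avoids floating-point underflow on long messages.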
The application requires:
- Python 3.6 or higher
- findspark
- pyspark
- streamlit
- numpy

PySpark also needs a Java runtime installed on the machine.
To install the required packages, run: `pip install -r requirements.txt`

To run the application, navigate to the root directory of the project in a terminal and run:

`streamlit run app.py`

This starts a local server and opens the application at http://localhost:8501/ in your web browser.
The trained Naive Bayes model and the preprocessing pipeline are saved as separate files and loaded into the application at runtime.
The custom PySpark Pipeline ensures that each message is preprocessed correctly before it is fed into the model for prediction. The application can help individuals and businesses filter spam out of their inboxes and focus on important messages.
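The stages of the custom pipeline are not listed here. As a rough, pure-Python illustration of what typical text-cleaning stages in a Spark ML pipeline do (lower-casing and tokenizing, roughly the role of `RegexTokenizer`; dropping stop words, as `StopWordsRemover` does; and counting terms, as `CountVectorizer` does), the sketch below applies equivalent steps to a single message. The `preprocess` function, the regular expression, and the stop-word list are assumptions for illustration, not taken from the project.

```python
import re
from collections import Counter

# Hypothetical, deliberately small stop-word list (Spark's StopWordsRemover
# ships a much longer default English list).
STOP_WORDS = {"a", "an", "the", "is", "to", "you", "your", "and", "or", "of"}


def preprocess(message):
    """Lower-case the message, split it into alphanumeric tokens, drop stop
    words, and return per-term counts: roughly what a tokenizer, then a
    stop-word remover, then a count vectorizer produce for one message."""
    tokens = re.findall(r"[a-z0-9]+", message.lower())
    filtered = [t for t in tokens if t not in STOP_WORDS]
    return Counter(filtered)


print(preprocess("WIN a FREE prize! Claim your FREE prize now."))
```

In the real application these term counts would be a sparse feature vector produced by the saved pipeline, which is what the Naive Bayes model consumes.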