A PySpark-powered project for analyzing advertising campaign data. This project transforms raw CSV data into actionable insights, summarizing user engagement across demographics, income levels, and more. Includes seamless integration with AWS S3 for storage and backup.
- Data Transformation:
- Calculate
Site Engagement Ratio
. - Categorize users by age, gender, income, and engagement levels.
- Calculate
- Analytical Summaries:
- Engagement trends by age group and gender.
- Click-through analysis by income bracket.
- Ad topic performance evaluation.
- Demographic-based engagement analysis.
- Integration:
- Export analytical results to local CSV files.
- Automatic uploads to AWS S3 for storage.
- PySpark for data transformation and analysis.
- Pandas for data handling.
- Boto3 for AWS S3 interaction.
-
Python Environment:
- Ensure Python is installed, along with the required libraries:
- PySpark
- Pandas
- Boto3
- Use the following command to install dependencies if needed:
pip install pyspark pandas boto3
- Ensure Python is installed, along with the required libraries:
-
AWS Configuration:
- Set up AWS credentials for S3 access. You can use the AWS CLI or set up a
.aws/credentials
file.
- Set up AWS credentials for S3 access. You can use the AWS CLI or set up a
-
Data File:
- Place the
advertising.csv
file in the directory specified in the script.
- Place the
- Clone the Repository:
git clone https://github.com/your-username/big-data-advertising-analysis.git cd big-data-advertising-analysis