qtle3/AdCampaignETL

Big Data Advertising Analysis

A PySpark-powered project for analyzing advertising campaign data. This project transforms raw CSV data into actionable insights, summarizing user engagement across demographics, income levels, and more. Includes seamless integration with AWS S3 for storage and backup.

Features

  • Data Transformation:
    • Calculate Site Engagement Ratio.
    • Categorize users by age, gender, income, and engagement levels.
  • Analytical Summaries:
    • Engagement trends by age group and gender.
    • Click-through analysis by income bracket.
    • Ad topic performance evaluation.
    • Demographic-based engagement analysis.
  • Integration:
    • Export analytical results to local CSV files.
    • Automatic uploads to AWS S3 for storage.
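The Data Transformation features above could be sketched roughly as follows. The README does not define the Site Engagement Ratio or the age categories, so the definition used here (time on site divided by daily internet usage), the column names, the bucket boundaries, and the helper name add_derived_columns are all assumptions for illustration. The sketch also assumes the numeric columns were loaded as numbers, for example by reading the CSV with schema inference enabled.

```python
# Column names are assumptions; adjust them to match advertising.csv.
TIME_COL = "Daily Time Spent on Site"
USAGE_COL = "Daily Internet Usage"
AGE_COL = "Age"


def engagement_ratio(time_on_site, internet_usage):
    """Assumed definition: time on site divided by total daily internet usage."""
    if time_on_site is None or not internet_usage:
        return None  # missing value or zero usage: the ratio is undefined
    return time_on_site / internet_usage


def age_group(age):
    """Map an age to a bucket. The boundaries are illustrative only."""
    if age is None:
        return None
    if age < 25:
        return "under 25"
    if age < 35:
        return "25-34"
    if age < 50:
        return "35-49"
    return "50+"


def add_derived_columns(df):
    """Attach both derived columns to a PySpark DataFrame using UDFs."""
    # Imported lazily so the plain functions above work without Spark installed.
    from pyspark.sql import functions as F
    from pyspark.sql.types import DoubleType, StringType

    ratio_udf = F.udf(engagement_ratio, DoubleType())
    age_udf = F.udf(age_group, StringType())
    return (
        df.withColumn("engagement_ratio", ratio_udf(F.col(TIME_COL), F.col(USAGE_COL)))
          .withColumn("age_group", age_udf(F.col(AGE_COL)))
    )
```

Keeping the rules as plain Python functions makes them easy to unit test; for larger datasets, the same rules could instead be written as native column expressions (for instance with F.when), which usually run faster than UDFs.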

Technologies Used

  • PySpark for data transformation and analysis.
  • Pandas for data handling.
  • Boto3 for AWS S3 interaction.
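As a rough sketch of how Boto3 might handle the S3 step, the helper below uploads every CSV file found under a results directory. The function name upload_csv_results, the directory layout, and the bucket and prefix in the usage comment are hypothetical; the project's actual script may organize this differently. The S3 client is passed in as a parameter so the function can be exercised without network access.

```python
from pathlib import Path


def upload_csv_results(results_dir, bucket, prefix, s3_client):
    """Upload every CSV under results_dir to the bucket and return the S3 keys."""
    uploaded = []
    for path in sorted(Path(results_dir).rglob("*.csv")):
        # Preserve the file's path relative to results_dir in the object key.
        relative = path.relative_to(results_dir).as_posix()
        key = f"{prefix.rstrip('/')}/{relative}"
        s3_client.upload_file(str(path), bucket, key)
        uploaded.append(key)
    return uploaded


# Example usage (bucket and prefix are placeholders):
#   import boto3
#   keys = upload_csv_results("output", "my-example-bucket", "ad-analysis", boto3.client("s3"))
```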

Prerequisites

  1. Python Environment:

    • Ensure Python is installed, along with the required libraries:
      • PySpark
      • Pandas
      • Boto3
    • Use the following command to install dependencies if needed:
      pip install pyspark pandas boto3
  2. AWS Configuration:

    • Set up AWS credentials for S3 access. You can either run aws configure with the AWS CLI or create a ~/.aws/credentials file by hand.
  3. Data File:

    • Place the advertising.csv file in the directory specified in the script.
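Before running the script, it can help to confirm that the credentials file from the AWS Configuration step is in place. The check below is a minimal sketch using only the standard library; the function name has_aws_profile is hypothetical. Boto3 can also pick up credentials from environment variables and other sources, so a False result here does not necessarily mean S3 access will fail.

```python
import configparser


def has_aws_profile(credentials_path, profile="default"):
    """Return True if the file defines both access key fields for the profile."""
    parser = configparser.ConfigParser()
    parser.read(credentials_path)  # a missing file is skipped without error
    if not parser.has_section(profile):
        return False
    section = parser[profile]
    return "aws_access_key_id" in section and "aws_secret_access_key" in section


# Example usage:
#   from pathlib import Path
#   print(has_aws_profile(Path.home() / ".aws" / "credentials"))
```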

Setup and Usage

  1. Clone the Repository:
    git clone https://github.com/qtle3/AdCampaignETL.git
    cd AdCampaignETL
