This project showcases a comprehensive method to extract and recognize information from gas station price tables using various image processing techniques and a custom-trained machine learning model.
This project is designed to process images of gas station price tables, extract text and numbers, and accurately recognize and display the prices. The method combines a machine learning model custom-trained on the SVHN dataset with various image processing techniques to handle different table layouts and formats. The model currently recognizes standard layouts and is a work in progress; it already identifies the majority of gas station price tables.
This project integrates multiple components into a cohesive system for extracting and recognizing prices from gas station tables. The pipeline consists of the following steps:
- **Image Preprocessing:**
  - Enhances image quality (contrast adjustment, resizing, etc.).
  - Applies contour detection to locate regions of interest (a minimal sketch follows this list).
- **Optical Character Recognition (OCR):**
  - Uses PaddleOCR to extract text from the identified regions.
- **Custom ML Model:**
  - Processes segmented images to predict numeric values using a TensorFlow Lite model.
- **Post-Processing:**
  - Matches recognized text with detected numbers.
  - Validates and formats the final output to ensure accuracy.
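The preprocessing and contour-detection step might look roughly like the sketch below. This is an illustrative reconstruction rather than the repository's actual code; the function name, CLAHE settings, and area threshold are assumptions.

```python
# Illustrative sketch only: find_price_regions and its thresholds are assumptions,
# not the project's actual implementation.
import cv2

def find_price_regions(image_path, min_area=500):
    """Return bounding boxes of regions likely to contain price digits."""
    image = cv2.imread(image_path)
    gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

    # Contrast enhancement (CLAHE) helps with unevenly lit price signs.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    enhanced = clahe.apply(gray)

    # Binarize and detect external contours.
    _, binary = cv2.threshold(enhanced, 0, 255,
                              cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, _ = cv2.findContours(binary, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)

    # Keep only contours large enough to hold a digit block.
    return [cv2.boundingRect(c) for c in contours
            if cv2.contourArea(c) >= min_area]
```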
- Supported Formats: `.png`, `.jpg`
- Recommended Resolution: 300 DPI or higher for best accuracy.
- Machine Learning: Utilizes convolutional neural networks (CNNs) to train a custom model for recognizing numeric and textual information from images.
- Image Preprocessing: Enhances image quality by adjusting contrast, converting to grayscale, and resizing images for optimal processing.
- Contour Detection and Processing: Identifies and processes contours to segment regions of interest, isolating specific areas containing price information.
- Optical Character Recognition (OCR): Uses PaddleOCR to extract text from images, converting visual information into readable and actionable data (a short OCR sketch follows this list).
- Image Enhancement: Improves image resolution and clarity to aid in accurate text and number extraction.
- Model Prediction: Employs the trained model to predict prices from the segmented regions, ensuring precise recognition.
- Post-Processing: Matches and validates extracted text and numbers to generate accurate and reliable results.
- Adaptive Learning: The model improves as it is retrained on new data, enabling it to handle varied and evolving price table formats.
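As a rough illustration of the OCR step mentioned above, the snippet below shows how PaddleOCR can be invoked on a cropped region. The file name is a placeholder, and the result structure shown here reflects PaddleOCR 2.x and may differ in other versions.

```python
# Hedged sketch of the OCR step; 'region_crop.png' is a placeholder file name.
from paddleocr import PaddleOCR

ocr = PaddleOCR(use_angle_cls=True, lang='en')  # downloads models on first run
result = ocr.ocr('region_crop.png', cls=True)

# In PaddleOCR 2.x, result[0] is a list of [box, (text, confidence)] entries.
for box, (text, confidence) in result[0]:
    print(text, confidence)
```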
This repository showcases how to integrate machine learning models, OCR, and image processing into a complete solution. It demonstrates:
- Seamless ML Integration: Combining PaddleOCR and a TensorFlow Lite model into a single pipeline.
- Error Resilience: Robust handling of poor-quality images and unexpected input formats.
- Scalable Design: Designed for batch processing and easy extension with additional OCR models (see the batch-processing sketch after this list).
- Lightweight Deployment: The system is optimized to be lightweight, enabling seamless integration as part of a mobile or web app.
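The batch-processing sketch referenced above can be as simple as looping over a folder with the repository's `process_image` function; the `images/` folder layout is an assumption.

```python
# Batch-processing sketch built on the repository's process_image function;
# the 'images/' folder is an assumed layout.
from pathlib import Path
from OCR_gas_station_table import process_image

results = {}
for image_path in sorted(Path('images').glob('*.png')):
    results[image_path.name] = process_image(image_path=str(image_path))

print(results)
```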
This system is designed to be modular and easily extendable:
- Replaceable OCR Engine: Swap out PaddleOCR with another OCR library like Tesseract or EasyOCR.
- Model Customization: Retrain the TensorFlow Lite model on custom datasets for other use cases.
- Flexible Output Formats: Extend post-processing to generate outputs in JSON, Excel, or other formats (a short JSON export sketch follows this list).
- Mobile-Friendly Design: Its lightweight architecture makes it suitable for deployment within apps, ensuring efficient on-device processing.
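Because `process_image` returns a plain dictionary of fuel names to prices (see the usage example below), the JSON export mentioned above is a few lines with the standard library; the output file name is a placeholder.

```python
# Sketch: exporting the recognized prices to JSON ('prices.json' is a placeholder).
import json
from OCR_gas_station_table import process_image

result = process_image(image_path='sample_gas_station_image.png')
with open('prices.json', 'w') as f:
    json.dump(result, f, indent=2)
```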
To get started, follow these steps:
- **Open Command Prompt as an administrator:**
  - Press `Win + S` and search for Command Prompt.
  - Right-click on Command Prompt and select Run as administrator.
- **Clone the repository, navigate to the project directory, and install the dependencies:**

  ```bash
  git clone https://github.com/lodist/Gas-Station-Price-Table-OCR.git
  cd Gas-Station-Price-Table-OCR
  pip install -r requirements.txt
  ```
To use the code, simply call the `process_image` function with the path to your image:

```python
from OCR_gas_station_table import process_image

result = process_image(image_path='path/to/your/image.png')
print(result)
```

For example, running it on a sample image:

```python
result = process_image(image_path='sample_gas_station_image.png')
print(result)
```

Expected output:

```
{'Bf 95': '1.88', 'Bf 98': '1.96', 'Diesel': '1.95'}
```
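Internally, the numeric values in this result come from the TensorFlow Lite digit model described in the pipeline above. The following is only a minimal sketch of running such a model with the standard TFLite interpreter; the model file name, crop image, input layout, and normalization are assumptions rather than the project's actual code.

```python
# Hedged sketch of TFLite inference on a single digit crop; the model file,
# input size, and preprocessing are assumptions, not the project's actual code.
import cv2
import numpy as np
import tensorflow as tf

interpreter = tf.lite.Interpreter(model_path='digit_model.tflite')
interpreter.allocate_tensors()
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()

# Resize and normalize a cropped digit to the model's expected input shape.
crop = cv2.imread('digit_crop.png')
_, h, w, _ = input_details[0]['shape']
crop = cv2.resize(crop, (int(w), int(h))).astype(np.float32) / 255.0
interpreter.set_tensor(input_details[0]['index'], crop[np.newaxis, ...])

interpreter.invoke()
probabilities = interpreter.get_tensor(output_details[0]['index'])[0]
print('predicted digit:', int(np.argmax(probabilities)))
```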
The model used in this project is trained on the SVHN dataset and then converted to TensorFlow Lite format. If you wish to train your own model or retrain the existing one, follow these steps:
- Prepare the Dataset: Download and preprocess the SVHN dataset.
- Train the Model: Use the training script provided in the repository to train the model.
- Convert to TFLite: Convert the trained model to TensorFlow Lite format for efficient deployment.
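The conversion step could look like the sketch below, assuming the training script produces a Keras model; `digit_model.h5` is a hypothetical file name, not an artifact shipped with this repository.

```python
# Hedged sketch of the TFLite conversion step; 'digit_model.h5' is a
# hypothetical Keras model trained on SVHN, not a file from this repository.
import tensorflow as tf

model = tf.keras.models.load_model('digit_model.h5')
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional size/latency optimization
tflite_model = converter.convert()

with open('digit_model.tflite', 'wb') as f:
    f.write(tflite_model)
```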
After an initial startup of approximately 30 seconds, the script processes each image in 2-3 seconds. This makes it fast enough for practical applications.
Contributions are welcome! If you have any suggestions or improvements, please create a pull request or open an issue.
This project is licensed under the Creative Commons Attribution-NonCommercial 4.0 International License. See the LICENSE file for more details.
For any inquiries or commercial use, please contact me at lorisdistefano@protonmail.com.