📂 Data-Extractor-Application

This repository contains Python scripts that demonstrate how to develop a Python-based application designed to streamline the process of converting PDF or PNG files into structured JSON payloads using advanced machine learning vision technologies (an API integration with OpenAI's GPT-V).

💻 Technologies Used

Python
AWS S3 (Simple Storage Service)
MongoDB
OpenAI API (GPT-4 Vision)
Streamlit
Boto3
Base64

🌟 Features

Here's what you can do with DataExtractorPro:

Upload Your Files: Easily upload PDF or PNG files. Just drag and drop your documents into the application, and let DataExtractorPro handle the rest.

Automatic Data Extraction: Once you upload a file, our ML vision technology kicks in, analyzing your document and extracting structured data. Whether it's text from a PNG or data points from a PDF, we've got it covered.

Review and Confirm: After extraction, you'll see a neatly organized preview of the extracted data. If something doesn't look right, you can directly edit the information on-screen. Confirm when you're satisfied to proceed.

Data Structuring: Your confirmed data is automatically structured into a JSON payload, ready for any API or database. You see exactly how your data is organized and can make last-minute tweaks if needed.

Save and Store: With a click, your original file and the structured JSON payload are securely saved in our database. Perfect for building a rich dataset for ML training purposes.

Zoom & Edit for Precision: Zoom in to review details or zoom out for a broader view. Essential for those intricate data points you don't want to miss.

Pan Through Your Upload History: Navigate through your past uploads and extracted data with ease. It's like having an infinite canvas of your work, ready for review or further editing.

⚙️ The Process

Development Phases:

• Backend Development: I focused on creating a scalable and efficient backend structure that could handle .pdf and .png file uploads, convert PDFs to images, and interact with ML models for data extraction.

• Integration of ML Models: Integrating the ML models was a pivotal phase. I experimented with different models and APIs to find the most accurate and efficient solution for our data extraction needs.

• Frontend Development with Streamlit: Designing an intuitive and user-friendly interface with Streamlit was crucial. I aimed for simplicity, enabling users to easily upload files, view extracted data, and make corrections if necessary.

• Database Integration: The final step involved setting up database connections to store the images and JSON payloads securely, focusing on future scalability and data retrieval for ML training sets.

📚 What I Learned

• Continuous Learning: This project was a testament to the ever-evolving nature of technology and the need for continuous learning and adaptation as a developer.

🎥 Demo Video

Demo.Video.MP4

Name		Name	Last commit message	Last commit date
Latest commit History 17 Commits
env		env
src		src
README.md		README.md
main.py		main.py
requirement.txt		requirement.txt

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

📂 Data-Extractor-Application

💻 Technologies Used

🌟 Features

⚙️ The Process

📚 What I Learned

🎥 Demo Video

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

📂 Data-Extractor-Application

💻 Technologies Used

🌟 Features

⚙️ The Process

📚 What I Learned

🎥 Demo Video

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages