PDF Text Extractor

This is a Streamlit application that lets users upload PDF files, extract their text, and view the extracted text in a clean, readable format.

Features

Upload PDF Files: Easily upload PDF documents for text extraction.
Text Extraction: Extracts and processes text from PDFs using PyMuPDF (Fitz).
Interactive Interface: User-friendly interface to upload files and view results instantly.
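
As a rough illustration of the extraction step, the sketch below reads every page with PyMuPDF and tidies the result. The function names `extract_raw_text` and `clean_text` are hypothetical and the cleanup rules are assumptions; the code in app.py may work differently.

```python
import re


def extract_raw_text(pdf_bytes: bytes) -> str:
    """Return the plain text of every page, separated by blank lines."""
    # PyMuPDF is imported lazily so clean_text() can be used without it.
    import fitz  # PyMuPDF

    with fitz.open(stream=pdf_bytes, filetype="pdf") as doc:
        return "\n\n".join(page.get_text() for page in doc)


def clean_text(raw: str) -> str:
    """Apply a few assumed cleanup rules to raw extracted text."""
    # Rejoin words hyphenated across a line break: "extrac-\ntion" -> "extraction".
    text = re.sub(r"(\w)-\n(\w)", r"\1\2", raw)
    # Collapse runs of spaces and tabs into a single space.
    text = re.sub(r"[ \t]+", " ", text)
    # Drop leading and trailing whitespace on every line.
    text = "\n".join(line.strip() for line in text.split("\n"))
    # Allow at most one blank line between paragraphs.
    text = re.sub(r"\n{3,}", "\n\n", text)
    return text.strip()
```

Calling `clean_text(extract_raw_text(data))` on the bytes of an uploaded file would give one string ready for display.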

Requirements

Python 3.9 or higher
Libraries:
- streamlit
- pymupdf
- unidecode

Installation

Clone this repository or download the code:

   git clone https://github.com/edgelearningcentre/pdf2text_parser.git
   cd pdf2text_parser

Install the required Python packages:
```
pip install -r requirements.txt
```
Run the Streamlit app:
```
streamlit run app.py
```

How to Use

Open the Streamlit app in your browser (usually at http://localhost:8501).
Upload a PDF file using the file uploader.
Click the "Extract Text" button.
View the extracted and formatted text displayed in the app.
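
The steps above can be sketched as a small Streamlit script. This is a minimal illustration, not the repository's app.py: the widget labels, the `describe` helper, and the layout are assumptions.

```python
def describe(text: str) -> str:
    """Short caption for the result, e.g. '3 words, 20 characters'."""
    return f"{len(text.split())} words, {len(text)} characters"


def main() -> None:
    # Third-party imports live here so describe() can be used without them.
    import fitz  # PyMuPDF
    import streamlit as st

    st.title("PDF Text Extractor")
    uploaded = st.file_uploader("Upload a PDF file", type="pdf")

    # The button only does work once a file has been chosen.
    if uploaded is not None and st.button("Extract Text"):
        with fitz.open(stream=uploaded.getvalue(), filetype="pdf") as doc:
            text = "\n\n".join(page.get_text() for page in doc)
        st.caption(describe(text))
        st.text_area("Extracted text", text, height=400)
```

Streamlit executes a script from top to bottom on every interaction, so a file built from this sketch would end with a call to `main()` and be started with `streamlit run`.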

Example Output

Uploaded PDF: A research paper or document.
Extracted Text: Structured content

Limitations

The accuracy of heading detection relies on simple heuristics and may require adjustments for complex PDF layouts.
Currently supports only text-based PDFs, not scanned image PDFs.
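
To make both limitations concrete, here is a hedged sketch of one possible font-size heuristic for headings and a simple check for scanned PDFs. All function names and the `ratio` threshold are assumptions for illustration; the heuristics used in app.py may differ.

```python
from collections import Counter


def page_lines(page):
    """Yield (text, largest font size) for each text line on a PyMuPDF page."""
    for block in page.get_text("dict")["blocks"]:
        for line in block.get("lines", []):  # image blocks have no "lines"
            spans = line["spans"]
            if spans:
                yield "".join(s["text"] for s in spans), max(s["size"] for s in spans)


def find_headings(lines, ratio=1.15):
    """Return lines whose font size is at least `ratio` times the body size.

    `lines` is an iterable of (text, font_size) pairs. The body size is taken
    to be the size that covers the most characters.
    """
    lines = [(text.strip(), size) for text, size in lines if text.strip()]
    chars_per_size = Counter()
    for text, size in lines:
        chars_per_size[round(size, 1)] += len(text)
    if not chars_per_size:
        return []
    body_size = chars_per_size.most_common(1)[0][0]
    return [text for text, size in lines if size >= body_size * ratio]


def looks_scanned(page_texts) -> bool:
    """True when no page yields any text, which suggests image-only pages."""
    return not any(text.strip() for text in page_texts)
```

A heuristic like this breaks down when a document uses bold text of the same size for headings or mixes several body sizes, which is why complex layouts may need adjustments. A PDF flagged by `looks_scanned` would need OCR, which this app does not provide.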

Contributing

Contributions are welcome! Feel free to submit issues or pull requests to improve this app.

License

This project is licensed under the MIT License. See the LICENSE file for more details.

Author: edgelearningcentre Contact: [email protected]

