Multi-Stage Web Scraper with Node.js, Puppeteer & Flask

This project demonstrates a multi-stage Docker build that:

🕷️ Scrapes data from a user-provided URL using Node.js, Puppeteer, and Chromium
🐍 Serves the scraped data as JSON via a lightweight Flask web server
🐳 Uses Docker multi-stage builds to keep the final image small and efficient

🛠 Technologies Used

Node.js 18 (slim)
Puppeteer (headless browser automation)
Chromium
Python 3.10 (slim)
Flask (Python web server)
Docker (Multi-stage build)

📁 Project Structure

project/
├── Dockerfile             # Multi-stage Dockerfile (Node.js + Python)
├── scrape.js              # Puppeteer script to scrape title and heading
├── server.py              # Flask server to serve scraped JSON
├── requirements.txt       # Flask dependency

⚙️ How It Works

1️⃣ Scraper Stage (Node.js + Puppeteer)

Installs Chromium and Puppeteer
Accepts a SCRAPE_URL as a build argument
Uses Puppeteer to scrape:
- <title> of the page
- First <h1> heading
Outputs scraped_data.json

2️⃣ Final Stage (Python + Flask)

Copies only scraped_data.json into a minimal Python image
Runs a Flask server that serves the JSON on /

🚀 Getting Started

✅ Prerequisites

Docker Desktop installed on Windows 11 (or other OS)

🧱 Build the Docker Image

docker build --build-arg SCRAPE_URL=https://example.com -t scraper-server .

Replace https://example.com with the target website you want to scrape.

🏃 Run the Container

docker run -p 5000:5000 scraper-server

Then open your browser and navigate to:

http://localhost:5000

✅ Example Output

{
  "title": "Example Domain",
  "heading": "Example Domain"
}

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Multi-Stage Web Scraper with Node.js, Puppeteer & Flask

🛠 Technologies Used

📁 Project Structure

⚙️ How It Works

1️⃣ Scraper Stage (Node.js + Puppeteer)

2️⃣ Final Stage (Python + Flask)

🚀 Getting Started

✅ Prerequisites

🧱 Build the Docker Image

About

Uh oh!

Releases

Packages

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Name		Name	Last commit message	Last commit date
Latest commit History 10 Commits
Dockerfile		Dockerfile
README.md		README.md
requirements.txt		requirements.txt
scrape.js		scrape.js
server.py		server.py

Folders and files

Latest commit

History

Repository files navigation

Multi-Stage Web Scraper with Node.js, Puppeteer & Flask

🛠 Technologies Used

📁 Project Structure

⚙️ How It Works

1️⃣ Scraper Stage (Node.js + Puppeteer)

2️⃣ Final Stage (Python + Flask)

🚀 Getting Started

✅ Prerequisites

🧱 Build the Docker Image

About

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Uh oh!

Contributors

Uh oh!

Languages

Packages