This project demonstrates a multi-stage Docker build that:
- 🕷️ Scrapes data from a user-provided URL using Node.js, Puppeteer, and Chromium
- 🐍 Serves the scraped data as JSON via a lightweight Flask web server
- 🐳 Uses Docker multi-stage builds to keep the final image small and efficient
- Node.js 18 (slim)
- Puppeteer (headless browser automation)
- Chromium
- Python 3.10 (slim)
- Flask (Python web server)
- Docker (Multi-stage build)
project/
├── Dockerfile # Multi-stage Dockerfile (Node.js + Python)
├── scrape.js # Puppeteer script to scrape title and heading
├── server.py # Flask server to serve scraped JSON
├── requirements.txt # Flask dependency
- Installs Chromium and Puppeteer
- Accepts `SCRAPE_URL` as a build argument
- Uses Puppeteer to scrape:
  - `<title>` of the page
  - First `<h1>` heading
- Outputs `scraped_data.json`
- Copies only `scraped_data.json` into a minimal Python image
- Runs a Flask server that serves the JSON on `/`
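
The serving step described above could look roughly like the sketch below. It is not the project's actual `server.py`: the function names `load_scraped_data` and `create_app`, the relative file location, and the 404 fallback for a missing file are all assumptions made for illustration.

```python
import json
from pathlib import Path

# Assumed location of the file copied in from the scraper stage.
DATA_FILE = Path("scraped_data.json")


def load_scraped_data(path: Path = DATA_FILE) -> dict:
    """Read the scraper output, or return an error payload if the file is missing."""
    try:
        return json.loads(path.read_text(encoding="utf-8"))
    except FileNotFoundError:
        return {"error": f"{path.name} not found"}


def create_app(path: Path = DATA_FILE):
    """Build a Flask app that serves the scraped JSON on `/`."""
    # Imported lazily so load_scraped_data can be reused without Flask installed.
    from flask import Flask, jsonify

    app = Flask(__name__)

    @app.route("/")
    def index():
        data = load_scraped_data(path)
        # Hypothetical choice: answer 404 when the scrape output is absent.
        status = 404 if "error" in data else 200
        return jsonify(data), status

    return app
```

Under these assumptions, the container's entry point would call `create_app().run(host="0.0.0.0", port=5000)`. Binding to `0.0.0.0` rather than the default loopback address is what lets a published container port reach the server.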
- Docker Desktop installed on Windows 11 (or other OS)
docker build --build-arg SCRAPE_URL=https://example.com -t scraper-server .

Replace https://example.com with the target website you want to scrape.
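
Because the URL is baked into the image at build time, it can help to validate it before invoking Docker. The helper below is a hypothetical convenience, not part of the project: `docker_build_command` is an invented name, and the http(s)-only check is an assumption about which URLs the scraper is meant to handle.

```python
import shlex
from urllib.parse import urlparse


def docker_build_command(scrape_url: str, tag: str = "scraper-server") -> list[str]:
    """Assemble the docker build argument list for a given target URL."""
    parsed = urlparse(scrape_url)
    # Reject anything that is not an absolute http(s) URL.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"not an absolute http(s) URL: {scrape_url!r}")
    return ["docker", "build", "--build-arg", f"SCRAPE_URL={scrape_url}", "-t", tag, "."]


cmd = docker_build_command("https://example.com")
print(shlex.join(cmd))
```

The returned list can be passed directly to `subprocess.run(cmd, check=True)`, which avoids shell quoting problems for URLs that contain characters such as `&` or `?`.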
🏃 Run the Container
docker run -p 5000:5000 scraper-server

Then open your browser and navigate to port 5000 on your machine, which `-p 5000:5000` maps to the Flask server inside the container.
✅ Example Output
{
  "title": "Example Domain",
  "heading": "Example Domain"
}
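
A client consuming this response may want to confirm that both fields are present before using them. The following is a minimal sketch of such a check. `parse_scrape_result` and `EXPECTED_KEYS` are hypothetical names, and the expected field set is inferred only from the example output above.

```python
import json

# Fields seen in the example output; assumed to be the complete schema.
EXPECTED_KEYS = {"title", "heading"}


def parse_scrape_result(raw: str) -> dict:
    """Parse the server's JSON response and check that the expected fields exist."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = EXPECTED_KEYS - data.keys()
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")
    return data


result = parse_scrape_result('{"title": "Example Domain", "heading": "Example Domain"}')
print(result["title"])  # prints "Example Domain"
```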