A modern Python 3.13+ microservices platform for transforming the complete Discogs music database into powerful, queryable knowledge graphs and analytics engines.
🚀 Quick Start | 📖 Documentation | 🎯 Features | 💬 Community
Discogsography transforms monthly Discogs data dumps (~11.3GB compressed XML) into:
- 🔗 Neo4j Graph Database: Navigate complex music industry relationships
- 🐘 PostgreSQL Database: High-performance queries and full-text search
- 🔍 Interactive Explorer: Graph visualisation, trends, and path discovery
- 📊 Real-time Dashboard: Monitor system health and processing metrics
- 🎵 MusicBrainz Enrichment: Cross-reference with MusicBrainz for metadata, relationships, and external links
Perfect for music researchers, data scientists, developers, and music enthusiasts who want to explore the world's largest music database.
| Service | Purpose | Key Technologies |
|---|---|---|
| 🔐 API | User accounts, JWT auth, and collection sync | FastAPI, psycopg3, redis, Discogs OAuth 1.0 |
| 📊 Dashboard | Real-time monitoring and admin panel | FastAPI, WebSocket, reactive UI |
| 🔍 Explore | Serves graph exploration frontend (static files) | FastAPI, Tailwind CSS, Alpine.js, D3.js, Plotly.js |
| ⚡ Extractor | High-performance Rust-based extractor | tokio, quick-xml, lapin |
| 🔗 Graphinator | Builds Neo4j knowledge graphs | neo4j-driver, graph algorithms |
| 🔧 Schema-Init | One-shot database schema initializer | neo4j-driver, psycopg3 |
| 🐘 Tableinator | Creates PostgreSQL analytics tables | psycopg3, JSONB, full-text search |
| 📈 Insights | Precomputed analytics and music trends | FastAPI, psycopg3, httpx |
| 🤖 MCP Server | Exposes knowledge graph to AI assistants | FastMCP, httpx |
| Service | Purpose | Key Technologies |
|---|---|---|
| 🧠 Brainzgraphinator | Enriches Neo4j graph with MusicBrainz metadata and relationships | neo4j-driver, pika |
| 🧬 Brainztableinator | Populates PostgreSQL with MusicBrainz data and external links | psycopg3, pika |
```mermaid
graph TD
S3[("🌐 Discogs S3<br/>Data Dumps")]
MB[("🎵 MusicBrainz<br/>JSONL Dumps")]
subgraph Pipeline ["Data Pipeline"]
EXT[["⚡ Extractor"]]
RMQ{{"🐰 RabbitMQ"}}
GRAPH[["🔗 Graphinator"]]
TABLE[["🐘 Tableinator"]]
end
subgraph MBPipeline ["MusicBrainz Enrichment"]
BGRAPH[["🧠 Brainzgraphinator"]]
BTABLE[["🧬 Brainztableinator"]]
end
subgraph Storage ["Storage"]
NEO4J[("🔗 Neo4j")]
PG[("🐘 PostgreSQL")]
REDIS[("🔴 Redis")]
end
subgraph Services ["User-Facing Services"]
API[["🔐 API"]]
EXPLORE[["🔍 Explore"]]
DASH[["📊 Dashboard"]]
INSIGHTS[["📈 Insights"]]
end
S3 --> EXT --> RMQ
MB --> EXT
RMQ --> GRAPH --> NEO4J
RMQ --> TABLE --> PG
RMQ --> BGRAPH --> NEO4J
RMQ --> BTABLE --> PG
API --- NEO4J & PG & REDIS
EXPLORE --- API
INSIGHTS -.-> API
INSIGHTS --- PG & REDIS
DASH -.- RMQ & NEO4J & PG & REDIS
style S3 fill:#e1f5fe,stroke:#01579b,stroke-width:2px
style MB fill:#e1f5fe,stroke:#01579b,stroke-width:2px
style EXT fill:#ffccbc,stroke:#d84315,stroke-width:2px
style RMQ fill:#fff3e0,stroke:#e65100,stroke-width:2px
style NEO4J fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
style PG fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px
style REDIS fill:#ffebee,stroke:#b71c1c,stroke-width:2px
style GRAPH fill:#e0f2f1,stroke:#004d40,stroke-width:2px
style TABLE fill:#fce4ec,stroke:#880e4f,stroke-width:2px
style BGRAPH fill:#e0f2f1,stroke:#004d40,stroke-width:2px
style BTABLE fill:#fce4ec,stroke:#880e4f,stroke-width:2px
style API fill:#e3f2fd,stroke:#0d47a1,stroke-width:2px
style EXPLORE fill:#e8eaf6,stroke:#283593,stroke-width:2px
style DASH fill:#fce4ec,stroke:#880e4f,stroke-width:2px
style INSIGHTS fill:#fff9c4,stroke:#f57f17,stroke-width:2px
```
See Architecture Overview for detailed diagrams covering data pipeline, service communication, and message queue structure.
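The data flow in the diagram can also be read as a small directed graph. The sketch below is illustrative only and not part of the project: it copies the solid arrows from the diagram into a dictionary (the names `PIPELINE_EDGES`, `STORES`, and `reachable` are hypothetical) and uses a breadth-first search to confirm that a Discogs dump ends up in both databases.

```python
from collections import deque

# Directed edges copied from the solid arrows in the diagram above.
PIPELINE_EDGES = {
    "Discogs S3": ["Extractor"],
    "MusicBrainz": ["Extractor"],
    "Extractor": ["RabbitMQ"],
    "RabbitMQ": [
        "Graphinator",
        "Tableinator",
        "Brainzgraphinator",
        "Brainztableinator",
    ],
    "Graphinator": ["Neo4j"],
    "Brainzgraphinator": ["Neo4j"],
    "Tableinator": ["PostgreSQL"],
    "Brainztableinator": ["PostgreSQL"],
}

STORES = {"Neo4j", "PostgreSQL"}


def reachable(start: str) -> set[str]:
    """Return every node reachable from `start` via breadth-first search."""
    seen: set[str] = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in PIPELINE_EDGES.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


print(sorted(reachable("Discogs S3") & STORES))  # ['Neo4j', 'PostgreSQL']
```

Because every consumer sits behind RabbitMQ, the graph and table writers never talk to the extractor directly, which is what lets them be restarted or scaled independently.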
- ⚡ High-Speed Processing: ~130–480 records/second end-to-end throughput per data type with Rust-based extractor
- 🔄 Smart Deduplication: SHA256 hash-based change detection prevents reprocessing
- 📈 Handles Big Data: Processes 19M+ releases, 10M+ artists across ~11.3GB compressed XML
- 🔁 Auto-Recovery: Automatic retries with exponential backoff and dead letter queues
- 🐋 Container Security: Non-root users, read-only filesystems, dropped capabilities
- 📝 Type Safety: Full type hints with strict mypy validation and Bandit security scanning
- ✅ Comprehensive Testing: Unit, integration, and E2E tests with Playwright
- 🚀 Query Performance: 249x overall query performance improvement across 88 endpoints (PRs #175–#184), plus configurable data quality rules for extraction validation (#187); see Recent Improvements
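Two of the items above, hash-based change detection and retries with exponential backoff, can be sketched in a few lines of standard-library Python. This is a minimal illustration under stated assumptions, not the project's implementation: the canonical-JSON encoding, the in-memory `seen` dictionary, and the `base` and `cap` delay parameters are all hypothetical choices made for the example.

```python
import hashlib
import json


def record_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_changed(record_id: str, record: dict, seen: dict[str, str]) -> bool:
    """Return True (and remember the new hash) only when the record differs."""
    digest = record_hash(record)
    if seen.get(record_id) == digest:
        return False  # identical to what was processed before: skip it
    seen[record_id] = digest
    return True


def backoff_delays(retries: int, base: float = 1.0, cap: float = 60.0) -> list[float]:
    """Delays of base * 2**attempt seconds, each capped at `cap`."""
    return [min(cap, base * 2**attempt) for attempt in range(retries)]


seen: dict[str, str] = {}
artist = {"id": "1", "name": "Example Artist"}
print(has_changed("1", artist, seen))  # True: first time this record is seen
print(has_changed("1", artist, seen))  # False: unchanged, so it is skipped
print(backoff_delays(5))               # [1.0, 2.0, 4.0, 8.0, 16.0]
```

A production consumer would typically persist the hashes alongside the stored records and add random jitter to the delays, routing a message to a dead letter queue once the retries are exhausted.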
```bash
# Clone and start all services
git clone https://github.com/SimplicityGuy/discogsography.git
cd discogsography
docker-compose up -d

# Access the dashboard
open http://localhost:8003
```

| Service | URL | Default Credentials |
|---|---|---|
| 🔐 API | http://localhost:8004 | Register via /api/auth/register |
| 📊 Dashboard | http://localhost:8003 | None |
| 🔗 Neo4j | http://localhost:7474 | neo4j / discogsography |
| 🐘 PostgreSQL | localhost:5433 | discogsography / discogsography |
| 🐰 RabbitMQ | http://localhost:15672 | discogsography / discogsography |
See the Quick Start Guide for prerequisites, local development setup, and environment configuration.
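After `docker-compose up -d`, a quick way to see which services are accepting connections is to probe the ports listed in the table above. The following is a small sketch using only the standard library; the `SERVICES` mapping and the `is_listening` helper are hypothetical names for this example, and an open TCP port only shows that something is listening, not that the service is healthy.

```python
import socket

# Host/port pairs taken from the service table above.
SERVICES = {
    "API": ("localhost", 8004),
    "Dashboard": ("localhost", 8003),
    "Neo4j": ("localhost", 7474),
    "PostgreSQL": ("localhost", 5433),
    "RabbitMQ": ("localhost", 15672),
}


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout`."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    for name, (host, port) in SERVICES.items():
        status = "up" if is_listening(host, port) else "down"
        print(f"{name:<12} {host}:{port:<6} {status}")
```

The real-time Dashboard remains the intended place to monitor system health; this check is only handy while containers are still starting.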
| Document | Purpose |
|---|---|
| Quick Start Guide | ⚡ Get Discogsography running in minutes |
| Configuration Guide | ⚙️ Complete environment variable and settings reference |
| Architecture Overview | 🏛️ System architecture, components, data flow, and scale |
| CLAUDE.md | 🤖 Claude Code integration guide & development standards |
| Document | Purpose |
|---|---|
| Usage Examples | 💡 Neo4j Cypher and PostgreSQL query examples |
| Database Schema | 🗄️ Complete Neo4j graph model and PostgreSQL schema |
| Monitoring Guide | 📊 Real-time dashboard, metrics, and debug utilities |
| Document | Purpose |
|---|---|
| Development Guide | 💻 Project structure, tooling, and developer workflow |
| Testing Guide | 🧪 Unit, integration, and E2E testing with Playwright |
| Logging Guide | 📊 Structured logging standards and emoji conventions |
| Contributing Guide | 🤝 How to contribute: process, standards, and PR flow |
| Python Version Management | 🐍 Managing Python 3.13+ across the project |
| Document | Purpose |
|---|---|
| Troubleshooting Guide | 🔧 Common issues, solutions, and debugging steps |
| Maintenance Guide | 🔄 Package upgrades, dependency management |
| Performance Guide | ⚡ Database tuning, hardware specs, optimization |
| Database Resilience | 💾 Database connection patterns & error handling |
| MusicBrainz Sync Guide | 🎵 MusicBrainz data import and enrichment operations |
| Document | Purpose |
|---|---|
| Dockerfile Standards | 🐋 Best practices for writing Dockerfiles |
| Docker Security | 🔒 Container hardening & security practices |
| GitHub Actions Guide | 🚀 CI/CD workflows, automation & best practices |
| Task Automation | ⚙️ Complete just and uv run task command reference |
| Monorepo Guide | 📦 Managing Python monorepo with shared dependencies |
| Document | Purpose |
|---|---|
| State Marker System | 📋 Extraction progress tracking & safe restart system |
| State Marker Periodic Updates | 💾 Periodic state saves and crash recovery |
| Consumer Cancellation | 🔄 File completion and consumer lifecycle management |
| File Completion Tracking | 📊 Intelligent completion tracking and stall detection |
| Neo4j Indexing | 🔗 Advanced Neo4j indexing strategies |
| Platform Targeting | 🎯 Cross-platform compatibility guidelines |
| Emoji Guide | 📋 Standardized emoji usage across the project |
| Recent Improvements | 🚀 Latest platform enhancements and changelog |
- 🐛 Bug Reports: GitHub Issues
- 💡 Feature Requests: GitHub Discussions
- 💬 Questions: Discussions Q&A
- 📚 Full Documentation: docs/README.md
This project is licensed under the MIT License — see the LICENSE file for details.
Some other projects working with the monthly Discogs data dump.
- 🎵 Discogs for providing the monthly data dumps
- 🎵 MusicBrainz for the open music encyclopedia and twice-weekly JSONL dumps
- 🚀 uv for blazing-fast package management
- 🔥 Ruff for lightning-fast linting
- 🐍 The Python community for excellent libraries and tools
- 🦀 The Rust community for excellent libraries and amazing performance