discogsography

A modern Python 3.13+ microservices platform for transforming the complete Discogs music database into powerful, queryable knowledge graphs and analytics engines.

🚀 Quick Start | 📖 Documentation | 🎯 Features | 💬 Community

🎯 What is Discogsography?

Discogsography transforms monthly Discogs data dumps (~11.3GB compressed XML) into:

  • 🔗 Neo4j Graph Database: Navigate complex music industry relationships
  • 🐘 PostgreSQL Database: High-performance queries and full-text search
  • 🔍 Interactive Explorer: Graph visualization, trends, and path discovery
  • 📊 Real-time Dashboard: Monitor system health and processing metrics
  • 🎵 MusicBrainz Enrichment: Cross-reference with MusicBrainz for metadata, relationships, and external links

Perfect for music researchers, data scientists, developers, and music enthusiasts who want to explore the world's largest music database.

🏛️ Architecture Overview

⚙️ Core Services

| Service | Purpose | Key Technologies |
|---|---|---|
| 🔐 API | User accounts, JWT auth, and collection sync | FastAPI, psycopg3, redis, Discogs OAuth 1.0 |
| 📊 Dashboard | Real-time monitoring and admin panel | FastAPI, WebSocket, reactive UI |
| 🔍 Explore | Serves graph exploration frontend (static files) | FastAPI, Tailwind CSS, Alpine.js, D3.js, Plotly.js |
| ⚡ Extractor | High-performance Rust-based extractor | tokio, quick-xml, lapin |
| 🔗 Graphinator | Builds Neo4j knowledge graphs | neo4j-driver, graph algorithms |
| 🔧 Schema-Init | One-shot database schema initializer | neo4j-driver, psycopg3 |
| 🐘 Tableinator | Creates PostgreSQL analytics tables | psycopg3, JSONB, full-text search |
| 📈 Insights | Precomputed analytics and music trends | FastAPI, psycopg3, httpx |
| 🤖 MCP Server | Exposes knowledge graph to AI assistants | FastMCP, httpx |

🎵 MusicBrainz Enrichment Services

| Service | Purpose | Key Technologies |
|---|---|---|
| 🧠 Brainzgraphinator | Enriches Neo4j graph with MusicBrainz metadata and relationships | neo4j-driver, pika |
| 🧬 Brainztableinator | Populates PostgreSQL with MusicBrainz data and external links | psycopg3, pika |

📐 System Architecture

graph TD
    S3[("🌐 Discogs S3<br/>Data Dumps")]
    MB[("🎵 MusicBrainz<br/>JSONL Dumps")]

    subgraph Pipeline ["Data Pipeline"]
        EXT[["⚡ Extractor"]]
        RMQ{{"🐰 RabbitMQ"}}
        GRAPH[["🔗 Graphinator"]]
        TABLE[["🐘 Tableinator"]]
    end

    subgraph MBPipeline ["MusicBrainz Enrichment"]
        BGRAPH[["🧠 Brainzgraphinator"]]
        BTABLE[["🧬 Brainztableinator"]]
    end

    subgraph Storage ["Storage"]
        NEO4J[("🔗 Neo4j")]
        PG[("🐘 PostgreSQL")]
        REDIS[("🔴 Redis")]
    end

    subgraph Services ["User-Facing Services"]
        API[["🔐 API"]]
        EXPLORE[["🔍 Explore"]]
        DASH[["📊 Dashboard"]]
        INSIGHTS[["📈 Insights"]]
    end

    S3 --> EXT --> RMQ
    MB --> EXT
    RMQ --> GRAPH --> NEO4J
    RMQ --> TABLE --> PG
    RMQ --> BGRAPH --> NEO4J
    RMQ --> BTABLE --> PG

    API --- NEO4J & PG & REDIS
    EXPLORE --- API
    INSIGHTS -.-> API
    INSIGHTS --- PG & REDIS
    DASH -.- RMQ & NEO4J & PG & REDIS

    style S3 fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    style MB fill:#e1f5fe,stroke:#01579b,stroke-width:2px
    style EXT fill:#ffccbc,stroke:#d84315,stroke-width:2px
    style RMQ fill:#fff3e0,stroke:#e65100,stroke-width:2px
    style NEO4J fill:#f3e5f5,stroke:#4a148c,stroke-width:2px
    style PG fill:#e8f5e9,stroke:#1b5e20,stroke-width:2px
    style REDIS fill:#ffebee,stroke:#b71c1c,stroke-width:2px
    style GRAPH fill:#e0f2f1,stroke:#004d40,stroke-width:2px
    style TABLE fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    style BGRAPH fill:#e0f2f1,stroke:#004d40,stroke-width:2px
    style BTABLE fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    style API fill:#e3f2fd,stroke:#0d47a1,stroke-width:2px
    style EXPLORE fill:#e8eaf6,stroke:#283593,stroke-width:2px
    style DASH fill:#fce4ec,stroke:#880e4f,stroke-width:2px
    style INSIGHTS fill:#fff9c4,stroke:#f57f17,stroke-width:2px

See Architecture Overview for detailed diagrams covering data pipeline, service communication, and message queue structure.
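
To make the data flow in the diagram easier to query, here is a minimal sketch that encodes its solid arrows (the ingestion path from the two sources to the two databases) as a plain adjacency map. The names `PIPELINE`, `downstream`, and `writers_of` are hypothetical helpers for illustration, not part of the project. The diagram draws RabbitMQ as a single node, so this traversal cannot tell which consumers handle which source.

```python
from collections import deque

# Solid arrows from the diagram, written as "node -> downstream nodes".
PIPELINE = {
    "Discogs S3": ["Extractor"],
    "MusicBrainz": ["Extractor"],
    "Extractor": ["RabbitMQ"],
    "RabbitMQ": [
        "Graphinator",
        "Tableinator",
        "Brainzgraphinator",
        "Brainztableinator",
    ],
    "Graphinator": ["Neo4j"],
    "Brainzgraphinator": ["Neo4j"],
    "Tableinator": ["PostgreSQL"],
    "Brainztableinator": ["PostgreSQL"],
}


def downstream(start: str) -> list[str]:
    """Breadth-first list of every node reachable from start, in visit order."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in PIPELINE.get(node, []):
            if nxt not in seen:
                seen.add(nxt)
                order.append(nxt)
                queue.append(nxt)
    return order


def writers_of(store: str) -> list[str]:
    """Nodes with a direct arrow into the given storage node, sorted by name."""
    return sorted(src for src, targets in PIPELINE.items() if store in targets)
```

For example, `writers_of("Neo4j")` lists the two services that write to the graph database, and `downstream("Extractor")` walks everything fed by the extractor.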

🌟 Key Features

  • ⚡ High-Speed Processing: ~130–480 records/second end-to-end throughput per data type with the Rust-based extractor
  • 🔄 Smart Deduplication: SHA256 hash-based change detection prevents reprocessing
  • 📈 Handles Big Data: Processes 19M+ releases and 10M+ artists across ~11.3GB of compressed XML
  • 🔁 Auto-Recovery: Automatic retries with exponential backoff and dead letter queues
  • 🐋 Container Security: Non-root users, read-only filesystems, dropped capabilities
  • 📝 Type Safety: Full type hints with strict mypy validation and Bandit security scanning
  • ✅ Comprehensive Testing: Unit, integration, and E2E tests with Playwright
  • 🚀 Query Performance: 249x overall query performance improvement across 88 endpoints (PRs #175–#184), plus configurable data quality rules for extraction validation (#187); see Recent Improvements
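
Two of the bullets above, hash-based change detection and retries with exponential backoff, are easier to picture in code. The sketch below is illustrative only: the real extractor is written in Rust, and the function names (`record_hash`, `has_changed`, `retry_with_backoff`), the in-memory `seen` dictionary, the canonical-JSON hashing, and the retry defaults are all assumptions, not the project's actual implementation.

```python
import hashlib
import json
import random
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")


def record_hash(record: dict) -> str:
    """SHA-256 over a canonical JSON encoding, so key order does not matter."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def has_changed(seen: dict[str, str], record_id: str, record: dict) -> bool:
    """Return True (and remember the new hash) if the record is new or modified."""
    digest = record_hash(record)
    if seen.get(record_id) == digest:
        return False  # identical to the stored version: skip reprocessing
    seen[record_id] = digest
    return True


def retry_with_backoff(
    operation: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call operation, doubling the wait after each failure, up to `attempts` tries."""
    for attempt in range(attempts):
        try:
            return operation()
        except Exception:
            if attempt == attempts - 1:
                raise  # out of retries: the caller can log or dead-letter the message
            delay = min(max_delay, base_delay * 2**attempt)
            sleep(delay + random.uniform(0, delay / 10))  # add a little jitter
    raise ValueError("attempts must be at least 1")
```

A consumer would typically call `has_changed` before writing a record and wrap the write itself in `retry_with_backoff`. Passing `sleep` as a parameter keeps the retry helper testable without real delays.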

🚀 Quick Start

# Clone and start all services
git clone https://github.com/SimplicityGuy/discogsography.git
cd discogsography
docker-compose up -d

# Access the dashboard
open http://localhost:8003

| Service | URL | Default Credentials |
|---|---|---|
| 🔐 API | http://localhost:8004 | Register via /api/auth/register |
| 📊 Dashboard | http://localhost:8003 | None |
| 🔗 Neo4j | http://localhost:7474 | neo4j / discogsography |
| 🐘 PostgreSQL | localhost:5433 | discogsography / discogsography |
| 🐰 RabbitMQ | http://localhost:15672 | discogsography / discogsography |

See the Quick Start Guide for prerequisites, local development setup, and environment configuration.
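
After the containers start, a quick way to confirm that the ports listed above are accepting connections is a plain TCP probe. The following is a minimal sketch using only the standard library; the endpoint values are copied from the table, while `ENDPOINTS`, `host_and_port`, and `is_listening` are hypothetical names for this example. A successful connection only shows that something is listening on the port, not that the service is healthy.

```python
import socket
from urllib.parse import urlsplit

# Endpoints copied from the service table in this section.
ENDPOINTS = {
    "API": "http://localhost:8004",
    "Dashboard": "http://localhost:8003",
    "Neo4j": "http://localhost:7474",
    "PostgreSQL": "localhost:5433",
    "RabbitMQ": "http://localhost:15672",
}


def host_and_port(endpoint: str) -> tuple[str, int]:
    """Split 'http://host:port' or bare 'host:port' into (host, port)."""
    # urlsplit only recognises the network location after '//'.
    parts = urlsplit(endpoint if "://" in endpoint else f"//{endpoint}")
    if parts.hostname is None or parts.port is None:
        raise ValueError(f"no explicit host:port in {endpoint!r}")
    return parts.hostname, parts.port


def is_listening(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    for name, endpoint in ENDPOINTS.items():
        host, port = host_and_port(endpoint)
        state = "up" if is_listening(host, port) else "not reachable"
        print(f"{name:<11} {host}:{port} {state}")
```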

📖 Documentation

🚀 Getting Started

| Document | Purpose |
|---|---|
| Quick Start Guide | ⚡ Get Discogsography running in minutes |
| Configuration Guide | ⚙️ Complete environment variable and settings reference |
| Architecture Overview | 🏛️ System architecture, components, data flow, and scale |
| CLAUDE.md | 🤖 Claude Code integration guide & development standards |

💡 Usage & Data

| Document | Purpose |
|---|---|
| Usage Examples | 💡 Neo4j Cypher and PostgreSQL query examples |
| Database Schema | 🗄️ Complete Neo4j graph model and PostgreSQL schema |
| Monitoring Guide | 📊 Real-time dashboard, metrics, and debug utilities |

👨‍💻 Development

| Document | Purpose |
|---|---|
| Development Guide | 💻 Project structure, tooling, and developer workflow |
| Testing Guide | 🧪 Unit, integration, and E2E testing with Playwright |
| Logging Guide | 📊 Structured logging standards and emoji conventions |
| Contributing Guide | 🤝 How to contribute: process, standards, and PR flow |
| Python Version Management | 🐍 Managing Python 3.13+ across the project |

🔧 Operations

| Document | Purpose |
|---|---|
| Troubleshooting Guide | 🔧 Common issues, solutions, and debugging steps |
| Maintenance Guide | 🔄 Package upgrades, dependency management |
| Performance Guide | ⚡ Database tuning, hardware specs, optimization |
| Database Resilience | 💾 Database connection patterns & error handling |
| MusicBrainz Sync Guide | 🎵 MusicBrainz data import and enrichment operations |

🐋 Infrastructure & CI/CD

| Document | Purpose |
|---|---|
| Dockerfile Standards | 🐋 Best practices for writing Dockerfiles |
| Docker Security | 🔒 Container hardening & security practices |
| GitHub Actions Guide | 🚀 CI/CD workflows, automation & best practices |
| Task Automation | ⚙️ Complete `just` and `uv run task` command reference |
| Monorepo Guide | 📦 Managing Python monorepo with shared dependencies |

📋 Reference

| Document | Purpose |
|---|---|
| State Marker System | 📋 Extraction progress tracking & safe restart system |
| State Marker Periodic Updates | 💾 Periodic state saves and crash recovery |
| Consumer Cancellation | 🔄 File completion and consumer lifecycle management |
| File Completion Tracking | 📊 Intelligent completion tracking and stall detection |
| Neo4j Indexing | 🔗 Advanced Neo4j indexing strategies |
| Platform Targeting | 🎯 Cross-platform compatibility guidelines |
| Emoji Guide | 📋 Standardized emoji usage across the project |
| Recent Improvements | 🚀 Latest platform enhancements and changelog |

💬 Support & Community

📄 License

This project is licensed under the MIT License — see the LICENSE file for details.

Other Discogs Projects

Some other projects working with the monthly Discogs data dump.

🙏 Acknowledgments

  • 🎵 Discogs for providing the monthly data dumps
  • 🎵 MusicBrainz for the open music encyclopedia and twice-weekly JSONL dumps
  • 🚀 uv for blazing-fast package management
  • 🔥 Ruff for lightning-fast linting
  • 🐍 The Python community for excellent libraries and tools
  • 🦀 The Rust community for excellent libraries and amazing performance

Made with ❤️ in the Pacific Northwest