A web application for creating YouTube-ready coding tutorials. Choose between a Markdown editor with a fixed 16:9 live preview or a freeform Excalidraw canvas per slide. Capture at 1920x1080 with Ctrl+S, add TTS narration, paste/upload images, clone slides, edit in fullscreen, and export as a ZIP of PNGs or a high-quality MP4 video with narrated audio.
| Layer | Technology |
|---|---|
| Frontend | React 19, Vite, TypeScript, Tailwind CSS v4 |
| Editors | Markdown (@uiw/react-md-editor + rehype-highlight), Canvas (Excalidraw) |
| Screenshot | html2canvas (Markdown), Excalidraw exportToBlob (Canvas) — both 1920x1080 |
| TTS | PocketTTS (voice cloning from a single audio sample) |
| Audio | scipy + ffmpeg (WAV generation, MP3 encoding at 128kbps) |
| Export | JSZip + file-saver, ffmpeg (server-side H.264 video with AAC audio) |
| Backend | FastAPI, SQLAlchemy 2 (async), Pydantic v2 |
| Database | SQLite (via aiosqlite) |
| Package Manager | npm (frontend), uv (backend) |
- Node.js >= 18
- Python >= 3.12
- uv: install via `curl -LsSf https://astral.sh/uv/install.sh | sh`
- ffmpeg: required for MP4 video export and audio conversion

Install the backend dependencies:

    cd backend
    uv sync

Install the frontend dependencies:

    cd frontend
    npm install

Place a voice sample MP3 file at `backend/data/voice-sample.mp3`. PocketTTS uses this file for voice cloning. You can configure the path via the `VOICE_SAMPLE_PATH` environment variable.
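As a rough illustration of the path override, the sketch below prefers `VOICE_SAMPLE_PATH` when it is set and otherwise falls back to the documented default. The helper name `resolve_voice_sample_path` and the empty-string handling are assumptions made for this example, not the backend's actual code.

```python
import os
from pathlib import Path
from typing import Mapping, Optional

# Default location documented in this README, relative to the project root.
DEFAULT_VOICE_SAMPLE = Path("backend/data/voice-sample.mp3")


def resolve_voice_sample_path(env: Optional[Mapping[str, str]] = None) -> Path:
    """Return the voice sample path, preferring VOICE_SAMPLE_PATH when set.

    Hypothetical helper: the real backend may resolve the path differently.
    """
    env = os.environ if env is None else env
    override = env.get("VOICE_SAMPLE_PATH", "").strip()
    # An unset or blank variable falls back to the default location.
    return Path(override) if override else DEFAULT_VOICE_SAMPLE
```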
From the project root, run both servers with a single command:

    ./dev.sh

Ports are auto-detected: if the defaults (8000/5173) are busy, the next available port is used automatically. The actual URLs are printed on startup.

Or run individually if needed:

Backend (default port 8000):

    cd backend
    uv run uvicorn app.main:app --reload --port 8000

Frontend (default port 5173):

    cd frontend
    npm run dev

Open http://localhost:5173 in your browser.
API docs are available at http://localhost:8000/docs (Swagger UI).

- Home: Click "Create New Recording" and enter a title
- Choose mode: Toggle between Markdown and Canvas (Excalidraw) per slide using the mode switcher at the top of the editor
- Markdown mode: Write Markdown content in a split-pane editor. The right half is a fixed 16:9 (1920x1080) live preview with YouTube-optimized font sizes. Content that overflows is clipped, so what you see is exactly what gets captured
- Canvas mode: Use the full Excalidraw editor (shapes, text, arrows, images, hand-drawn style) with a configurable per-slide background color. Draw freely within the 16:9 frame
- Images: In Markdown mode, paste images directly or click the image toolbar button to pick a file. Images are uploaded to the server and inserted as clean URLs. In Canvas mode, paste images directly onto the canvas
- Capture: Press `Ctrl+S` (`Cmd+S` on Mac) or click "Screenshot" to capture the current editor as a 1920x1080 PNG slide. Markdown uses `html2canvas`; Canvas uses Excalidraw's native `exportToBlob`
- Slides: Captured screenshots appear as numbered thumbnails in the right panel with mode badges (MD/Canvas), selection checkboxes, duration controls, and narration support
- Re-edit: Click "Edit" on any slide to load its content back into the editor (Markdown text or canvas scene). The toolbar shows a "Save Edit" button and an amber "Editing Slide #N" badge. Press `Ctrl+S` to re-capture; the existing slide is updated in place, keeping its narration and audio intact. Click "Cancel" to exit edit mode
- Clone: Click "Clone" on any slide to duplicate it with all content (image, editor mode, code/scene data, narration text). Audio is not cloned, so regenerate TTS for the copy. Padding resets to defaults
- Fullscreen: Click "Fullscreen" in the toolbar to expand the editor to fill the entire viewport. A floating toolbar at the top-center provides Screenshot/Save Edit, Cancel, mode badge, and Exit controls. The 16:9 aspect ratio is maintained. Markdown mode stays split-pane. Press `Ctrl+S` to capture from fullscreen
- Narration: Open the narration panel on any slide, enter text (with an expand-to-modal option), and click "Generate" to create TTS audio using a cloned voice
- Export: Export selected slides as a ZIP or an MP4 video (1920x1080, H.264, CRF 18 high-quality) with narrated audio and configurable padding
- Manage: View, edit titles, or delete recordings from the Recordings list page

Each slide has an editor_mode of either "markdown" or "canvas":
- Markdown: Split-pane editor with live preview. Preview is clipped to 16:9 with YouTube-optimized fonts (18px base, 36px H1, 15px code). Captured via `html2canvas` at 1920x1080
- Canvas: Full Excalidraw editor with dark theme, all tools enabled. Background color is configurable per slide via a color picker that syncs with the live canvas. Pasted images are captured via the `onChange` callback and included in exports. Captured via Excalidraw's `exportToBlob`, composited centered on 1920x1080

Scenes (including embedded images) are persisted as JSON in the backend. Re-editing loads the scene back into Excalidraw. Canvas scene changes are auto-saved with a 1-second debounce.
Markdown mode: html2canvas renders the preview pane to a canvas at exactly 1920x1080 pixels. The canvas is converted to a PNG blob and uploaded along with the Markdown source.
Canvas mode: Excalidraw's exportToBlob API exports the current scene elements at their natural bounding box size, then composites the result centered and scaled to fit a 1920x1080 canvas with the configured background color. Embedded images in the scene are included in the export.
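The "centered and scaled to fit" step comes down to a small piece of arithmetic, sketched here in Python for clarity (the app itself does this in the browser). The function name `fit_centered` is hypothetical, and the sketch assumes the export is scaled both up and down to fit; the actual compositing code may only scale down.

```python
def fit_centered(src_w: int, src_h: int, dst_w: int = 1920, dst_h: int = 1080):
    """Return (x, y, width, height) for drawing a source image scaled to fit
    inside the destination frame while preserving aspect ratio, centered."""
    if src_w <= 0 or src_h <= 0:
        raise ValueError("source dimensions must be positive")
    # The limiting axis decides the scale so that nothing overflows the frame.
    scale = min(dst_w / src_w, dst_h / src_h)
    width = round(src_w * scale)
    height = round(src_h * scale)
    # Split the leftover space evenly on both sides.
    x = (dst_w - width) // 2
    y = (dst_h - height) // 2
    return x, y, width, height
```

For example, a square 1000x1000 export is limited by the frame height, so it is drawn 1080x1080 with 420px of background on the left and right.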
Preview font sizes (Markdown) are optimized for YouTube readability:
- Base text: 18px
- H1: 36px, H2: 27px, H3: 22.5px
- Code blocks: 15px monospace
Images can be added to the Markdown editor in two ways:
- Paste: Paste an image from the clipboard. It uploads to the server and inserts a clean URL
- File picker: Click the image toolbar button to pick a file from disk. Same upload-and-URL flow

Images are stored as files in backend/data/images/ and served via a dedicated endpoint. The editor stays clean — no base64 inline data.
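A minimal sketch of the store-and-return-URL flow follows. The function name `store_image`, the UUID-based filenames, and the extension whitelist are all assumptions for illustration; only the storage directory and the `/api/v1/images/{filename}` URL shape come from this README.

```python
import uuid
from pathlib import Path

# Assumed whitelist; the real backend may accept a different set of types.
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def store_image(data: bytes, original_name: str, image_dir: Path) -> str:
    """Write the uploaded bytes to image_dir and return the URL to insert."""
    suffix = Path(original_name).suffix.lower()
    if suffix not in ALLOWED_SUFFIXES:
        raise ValueError(f"unsupported image type: {suffix or '(none)'}")
    image_dir.mkdir(parents=True, exist_ok=True)
    # A random name avoids collisions and keeps user-supplied names off disk.
    filename = f"{uuid.uuid4().hex}{suffix}"
    (image_dir / filename).write_bytes(data)
    return f"/api/v1/images/{filename}"
```

The Markdown editor would then insert something like `![](/api/v1/images/<name>.png)` instead of inline base64 data.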
Each slide can have narration text regardless of editor mode. When the user clicks "Generate", PocketTTS converts the text to speech using a voice cloned from a global voice sample. The generated audio is stored as MP3 (128kbps). The audio duration is used for video export timing.
- Left padding (default 0.0s) — silence before audio starts
- Right padding (default 0.5s) — silence after audio ends
- Total slide duration = left_padding + audio_duration + right_padding
- Padding controls appear as soon as narration text is saved
- Editing narration text clears the previously generated audio (user must regenerate)
- Slides without narration fall back to the manual duration slider
- A full-screen narration modal is available for comfortable long-form text editing
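The timing rules above can be expressed as one small function. This is a sketch: the name `slide_duration` is hypothetical, and it assumes a slide without generated audio is represented by `audio_duration=None`.

```python
from typing import Optional


def slide_duration(
    audio_duration: Optional[float],
    manual_duration: float,
    left_padding: float = 0.0,
    right_padding: float = 0.5,
) -> float:
    """Return how long a slide stays on screen, in seconds."""
    if audio_duration is None:
        # No generated narration: use the manual duration slider value.
        return manual_duration
    # Narrated slide: silence before + narration + silence after.
    return left_padding + audio_duration + right_padding
```

With the default padding, a 4.0s narration keeps its slide on screen for 4.5s.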
The MP4 export sends selected slide IDs to the backend. The export guards against slides that have narration text but no generated audio. For each slide:
- With audio: The backend creates a padded audio track (silence + narration + silence) and generates a video segment combining the slide image with the audio using ffmpeg
- Without audio: The backend creates a video segment from the image with a silent audio track, using the manual duration
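The export guard mentioned above (narration text present, but no generated audio) could be checked along these lines. The `SlideInfo` type and `slides_missing_audio` helper are hypothetical; `has_audio` mirrors the computed field in the screenshot schema.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SlideInfo:
    id: int
    narration_text: Optional[str]
    has_audio: bool


def slides_missing_audio(slides: List[SlideInfo]) -> List[int]:
    """Return IDs of slides that have narration text but no generated audio."""
    return [
        s.id
        for s in slides
        if (s.narration_text or "").strip() and not s.has_audio
    ]
```

An export would proceed only when this list is empty; otherwise the user is asked to regenerate audio for the listed slides.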
All frames are scaled/padded to exactly 1920x1080 by ffmpeg (safety net). All audio segments are normalized to 44100Hz stereo AAC to ensure seamless concatenation. Segments are concatenated into a final H.264 MP4 video optimized for sharp text and screen content: libx264, CRF 18 (high quality), slow preset, stillimage tune, yuv420p, 30fps, AAC audio at 128kbps.
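To make the encoder settings concrete, here is a sketch that builds an ffmpeg argument list for one segment using the options listed above. The function name, the argument order, and the use of `anullsrc` for the silent track are assumptions; the backend's actual command may be structured differently.

```python
from typing import List, Optional

# Scale to fit inside 1920x1080, then pad to exactly that size, centered.
SCALE_PAD = (
    "scale=1920:1080:force_original_aspect_ratio=decrease,"
    "pad=1920:1080:(ow-iw)/2:(oh-ih)/2"
)


def build_segment_cmd(
    image_path: str,
    out_path: str,
    duration: float,
    audio_path: Optional[str] = None,
) -> List[str]:
    """Build the ffmpeg arguments for a single slide segment (not executed here)."""
    # Loop the still image so it can be held for the whole segment.
    cmd = ["ffmpeg", "-y", "-loop", "1", "-framerate", "30", "-i", image_path]
    if audio_path is not None:
        cmd += ["-i", audio_path]
    else:
        # Synthetic silent stereo track so every segment has an audio stream.
        cmd += ["-f", "lavfi", "-i", "anullsrc=channel_layout=stereo:sample_rate=44100"]
    cmd += [
        "-t", f"{duration:.3f}",
        "-vf", SCALE_PAD,
        "-c:v", "libx264", "-crf", "18", "-preset", "slow", "-tune", "stillimage",
        "-pix_fmt", "yuv420p", "-r", "30",
        "-c:a", "aac", "-b:a", "128k", "-ar", "44100", "-ac", "2",
        out_path,
    ]
    return cmd
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Giving every segment identical video and audio parameters is what allows the segments to be concatenated cleanly.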
Click the "Fullscreen" button in the toolbar to expand the editor to fill the entire viewport. The top toolbar and right slides panel are hidden, replaced by a floating toolbar at the top-center with essential controls (Screenshot/Save Edit, Cancel, mode badge, Exit). The editor maintains its 16:9 aspect ratio but is much larger since chrome is removed. Ctrl+S works in fullscreen. Click "Exit" or the toolbar's "Fullscreen" button to return to normal mode.
Click "Clone" (hover-revealed green button on each slide card) to duplicate a slide. The clone copies the image, editor mode, code snapshot/scene data, background color, and narration text. Audio is not cloned (regenerate TTS for the copy). Padding resets to defaults (0.0 / 0.5). The cloned slide appears at the end and is auto-selected.
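The clone rules can be summarized in code. This is a sketch: the `Slide` dataclass mirrors the columns of the `screenshots` table, and `clone_slide` is a hypothetical helper, not the backend's actual implementation.

```python
from dataclasses import dataclass, replace
from typing import List, Optional


@dataclass(frozen=True)
class Slide:
    slide_number: int
    image_data: bytes
    editor_mode: str
    code_snapshot: Optional[str] = None
    scene_data: Optional[str] = None
    canvas_bg_color: str = "#0d1117"
    narration_text: Optional[str] = None
    audio_data: Optional[bytes] = None
    audio_duration: Optional[float] = None
    left_padding: float = 0.0
    right_padding: float = 0.5


def clone_slide(source: Slide, existing: List[Slide]) -> Slide:
    """Copy the slide content, drop its audio, reset padding, and append it."""
    next_number = max((s.slide_number for s in existing), default=0) + 1
    return replace(
        source,
        slide_number=next_number,  # the clone goes to the end of the list
        audio_data=None,           # audio is not cloned
        audio_duration=None,
        left_padding=0.0,          # padding resets to the defaults
        right_padding=0.5,
    )
```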
When editing an existing slide (via the "Edit" button), the UI provides clear feedback:
- Toolbar: amber "Editing Slide #N" badge, button changes to "Save Edit", "Cancel" button appears
- Editor area: amber banner with "press Ctrl+S or click Save Edit" hint
- Cancel: exits edit mode and resets the editor state
The dev.sh startup script automatically finds available ports starting from 8000 (backend) and 5173 (frontend) using Python's socket.bind(). Both ports are exported as environment variables (BACKEND_PORT, FRONTEND_PORT) so the Vite proxy and CORS configuration adapt automatically.
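The port probing idea can be sketched as follows: try to bind each candidate port and take the first one that succeeds. The function name `find_free_port`, the loopback host, and the 100-port search window are assumptions for this example; `dev.sh` may differ in those details.

```python
import socket


def find_free_port(start: int, host: str = "127.0.0.1", max_tries: int = 100) -> int:
    """Return the first port >= start that can currently be bound on host."""
    for port in range(start, min(start + max_tries, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is busy (or not permitted); try the next one
            return port
    raise RuntimeError(f"no free port found starting from {start}")


if __name__ == "__main__":
    # Same starting points as the defaults described above.
    print("BACKEND_PORT =", find_free_port(8000))
    print("FRONTEND_PORT =", find_free_port(5173))
```

A probe like this is inherently racy: another process can claim the port between the check and the server starting, which is usually acceptable for a local dev script.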

    coding-tutorial-screen-shoter/
    ├── backend/
    │   ├── pyproject.toml                  # uv dependencies
    │   ├── data/
    │   │   ├── app.db                      # SQLite database (auto-created)
    │   │   ├── voice-sample.mp3            # Global voice sample for TTS
    │   │   └── images/                     # Uploaded editor images
    │   └── app/
    │       ├── main.py                     # FastAPI app, CORS (reads FRONTEND_PORT), lifespan
    │       ├── database.py                 # SQLAlchemy async engine + session
    │       ├── deps.py                     # FastAPI dependency (get_db)
    │       ├── tts.py                      # PocketTTS wrapper (lazy-loaded model + voice state)
    │       ├── models/
    │       │   ├── recording.py            # Recording ORM model
    │       │   └── screenshot.py           # Screenshot ORM model (image, audio, narration, canvas)
    │       ├── schemas/
    │       │   ├── recording.py            # Pydantic request/response schemas
    │       │   └── screenshot.py           # Screenshot metadata schema (has_audio computed)
    │       └── routers/
    │           ├── recordings.py           # CRUD: /api/v1/recordings
    │           ├── screenshots.py          # Upload/serve/delete/export/narration/audio/canvas/padding
    │           └── images.py               # Upload/serve editor images: /api/v1/images
    ├── frontend/
    │   ├── package.json
    │   ├── vite.config.ts                  # Proxy /api → localhost:BACKEND_PORT
    │   └── src/
    │       ├── App.tsx                     # React Router (3 routes)
    │       ├── components/
    │       │   ├── Header.tsx              # Navigation bar
    │       │   ├── LandingPage.tsx         # Home page with create button
    │       │   ├── CreateRecordingModal.tsx
    │       │   ├── RecordingsList.tsx      # CRUD list of all recordings
    │       │   ├── RecordingEditor.tsx     # Main editor layout (mode switcher, fullscreen, edit mode, 16:9 container)
    │       │   ├── MarkdownEditor.tsx      # Markdown editor (16:9 clipped preview, custom image upload)
    │       │   ├── CanvasEditor.tsx        # Excalidraw canvas editor (bg color sync, onChange files, export compositing)
    │       │   ├── EditorModeSwitcher.tsx  # Markdown/Canvas toggle tabs
    │       │   ├── EditorToolbar.tsx       # Toolbar with Screenshot/Save Edit, Fullscreen, Export buttons + edit badge
    │       │   ├── ScreenshotPanel.tsx     # Right column with slide thumbnails
    │       │   ├── ScreenshotCard.tsx      # Single slide with mode badge, edit, clone, narration, audio, padding
    │       │   ├── SlidePreviewModal.tsx   # Full-screen slide viewer
    │       │   └── DurationPopover.tsx     # Per-slide duration slider
    │       ├── hooks/
    │       │   ├── useRecordings.ts        # Recording CRUD state management
    │       │   ├── useScreenshots.ts       # Screenshot state per recording (upload, clone, update, remove)
    │       │   └── useScreenshotCapture.ts # html2canvas capture at 1920x1080
    │       ├── services/
    │       │   └── api.ts                  # Axios HTTP client + all API functions
    │       ├── types/
    │       │   └── index.ts                # TypeScript interfaces
    │       └── utils/
    │           ├── zipExport.ts            # JSZip export logic
    │           └── videoExport.ts          # MP4 video export logic
    ├── dev.sh                              # Auto-port-detecting dev startup script

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/v1/recordings` | List all recordings |
| `POST` | `/api/v1/recordings` | Create a new recording |
| `GET` | `/api/v1/recordings/{id}` | Get recording with screenshots |
| `PUT` | `/api/v1/recordings/{id}` | Update recording title |
| `DELETE` | `/api/v1/recordings/{id}` | Delete recording + screenshots |
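
As a usage illustration for the recordings endpoints, the sketch below builds a create-recording request with the standard library. It assumes the endpoint accepts a JSON body with a `title` field and that the backend runs on the default port 8000; the Swagger UI at `/docs` is the authoritative source for the request schema.

```python
import json
import urllib.request

API_BASE = "http://localhost:8000/api/v1"  # default backend port from this README


def build_create_recording_request(
    title: str, base: str = API_BASE
) -> urllib.request.Request:
    """Build (but do not send) a POST /recordings request.

    Assumption: the endpoint accepts a JSON body with a "title" field.
    """
    body = json.dumps({"title": title}).encode("utf-8")
    return urllib.request.Request(
        f"{base}/recordings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Passing the returned object to `urllib.request.urlopen` sends it to a running backend.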

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/v1/recordings/{id}/screenshots` | Upload screenshot (multipart, with editor_mode/scene_data) |
| `GET` | `/api/v1/recordings/{id}/screenshots/{sid}/image` | Serve PNG image |
| `GET` | `/api/v1/recordings/{id}/screenshots/{sid}/audio` | Serve MP3 audio |
| `DELETE` | `/api/v1/recordings/{id}/screenshots/{sid}` | Delete a screenshot |
| `PUT` | `/api/v1/recordings/{id}/screenshots/{sid}/narration` | Update narration text |
| `POST` | `/api/v1/recordings/{id}/screenshots/{sid}/generate-audio` | Generate TTS audio |
| `PUT` | `/api/v1/recordings/{id}/screenshots/{sid}/padding` | Update left/right padding |
| `PUT` | `/api/v1/recordings/{id}/screenshots/{sid}/canvas` | Save canvas scene data + bg color |
| `PUT` | `/api/v1/recordings/{id}/screenshots/{sid}/image` | Update slide image + metadata in-place |
| `POST` | `/api/v1/recordings/{id}/screenshots/{sid}/clone` | Clone a slide (copies content, not audio) |
| `GET` | `/api/v1/recordings/{id}/screenshots/export` | Download all as ZIP |
| `GET` | `/api/v1/recordings/{id}/screenshots/export-video` | Export slides as 1920x1080 MP4 |

| Method | Endpoint | Description |
|---|---|---|
| `POST` | `/api/v1/images` | Upload an image, returns URL |
| `GET` | `/api/v1/images/{filename}` | Serve uploaded image |

| Method | Endpoint | Description |
|---|---|---|
| `GET` | `/api/v1/health` | Health check |

recordings
| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Auto-increment ID |
| title | VARCHAR(255) | Recording title |
| created_at | DATETIME | Creation timestamp |
| updated_at | DATETIME | Last modified timestamp |
screenshots
| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Auto-increment ID |
| recording_id | INTEGER FK | References recordings(id), CASCADE delete |
| slide_number | INTEGER | Sequential slide number (unique per recording) |
| image_data | BLOB | PNG binary data (1920x1080) |
| code_snapshot | TEXT | Markdown source at time of capture |
| editor_mode | VARCHAR(10) | "markdown" or "canvas" (default "markdown") |
| scene_data | TEXT | Excalidraw scene JSON (canvas mode) |
| canvas_bg_color | VARCHAR(10) | Canvas background hex color (default "#0d1117") |
| narration_text | TEXT | Narration script for TTS |
| audio_data | BLOB | Generated MP3 audio data (128kbps) |
| audio_duration | FLOAT | Duration of generated audio in seconds |
| left_padding | FLOAT | Silence before audio (default 0.0s) |
| right_padding | FLOAT | Silence after audio (default 0.5s) |
| created_at | DATETIME | Capture timestamp |
| updated_at | DATETIME | Last modified timestamp (auto-updated) |
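
The two tables above can be approximated in plain SQLite DDL, which also makes the cascade delete and the per-recording uniqueness of `slide_number` easy to try out. This is a sketch built from the column lists; the `NOT NULL` constraints and timestamp defaults are assumptions, and the real schema is generated by the SQLAlchemy models.

```python
import sqlite3

SCHEMA = """
CREATE TABLE recordings (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title VARCHAR(255) NOT NULL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE screenshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    recording_id INTEGER NOT NULL REFERENCES recordings(id) ON DELETE CASCADE,
    slide_number INTEGER NOT NULL,
    image_data BLOB NOT NULL,
    code_snapshot TEXT,
    editor_mode VARCHAR(10) NOT NULL DEFAULT 'markdown',
    scene_data TEXT,
    canvas_bg_color VARCHAR(10) DEFAULT '#0d1117',
    narration_text TEXT,
    audio_data BLOB,
    audio_duration FLOAT,
    left_padding FLOAT DEFAULT 0.0,
    right_padding FLOAT DEFAULT 0.5,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (recording_id, slide_number)
);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Create the schema in a fresh database and return the connection."""
    conn = sqlite3.connect(path)
    # SQLite only enforces foreign keys (including CASCADE) when this is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

Deleting a row from `recordings` on a connection opened this way also removes its `screenshots` rows.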

| Path | Component | Description |
|---|---|---|
| `/` | LandingPage | Home with "Create New Recording" |
| `/recordings` | RecordingsList | List, edit, delete recordings |
| `/recording/:id` | RecordingEditor | Markdown/Canvas editor + screenshot panel with narration |