smrati/coding-tutorial-screen-shoter

CodeShot

A web application for creating YouTube-ready coding tutorials. Choose between a Markdown editor with a fixed 16:9 live preview or a freeform Excalidraw canvas per slide. Capture at 1920x1080 with Ctrl+S, add TTS narration, paste/upload images, clone slides, edit in fullscreen, and export as a ZIP of PNGs or a high-quality MP4 video with narrated audio.

Tech Stack

| Layer | Technology |
|---|---|
| Frontend | React 19, Vite, TypeScript, Tailwind CSS v4 |
| Editors | Markdown (@uiw/react-md-editor + rehype-highlight), Canvas (Excalidraw) |
| Screenshot | html2canvas (Markdown), Excalidraw exportToBlob (Canvas); both at 1920x1080 |
| TTS | PocketTTS (voice cloning from a single audio sample) |
| Audio | scipy + ffmpeg (WAV generation, MP3 encoding at 128kbps) |
| Export | JSZip + file-saver, ffmpeg (server-side H.264 video with AAC audio) |
| Backend | FastAPI, SQLAlchemy 2 (async), Pydantic v2 |
| Database | SQLite (via aiosqlite) |
| Package Manager | npm (frontend), uv (backend) |

Prerequisites

  • Node.js >= 18
  • Python >= 3.12
  • uv — install via curl -LsSf https://astral.sh/uv/install.sh | sh
  • ffmpeg — required for MP4 video export and audio conversion

Setup

Backend

cd backend
uv sync

Frontend

cd frontend
npm install

Voice Sample

Place a voice sample MP3 file at backend/data/voice-sample.mp3. This file is used by PocketTTS for voice cloning. You can configure the path via the VOICE_SAMPLE_PATH environment variable.
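
A minimal sketch of how such a lookup could work, assuming the backend falls back to the default location when VOICE_SAMPLE_PATH is unset. The helper name resolve_voice_sample_path is hypothetical and not taken from the project's code, and the default is assumed to be relative to the project root.

```python
import os
from pathlib import Path

# Default location described above (assumed relative to the project root).
DEFAULT_VOICE_SAMPLE = Path("backend/data/voice-sample.mp3")


def resolve_voice_sample_path() -> Path:
    """Return the configured voice sample path, or the default if unset."""
    configured = os.environ.get("VOICE_SAMPLE_PATH")
    return Path(configured) if configured else DEFAULT_VOICE_SAMPLE
```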

Running the Application

From the project root, run both servers with a single command:

./dev.sh

Ports are auto-detected — if the defaults (8000/5173) are busy, the next available port is used automatically. The actual URLs are printed on startup.

Or run individually if needed:

Backend (default port 8000):

cd backend
uv run uvicorn app.main:app --reload --port 8000

Frontend (default port 5173):

cd frontend
npm run dev

Open http://localhost:5173 in your browser.

API docs are available at http://localhost:8000/docs (Swagger UI).

How It Works

User Flow

  1. Home — Click "Create New Recording" and enter a title
  2. Choose mode — Toggle between Markdown and Canvas (Excalidraw) per slide using the mode switcher at the top of the editor
  3. Markdown mode — Write Markdown content in a split-pane editor. The right half is a fixed 16:9 (1920x1080) live preview with YouTube-optimized font sizes. Content that overflows is clipped — what you see is exactly what gets captured
  4. Canvas mode — Use the full Excalidraw editor (shapes, text, arrows, images, hand-drawn style) with a configurable per-slide background color. Draw freely within the 16:9 frame
  5. Images — In Markdown mode, paste images directly or click the image toolbar button to pick a file. Images are uploaded to the server and inserted as clean URLs. In Canvas mode, paste images directly onto the canvas
  6. Capture — Press Ctrl+S (Cmd+S on Mac) or click "Screenshot" to capture the current editor as a 1920x1080 PNG slide. Markdown uses html2canvas; Canvas uses Excalidraw's native exportToBlob
  7. Slides — Captured screenshots appear as numbered thumbnails in the right panel with mode badges (MD/Canvas), selection checkboxes, duration controls, and narration support
  8. Re-edit — Click "Edit" on any slide to load its content back into the editor (Markdown text or canvas scene). The toolbar shows a "Save Edit" button and an amber "Editing Slide #N" badge. Press Ctrl+S to re-capture — the existing slide is updated in-place, keeping its narration and audio intact. Click "Cancel" to exit edit mode
  9. Clone — Click "Clone" on any slide to duplicate it with all content (image, editor mode, code/scene data, narration text). Audio is not cloned — regenerate TTS for the copy. Padding resets to defaults
  10. Fullscreen — Click "Fullscreen" in the toolbar to expand the editor to fill the entire viewport. A floating toolbar at the top-center provides Screenshot/Save Edit, Cancel, mode badge, and Exit controls. The 16:9 aspect ratio is maintained. Markdown mode stays split-pane. Press Ctrl+S to capture from fullscreen
  11. Narration — Open the narration panel on any slide, enter text (with an expand-to-modal option), and click "Generate" to create TTS audio using a cloned voice
  12. Export — Export selected slides as a ZIP or an MP4 video (1920x1080, H.264, CRF 18 high-quality) with narrated audio and configurable padding
  13. Manage — View, edit titles, or delete recordings from the Recordings list page

Dual Editor Modes

Each slide has an editor_mode of either "markdown" or "canvas":

  • Markdown — Split-pane editor with live preview. Preview is clipped to 16:9 with YouTube-optimized fonts (18px base, 36px H1, 15px code). Captured via html2canvas at 1920x1080
  • Canvas — Full Excalidraw editor with dark theme, all tools enabled. Background color is configurable per slide via a color picker that syncs with the live canvas. Pasted images are captured via the onChange callback and included in exports. Captured via Excalidraw's exportToBlob, composited centered on 1920x1080

Scenes (including embedded images) are persisted as JSON in the backend. Re-editing loads the scene back into Excalidraw. Canvas scene changes are auto-saved with a 1-second debounce.

Screenshot Capture

Markdown mode: html2canvas renders the preview pane to a canvas at exactly 1920x1080 pixels. The canvas is converted to a PNG blob and uploaded along with the Markdown source.

Canvas mode: Excalidraw's exportToBlob API exports the current scene elements at their natural bounding box size, then composites the result centered and scaled to fit a 1920x1080 canvas with the configured background color. Embedded images in the scene are included in the export.
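
The "centered and scaled to fit" step comes down to a little arithmetic. The sketch below is written in Python for illustration only (the real compositing happens in the browser), and the function name fit_centered is hypothetical. It assumes the aspect ratio is preserved and that small scenes are scaled up as well as large ones scaled down, which the description above does not state.

```python
TARGET_W, TARGET_H = 1920, 1080


def fit_centered(src_w: float, src_h: float) -> tuple[float, float, float]:
    """Return (scale, x_offset, y_offset) that fits a source box inside the
    target frame while preserving aspect ratio and centering it."""
    if src_w <= 0 or src_h <= 0:
        raise ValueError("source dimensions must be positive")
    # The tighter of the two axes decides the scale factor.
    scale = min(TARGET_W / src_w, TARGET_H / src_h)
    x = (TARGET_W - src_w * scale) / 2
    y = (TARGET_H - src_h * scale) / 2
    return scale, x, y
```

For example, a 3840x1080 scene is scaled by 0.5 to 1920x540 and drawn 270 pixels from the top, with the configured background color filling the bands above and below.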

Preview font sizes (Markdown) are optimized for YouTube readability:

  • Base text: 18px
  • H1: 36px, H2: 27px, H3: 22.5px
  • Code blocks: 15px monospace

Image Handling

Images can be added to the Markdown editor in two ways:

  1. Paste — Paste an image from the clipboard. It uploads to the server and inserts a clean ![image](/api/v1/images/{uuid}.png) URL
  2. File picker — Click the image toolbar button to pick a file from disk. Same upload-and-URL flow

Images are stored as files in backend/data/images/ and served via a dedicated endpoint. The editor stays clean — no base64 inline data.
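
The upload-and-URL flow can be sketched with the standard library alone. This is an illustration under assumptions, not the project's router code: the helper names save_image and markdown_image are hypothetical, and the real endpoint is a FastAPI route.

```python
import uuid
from pathlib import Path


def save_image(data: bytes, images_dir: Path, ext: str = "png") -> str:
    """Write image bytes under a random UUID filename and return its URL."""
    images_dir.mkdir(parents=True, exist_ok=True)
    filename = f"{uuid.uuid4()}.{ext}"
    (images_dir / filename).write_bytes(data)
    return f"/api/v1/images/{filename}"


def markdown_image(url: str) -> str:
    """Build the Markdown snippet the editor inserts for an uploaded image."""
    return f"![image]({url})"
```

Because only the short URL is inserted, the Markdown source stays readable even with many images.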

TTS Narration

Each slide can have narration text regardless of editor mode. When the user clicks "Generate", PocketTTS converts the text to speech using a voice cloned from a global voice sample. The generated audio is stored as MP3 (128kbps). The audio duration is used for video export timing.

  • Left padding (default 0.0s) — silence before audio starts
  • Right padding (default 0.5s) — silence after audio ends
  • Total slide duration = left_padding + audio_duration + right_padding
  • Padding controls appear as soon as narration text is saved
  • Editing narration text clears the previously generated audio (user must regenerate)
  • Slides without narration fall back to the manual duration slider
  • A full-screen narration modal is available for comfortable long-form text editing
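
The timing rule above can be expressed as a small function. This is a sketch; the name slide_duration is hypothetical, and it assumes padding is ignored for slides that use the manual duration, which the list above does not spell out.

```python
from typing import Optional


def slide_duration(
    audio_duration: Optional[float],
    manual_duration: float,
    left_padding: float = 0.0,
    right_padding: float = 0.5,
) -> float:
    """Total on-screen time for a slide, in seconds."""
    if audio_duration is None:
        # No generated narration: fall back to the manual duration slider.
        return manual_duration
    return left_padding + audio_duration + right_padding
```

With the default padding, a slide with 4.0 seconds of narration stays on screen for 4.5 seconds.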

Video Export

The MP4 export sends selected slide IDs to the backend. The export guards against slides that have narration text but no generated audio. For each slide:

  1. With audio: The backend creates a padded audio track (silence + narration + silence) and generates a video segment combining the slide image with the audio using ffmpeg
  2. Without audio: The backend creates a video segment from the image with a silent audio track, using the manual duration
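
The guard mentioned above could look roughly like the following. The SlideInfo type and the function name are hypothetical; has_audio mirrors the computed field on the screenshot schema, and treating whitespace-only narration as empty is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlideInfo:
    slide_number: int
    narration_text: Optional[str] = None
    has_audio: bool = False


def slides_missing_audio(slides: list[SlideInfo]) -> list[int]:
    """Return slide numbers that have narration text but no generated audio."""
    return [
        s.slide_number
        for s in slides
        if s.narration_text and s.narration_text.strip() and not s.has_audio
    ]
```

An export would proceed only when this list is empty; otherwise the listed slides need their TTS audio regenerated first.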

All frames are scaled/padded to exactly 1920x1080 by ffmpeg (safety net). All audio segments are normalized to 44100Hz stereo AAC to ensure seamless concatenation. Segments are concatenated into a final H.264 MP4 video optimized for sharp text and screen content: libx264, CRF 18 (high quality), slow preset, stillimage tune, yuv420p, 30fps, AAC audio at 128kbps.
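
As an illustration of those encoder settings, here is one way to assemble the ffmpeg arguments for a single segment. This is a sketch and not the project's actual command line: the function name segment_args and the exact filter string are assumptions, though every flag used is a standard ffmpeg option.

```python
def segment_args(image: str, audio: str, duration: float, output: str) -> list[str]:
    """Build an ffmpeg argument list for one still-image video segment."""
    # Scale to fit inside 1920x1080, then pad to the exact frame size.
    vf = (
        "scale=1920:1080:force_original_aspect_ratio=decrease,"
        "pad=1920:1080:(ow-iw)/2:(oh-ih)/2"
    )
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", image,   # repeat the single PNG frame
        "-i", audio,                 # padded narration, or a silent track
        "-vf", vf,
        "-c:v", "libx264", "-crf", "18", "-preset", "slow",
        "-tune", "stillimage", "-pix_fmt", "yuv420p", "-r", "30",
        "-c:a", "aac", "-b:a", "128k", "-ar", "44100", "-ac", "2",
        "-t", f"{duration:.3f}",     # stop after the slide's total duration
        output,
    ]
```

The list can be passed to subprocess.run(args, check=True). Encoding every segment with identical video and audio parameters is what lets the segments be concatenated without re-encoding glitches.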

Fullscreen Editor

Click the "Fullscreen" button in the toolbar to expand the editor to fill the entire viewport. The top toolbar and right slides panel are hidden and replaced by a floating toolbar at the top-center with the essential controls (Screenshot/Save Edit, Cancel, mode badge, Exit). The editor keeps its 16:9 aspect ratio but is much larger because the surrounding UI chrome is hidden. Ctrl+S works in fullscreen. Click "Exit" or the toolbar's "Fullscreen" button to return to normal mode.

Slide Cloning

Click "Clone" (hover-revealed green button on each slide card) to duplicate a slide. The clone copies the image, editor mode, code snapshot/scene data, background color, and narration text. Audio is not cloned (regenerate TTS for the copy). Padding resets to defaults (0.0 / 0.5). The cloned slide appears at the end and is auto-selected.
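
The cloning rules can be sketched with a dataclass. Field names follow the screenshots table, but the Slide class and clone_slide function are hypothetical stand-ins for the ORM model and router logic, and numbering the clone as "highest existing slide number plus one" is an assumption.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass
class Slide:
    slide_number: int
    image_data: bytes
    editor_mode: str = "markdown"
    code_snapshot: Optional[str] = None
    scene_data: Optional[str] = None
    canvas_bg_color: str = "#0d1117"
    narration_text: Optional[str] = None
    audio_data: Optional[bytes] = None
    audio_duration: Optional[float] = None
    left_padding: float = 0.0
    right_padding: float = 0.5


def clone_slide(source: Slide, existing: list[Slide]) -> Slide:
    """Copy a slide's content, drop its audio, and place it at the end."""
    next_number = max((s.slide_number for s in existing), default=0) + 1
    return replace(
        source,
        slide_number=next_number,
        audio_data=None,        # audio is not cloned
        audio_duration=None,
        left_padding=0.0,       # padding resets to defaults
        right_padding=0.5,
    )
```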

Edit Mode

When editing an existing slide (via the "Edit" button), the UI provides clear feedback:

  • Toolbar: amber "Editing Slide #N" badge, button changes to "Save Edit", "Cancel" button appears
  • Editor area: amber banner with "press Ctrl+S or click Save Edit" hint
  • Cancel: exits edit mode and resets the editor state

Auto Port Detection

The dev.sh startup script automatically finds available ports starting from 8000 (backend) and 5173 (frontend) using Python's socket.bind(). Both ports are exported as environment variables (BACKEND_PORT, FRONTEND_PORT) so the Vite proxy and CORS configuration adapt automatically.
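
A minimal sketch of the probing technique, assuming the script simply tries successive ports until one binds. The function name find_free_port and the max_tries limit are hypothetical; the actual logic in dev.sh may differ.

```python
import socket


def find_free_port(start: int, host: str = "127.0.0.1", max_tries: int = 100) -> int:
    """Return the first port >= start that can be bound on host."""
    stop = min(start + max_tries, 65536)  # never probe past the valid range
    for port in range(start, stop):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is busy, try the next one
            return port
    raise RuntimeError(f"no free port in {start}..{stop - 1}")
```

Calling find_free_port(8000) and find_free_port(5173) would yield the values to export as BACKEND_PORT and FRONTEND_PORT. The probe socket is closed before the real server starts, so another process could in principle take the port in between; for a local dev script that race is usually acceptable.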

Architecture

coding-tutorial-screen-shoter/
├── backend/
│   ├── pyproject.toml              # uv dependencies
│   ├── data/
│   │   ├── app.db                  # SQLite database (auto-created)
│   │   ├── voice-sample.mp3        # Global voice sample for TTS
│   │   └── images/                 # Uploaded editor images
│   └── app/
│       ├── main.py                 # FastAPI app, CORS (reads FRONTEND_PORT), lifespan
│       ├── database.py             # SQLAlchemy async engine + session
│       ├── deps.py                 # FastAPI dependency (get_db)
│       ├── tts.py                  # PocketTTS wrapper (lazy-loaded model + voice state)
│       ├── models/
│       │   ├── recording.py        # Recording ORM model
│       │   └── screenshot.py       # Screenshot ORM model (image, audio, narration, canvas)
│       ├── schemas/
│       │   ├── recording.py        # Pydantic request/response schemas
│       │   └── screenshot.py       # Screenshot metadata schema (has_audio computed)
│       └── routers/
│           ├── recordings.py       # CRUD: /api/v1/recordings
│           ├── screenshots.py      # Upload/serve/delete/export/narration/audio/canvas/padding
│           └── images.py           # Upload/serve editor images: /api/v1/images
├── frontend/
│   ├── package.json
│   ├── vite.config.ts              # Proxy /api → localhost:BACKEND_PORT
│   └── src/
│       ├── App.tsx                 # React Router (3 routes)
│       ├── components/
│       │   ├── Header.tsx          # Navigation bar
│       │   ├── LandingPage.tsx     # Home page with create button
│       │   ├── CreateRecordingModal.tsx
│       │   ├── RecordingsList.tsx  # CRUD list of all recordings
│       │   ├── RecordingEditor.tsx # Main editor layout (mode switcher, fullscreen, edit mode, 16:9 container)
│       │   ├── MarkdownEditor.tsx  # Markdown editor (16:9 clipped preview, custom image upload)
│       │   ├── CanvasEditor.tsx    # Excalidraw canvas editor (bg color sync, onChange files, export compositing)
│       │   ├── EditorModeSwitcher.tsx # Markdown/Canvas toggle tabs
│       │   ├── EditorToolbar.tsx   # Toolbar with Screenshot/Save Edit, Fullscreen, Export buttons + edit badge
│       │   ├── ScreenshotPanel.tsx  # Right column with slide thumbnails
│       │   ├── ScreenshotCard.tsx   # Single slide with mode badge, edit, clone, narration, audio, padding
│       │   ├── SlidePreviewModal.tsx # Full-screen slide viewer
│       │   └── DurationPopover.tsx  # Per-slide duration slider
│       ├── hooks/
│       │   ├── useRecordings.ts    # Recording CRUD state management
│       │   ├── useScreenshots.ts   # Screenshot state per recording (upload, clone, update, remove)
│       │   └── useScreenshotCapture.ts  # html2canvas capture at 1920x1080
│       ├── services/
│       │   └── api.ts              # Axios HTTP client + all API functions
│       ├── types/
│       │   └── index.ts            # TypeScript interfaces
│       └── utils/
│           ├── zipExport.ts        # JSZip export logic
│           └── videoExport.ts      # MP4 video export logic
├── dev.sh                          # Auto-port-detecting dev startup script

API Endpoints

Recordings

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/recordings | List all recordings |
| POST | /api/v1/recordings | Create a new recording |
| GET | /api/v1/recordings/{id} | Get recording with screenshots |
| PUT | /api/v1/recordings/{id} | Update recording title |
| DELETE | /api/v1/recordings/{id} | Delete recording + screenshots |
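
As a usage illustration, the request for creating a recording could be prepared as shown below. The JSON body shape ({"title": ...}) is an assumption inferred from the title column and is not confirmed by the endpoint table; the Swagger UI at /docs shows the authoritative schema. The helper name is hypothetical.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000"  # default backend port


def build_create_recording_request(title: str) -> urllib.request.Request:
    """Prepare (but do not send) a POST /api/v1/recordings request."""
    body = json.dumps({"title": title}).encode("utf-8")
    return urllib.request.Request(
        f"{BASE_URL}/api/v1/recordings",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

With the backend running, urllib.request.urlopen(build_create_recording_request("My tutorial")) sends the request.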

Screenshots

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/recordings/{id}/screenshots | Upload screenshot (multipart, with editor_mode/scene_data) |
| GET | /api/v1/recordings/{id}/screenshots/{sid}/image | Serve PNG image |
| GET | /api/v1/recordings/{id}/screenshots/{sid}/audio | Serve MP3 audio |
| DELETE | /api/v1/recordings/{id}/screenshots/{sid} | Delete a screenshot |
| PUT | /api/v1/recordings/{id}/screenshots/{sid}/narration | Update narration text |
| POST | /api/v1/recordings/{id}/screenshots/{sid}/generate-audio | Generate TTS audio |
| PUT | /api/v1/recordings/{id}/screenshots/{sid}/padding | Update left/right padding |
| PUT | /api/v1/recordings/{id}/screenshots/{sid}/canvas | Save canvas scene data + bg color |
| PUT | /api/v1/recordings/{id}/screenshots/{sid}/image | Update slide image + metadata in-place |
| POST | /api/v1/recordings/{id}/screenshots/{sid}/clone | Clone a slide (copies content, not audio) |
| GET | /api/v1/recordings/{id}/screenshots/export | Download all as ZIP |
| GET | /api/v1/recordings/{id}/screenshots/export-video | Export slides as 1920x1080 MP4 |

Images

| Method | Endpoint | Description |
|---|---|---|
| POST | /api/v1/images | Upload an image, returns URL |
| GET | /api/v1/images/{filename} | Serve uploaded image |

Health

| Method | Endpoint | Description |
|---|---|---|
| GET | /api/v1/health | Health check |

Database Schema

recordings

| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Auto-increment ID |
| title | VARCHAR(255) | Recording title |
| created_at | DATETIME | Creation timestamp |
| updated_at | DATETIME | Last modified timestamp |

screenshots

| Column | Type | Description |
|---|---|---|
| id | INTEGER PK | Auto-increment ID |
| recording_id | INTEGER FK | References recordings(id), CASCADE delete |
| slide_number | INTEGER | Sequential slide number (unique per recording) |
| image_data | BLOB | PNG binary data (1920x1080) |
| code_snapshot | TEXT | Markdown source at time of capture |
| editor_mode | VARCHAR(10) | "markdown" or "canvas" (default "markdown") |
| scene_data | TEXT | Excalidraw scene JSON (canvas mode) |
| canvas_bg_color | VARCHAR(10) | Canvas background hex color (default "#0d1117") |
| narration_text | TEXT | Narration script for TTS |
| audio_data | BLOB | Generated MP3 audio data (128kbps) |
| audio_duration | FLOAT | Duration of generated audio in seconds |
| left_padding | FLOAT | Silence before audio (default 0.0s) |
| right_padding | FLOAT | Silence after audio (default 0.5s) |
| created_at | DATETIME | Capture timestamp |
| updated_at | DATETIME | Last modified timestamp (auto-updated) |
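
To make the cascade and uniqueness rules concrete, here is a reduced version of the two tables in raw SQL, run through Python's built-in sqlite3 module. This is a sketch and not the project's SQLAlchemy models: only a subset of columns is shown, and the NOT NULL constraints and the open_db helper are assumptions.

```python
import sqlite3

SCHEMA = """
CREATE TABLE recordings (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title VARCHAR(255) NOT NULL
);
CREATE TABLE screenshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    recording_id INTEGER NOT NULL
        REFERENCES recordings(id) ON DELETE CASCADE,
    slide_number INTEGER NOT NULL,
    image_data BLOB NOT NULL,
    editor_mode VARCHAR(10) NOT NULL DEFAULT 'markdown',
    left_padding FLOAT NOT NULL DEFAULT 0.0,
    right_padding FLOAT NOT NULL DEFAULT 0.5,
    UNIQUE (recording_id, slide_number)
);
"""


def open_db() -> sqlite3.Connection:
    """Create an in-memory database with the reduced schema."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and CASCADE) when this is enabled.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

Note that SQLite leaves foreign key enforcement off by default, so the CASCADE delete only takes effect on connections that enable it.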

Frontend Routes

| Path | Component | Description |
|---|---|---|
| / | LandingPage | Home with "Create New Recording" |
| /recordings | RecordingsList | List, edit, delete recordings |
| /recording/:id | RecordingEditor | Markdown/Canvas editor + screenshot panel with narration |
