CodeShot

A web application for creating YouTube-ready coding tutorials. Choose between a Markdown editor with a fixed 16:9 live preview or a freeform Excalidraw canvas per slide. Capture at 1920x1080 with Ctrl+S, add TTS narration, paste/upload images, clone slides, edit in fullscreen, and export as a ZIP of PNGs or a high-quality MP4 video with narrated audio.

Tech Stack

Layer	Technology
Frontend	React 19, Vite, TypeScript, Tailwind CSS v4
Editors	Markdown (`@uiw/react-md-editor` + `rehype-highlight`), Canvas (Excalidraw)
Screenshot	html2canvas (Markdown), Excalidraw `exportToBlob` (Canvas) — both 1920x1080
TTS	PocketTTS (voice cloning from a single audio sample)
Audio	scipy + ffmpeg (WAV generation, MP3 encoding at 128kbps)
Export	JSZip + file-saver, ffmpeg (server-side H.264 video with AAC audio)
Backend	FastAPI, SQLAlchemy 2 (async), Pydantic v2
Database	SQLite (via aiosqlite)
Package Manager	npm (frontend), uv (backend)

Prerequisites

Node.js >= 18
Python >= 3.12
uv — install via curl -LsSf https://astral.sh/uv/install.sh | sh
ffmpeg — required for MP4 video export and audio conversion

Setup

Backend

cd backend
uv sync

Frontend

cd frontend
npm install

Voice Sample

Place a voice sample MP3 file at backend/data/voice-sample.mp3. This file is used by PocketTTS for voice cloning. You can configure the path via the VOICE_SAMPLE_PATH environment variable.
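
The override described above could be resolved along these lines. This is a minimal sketch, not the project's actual code: the function name `resolve_voice_sample_path` and the optional `env` argument are hypothetical, and only the default path and the `VOICE_SAMPLE_PATH` variable come from this README.

```python
import os
from pathlib import Path

# Default location as documented above; the constant name is an assumption.
DEFAULT_VOICE_SAMPLE = Path("backend/data/voice-sample.mp3")


def resolve_voice_sample_path(env=None):
    """Return the voice sample path, preferring VOICE_SAMPLE_PATH when it is set."""
    env = os.environ if env is None else env
    override = env.get("VOICE_SAMPLE_PATH")
    # An empty or missing variable falls back to the documented default.
    return Path(override) if override else DEFAULT_VOICE_SAMPLE
```

Passing a plain dict as `env` keeps the function easy to exercise without touching the real process environment.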

Running the Application

From the project root, run both servers with a single command:

./dev.sh

Ports are auto-detected — if the defaults (8000/5173) are busy, the next available port is used automatically. The actual URLs are printed on startup.

Or run individually if needed:

Backend (default port 8000):

cd backend
uv run uvicorn app.main:app --reload --port 8000

Frontend (default port 5173):

cd frontend
npm run dev

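Open http://localhost:5173 in your browser.

API docs (Swagger UI) are available at http://localhost:8000/docs. If dev.sh picked different ports because the defaults were busy, use the URLs it printed on startup instead.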

How It Works

User Flow

Home — Click "Create New Recording" and enter a title
Choose mode — Toggle between Markdown and Canvas (Excalidraw) per slide using the mode switcher at the top of the editor
Markdown mode — Write Markdown content in a split-pane editor. The right half is a fixed 16:9 (1920x1080) live preview with YouTube-optimized font sizes. Content that overflows is clipped — what you see is exactly what gets captured
Canvas mode — Use the full Excalidraw editor (shapes, text, arrows, images, hand-drawn style) with a configurable per-slide background color. Draw freely within the 16:9 frame
Images — In Markdown mode, paste images directly or click the image toolbar button to pick a file. Images are uploaded to the server and inserted as clean URLs. In Canvas mode, paste images directly onto the canvas
Capture — Press Ctrl+S (Cmd+S on Mac) or click "Screenshot" to capture the current editor as a 1920x1080 PNG slide. Markdown uses html2canvas; Canvas uses Excalidraw's native exportToBlob
Slides — Captured screenshots appear as numbered thumbnails in the right panel with mode badges (MD/Canvas), selection checkboxes, duration controls, and narration support
Re-edit — Click "Edit" on any slide to load its content back into the editor (Markdown text or canvas scene). The toolbar shows a "Save Edit" button and an amber "Editing Slide #N" badge. Press Ctrl+S to re-capture — the existing slide is updated in-place, keeping its narration and audio intact. Click "Cancel" to exit edit mode
Clone — Click "Clone" on any slide to duplicate it with all content (image, editor mode, code/scene data, narration text). Audio is not cloned — regenerate TTS for the copy. Padding resets to defaults
Fullscreen — Click "Fullscreen" in the toolbar to expand the editor to fill the entire viewport. A floating toolbar at the top-center provides Screenshot/Save Edit, Cancel, mode badge, and Exit controls. The 16:9 aspect ratio is maintained. Markdown mode stays split-pane. Press Ctrl+S to capture from fullscreen
Narration — Open the narration panel on any slide, enter text (with an expand-to-modal option), and click "Generate" to create TTS audio using a cloned voice
Export — Export selected slides as a ZIP or an MP4 video (1920x1080, H.264, CRF 18 high-quality) with narrated audio and configurable padding
Manage — View, edit titles, or delete recordings from the Recordings list page

Dual Editor Modes

Each slide has an editor_mode of either "markdown" or "canvas":

Markdown — Split-pane editor with live preview. Preview is clipped to 16:9 with YouTube-optimized fonts (18px base, 36px H1, 15px code). Captured via html2canvas at 1920x1080
Canvas — Full Excalidraw editor with dark theme, all tools enabled. Background color is configurable per slide via a color picker that syncs with the live canvas. Pasted images are captured via the onChange callback and included in exports. Captured via Excalidraw's exportToBlob, composited centered on 1920x1080

Scenes (including embedded images) are persisted as JSON in the backend. Re-editing loads the scene back into Excalidraw. Canvas scene changes are auto-saved with a 1-second debounce.

Screenshot Capture

Markdown mode: html2canvas renders the preview pane to a canvas at exactly 1920x1080 pixels. The canvas is converted to a PNG blob and uploaded along with the Markdown source.

Canvas mode: Excalidraw's exportToBlob API exports the current scene elements at their natural bounding box size, then composites the result centered and scaled to fit a 1920x1080 canvas with the configured background color. Embedded images in the scene are included in the export.
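
The "centered and scaled to fit" step for Canvas mode comes down to a small piece of geometry. The sketch below shows that calculation in Python for illustration only; the real compositing runs in the browser, and the function name `fit_centered` is hypothetical. Whether the app upscales scenes smaller than the frame is not stated here; this sketch does.

```python
def fit_centered(src_w, src_h, dst_w=1920, dst_h=1080):
    """Scale a source rectangle to fit inside the destination and center it.

    Returns (x, y, width, height) of the drawn rectangle in destination pixels.
    """
    if src_w <= 0 or src_h <= 0:
        raise ValueError("source dimensions must be positive")
    # Use the smaller ratio so the whole scene stays inside the frame.
    scale = min(dst_w / src_w, dst_h / src_h)
    width = round(src_w * scale)
    height = round(src_h * scale)
    # Split the leftover space evenly on both sides.
    x = (dst_w - width) // 2
    y = (dst_h - height) // 2
    return x, y, width, height
```

For example, a 4:3 scene of 800x600 is scaled to 1440x1080 and drawn 240 pixels from the left edge, with the configured background color filling the bars on either side.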

Preview font sizes (Markdown) are optimized for YouTube readability:

Base text: 18px
H1: 36px, H2: 27px, H3: 22.5px
Code blocks: 15px monospace

Image Handling

Images can be added to the Markdown editor in two ways:

Paste — Paste an image from the clipboard. It uploads to the server and inserts a clean ![image](/api/v1/images/{uuid}.png) URL
File picker — Click the image toolbar button to pick a file from disk. Same upload-and-URL flow

Images are stored as files in backend/data/images/ and served via a dedicated endpoint. The editor stays clean — no base64 inline data.
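
As an illustration of the "clean URL" convention above, the helper below generates a UUID-based filename and the Markdown snippet that would be inserted into the editor. It is a sketch under assumptions: the function name is hypothetical, and the server's actual naming logic may differ beyond the `/api/v1/images/{uuid}.png` shape documented here.

```python
import uuid


def new_image_reference(extension="png"):
    """Return (filename, markdown) for a freshly uploaded image."""
    # A random UUID avoids filename collisions between uploads.
    filename = f"{uuid.uuid4()}.{extension}"
    markdown = f"![image](/api/v1/images/{filename})"
    return filename, markdown
```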

TTS Narration

Each slide can have narration text regardless of editor mode. When the user clicks "Generate", PocketTTS converts the text to speech using a voice cloned from a global voice sample. The generated audio is stored as MP3 (128kbps). The audio duration is used for video export timing.

Left padding (default 0.0s) — silence before audio starts
Right padding (default 0.5s) — silence after audio ends
Total slide duration = left_padding + audio_duration + right_padding
Padding controls appear as soon as narration text is saved
Editing narration text clears the previously generated audio (user must regenerate)
Slides without narration fall back to the manual duration slider
A full-screen narration modal is available for comfortable long-form text editing
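
The timing rules above can be summarized in a few lines. This is a minimal sketch: the `SlideTiming` class, the `manual_duration` field name, and its 3.0 second default are assumptions made for illustration; the padding defaults come from this README.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SlideTiming:
    audio_duration: Optional[float] = None  # seconds; None when no audio exists
    left_padding: float = 0.0               # silence before the narration
    right_padding: float = 0.5              # silence after the narration
    manual_duration: float = 3.0            # hypothetical slider value (assumed default)


def slide_duration(timing: SlideTiming) -> float:
    """Return how long a slide stays on screen in the exported video."""
    if timing.audio_duration is None:
        # No narration audio: fall back to the manual duration slider.
        return timing.manual_duration
    return timing.left_padding + timing.audio_duration + timing.right_padding
```

With the default padding, a slide with 4.0 seconds of narration lasts 4.5 seconds.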

Video Export

The MP4 export sends selected slide IDs to the backend. The export guards against slides that have narration text but no generated audio. For each slide:

With audio: The backend creates a padded audio track (silence + narration + silence) and generates a video segment combining the slide image with the audio using ffmpeg
Without audio: The backend creates a video segment from the image with a silent audio track, using the manual duration
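
The export guard mentioned above could look roughly like this. It is a sketch, assuming slides are available as dicts with the `slide_number`, `narration_text`, and `has_audio` fields named elsewhere in this README; the function name is hypothetical.

```python
def slides_missing_audio(slides):
    """Return the slide numbers that have narration text but no generated audio."""
    return [
        slide["slide_number"]
        for slide in slides
        # Whitespace-only narration is treated as "no narration" in this sketch.
        if (slide.get("narration_text") or "").strip() and not slide.get("has_audio")
    ]
```

An export would be refused, or the user prompted to regenerate TTS, whenever the returned list is not empty.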

As a safety net, ffmpeg scales and pads every frame to exactly 1920x1080. All audio segments are normalized to 44100 Hz stereo AAC so that they concatenate seamlessly. The segments are then concatenated into a final H.264 MP4 tuned for sharp text and screen content: libx264, CRF 18 (high quality), slow preset, stillimage tune, yuv420p, 30 fps, AAC audio at 128 kbps.
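
To make the encoding settings concrete, here is one way a per-slide segment command could be assembled from the parameters listed above. This is a sketch, not the backend's actual invocation: the function name, argument order, and the exact scale/pad filter string are assumptions, and the audio file is assumed to be already padded with silence.

```python
def build_segment_command(image_path, audio_path, duration, output_path):
    """Build an ffmpeg argument list for one still-image segment with audio."""
    # Fit the image inside 1920x1080, then pad the remainder so it is centered.
    scale_pad = (
        "scale=1920:1080:force_original_aspect_ratio=decrease,"
        "pad=1920:1080:(ow-iw)/2:(oh-ih)/2"
    )
    return [
        "ffmpeg", "-y",
        "-loop", "1", "-i", str(image_path),  # repeat the still image as video
        "-i", str(audio_path),                # padded narration (or silence)
        "-vf", scale_pad,
        "-c:v", "libx264", "-crf", "18", "-preset", "slow", "-tune", "stillimage",
        "-pix_fmt", "yuv420p", "-r", "30",
        "-c:a", "aac", "-b:a", "128k", "-ar", "44100", "-ac", "2",
        "-t", f"{duration:.3f}",              # stop at the slide's total duration
        str(output_path),
    ]
```

The list can be passed to `subprocess.run(cmd, check=True)`. Building a list rather than a shell string avoids quoting problems with file paths.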

Fullscreen Editor

Click the "Fullscreen" button in the toolbar to expand the editor to fill the entire viewport. The top toolbar and right slides panel are hidden, replaced by a floating toolbar at the top-center with essential controls (Screenshot/Save Edit, Cancel, mode badge, Exit). The editor maintains its 16:9 aspect ratio but is much larger since chrome is removed. Ctrl+S works in fullscreen. Click "Exit" or the toolbar's "Fullscreen" button to return to normal mode.

Slide Cloning

Click "Clone" (hover-revealed green button on each slide card) to duplicate a slide. The clone copies the image, editor mode, code snapshot/scene data, background color, and narration text. Audio is not cloned (regenerate TTS for the copy). Padding resets to defaults (0.0 / 0.5). The cloned slide appears at the end and is auto-selected.
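
The clone rules above (copy content, drop audio, reset padding) can be sketched with a dataclass. The `Slide` class below is illustrative only; its field names mirror the columns in the Database Schema section, but the real ORM model and clone endpoint may be structured differently.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Slide:
    slide_number: int
    image_data: bytes
    editor_mode: str = "markdown"
    code_snapshot: Optional[str] = None
    scene_data: Optional[str] = None
    canvas_bg_color: str = "#0d1117"
    narration_text: Optional[str] = None
    audio_data: Optional[bytes] = None
    audio_duration: Optional[float] = None
    left_padding: float = 0.0
    right_padding: float = 0.5


def clone_slide(source: Slide, next_slide_number: int) -> Slide:
    """Copy a slide's content to a new slide placed at the end of the recording."""
    return replace(
        source,
        slide_number=next_slide_number,
        audio_data=None,       # audio is not cloned; TTS must be regenerated
        audio_duration=None,
        left_padding=0.0,      # padding resets to the documented defaults
        right_padding=0.5,
    )
```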

Edit Mode

When editing an existing slide (via the "Edit" button), the UI provides clear feedback:

Toolbar: amber "Editing Slide #N" badge, button changes to "Save Edit", "Cancel" button appears
Editor area: amber banner with "press Ctrl+S or click Save Edit" hint
Cancel: exits edit mode and resets the editor state

Auto Port Detection

The dev.sh startup script automatically finds available ports starting from 8000 (backend) and 5173 (frontend) using Python's socket.bind(). Both ports are exported as environment variables (BACKEND_PORT, FRONTEND_PORT) so the Vite proxy and CORS configuration adapt automatically.
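
The port search described above can be approximated as follows. This is a sketch of the general `socket.bind()` technique, not the contents of dev.sh; the function name, the loopback host, and the 100-port search window are assumptions.

```python
import socket


def find_free_port(start, host="127.0.0.1", max_tries=100):
    """Return the first port at or above `start` that can be bound on `host`."""
    # Never probe past the highest valid TCP port.
    for port in range(start, min(start + max_tries, 65536)):
        with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
            try:
                sock.bind((host, port))
            except OSError:
                continue  # port is taken; try the next one
            return port   # the socket is closed on leaving the with-block
    raise RuntimeError(f"no free port found starting at {start}")
```

A startup script could call `find_free_port(8000)` and `find_free_port(5173)` and export the results as `BACKEND_PORT` and `FRONTEND_PORT`. Note that the port is released before the server binds it, so another process could in principle claim it in between.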

Architecture

coding-tutorial-screen-shoter/
├── backend/
│   ├── pyproject.toml              # uv dependencies
│   ├── data/
│   │   ├── app.db                  # SQLite database (auto-created)
│   │   ├── voice-sample.mp3        # Global voice sample for TTS
│   │   └── images/                 # Uploaded editor images
│   └── app/
│       ├── main.py                 # FastAPI app, CORS (reads FRONTEND_PORT), lifespan
│       ├── database.py             # SQLAlchemy async engine + session
│       ├── deps.py                 # FastAPI dependency (get_db)
│       ├── tts.py                  # PocketTTS wrapper (lazy-loaded model + voice state)
│       ├── models/
│       │   ├── recording.py        # Recording ORM model
│       │   └── screenshot.py       # Screenshot ORM model (image, audio, narration, canvas)
│       ├── schemas/
│       │   ├── recording.py        # Pydantic request/response schemas
│       │   └── screenshot.py       # Screenshot metadata schema (has_audio computed)
│       └── routers/
│           ├── recordings.py       # CRUD: /api/v1/recordings
│           ├── screenshots.py      # Upload/serve/delete/export/narration/audio/canvas/padding
│           └── images.py           # Upload/serve editor images: /api/v1/images
├── frontend/
│   ├── package.json
│   ├── vite.config.ts              # Proxy /api → localhost:BACKEND_PORT
│   └── src/
│       ├── App.tsx                 # React Router (3 routes)
│       ├── components/
│       │   ├── Header.tsx          # Navigation bar
│       │   ├── LandingPage.tsx     # Home page with create button
│       │   ├── CreateRecordingModal.tsx
│       │   ├── RecordingsList.tsx  # CRUD list of all recordings
│       │   ├── RecordingEditor.tsx # Main editor layout (mode switcher, fullscreen, edit mode, 16:9 container)
│       │   ├── MarkdownEditor.tsx  # Markdown editor (16:9 clipped preview, custom image upload)
│       │   ├── CanvasEditor.tsx    # Excalidraw canvas editor (bg color sync, onChange files, export compositing)
│       │   ├── EditorModeSwitcher.tsx # Markdown/Canvas toggle tabs
│       │   ├── EditorToolbar.tsx   # Toolbar with Screenshot/Save Edit, Fullscreen, Export buttons + edit badge
│       │   ├── ScreenshotPanel.tsx  # Right column with slide thumbnails
│       │   ├── ScreenshotCard.tsx   # Single slide with mode badge, edit, clone, narration, audio, padding
│       │   ├── SlidePreviewModal.tsx # Full-screen slide viewer
│       │   └── DurationPopover.tsx  # Per-slide duration slider
│       ├── hooks/
│       │   ├── useRecordings.ts    # Recording CRUD state management
│       │   ├── useScreenshots.ts   # Screenshot state per recording (upload, clone, update, remove)
│       │   └── useScreenshotCapture.ts  # html2canvas capture at 1920x1080
│       ├── services/
│       │   └── api.ts              # Axios HTTP client + all API functions
│       ├── types/
│       │   └── index.ts            # TypeScript interfaces
│       └── utils/
│           ├── zipExport.ts        # JSZip export logic
│           └── videoExport.ts      # MP4 video export logic
└── dev.sh                          # Auto-port-detecting dev startup script

API Endpoints

Recordings

Method	Endpoint	Description
`GET`	`/api/v1/recordings`	List all recordings
`POST`	`/api/v1/recordings`	Create a new recording
`GET`	`/api/v1/recordings/{id}`	Get recording with screenshots
`PUT`	`/api/v1/recordings/{id}`	Update recording title
`DELETE`	`/api/v1/recordings/{id}`	Delete recording + screenshots

Screenshots

Method	Endpoint	Description
`POST`	`/api/v1/recordings/{id}/screenshots`	Upload screenshot (multipart, with editor_mode/scene_data)
`GET`	`/api/v1/recordings/{id}/screenshots/{sid}/image`	Serve PNG image
`GET`	`/api/v1/recordings/{id}/screenshots/{sid}/audio`	Serve MP3 audio
`DELETE`	`/api/v1/recordings/{id}/screenshots/{sid}`	Delete a screenshot
`PUT`	`/api/v1/recordings/{id}/screenshots/{sid}/narration`	Update narration text
`POST`	`/api/v1/recordings/{id}/screenshots/{sid}/generate-audio`	Generate TTS audio
`PUT`	`/api/v1/recordings/{id}/screenshots/{sid}/padding`	Update left/right padding
`PUT`	`/api/v1/recordings/{id}/screenshots/{sid}/canvas`	Save canvas scene data + bg color
`PUT`	`/api/v1/recordings/{id}/screenshots/{sid}/image`	Update slide image + metadata in-place
`POST`	`/api/v1/recordings/{id}/screenshots/{sid}/clone`	Clone a slide (copies content, not audio)
`GET`	`/api/v1/recordings/{id}/screenshots/export`	Download all as ZIP
`GET`	`/api/v1/recordings/{id}/screenshots/export-video`	Export slides as 1920x1080 MP4

Images

Method	Endpoint	Description
`POST`	`/api/v1/images`	Upload an image, returns URL
`GET`	`/api/v1/images/{filename}`	Serve uploaded image

Health

Method	Endpoint	Description
`GET`	`/api/v1/health`	Health check

Database Schema

recordings

Column	Type	Description
id	INTEGER PK	Auto-increment ID
title	VARCHAR(255)	Recording title
created_at	DATETIME	Creation timestamp
updated_at	DATETIME	Last modified timestamp

screenshots

Column	Type	Description
id	INTEGER PK	Auto-increment ID
recording_id	INTEGER FK	References recordings(id), CASCADE delete
slide_number	INTEGER	Sequential slide number (unique per recording)
image_data	BLOB	PNG binary data (1920x1080)
code_snapshot	TEXT	Markdown source at time of capture
editor_mode	VARCHAR(10)	`"markdown"` or `"canvas"` (default `"markdown"`)
scene_data	TEXT	Excalidraw scene JSON (canvas mode)
canvas_bg_color	VARCHAR(10)	Canvas background hex color (default `"#0d1117"`)
narration_text	TEXT	Narration script for TTS
audio_data	BLOB	Generated MP3 audio data (128kbps)
audio_duration	FLOAT	Duration of generated audio in seconds
left_padding	FLOAT	Silence before audio (default 0.0s)
right_padding	FLOAT	Silence after audio (default 0.5s)
created_at	DATETIME	Capture timestamp
updated_at	DATETIME	Last modified timestamp (auto-updated)
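
For readers who want to explore the schema without the backend, the tables above can be approximated with Python's built-in sqlite3 module. This is a sketch only: the application defines its tables through SQLAlchemy models, and the nullability constraints and timestamp defaults below are assumptions not stated in the tables.

```python
import sqlite3

SCHEMA = """
CREATE TABLE recordings (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    title VARCHAR(255) NOT NULL,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE screenshots (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    recording_id INTEGER NOT NULL REFERENCES recordings(id) ON DELETE CASCADE,
    slide_number INTEGER NOT NULL,
    image_data BLOB NOT NULL,
    code_snapshot TEXT,
    editor_mode VARCHAR(10) DEFAULT 'markdown',
    scene_data TEXT,
    canvas_bg_color VARCHAR(10) DEFAULT '#0d1117',
    narration_text TEXT,
    audio_data BLOB,
    audio_duration FLOAT,
    left_padding FLOAT DEFAULT 0.0,
    right_padding FLOAT DEFAULT 0.5,
    created_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    updated_at DATETIME DEFAULT CURRENT_TIMESTAMP,
    UNIQUE (recording_id, slide_number)
);
"""


def open_demo_db():
    """Create an in-memory database with the two tables described above."""
    conn = sqlite3.connect(":memory:")
    # SQLite only enforces foreign keys (and CASCADE) when this pragma is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn
```

With foreign keys enabled, deleting a row from recordings also removes its screenshots, matching the CASCADE behavior noted for recording_id.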

Frontend Routes

Path	Component	Description
`/`	LandingPage	Home with "Create New Recording"
`/recordings`	RecordingsList	List, edit, delete recordings
`/recording/:id`	RecordingEditor	Markdown/Canvas editor + screenshot panel with narration
