Tags: Python, Google Gemini, Groq API, OpenAI Whisper, Pexels API, FFmpeg, TMDB API, ai, video, automation, python

StudioZero is an AI-powered video generation pipeline that creates short, stylized vertical videos (9:16 aspect ratio) from movie titles. Input a movie name, and the system automatically fetches movie data, generates a compelling narrative via LLM, synthesizes voiceovers, downloads matching stock footage, and renders a final video with Hormozi-style animated captions.

Target Output: Viral-ready social media videos suitable for TikTok, Instagram Reels, and YouTube Shorts.

How It Works

Movie Name → Wikipedia/TMDB → Gemini LLM → Gemini VideoScript → Parallel Asset Gen → Whisper Sync → FFmpeg Render

The 5-Step Pipeline

The pipeline is implemented as a generator-based orchestration engine that yields real-time progress updates:

#### Step 1: Movie Data Retrieval & Script Generation

  • Searches Wikipedia for movie data (with TMDB fallback)
  • Extracts plot, year, tagline, and poster path
  • Passes data to Gemini LLM (with Groq fallback) with a comprehensive system prompt
  • LLM generates a `VideoScript` with:
      - Genre classification and voice selection (from 30 available voices)
      - 6-scene narrative arc with detailed annotations
      - Visual search queries per scene (literal, abstract, atmospheric)
      - Mood-based pacing recommendations

#### Step 2: Parallel Asset Generation

  • TTS (Parallel): Gemini generates narration audio per scene with mood-based style prompts
  • Stock Video (Parallel): Pexels API downloads portrait footage using 3 visual queries per scene
  • Ending Scene: Special handling for movie poster reveal with Ken Burns zoom effect
  • Fallback to local base videos if Pexels search fails
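
The per-scene fan-out in Step 2 could look roughly like the sketch below. It assumes two hypothetical helpers, `synthesize_audio` and `fetch_video`, standing in for the TTS and Pexels calls; the actual function names in `pipeline.py` may differ.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def generate_assets(
    scenes: list[dict],
    synthesize_audio: Callable[[dict], str],  # hypothetical TTS helper
    fetch_video: Callable[[dict], str],       # hypothetical stock-video helper
    max_workers: int = 4,
) -> list[dict]:
    """Run TTS and video download for every scene concurrently."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # Submit both jobs for each scene up front so they overlap.
        jobs = [
            (pool.submit(synthesize_audio, s), pool.submit(fetch_video, s))
            for s in scenes
        ]
        # Collect results in scene order, regardless of completion order.
        return [
            {"audio_path": audio.result(), "video_path": video.result()}
            for audio, video in jobs
        ]
```

Threads fit here because both jobs are I/O-bound (network calls), so they overlap well despite the GIL.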

#### Step 3: Whisper Transcription

  • Whisper "base" model extracts word-level timestamps from each audio file
  • Cumulative timestamp adjustment across all scenes for frame-accurate sync
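
The cumulative adjustment can be illustrated with a small sketch. Whisper reports times relative to each scene's own audio file, so each scene's words are shifted by the total duration of the scenes before it. The function name and the tuple layout are assumptions made for illustration.

```python
def offset_timestamps(scene_words, scene_durations):
    """Shift per-scene word times onto the timeline of the concatenated audio.

    scene_words: list of per-scene lists of (word, start, end) tuples, with
                 times relative to the start of that scene's audio file.
    scene_durations: audio duration of each scene in seconds.
    """
    merged = []
    offset = 0.0
    for words, duration in zip(scene_words, scene_durations):
        for word, start, end in words:
            merged.append((word, start + offset, end + offset))
        offset += duration  # the next scene starts where this one ends
    return merged
```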

#### Step 4: Karaoke Subtitle Generation

  • Generates ASS (Advanced SubStation Alpha) subtitles
  • Hormozi-style formatting: Arial Black 80pt, white with black outline
  • Word-by-word appearance based on Whisper timestamps
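
As a rough illustration of word-by-word captions, the sketch below turns `(word, start, end)` tuples into ASS `Dialogue` lines using only the standard library. The project itself uses pysubs2, so the helper names here are hypothetical and the style definition (font, size, outline) is omitted.

```python
def ass_time(seconds: float) -> str:
    """Format seconds as an ASS timestamp (H:MM:SS.cc, centisecond precision)."""
    cs = round(seconds * 100)
    h, rem = divmod(cs, 360_000)
    m, rem = divmod(rem, 6_000)
    s, cs = divmod(rem, 100)
    return f"{h}:{m:02d}:{s:02d}.{cs:02d}"


def word_events(words, style="Default"):
    """Emit one ASS Dialogue line per word so captions appear word by word."""
    return [
        f"Dialogue: 0,{ass_time(start)},{ass_time(end)},{style},,0,0,0,,{word}"
        for word, start, end in words
    ]
```

Each line follows the ASS event field order: layer, start, end, style, name, three margins, effect, text.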

#### Step 5: Video Rendering

  • Video Normalization: All clips normalized to 1080x1920, 30fps, H.264
  • Audio Mixing: Voiceover concatenation with looped background music
  • Audio Ducking: Sidechain compression reduces music when voice is present
  • Subtitle Burning: ASS subtitles overlaid via FFmpeg `ass` filter
  • Final Encode: H.264 MP4 with AAC audio (192kbps)
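
A normalization command along these lines would produce the 1080x1920, 30fps, H.264 clips described above. This is a sketch, not the project's actual `renderer.py` code: the scale-then-crop strategy and dropping the clip's own audio with `-an` are assumptions.

```python
def normalize_cmd(src: str, dst: str) -> list[str]:
    """Build an FFmpeg command that conforms a clip to 1080x1920, 30 fps, H.264."""
    vf = (
        "scale=1080:1920:force_original_aspect_ratio=increase,"  # fill the frame
        "crop=1080:1920,"                                        # trim overflow
        "fps=30"
    )
    return [
        "ffmpeg", "-y", "-i", src,
        "-vf", vf,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        "-an",  # assumption: drop the clip's audio; narration is mixed in later
        dst,
    ]
```

The returned list can be passed to `subprocess.run(cmd, check=True)`.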

Technology Stack

| Component | Technology | Details |
| --- | --- | --- |
| Language | Python 3.10+ | Modern async/threading support |
| LLM | Google Gemini (primary) + Groq (fallback) | Gemini 2.0 Flash with Groq Llama fallback |
| Text-to-Speech | Google Gemini 2.5 Flash TTS | 30 voices, mood-based style prompts |
| Transcription | OpenAI Whisper | Base model with word-level timestamps |
| Stock Video | Pexels API | Portrait filtering, 3-query fallback |
| Video Rendering | FFmpeg | filter_complex pipelines, Ken Burns |
| Subtitles | pysubs2 | ASS format, Hormozi-style captions |
| Movie Data | Wikipedia API + TMDB | Dual-source with fallback |
| Data Validation | Pydantic 2.0+ | Structured script models |
| Retry Logic | tenacity | Exponential backoff for rate limits |
| Config | python-dotenv | Environment variable management |
| Cloud Storage | Google Drive API | Video/log uploads via service account |
| Job Queue | Google Sheets API | Batch processing with status tracking |
| Sheets Client | gspread | Pythonic Google Sheets interface |

Project Structure

StudioZero/
├── src/                      # Main application code
│   ├── app.py                # CLI entry point with argument parsing
│   ├── pipeline.py           # 5-step orchestration engine (generator-based)
│   ├── narrative.py          # Gemini/Groq LLM script generation with Pydantic models
│   ├── moviedbapi.py         # Wikipedia/TMDB client for movie data
│   ├── gemini_tts.py         # Google Gemini TTS voice synthesis
│   ├── stock_media.py        # Pexels API video download with fallback
│   ├── subtitles.py          # ASS subtitle generation (word-by-word captions)
│   ├── renderer.py           # FFmpeg video composition with Ken Burns
│   ├── config.py             # Environment variables & path management
│   ├── config_mappings.py    # Voice/music genre mappings
│   ├── batch_runner.py       # Batch processing from Google Sheets queue
│   ├── cloud_services.py     # Google Drive/Sheets integration
│   └── marketing.py          # Social media caption generation
├── assets/
│   ├── basevideos/           # Fallback stock footage (.mp4 clips)
│   ├── music/                # Background music tracks by genre
│   └── creds/                # Google service account credentials
├── output/
│   ├── temp/                 # Intermediate files (audio, video, metadata)
│   ├── final/                # Final rendered videos
│   └── pipeline_logs/        # Script generation logs (JSON)
├── requirements.txt          # Python dependencies
├── .env.template             # Environment variable template
└── .env                      # API keys and configuration (create from template)

Module Overview

Core Modules

| Module | Responsibility |
| --- | --- |
| `app.py` | CLI interface, argument parsing, logging setup, pipeline orchestration |
| `pipeline.py` | Generator-based 5-step orchestration, caching, parallel scene processing |
| `narrative.py` | Gemini/Groq LLM integration, system prompts, Pydantic models for scripts |
| `moviedbapi.py` | Wikipedia/TMDB API client, plot extraction, poster download |
| `gemini_tts.py` | Gemini TTS API, 30 voices, mood-based style prompts, WAV generation |
| `stock_media.py` | Pexels API, portrait filtering, 3-query fallback, local video fallback |
| `subtitles.py` | ASS subtitle generation, Hormozi-style formatting, word timing |
| `renderer.py` | FFmpeg composition, Ken Burns, audio ducking, subtitle burning |
| `config.py` | Environment loading, path management, API key validation |
| `config_mappings.py` | Voice metadata, music-genre mapping, mood-speed mapping |
| `batch_runner.py` | Batch processing loop, Google Sheet job queue, iCloud export |
| `cloud_services.py` | Google Drive uploads, Google Sheets read/write, service account auth |
| `marketing.py` | LLM-powered social caption generation, genre-based hashtags |

Key Data Models (Pydantic)

VideoScript:
- title: str                 # Movie title
- genre: str                 # Primary genre classification
- overall_mood: str          # TTS voice tone consistency
- selected_voice_id: str     # Chosen voice for narration
- selected_music_file: str   # Background music filename
- scenes: List[Scene]        # 6 scene objects

Scene:
- scene_index: int           # 0-5 index
- narration: str             # 25-40 word conversational text
- visual_queries: List[str]  # 3 search queries (literal, abstract, atmospheric)
- mood: str                  # Scene emotional tone
- tts_speed: float           # 1.0-1.6 speed multiplier

SceneAssets:
- audio_path: Path           # Generated WAV file
- audio_duration: float      # Duration in seconds
- video_path: Path           # Downloaded/fallback video
- word_timestamps: List      # Whisper-extracted timing
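
For illustration, the first two models can be sketched with standard-library dataclasses so the example stays dependency-free. The project itself defines them with Pydantic, and the validation rule shown is an assumption drawn from the field comments rather than the project's actual code.

```python
from dataclasses import dataclass, field


@dataclass
class Scene:
    scene_index: int
    narration: str
    visual_queries: list[str]
    mood: str
    tts_speed: float = 1.0

    def __post_init__(self):
        # Assumed rule: mirror the documented three search queries per scene.
        if len(self.visual_queries) != 3:
            raise ValueError("each scene needs exactly 3 visual queries")


@dataclass
class VideoScript:
    title: str
    genre: str
    overall_mood: str
    selected_voice_id: str
    selected_music_file: str
    scenes: list[Scene] = field(default_factory=list)
```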

Data Flow

┌─────────────────────────────────────────────────────────────────┐
│  Movie Name Input                                               │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│  Step 1: Movie Data Retrieval & Script Generation               │
│  ├─ Wikipedia/TMDB Search                                       │
│  ├─ Extract Plot, Year, Tagline                                 │
│  └─ Gemini LLM → VideoScript (6 scenes, voice, music, moods)    │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│  Step 2: Parallel Asset Generation (ThreadPoolExecutor)         │
│  ├─ Scene 1-5:                                                  │
│  │   ├─ Gemini TTS → Audio (WAV)                                │
│  │   └─ Pexels API → Video (MP4) [or local fallback]            │
│  └─ Scene 6 (Ending):                                           │
│      ├─ Poster Download → Ken Burns Video                       │
│      └─ Closing Narration Audio                                 │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│  Step 3: Whisper Transcription                                  │
│  └─ Word-level timestamps with cumulative offset adjustment     │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│  Step 4: Subtitle Generation                                    │
│  └─ ASS format (Hormozi-style: 80pt Arial Black, word-by-word)  │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│  Step 5: FFmpeg Rendering                                       │
│  ├─ Normalize Videos (1080x1920, 30fps, H.264)                  │
│  ├─ Concatenate Videos + Audio                                  │
│  ├─ Loop Background Music                                       │
│  ├─ Sidechain Compression (audio ducking)                       │
│  ├─ Burn ASS Subtitles                                          │
│  └─ Final H.264 + AAC Encode                                    │
└─────────────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────────────┐
│  Output: output/final/<movie>.mp4                               │
└─────────────────────────────────────────────────────────────────┘

Setup

1. Install Python Dependencies

pip install -r requirements.txt

2. Install FFmpeg

# macOS
brew install ffmpeg

# Ubuntu/Debian
sudo apt install ffmpeg

# Windows (via chocolatey)
choco install ffmpeg

FFmpeg must be compiled with `libx264` and `libass` support.

3. Configure API Keys

Copy the template and fill in your values:

cp .env.template .env

Required API Keys (for video generation):

GEMINI_API_KEY=your_gemini_key    # Primary LLM + TTS
GROQ_API_KEY=your_groq_key        # Fallback LLM + caption generation
PEXELS_API_KEY=your_pexels_key    # Stock video
TMDB_API_KEY=your_tmdb_key        # Optional but recommended for movie data
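
A startup check for these keys might look like the sketch below. The split into required and optional keys follows the comments above; the function name is hypothetical and `config.py` may validate keys differently.

```python
import os

REQUIRED_KEYS = ("GEMINI_API_KEY", "GROQ_API_KEY", "PEXELS_API_KEY")
OPTIONAL_KEYS = ("TMDB_API_KEY",)  # recommended, but the pipeline can run without it


def missing_keys(env=None) -> list[str]:
    """Return the required keys that are unset or empty in the environment."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]
```

Calling `missing_keys()` after loading `.env` and exiting early on a non-empty result surfaces a missing key before any API call is made.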

Get API Keys:

4. Configure Batch Processing (Optional)

For automated batch processing from Google Sheets:

a) Create a Google Service Account:

1. Go to Google Cloud Console

2. Create a new service account

3. Enable Google Drive API and Google Sheets API

4. Download the JSON credentials file

5. Place it in `assets/creds/drive_credentials.json`

b) Add batch configuration to `.env`:

# Path to service account credentials
DRIVE_APPLICATION_CREDENTIALS=assets/creds/drive_credentials.json

# Google Sheet URL with movie queue
BATCH_SHEET_URL=https://docs.google.com/spreadsheets/d/your_sheet_id

# Google Drive folder IDs for uploads
DRIVE_VIDEO_FOLDER_ID=your_video_folder_id
DRIVE_LOGS_FOLDER_ID=your_logs_folder_id

c) Share your Google Sheet and Drive folders with the service account email (found in the JSON file).

5. Add Fallback Footage (Optional)

Place `.mp4` video clips in `assets/basevideos/`; they are used when the Pexels search fails. Clips should be portrait (9:16).

6. Add Background Music (Optional)

Place music tracks in `assets/music/<genre>/` folders. Supported genres: action, comedy, drama, horror, romance, sci-fi, thriller, etc.

Usage

Basic Usage

python -m src.app "Inception"

CLI Options

# Full pipeline with verbose logging
python -m src.app "The Matrix" --verbose

# Generate assets only (skip final render)
python -m src.app "Pulp Fiction" --assets-only

# Use cached data (offline mode)
python -m src.app "Interstellar" --offline

# Custom output path
python -m src.app "Dune" -o custom_output.mp4

Output

  • Final Video: `output/final/<movie_name>.mp4`
  • Intermediate Files: `output/temp/<movie_name>/`
  • Pipeline Logs: `output/pipeline_logs/`
  • Cache: `pipeline_cache.json`

Batch Processing

Process multiple movies automatically from a Google Sheet queue.

Google Sheet Setup

Create a sheet with these columns:

| Column | Description |
| --- | --- |
| `movie_title` | Movie name to process |
| `Status` | Set to `Pending` for jobs to run |
| `start_time` | Auto-populated when processing starts |
| `end_time` | Auto-populated when processing completes |
| `video_link` | Auto-populated with Google Drive link |
| `log_link` | Auto-populated with pipeline JSON log link |
| `icloud_link` | Auto-populated with local iCloud path |
| `caption` | Auto-populated with generated social caption |
| `notes` | Auto-populated with error details (blank on success) |

Running Batch Processing

# Use default sheet from .env
python -m src.batch_runner

# Override sheet URL
python -m src.batch_runner --sheet-url "https://docs.google.com/spreadsheets/d/..."

# Limit number of movies to process
python -m src.batch_runner --limit 5

# Process only one movie
python -m src.batch_runner --limit 1

# Verbose logging
python -m src.batch_runner --verbose

Batch Processing Pipeline

For each pending job, the batch runner:

1. Marks row as `Processing` with start timestamp

2. Runs the full video generation pipeline

3. Generates a viral social media caption (via Groq LLM)

4. Copies video to iCloud (macOS)

5. Uploads video to Google Drive

6. Uploads pipeline log to Google Drive

7. Updates row with `Completed` status and all links

Failed jobs are marked with error details in the `notes` column.
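
The status handling can be sketched against in-memory rows, with each row a dict keyed by the sheet's column names. Here `process` is a hypothetical callable standing in for steps 2 to 6, and the `Failed` status string is an assumption; only `Pending`, `Processing`, and `Completed` are named above.

```python
from datetime import datetime


def run_batch(rows, process, limit=None):
    """Process rows whose Status is 'Pending', updating each row in place.

    rows: list of dicts keyed by the sheet's column names.
    process: hypothetical callable that takes a movie title and returns a
             dict of result columns (e.g. video_link, caption).
    """
    done = 0
    for row in rows:
        if row.get("Status") != "Pending":
            continue
        if limit is not None and done >= limit:
            break
        row["Status"] = "Processing"
        row["start_time"] = datetime.now().isoformat(timespec="seconds")
        try:
            row.update(process(row["movie_title"]))
            row["Status"] = "Completed"
            row["notes"] = ""
        except Exception as exc:  # record the failure and keep going
            row["Status"] = "Failed"  # assumed status name
            row["notes"] = str(exc)
        row["end_time"] = datetime.now().isoformat(timespec="seconds")
        done += 1
    return done
```

Catching the exception per row keeps one bad title from stopping the rest of the queue.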

Shell Script Shortcut (macOS Automation)

For quick video generation via macOS Shortcuts or cron jobs, use this shell script:

#!/bin/bash

PROJECT_DIR="/Users/samirhusain/Personal/code_projects/StudioZero"
PYTHON_BIN="/Users/samirhusain/Personal/code_projects/StudioZero/.venv/bin/python"
LOG_FILE="$PROJECT_DIR/output/shortcut_log.txt"

{
    echo "=== Run started: $(date) ==="
    cd "$PROJECT_DIR" || { echo "ERROR: Could not cd to $PROJECT_DIR"; exit 1; }
    "$PYTHON_BIN" -m src.batch_runner --limit 1
    echo "=== Run finished: $(date) ==="
    echo ""
} >> "$LOG_FILE" 2>&1

Usage:

1. Save as `generate_video.sh` in the project root

2. Make executable: `chmod +x generate_video.sh`

3. Run directly: `./generate_video.sh`

4. Or set up as a macOS Shortcut to trigger video generation with one click

The script:

  • Processes exactly one pending movie from the Google Sheet queue (`--limit 1`)
  • Logs all output to `output/shortcut_log.txt` for debugging
  • Can be triggered by macOS Shortcuts, cron, or other automation tools

Advanced Features

Generator-Based Progress Reporting

The pipeline yields `PipelineStatus` objects for real-time UI feedback, enabling progress bars and status updates.
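
A minimal sketch of the pattern follows. The fields on `PipelineStatus` and the stage labels are assumptions for illustration; the real class in `pipeline.py` may carry different data.

```python
from dataclasses import dataclass
from typing import Iterator


@dataclass
class PipelineStatus:
    step: int          # 1-based step number
    total_steps: int
    message: str


def run_pipeline(movie: str) -> Iterator[PipelineStatus]:
    """Yield a status before each stage; real work would go between yields."""
    stages = [
        "Fetching movie data and generating script",
        "Generating audio and video assets",
        "Transcribing narration",
        "Building subtitles",
        "Rendering final video",
    ]
    for i, label in enumerate(stages, start=1):
        yield PipelineStatus(i, len(stages), f"{label} for {movie!r}")
        # ... perform the stage here ...
```

A caller drives it with a plain loop, for example `for status in run_pipeline("Inception"): print(status.step, status.message)`, which is what makes progress bars straightforward to attach.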

Caching System

`pipeline_cache.json` stores generated scripts for offline processing and faster re-runs.
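
One way such a cache could work is sketched below. The JSON layout, the lowercase key normalization, and the function names are assumptions, not the project's actual cache format.

```python
import json
from pathlib import Path


def load_cache(path: Path) -> dict:
    """Return the cache contents, or an empty dict if the file is missing."""
    return json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}


def get_or_create_script(title: str, path: Path, generate) -> dict:
    """Reuse a cached script for `title`, otherwise generate and persist it."""
    cache = load_cache(path)
    key = title.strip().lower()  # assumed normalization of the cache key
    if key not in cache:
        cache[key] = generate(title)
        path.write_text(json.dumps(cache, indent=2), encoding="utf-8")
    return cache[key]
```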

Multi-Level Fallback

Pexels Query 1 → Pexels Query 2 → Pexels Query 3 → Local Fallback Video
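
The chain above amounts to trying each query in order and falling back to a local clip. In this sketch `search` is a hypothetical stand-in for the Pexels lookup, returning a file path or `None`.

```python
from typing import Callable, Optional


def pick_video(
    queries: list[str],
    search: Callable[[str], Optional[str]],  # hypothetical Pexels lookup
    local_fallback: str,
) -> str:
    """Try each query in order; fall back to a local clip if all fail."""
    for query in queries:
        try:
            result = search(query)
        except Exception:
            result = None  # treat API errors like an empty result
        if result:
            return result
    return local_fallback
```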

Audio Ducking

Sidechain compression automatically lowers the music volume whenever the voiceover is present:

  • Threshold: 0.1
  • Ratio: 10:1
  • Attack: 50ms
  • Release: 200ms
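
These settings map onto FFmpeg's `sidechaincompress` filter roughly as sketched below. The input order (voice as input 0, music as input 1) and the final `amix` step are assumptions; the project's actual filter graph may be wired differently.

```python
def ducking_filter(threshold=0.1, ratio=10, attack_ms=50, release_ms=200) -> str:
    """Build a filter_complex string that ducks music (input 1) under voice (input 0)."""
    return (
        # The voice feeds both the final mix and the compressor's sidechain.
        "[0:a]asplit=2[voice][sc];"
        f"[1:a][sc]sidechaincompress=threshold={threshold}:ratio={ratio}"
        f":attack={attack_ms}:release={release_ms}[ducked];"
        "[voice][ducked]amix=inputs=2:duration=first[aout]"
    )
```

The string would be passed as `-filter_complex`, with `-map "[aout]"` selecting the mixed track.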

Ken Burns Effect

Subtle zoom applied to static poster images for dynamic ending sequences.
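
A slow centred zoom of this kind can be built with FFmpeg's `zoompan` filter. The sketch below is illustrative only: the zoom step, the 1.2x cap, and the five-second default are assumed values, not the project's settings.

```python
def ken_burns_cmd(poster: str, dst: str, seconds: float = 5.0, fps: int = 30) -> list[str]:
    """Build an FFmpeg command that slowly zooms into a still image."""
    frames = int(seconds * fps)
    vf = (
        "scale=1080:1920:force_original_aspect_ratio=increase,crop=1080:1920,"
        # Zoom in a little each frame, capped at 1.2x, keeping the centre fixed.
        f"zoompan=z='min(zoom+0.0010,1.2)':d={frames}"
        ":x='iw/2-(iw/zoom/2)':y='ih/2-(ih/zoom/2)'"
        f":s=1080x1920:fps={fps}"
    )
    return [
        "ffmpeg", "-y", "-i", poster,
        "-vf", vf,
        "-c:v", "libx264", "-pix_fmt", "yuv420p",
        dst,
    ]
```

With a single still image as input, `d` sets how many output frames the filter generates, so the clip length is `frames / fps` seconds.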

Voice Selection

30 available voices (14 female, 16 male) with genre-optimized recommendations:

  • Dramatic: Orus, Fenrir
  • Conversational: Kore, Puck
  • Warm: Aoede, Leda
  • Energetic: Zephyr, Charon

Mood-Based TTS Pacing

14 emotional contexts with optimized speech speeds (1.0-1.6x):

  • Tense/Suspenseful: 0.95x
  • Exciting/Action: 1.15x
  • Calm/Reflective: 0.9x
  • Dramatic: 1.0x

Social Media Caption Generation

Automated viral caption generation for TikTok/Instagram Reels:

  • Hook-first format optimized for engagement
  • Genre-specific hashtag selection (15 genres supported)
  • Conversational tone via Groq LLM
  • Includes soft CTA for follower growth

Requirements

  • Python 3.10+
  • FFmpeg 4.0+ (with libx264, libass)
  • FFprobe (included with FFmpeg)
  • ~2GB RAM for Whisper transcription
  • Internet connection for API calls

License

MIT License