Caption your podcast in 5 minutes (Whisper + auto vertical export)

Turn long-form podcasts into vertical captioned shorts for TikTok, Reels, and Shorts. Whisper transcription, in/out points, 9:16 export — no editor needed.

May 8, 2026 | 5 min read | By Vivix Team

A 60-minute podcast usually has 5-10 moments worth clipping for social. The problem: finding them, transcribing them, captioning them, and exporting at 9:16 used to take an hour per clip in Premiere or DaVinci.

Vivix Caption Studio does the whole pipeline in ~5 minutes per clip. Here's the workflow.

What Caption Studio does

  • Transcribes the entire file with Whisper — 95-98% accurate, every word timestamped to the millisecond
  • Lets you scrub the timeline — click any line in the transcript to jump to that timestamp
  • Drag in/out handles to set clip boundaries
  • Burn captions into the video with viral-tested templates (Bold TikTok, Karaoke, Subtitle)
  • Auto-crop to 9:16 with subject detection so the speaker stays centered
  • Export ready-to-upload MP4 — drop into TikTok, Reels, Shorts

The 5-minute workflow

Step 1: Upload (30 seconds)

Drop an MP3, MP4, M4A, or WAV. Long-form is fine — Caption Studio transcribes the whole thing once, even a 2-hour episode.

Step 2: Whisper transcribes (1-2 minutes)

Whisper runs server-side. For a 60-minute file, it finishes in ~90 seconds. Every word gets a timestamp, so you can click any line in the transcript to scrub the playhead there.
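Click-to-seek only needs each transcript line's start time, and the reverse lookup (which line is being spoken at the playhead) is a binary search over those start times. A minimal sketch in Python, assuming a hypothetical `Line` record; Caption Studio's real data model is not documented in this post:

```python
from bisect import bisect_right
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Line:
    """One transcript line (hypothetical shape, for illustration only)."""
    start: float  # seconds from the start of the file
    end: float
    text: str


def seek_time(lines: list[Line], index: int) -> float:
    """Playhead position to jump to when the user clicks line `index`."""
    return lines[index].start


def active_line(lines: list[Line], playhead: float) -> Optional[int]:
    """Index of the line spoken at `playhead`, or None during a gap."""
    starts = [line.start for line in lines]
    i = bisect_right(starts, playhead) - 1  # last line starting at or before playhead
    if i >= 0 and playhead < lines[i].end:
        return i
    return None


lines = [
    Line(0.0, 2.4, "Welcome back to the show."),
    Line(2.9, 6.1, "Today we are talking about clipping."),
    Line(6.5, 9.0, "Let's get into it."),
]
print(seek_time(lines, 1))      # start of the second line
print(active_line(lines, 3.0))  # index of the line playing at 3.0 s
```

The lookup assumes lines are sorted by start time and do not overlap, which is how a single-speaker transcript normally comes out.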

Step 3: Pick the moment (1 minute)

Skim the transcript. When you find a clip-worthy moment (a strong quote, a controversial take, a laugh), click the line to jump there. Drag the in/out handles to set 30-90 second boundaries — that's the Goldilocks length for TikTok and Reels.
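The 30-90 second guideline is easy to check once the in/out handles are expressed in seconds. A small sketch, with hypothetical function names and the bounds taken from the paragraph above:

```python
MIN_CLIP_S = 30.0  # lower bound suggested in the text
MAX_CLIP_S = 90.0  # upper bound suggested in the text


def clip_duration(in_s: float, out_s: float) -> float:
    """Length of the clip in seconds; the out-point must follow the in-point."""
    if out_s <= in_s:
        raise ValueError("out-point must come after in-point")
    return out_s - in_s


def in_recommended_range(in_s: float, out_s: float) -> bool:
    """True when the clip falls inside the 30-90 second window."""
    return MIN_CLIP_S <= clip_duration(in_s, out_s) <= MAX_CLIP_S


# A quote that starts at 12:34 and ends at 13:19.5 in the episode
print(clip_duration(754.0, 799.5))
print(in_recommended_range(754.0, 799.5))
```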

Step 4: Caption + crop (30 seconds)

Pick a caption template:

  • Bold TikTok — large white text, black outline, bottom-third placement. Best for podcasts.
  • Karaoke — word-by-word highlight on the active word. Works for fast-paced clips.
  • Subtitle — small clean text at the bottom. Best for talking-head clips.
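Whatever the styling, a caption template has to turn word timestamps into timed cues. The sketch below groups words into short cues and renders them in the standard SRT format. The `Word` record and the `words_per_cue` parameter are assumptions for illustration, not Caption Studio's internals; a one-word cue is only a rough stand-in for a karaoke-style highlight.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One transcribed word with its timing (hypothetical shape)."""
    text: str
    start: float  # seconds
    end: float


def srt_timestamp(seconds: float) -> str:
    """Format seconds as the HH:MM:SS,mmm timestamp used by SRT."""
    total_ms = round(seconds * 1000)
    hours, rest = divmod(total_ms, 3_600_000)
    minutes, rest = divmod(rest, 60_000)
    secs, ms = divmod(rest, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def words_to_srt(words: list[Word], words_per_cue: int = 3) -> str:
    """Group consecutive words into cues and render an SRT document."""
    blocks = []
    for n, i in enumerate(range(0, len(words), words_per_cue), start=1):
        chunk = words[i : i + words_per_cue]
        text = " ".join(w.text for w in chunk)
        # A cue spans from its first word's start to its last word's end.
        timing = f"{srt_timestamp(chunk[0].start)} --> {srt_timestamp(chunk[-1].end)}"
        blocks.append(f"{n}\n{timing}\n{text}\n")
    return "\n".join(blocks)


words = [
    Word("Clips", 0.0, 0.4),
    Word("beat", 0.4, 0.7),
    Word("episodes", 0.7, 1.3),
    Word("every", 1.5, 1.8),
    Word("time", 1.8, 2.25),
]
print(words_to_srt(words))
```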

Caption Studio auto-crops to 9:16 vertical with subject detection. If your podcast has video of two speakers, the crop follows whichever one is talking.
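The crop itself is simple geometry once a detector has reported where the speaker is. A sketch of that arithmetic, assuming a hypothetical `subject_x` (the horizontal centre of the detected speaker, in pixels) supplied by some detector; how Caption Studio actually detects and tracks subjects is not described here:

```python
def vertical_crop(frame_w: int, frame_h: int, subject_x: float) -> tuple[int, int, int, int]:
    """Return (width, height, x, y) of a full-height 9:16 window centred on subject_x."""
    crop_h = frame_h
    crop_w = frame_h * 9 // 16
    crop_w -= crop_w % 2  # keep the width even, which most video encoders expect
    if crop_w > frame_w:
        raise ValueError("frame is narrower than a full-height 9:16 window")
    x = round(subject_x - crop_w / 2)
    x = max(0, min(x, frame_w - crop_w))  # clamp so the window stays inside the frame
    return crop_w, crop_h, x, 0


# 1920x1080 source, speaker in the middle, at the far left, and at the far right
print(vertical_crop(1920, 1080, 960))
print(vertical_crop(1920, 1080, 100))
print(vertical_crop(1920, 1080, 1900))
```

Following the active speaker then amounts to recomputing `x` whenever the detected position changes, ideally with some smoothing so the window does not jitter.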

Step 5: Export (30 seconds)

Click export. Caption Studio renders the captioned 9:16 MP4 with ffmpeg server-side and gives you a download link. Costs 1 clip credit per export.
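The post says the render uses ffmpeg but does not show the command, so the following is only a plausible sketch of a comparable render, not Vivix's actual pipeline. It builds an argument list using ffmpeg's `crop`, `scale`, `setsar`, and `subtitles` filters. The file names, the crop values, and the choice of libx264/AAC are assumptions, and the `subtitles` filter requires an ffmpeg build with libass.

```python
def ffmpeg_args(src: str, dst: str, in_s: float, out_s: float,
                crop: tuple[int, int, int, int], srt_path: str) -> list[str]:
    """Build an ffmpeg command that cuts, crops to vertical, and burns in captions."""
    w, h, x, y = crop
    # Crop to the vertical window, scale to 1080x1920, then draw the captions.
    # srt_path is assumed to be a simple path that needs no filter escaping.
    filters = f"crop={w}:{h}:{x}:{y},scale=1080:1920,setsar=1,subtitles={srt_path}"
    return [
        "ffmpeg",
        "-ss", f"{in_s:.3f}",         # input seek: jump to the in-point
        "-t", f"{out_s - in_s:.3f}",  # read only the clip's duration
        "-i", src,
        "-vf", filters,
        "-c:v", "libx264",
        "-c:a", "aac",
        dst,
    ]


args = ffmpeg_args("episode.mp4", "clip.mp4", 754.0, 799.5, (606, 1080, 657, 0), "clip.srt")
print(" ".join(args))
```

Because `-ss` is placed before `-i`, output timestamps restart at zero, so the SRT file in this sketch must be timed relative to the in-point rather than to the full episode. Passing the list to `subprocess.run(args, check=True)` would execute it on a machine with ffmpeg installed.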

Pricing

  • Free signup: 1 clip credit (1 captioned export)
  • Standard ($10/mo): 80 clip credits/mo
  • Pro ($25/mo): 120 clip credits/mo
  • Ultimate ($70/mo): 400 clip credits/mo

Clip credits are separate from the regular generation credits. They renew monthly, and unused credits roll over.
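Since one export costs one clip credit, the effective price per captioned export follows directly from the list above. A quick calculation using only the prices and credit counts quoted in this post (rollover is ignored):

```python
# (monthly price in USD, clip credits per month), as listed above
PLANS = {
    "Standard": (10, 80),
    "Pro": (25, 120),
    "Ultimate": (70, 400),
}


def cost_per_export(price_usd: float, credits: int) -> float:
    """Dollars per export if every credit in the month is used."""
    return price_usd / credits


for name, (price, credits) in PLANS.items():
    print(f"{name}: ${cost_per_export(price, credits):.3f} per export")
```

On these numbers Standard works out cheapest per export and Pro the most expensive, so the higher tiers are mainly about volume rather than unit price.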

Real numbers from a real podcast

We tested with a 65-minute interview podcast. Workflow:

  • Upload: 45 seconds (180 MB file)
  • Whisper transcription: 2 minutes 10 seconds
  • Picking 5 moments: 4 minutes (skimming the transcript)
  • Captioning + exporting all 5 clips: 12 minutes
  • Total: ~19 minutes for 5 publishable shorts

Same job in Premiere: ~5 hours.
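The totals above can be checked with a few lines of arithmetic. The step timings and the 5-hour Premiere estimate are the ones reported in this post:

```python
# Step timings from the test run, in seconds
STEPS_S = {
    "upload": 45,
    "transcription": 2 * 60 + 10,
    "picking moments": 4 * 60,
    "captioning and exporting": 12 * 60,
}
CLIPS = 5
PREMIERE_S = 5 * 3600  # the ~5 hour estimate for the same job

total_s = sum(STEPS_S.values())
minutes, seconds = divmod(total_s, 60)
per_clip_s = total_s / CLIPS
speedup = PREMIERE_S / total_s

print(f"total: {minutes} min {seconds} s")
print(f"per clip: {per_clip_s:.0f} s")
print(f"speed-up vs. Premiere estimate: {speedup:.1f}x")
```

The sum comes to 18 minutes 55 seconds, which rounds to the ~19 minutes quoted, or a little under 4 minutes per clip once the one-time upload and transcription are spread across all five.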

Tips that earned us views

1. Pull-quotes beat play-by-play

The clips that go viral are strong claims, not slow build-ups. Look for moments where the host or guest says something opinionated — that's the hook.

2. Add a 1-second pause at the end

Drag the out-point 1 second past the end of the line. The pause lets the viewer process before scrolling. Watch-time goes up.
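Expressed as arithmetic, the tip is a one-second pad on the out-point that never runs past the end of the file. A tiny sketch with a hypothetical helper name:

```python
def pad_out_point(out_s: float, file_duration_s: float, pad_s: float = 1.0) -> float:
    """Extend the out-point by pad_s seconds without exceeding the file length."""
    return min(out_s + pad_s, file_duration_s)


print(pad_out_point(799.5, 3900.0))   # mid-episode quote in a 65-minute file
print(pad_out_point(3899.6, 3900.0))  # quote that ends right at the end of the file
```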

3. Edit the captions for clarity

Whisper gets 95-98% right but it sometimes misses uncommon names or jargon. Click any caption to fix the text — the timing stays locked to the audio.
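One way to picture "the timing stays locked" is a cue whose text can be replaced while its start and end times are carried over untouched. A sketch with a hypothetical `Cue` record, not Caption Studio's actual storage format:

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Cue:
    """A caption cue: timing from the transcription, text editable by the user."""
    start: float  # seconds
    end: float
    text: str


def fix_text(cue: Cue, new_text: str) -> Cue:
    """Return a corrected copy of the cue; start and end are left unchanged."""
    return replace(cue, text=new_text.strip())


cue = Cue(12.4, 14.1, "we spoke with doctor nguyen")
fixed = fix_text(cue, "We spoke with Dr. Nguyen ")
print(fixed)
```

Making the record immutable means an edit can never shift the timing by accident: the only way to change a cue is to build a new one from the old timestamps.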

4. Use the same template across all clips from one episode

Visual consistency builds brand recognition. Pick Bold TikTok once and use it for every clip from that episode.

Multi-language

Whisper supports 90+ languages. The transcription works in whatever language your podcast is in — Arabic, Mandarin, Japanese, Spanish, all native quality. Captions burn in with the right script and direction (RTL for Arabic, etc.).
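Choosing the caption direction can be done from the text itself. The sketch below applies a common first-strong-character heuristic using Python's standard `unicodedata` module; it illustrates the idea and is not necessarily how Caption Studio decides direction.

```python
import unicodedata


def is_rtl(text: str) -> bool:
    """True if the first strongly directional character is right-to-left."""
    for ch in text:
        direction = unicodedata.bidirectional(ch)
        if direction in ("R", "AL"):  # Hebrew-style and Arabic-style letters
            return True
        if direction == "L":          # Latin, CJK, and other left-to-right letters
            return False
    return False  # no strong characters (digits, punctuation, empty string)


for sample in ["Hello", "مرحبا بكم", "你好", "123 مرحبا"]:
    print(sample, is_rtl(sample))
```

Digits and spaces are directionally weak or neutral, so a caption that opens with a number is still classified by its first actual letter.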

FAQ

Can I edit the in/out points after seeing the export?

Yes — your transcript stays in your account. Open the same project, drag new boundaries, export again (costs another clip credit).

Can I do voice-over for the clip?

Yes. Generate a voice with Vivix TTS (ElevenLabs Flash, MiniMax Speech, xAI TTS) and layer it in Caption Studio.

What about B-roll behind the audio?

Generate B-roll with any of the video models in Vivix and use it as the visual track with the podcast audio overlaid.

How accurate is Whisper on heavy accents?

~92% on heavy accents (vs. 95-98% on neutral). Errors are easy to fix — click, retype, save.

Sign up free — 1 clip credit on signup, no card, your first captioned short is free.
