Transcription with Whisper

Cloud Whisper (Default)

Tip: Cloud Whisper uses the Groq API, which offers a generous free tier. A 10-minute video typically costs well under a cent.

Cloud Whisper sends audio to the Groq Cloud Whisper API for fast transcription. It requires a Groq API key in .env.

$ python -m summarizer --source "URL"

Cloud Whisper is the default in both CLI and Docker.
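
Because Cloud Whisper depends on a key in .env, a quick pre-flight check can catch a missing key before a run starts. The sketch below is a standalone helper and not part of the summarizer. The variable name GROQ_API_KEY and both function names are assumptions made for illustration, so check the Configuration page for the name the project actually reads.

```python
from pathlib import Path


def read_env_keys(env_path=".env"):
    """Return KEY=value pairs found in a dotenv-style file as a dict."""
    values = {}
    path = Path(env_path)
    if not path.is_file():
        return values
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def has_api_key(env_path=".env", name="GROQ_API_KEY"):
    """True if `name` is present and non-empty in the file.

    GROQ_API_KEY is an assumed variable name, not confirmed by this page.
    """
    return bool(read_env_keys(env_path).get(name))


if __name__ == "__main__":
    print("Groq key found" if has_api_key() else "Groq key missing from .env")
```

The parser only handles plain KEY=value lines. It is enough for a sanity check, but it does not replace a full dotenv loader.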

Local Whisper

Local Whisper runs transcription on your machine instead of calling Groq Cloud Whisper. This removes the Groq API requirement, but CPU-only runs are much slower.

# Add Local Whisper support (quotes keep shells such as zsh from expanding the brackets)
$ pip install -e ".[whisper]"

# Optional: install CUDA-enabled PyTorch for GPU acceleration
$ pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu124

# Use it
$ python -m summarizer --source "URL" --force-download --transcription "Local Whisper" --whisper-model "small"

If you only need CPU transcription, pip install -e ".[whisper]" is enough. GPU detection is automatic when PyTorch can see a CUDA device.
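
To check in advance whether PyTorch can see a CUDA device on your machine, a short script like the one below can help. It is a minimal sketch that uses PyTorch's public torch.cuda.is_available() call. The function name pick_device is hypothetical, and this is not necessarily how the summarizer performs its own detection.

```python
def pick_device():
    """Return "cuda" if PyTorch is installed and sees a CUDA device, else "cpu"."""
    try:
        import torch
    except ImportError:
        # PyTorch is not installed, so only CPU is available.
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Local Whisper would likely run on: {pick_device()}")
```

If this prints cpu after installing the CUDA wheel, the usual suspects are a driver mismatch or a CPU-only PyTorch build still present in the environment.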

Model Sizes

| Model  | Speed    | Accuracy |
|--------|----------|----------|
| tiny   | Fastest  | Lowest   |
| base   | Fast     | Low      |
| small  | Moderate | Moderate |
| medium | Slow     | High     |
| large  | Slowest  | Highest  |

Audio Speed

Speed up the audio before it reaches Whisper to reduce transcription time and cost. Groq Whisper is priced by audio duration, so a 2× speed-up roughly halves the API cost. Higher speeds may reduce accuracy.

# Moderate speed-up
$ python -m summarizer --source "URL" --force-download --audio-speed 2.0

# Aggressive speed-up
$ python -m summarizer --source "URL" --force-download --audio-speed 5.0
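
The cost reasoning is simple arithmetic: billing follows audio duration, and the duration after speed-up is the original length divided by the speed factor. The sketch below works through that. The function names are hypothetical, and price_per_hour is a placeholder you supply from the provider's current pricing page. The sketch ignores any minimum charge or rounding the provider may apply.

```python
def billed_seconds(duration_s, audio_speed=1.0):
    """Length of the audio actually sent for transcription, in seconds."""
    if audio_speed <= 0:
        raise ValueError("audio_speed must be positive")
    return duration_s / audio_speed


def estimated_cost(duration_s, audio_speed, price_per_hour):
    """Rough cost estimate; price_per_hour is a caller-supplied placeholder."""
    return billed_seconds(duration_s, audio_speed) / 3600 * price_per_hour


if __name__ == "__main__":
    ten_minutes = 600
    for speed in (1.0, 2.0, 5.0):
        print(f"speed {speed}: {billed_seconds(ten_minutes, speed):.0f} s of audio sent")
```

For a 10-minute video, speed 2.0 sends 300 seconds of audio and speed 5.0 sends 120 seconds, which is where the "roughly halves" figure for a 2× speed-up comes from.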

Set the default in summarizer.yaml:

defaults:
  audio-speed: 2.0

Docker Note

The Docker image does not include Local Whisper or GPU-oriented PyTorch. It targets lightweight VPS deployments where GPUs are usually unavailable. In Docker, Cloud Whisper is the practical default. Use Local Whisper on the host machine if you have the hardware.