Tip: Cloud Whisper uses the Groq API, which offers a generous free tier. A 10-minute video typically costs well under a cent.
Uses the Groq Cloud Whisper API for fast transcription. Requires a Groq API key in .env.
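As a quick pre-flight check, the sketch below looks up an API key in the process environment first and then in a simple KEY=VALUE .env file. The helper name `read_env_value` and the variable name `GROQ_API_KEY` are assumptions made for illustration; check the project's own .env example for the exact name it expects.

```python
import os
from pathlib import Path
from typing import Optional


def read_env_value(name: str, env_path: str = ".env") -> Optional[str]:
    """Return `name` from the environment, else from a simple .env file."""
    # A variable already set in the process environment takes priority.
    value = os.environ.get(name)
    if value:
        return value

    path = Path(env_path)
    if not path.is_file():
        return None

    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blank lines, comments, and anything that is not KEY=VALUE.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == name:
            # Drop surrounding whitespace and optional quotes.
            return val.strip().strip("'\"") or None
    return None


if __name__ == "__main__":
    # GROQ_API_KEY is an assumed variable name, not confirmed by this README.
    if read_env_value("GROQ_API_KEY") is None:
        print("No Groq API key found; Cloud Whisper will not be able to authenticate.")
```

This only covers plain KEY=VALUE lines; a full .env parser handles more syntax, such as `export` prefixes and multi-line values.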
Cloud Whisper is the default in both CLI and Docker.
Runs Whisper on your own machine instead of calling Groq Cloud Whisper. This removes the need for a Groq API key, but CPU-only runs are much slower.
If you only need CPU transcription with Whisper, pip install -e .[whisper] is enough. GPU detection is automatic when PyTorch can see a CUDA device.
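To see which device automatic detection is likely to choose on your machine, a minimal sketch along these lines can help. The function name `pick_device` is hypothetical, and this is not necessarily how the project performs its own check; it relies only on PyTorch's `torch.cuda.is_available()`.

```python
def pick_device() -> str:
    """Return "cuda" if PyTorch can see a CUDA device, otherwise "cpu"."""
    try:
        import torch  # Imported lazily so the check also works without PyTorch.
    except ImportError:
        return "cpu"
    return "cuda" if torch.cuda.is_available() else "cpu"


if __name__ == "__main__":
    print(f"Whisper would run on: {pick_device()}")
```

If this prints `cpu` on a machine with an NVIDIA GPU, the installed PyTorch build is probably CPU-only or the driver is not visible to it.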
Speed up the audio before it is sent to Whisper to reduce transcription time and cost. Groq Whisper is priced by audio duration, so a 2× speed-up roughly halves the API cost. Higher speeds may reduce transcription accuracy.
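The cost arithmetic can be sketched as follows. The function names are illustrative, and the price per audio hour is a placeholder parameter, not Groq's actual rate; substitute the current figure from Groq's pricing page. Any minimum billing increment the provider applies is ignored here.

```python
def billed_seconds(duration_s: float, speed: float) -> float:
    """Audio duration after speed-up, which is what duration-based pricing sees."""
    if speed <= 0:
        raise ValueError("speed must be positive")
    return duration_s / speed


def estimated_cost(duration_s: float, speed: float, price_per_hour: float) -> float:
    """Estimated API cost for one file, given a price per hour of audio."""
    return billed_seconds(duration_s, speed) / 3600 * price_per_hour


if __name__ == "__main__":
    PLACEHOLDER_PRICE = 0.04  # Hypothetical USD per audio hour, for illustration only.
    ten_minutes = 600.0
    for speed in (1.0, 1.5, 2.0):
        cost = estimated_cost(ten_minutes, speed, PLACEHOLDER_PRICE)
        print(f"{speed}x: {billed_seconds(ten_minutes, speed):.0f} s billed, ~${cost:.4f}")
```

Because the cost scales with 1/speed, the savings flatten out quickly: going from 1× to 2× halves the cost, while going from 2× to 3× saves only a further sixth of the original price.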
Set the default in summarizer.yaml:
The Docker image does not include Local Whisper or GPU-oriented PyTorch. It targets lightweight VPS deployments where GPUs are usually unavailable. In Docker, Cloud Whisper is the practical default. Use Local Whisper on the host machine if you have the hardware.