Overview
Whisper is an automatic speech recognition (ASR) system developed by OpenAI. OpenAI reports that it approaches human-level robustness and accuracy on English speech, and it supports 99 languages.
Key Features
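- High Accuracy: Approaches human-level accuracy on English speech, according to OpenAI
- Multilingual: Supports 99 languages
- Multiple Models: Five sizes, from tiny to large, trading speed for accuracy
- Tasks: Transcription, translation into English, and spoken-language identification
- Open Source: Code and model weights released under the MIT License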
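The three tasks listed above can be sketched with the package's Python API. This is a minimal example, not an official recipe. It assumes `openai-whisper` is installed, ffmpeg is on PATH, and a local file named `audio.mp3` exists. `top_languages` and `main` are hypothetical helper names, not part of Whisper.

```python
# Sketch of Whisper's three tasks via its Python API.
# Assumes: `pip install openai-whisper`, ffmpeg on PATH, and a local
# file "audio.mp3" (hypothetical). Helper names here are illustrative.


def top_languages(probs: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k most likely (language code, probability) pairs."""
    return sorted(probs.items(), key=lambda item: item[1], reverse=True)[:k]


def main(path: str = "audio.mp3") -> None:
    # Imported here so top_languages() can be used without Whisper installed.
    import whisper

    model = whisper.load_model("base")

    # 1. Transcription in the spoken language (auto-detected when not given).
    result = model.transcribe(path)
    print(result["language"], result["text"])

    # 2. Translation: speech in a supported language -> English text.
    translated = model.transcribe(path, task="translate")
    print(translated["text"])

    # 3. Language identification on the first 30 seconds of audio.
    audio = whisper.pad_or_trim(whisper.load_audio(path))
    mel = whisper.log_mel_spectrogram(audio, n_mels=model.dims.n_mels).to(model.device)
    _, probs = model.detect_language(mel)
    print(top_languages(probs))
```

Call `main("audio.mp3")` to run all three steps. `pad_or_trim` fixes the input at 30 seconds because language detection looks at a single 30-second window.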
Installation
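Whisper also needs the ffmpeg command-line tool on your PATH. Install the package, then transcribe a file with the medium model:
```shell
pip install openai-whisper
whisper audio.mp3 --model medium
```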
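The command above generalizes through the CLI flags `--model`, `--task`, `--language`, and `--output_format`. A small Python wrapper can assemble those invocations. This is a sketch: `whisper_command` and `run_whisper` are hypothetical helpers, and the code assumes the `whisper` executable installed by the package is on PATH.

```python
import subprocess

# Sizes from the Model Sizes table. The CLI also accepts other names
# (e.g. English-only "tiny.en"); this sketch deliberately keeps to the table.
MODELS = ("tiny", "base", "small", "medium", "large")
TASKS = ("transcribe", "translate")


def whisper_command(audio, model="medium", task="transcribe", language=None, output_format="txt"):
    """Build an argument list for the `whisper` CLI without running it."""
    if model not in MODELS:
        raise ValueError(f"unknown model: {model!r}")
    if task not in TASKS:
        raise ValueError(f"unknown task: {task!r}")
    cmd = ["whisper", audio, "--model", model, "--task", task, "--output_format", output_format]
    if language is not None:
        # Fixing the language skips automatic language detection.
        cmd += ["--language", language]
    return cmd


def run_whisper(audio, **options):
    """Run the CLI; raises CalledProcessError if whisper exits with an error."""
    subprocess.run(whisper_command(audio, **options), check=True)
```

For example, `run_whisper("audio.mp3", task="translate")` runs `whisper audio.mp3 --model medium --task translate --output_format txt`. Passing an argument list rather than a shell string avoids quoting problems with file names that contain spaces.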
Model Sizes
| Model | Parameters | Required VRAM (approx.) | Relative speed |
|---|---|---|---|
| tiny | 39M | ~1GB | Fastest |
| base | 74M | ~1GB | Fast |
| small | 244M | ~2GB | Medium |
| medium | 769M | ~5GB | Slow |
| large | 1550M | ~10GB | Slowest |
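One simple way to choose a size from this table is to take the largest model whose approximate VRAM figure fits the available GPU memory. This is a heuristic sketch built on the table's rough numbers. `largest_model_for` is a hypothetical helper, and real memory use depends on hardware and settings.

```python
# Approximate VRAM requirements (GB) from the Model Sizes table, smallest first.
VRAM_GB = [("tiny", 1), ("base", 1), ("small", 2), ("medium", 5), ("large", 10)]


def largest_model_for(vram_gb: float) -> str:
    """Return the largest model whose approximate VRAM requirement fits in vram_gb."""
    fitting = [name for name, need in VRAM_GB if need <= vram_gb]
    if not fitting:
        raise ValueError(f"no model fits in {vram_gb} GB (tiny needs ~1 GB)")
    # The list is ordered by size, so the last fitting entry is the largest.
    return fitting[-1]
```

On a CUDA machine, you could get the total memory of the first GPU in GiB with `torch.cuda.get_device_properties(0).total_memory / 1024**3` and pass it to `largest_model_for`. Leaving some headroom below that figure is prudent.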
Resources
Source: the openai/whisper repository on GitHub