Product Packaging And Access Modes (Open-Weights Vs Hosted)
Key takeaways
- Mistral released Voxtral Transcribe 2, a family of two new audio-to-text transcription models, as a sequel to the original Voxtral from July 2025.
- The Mistral transcription API supports diarization, context biasing, and segment-level timestamp granularities via request parameters.
- Voxtral transcription via the Mistral API is priced at $0.003 per minute ($0.18 per hour).
- In a live demo, Voxtral Realtime transcribed fast speech containing technical jargon (e.g., Django and WebAssembly) within moments of each utterance.
- The open-weights model in the release is Voxtral Realtime (Voxtral-Mini-4B-Realtime-2602) and it is available under an Apache-2.0 license.
Sections
Product Packaging And Access Modes (Open-Weights Vs Hosted)
The release is a two-model family split between an open-weights realtime model under a permissive license and a closed-weights model served through a hosted API. The practical consequence is that a developer can choose between self-deployable weights and a managed endpoint within the same product line.
- Mistral released Voxtral Transcribe 2, a family of two new audio-to-text transcription models, as a sequel to the original Voxtral from July 2025.
- The open-weights model in the release is Voxtral Realtime (Voxtral-Mini-4B-Realtime-2602) and it is available under an Apache-2.0 license.
- The closed-weights model is called voxtral-mini-latest and it is accessed via the Mistral API audio transcription endpoint.
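As a rough illustration of the hosted access path, the sketch below assembles the form fields for one transcription request. Only the model name comes from the release. The endpoint URL, the field names (`diarize`, `context_bias`, `timestamp_granularities`), and the comma-joined encoding of bias terms are assumptions to check against the official API reference. The code builds the fields and does not send anything.

```python
from dataclasses import dataclass, field
from typing import Optional

# Assumed endpoint path; confirm against the official API reference before use.
ASSUMED_ENDPOINT = "https://api.mistral.ai/v1/audio/transcriptions"


@dataclass
class TranscriptionRequest:
    """Form fields for one hosted transcription call (field names are assumptions)."""

    model: str = "voxtral-mini-latest"  # hosted model name given in the release
    diarize: bool = False
    context_bias: list[str] = field(default_factory=list)
    timestamp_granularity: Optional[str] = None

    def to_form_fields(self) -> dict[str, str]:
        """Return only the fields that were explicitly enabled."""
        fields = {"model": self.model}
        if self.diarize:
            fields["diarize"] = "true"
        if self.context_bias:
            # Comma-joined encoding is an assumption, not a documented format.
            fields["context_bias"] = ",".join(self.context_bias)
        if self.timestamp_granularity is not None:
            fields["timestamp_granularities"] = self.timestamp_granularity
        return fields


request = TranscriptionRequest(
    diarize=True,
    context_bias=["Django", "WebAssembly"],
    timestamp_granularity="segment",
)
form_fields = request.to_form_fields()
```

The resulting dictionary would be sent as multipart form data together with the audio file and an API key. Those steps are left out here because they depend on the HTTP client in use.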
API-Level Controllability For Transcript Structure And Accuracy Shaping
The API exposes request-parameter controls for diarization, context biasing, and timestamp granularity, and the console playground returns multiple export formats (text, SRT, JSON). This indicates a focus on integration workflows where transcript structure and downstream usability matter, not just raw text output.
- The Mistral transcription API supports diarization, context biasing, and segment-level timestamp granularities via request parameters.
- The Mistral API console includes a speech-to-text playground that can upload audio and return diarized transcripts with downloads in text, SRT, or JSON.
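To show why segment-level timestamps and speaker labels matter downstream, here is a minimal sketch that renders diarized segments as SRT cues. The `Segment` shape (start, end, text, optional speaker) is a hypothetical stand-in, not the API's response schema. The SRT cue layout itself (index, `HH:MM:SS,mmm --> HH:MM:SS,mmm`, text, blank line) is the standard one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Segment:
    """One transcript segment; this shape is an assumption, not the API schema."""

    start: float  # seconds from the start of the audio
    end: float
    text: str
    speaker: Optional[str] = None  # set only when diarization was requested


def srt_timestamp(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def segments_to_srt(segments: list[Segment]) -> str:
    """Render segments as numbered SRT cues, prefixing the speaker label if any."""
    cues = []
    for index, seg in enumerate(segments, start=1):
        line = f"{seg.speaker}: {seg.text}" if seg.speaker else seg.text
        timing = f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}"
        cues.append(f"{index}\n{timing}\n{line}\n")
    return "\n".join(cues)


example = segments_to_srt([
    Segment(0.0, 1.25, "Hello", speaker="speaker_0"),
    Segment(1.25, 2.0, "Hi"),
])
```

Without segment timestamps there is nothing to place in the timing line, and without diarization the speaker prefix is simply omitted.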
Pricing Disclosure For Hosted Transcription
The hosted transcription API has a concrete per-minute price, with the per-hour equivalent stated alongside it. This enables immediate unit-cost modeling, although the source material does not specify additional fees, tiers, or limits.
- Voxtral transcription via the Mistral API is priced at $0.003 per minute ($0.18 per hour).
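The unit-cost arithmetic can be sketched as follows. The model assumes a flat rate with no tiers, no minimum charge, and no rounding of partial minutes, none of which the source confirms. The function name is illustrative.

```python
from decimal import Decimal

PRICE_PER_MINUTE_USD = Decimal("0.003")  # quoted hosted-API rate


def transcription_cost_usd(audio_minutes) -> Decimal:
    """Estimate cost at the flat per-minute rate (assumes no tiers or rounding)."""
    return PRICE_PER_MINUTE_USD * Decimal(str(audio_minutes))


one_hour = transcription_cost_usd(60)              # matches the quoted $0.18 per hour
thousand_hours = transcription_cost_usd(1000 * 60)  # 60,000 minutes of audio
```

Under these assumptions, 1,000 hours of audio comes to $180. `Decimal` is used so that small per-minute amounts do not pick up binary floating-point error when summed.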
Performance Expectations Based On A Demo Observation
The only performance-related evidence is a qualitative report of a live demo, which suggests low apparent latency and adequate handling of technical jargon. Because it is not a benchmark and includes no quantitative metrics, it supports only a cautious expectation, not a measured performance claim.
- In a live demo, Voxtral Realtime transcribed fast speech containing technical jargon (e.g., Django and WebAssembly) within moments of each utterance.
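One way to turn an observation like this into a number is word error rate (WER): the word-level edit distance between a reference transcript and the model output, divided by the reference length. The sketch below is a plain implementation of that standard definition. The lowercasing and whitespace tokenization are simplifying assumptions, and the sample sentences are invented for illustration, not taken from the demo.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Return (substitutions + deletions + insertions) / reference word count."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Invented example: two jargon words are misheard, one of them split in two.
sample_wer = word_error_rate(
    "deploy django with webassembly",
    "deploy jango with web assembly",
)
```

Jargon errors of the kind the demo probed show up directly in this metric: a misheard term counts as a substitution, and a term split into two words counts as a substitution plus an insertion.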
Unknowns
- What are the measured transcription accuracy metrics (e.g., word error rate) and latency for Voxtral Realtime and voxtral-mini-latest across realistic audio conditions (noise, accents, overlapping speech) and hardware configurations?
- What operational constraints apply to the Mistral transcription API (rate limits, maximum file length, streaming vs batch behavior, concurrency caps, and uptime/SLA terms)?
- Does the $0.003/min price change with enabled features such as diarization, timestamps, or context biasing, and are there separate charges for output formats or storage in the console workflow?
- What are the resource requirements and real-time performance characteristics for self-hosting Voxtral Realtime (e.g., memory footprint, compute needs, and achievable throughput)?
- What is the functional or quality difference between the open-weights Voxtral Realtime model and the hosted voxtral-mini-latest model (accuracy, latency, language coverage, supported features)?