Speech to text
Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web.
- ID: @cf/openai/whisper (used to run this model via the SDK or the API)
- Name: Automatic speech recognition (ASR) system from OpenAI
- Task: speech-recognition
- License type: MIT
Examples
```ts
import { Ai } from "@cloudflare/ai";

export interface Env {
  AI: any;
}

export default {
  async fetch(request: Request, env: Env) {
    // Fetch a sample WAV file and read it into an ArrayBuffer
    const res = await fetch(
      "https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/samples/cpp/windows/console/samples/enrollment_audio_katie.wav"
    );
    const blob = await res.arrayBuffer();

    const ai = new Ai(env.AI);

    // Pass the audio as an array of 8-bit unsigned integers
    const input = {
      audio: [...new Uint8Array(blob)],
    };

    const response = await ai.run("@cf/openai/whisper", input);

    // Echo an empty audio array rather than the full byte payload
    return new Response(JSON.stringify({ input: { audio: [] }, response }));
  },
};
```
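The object returned by `ai.run` carries the transcription in its `text` property (see the API schema below). A minimal sketch of reading it inside the handler above:

```ts
// Sketch: per the output schema below, a successful run yields { text: string }
const { text } = response as { text: string };
console.log(`Transcript: ${text}`);
```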
```sh
$ curl https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/openai/whisper \
    -X POST \
    -H "Authorization: Bearer {API_TOKEN}" \
    --data-binary @talking-llama.mp3
```
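The REST endpoint wraps the model output in Cloudflare's standard API envelope. A successful call returns JSON along these lines (the transcript shown is purely illustrative):

```json
{
  "result": { "text": "Hello, this is a test recording." },
  "success": true,
  "errors": [],
  "messages": []
}
```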
API schema
The following schema is based on JSON Schema.
{"task": "speech-recognition","tsClass": "AiSpeechRecognition","jsonSchema": {"input": {"type": "object","properties": {"audio": {"type": "string","format": "binary"}},"required": ["audio"]},"output": {"type": "object","properties": {"text": {"type": "string"}}}}}