Speech to text

Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web.

Examples


import { Ai } from "@cloudflare/ai";

export interface Env {
  // Workers AI binding for this Worker
  AI: any;
}

export default {
  async fetch(request: Request, env: Env) {
    // Download a sample WAV file to transcribe
    const res = await fetch("https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/samples/cpp/windows/console/samples/enrollment_audio_katie.wav");
    const blob = await res.arrayBuffer();

    const ai = new Ai(env.AI);
    // Pass the audio to the model as an array of byte values
    const input = {
      audio: [...new Uint8Array(blob)],
    };
    const response = await ai.run("@cf/openai/whisper", input);

    // Return the transcription; the audio bytes are left out of the echoed input
    return new Response(JSON.stringify({ input: { audio: [] }, response }));
  }
};
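
The Worker above converts the downloaded ArrayBuffer into a plain array of byte values before calling the model. A minimal sketch of that conversion as a reusable helper follows; the names WhisperInput and toWhisperInput are hypothetical and are not part of @cloudflare/ai.

```typescript
// Hypothetical helper, not part of @cloudflare/ai: wraps raw audio bytes
// in the same input shape used by the Worker example.
interface WhisperInput {
  audio: number[];
}

function toWhisperInput(buffer: ArrayBuffer): WhisperInput {
  // Each element is one byte (0-255) of the original audio file.
  return { audio: Array.from(new Uint8Array(buffer)) };
}

// Usage inside the fetch handler (sketch):
//   const input = toWhisperInput(await res.arrayBuffer());
//   const response = await ai.run("@cf/openai/whisper", input);
```

Array.from and the spread syntax in the original example produce the same array; the helper only gives the step a name so it can be reused and tested.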

$ curl https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/openai/whisper \
    -X POST \
    -H "Authorization: Bearer {API_TOKEN}" \
    --data-binary @talking-llama.mp3
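
The curl command above can also be issued from TypeScript with fetch. The following is a minimal sketch, assuming the same endpoint and bearer-token header. The names WhisperRequest, buildWhisperRequest, and transcribe are hypothetical. The response envelope is not shown on this page, so the parsed JSON is returned as unknown rather than given a guessed type.

```typescript
// Hypothetical request description mirroring the curl flags above.
interface WhisperRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: ArrayBuffer;
}

function buildWhisperRequest(
  accountId: string,
  apiToken: string,
  audio: ArrayBuffer,
): WhisperRequest {
  return {
    url: `https://api.cloudflare.com/client/v4/accounts/${accountId}/ai/run/@cf/openai/whisper`,
    method: "POST", // -X POST
    headers: { Authorization: `Bearer ${apiToken}` }, // -H "Authorization: ..."
    body: audio, // raw audio bytes, like --data-binary
  };
}

async function transcribe(
  accountId: string,
  apiToken: string,
  audio: ArrayBuffer,
): Promise<unknown> {
  const req = buildWhisperRequest(accountId, apiToken, audio);
  const res = await fetch(req.url, {
    method: req.method,
    headers: req.headers,
    body: req.body,
  });
  if (!res.ok) {
    throw new Error(`Whisper request failed with status ${res.status}`);
  }
  // The shape of the JSON body is an open question here, so it stays untyped.
  return res.json();
}
```

Keeping the request construction separate from the network call makes the URL and headers easy to inspect without sending anything.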

API schema

The following schema is based on JSON Schema.


{
  "task": "speech-recognition",
  "tsClass": "AiSpeechRecognition",
  "jsonSchema": {
    "input": {
      "type": "object",
      "properties": {
        "audio": {
          "type": "string",
          "format": "binary"
        }
      },
      "required": ["audio"]
    },
    "output": {
      "type": "object",
      "properties": {
        "text": {
          "type": "string"
        }
      }
    }
  }
}
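
The output schema above describes an object with a single string property, text, which is not listed as required. A minimal sketch of checking a model response against that shape at runtime follows. The type and function names are hypothetical and written by hand from the schema; they are not generated from it or taken from a Cloudflare package.

```typescript
// Hand-written from the "output" schema above; names are illustrative.
interface SpeechRecognitionOutput {
  text?: string; // the schema does not mark "text" as required
}

function isSpeechRecognitionOutput(
  value: unknown,
): value is SpeechRecognitionOutput {
  // "type": "object" excludes null and arrays.
  if (typeof value !== "object" || value === null || Array.isArray(value)) {
    return false;
  }
  const text = (value as { text?: unknown }).text;
  return text === undefined || typeof text === "string";
}

// Returns the transcript, or an empty string when the value does not match.
function transcriptOf(value: unknown): string {
  if (isSpeechRecognitionOutput(value) && value.text !== undefined) {
    return value.text;
  }
  return "";
}
```

A guard like this is useful because the Worker example types both the binding and the response loosely, so the compiler cannot confirm that text is present.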