Speech to text
Whisper is an automatic speech recognition (ASR) system trained on 680,000 hours of multilingual and multitask supervised data collected from the web.
- ID: @cf/openai/whisper (used to run this model via the SDK or the API)
- Name: Automatic speech recognition (ASR) system from OpenAI
- Task: speech-recognition
- License type: MIT
Examples
```ts
import { Ai } from "@cloudflare/ai";

export interface Env {
  AI: any;
}

export default {
  async fetch(request: Request, env: Env) {
    // Fetch a sample WAV file and read it into an ArrayBuffer
    const res = await fetch(
      "https://github.com/Azure-Samples/cognitive-services-speech-sdk/raw/master/samples/cpp/windows/console/samples/enrollment_audio_katie.wav"
    );
    const blob = await res.arrayBuffer();

    const ai = new Ai(env.AI);

    // Pass the audio as an array of 8-bit unsigned integers
    const input = {
      audio: [...new Uint8Array(blob)],
    };

    const response = await ai.run("@cf/openai/whisper", input);

    // Echo an empty audio array rather than the full byte payload
    return new Response(JSON.stringify({ input: { audio: [] }, response }));
  },
};
```
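The object returned by `ai.run` carries the transcription in its `text` property (see the API schema below). A minimal sketch of reading it inside the handler above:

```ts
// Sketch: per the output schema below, a successful run yields { text: string }
const { text } = response as { text: string };
console.log(`Transcript: ${text}`);
```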
```sh
$ curl https://api.cloudflare.com/client/v4/accounts/{account_id}/ai/run/@cf/openai/whisper \
    -X POST \
    -H "Authorization: Bearer {API_TOKEN}" \
    --data-binary @talking-llama.mp3
```
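The REST endpoint wraps the model output in Cloudflare's standard API envelope. A successful call returns JSON along these lines (the transcript shown is purely illustrative):

```json
{
  "result": { "text": "Hello, this is a test recording." },
  "success": true,
  "errors": [],
  "messages": []
}
```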
API schema
The following schema is based on JSON Schema.
{"task": "speech-recognition","tsClass": "AiSpeechRecognition","jsonSchema": {"input": {"type": "object","properties": {"audio": {"type": "string","format": "binary"}},"required": ["audio"]},"output": {"type": "object","properties": {"text": {"type": "string"}}}}}