Skip to main content

Input formats

Audio

FormatExtensions
WAV.wav
AIFF.aiff, .aif
FLAC.flac
MP3.mp3
AAC.aac, .m4a
For best results, use WAV at 16-bit or 24-bit depth. Supports up to 192kHz sample rate. Provide a stereo or mono track.

Video

FormatExtensionsAudio codec
MP4.mp4AAC or PCM
MOV.movAAC or PCM
Only the audio stream is processed — video content is ignored.

Text

FormatExtensionsUse case
JSON.jsonTranscript input for the alignment model
TXT.txtTranscript input for the alignment model

Output formats

Set output formats in the formats array when creating a Task. Use the API value shown below.

Audio

FormatAPI valueModels
WAVwavAll models that output audio
MP3mp3All models that output audio
FLACflacAll models that output audio
AIFFaiffAll models that output audio

Video

FormatAPI valueModels
MP4mp4All models that output audio
Output video files contain the separated audio stream in the original video container.

Text

FormatAPI valueModels
JSONjsontranscription, alignment, music_detection
SRTsrttranscription, alignment
TXTtxttranscription, alignment, music_detection

Limitations

  • Encrypted or DRM-protected content is not supported
  • Multi-channel or surround formats (e.g. 5.1) are not supported