Getting started

A Task is a processing job. You provide a media source and one or more model targets — AudioShake processes them asynchronously and returns results for each target. See Create Task.
A target specifies which model to run and what output format to produce. You can include up to 20 targets in one Task to generate several outputs from the same file:
{
  "assetId": "your_asset_id",
  "targets": [
    { "model": "vocals", "formats": ["wav"] },
    { "model": "instrumental", "formats": ["wav"] },
    { "model": "transcription", "formats": ["json"] }
  ]
}
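As a minimal sketch, the request body above could be assembled in Python while enforcing the 20-target limit on the client side. The helper name `build_task_body` is hypothetical and not part of any AudioShake SDK; only the field names `assetId`, `targets`, `model`, and `formats` come from the example above.

```python
import json

MAX_TARGETS = 20  # per-Task limit stated above


def build_task_body(asset_id, targets):
    """Return a Task request body as a dict (hypothetical helper).

    `targets` is a list of (model, formats) pairs.
    """
    if not targets:
        raise ValueError("a Task needs at least one target")
    if len(targets) > MAX_TARGETS:
        raise ValueError(f"a Task accepts at most {MAX_TARGETS} targets")
    return {
        "assetId": asset_id,
        "targets": [{"model": m, "formats": list(f)} for m, f in targets],
    }


body = build_task_body(
    "your_asset_id",
    [("vocals", ["wav"]), ("instrumental", ["wav"]), ("transcription", ["json"])],
)
print(json.dumps(body, indent=2))
```

Checking the limit before sending avoids a round trip for a request that would be rejected anyway.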
  • Use url when your media is publicly accessible over HTTPS
  • Use assetId when your media is local — upload it first, then reference the returned ID
We recommend uploading files when possible for the most reliable processing.
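The choice between `url` and `assetId` can be sketched as a small helper. This is an illustration only: `media_source_field` is a hypothetical name, and treating every non-URL string as an asset ID (and rejecting non-HTTPS schemes outright) is an assumption, not documented behavior.

```python
from urllib.parse import urlparse


def media_source_field(source):
    """Return {"url": ...} for an HTTPS URL, else {"assetId": ...}.

    Assumption: any string that is not a URL is an asset ID returned
    by a previous upload.
    """
    parsed = urlparse(source)
    if parsed.scheme == "https" and parsed.netloc:
        return {"url": source}
    if parsed.scheme in ("http", "ftp", "file"):
        # Assumption: only HTTPS sources are usable; upload the file instead.
        raise ValueError("media URL must use HTTPS; upload the file and use assetId")
    return {"assetId": source}
```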
AudioShake offers models for instrument stem separation, speech processing, post-production, copyright compliance, and lyric transcription. See the Models page for the full list with descriptions and pricing.
Yes. To align an existing transcript with your audio, use the alignment model and provide the transcript via transcriptUrl or transcriptAssetId. If no transcript is provided, the model transcribes the audio automatically.
{
  "assetId": "your_asset_id",
  "targets": [
    {
      "model": "alignment",
      "formats": ["json"],
      "transcriptUrl": "https://example.com/lyrics.txt"
    }
  ]
}

Processing

Two options:
Processing time depends on the media length, number of targets, and current queue load. Most tasks complete within seconds to a few minutes.
Yes. AudioShake accepts MP4 and MOV video files. Only the audio stream is processed — video content is ignored. See Formats for all supported input types.
Individual targets can fail independently. Check the status and error fields on each target in the Task response. Other targets in the same Task are not affected.
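A hedged sketch of inspecting per-target results follows. The response shape (a `targets` list whose entries carry `model`, `status`, and `error`) is assumed from the description above, and the status strings in the sample are placeholders rather than documented values, so the code only looks at whether `error` is set.

```python
def split_targets(task):
    """Partition a Task's targets into (ok, failed) using the `error` field."""
    ok, failed = [], []
    for target in task.get("targets", []):
        (failed if target.get("error") else ok).append(target)
    return ok, failed


# Hypothetical response; the ID, status strings, and error text are placeholders.
sample_task = {
    "id": "task_123",
    "targets": [
        {"model": "vocals", "status": "completed", "error": None},
        {"model": "transcription", "status": "failed", "error": "example error"},
    ],
}

ok, failed = split_targets(sample_task)
for t in failed:
    print(f"{t['model']}: {t['status']} ({t['error']})")
```

Because targets fail independently, the successful entries in `ok` can still be downloaded while the failed ones are retried or reported.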

Files and formats

Input formats. Audio: WAV, AIFF, FLAC, MP3, AAC. Video: MP4, MOV. Maximum file size is 2GB. See Formats for the full list.
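A pre-upload check along these lines can catch obvious problems early. This is a sketch under assumptions: the extension set mirrors only the formats named here (the Formats page has the full list), file extensions are assumed to match format names, and "2GB" is read as 2 GiB.

```python
from pathlib import Path

# Only the input formats named in this FAQ; not the authoritative list.
INPUT_EXTENSIONS = {".wav", ".aiff", ".flac", ".mp3", ".aac", ".mp4", ".mov"}
MAX_BYTES = 2 * 1024**3  # assumption: "2GB" interpreted as 2 GiB


def check_input(name, size_bytes):
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(name).suffix.lower()
    if ext not in INPUT_EXTENSIONS:
        problems.append(f"unsupported extension: {ext or '(none)'}")
    if size_bytes > MAX_BYTES:
        problems.append("file exceeds the 2GB limit")
    return problems
```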
Output formats. Audio: wav, mp3, flac, aiff. Video: mp4. Text: json, srt, txt. See Formats for which formats apply to which models.
Uploaded Assets expire after 72 hours. Output download links expire after one hour — re-fetch the Task to get fresh links.
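If you track timestamps yourself, the two expiry windows can be checked locally before reusing an asset or a download link. This sketch assumes you record when the asset was uploaded and when the Task was last fetched; `is_expired` is a hypothetical helper.

```python
from datetime import datetime, timedelta, timezone

ASSET_TTL = timedelta(hours=72)  # uploaded Assets
LINK_TTL = timedelta(hours=1)    # output download links


def is_expired(created_at, ttl, now=None):
    """Return True once `ttl` has elapsed since `created_at` (timezone-aware)."""
    now = now or datetime.now(timezone.utc)
    return now - created_at >= ttl
```

When `is_expired(fetched_at, LINK_TTL)` is true, re-fetch the Task rather than retrying the stale link.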

Billing

Credits are charged per minute of source audio, per target model. Duration is rounded up to the nearest minute. See Billing & Credits for examples and Models for per-model rates.
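The charging rule can be illustrated with a short calculation. The per-model rates below are made-up placeholders, not real prices; see the Models page for actual rates.

```python
import math


def estimate_credits(duration_seconds, rates_per_minute):
    """Billed minutes (rounded up) times the sum of per-model rates."""
    billed_minutes = math.ceil(duration_seconds / 60)
    return billed_minutes * sum(rates_per_minute.values())


# Hypothetical rates, for illustration only.
example_rates = {"vocals": 1.0, "transcription": 0.5}

# A 3 min 10 s file is billed as 4 minutes for each target.
print(estimate_credits(190, example_rates))  # 6.0
```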
Tasks that fail before processing begins are not charged. If a Task fails mid-processing, contact support@audioshake.ai with the Task ID.
Yes. Contact info@audioshake.ai for volume pricing, custom SLAs, and enterprise plans.

Support

Email support@audioshake.ai. Include your Task ID and the full error message for the fastest resolution.
Yes. The limit is 60 requests per second. See Rate Limits.
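To stay under the limit, a client can pace its own calls. The class below is a simple client-side sketch, not an AudioShake feature; it spaces calls evenly and does not handle whatever response the server sends when the limit is exceeded.

```python
import time


class Throttle:
    """Space calls at least 1/rate seconds apart (simple client-side pacing)."""

    def __init__(self, rate_per_second=60, clock=time.monotonic, sleep=time.sleep):
        self.interval = 1.0 / rate_per_second
        self.clock = clock
        self.sleep = sleep
        self.next_allowed = clock()

    def wait(self):
        """Block until the next call is allowed, then reserve the following slot."""
        now = self.clock()
        if now < self.next_allowed:
            self.sleep(self.next_allowed - now)
            now = self.next_allowed
        self.next_allowed = now + self.interval
```

Call `throttle.wait()` immediately before each request; the injectable `clock` and `sleep` arguments exist only to make the pacing testable.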