Introduction

AudioShake is a platform for state-of-the-art AI source separation, transcription, and content analysis. It powers post-production workflows, karaoke, dubbing, speech enhancement, music copyright compliance, ASR accuracy improvements, and AI training data preparation, at scale and in production.

Get started

Make your first API call in under 5 minutes.

Create an account

Start building for free: 10 credits are included when you sign up.

Your first API call

Separate a track into vocals and instrumental in a single request:

curl -X POST "https://api.audioshake.ai/tasks" \
  -H "Content-Type: application/json" \
  -H "x-api-key: your_api_key" \
  -d '{
    "url": "https://demos.audioshake.ai/demo-assets/shakeitup.mp3",
    "targets": [
      { "model": "vocals", "formats": ["wav"] },
      { "model": "instrumental", "formats": ["wav"] }
    ]
  }'
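
The same request can also be assembled from Python using only the standard library. The following is a minimal sketch that mirrors the curl command above, not an official client: the helper name `build_task_request` is hypothetical, and the request is only built, not sent, because the response format is not described on this page.

```python
import json
import urllib.request

# Endpoint, header names, and payload shape are copied from the curl example.
API_URL = "https://api.audioshake.ai/tasks"


def build_task_request(api_key, audio_url, models, formats=("wav",)):
    """Build (but do not send) a POST request with one target per model.

    Hypothetical helper for illustration; not part of any AudioShake SDK.
    """
    payload = {
        "url": audio_url,
        "targets": [{"model": m, "formats": list(formats)} for m in models],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": api_key},
        method="POST",
    )


req = build_task_request(
    "your_api_key",
    "https://demos.audioshake.ai/demo-assets/shakeitup.mp3",
    ["vocals", "instrumental"],
)

# Sending is left to the caller, since the response schema is not shown here:
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes it easy to inspect the payload before spending credits, and to swap in a different HTTP client later.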

What you can build

Remove vocals from songs

Isolate vocals, drums, bass, guitar, and more for remixing, karaoke, and sampling.

Isolate dialogue for dubbing

Separate speech from music and effects for localization and post-production.

Transcribe and sync lyrics

Generate word- and line-level timestamped lyrics from any song.

Separate speakers

Isolate individual speakers from multi-speaker recordings, even with overlapping speech.

Detect and identify music

Find where music appears in podcasts, video, and broadcast content for compliance.

Recover degraded speech

Denoise and de-reverb recordings for clearer speech and better transcription.

Who this is for

AI companies training speech, music, or multimodal models with clean, labeled data
Media companies processing archives for dubbing, compliance, and cataloging
Developers building karaoke apps, remix tools, practice platforms, and audio experiences

Why AudioShake

State-of-the-art quality — purpose-built models for music and speech, trained on licensed data
Production-ready — async processing, webhooks, and batch support for any scale
Simple API — one endpoint, multiple models per request, results in minutes
Audio + video — process MP4 and MOV files directly, no pre-processing needed
On-device option — run the same models locally with the Local Inference SDK

Reference

Models

Browse all available models with pricing.

API Reference

Full endpoint reference.

Tutorials

Step-by-step guides for every workflow.

Billing & Credits

How pricing works.
