Get started
Make your first API call in under 5 minutes.
Create an account
Start building for free, with 10 credits included at sign-up.
Your first API call
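The request below is only a sketch of what such a call might look like. The endpoint URL, the `x-api-key` header name, and the JSON field names (`url`, `targets`, `model`, `formats`) are placeholders chosen for illustration, not taken from the API Reference, so check the reference before relying on them. The sketch builds the request with Python's standard library and leaves the network call itself commented out.

```python
import json
import urllib.request

# Hypothetical values for illustration only; consult the API Reference for the real ones.
API_URL = "https://api.example.com/tasks"  # placeholder endpoint, not a real URL
API_KEY = "YOUR_API_KEY"                   # placeholder credential


def build_separation_request(audio_url: str) -> urllib.request.Request:
    """Build a POST request asking for vocals and instrumental stems.

    The payload shape is an assumption made for this sketch.
    """
    payload = {
        "url": audio_url,
        "targets": [
            {"model": "vocals", "formats": ["wav"]},
            {"model": "instrumental", "formats": ["wav"]},
        ],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "x-api-key": API_KEY},
        method="POST",
    )


request = build_separation_request("https://example.com/song.mp3")

# Sending is left commented out because the endpoint above is a placeholder:
# with urllib.request.urlopen(request) as response:
#     task = json.load(response)
```

Keeping request construction in its own function makes the payload easy to inspect before any credits are spent.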
A single request separates a track into vocals and instrumental.
What you can build
Remove vocals from songs
Isolate vocals, drums, bass, guitar, and more for remixing, karaoke, and sampling.
Isolate dialogue for dubbing
Separate speech from music and effects for localization and post-production.
Transcribe and sync lyrics
Generate word- and line-level timestamped lyrics from any song.
Separate speakers
Isolate individual speakers from multi-speaker recordings, even with overlapping speech.
Detect and identify music
Find where music appears in podcasts, video, and broadcast content for compliance.
Clean up noisy speech
Remove background noise from recordings for clearer speech and better transcription.
Who this is for
- AI companies training speech, music, or multimodal models with clean, labeled data
- Media companies processing archives for dubbing, compliance, and cataloging
- Developers building karaoke apps, remix tools, practice platforms, and audio experiences
Why AudioShake
- State-of-the-art quality — purpose-built models for music and speech, trained on licensed data
- Production-ready — async processing, webhooks, and batch support for any scale
- Simple API — one endpoint, multiple models per request, results in minutes
- Audio + video — process MP4 and MOV files directly, no pre-processing needed
- On-device option — run the same models locally with the Local Inference SDK
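Because processing is asynchronous, a client typically submits a task and then either waits for a webhook or polls for completion. The loop below is a minimal polling sketch under stated assumptions: the `status` field and its values (`completed`, `failed`) are hypothetical, and `fetch_status` stands in for whatever call retrieves a task from the API.

```python
import time
from typing import Callable, Dict


def wait_for_task(
    fetch_status: Callable[[], Dict],
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll until the task reports a terminal status or the timeout elapses.

    `fetch_status` is a caller-supplied function returning the task as a dict.
    The "status" key and its values are assumptions, not a documented schema.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        task = fetch_status()
        if task.get("status") in ("completed", "failed"):
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError("task did not finish before the timeout")
        sleep(interval_s)


# Usage with a stubbed fetcher that reports completion on the third poll.
_responses = iter(
    [{"status": "processing"}, {"status": "processing"}, {"status": "completed"}]
)
result = wait_for_task(lambda: next(_responses), sleep=lambda _: None)
```

Passing the fetcher and the sleep function as arguments keeps the loop testable without network access. For production workloads, a webhook avoids polling altogether.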
Reference
Models
Browse all available models with pricing.
API Reference
Full endpoint reference.
Tutorials
Step-by-step guides for every workflow.
Billing & Credits
How pricing works.