Models

AudioShake’s Models define the specific type of audio processing applied to your content.
Each model represents a distinct audio processing operation—such as isolating vocals, removing background music, or transcribing dialogue—and can be combined within a single Tasks API request to produce multiple outputs from the same source file.

Models are organized by use case:

  • Instrument Stem Separation — break down songs into individual components like vocals, drums, and bass.
  • Dialogue, Music, and Effects — isolate voices or remove background elements for film, TV, and dubbing.
  • Transcription and Alignment — convert spoken content into synchronized text and timestamps.

Use these models to design flexible workflows for music production, post-production, accessibility, and AI data preparation.


Instrument Stem Separation

These models isolate or extract musical components from a mixed track.
They’re useful for remixing, immersive audio, gaming, and music education.
All models can be called via the /tasks route and support standard formats like WAV, MP3, or FLAC.

| Name | Model Key | Description | Credits / Minute | Max Length |
| --- | --- | --- | --- | --- |
| Vocals | vocals | Extracts vocal elements from a mix. Supports the high_quality variant for improved clarity. | 1.0 | 3 Hours |
| Lead Vocals | vocals_lead | Extracts the vocal performance carrying the primary melodic or lyrical content of the track. Well suited to karaoke. | 1.0 | 3 Hours |
| Backing Vocals | vocals_backing | Extracts only the backing vocals, including harmonies, chants, ad-libs, and choirs. | 1.0 | 3 Hours |
| Instrumental | instrumental | Generates an instrumental-only version by removing vocals. For best quality, use the high_quality variant. | 1.0 | 3 Hours |
| Drums | drums | Isolates percussion and rhythmic elements. | 1.0 | 3 Hours |
| Bass | bass | Separates bass instruments and low-frequency sounds. | 1.0 | 3 Hours |
| Guitar | guitar | Isolates guitar stems (acoustic, electric, classical). | 1.0 | 3 Hours |
| Electric Guitar | guitar_electric | Isolates electric guitar stems. | 1.0 | 3 Hours |
| Acoustic Guitar | guitar_acoustic | Isolates acoustic guitar stems (including classical guitar). | 1.0 | 3 Hours |
| Piano | piano | Extracts only acoustic piano. | 1.0 | 3 Hours |
| Keys | keys | Extracts all keyboard instruments, including piano, electric piano, and organ. | 1.0 | 3 Hours |
| Strings | strings | Isolates orchestral string instruments like violin, cello, and viola. | 1.0 | 3 Hours |
| Wind | wind | Extracts wind instruments such as flute and saxophone. | 1.0 | 3 Hours |
| Other | other | Captures remaining instrumentation after main stems are removed. | 1.0 | 3 Hours |
| Other-x-Guitar | other-x-guitar | Residual instrumentation after removing vocals, drums, bass, and guitar. | 1.0 | 3 Hours |

Residual Stems

To include a residual stem in your results, set "residual": true in the target metadata when creating your task. For more information, contact support@audioshake.ai.
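As a rough illustration of the note above, the following Python sketch builds a request body whose target asks for a residual stem. It only constructs and prints the JSON and does not call the API. The helper name build_target is hypothetical, and placing "residual" as a key directly on the target object is an assumption drawn from the wording above, so confirm the exact field placement with AudioShake support before relying on it.

```python
import json


def build_target(model, formats, residual=False):
    """Return one entry for the "targets" array (hypothetical helper)."""
    target = {"model": model, "formats": list(formats)}
    if residual:
        # Assumption: "residual" is a key directly on the target object.
        target["residual"] = True
    return target


payload = {
    "url": "https://demos.audioshake.ai/demo-assets/shakeitup.mp3",
    "targets": [build_target("vocals", ["wav"], residual=True)],
}

# Print the body that would be sent to the /tasks route.
print(json.dumps(payload, indent=2))
```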

Example — Using Models in a Tasks API Request

curl -sS -X POST "https://api.audioshake.ai/tasks" \
  -H "x-api-key: $AUDIOSHAKE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://demos.audioshake.ai/demo-assets/shakeitup.mp3",
    "targets": [
      { "model": "vocals", "formats": ["wav"], "variant": "high_quality" },
      { "model": "instrumental", "formats": ["wav"] },
      { "model": "transcription", "formats": ["json"], "language": "en" }
    ]
  }'
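To get a feel for what a multi-target request like this might cost, the Python sketch below sums the Credits / Minute rates listed on this page over the targets in the request. It assumes that usage is billed linearly by duration and additively across targets. This page does not state that rule, so treat the result as an estimate only. The function name estimate_credits and the 4-minute duration are illustrative.

```python
# Credits per minute, copied from the tables on this page.
# Keys are (model, variant); None means the default variant.
CREDITS_PER_MINUTE = {
    ("vocals", None): 1.0,
    ("vocals", "high_quality"): 1.5,
    ("instrumental", None): 1.0,
    ("instrumental", "high_quality"): 1.5,
    ("transcription", None): 1.0,
}


def estimate_credits(targets, duration_minutes):
    """Sum rate * duration over all targets (assumed billing rule)."""
    total = 0.0
    for target in targets:
        key = (target["model"], target.get("variant"))
        total += CREDITS_PER_MINUTE[key] * duration_minutes
    return total


targets = [
    {"model": "vocals", "formats": ["wav"], "variant": "high_quality"},
    {"model": "instrumental", "formats": ["wav"]},
    {"model": "transcription", "formats": ["json"], "language": "en"},
]

# Hypothetical 4-minute source file: (1.5 + 1.0 + 1.0) * 4 = 14.0
print(estimate_credits(targets, duration_minutes=4.0))
```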

Dialogue, Music, & Effects

| Name | Model Key | Description | Credits / Minute | Max Length |
| --- | --- | --- | --- | --- |
| Dialogue | dialogue | Isolates speech or vocals from any other sound. | 1.5 | 3 Hours |
| Effects | effects | Removes dialogue and music but retains the ambience, sound effects, and environmental noise. | 1.5 | 3 Hours |
| Music removal | music_removal | Removes music from audio while retaining dialogue, background effects, and natural sound. | N/A | 1 Hour |
| Background (Music & FX) | music_fx | Removes dialogue to extract a clean background stem of music and effects. | 1.5 | 3 Hours |
| Music detection | music_detection | Detects the portions of an audio file that contain music. | 0.5 | 3 Hours |
| Multi-Voice | multi_voice | Separates dialogue from multiple speakers in audio recordings, delivering individual audio files per speaker. Available in two_speaker and n_speaker variants, detailed below. | N/A | 1 Hour |

Music Removal & Multi-Voice Availability

Music Removal and Multi-Voice separation are not currently available via the /tasks route. Please contact support@audioshake.ai for access.


Transcription & Alignment

| Name | Model Key | Description | Credits / Minute | Max Length |
| --- | --- | --- | --- | --- |
| Transcription | transcription | Text representation of spoken words or audio content. | 1 | 1 Hour |
| Alignment | alignment | Synchronization of audio and corresponding text or captions. | 1 | 1 Hour |
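Because the Max Length column differs between model families (1 hour here, 3 hours for most separation models), a request that mixes models may be limited by the shortest one. The Python sketch below is a client-side pre-check that lists which requested models a file of a given duration would exceed. The limits are copied from the tables on this page for a few models only. The helper name models_over_limit is hypothetical, and how the API itself responds to an over-length file is not described here.

```python
# Maximum input length in hours, copied from the tables on this page
# (a subset of models, for illustration).
MAX_LENGTH_HOURS = {
    "vocals": 3,
    "instrumental": 3,
    "dialogue": 3,
    "music_detection": 3,
    "transcription": 1,
    "alignment": 1,
}


def models_over_limit(models, duration_hours):
    """Return the models whose documented max length the file exceeds."""
    return [m for m in models if duration_hours > MAX_LENGTH_HOURS[m]]


# A 1.5-hour file fits the vocals limit but not the transcription limit.
print(models_over_limit(["vocals", "transcription"], 1.5))
```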

Alignment-Only Targets

You can run Alignment as a standalone target. If you don't provide a transcript, alignment will automatically generate one for you. If you already have an accurate transcript, you can provide it to skip transcription and only generate synchronized timestamps.

Provide either a transcriptUrl (public URL to your transcript file) or a transcriptAssetId (if you've already uploaded the transcript as an asset).

curl -sS -X POST "https://api.audioshake.ai/tasks" \
  -H "x-api-key: $AUDIOSHAKE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://demos.audioshake.ai/demo-assets/shakeitup.mp3",
    "assetId": "",
    "targets": [
      {
        "model": "alignment",
        "formats": ["json", "txt", "srt"],
        "transcriptUrl": "",
        "transcriptAssetId": ""
      }
    ]
  }'

Tip

Use url or assetId for your source audio/video file, and transcriptUrl or transcriptAssetId within the target for your transcript. You only need to provide one of each pair.
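The pairing rule in the tip can be sketched in Python as a small builder that includes only the fields actually supplied. This is a minimal illustration, not an official client: the function name build_alignment_request is hypothetical, the transcript URL is a placeholder, and rejecting a request that sets both members of a pair is an assumption made here for safety rather than documented API behavior. The transcript is left optional, since alignment can generate one when none is provided.

```python
import json


def build_alignment_request(url=None, asset_id=None,
                            transcript_url=None, transcript_asset_id=None,
                            formats=("json",)):
    """Build a /tasks body for a standalone alignment target."""
    # Source media: exactly one of url / assetId.
    if (url is None) == (asset_id is None):
        raise ValueError("provide exactly one of url or asset_id")
    # Transcript: optional, but at most one of the two fields (assumption).
    if transcript_url is not None and transcript_asset_id is not None:
        raise ValueError("provide only one of transcript_url or transcript_asset_id")

    target = {"model": "alignment", "formats": list(formats)}
    if transcript_url is not None:
        target["transcriptUrl"] = transcript_url
    if transcript_asset_id is not None:
        target["transcriptAssetId"] = transcript_asset_id

    body = {"targets": [target]}
    if url is not None:
        body["url"] = url
    else:
        body["assetId"] = asset_id
    return body


body = build_alignment_request(
    url="https://demos.audioshake.ai/demo-assets/shakeitup.mp3",
    transcript_url="https://example.com/transcript.txt",  # placeholder URL
    formats=("json", "srt"),
)
print(json.dumps(body, indent=2))
```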


Variants

Certain models offer variants optimized for specific audio processing use cases. To use a variant, include "variant": "<desired_variant>" in the target's parameters, alongside model and formats, when submitting a job via the API. The available variants are listed below:

| Model | Variant | Description | Plan | Credits / Minute |
| --- | --- | --- | --- | --- |
| multi_voice | two_speaker | Optimized for separating two speakers. (Default) | Premium | N/A |
| multi_voice | n_speaker | Creates stems for any number of speakers. | Advanced | N/A |
| vocals | high_quality | Higher quality but longer processing time. | Premium | 1.5 |
| instrumental | high_quality | Higher quality but longer processing time. | Premium | 1.5 |

Example — Including a Variant in a Tasks API Request

curl -sS -X POST "https://api.audioshake.ai/tasks" \
  -H "x-api-key: $AUDIOSHAKE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "url": "https://demos.audioshake.ai/demo-assets/shakeitup.mp3",
    "targets": [
      { "model": "vocals", "variant": "high_quality", "formats": ["wav"] }
    ]
  }'
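When targets are assembled programmatically, it can help to check a model and variant pair against the variants table before sending the request. The Python sketch below does that client-side. The lookup is copied from the table in this section, the helper name with_variant is hypothetical, and plan entitlements (Premium, Advanced) are not checked.

```python
# Variants per model, copied from the variants table in this section.
VARIANTS = {
    "multi_voice": {"two_speaker", "n_speaker"},
    "vocals": {"high_quality"},
    "instrumental": {"high_quality"},
}


def with_variant(target, variant):
    """Return a copy of the target with "variant" set, if it is listed."""
    allowed = VARIANTS.get(target["model"], set())
    if variant not in allowed:
        raise ValueError(
            f"model {target['model']!r} has no listed variant {variant!r}"
        )
    return {**target, "variant": variant}


target = with_variant({"model": "vocals", "formats": ["wav"]}, "high_quality")
print(target)
```

Returning a copy rather than mutating the input keeps the original target reusable, for example to submit the same model with and without the variant.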