Models

When making API calls, choose one of the following model names to determine the output, and pass it as name in the request metadata. For instance, the model name vocals creates a vocals job:

curl -L -X POST 'https://groovy.audioshake.ai/job' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer <TOKEN>' \
    -d '{
      "metadata": {
        "format": "wav",
        "name": "vocals"
      },
      "assetId": "abc123"
    }'

Instrument Stem Separation

Name	Model Name	Description	API	Widget	Maximum Content Length
Instrumental	instrumental	Music without vocals, only instruments	✅	✅	3 Hours
Drums	drums	Percussion instruments producing rhythmic beats	✅	✅	3 Hours
Vocals	vocals	Isolates singing and vocal sounds	✅	✅	3 Hours
Bass	bass	Instruments producing low-frequency sounds, typically the bass guitar or synthesizer bass lines	✅	✅	3 Hours
Other	other	Remaining instrumentation after removing vocals, drums, and bass	✅	✅	3 Hours
Guitar	guitar	Instruments from the guitar family, including electric, acoustic, and classical guitars	✅		3 Hours
Other-x-Guitar	other-x-guitar	Remaining instrumentation after removing vocals, drums, bass, and guitar	✅	✅	3 Hours
Piano	piano	Instruments like Rhodes piano, upright piano, grand piano, and keyboard	✅	✅	3 Hours
Wind	wind	Instruments like flute and saxophone that produce sound by vibrating air	✅	✅	3 Hours
Strings	strings	Orchestral string instruments like violin, viola, cello, and double bass	✅	✅	3 Hours

Residual

To generate a residual stem, set residual to true in the metadata field.
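
As a minimal sketch of the residual setting, the helper below builds a request body with residual enabled. It assumes that residual is a JSON boolean placed next to format and name inside metadata, which this page does not spell out. The function names residual_job_payload and create_residual_job are hypothetical, and TOKEN is assumed to hold your API token.

```shell
# Sketch: build the JSON body for a job that also requests a residual stem.
# Assumption: "residual" is a JSON boolean next to "format" and "name".
residual_job_payload() {
  model_name="$1"   # e.g. vocals
  asset_id="$2"     # e.g. abc123
  printf '{"metadata":{"format":"wav","name":"%s","residual":true},"assetId":"%s"}\n' \
    "$model_name" "$asset_id"
}

# Hypothetical helper: send the body with the same curl call used on this page.
# TOKEN must be set in the environment before calling this function.
create_residual_job() {
  curl -L -X POST 'https://groovy.audioshake.ai/job' \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer ${TOKEN}" \
    -d "$(residual_job_payload "$1" "$2")"
}
```

Calling create_residual_job vocals abc123 would then submit a vocals job for the placeholder asset abc123 with the residual flag set.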

For more info, contact support@audioshake.ai

Dialogue, Music, & Effects

Name	Model Name	Description	API	Maximum Content Length
Dialogue	dialogue	Speech or vocals isolated from any other sound	✅	3 Hours
Music removal	music_removal	Removes music from audio while retaining dialogue, background effects, and natural sound	✅	1 Hour
Background (Music & FX)	music_fx	Removes dialogue to extract a clean background stem of music and effects	✅	3 Hours
Music detection	music_detection	Identifies the portions of an audio file that contain music	✅	3 Hours
Multi-Voice	multi_voice	Separates dialogue from multiple speakers in audio recordings, delivering individual audio files per speaker. Available in `two_speaker` and `n_speaker` variants, detailed below.	✅	1 Hour

Transcription & Alignment

Name	Model Name	Description	API	Widget	Maximum Content Length
Transcription	transcription	Text representation of spoken words or audio content	✅	✅	1 Hour
Alignment	alignment	Synchronization of audio and corresponding text or captions	✅	✅	1 Hour
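
The Maximum Content Length columns in the tables above can be turned into a client-side pre-check before a job is submitted. The sketch below is one way to do that. The function names max_content_seconds and fits_content_limit are hypothetical, and treating the limit as inclusive is an assumption, since this page does not say how the API handles files at or over the limit.

```shell
# Sketch: look up the Maximum Content Length (in seconds) for a model name.
# Values are copied from the tables above; unknown names return status 1.
max_content_seconds() {
  case "$1" in
    music_removal|multi_voice|transcription|alignment)
      echo 3600 ;;    # 1 Hour
    instrumental|drums|vocals|bass|other|guitar|other-x-guitar|piano|wind|strings|dialogue|music_fx|music_detection)
      echo 10800 ;;   # 3 Hours
    *)
      return 1 ;;
  esac
}

# Returns 0 when a file of the given duration (whole seconds) fits the limit.
# Assumption: a duration exactly equal to the limit is accepted.
fits_content_limit() {
  limit="$(max_content_seconds "$1")" || return 1
  [ "$2" -le "$limit" ]
}
```

For example, fits_content_limit music_removal 5400 fails because a 90-minute file exceeds the 1 Hour limit listed for music_removal.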

Variants

Certain models offer variants optimized for specific audio processing use cases. To use a variant, include "variant": "<desired_variant>" in the metadata object when submitting a job via the API. The available variants are listed below:

Model	Variant	Description
multi_voice	two_speaker	Optimized for separating two speakers. (Default)
multi_voice	n_speaker	Creates stems for any number of speakers.
vocals	high_quality	Higher quality but longer processing time.
instrumental	high_quality	Higher quality but longer processing time.

Example request creating a multi_voice job with the n_speaker variant:

curl -L -X POST 'https://groovy.audioshake.ai/job' \
    -H 'Content-Type: application/json' \
    -H 'Authorization: Bearer <TOKEN>' \
    -d '{
      "metadata": {
        "format": "wav",
        "name": "multi_voice",
        "variant": "n_speaker"
      },
      "assetId": "abc123"
    }'
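
To avoid submitting a variant that a model does not offer, the variants table can be encoded as a small check. This is only a sketch: the function names is_known_variant and variant_job_payload are hypothetical, the list of pairs is copied from the table above and would need updating if variants change, and the page does not say how the API itself responds to an unknown pair.

```shell
# Sketch: check a model/variant pair against the variants table on this page.
is_known_variant() {
  case "$1:$2" in
    multi_voice:two_speaker|multi_voice:n_speaker|vocals:high_quality|instrumental:high_quality)
      return 0 ;;
    *)
      return 1 ;;
  esac
}

# Build a job body that includes the variant.
# Prints nothing and returns 1 when the pair is not in the table.
variant_job_payload() {
  is_known_variant "$1" "$2" || return 1
  printf '{"metadata":{"format":"wav","name":"%s","variant":"%s"},"assetId":"%s"}\n' \
    "$1" "$2" "$3"
}
```

The output of variant_job_payload multi_voice n_speaker abc123 matches the body of the request shown above and can be passed to curl with -d.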
