Models
When making API calls, choose one of the following model names to select the output you want. For example, the model name `vocals` creates a `vocals` job:
curl -L -X POST 'https://groovy.audioshake.ai/job/' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <TOKEN>' \
  -d '{
    "metadata": {
      "format": "wav",
      "name": "vocals"
    },
    "assetId": "abc123"
  }'
Instrument Stem Separation
Name | Model Name | Description | API | Widget |
---|---|---|---|---|
Instrumental | instrumental | Music without vocals, only instruments | ✅ | ✅ |
Drums | drums | Percussion instruments producing rhythmic beats | ✅ | ✅ |
Vocals | vocals | Isolates singing and vocal sounds | ✅ | ✅ |
Bass | bass | Instruments producing low-frequency sounds, typically the bass guitar or synthesizer bass lines | ✅ | ✅ |
Other | other | Remaining instrumentation after removing vocals, drums, and bass | ✅ | ✅ |
Guitar | guitar | Instruments from the guitar family, including electric, acoustic, and classical guitars | ✅ | |
Other-x-Guitar | other-x-guitar | Remaining instrumentation after removing vocals, drums, bass, and guitar | ✅ | ✅ |
Piano | piano | Instruments like Rhodes piano, upright piano, grand piano, and keyboard | ✅ | ✅ |
Wind | wind | Instruments such as flute and saxophone that produce sound by vibrating air | ✅ | ✅ |
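To request several stems from the same asset, one approach is to build one request body per model name. The sketch below assumes each model name needs its own job, which this page implies but does not state outright; the function name `build_stem_payloads` is hypothetical, and the `wav` format is simply carried over from the example above.

```shell
# Print one JSON request body per model name, all for the same asset.
# Values are inserted verbatim, so they must not contain quotes or backslashes.
build_stem_payloads() {
  # $1 = asset ID; remaining arguments = model names from the table above
  asset_id="$1"
  shift
  for model in "$@"; do
    printf '{"metadata":{"format":"wav","name":"%s"},"assetId":"%s"}\n' \
      "$model" "$asset_id"
  done
}

# Example: build_stem_payloads abc123 vocals drums bass other
```

Each printed line can then be passed as the `-d` argument of the curl request shown at the top of this page.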
Residual
To generate a residual stem, set `residual` to `true` in the `metadata` field.
For more info, contact support@audioshake.ai
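As a minimal sketch, the request body below adds the residual flag to the `metadata` object of a job request. It assumes the flag is a JSON boolean (`true`, not the string `"true"`); the helper names `build_residual_payload` and `submit_job` and the `AUDIOSHAKE_TOKEN` environment variable are hypothetical and chosen only for illustration.

```shell
# Build the JSON body for a job that also requests a residual stem.
# Values are inserted verbatim, so they must not contain quotes or backslashes.
build_residual_payload() {
  # $1 = model name, $2 = asset ID, $3 = output format (defaults to wav)
  printf '{"metadata":{"format":"%s","name":"%s","residual":true},"assetId":"%s"}\n' \
    "${3:-wav}" "$1" "$2"
}

# POST a JSON body to the job endpoint used elsewhere on this page.
# AUDIOSHAKE_TOKEN is an assumed environment variable holding the API token.
submit_job() {
  curl -L -X POST 'https://groovy.audioshake.ai/job/' \
    -H 'Content-Type: application/json' \
    -H "Authorization: Bearer ${AUDIOSHAKE_TOKEN}" \
    -d "$1"
}

# Example (not run here): submit_job "$(build_residual_payload vocals abc123)"
```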
Dialogue, Music, & Effects
Name | Model Name | Description | API | Widget |
---|---|---|---|---|
Dialogue | dialogue | Speech or vocals isolated from any other sound | ✅ | |
Music removal | music_removal | Removes music from audio while retaining dialogue, background effects, and natural sound | ✅ | |
Background (Music & FX) | music_fx | Removes dialogue to extract a clean background stem of music and effects | ✅ | |
Music detection | music_detection | Detects the portions of an audio file that contain music | ✅ | |
Multi-Voice | multi_voice | Separates dialogue from multiple speakers in audio recordings, delivering individual audio files per speaker. Available in two_speaker and n_speaker variants, detailed below. | ✅ | |
Transcription & Alignment
Name | Model Name | Description | API | Widget |
---|---|---|---|---|
Transcription | transcription | Text representation of spoken words or audio content | ✅ | ✅ |
Alignment | alignment | Synchronization of audio and corresponding text or captions | ✅ | ✅ |
Variants
Certain models offer variants optimized for specific audio-processing use cases. To use a variant, include `"variant": "desired_variant"` in the `metadata` parameters when submitting a job via the API. The available variants are listed below:
Model | Variant | Description |
---|---|---|
multi_voice | two_speaker | Optimized for separating two speakers. (Default) |
multi_voice | n_speaker | Creates stems for any number of speakers. |
Example request creating a `multi_voice` job with the `n_speaker` variant:
curl -L -X POST 'https://groovy.audioshake.ai/job/' \
  -H 'Content-Type: application/json' \
  -H 'Authorization: Bearer <TOKEN>' \
  -d '{
    "metadata": {
      "format": "wav",
      "name": "multi_voice",
      "variant": "n_speaker"
    },
    "assetId": "abc123"
  }'
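Because only `multi_voice` lists variants in the table above, it can help to check the model and variant pair before sending a request. The following is a sketch under that assumption: `is_known_variant` and `build_variant_payload` are hypothetical helper names, and the check mirrors only the variants listed on this page. When no variant is passed, the `variant` key is omitted, so the documented default applies.

```shell
# Succeed only if the variant is listed for the model in the table above.
is_known_variant() {
  # $1 = model name, $2 = variant
  case "$1:$2" in
    multi_voice:two_speaker|multi_voice:n_speaker) return 0 ;;
    *) return 1 ;;
  esac
}

# Print the JSON body for a job, adding "variant" only when one is given.
# Values are inserted verbatim, so they must not contain quotes or backslashes.
build_variant_payload() {
  # $1 = model name, $2 = asset ID, $3 = variant (optional)
  if [ -z "${3:-}" ]; then
    printf '{"metadata":{"format":"wav","name":"%s"},"assetId":"%s"}\n' "$1" "$2"
  elif is_known_variant "$1" "$3"; then
    printf '{"metadata":{"format":"wav","name":"%s","variant":"%s"},"assetId":"%s"}\n' \
      "$1" "$3" "$2"
  else
    echo "unknown variant '$3' for model '$1'" >&2
    return 1
  fi
}

# Example: build_variant_payload multi_voice abc123 n_speaker
```

The printed body can be passed to the `-d` option of the curl request above.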