Models
When making API calls, choose one of the following model names to determine the output. For instance, using the model name vocals creates a vocals job.
curl -L -X POST 'https://groovy.audioshake.ai/job' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <TOKEN>' \
-d '{
"metadata": {
"format": "wav",
"name": "vocals"
},
"assetId": "abc123"
}'
Instrument Stem Separation
Name | Model | Description | Credits / Minute | Max Length |
---|---|---|---|---|
Instrumental | instrumental | Music without vocals, only instruments. We recommend using the high_quality variant. | 1 | |
Drums | drums | Percussion instruments producing rhythmic beats | 1 | |
Vocals | vocals | Isolates singing and vocal sounds. We recommend using the high_quality variant. | 1 | |
Bass | bass | Instruments producing low-frequency sounds, typically the bass guitar or synthesizer bass lines | 1 | 3 Hours |
Other | other | Remaining instrumentation after removing vocals, drums, and bass | 1 | 3 Hours |
Guitar | guitar | Instruments from the guitar family, including electric, acoustic, and classical guitars | 1 | 3 Hours |
Other-x-Guitar | other-x-guitar | Remaining instrumentation after removing vocals, drums, bass, and guitar | 1 | 3 Hours |
Piano | piano | Instruments like Rhodes piano, upright piano, grand piano, and keyboard | 1 | 3 Hours |
Wind | wind | Instruments like flute, saxophone, producing sound by vibrating air | 1 | 3 Hours |
Strings | strings | Orchestral string instruments like violin, viola, cello, and double bass | 1 | 3 Hours |
If you would like to generate a residual stem, set residual to true in the metadata field.
For more info, contact support@audioshake.ai
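As a minimal sketch of the residual option described above, the snippet below builds a job payload in the same shape as the request at the top of this page and adds "residual": true to metadata. The helper name build_residual_payload, the drums model choice, and the abc123 asset ID are illustrative assumptions, and sending residual as a JSON boolean is an assumed reading of "set to true".

```shell
# Sketch only: build_residual_payload is a hypothetical helper, not part of the API.
# $1 = model name, $2 = asset ID
build_residual_payload() {
  # Assumption: "residual" is sent as a JSON boolean inside "metadata".
  printf '{"metadata":{"format":"wav","name":"%s","residual":true},"assetId":"%s"}\n' \
    "$1" "$2"
}

payload="$(build_residual_payload drums abc123)"
echo "$payload"

# To submit the job, replace <TOKEN> with your API token and run:
# curl -L -X POST 'https://groovy.audioshake.ai/job' \
#   -H 'Content-Type: application/json' \
#   -H 'Authorization: Bearer <TOKEN>' \
#   -d "$payload"
```

The curl call is left commented out so the payload can be inspected before anything is sent.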
Dialogue, Music, & Effects
Name | Model | Description | Credits / Minute | Max Length |
---|---|---|---|---|
Dialogue | dialogue | Isolates speech or vocals from any other sound | 1.5 | 3 Hours |
Effects | effects | Removes dialogue and music but retains the ambience, sound effects, and environmental noise | 1.5 | 3 Hours |
Music removal | music_removal | Removes music from audio while retaining dialogue, background effects, and natural sound | N/A | 1 Hour |
Background (Music & FX) | music_fx | Removes dialogue to extract a clean background stem of music and effects | 1.5 | 3 Hours |
Music detection | music_detection | Detects the portions of an audio file that contain music | 0.5 | 3 Hours |
Multi-Voice | multi_voice | Separates dialogue from multiple speakers in audio recordings, delivering individual audio files per speaker. Available in two_speaker and n_speaker variants, detailed below. | N/A | 1 Hour |
Currently, Music Removal and Multi-Voice separation are not available via the /tasks route. Please contact support@audioshake.ai for access.
Transcription & Alignment
Name | Model | Description | Credits / Minute | Max Length |
---|---|---|---|---|
Transcription | transcription | Text representation of spoken words or audio content | 1 | 1 Hour |
Alignment | alignment | Synchronization of audio and corresponding text or captions | 1 | 1 Hour |
If you run Transcription and Alignment together (T&A), pricing is Premium at 1.5 credits per minute.
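To make the per-minute rates in these tables concrete, here is a rough estimate of credit usage. The helper estimate_credits is hypothetical, and it assumes credits scale linearly with duration (rate times minutes) with no rounding or minimum charge, which this page does not confirm.

```shell
# Sketch only: estimate_credits is a hypothetical helper.
# $1 = credits per minute (from the tables), $2 = audio length in minutes
# Assumption: billing is linear in duration, with no rounding or minimums.
estimate_credits() {
  LC_ALL=C awk -v rate="$1" -v minutes="$2" \
    'BEGIN { printf "%.2f\n", rate * minutes }'
}

estimate_credits 1 10     # transcription alone, 10-minute file: prints 10.00
estimate_credits 1.5 10   # transcription and alignment together: prints 15.00
```

Under these assumptions, running T&A on a 10-minute file costs 15 credits rather than the 10 credits of transcription alone.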
Variants
Certain models offer variants optimized for specific audio processing use cases. To use a variant, include "variant": "<desired_variant>" in the metadata parameters when submitting a job via the API. The available variants are listed below:
Model | Variant | Description | Plan | Credits / Minute |
---|---|---|---|---|
multi_voice | two_speaker | Optimized for separating two speakers. (Default) | Premium | N/A |
multi_voice | n_speaker | Creates stems for any number of speakers. | Advanced | N/A |
vocals | high_quality | Higher quality but longer processing time. | Premium | 1.5 |
instrumental | high_quality | Higher quality but longer processing time. | Premium | 1.5 |
Example request creating a multi_voice job with the n_speaker variant:
curl -L -X POST 'https://groovy.audioshake.ai/job' \
-H 'Content-Type: application/json' \
-H 'Authorization: Bearer <TOKEN>' \
-d '{
"metadata": {
"format": "wav",
"name": "multi_voice",
"variant": "n_speaker"
},
"assetId": "abc123"
}'
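Because only some model and variant combinations appear in the Variants table, it can help to check the pair before submitting. The sketch below does that client-side; build_job_payload is a hypothetical helper, the accepted pairs are copied from the table above, and how the API itself responds to an unlisted pair is not covered here.

```shell
# Sketch only: build_job_payload is a hypothetical helper, not part of the API.
# $1 = model name, $2 = asset ID, $3 = optional variant
build_job_payload() {
  model="$1"
  asset_id="$2"
  variant="${3:-}"
  case "$model:$variant" in
    *:)
      # No variant requested: same shape as the basic job request.
      printf '{"metadata":{"format":"wav","name":"%s"},"assetId":"%s"}\n' \
        "$model" "$asset_id"
      ;;
    multi_voice:two_speaker|multi_voice:n_speaker|vocals:high_quality|instrumental:high_quality)
      # Pair is listed in the Variants table.
      printf '{"metadata":{"format":"wav","name":"%s","variant":"%s"},"assetId":"%s"}\n' \
        "$model" "$variant" "$asset_id"
      ;;
    *)
      echo "unsupported variant '$variant' for model '$model'" >&2
      return 1
      ;;
  esac
}

# Example: the high_quality variant recommended for vocals.
build_job_payload vocals abc123 high_quality
```

The printed JSON can be passed to curl with -d in place of the inline body shown in the requests above.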