AudioShake SDK Documentation
Overview
The AudioShake SDK is a powerful audio processing library that enables high-quality audio source separation. It provides a flexible API for separating audio into multiple stems (e.g., vocals, drums, bass, and other instruments) with support for both CPU and GPU processing.
To access the AudioShake SDK and obtain your Client ID and Client Secret, please contact info@audioshake.ai.
System Requirements
Each SDK has different requirements depending on its target platform. The hardware and software requirements for each platform are listed below.
Hardware Requirements
Linux
- Architecture: x86_64
- For GPU processing:
  - CUDA 11.8 compatible GPU
  - Minimum 4GB VRAM (T4 equivalent recommended)
  - Performance scales with GPU capabilities
Windows
- Architecture: x86_64
- For GPU processing:
  - DirectX 12 compatible GPU
  - Minimum 4GB VRAM (T4 equivalent recommended)
  - Performance scales with GPU capabilities
Android
- Architecture: ARM64
- For GPU processing:
  - OpenGL ES 3.1+ compatible GPU
  - Minimum 2GB VRAM
  - Performance scales with GPU capabilities
Apple (iOS/macOS)
- Architecture: Apple Silicon (Apple ARM) or Intel x86_64
- Neural Engine: Optional
- For GPU processing:
  - Metal-compatible GPU
  - Minimum 2GB VRAM
  - Performance scales with GPU capabilities
Software Dependencies
Platform | Dependencies |
---|---|
Linux | • OpenSSL • libsndfile • CMake 3.22.1 or higher • For GPU processing: CUDA 11.8 |
Windows | • OpenSSL • CMake 3.22.1 or higher • For GPU processing: DirectX 12 SDK |
Android | No dependencies needed |
Apple (iOS/macOS) | No dependencies needed |
Quick Start Guide
Running the demo application
Each SDK zip package includes a README.md file with instructions for integrating and running the provided demo application. The demo is a minimal example of how to use the SDK on that platform. For help with a more sophisticated integration flow, please contact support@audioshake.ai and one of our engineers will assist you.
API Reference
This section describes the core classes and data structures of our SDK so that you can understand how to best use them.
AudioShakeSeparator Class
The AudioShakeSeparator class is the core class for running inference on the device. Its parameters and configuration options are described below.
Constructor
AudioShakeSeparator(
    const char *clientID,         // Your AudioShake client identifier
    const char *clientSecret,     // Your AudioShake client secret key
    void *model,                  // Pointer to the encrypted model data in memory
    unsigned int modelSizeBytes,  // Size of the model data in bytes
    unsigned int inputSamplerate, // Sample rate of input audio (e.g., 44100 for CD quality)
    unsigned int flags = useFastestBackend | inputFloat | inputNonInterleaved | outputFloat | outputNonInterleaved | chunkNormal
);
Parameters
Parameter | Type | Description |
---|---|---|
clientID | const char* | Your unique AudioShake client identifier. Required for authentication. Must be a valid string provided by AudioShake. Cannot be null or empty. |
clientSecret | const char* | Your AudioShake client secret key. Required for secure authentication. Must be a valid string provided by AudioShake. Cannot be null or empty. |
model | void* | Pointer to the encrypted model data in memory. Must be a valid memory address. Should contain the complete model data. Must match the size specified in modelSizeBytes. |
modelSizeBytes | unsigned int | Size of the model data in bytes. Must be greater than 0. Must match the actual size of the model data. Used to validate model integrity. |
inputSamplerate | unsigned int | Sample rate of the input audio in Hz. Common values: 44100 (CD quality), 48000 (Professional audio). Must be a positive integer. Will be automatically resampled if needed. |
flags | unsigned int | Combination of configuration flags. Default: useFastestBackend \| inputFloat \| inputNonInterleaved \| outputFloat \| outputNonInterleaved \| chunkNormal. Can be customized based on your needs. |
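The model and modelSizeBytes parameters expect the complete model already in memory. The following is a minimal sketch of one way to read a model file into a buffer before constructing the separator. The helper name loadModelBytes is hypothetical and not part of the SDK; it only uses the C++ standard library.

```cpp
#include <fstream>
#include <iterator>
#include <limits>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper (not part of the SDK): reads a whole file into memory.
std::vector<char> loadModelBytes(const std::string &path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        throw std::runtime_error("cannot open model file: " + path);
    }
    std::vector<char> bytes((std::istreambuf_iterator<char>(file)),
                            std::istreambuf_iterator<char>());
    // The constructor documents modelSizeBytes as greater than 0.
    if (bytes.empty()) {
        throw std::runtime_error("model file is empty: " + path);
    }
    // modelSizeBytes is an unsigned int, so the size has to fit in one.
    if (bytes.size() > std::numeric_limits<unsigned int>::max()) {
        throw std::runtime_error("model file is too large: " + path);
    }
    return bytes;
}
```

The buffer's data() pointer and its size() (cast to unsigned int) could then be passed as the model and modelSizeBytes arguments. Whether the SDK copies the buffer or expects it to outlive the separator is not stated here, so keeping the vector alive for the separator's lifetime is the cautious choice.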
Configuration Flags
The SDK provides several configuration flags to customize processing:
Backend Options:
- useFastestBackend: Uses fastest available hardware (default)
- useCPUBackendOnly: Forces CPU-only processing
Input Format Options:
- inputFloat: 32-bit floating point input (default)
- inputInt16: 16-bit signed integer input
- inputNonInterleaved: Separate channel buffers (default)
- inputInterleavedStereo: Interleaved stereo format
Output Format Options:
- outputFloat: 32-bit floating point output (default)
- outputInt16: 16-bit signed integer output
- outputNonInterleaved: Separate channel buffers (default)
- outputInterleavedStereo: Interleaved stereo format
Processing Chunk Size Options:
- chunkNormal: Default chunk size (~3 seconds)
- chunk2X: Half chunk size (~1.5 seconds)
- chunk4X: Quarter chunk size (~0.75 seconds)
- chunk8X: Eighth chunk size (~0.375 seconds)
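Because flags is a single unsigned int with a default argument, passing your own value replaces the whole default combination, so each group (backend, input format, output format, chunk size) should be stated explicitly. The sketch below illustrates the bitwise-OR pattern only. The numeric values are invented for illustration and live in a hypothetical sketch namespace; real code should use the constants from the SDK header.

```cpp
namespace sketch {

// HYPOTHETICAL bit values, chosen only so this example compiles on its own.
// The real SDK header defines the actual constants.
enum Flag : unsigned int {
    useFastestBackend       = 1u << 0,
    useCPUBackendOnly       = 1u << 1,
    inputFloat              = 1u << 2,
    inputInt16              = 1u << 3,
    inputNonInterleaved     = 1u << 4,
    inputInterleavedStereo  = 1u << 5,
    outputFloat             = 1u << 6,
    outputInt16             = 1u << 7,
    outputNonInterleaved    = 1u << 8,
    outputInterleavedStereo = 1u << 9,
    chunkNormal             = 1u << 10,
    chunk2X                 = 1u << 11,
    chunk4X                 = 1u << 12,
    chunk8X                 = 1u << 13
};

// One choice per group: CPU only, 16-bit interleaved stereo in and out,
// quarter-size chunks (~0.75 seconds).
constexpr unsigned int int16InterleavedCpuFlags() {
    return useCPUBackendOnly
         | inputInt16 | inputInterleavedStereo
         | outputInt16 | outputInterleavedStereo
         | chunk4X;
}

}  // namespace sketch
```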
Key Methods
Error Handling
const char *getInitializationError();
Returns the initialization error message, or NULL if initialization succeeded.
Backend Information
const char *getBackendName();
Returns the name of the ML engine and hardware being used (e.g., "LiteRT GPU" or "CPU").
Stem Information
const char *getStemName(unsigned char stemIndex); // Index of the stem (0 to getNumberOfStems()-1)
unsigned char getNumberOfStems(); // Returns total number of available stems
Provides information about available stems.
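As an illustration, the two stem methods can be combined to list every stem name. This is a sketch under assumptions: collectStemNames is a hypothetical helper, written as a template so it compiles without the SDK, and FakeSeparator is a stand-in that only mirrors the two documented method signatures.

```cpp
#include <string>
#include <vector>

// Hypothetical helper: works with any object exposing the two documented
// methods getNumberOfStems() and getStemName(unsigned char).
template <typename Separator>
std::vector<std::string> collectStemNames(Separator &separator) {
    std::vector<std::string> names;
    const unsigned char count = separator.getNumberOfStems();
    for (unsigned char index = 0; index < count; ++index) {
        const char *name = separator.getStemName(index);
        names.emplace_back(name != nullptr ? name : "");  // guard against NULL
    }
    return names;
}

// Stand-in for illustration only; the stem names here are made up.
struct FakeSeparator {
    unsigned char getNumberOfStems() { return 2; }
    const char *getStemName(unsigned char stemIndex) {
        return stemIndex == 0 ? "vocals" : "drums";
    }
};
```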
Audio Configuration
unsigned int getNumberOfChannels(); // Returns number of audio channels (typically 2 for stereo)
unsigned int getOutputSamplerate(); // Returns the sample rate of output audio
unsigned int getFramesNeeded(); // Returns number of input frames needed for next processing chunk
Returns audio configuration details.
Processing
int process(
    void **inputChannels,        // Array of pointers to input channel buffers
    unsigned int numInputFrames, // Number of frames to process
    void ***outputChannels       // Pointer to receive array of output channel buffers
);
Processes input audio and returns separated stems.
Return Values:
- Positive integer: Number of output frames processed
- 0: More input is needed
- -1: Error occurred
Process Method Parameters
Parameter | Type | Description |
---|---|---|
inputChannels | void** | Array of pointers to input channel buffers. Each pointer should point to a valid audio buffer. Number of channels should match getNumberOfChannels(). Can be NULL to flush remaining audio (end of file). |
numInputFrames | unsigned int | Number of frames to process. Must be a positive integer. Should not exceed available buffer space. Typical values: 1024, 2048, 4096 frames. Affects processing latency and memory usage. |
outputChannels | void*** | Pointer to receive array of output channel buffers. Will be populated with separated stem data. May be NULL if more input is needed. Each stem will have getNumberOfChannels() channels. Memory must be allocated by the caller. |
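The return values and the NULL-input flush described above suggest a feed-then-flush loop. The sketch below shows that loop against MockSeparator, a hypothetical single-channel stand-in that mimics only the documented contract (0 means more input is needed, a positive value is the number of frames produced, NULL input flushes). Passing 0 frames when flushing and the exact output buffer layout are assumptions; the real SDK may behave differently.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// HYPOTHETICAL stand-in: buffers mono float input and echoes it back in chunks.
struct MockSeparator {
    unsigned int chunkFrames;
    std::vector<float> pending;  // input not yet emitted
    std::vector<float> ready;    // frames returned by the last call
    void *channels[1] = {nullptr};

    explicit MockSeparator(unsigned int frames) : chunkFrames(frames) {}

    unsigned int getFramesNeeded() const {
        return chunkFrames - static_cast<unsigned int>(pending.size());
    }

    int process(void **inputChannels, unsigned int numInputFrames, void ***outputChannels) {
        const bool flushing = (inputChannels == nullptr);
        if (!flushing) {
            const float *source = static_cast<const float *>(inputChannels[0]);
            pending.insert(pending.end(), source, source + numInputFrames);
        }
        if (pending.empty() || (!flushing && pending.size() < chunkFrames)) {
            *outputChannels = nullptr;
            return 0;  // more input is needed
        }
        ready.swap(pending);
        pending.clear();
        channels[0] = ready.data();
        *outputChannels = channels;
        return static_cast<int>(ready.size());
    }
};

// Feeds all input, then flushes. Returns total output frames, or -1 on error.
template <typename Separator>
long separateAll(Separator &separator, const std::vector<float> &mono) {
    long totalFrames = 0;
    std::size_t position = 0;
    while (position < mono.size()) {
        const std::size_t remaining = mono.size() - position;
        const unsigned int frames = static_cast<unsigned int>(
            std::min<std::size_t>(separator.getFramesNeeded(), remaining));
        if (frames == 0) return -1;  // sketch choice: avoid spinning forever
        void *input[1] = {const_cast<float *>(mono.data() + position)};
        void **output = nullptr;
        const int produced = separator.process(input, frames, &output);
        if (produced < 0) return -1;
        totalFrames += produced;  // a real caller would consume `output` here
        position += frames;
    }
    for (;;) {  // end of input: pass NULL until nothing more comes out
        void **output = nullptr;
        const int produced = separator.process(nullptr, 0, &output);
        if (produced < 0) return -1;
        if (produced == 0) break;
        totalFrames += produced;
    }
    return totalFrames;
}
```

With a chunk size of 4 frames and 10 input frames, the mock emits two full chunks during feeding and the remaining 2 frames during the flush.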
SourceSeparationTask - Primary Interface
The SourceSeparationTask class provides a high-level wrapper around the AudioShakeSeparator, making it easier to perform audio source separation tasks. It handles the complexities of audio input/output management and provides progress tracking capabilities. We provide this class to you in our demo code as a helper.
Note: This class is only used for Apple and Windows SDK packages. Android uses the AudioShakeSeparator directly along with ExoPlayer's buffer management to demonstrate media handling. For further information or questions please contact support@audioshake.ai.
Key Components
Input Classes
- SourceSeparationInput (Abstract base class)
- AudioFileReader: Reads audio from files
- RingBufferInput: Reads audio from a ring buffer
Output Classes
- SourceSeparationOutput (Abstract base class)
- WAVOutput: Writes stems to WAV files
- RingBufferOutput: Writes stems to a ring buffer
Usage Example
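#include <iostream>
// Also include the SDK/demo headers that declare SourceSeparationTask,
// AudioFileReader, and WAVOutput.

// Optional progress callback. C++ does not allow nested function definitions,
// so it is defined at namespace scope before it is used.
void progressCallback(SourceSeparationTask* task, double progress, void* clientData) {
    std::cout << "Progress: " << (progress * 100) << "%" << std::endl;
}

int main() {
    // Create input and output handlers (ownership is transferred to the task)
    AudioFileReader* input = new AudioFileReader("input.wav");
    WAVOutput* output = new WAVOutput("output_directory");

    // Create the separation task
    SourceSeparationTask task(
        "your_client_id",
        "your_client_secret",
        input,
        output,
        "path/to/model.crypt"
    );

    // Run the separation and report any error
    if (!task.run(progressCallback)) {
        std::cerr << "Error: " << task.getErrorMessage() << std::endl;
        return 1;
    }
    return 0;
}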
Constructor
SourceSeparationTask(
    const char* clientID,             // Your AudioShake client identifier
    const char* clientSecret,         // Your AudioShake client secret key
    SourceSeparationInput* input,     // Input audio provider (e.g., AudioFileReader)
    SourceSeparationOutput* output,   // Output audio writer (e.g., WAVOutput)
    const PathStr& modelPath,         // Path to the .crypt model file
    void* model = nullptr,            // Optional: pre-loaded model data
    unsigned int modelSizeBytes = 0,  // Optional: size of pre-loaded model
    unsigned int additionalFlags = 0  // Optional: additional AudioShakeSeparator flags
);
Constructor Parameters
Parameter | Type | Description |
---|---|---|
clientID | const char* | Your AudioShake client identifier. Required for authentication. Must be a valid string provided by AudioShake. Cannot be null or empty. |
clientSecret | const char* | Your AudioShake client secret key. Required for secure authentication. Must be a valid string provided by AudioShake. Cannot be null or empty. |
input | SourceSeparationInput* | Pointer to a SourceSeparationInput implementation. Manages audio input data. Can be AudioFileReader or RingBufferInput. Must be a valid, initialized instance. Ownership is transferred to the task. |
output | SourceSeparationOutput* | Pointer to a SourceSeparationOutput implementation. Manages audio output data. Can be WAVOutput or RingBufferOutput. Must be a valid, initialized instance. Ownership is transferred to the task. |
modelPath | const PathStr& | Path to the .crypt model file. Required if model and modelSizeBytes are not provided. Must be a valid file path. File must be readable. |
model | void* | Optional pre-loaded model data. If provided, modelPath is ignored. Must be valid memory if provided. Should be used with modelSizeBytes. |
modelSizeBytes | unsigned int | Size of pre-loaded model data. Required if model is provided. Must match actual model size. Ignored if model is nullptr. |
additionalFlags | unsigned int | Additional configuration flags. Combined with default flags. See AudioShakeSeparator flags. Optional, defaults to 0. |
Progress Tracking
double getProgress(); // Returns progress as a value between 0 and 1
bool isFinished(); // Returns true when all audio has been processed
Processing Methods
bool processOneIteration(
    ProgressCallback progressCallback = nullptr, // Optional callback for progress updates
    void* clientData = nullptr                   // Optional user data passed to callback
);
Processes a single chunk of audio. Returns true on success, false on error.
bool run(
    ProgressCallback progressCallback = nullptr, // Optional callback for progress updates
    void* clientData = nullptr                   // Optional user data passed to callback
);
Processes all audio until completion. Returns true on success, false on error.
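Where run() is too coarse, for example when the caller wants to stop between chunks, processOneIteration() and isFinished() can be combined into a manual loop. The following is a sketch only: runUntilCancelled, RunResult, and FakeTask are hypothetical names, and the loop is templated so it compiles without the SDK. It relies solely on the two documented method signatures.

```cpp
#include <atomic>

enum class RunResult { Finished, Cancelled, Failed };

// Hypothetical helper: drives any task exposing isFinished() and
// processOneIteration(), checking a cancel flag between chunks.
template <typename Task>
RunResult runUntilCancelled(Task &task, const std::atomic<bool> &cancelRequested) {
    while (!task.isFinished()) {
        if (cancelRequested.load()) {
            return RunResult::Cancelled;
        }
        if (!task.processOneIteration()) {
            return RunResult::Failed;  // documented: false means an error occurred
        }
    }
    return RunResult::Finished;
}

// Stand-in for illustration only: finishes after a fixed number of iterations.
struct FakeTask {
    int remaining;
    bool isFinished() { return remaining == 0; }
    bool processOneIteration() {
        --remaining;
        return true;
    }
};
```

Another thread, such as a UI thread, could set the atomic flag to request cancellation; the loop notices it before the next chunk is processed.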
Progress Callback Type
typedef void (*ProgressCallback)(
    SourceSeparationTask* task, // Pointer to the task reporting progress
    double progress,            // Progress value between 0.0 and 1.0
    void* clientData            // User data provided in processOneIteration/run
);
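Since ProgressCallback is a plain function pointer, it cannot capture state the way a lambda with captures can; the clientData pointer is the way to hand your own state to the callback. A minimal sketch, assuming a hypothetical ProgressState struct and recordProgress function. The forward declaration and typedef are repeated from above only so the example compiles without the SDK headers.

```cpp
class SourceSeparationTask;  // forward declaration only; defined by the SDK demo code

typedef void (*ProgressCallback)(SourceSeparationTask *task, double progress, void *clientData);

// Hypothetical caller-owned state, passed through clientData.
struct ProgressState {
    int lastPercent = -1;
    int updates = 0;
};

// Records progress in whole percent and counts only actual changes,
// which avoids redrawing a progress bar for identical values.
void recordProgress(SourceSeparationTask * /*task*/, double progress, void *clientData) {
    ProgressState *state = static_cast<ProgressState *>(clientData);
    if (state == nullptr) {
        return;
    }
    const int percent = static_cast<int>(progress * 100.0);
    if (percent != state->lastPercent) {
        state->lastPercent = percent;
        ++state->updates;
    }
}
```

A call such as task.run(recordProgress, &state) would then deliver the state object to every callback invocation; the state must outlive the call.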
Input Classes
AudioFileReader
AudioFileReader(const char* filePath); // Path to the input audio file
Parameter | Description |
---|---|
filePath | Path to the input audio file. Must be a valid file path. File must exist and be readable. Supports common audio formats (WAV, MP3, etc.) |
RingBufferInput
RingBufferInput(
    size_t size,              // Size of the ring buffer in frames
    bool isStereoInterleaved, // True for interleaved stereo, false for separate channels
    int sampleRate            // Sample rate of the audio data
);
Parameter | Description |
---|---|
size | Size of the ring buffer in frames. Must be a positive integer. Affects latency and memory usage. Should be a power of 2 for optimal performance. |
isStereoInterleaved | Format of stereo data. true : [LRLRLR...] format. false : [LLL...RRR...] format. Must match input data format. |
sampleRate | Sample rate of the audio data. Must be a positive integer. Common values: 44100, 48000. Must match input data sample rate. |
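To make the two layouts concrete, the sketch below converts between interleaved ([LRLRLR...]) and separate-channel ([LLL...] and [RRR...]) stereo data. The names StereoPlanar, deinterleaveStereo, and interleaveStereo are hypothetical helpers for illustration and are not part of the SDK.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Separate-channel (non-interleaved) stereo: one buffer per channel.
struct StereoPlanar {
    std::vector<float> left;
    std::vector<float> right;
};

// [L0 R0 L1 R1 ...] -> left = [L0 L1 ...], right = [R0 R1 ...]
StereoPlanar deinterleaveStereo(const std::vector<float> &interleaved) {
    StereoPlanar planar;
    const std::size_t frames = interleaved.size() / 2;  // a trailing odd sample is dropped
    planar.left.reserve(frames);
    planar.right.reserve(frames);
    for (std::size_t frame = 0; frame < frames; ++frame) {
        planar.left.push_back(interleaved[2 * frame]);
        planar.right.push_back(interleaved[2 * frame + 1]);
    }
    return planar;
}

// left = [L0 L1 ...], right = [R0 R1 ...] -> [L0 R0 L1 R1 ...]
std::vector<float> interleaveStereo(const StereoPlanar &planar) {
    const std::size_t frames = std::min(planar.left.size(), planar.right.size());
    std::vector<float> interleaved;
    interleaved.reserve(frames * 2);
    for (std::size_t frame = 0; frame < frames; ++frame) {
        interleaved.push_back(planar.left[frame]);
        interleaved.push_back(planar.right[frame]);
    }
    return interleaved;
}
```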
Output Classes
WAVOutput
WAVOutput(const char* outputPath); // Directory path for output WAV files
Parameter | Description |
---|---|
outputPath | Directory path for output WAV files. Must be a valid directory path. Directory must exist and be writable. Each stem will be saved as a separate WAV file. |
RingBufferOutput
RingBufferOutput(
    size_t size,             // Size of the ring buffer in frames
    bool isStereoInterleaved // True for interleaved stereo, false for separate channels
);
Parameter | Description |
---|---|
size | Size of the ring buffer in frames. Must be a positive integer. Affects latency and memory usage. Should be a power of 2 for optimal performance. |
isStereoInterleaved | Format of stereo data. true : [LRLRLR...] format. false : [LLL...RRR...] format. Must match desired output format. |
Performance Considerations
Optimization Tips
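- Chunk Size: Choose the chunk size flag that matches your latency requirements. chunkNormal processes ~3 seconds of audio at a time; chunk2X, chunk4X, and chunk8X halve that at each step, down to ~0.375 seconds
- GPU Processing: Keep the default useFastestBackend flag so the SDK can use the fastest available hardware; GPU performance scales with GPU capabilities
- Memory Layout: Use non-interleaved format for better memory access patterns
- Memory Management: Process audio in chunks to manage memory usage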
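To get a feel for what each chunk size flag means in frames, the approximate durations can be converted with simple arithmetic. This is a rough sketch: approxChunkFrames is a hypothetical helper, the 3-second figure is the approximate value quoted for chunkNormal, and it assumes the duration applies at the given sample rate. The exact frame count for the next call should come from getFramesNeeded().

```cpp
// Approximate duration of chunkNormal in seconds, as quoted in this document.
constexpr unsigned int kApproxNormalChunkSeconds = 3;

// divisor: 1 for chunkNormal, 2 for chunk2X, 4 for chunk4X, 8 for chunk8X.
// Returns an estimate only; returns 0 for an invalid divisor of 0.
constexpr unsigned int approxChunkFrames(unsigned int sampleRate, unsigned int divisor) {
    return divisor == 0 ? 0u : (sampleRate * kApproxNormalChunkSeconds) / divisor;
}
```

At 48000 Hz this gives roughly 144000 frames for chunkNormal and 18000 frames for chunk8X, which helps when sizing input and output buffers.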
Error Handling
The SDK provides error information through:
- Constructor initialization errors, reported by getInitializationError() (NULL means success)
- Process method return values (-1 indicates an error)
- Backend-specific error messages
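One way to keep these checks readable is to map the raw values onto named results at the call site. The helpers below are a hypothetical sketch, not SDK API. Treating every negative value as a failure is an assumption, since only -1 is documented as an error.

```cpp
#include <string>

enum class ProcessStatus { ProducedFrames, NeedMoreInput, Failed };

// Maps the documented process() return values to a named status.
// Assumption: any negative value is treated like the documented -1.
inline ProcessStatus classifyProcessResult(int result) {
    if (result > 0) return ProcessStatus::ProducedFrames;
    if (result == 0) return ProcessStatus::NeedMoreInput;
    return ProcessStatus::Failed;
}

// Converts the nullable C string from getInitializationError() into a
// std::string; an empty string means initialization succeeded.
inline std::string initializationErrorOrEmpty(const char *error) {
    return error != nullptr ? std::string(error) : std::string();
}
```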
Platform Support
Supported Platforms
- Linux (x86_64)
- Windows (x86_64)
- Android (ARM64, with platform-specific optimizations)
- Apple (iOS/macOS, Apple Silicon or Intel x86_64)
Platform-Specific Considerations
Platform | Considerations |
---|---|
Linux | Full GPU support with CUDA |
Windows | DirectX support for GPU processing |
Android | Optimized for mobile GPUs |
Apple | Metal and Neural Engine support |
Security
The SDK implements several security features:
- Encrypted model loading
- Secure client authentication
- Protected API access
- Date-based expiry
This documentation provides a comprehensive overview of the AudioShake SDK. For specific implementation details or advanced usage scenarios, please contact support@audioshake.ai.