Coming Soon

SAM Audio API

Use Meta's SAM Audio model as a service. Remove speech, music, or any sound from audio files via a simple REST API - no GPU setup, no Python, no model hosting.

terminal
# 1. Get a presigned upload URL
curl https://api.soundscrub.video/v1/uploads \
  -X POST \
  -H "Authorization: Bearer ss_a1b2c3d4e5f6"
  # => { "upload_id": "upl_9f4c2a1b8e3d", "upload_url": "https://..." }

# 2. Upload your file directly to R2
curl "<upload_url>" \
  -X PUT \
  -H "Content-Type: audio/wav" \
  -T beach_walk.wav

# 3. Start the job
curl https://api.soundscrub.video/v1/jobs \
  -X POST \
  -H "Authorization: Bearer ss_a1b2c3d4e5f6" \
  -H "Content-Type: application/json" \
  -d '{"upload_id":"upl_9f4c2a1b8e3d","description":"background music"}'
  # => { "job_id": "job_d7e5b3a0f1c9", "status": "queued" }

# 4. Poll for status
curl https://api.soundscrub.video/v1/jobs/job_d7e5b3a0f1c9 \
  -H "Authorization: Bearer ss_a1b2c3d4e5f6"
  # => { "status": "complete", "download_url": "https://..." }
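Step 4 usually has to be repeated until the job finishes. The sketch below is one way to wrap it in a loop, not an official client. The names `job_status`, `fetch_job`, `wait_for_job`, and `SOUNDSCRUB_API_KEY` are hypothetical, and the `failed` status is an assumption: only `queued` and `complete` appear in the responses above.

```shell
# Pull the "status" value out of a small JSON response like the ones above.
# Uses sed to avoid a jq dependency; assumes a flat object without escaped quotes.
job_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p'
}

# Hypothetical wrapper around step 4; reads the key from SOUNDSCRUB_API_KEY.
fetch_job() {
  curl -s "https://api.soundscrub.video/v1/jobs/$1" \
    -H "Authorization: Bearer $SOUNDSCRUB_API_KEY"
}

# Poll until the job completes, fails, or the attempt budget runs out.
# Usage: wait_for_job <job_id> [max_attempts] [delay_seconds]
wait_for_job() {
  job_id=$1
  attempts=${2:-60}
  delay=${3:-5}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    response=$(fetch_job "$job_id")
    status=$(printf '%s\n' "$response" | job_status)
    case "$status" in
      complete) printf '%s\n' "$response"; return 0 ;;
      failed)   printf '%s\n' "$response" >&2; return 1 ;;  # assumed status name
    esac
    sleep "$delay"
    i=$((i + 1))
  done
  return 2  # gave up: still not complete after all attempts
}

# Example (not run here): wait_for_job job_d7e5b3a0f1c9 60 5
```

On success the function prints the final response, so the `download_url` can be read from its output. The three return codes let a calling script tell a failed job apart from a timeout.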

Simple, transparent pricing

$0.20 per 30 seconds of audio. Same credit-based system as the desktop app. Credits never expire.
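As a rough worked example of the rate above, the helper below estimates the cost of a clip in cents. It assumes a partial 30-second block is billed as a full block; the page does not say how rounding works, so treat that as a guess. `estimate_cost_cents` is a hypothetical name.

```shell
# Estimate the cost in cents for a clip length given in whole seconds.
# Assumption: partial 30-second blocks are rounded up to a full block.
estimate_cost_cents() {
  seconds=$1
  blocks=$(( (seconds + 29) / 30 ))  # ceiling division by 30
  echo $(( blocks * 20 ))            # 20 cents per block
}

estimate_cost_cents 95   # 4 blocks under the round-up assumption
```

Under that assumption, a 95-second clip counts as four blocks and comes to 80 cents ($0.80).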

Full model access

1:1 parity with the SAM Audio model parameters: text prompts, span prediction, and reranking candidates.

Simple REST API

Upload audio, specify what to remove, poll for status, download clean audio. That's it.

What is SAM Audio?

SAM Audio (Segment Anything Model for Audio) is Meta's open-source AI model for audio source separation. Given a text prompt like "background music" or "people speaking", SAM Audio isolates and separates that sound from the rest of the audio.

Running SAM Audio yourself requires a GPU, a Python environment, and model hosting infrastructure. SoundScrub's SAM Audio API handles all of that: you send audio and a text description over HTTP and get clean audio back. Same model, same parameters, zero infrastructure.

Supported SAM Audio Model Parameters

Parameter             Type    Description
description           string  Text prompt describing the sound to isolate. Lowercase noun/verb phrases recommended.
predict_spans         bool    Automatically predict the temporal spans where the target sound occurs. Default false.
reranking_candidates  int     Number of candidates to generate and rank. Higher values improve quality but increase latency. Default 1.
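The sketch below shows how these parameters might be combined into a job request. It assumes `predict_spans` and `reranking_candidates` are sent in the same JSON body as `upload_id` and `description` on `/v1/jobs`; the page does not show this, so the layout is a guess. `build_job_body` is a hypothetical helper.

```shell
# Build a JSON body for POST /v1/jobs.
# Assumption: the model parameters sit next to upload_id and description.
# Arguments: upload_id, description, predict_spans (true|false), reranking_candidates.
# No JSON escaping is done, so keep quotes and backslashes out of the description.
build_job_body() {
  printf '{"upload_id":"%s","description":"%s","predict_spans":%s,"reranking_candidates":%d}\n' \
    "$1" "$2" "${3:-false}" "${4:-1}"
}

build_job_body upl_9f4c2a1b8e3d "background music" true 4

# Example (not run here): pass the result to curl as the request body.
#   curl https://api.soundscrub.video/v1/jobs -X POST \
#     -H "Authorization: Bearer ss_a1b2c3d4e5f6" \
#     -H "Content-Type: application/json" \
#     -d "$(build_job_body upl_9f4c2a1b8e3d "background music" true 4)"
```

Leaving out the last two arguments falls back to the documented defaults (`false` and `1`).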

Get early access

Join the waitlist and be the first to integrate SoundScrub into your workflow.