Sound Classification


Build models to classify audio and speech.

What You’ll Build

AI models for audio analysis:

Sound classification - Identify sounds (birds, machines, alarms)
Speech-to-text - Transcribe audio to text
Speaker diarization - Identify who spoke when

Prerequisites

A SeeMe.ai account (sign up)
Audio files (WAV, MP3, etc.)
(Optional) A Python environment with the seeme SDK installed

Supported Tasks

Sound Classification
Speech-to-Text
Speaker Diarization

Speech-to-Text Quick Start

from seeme import Client

client = Client()

# Get the Whisper model for speech-to-text
models = client.get_models()
whisper_model = next((m for m in models if "whisper" in m.name.lower()), None)
if whisper_model is None:
    raise RuntimeError("No Whisper model found for this account")

# Transcribe audio
result = client.predict(
    model_id=whisper_model.id,
    item="./audio.wav"
)

# Print the full transcription
print(result.transcription)

# Print each segment with its start time
for segment in result.segments:
    print(f"[{segment.start:.1f}s] {segment.text}")
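
The timestamped loop above can be turned into a small reusable formatter. The following is a minimal sketch, assuming only that each segment exposes `start` (in seconds) and `text`, as the loop above uses them. `Segment`, `format_timestamp`, and `format_transcript` are hypothetical names introduced for illustration and are not part of the seeme SDK.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    """Hypothetical stand-in for one entry of result.segments."""
    start: float  # segment start time in seconds
    text: str


def format_timestamp(seconds: float) -> str:
    """Format an offset in seconds as MM:SS, or H:MM:SS from one hour on."""
    total = int(seconds)  # drop the fractional part
    hours, remainder = divmod(total, 3600)
    minutes, secs = divmod(remainder, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def format_transcript(segments) -> str:
    """Build one timestamped line per segment."""
    return "\n".join(
        f"[{format_timestamp(s.start)}] {s.text.strip()}" for s in segments
    )


# Stand-in data; in practice you would pass result.segments instead.
demo = [Segment(0.0, " Hello and welcome."), Segment(75.4, " Let's begin.")]
print(format_transcript(demo))
```

Because the helpers only read `start` and `text`, they should work on the real segment objects as long as those two attributes are present.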

Supported Formats

WAV, MP3, FLAC, OGG, M4A
Up to 30 minutes per file (longer files are chunked automatically)
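
Before uploading, it can help to check a file against these limits locally. The sketch below uses only the Python standard library and is an illustration, not part of the seeme SDK: `SUPPORTED_EXTENSIONS`, `is_supported`, `wav_duration_seconds`, and `exceeds_single_pass` are hypothetical names, the extension check looks only at the file name, and the duration check covers uncompressed WAV files only (other formats would need a third-party decoder).

```python
import wave
from pathlib import Path

# Extensions taken from the format list above.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a"}
# Files longer than this are chunked, according to the note above.
SINGLE_PASS_LIMIT_SECONDS = 30 * 60


def is_supported(path: str) -> bool:
    """Return True if the file extension is in the supported list."""
    return Path(path).suffix.lower() in SUPPORTED_EXTENSIONS


def wav_duration_seconds(path: str) -> float:
    """Return the duration of a WAV file from its header (frames / rate)."""
    with wave.open(path, "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def exceeds_single_pass(path: str) -> bool:
    """Return True if a WAV file is longer than the 30-minute limit."""
    return wav_duration_seconds(path) > SINGLE_PASS_LIMIT_SECONDS


print(is_supported("meeting.MP3"))
print(is_supported("notes.txt"))
```

A check like this is optional, since longer files are chunked automatically, but it makes it easy to flag unsupported files or unexpectedly long recordings before sending them.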