Post-Processors

Post-processors use AI models to process dataset items automatically: they can transcribe audio, classify content, detect objects, extract named entities, and more. They run whenever new items are added to a dataset.

Post-Processors

Get all post-processors

Get all post-processors for a dataset:

processors = client.get_post_processors(my_dataset.id)

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | str | The dataset id |
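Processors run in ascending `order`, and disabled ones are skipped, so it can be useful to preview the effective pipeline from the returned list. A minimal sketch, with plain dicts standing in for the SDK's processor objects:

```python
def pipeline_order(processors):
    """Return enabled processors sorted by order (lower runs first)."""
    return sorted(
        (p for p in processors if p["enabled"]),
        key=lambda p: p["order"],
    )

processors = [
    {"name": "Summarize Transcript", "enabled": True, "order": 1},
    {"name": "Transcribe Audio", "enabled": True, "order": 0},
    {"name": "Legacy OCR", "enabled": False, "order": 2},
]

for p in pipeline_order(processors):
    print(p["name"])  # Transcribe Audio, then Summarize Transcript
```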

Create a post-processor

Create a post-processor using a SeeMe.ai model:

from seeme.types import CreatePostProcessorRequest, PostProcessorModelType, PostProcessorOutputTarget

processor = CreatePostProcessorRequest(
    name="Audio Transcription",
    description="Transcribe audio files to text",
    model_id=my_stt_model.id,
    model_version=my_stt_model.active_version_id,
    model_type=PostProcessorModelType.STT,
    output_target=PostProcessorOutputTarget.TEXT,
    enabled=True,
    order=0
)

processor = client.create_post_processor(my_dataset.id, processor)

Create a post-processor using an external provider (OpenAI, Anthropic):

processor = CreatePostProcessorRequest(
    name="LLM Summarization",
    description="Summarize text using GPT-4",
    external_provider="openai",
    external_model="gpt-4",
    external_config='{"prompt": "Summarize the following text:"}',
    model_type=PostProcessorModelType.LLM,
    output_target=PostProcessorOutputTarget.TEXT,
    enabled=True,
    order=1
)

processor = client.create_post_processor(my_dataset.id, processor)
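Since `external_config` is a plain JSON string, building it with `json.dumps` avoids hand-escaping quotes; the prompt below is just a placeholder:

```python
import json

# Build the external_config string from a dict rather than writing raw JSON.
external_config = json.dumps({"prompt": "Summarize the following text:"})
print(external_config)  # {"prompt": "Summarize the following text:"}
```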

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | str | The dataset id |
| processor | CreatePostProcessorRequest | The post-processor configuration |

CreatePostProcessorRequest properties:

| Property | Type | Description |
| --- | --- | --- |
| name | str | The post-processor name |
| description | str | Description of what it does |
| enabled | bool | Whether it's active. Default: True |
| order | int | Processing order (lower runs first). Default: 0 |
| model_id | str | SeeMe.ai model id (for internal models) |
| model_version | str | Model version id (for internal models) |
| external_provider | str | External provider: "openai" or "anthropic" |
| external_model | str | External model name (e.g., "gpt-4") |
| external_config | str | JSON config for external models |
| model_type | str | Type of processing (see below) |
| output_target | str | Where to store results: "text", "annotations", "both" |
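The table implies a request should configure exactly one model source: either an internal model (`model_id`/`model_version`) or an external one (`external_provider`/`external_model`). An illustrative client-side sanity check, not part of the SDK:

```python
def processor_field_errors(req):
    """Check that a request dict configures exactly one model source."""
    errors = []
    internal = bool(req.get("model_id"))
    external = bool(req.get("external_provider"))
    if internal and external:
        errors.append("set either model_id or external_provider, not both")
    if not internal and not external:
        errors.append("one of model_id or external_provider is required")
    if external and not req.get("external_model"):
        errors.append("external_model is required with external_provider")
    return errors

print(processor_field_errors({"model_id": "abc123"}))  # []
```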

Model Types

| Type | Description |
| --- | --- |
| stt | Speech-to-text transcription |
| stt-diarization | Speech-to-text with speaker diarization |
| classification | Text or image classification |
| detection | Object detection |
| ner | Named entity recognition |
| ocr | Optical character recognition |
| llm | Large language model processing |
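The SDK exposes these values as the PostProcessorModelType and PostProcessorOutputTarget enums used in the examples above; the raw strings can also be mirrored for a quick client-side check before creating a processor. A sketch:

```python
# String values accepted by model_type and output_target, mirrored from
# the tables above.
MODEL_TYPES = {
    "stt", "stt-diarization", "classification",
    "detection", "ner", "ocr", "llm",
}
OUTPUT_TARGETS = {"text", "annotations", "both"}

def check_processor_strings(model_type, output_target):
    """Return True when both values appear in the tables above."""
    return model_type in MODEL_TYPES and output_target in OUTPUT_TARGETS

print(check_processor_strings("stt", "text"))  # True
```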

Get a post-processor

processor = client.get_post_processor(my_dataset.id, processor_id)

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | str | The dataset id |
| processor_id | str | The post-processor id |

PostProcessorWithModel properties:

| Property | Type | Description |
| --- | --- | --- |
| id | str | Unique id |
| name | str | The post-processor name |
| description | str | Description |
| model_name | str | Name of the associated model |
| model_kind | str | Type of model |
| is_external | bool | Whether it uses an external provider |
| provider_name | str | External provider name, if applicable |
| model_type | str | The processing type |

Update a post-processor

from seeme.types import UpdatePostProcessorRequest

updates = UpdatePostProcessorRequest(
    enabled=False,
    order=2
)

processor = client.update_post_processor(my_dataset.id, processor_id, updates)

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | str | The dataset id |
| processor_id | str | The post-processor id |
| updates | UpdatePostProcessorRequest | Fields to update |

Delete a post-processor

client.delete_post_processor(my_dataset.id, processor_id)

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | str | The dataset id |
| processor_id | str | The post-processor id |

Post-Processor Jobs

When a post-processor runs on a dataset item, it creates a job. You can monitor these jobs to track processing status.
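Before drilling into individual jobs, a quick tally by status shows how a batch is progressing. A small sketch, with dicts standing in for the SDK's job objects:

```python
from collections import Counter

def job_status_counts(jobs):
    """Tally jobs by status, e.g. Counter({'completed': 2, 'failed': 1})."""
    return Counter(job["status"] for job in jobs)

jobs = [
    {"status": "completed"},
    {"status": "completed"},
    {"status": "failed"},
]
print(job_status_counts(jobs))  # Counter({'completed': 2, 'failed': 1})
```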

Get post-processor jobs

jobs = client.get_post_processor_jobs(my_dataset.id)

for job in jobs:
    print(f"{job.name}: {job.status}")

With filtering:

params = {
    "status": "failed",
    "limit": 50
}

failed_jobs = client.get_post_processor_jobs(my_dataset.id, params)

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | str | The dataset id |
| params | dict | Optional query parameters |

PostProcessorJobWithDetails properties:

| Property | Type | Description |
| --- | --- | --- |
| id | str | Unique job id |
| post_processor_id | str | The post-processor that created this job |
| post_processor_name | str | Name of the post-processor |
| dataset_item_id | str | The item being processed |
| item_name | str | Name of the item |
| status | str | Status: "pending", "processing", "completed", "failed" |
| attempts | int | Number of processing attempts |
| max_attempts | int | Maximum retry attempts |
| error | str | Error message if failed |
| result | str | Processing result |
| started_at | str | When processing started |
| completed_at | str | When processing completed |
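Together, status, attempts, and max_attempts tell you whether a failed job still has retries left. An illustrative helper (not part of the SDK), again with dicts standing in for job objects:

```python
def is_retryable(job):
    """A failed job can still be retried while attempts remain."""
    return job["status"] == "failed" and job["attempts"] < job["max_attempts"]

print(is_retryable({"status": "failed", "attempts": 1, "max_attempts": 3}))  # True
print(is_retryable({"status": "failed", "attempts": 3, "max_attempts": 3}))  # False
```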

Example: Audio Transcription Pipeline

# 1. Create a dataset for audio files
dataset = Dataset(
    name="Interview Recordings",
    content_type=DatasetContentType.DOCUMENTS,
    multi_label=False,
    default_splits=True
)
dataset = client.create_dataset(dataset)

# 2. Add a speech-to-text post-processor
stt_processor = CreatePostProcessorRequest(
    name="Transcribe Audio",
    model_id=whisper_model.id,
    model_version=whisper_model.active_version_id,
    model_type=PostProcessorModelType.STT,
    output_target=PostProcessorOutputTarget.TEXT,
    enabled=True,
    order=0
)
client.create_post_processor(dataset.id, stt_processor)

# 3. Add an LLM post-processor for summarization
llm_processor = CreatePostProcessorRequest(
    name="Summarize Transcript",
    external_provider="openai",
    external_model="gpt-4",
    external_config='{"prompt": "Summarize the key points from this transcript:"}',
    model_type=PostProcessorModelType.LLM,
    output_target=PostProcessorOutputTarget.TEXT,
    enabled=True,
    order=1  # Runs after transcription
)
client.create_post_processor(dataset.id, llm_processor)

# 4. Upload audio files - they will be automatically processed
# ... upload dataset items ...

# 5. Monitor processing jobs
jobs = client.get_post_processor_jobs(dataset.id)
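Since uploads trigger processing asynchronously, a polling loop is a simple way to wait for the pipeline to drain. A sketch, not part of the SDK, assuming each job exposes a status attribute as shown earlier:

```python
import time

def wait_for_jobs(fetch_jobs, poll_interval=5.0, timeout=300.0):
    """Poll until no job is pending or processing, then return the job list.

    fetch_jobs is any zero-argument callable, e.g.:
        lambda: client.get_post_processor_jobs(dataset.id)
    """
    deadline = time.monotonic() + timeout
    while True:
        jobs = fetch_jobs()
        if not any(j.status in ("pending", "processing") for j in jobs):
            return jobs
        if time.monotonic() >= deadline:
            raise TimeoutError("post-processor jobs did not finish in time")
        time.sleep(poll_interval)
```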