Post-Processors

Post-processors use AI models to process dataset items automatically: they can transcribe audio, classify content, detect objects, extract named entities, and more. They run whenever new items are added to a dataset.

Post-Processors

Get all post-processors

Get all post-processors for a dataset:

processors = client.get_post_processors(my_dataset.id)

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | str | The dataset id |
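Processors run in ascending `order`, and disabled ones are skipped, so it can be useful to preview the effective pipeline from the returned list. A minimal sketch, with plain dicts standing in for the SDK's processor objects:

```python
def pipeline_order(processors):
    """Return enabled processors sorted by order (lower runs first)."""
    return sorted(
        (p for p in processors if p["enabled"]),
        key=lambda p: p["order"],
    )

processors = [
    {"name": "Summarize Transcript", "enabled": True, "order": 1},
    {"name": "Transcribe Audio", "enabled": True, "order": 0},
    {"name": "Legacy OCR", "enabled": False, "order": 2},
]

for p in pipeline_order(processors):
    print(p["name"])  # Transcribe Audio, then Summarize Transcript
```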

Create a post-processor

Create a post-processor using a SeeMe.ai model:

from seeme.types import CreatePostProcessorRequest, PostProcessorModelType, PostProcessorOutputTarget

processor = CreatePostProcessorRequest(
    name="Audio Transcription",
    description="Transcribe audio files to text",
    model_id=my_stt_model.id,
    model_version=my_stt_model.active_version_id,
    model_type=PostProcessorModelType.STT,
    output_target=PostProcessorOutputTarget.TEXT,
    enabled=True,
    order=0
)

processor = client.create_post_processor(my_dataset.id, processor)

Create a post-processor using an external provider (OpenAI, Anthropic):

processor = CreatePostProcessorRequest(
    name="LLM Summarization",
    description="Summarize text using GPT-4",
    external_provider="openai",
    external_model="gpt-4",
    external_config='{"prompt": "Summarize the following text:"}',
    model_type=PostProcessorModelType.LLM,
    output_target=PostProcessorOutputTarget.TEXT,
    enabled=True,
    order=1
)

processor = client.create_post_processor(my_dataset.id, processor)
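Since `external_config` is a plain JSON string, building it with `json.dumps` avoids hand-escaping quotes; the prompt below is just a placeholder:

```python
import json

# Build the external_config string from a dict rather than writing raw JSON.
external_config = json.dumps({"prompt": "Summarize the following text:"})
print(external_config)  # {"prompt": "Summarize the following text:"}
```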

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | str | The dataset id |
| processor | CreatePostProcessorRequest | The post-processor configuration |

CreatePostProcessorRequest properties:

| Property | Type | Description |
| --- | --- | --- |
| name | str | The post-processor name |
| description | str | Description of what it does |
| enabled | bool | Whether it's active. Default: True |
| order | int | Processing order (lower runs first). Default: 0 |
| model_id | str | SeeMe.ai model id (for internal models) |
| model_version | str | Model version id (for internal models) |
| external_provider | str | External provider: "openai" or "anthropic" |
| external_model | str | External model name (e.g., "gpt-4") |
| external_config | str | JSON config for external models |
| model_type | str | Type of processing (see below) |
| output_target | str | Where to store results: "text", "annotations", "both" |
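The table implies a request should configure exactly one model source: either an internal model (`model_id`/`model_version`) or an external one (`external_provider`/`external_model`). An illustrative client-side sanity check, not part of the SDK:

```python
def processor_field_errors(req):
    """Check that a request dict configures exactly one model source."""
    errors = []
    internal = bool(req.get("model_id"))
    external = bool(req.get("external_provider"))
    if internal and external:
        errors.append("set either model_id or external_provider, not both")
    if not internal and not external:
        errors.append("one of model_id or external_provider is required")
    if external and not req.get("external_model"):
        errors.append("external_model is required with external_provider")
    return errors

print(processor_field_errors({"model_id": "abc123"}))  # []
```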

Model Types

| Type | Description |
| --- | --- |
| stt | Speech-to-text transcription |
| stt-diarization | Speech-to-text with speaker diarization |
| classification | Text or image classification |
| detection | Object detection |
| ner | Named entity recognition |
| ocr | Optical character recognition |
| llm | Large language model processing |
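The SDK exposes these values as the PostProcessorModelType and PostProcessorOutputTarget enums used in the examples above; the raw strings can also be mirrored for a quick client-side check before creating a processor. A sketch:

```python
# String values accepted by model_type and output_target, mirrored from
# the tables above.
MODEL_TYPES = {
    "stt", "stt-diarization", "classification",
    "detection", "ner", "ocr", "llm",
}
OUTPUT_TARGETS = {"text", "annotations", "both"}

def check_processor_strings(model_type, output_target):
    """Return True when both values appear in the tables above."""
    return model_type in MODEL_TYPES and output_target in OUTPUT_TARGETS

print(check_processor_strings("stt", "text"))  # True
```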

Get a post-processor

processor = client.get_post_processor(my_dataset.id, processor_id)

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | str | The dataset id |
| processor_id | str | The post-processor id |

PostProcessorWithModel properties:

| Property | Type | Description |
| --- | --- | --- |
| id | str | Unique id |
| name | str | The post-processor name |
| description | str | Description |
| model_name | str | Name of the associated model |
| model_kind | str | Type of model |
| is_external | bool | Whether it uses an external provider |
| provider_name | str | External provider name, if applicable |
| model_type | str | The processing type |

Update a post-processor

from seeme.types import UpdatePostProcessorRequest

updates = UpdatePostProcessorRequest(
    enabled=False,
    order=2
)

processor = client.update_post_processor(my_dataset.id, processor_id, updates)

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | str | The dataset id |
| processor_id | str | The post-processor id |
| updates | UpdatePostProcessorRequest | Fields to update |

Delete a post-processor

client.delete_post_processor(my_dataset.id, processor_id)

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | str | The dataset id |
| processor_id | str | The post-processor id |

Post-Processor Jobs

When a post-processor runs on a dataset item, it creates a job. You can monitor these jobs to track processing status.
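Before drilling into individual jobs, a quick tally by status shows how a batch is progressing. A small sketch, with dicts standing in for the SDK's job objects:

```python
from collections import Counter

def job_status_counts(jobs):
    """Tally jobs by status, e.g. Counter({'completed': 2, 'failed': 1})."""
    return Counter(job["status"] for job in jobs)

jobs = [
    {"status": "completed"},
    {"status": "completed"},
    {"status": "failed"},
]
print(job_status_counts(jobs))  # Counter({'completed': 2, 'failed': 1})
```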

Get post-processor jobs

jobs = client.get_post_processor_jobs(my_dataset.id)

for job in jobs:
    print(f"{job.name}: {job.status}")

With filtering:

params = {
    "status": "failed",
    "limit": 50
}

failed_jobs = client.get_post_processor_jobs(my_dataset.id, params)

| Parameter | Type | Description |
| --- | --- | --- |
| dataset_id | str | The dataset id |
| params | dict | Optional query parameters |

PostProcessorJobWithDetails properties:

| Property | Type | Description |
| --- | --- | --- |
| id | str | Unique job id |
| post_processor_id | str | The post-processor that created this job |
| post_processor_name | str | Name of the post-processor |
| dataset_item_id | str | The item being processed |
| item_name | str | Name of the item |
| status | str | Status: "pending", "processing", "completed", "failed" |
| attempts | int | Number of processing attempts |
| max_attempts | int | Maximum retry attempts |
| error | str | Error message if failed |
| result | str | Processing result |
| started_at | str | When processing started |
| completed_at | str | When processing completed |
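Together, status, attempts, and max_attempts tell you whether a failed job still has retries left. An illustrative helper (not part of the SDK), again with dicts standing in for job objects:

```python
def is_retryable(job):
    """A failed job can still be retried while attempts remain."""
    return job["status"] == "failed" and job["attempts"] < job["max_attempts"]

print(is_retryable({"status": "failed", "attempts": 1, "max_attempts": 3}))  # True
print(is_retryable({"status": "failed", "attempts": 3, "max_attempts": 3}))  # False
```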

Example: Audio Transcription Pipeline

# 1. Create a dataset for audio files
dataset = Dataset(
    name="Interview Recordings",
    content_type=DatasetContentType.DOCUMENTS,
    multi_label=False,
    default_splits=True
)
dataset = client.create_dataset(dataset)

# 2. Add a speech-to-text post-processor
stt_processor = CreatePostProcessorRequest(
    name="Transcribe Audio",
    model_id=whisper_model.id,
    model_version=whisper_model.active_version_id,
    model_type=PostProcessorModelType.STT,
    output_target=PostProcessorOutputTarget.TEXT,
    enabled=True,
    order=0
)
client.create_post_processor(dataset.id, stt_processor)

# 3. Add an LLM post-processor for summarization
llm_processor = CreatePostProcessorRequest(
    name="Summarize Transcript",
    external_provider="openai",
    external_model="gpt-4",
    external_config='{"prompt": "Summarize the key points from this transcript:"}',
    model_type=PostProcessorModelType.LLM,
    output_target=PostProcessorOutputTarget.TEXT,
    enabled=True,
    order=1  # Runs after transcription
)
client.create_post_processor(dataset.id, llm_processor)

# 4. Upload audio files - they will be automatically processed
# ... upload dataset items ...

# 5. Monitor processing jobs
jobs = client.get_post_processor_jobs(dataset.id)
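Since uploads trigger processing asynchronously, a polling loop is a simple way to wait for the pipeline to drain. A sketch, not part of the SDK, assuming each job exposes a status attribute as shown earlier:

```python
import time

def wait_for_jobs(fetch_jobs, poll_interval=5.0, timeout=300.0):
    """Poll until no job is pending or processing, then return the job list.

    fetch_jobs is any zero-argument callable, e.g.:
        lambda: client.get_post_processor_jobs(dataset.id)
    """
    deadline = time.monotonic() + timeout
    while True:
        jobs = fetch_jobs()
        if not any(j.status in ("pending", "processing") for j in jobs):
            return jobs
        if time.monotonic() >= deadline:
            raise TimeoutError("post-processor jobs did not finish in time")
        time.sleep(poll_interval)
```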