Post-Processors
Post-processors run automatically when new items are added to a dataset, using AI models to transcribe audio, classify content, detect objects, extract named entities, and more.
Get all post-processors
Get all post-processors for a dataset:
```python
processors = client.get_post_processors(my_dataset.id)
```

| Parameter | Type | Description |
|---|---|---|
| dataset_id | str | The dataset id |
Create a post-processor
Create a post-processor using a SeeMe.ai model:
```python
from seeme.types import CreatePostProcessorRequest, PostProcessorModelType, PostProcessorOutputTarget

processor = CreatePostProcessorRequest(
    name="Audio Transcription",
    description="Transcribe audio files to text",
    model_id=my_stt_model.id,
    model_version=my_stt_model.active_version_id,
    model_type=PostProcessorModelType.STT,
    output_target=PostProcessorOutputTarget.TEXT,
    enabled=True,
    order=0
)

processor = client.create_post_processor(my_dataset.id, processor)
```

Create a post-processor using an external provider (OpenAI, Anthropic):
```python
processor = CreatePostProcessorRequest(
    name="LLM Summarization",
    description="Summarize text using GPT-4",
    external_provider="openai",
    external_model="gpt-4",
    external_config='{"prompt": "Summarize the following text:"}',
    model_type=PostProcessorModelType.LLM,
    output_target=PostProcessorOutputTarget.TEXT,
    enabled=True,
    order=1
)

processor = client.create_post_processor(my_dataset.id, processor)
```

| Parameter | Type | Description |
|---|---|---|
| dataset_id | str | The dataset id |
| processor | CreatePostProcessorRequest | The post-processor configuration |
CreatePostProcessorRequest properties:
| Property | Type | Description |
|---|---|---|
| name | str | The post-processor name |
| description | str | Description of what it does |
| enabled | bool | Whether it's active. Default: True |
| order | int | Processing order (lower runs first). Default: 0 |
| model_id | str | SeeMe.ai model id (for internal models) |
| model_version | str | Model version id (for internal models) |
| external_provider | str | External provider: "openai" or "anthropic" |
| external_model | str | External model name (e.g., "gpt-4") |
| external_config | str | JSON config for external models |
| model_type | str | Type of processing (see below) |
| output_target | str | Where to store results: "text", "annotations", "both" |
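The properties above imply two mutually exclusive configurations: internal SeeMe.ai models use `model_id`/`model_version`, while external providers use `external_provider`/`external_model`/`external_config`. As a rough client-side sketch (the `processor_mode` helper below is hypothetical, not part of the SDK):

```python
# Hypothetical helper (not part of the SDK): decide whether a
# post-processor configuration targets an internal SeeMe.ai model
# or an external provider, mirroring the property split above.
def processor_mode(config: dict) -> str:
    if config.get("model_id"):
        return "internal"
    if config.get("external_provider"):
        return "external"
    raise ValueError("set either model_id or external_provider")

internal = processor_mode({"model_id": "m-123", "model_version": "v-1"})
external = processor_mode({"external_provider": "openai", "external_model": "gpt-4"})
```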
Model Types
| Type | Description |
|---|---|
| stt | Speech-to-text transcription |
| stt-diarization | Speech-to-text with speaker diarization |
| classification | Text or image classification |
| detection | Object detection |
| ner | Named entity recognition |
| ocr | Optical character recognition |
| llm | Large language model processing |
Get a post-processor
```python
processor = client.get_post_processor(my_dataset.id, processor_id)
```

| Parameter | Type | Description |
|---|---|---|
| dataset_id | str | The dataset id |
| processor_id | str | The post-processor id |
PostProcessorWithModel properties:
| Property | Type | Description |
|---|---|---|
| id | str | Unique id |
| name | str | The post-processor name |
| description | str | Description |
| model_name | str | Name of the associated model |
| model_kind | str | Type of model |
| is_external | bool | Whether it uses an external provider |
| provider_name | str | External provider name if applicable |
| model_type | str | The processing type |
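Since `is_external` determines whether `model_name` or `provider_name` is populated, code inspecting a processor can branch on it. A minimal sketch using a plain dict in place of a `PostProcessorWithModel` object (`describe_processor` is a hypothetical helper, not an SDK function):

```python
# Hypothetical helper: one-line summary built from the
# PostProcessorWithModel fields documented above.
def describe_processor(p: dict) -> str:
    source = p["provider_name"] if p["is_external"] else p["model_name"]
    return f"{p['name']} ({p['model_type']} via {source})"

summary = describe_processor({
    "name": "LLM Summarization",
    "model_type": "llm",
    "is_external": True,
    "provider_name": "openai",
    "model_name": None,
})
# summary -> "LLM Summarization (llm via openai)"
```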
Update a post-processor
```python
from seeme.types import UpdatePostProcessorRequest

updates = UpdatePostProcessorRequest(
    enabled=False,
    order=2
)

processor = client.update_post_processor(my_dataset.id, processor_id, updates)
```

| Parameter | Type | Description |
|---|---|---|
| dataset_id | str | The dataset id |
| processor_id | str | The post-processor id |
| updates | UpdatePostProcessorRequest | Fields to update |
Delete a post-processor
```python
client.delete_post_processor(my_dataset.id, processor_id)
```

| Parameter | Type | Description |
|---|---|---|
| dataset_id | str | The dataset id |
| processor_id | str | The post-processor id |
Post-Processor Jobs
When a post-processor runs on a dataset item, it creates a job. You can monitor these jobs to track processing status.
Get post-processor jobs
```python
jobs = client.get_post_processor_jobs(my_dataset.id)

for job in jobs:
    print(f"{job.post_processor_name}: {job.status}")
```

With filtering:

```python
params = {
    "status": "failed",
    "limit": 50
}

failed_jobs = client.get_post_processor_jobs(my_dataset.id, params)
```

| Parameter | Type | Description |
|---|---|---|
| dataset_id | str | The dataset id |
| params | dict | Optional query parameters |
PostProcessorJobWithDetails properties:
| Property | Type | Description |
|---|---|---|
| id | str | Unique job id |
| post_processor_id | str | The post-processor that created this job |
| post_processor_name | str | Name of the post-processor |
| dataset_item_id | str | The item being processed |
| item_name | str | Name of the item |
| status | str | Status: "pending", "processing", "completed", "failed" |
| attempts | int | Number of processing attempts |
| max_attempts | int | Maximum retry attempts |
| error | str | Error message if failed |
| result | str | Processing result |
| started_at | str | When processing started |
| completed_at | str | When processing completed |
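With the `status`, `item_name`, and `error` fields it is straightforward to tally job outcomes client-side. A sketch with plain dicts standing in for `PostProcessorJobWithDetails` objects (`summarize_jobs` is a hypothetical helper, not an SDK function):

```python
from collections import Counter

# Sketch: tally job outcomes client-side using only documented fields.
def summarize_jobs(jobs):
    counts = Counter(job["status"] for job in jobs)
    errors = [(job["item_name"], job["error"])
              for job in jobs if job["status"] == "failed"]
    return counts, errors

jobs = [
    {"status": "completed", "item_name": "a.wav", "error": None},
    {"status": "failed", "item_name": "b.wav", "error": "timeout"},
    {"status": "pending", "item_name": "c.wav", "error": None},
]
counts, errors = summarize_jobs(jobs)
# counts -> Counter({"completed": 1, "failed": 1, "pending": 1})
# errors -> [("b.wav", "timeout")]
```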
Example: Audio Transcription Pipeline
```python
from seeme.types import (
    CreatePostProcessorRequest,
    Dataset,
    DatasetContentType,
    PostProcessorModelType,
    PostProcessorOutputTarget,
)

# 1. Create a dataset for audio files
dataset = Dataset(
    name="Interview Recordings",
    content_type=DatasetContentType.DOCUMENTS,
    multi_label=False,
    default_splits=True
)
dataset = client.create_dataset(dataset)

# 2. Add a speech-to-text post-processor
stt_processor = CreatePostProcessorRequest(
    name="Transcribe Audio",
    model_id=whisper_model.id,
    model_version=whisper_model.active_version_id,
    model_type=PostProcessorModelType.STT,
    output_target=PostProcessorOutputTarget.TEXT,
    enabled=True,
    order=0
)
client.create_post_processor(dataset.id, stt_processor)

# 3. Add an LLM post-processor for summarization
llm_processor = CreatePostProcessorRequest(
    name="Summarize Transcript",
    external_provider="openai",
    external_model="gpt-4",
    external_config='{"prompt": "Summarize the key points from this transcript:"}',
    model_type=PostProcessorModelType.LLM,
    output_target=PostProcessorOutputTarget.TEXT,
    enabled=True,
    order=1  # Runs after transcription
)
client.create_post_processor(dataset.id, llm_processor)

# 4. Upload audio files - they will be automatically processed
# ... upload dataset items ...

# 5. Monitor processing jobs
jobs = client.get_post_processor_jobs(dataset.id)
```
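Because jobs move from "pending"/"processing" to a terminal "completed" or "failed" status, step 5 can be turned into a polling loop. A minimal sketch: `wait_for_jobs` is a hypothetical helper (not part of the SDK), and the stub jobs at the bottom stand in for the real `client.get_post_processor_jobs(dataset.id)` call.

```python
import time
from types import SimpleNamespace

# Hypothetical helper: poll until every job reaches a terminal status.
# fetch_jobs is any callable returning the current job list, e.g.
# lambda: client.get_post_processor_jobs(dataset.id)
def wait_for_jobs(fetch_jobs, poll_seconds=5.0, timeout_seconds=300.0):
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        jobs = fetch_jobs()
        if all(job.status in ("completed", "failed") for job in jobs):
            return jobs
        time.sleep(poll_seconds)
    raise TimeoutError("post-processor jobs did not finish in time")

# Demo with stub jobs; real code would pass the client call above.
finished = wait_for_jobs(
    lambda: [SimpleNamespace(status="completed")],
    poll_seconds=0.0,
    timeout_seconds=1.0,
)
```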