# Model Distillation
Train a small, fast model that matches the quality of a large, expensive model. Use the large model as a “teacher” to label data, then train a “student” model on those labels.
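The teacher-labels-data, student-trains-on-labels loop can be sketched end to end in plain Python. This is a toy illustration, not the SeeMe API: the "teacher" is a fixed decision rule standing in for a large model, and the "student" is a tiny perceptron trained on the teacher's labels.

```python
import random

# Stand-in "teacher": imagine an expensive, accurate model; here just a rule.
def teacher_predict(x):
    return 1 if x[0] + 0.5 * x[1] > 0 else 0

random.seed(0)
unlabeled = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(2000)]

# Step 1: use the teacher to label the unlabeled pool.
labeled = [(x, teacher_predict(x)) for x in unlabeled]

# Step 2: train a tiny "student" (a perceptron) on the teacher's labels.
w, b, lr = [0.0, 0.0], 0.0, 0.1
for _ in range(20):
    for x, y in labeled:
        pred = 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0
        err = y - pred
        w[0] += lr * err * x[0]
        w[1] += lr * err * x[1]
        b += lr * err

# Step 3: measure teacher/student agreement on held-out points.
test = [(random.uniform(-1, 1), random.uniform(-1, 1)) for _ in range(500)]
agree = sum(
    (1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0) == teacher_predict(x)
    for x in test
)
print(f"student/teacher agreement: {agree / len(test):.1%}")
```

The rest of this guide follows the same three steps, with real models and the platform handling labeling, training, and evaluation.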
## Why Distillation?
```mermaid
graph LR
    subgraph "Before Distillation"
        A[Large Model<br/>Accurate but slow<br/>$$$] --> B[Predictions]
    end
    subgraph "After Distillation"
        C[Small Model<br/>Same accuracy<br/>10-100x faster<br/>$] --> D[Predictions]
    end
```

| Metric | Large Model (Teacher) | Small Model (Student) |
|---|---|---|
| Accuracy | 95% | 93% |
| Latency | 500ms | 10ms |
| Cost per 1000 predictions | $10 | $0.10 |
| Runs on mobile | No | Yes |
| GPU required | Yes | No |
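With the illustrative per-prediction costs above, you can estimate when distillation pays for itself. The one-off distillation cost below (teacher labeling plus student training) is an assumed figure; plug in your own.

```python
# Illustrative numbers from the table above.
teacher_cost_per_1k = 10.00
student_cost_per_1k = 0.10
distillation_cost = 500.00  # assumed one-off cost: teacher labeling + training

savings_per_1k = teacher_cost_per_1k - student_cost_per_1k
break_even_preds = distillation_cost / savings_per_1k * 1000
print(f"break-even after ~{break_even_preds:,.0f} predictions")
```

Past the break-even point, every prediction served by the student is nearly pure savings.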
## When to Use Distillation
**✅ Good candidates:**
- You have a large model (or LLM) that works well but is too slow/expensive
- You need to deploy to mobile, edge, or high-throughput APIs
- You want to reduce inference costs
- You have unlabeled data you can label with the teacher
**❌ Not ideal when:**
- You already have enough labeled data for direct training
- The task is too complex for a small model
- Teacher model accuracy is insufficient
## The Distillation Process
```mermaid
graph TD
    A[1. Select Teacher Model] --> B[2. Label Data with Teacher]
    B --> C[3. Review Labels]
    C --> D[4. Train Student Model]
    D --> E[5. Evaluate Both]
    E --> F{Student good enough?}
    F -->|Yes| G[6. Deploy Student]
    F -->|No| H[Add more data]
    H --> B
```
## Quick Start

### Complete Example
```python
from seeme import Client
import glob
import time

client = Client()

# --- Step 1: Setup ---
# Teacher: large, accurate model (or LLM)
teacher_model = client.get_model("large-classifier-id")  # or an LLM

# Dataset for distillation
dataset = client.create_dataset(
    name="Distillation Dataset",
    task_type="image_classification"
)
version = client.create_dataset_version(dataset_id=dataset.id, name="v1")
# Train split referenced when uploading items below
train_split = client.create_split(version_id=version.id, name="train")

# Create labels
for label in ["cat", "dog", "bird", "other"]:
    client.create_label(version_id=version.id, name=label)

# --- Step 2: Label data with teacher ---
processor = client.create_post_processor(
    dataset_id=dataset.id,
    name="Teacher Labeler",
    model_type="classification",  # or "llm" for an LLM teacher
    model_id=teacher_model.id,
    output_target="annotations",
    confidence_threshold=0.8,
    auto_create_labels=True,
    enabled=True
)

# Upload unlabeled images
for image_path in glob.glob("./unlabeled/*.jpg"):
    client.create_dataset_item(
        version_id=version.id,
        split_id=train_split.id,
        file_path=image_path
    )

# Wait for labeling to complete
while True:
    jobs = client.get_post_processor_jobs(dataset_id=dataset.id, status="pending")
    if len(jobs) == 0:
        break
    time.sleep(10)
print("Labeling complete!")

# --- Step 3: Review labels (in the web UI or programmatically) ---
# Spot-check a sample of teacher labels and correct any errors.

# --- Step 4: Train the student ---
student_job = client.create_job(
    dataset_id=dataset.id,
    version_id=version.id,
    name="Student: MobileNet",
    config={
        "base_model": "mobilenet_v2",  # small, fast architecture
        "epochs": 30,
        "learning_rate": 0.001,
        "batch_size": 32
    }
)

# Wait for training
while student_job.status in ["pending", "running"]:
    time.sleep(30)
    student_job = client.get_job(student_job.id)

student_model = client.get_model(student_job.model_id)
print(f"Student trained! Accuracy: {student_job.best_metrics['val_accuracy']:.2%}")

# --- Step 5: Compare teacher vs. student ---
# Evaluate both on a held-out set with ground-truth (human) labels
val_dataset = client.get_dataset("validation-dataset-id")

teacher_results = client.evaluate_model(
    model_id=teacher_model.id,
    dataset_id=val_dataset.id,
    split="test"
)
student_results = client.evaluate_model(
    model_id=student_model.id,
    dataset_id=val_dataset.id,
    split="test"
)

print("\nComparison:")
print(f"Teacher accuracy: {teacher_results['accuracy']:.2%}")
print(f"Student accuracy: {student_results['accuracy']:.2%}")
print(f"Gap: {teacher_results['accuracy'] - student_results['accuracy']:.2%}")

# --- Step 6: Deploy the student if it is good enough ---
if student_results['accuracy'] > 0.90:  # your threshold
    optimized = client.optimize_model(
        model_id=student_model.id,
        target_format="onnx",
        quantize=True
    )
    client.deploy_model(model_id=optimized.id, name="Production Classifier")
    print("Student deployed!")
```

## Teacher Model Options
| Teacher Type | Best For | Example |
|---|---|---|
| Large classifier | When you have a big pre-trained model | EfficientNet-B4, ViT-Large |
| LLM (Ollama) | Flexible labeling with reasoning | Llama, Mistral, Mixtral |
| External LLM | Best accuracy, highest cost | GPT-4, Claude |
| Ensemble | Maximum accuracy | Multiple models voting |
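The ensemble row above typically means majority voting across several teachers' predictions. A minimal sketch of that voting logic (the label strings are hypothetical):

```python
from collections import Counter

def majority_vote(votes):
    """Return the label a strict majority of teachers agree on, else None."""
    label, count = Counter(votes).most_common(1)[0]
    if count > len(votes) / 2:
        return label
    # No strict majority: flag the item for human review instead.
    return None

print(majority_vote(["cat", "cat", "dog"]))   # cat
print(majority_vote(["cat", "dog", "bird"]))  # None
```

Items with no majority are exactly the ones worth routing to human review rather than feeding to the student.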
### Using an LLM as Teacher
```python
# An LLM teacher can handle complex classification with reasoning.
# llm_model: a previously registered LLM (e.g. served via Ollama)
processor = client.create_post_processor(
    dataset_id=dataset.id,
    name="LLM Teacher",
    model_type="llm",
    model_id=llm_model.id,
    prompt="""
Classify this image into exactly one category:
- cat: Any cat, kitten, or feline
- dog: Any dog, puppy, or canine
- bird: Any bird species
- other: Anything else
Look carefully at the image and return only the category name.
""",
    output_target="annotations",
    auto_create_labels=True
)
```

## Student Model Selection
Choose the smallest model that can still learn the task:
| Task | Teacher | Student | Typical Gap |
|---|---|---|---|
| Image Classification | ViT-Large | MobileNet v2 | 2-5% |
| Image Classification | EfficientNet-B4 | EfficientNet-B0 | 1-3% |
| Object Detection | YOLOv5-L | YOLOv5-S | 3-8% |
| Text Classification | BERT-Large | DistilBERT | 1-3% |
| Text Classification | GPT-4 | DistilBERT | 3-7% |
## Best Practices
- **Use confidence thresholds** - Only keep high-confidence teacher predictions
- **Review a sample** - Teacher labels aren’t perfect; review 5-10%
- **Evaluate on ground truth** - Compare both models on a human-labeled test set
- **Iterate** - Add more data where the student struggles
- **Right-size the student** - Too small = can’t learn; too big = no benefit
- **Don’t forget inference cost** - The whole point is to reduce it
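The first two practices combine into a simple filtering step: keep high-confidence teacher labels for training and route the rest to review. The prediction tuples below are hypothetical stand-ins for whatever shape your teacher's output takes.

```python
# Hypothetical teacher predictions: (item_id, label, confidence).
teacher_preds = [
    ("img_001", "cat", 0.97),
    ("img_002", "dog", 0.55),
    ("img_003", "bird", 0.91),
    ("img_004", "cat", 0.62),
]

CONFIDENCE_THRESHOLD = 0.8

# High-confidence labels go straight into the student's training set;
# the rest are sent to human review rather than discarded.
keep = [p for p in teacher_preds if p[2] >= CONFIDENCE_THRESHOLD]
review = [p for p in teacher_preds if p[2] < CONFIDENCE_THRESHOLD]

print(f"auto-labeled: {len(keep)}, sent to review: {len(review)}")
```

Reviewing the low-confidence bucket first is usually the cheapest way to find the teacher's systematic mistakes.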