Deploy Student Model

Deploy your distilled student model to production.

Optimize for Production

Before deployment, optimize the model for inference:
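The exact optimization call depends on your platform, but the INT8 quantization referenced below comes down to rescaling weights into 8-bit integers. As an illustration, here is a minimal NumPy sketch of symmetric per-tensor weight quantization (the function names are illustrative, not a platform API):

```python
import numpy as np

def quantize_int8(weights: np.ndarray):
    """Symmetric per-tensor INT8 quantization: w ≈ scale * q."""
    scale = np.abs(weights).max() / 127.0
    q = np.clip(np.round(weights / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    return q.astype(np.float32) * scale

w = np.random.randn(256, 256).astype(np.float32)
q, scale = quantize_int8(w)
w_hat = dequantize(q, scale)

print(f"Storage: {w.nbytes} -> {q.nbytes} bytes (4x smaller)")
print(f"Max round-trip error: {np.abs(w - w_hat).max():.4f}")
```

The 4x storage reduction is where most of the deployment savings come from; the round-trip error is what the accuracy check in the next section is guarding against.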

Verify Optimized Model

Ensure optimization didn’t hurt accuracy:

# Evaluate optimized model on test set
optimized_results = client.evaluate_model(
    model_id=optimized_model.id,
    dataset_id=test_dataset.id,
    version_id=test_version.id,
    split="test"
)

print("\nPost-Optimization Accuracy Check")
print("-" * 50)
print(f"Original student: {student_results['accuracy']:.2%}")
print(f"Optimized student: {optimized_results['accuracy']:.2%}")
print(f"Accuracy loss: {student_results['accuracy'] - optimized_results['accuracy']:.2%}")

if student_results['accuracy'] - optimized_results['accuracy'] > 0.01:
    print("⚠️  Warning: Optimization caused >1% accuracy loss")
    print("   Consider: FP16 quantization instead of INT8")

Deploy to Cloud API
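Deployment parameters vary by platform; as a sketch, the settings you would typically pass to a deploy call look like this (all field names here are illustrative, not a documented API):

```python
# Hypothetical deployment settings — map these onto your platform's deploy call.
deployment_config = {
    "model_id": "optimized-student-v1",  # placeholder identifier
    "instance_type": "cpu-small",        # an INT8 student usually fits on CPU
    "min_replicas": 1,
    "max_replicas": 4,                   # scale out under request load
    "max_error_rate": 0.05,              # roll back if >5% of requests fail
}

print("Deploying:", deployment_config["model_id"])
```

Note the CPU instance type: a key payoff of distillation is that the student no longer needs the GPU the teacher required.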

Use the Deployed Model
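Assuming the deployed endpoint accepts a JSON body with a base64-encoded input (a common pattern, not a documented contract for this platform), a request body can be built like this:

```python
import base64
import json

def build_request(file_bytes: bytes, model_version: str = "student-v1") -> str:
    """Build a JSON request body for a hypothetical prediction endpoint."""
    return json.dumps({
        "model_version": model_version,
        "input": base64.b64encode(file_bytes).decode("ascii"),
    })

# In practice: file_bytes = open(path, "rb").read()
body = build_request(b"example-bytes")
print(body)
```

Pinning `model_version` in each request makes later rollbacks and A/B comparisons traceable in your logs.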

Export for Edge/Mobile

For on-device deployment:
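A quick sanity check before exporting is whether the student fits the device's storage and memory budget. Serialized size can be estimated from parameter count and bit width (a rough rule of thumb that ignores container overhead; the helper is illustrative):

```python
def model_size_mb(n_params: int, bits_per_param: int) -> float:
    """Approximate serialized model size, ignoring format overhead."""
    return n_params * bits_per_param / 8 / 1e6

# Example: a 5M-parameter student at different precisions
for bits, name in [(32, "FP32"), (16, "FP16"), (8, "INT8")]:
    print(f"{name}: ~{model_size_mb(5_000_000, bits):.1f} MB")
```

At INT8, a 5M-parameter student is roughly 5 MB, comfortably within most mobile app size budgets.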

Retire the Teacher

Once the student is deployed and validated in production:

# Disable teacher post-processor (stop paying for teacher inference)
client.update_post_processor(
    processor_id=teacher_processor.id,
    enabled=False
)

print("Teacher post-processor disabled")

# Optionally: Update to use student for future labeling
# (active learning - student labels, humans review, retrain)
student_processor = client.create_post_processor(
    dataset_id=dataset.id,
    name="Student Auto-Labeler",
    model_type="classification",
    model_id=optimized_model.id,
    output_target="annotations",
    confidence_threshold=0.9,  # High threshold for auto-labeling
    enabled=True
)

Monitor Production Performance

Set up monitoring to catch accuracy degradation:

# Log predictions for monitoring
from datetime import datetime

def predict_with_logging(client, model_id, file_path):
    result = client.predict(model_id=model_id, item=file_path)

    # Log for monitoring
    client.log_prediction(
        model_id=model_id,
        prediction=result.prediction,
        confidence=result.confidence,
        metadata={
            "file": file_path,
            "timestamp": datetime.now().isoformat()
        }
    )

    return result

# Set up alerts for low confidence predictions
alert_config = {
    "low_confidence_threshold": 0.7,
    "low_confidence_alert_pct": 0.1,  # Alert if >10% predictions are low confidence
    "drift_detection": True
}

client.configure_model_monitoring(
    model_id=optimized_model.id,
    config=alert_config
)

Continuous Improvement

Keep improving the student over time:

graph LR
    A[Student in Production] --> B[Collect Low-Confidence Predictions]
    B --> C[Human Review]
    C --> D[Add to Training Data]
    D --> E[Retrain Student]
    E --> F[Evaluate]
    F --> G{Better?}
    G -->|Yes| H[Deploy New Student]
    G -->|No| A
    H --> A

# Collect predictions for review
low_conf_predictions = client.get_predictions(
    model_id=optimized_model.id,
    max_confidence=0.7,
    min_date="2024-01-01",
    limit=500
)

print(f"Found {len(low_conf_predictions)} low-confidence predictions to review")

# Add to training dataset for review
for pred in low_conf_predictions:
    client.create_dataset_item(
        version_id=review_version.id,
        split_id=review_split.id,
        file_path=pred.input_path,
        metadata={
            "source": "production_low_confidence",
            "original_prediction": pred.prediction,
            "original_confidence": pred.confidence
        }
    )

# After human review → retrain → evaluate → deploy if better

Deployment Checklist

Before deploying:

  • Optimized model (ONNX, quantized)
  • Verified accuracy after optimization
  • Tested inference latency
  • Set up monitoring and alerts
  • Documented model version and training data
  • Rollback plan in place

After deploying:

  • Verify predictions in production
  • Monitor confidence distribution
  • Track latency and throughput
  • Set up feedback collection
  • Schedule periodic retraining

Summary

You’ve completed the full distillation pipeline:

  1. ✅ Set up a teacher model
  2. ✅ Labeled data with the teacher
  3. ✅ Trained a small student model
  4. ✅ Evaluated and compared both models
  5. ✅ Deployed the optimized student

Results:

  • Smaller model (10-50x)
  • Faster inference (10-100x)
  • Lower cost (10-1000x)
  • Similar accuracy (within 2-5%)

Related Guides