Deploy Student Model
Deploy your distilled student model to production.
Optimize for Production
Before deployment, optimize the model for inference:
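The snippet below is a minimal sketch of this step. The `client.optimize_model` call and its `format`/`quantization` parameters are assumptions (later sections only show that an `optimized_model` object with an `id` exists), and `student_model` is assumed to be the student trained in the previous step; substitute your SDK's actual optimization API.

# Export the trained student to ONNX and quantize for faster inference.
# NOTE: client.optimize_model, format, and quantization are assumed names,
# not a confirmed API — check your SDK reference for the real call.
optimized_model = client.optimize_model(
    model_id=student_model.id,   # student trained in the previous step
    format="onnx",               # portable, runtime-agnostic format
    quantization="int8"          # smaller weights; try "fp16" if accuracy drops
)
print(f"Optimized model: {optimized_model.id}")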
Verify Optimized Model
Ensure optimization didn’t hurt accuracy:
# Evaluate optimized model on test set
optimized_results = client.evaluate_model(
    model_id=optimized_model.id,
    dataset_id=test_dataset.id,
    version_id=test_version.id,
    split="test"
)
print("\nPost-Optimization Accuracy Check")
print("-" * 50)
print(f"Original student: {student_results['accuracy']:.2%}")
print(f"Optimized student: {optimized_results['accuracy']:.2%}")
print(f"Accuracy loss: {student_results['accuracy'] - optimized_results['accuracy']:.2%}")
if student_results['accuracy'] - optimized_results['accuracy'] > 0.01:
    print("⚠️ Warning: Optimization caused >1% accuracy loss")
    print("   Consider: FP16 quantization instead of INT8")

Deploy to Cloud API
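A sketch of the deployment call, assuming the platform exposes a `deploy_model` endpoint with autoscaling options (the function name, `min_replicas`/`max_replicas`, and `endpoint_url` are assumptions, not confirmed API):

# Hypothetical deployment call — deploy_model and its parameters are
# assumed names; adapt to your platform's actual deploy API.
deployment = client.deploy_model(
    model_id=optimized_model.id,
    name="student-classifier-v1",
    min_replicas=1,    # keep one warm replica
    max_replicas=4     # scale out under load
)
print(f"Serving at: {deployment.endpoint_url}")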
Use the Deployed Model
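A single prediction against the deployed student looks like this (`client.predict` and the `result.prediction` / `result.confidence` fields match their use in the monitoring section below; the input path is illustrative):

# Run one item through the deployed student model
result = client.predict(
    model_id=optimized_model.id,
    item="samples/example_001.jpg"   # illustrative input path
)
print(f"Prediction: {result.prediction} ({result.confidence:.2%} confidence)")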
Export for Edge/Mobile
For on-device deployment:
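A sketch of an edge export, assuming the SDK offers an `export_model` call with mobile formats (`export_model`, the `format` values, and `download` are assumptions; consult your SDK for the supported targets):

# Hypothetical export for on-device runtimes — export_model and its
# format values are assumed names, not confirmed API.
export = client.export_model(
    model_id=optimized_model.id,
    format="tflite"   # e.g. "coreml" for iOS, "tflite" for Android
)
export.download("student_model.tflite")   # bundle this file with the app
print("Exported student for on-device inference")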
Retire the Teacher
Once the student is deployed and validated in production:
# Disable teacher post-processor (stop paying for teacher inference)
client.update_post_processor(
    processor_id=teacher_processor.id,
    enabled=False
)
print("Teacher post-processor disabled")
# Optionally: Update to use student for future labeling
# (active learning - student labels, humans review, retrain)
student_processor = client.create_post_processor(
    dataset_id=dataset.id,
    name="Student Auto-Labeler",
    model_type="classification",
    model_id=optimized_model.id,
    output_target="annotations",
    confidence_threshold=0.9,  # High threshold for auto-labeling
    enabled=True
)

Monitor Production Performance
Set up monitoring to catch accuracy degradation:
from datetime import datetime

# Log predictions for monitoring
def predict_with_logging(client, model_id, file_path):
    result = client.predict(model_id=model_id, item=file_path)
    # Log prediction and confidence for later monitoring
    client.log_prediction(
        model_id=model_id,
        prediction=result.prediction,
        confidence=result.confidence,
        metadata={
            "file": file_path,
            "timestamp": datetime.now().isoformat()
        }
    )
    return result
# Set up alerts for low confidence predictions
alert_config = {
    "low_confidence_threshold": 0.7,
    "low_confidence_alert_pct": 0.1,  # Alert if >10% of predictions are low confidence
    "drift_detection": True
}
client.configure_model_monitoring(
    model_id=optimized_model.id,
    config=alert_config
)

Continuous Improvement
Keep improving the student over time:
graph LR
    A[Student in Production] --> B[Collect Low-Confidence Predictions]
    B --> C[Human Review]
    C --> D[Add to Training Data]
    D --> E[Retrain Student]
    E --> F[Evaluate]
    F --> G{Better?}
    G -->|Yes| H[Deploy New Student]
    G -->|No| A
    H --> A

# Collect predictions for review
low_conf_predictions = client.get_predictions(
    model_id=optimized_model.id,
    max_confidence=0.7,
    min_date="2024-01-01",
    limit=500
)
print(f"Found {len(low_conf_predictions)} low-confidence predictions to review")
# Add low-confidence items to a review dataset for human inspection
for pred in low_conf_predictions:
    client.create_dataset_item(
        version_id=review_version.id,
        split_id=review_split.id,
        file_path=pred.input_path,
        metadata={
            "source": "production_low_confidence",
            "original_prediction": pred.prediction,
            "original_confidence": pred.confidence
        }
    )

# After human review → retrain → evaluate → deploy if better

Deployment Checklist
Before deploying:
- Optimized model (ONNX, quantized)
- Verified accuracy after optimization
- Tested inference latency (see the benchmark sketch after this checklist)
- Set up monitoring and alerts
- Documented model version and training data
- Rollback plan in place
After deploying:
- Verify predictions in production
- Monitor confidence distribution
- Track latency and throughput
- Set up feedback collection
- Schedule periodic retraining
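For the latency item in the checklist above, a simple wall-clock benchmark is enough to catch regressions before and after deployment (the loop reuses `client.predict` from earlier; the sample paths are illustrative):

import time

# Measure average end-to-end latency over a handful of sample items
sample_items = ["samples/example_001.jpg"] * 20   # illustrative inputs
start = time.perf_counter()
for item in sample_items:
    client.predict(model_id=optimized_model.id, item=item)
elapsed = time.perf_counter() - start
print(f"Avg latency: {elapsed / len(sample_items) * 1000:.1f} ms per prediction")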
Summary
You’ve completed the full distillation pipeline:
- ✅ Set up a teacher model
- ✅ Labeled data with the teacher
- ✅ Trained a small student model
- ✅ Evaluated and compared both models
- ✅ Deployed the optimized student
Results:
- Smaller model (10-50x)
- Faster inference (10-100x)
- Lower cost (10-1000x)
- Similar accuracy (within 2-5% of the teacher)