Deploy Your Model
Your model is trained and optimized. Now let’s deploy it to production. SeeMe.ai supports cloud, on-premise, edge, and mobile deployment.
Deployment Options
graph TD
A[Optimized Model] --> B{Deployment Target}
B --> C[Cloud API]
B --> D[On-Premise]
B --> E[Edge Device]
B --> F[Mobile App]
C --> G[SeeMe.ai Hosted]
C --> H[Your Cloud]
D --> I[Docker Container]
E --> J[NVIDIA Jetson]
E --> K[Raspberry Pi]
F --> L[iOS App]
F --> M[Android App]
Cloud Deployment (Default)
Your model is automatically deployed to SeeMe.ai’s cloud when training completes.
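You can also confirm the deployment from the Python SDK. A minimal sketch, assuming `client`, `get_model_versions`, and the `status` field behave as in the Model Versioning snippets later in this guide:

```python
def deployed_version_status(client, model_id: str) -> str:
    """Return 'Version <n>: <status>' for the newest model version.

    Sketch only: `client` is the SeeMe.ai SDK client used throughout
    this guide; the fields mirror the Model Versioning snippets.
    """
    versions = client.get_model_versions(model_id)
    latest = versions[-1]  # assumes versions are returned oldest-first
    return f"Version {latest.version_number}: {latest.status}"
```

Call it after training completes; if the newest version does not report a deployed status, check the Models page in the web UI.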
Using the Web Platform
- Navigate to Models > Your Model
- Click API tab
- View your endpoint URL and authentication
Your model endpoint:
POST https://api.seeme.ai/api/v1/inferences/{model_id}
Making Predictions
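With the endpoint above, a prediction is a multipart image upload. A minimal Python sketch — the `file` field name mirrors the local Docker example later in this guide, and the bearer-token header is an assumption, so check the API tab for your account's exact authentication scheme:

```python
import requests

API_BASE = "https://api.seeme.ai/api/v1/inferences"

def endpoint(model_id: str) -> str:
    """Build the inference URL for a given model."""
    return f"{API_BASE}/{model_id}"

def predict(model_id: str, api_key: str, image_path: str) -> dict:
    """Upload an image and return the JSON inference result.

    Assumptions: bearer-token auth and a multipart `file` field,
    as in the local Docker example.
    """
    with open(image_path, "rb") as f:
        resp = requests.post(
            endpoint(model_id),
            headers={"Authorization": f"Bearer {api_key}"},
            files={"file": f},
            timeout=30,
        )
    resp.raise_for_status()
    return resp.json()
```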
On-Premise Deployment
Deploy models within your own infrastructure for data sovereignty and air-gapped environments.
Docker Container
- Export your model container:
# Generate deployment container
container = client.create_deployment(
    model_id=model.id,
    version_id=version.id,
    deployment_type="docker"
)
print(f"Container image: {container.image_url}")
- Pull and run the container:
# Pull the model container
docker pull registry.seeme.ai/models/your-model-id:latest
# Run the container
docker run -d \
  -p 8080:8080 \
  --name my-model \
  registry.seeme.ai/models/your-model-id:latest
- Make predictions:
curl -X POST "http://localhost:8080/predict" \
  -F "file=@image.jpg"
Docker Compose
version: '3.8'
services:
  model:
    image: registry.seeme.ai/models/your-model-id:latest
    ports:
      - "8080:8080"
    environment:
      - WORKERS=4
      - TIMEOUT=30
    volumes:
      - ./models:/models
    deploy:
      resources:
        reservations:
          devices:
            - driver: nvidia
              count: 1
              capabilities: [gpu]
Kubernetes Deployment
apiVersion: apps/v1
kind: Deployment
metadata:
  name: image-classifier
spec:
  replicas: 3
  selector:
    matchLabels:
      app: image-classifier
  template:
    metadata:
      labels:
        app: image-classifier
    spec:
      containers:
      - name: model
        image: registry.seeme.ai/models/your-model-id:latest
        ports:
        - containerPort: 8080
        resources:
          limits:
            nvidia.com/gpu: 1
          requests:
            memory: "2Gi"
            cpu: "1000m"
---
apiVersion: v1
kind: Service
metadata:
  name: image-classifier-service
spec:
  selector:
    app: image-classifier
  ports:
  - port: 80
    targetPort: 8080
  type: LoadBalancer
Edge Deployment
Deploy to edge devices for low-latency inference without internet connectivity.
NVIDIA Jetson
# On your Jetson device
docker pull registry.seeme.ai/models/your-model-id:jetson
docker run -d \
  --runtime nvidia \
  -p 8080:8080 \
  registry.seeme.ai/models/your-model-id:jetson
Raspberry Pi
# Export TFLite model
client.export_model(
    model_id=model.id,
    version_id=version.id,
    format="tflite",
    quantization="int8",
    output_path="./model.tflite"
)
Then on your Raspberry Pi:
import tflite_runtime.interpreter as tflite
import numpy as np
from PIL import Image
# Load model
interpreter = tflite.Interpreter(model_path="model.tflite")
interpreter.allocate_tensors()
# Get input/output details
input_details = interpreter.get_input_details()
output_details = interpreter.get_output_details()
# Prepare image (an int8-quantized model expects its quantized input dtype, not float32)
image = Image.open("test.jpg").convert("RGB").resize((224, 224))
input_data = np.expand_dims(np.array(image, dtype=input_details[0]['dtype']), axis=0)
# Run inference
interpreter.set_tensor(input_details[0]['index'], input_data)
interpreter.invoke()
output = interpreter.get_tensor(output_details[0]['index'])
print(f"Predictions: {output}")
Mobile Deployment
Deploy models directly to iOS and Android apps.
iOS (Core ML)
- Export to Core ML:
client.export_model(
    model_id=model.id,
    version_id=version.id,
    format="coreml",
    output_path="./MyModel.mlmodel"
)
- Add to your Xcode project and use:
import CoreML
import Vision
// Load model
guard let model = try? VNCoreMLModel(for: MyModel().model) else {
    return
}
// Create request
let request = VNCoreMLRequest(model: model) { request, error in
    guard let results = request.results as? [VNClassificationObservation] else {
        return
    }
    for result in results.prefix(3) {
        print("\(result.identifier): \(result.confidence)")
    }
}
// Run on image
let handler = VNImageRequestHandler(cgImage: image)
try? handler.perform([request])
Android (TensorFlow Lite)
- Export to TFLite:
client.export_model(
    model_id=model.id,
    version_id=version.id,
    format="tflite",
    output_path="./model.tflite"
)
- Add to your Android project:
import org.tensorflow.lite.Interpreter
import org.tensorflow.lite.support.common.FileUtil
import java.nio.ByteBuffer
import java.nio.ByteOrder
// Load model (FileUtil comes from the TensorFlow Lite Support library)
val model = FileUtil.loadMappedFile(context, "model.tflite")
val interpreter = Interpreter(model)
// Prepare input (224 x 224 RGB, 4 bytes per float, in native byte order)
val inputBuffer = ByteBuffer.allocateDirect(224 * 224 * 3 * 4).order(ByteOrder.nativeOrder())
// ... fill buffer with image data
// Run inference
val outputBuffer = Array(1) { FloatArray(numClasses) }  // numClasses: your label count
interpreter.run(inputBuffer, outputBuffer)
// Get results
val predictions = outputBuffer[0]
Model Versioning
Manage multiple model versions in production:
# List all versions
versions = client.get_model_versions(model_id)
for v in versions:
    print(f"Version {v.version_number}: {v.status}")
# Set active version
client.set_active_version(
    model_id=model.id,
    version_id=new_version.id
)
# Rollback to previous version
client.set_active_version(
    model_id=model.id,
    version_id=previous_version.id
)
Scaling Your Deployment
Auto-scaling Configuration
# Configure auto-scaling
client.update_deployment(
    model_id=model.id,
    config={
        "min_replicas": 1,
        "max_replicas": 10,
        "target_concurrency": 50,
        "scale_down_delay": 300  # 5 minutes
    }
)
Load Balancing
For high-throughput applications:
upstream model_servers {
    least_conn;
    server model1:8080;
    server model2:8080;
    server model3:8080;
}
server {
    listen 80;
    location /predict {
        proxy_pass http://model_servers;
        proxy_connect_timeout 5s;
        proxy_read_timeout 30s;
    }
}
Security Considerations
API Key Management
# Create a scoped API key for this model only
api_key = client.create_api_key(
    name="Production Image Classifier",
    scopes=["models:predict"],
    model_ids=[model.id],
    expires_in_days=90
)
print(f"API Key: {api_key.key}")  # Only shown once
Network Security
- Use HTTPS for all API calls
- Restrict access with IP allowlists
- Enable request rate limiting
- Monitor for unusual patterns
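Rate limiting has a client-side counterpart: well-behaved callers back off when the server says to slow down. A minimal sketch — HTTP 429 handling with exponential backoff and jitter is a generic pattern, not a documented SeeMe.ai behavior:

```python
import random
import time

def backoff_delay(attempt: int, base: float = 0.5, cap: float = 30.0) -> float:
    """Exponential backoff with full jitter: delay doubles per attempt, capped."""
    return random.uniform(0, min(cap, base * (2 ** attempt)))

def call_with_retries(send, max_attempts: int = 5):
    """Call `send()` until it returns a response whose status is not 429.

    `send` is any zero-argument callable returning a response-like
    object with a `status_code` attribute.
    """
    for attempt in range(max_attempts):
        resp = send()
        if resp.status_code != 429:
            return resp
        time.sleep(backoff_delay(attempt))
    return resp
```

Wrap your prediction call in `call_with_retries` so transient throttling does not surface as user-facing errors.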
Deployment Checklist
Before going live:
- Model tested with production-like data
- API key created with minimal permissions
- Error handling implemented
- Logging and monitoring configured
- Rollback plan documented
- Performance benchmarked
- Security review completed
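For the "Performance benchmarked" item, a small harness that times repeated predictions and reports latency percentiles is usually enough. A sketch — `predict_once` stands in for whichever call you deploy behind (cloud endpoint, local container, or edge device):

```python
import statistics
import time

def summarize_latencies(samples_ms):
    """Return p50/p95/p99 from a list of latency samples in milliseconds."""
    qs = statistics.quantiles(sorted(samples_ms), n=100)
    return {"p50": qs[49], "p95": qs[94], "p99": qs[98]}

def benchmark(predict_once, n: int = 50):
    """Time `predict_once()` n times and summarize the latencies.

    `predict_once` is any zero-argument callable that performs one
    inference (e.g., a request against your endpoint).
    """
    samples = []
    for _ in range(n):
        start = time.perf_counter()
        predict_once()
        samples.append((time.perf_counter() - start) * 1000)
    return summarize_latencies(samples)
```

Run it with production-like images and compare p95/p99, not just the average, against your latency budget before going live.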