With your data prepared, it’s time to train a model. This section covers training configuration, monitoring, and understanding results.
## Training Overview
```mermaid
graph LR
    A[Prepared Dataset] --> B[Configure Training]
    B --> C[Start Job]
    C --> D[Monitor Progress]
    D --> E[Trained Model]
```
## Using the Web Platform
### Configure Training Job

1. Navigate to **Jobs > New Job**
2. Select your dataset and version
3. Configure training settings:
| Setting | Recommended | Description |
|---|---|---|
| Framework | fast.ai | Best for image classification |
| Epochs | 10-20 | More = longer training, potentially better |
| Image Size | 224 | Larger = more detail, slower |
| Batch Size | 32 | Adjust based on GPU memory |
| Learning Rate | Auto | Let the system find an optimal value |
4. Click **Start Training**
### Monitor Training Progress
The training dashboard shows:

- **Loss curve**: should decrease over time
- **Accuracy**: should increase
- **Current epoch**: progress indicator
- **Estimated time remaining**
Understanding the curves:

| Pattern | Meaning | Action |
|---|---|---|
| Loss decreasing steadily | Training well | Continue |
| Loss plateaued | Converged | May stop early |
| Loss increasing | Overfitting | Add more data or regularization |
| Validation loss > training loss | Overfitting | Use data augmentation |
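These patterns can also be checked programmatically if you log per-epoch losses (for example, from the job metrics returned by the SDK). A minimal sketch in plain Python; the `diagnose` function and its thresholds are illustrative, not part of the SDK:

```python
def diagnose(train_losses, valid_losses, window=3, tol=1e-3):
    """Classify the recent loss trend as 'training', 'plateaued', or 'overfitting'.

    Expects one loss value per epoch; looks at the last `window` epochs.
    """
    # Validation loss rising while training loss keeps falling => overfitting
    if (valid_losses[-1] > min(valid_losses) + tol
            and train_losses[-1] <= min(train_losses) + tol):
        return "overfitting"
    # Validation loss barely moving over the window => converged / plateaued
    recent_valid = valid_losses[-window:]
    if max(recent_valid) - min(recent_valid) < tol:
        return "plateaued"
    return "training"
```

This only automates the table above; for borderline cases, eyeball the curves on the dashboard.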
### Review Training Results
When training completes, review:

- **Final metrics**
  - Training accuracy
  - Validation accuracy
  - Loss values
- **Confusion matrix**
  - Shows which classes are confused
  - Diagonal = correct predictions
- **Top losses**
  - Images the model got wrong
  - Useful for finding labeling errors
## Using the Python SDK
### Basic Training
```python
from seeme import Client

client = Client()

# Start a training job
job = client.create_job(
    dataset_id="your-dataset-id",
    dataset_version_id="your-version-id",
    framework="fastai",
    application="image_classification",
)

print(f"Training started: {job.id}")
print(f"Status: {job.status}")
```
### Custom Configuration
```python
# Advanced training configuration
job = client.create_job(
    dataset_id=dataset.id,
    dataset_version_id=version.id,
    framework="fastai",
    application="image_classification",
    config={
        # Training parameters
        "epochs": 20,
        "image_size": 224,
        "batch_size": 32,
        # Model architecture
        "architecture": "resnet34",  # or resnet50, efficientnet_b0
        # Data augmentation
        "augmentation": {
            "flip_horizontal": True,
            "flip_vertical": False,
            "rotation": 15,
            "zoom": 0.1,
            "lighting": 0.2,
        },
        # Learning rate
        "learning_rate": 0.001,
        "lr_scheduler": "one_cycle",
        # Regularization
        "dropout": 0.5,
        "weight_decay": 0.01,
        # Export formats
        "export_formats": ["pytorch", "onnx", "coreml", "tflite"],
    },
)
```
### Monitor Training Progress
```python
import time

def monitor_job(client, job_id):
    """Monitor training progress until completion."""
    while True:
        job = client.get_job(job_id)
        print(f"Status: {job.status}")
        print(f"Progress: {job.progress}%")

        if job.metrics:
            print(f"  Training Loss: {job.metrics.get('train_loss', 'N/A')}")
            print(f"  Validation Loss: {job.metrics.get('valid_loss', 'N/A')}")
            print(f"  Accuracy: {job.metrics.get('accuracy', 'N/A')}")

        if job.status == "completed":
            print("\nTraining complete!")
            print(f"Model ID: {job.model_id}")
            return job

        if job.status == "failed":
            print(f"\nTraining failed: {job.error}")
            return job

        time.sleep(30)

# Start monitoring
job = monitor_job(client, job.id)
```
## REST API
```bash
# Start a basic training job
curl -X POST "https://api.seeme.ai/api/v1/jobs" \
  -H "Authorization: myusername:my-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "dataset_id": "your-dataset-id",
    "dataset_version_id": "your-version-id",
    "framework": "fastai",
    "application": "image_classification"
  }'

# Start training with custom configuration
curl -X POST "https://api.seeme.ai/api/v1/jobs" \
  -H "Authorization: myusername:my-api-key" \
  -H "Content-Type: application/json" \
  -d '{
    "dataset_id": "your-dataset-id",
    "dataset_version_id": "your-version-id",
    "framework": "fastai",
    "application": "image_classification",
    "config": {
      "epochs": 20,
      "image_size": 224,
      "batch_size": 32,
      "architecture": "resnet34",
      "learning_rate": 0.001,
      "augmentation": {
        "flip_horizontal": true,
        "rotation": 15
      }
    }
  }'

# Get job status and metrics
curl -X GET "https://api.seeme.ai/api/v1/jobs/{job_id}" \
  -H "Authorization: myusername:my-api-key"

# Get training metrics history
curl -X GET "https://api.seeme.ai/api/v1/jobs/{job_id}/metrics" \
  -H "Authorization: myusername:my-api-key"
```
## Model Architectures
Choose the right architecture for your needs:
| Architecture | Accuracy | Speed | Size | Best For |
|---|---|---|---|---|
| ResNet18 | Good | Fast | 45MB | Quick prototypes |
| ResNet34 | Better | Medium | 85MB | Balanced default |
| ResNet50 | Best | Slower | 100MB | Production quality |
| EfficientNet-B0 | Best | Fast | 20MB | Mobile deployment |
| EfficientNet-B3 | Excellent | Medium | 50MB | High accuracy |
```python
# Example: Use EfficientNet for mobile deployment
job = client.create_job(
    dataset_id=dataset.id,
    dataset_version_id=version.id,
    framework="fastai",
    application="image_classification",
    config={
        "architecture": "efficientnet_b0",
        "epochs": 15,
        "export_formats": ["coreml", "tflite"],
    },
)
```
## Data Augmentation
Augmentation creates variations of your images during training, helping the model generalize better.
### Available Augmentations
| Augmentation | Description | When to Use |
|---|---|---|
| `flip_horizontal` | Mirror left-right | Most images |
| `flip_vertical` | Mirror top-bottom | Satellite imagery |
| `rotation` | Rotate by degrees | When orientation varies |
| `zoom` | Random zoom in/out | Product photos |
| `lighting` | Brightness/contrast | Varying lighting conditions |
| `warp` | Perspective distortion | Documents, signs |
```python
config = {
    "augmentation": {
        "flip_horizontal": True,
        "flip_vertical": False,
        "rotation": 20,    # Rotate up to 20 degrees
        "zoom": 0.2,       # Zoom up to 20%
        "lighting": 0.3,   # Vary lighting by 30%
        "warp": 0.1,       # Slight perspective changes
    }
}
```
## Understanding Training Metrics
### Loss
- **Training loss**: how well the model fits the training data
- **Validation loss**: how well the model generalizes
> ℹ️ **Healthy training**: both losses decrease, with validation loss slightly higher than training loss.
### Accuracy
- **Top-1 accuracy**: % of images where the top prediction is correct
- **Top-5 accuracy**: % of images where the correct class is in the top 5 predictions
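Top-k accuracy is straightforward to compute yourself from raw per-class scores, which is useful when double-checking a model outside the platform. A plain-Python sketch; `top_k_accuracy` is a hypothetical helper, not an SDK function:

```python
def top_k_accuracy(scores, labels, k=1):
    """Fraction of samples whose true label is among the k highest-scoring classes.

    scores: one list of per-class scores per sample
    labels: the true class index for each sample
    """
    hits = 0
    for row, label in zip(scores, labels):
        # Indices of the k largest scores for this sample
        top_k = sorted(range(len(row)), key=lambda i: row[i], reverse=True)[:k]
        hits += label in top_k
    return hits / len(labels)
```

With `k=1` this is top-1 accuracy; with `k=5`, top-5.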
### Confusion Matrix
```python
# Get the confusion matrix after training
model = client.get_model(job.model_id)
confusion = model.confusion_matrix

# Shows predictions vs actual labels:
# - High values on the diagonal = good performance
# - Off-diagonal values = confused classes
```
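To turn the matrix into per-class accuracy, divide each diagonal entry by its row total. A sketch assuming rows are actual labels and columns are predictions (check your matrix's orientation before relying on this):

```python
def per_class_accuracy(matrix):
    """Per-class recall from a confusion matrix.

    Assumes rows = actual labels, columns = predictions:
    each class's diagonal count divided by its row total.
    """
    return [
        row[i] / sum(row) if sum(row) else 0.0
        for i, row in enumerate(matrix)
    ]
```

Classes with noticeably lower values here are the ones worth inspecting in the top-losses view.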
## Troubleshooting Training
### Low Accuracy
| Symptom | Likely Cause | Solution |
|---|---|---|
| < 50% accuracy | Too little data | Add more images |
| Stuck at random guessing | Bad labels | Check for labeling errors |
| Good training, bad validation | Overfitting | Add more data or use augmentation |
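A quick way to catch the "too little data" case, and class imbalance along with it, is to count examples per class before training. A plain-Python sketch; the 50-images-per-class threshold is only a rough rule of thumb, not a platform requirement:

```python
from collections import Counter

def check_class_balance(labels, min_per_class=50):
    """Count examples per class and flag classes below a minimum size.

    labels: one class name per training image
    Returns (all class counts, the underpopulated classes).
    """
    counts = Counter(labels)
    small = {cls: n for cls, n in counts.items() if n < min_per_class}
    return counts, small
```

Any class landing in `small` is a candidate for collecting more images before you retrain.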
### Training Failed
Common failure reasons:

- **Out of memory**: reduce the batch size
- **Corrupt images**: check the dataset for bad files
- **Empty splits**: ensure both train and validation splits have data
```python
# Check for common issues
version = client.get_dataset_version(version_id)
print(f"Total items: {version.item_count}")

for split in version.splits:
    print(f"  {split.name}: {split.item_count} items")
# Should have items in both train and validation
```
## Best Practices
- **Start small**: begin with 10 epochs; increase if needed
- **Use validation**: always split data for honest evaluation
- **Monitor overfitting**: watch for validation loss increasing
- **Save checkpoints**: enable model checkpointing for long runs
- **Experiment**: try different architectures and augmentations
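The "start small" and "monitor overfitting" advice can be combined into simple early stopping: halt training once validation loss has stopped improving for a few epochs. A minimal sketch, independent of any SDK:

```python
class EarlyStopper:
    """Stop training when validation loss hasn't improved for `patience` epochs."""

    def __init__(self, patience=3):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def should_stop(self, valid_loss):
        # A new best resets the counter; anything else counts against patience
        if valid_loss < self.best:
            self.best = valid_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```

Call `should_stop` once per epoch with the latest validation loss; when it returns `True`, stop and keep the checkpoint from the best epoch.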