Configure Training
Set up hyperparameters and training configuration for optimal finetuning results.
Key Hyperparameters
| Parameter | What It Controls | Typical Range |
|---|---|---|
| Learning rate | How fast the model learns | 0.0001 - 0.01 |
| Batch size | Samples processed together | 8 - 128 |
| Epochs | Passes through the dataset | 5 - 50 |
| Image size | Input resolution | 224 - 512 |
| Freeze layers | Which layers to keep fixed | none / early / most |
Learning Rate
The most important hyperparameter. Too high = unstable training. Too low = slow convergence.
Recommended Starting Points
| Scenario | Learning Rate |
|---|---|
| Finetuning (freeze early layers) | 0.001 |
| Finetuning (all layers) | 0.0001 |
| Training from scratch | 0.01 |
| Large model (ViT, BERT-large) | 0.00001 - 0.0001 |
| Small dataset (<500 samples) | 0.0001 (lower to avoid overfitting) |
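The schedulers in the next section reduce to simple formulas. A pure-Python sketch of step decay and cosine annealing (illustrative, not the training service's implementation; argument names mirror the config keys):

```python
import math

def step_decay(base_lr, epoch, step_size=10, gamma=0.1):
    """Multiply the LR by `gamma` every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)

def cosine_anneal(base_lr, epoch, total_epochs, lr_min=1e-5):
    """Smoothly decay from `base_lr` down to `lr_min` over `total_epochs`."""
    progress = epoch / total_epochs
    return lr_min + 0.5 * (base_lr - lr_min) * (1 + math.cos(math.pi * progress))

# Step decay drops in discrete chunks; cosine glides down smoothly.
print(round(step_decay(0.001, epoch=25), 8))       # 1e-05 (two drops of 10x)
print(round(cosine_anneal(0.001, 0, 30), 8))       # 0.001 (starts at base_lr)
```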
Learning Rate Schedulers
```python
# Constant learning rate (simple)
config = {
    "learning_rate": 0.001,
    "lr_scheduler": "constant"
}

# Step decay (reduce every N epochs)
config = {
    "learning_rate": 0.001,
    "lr_scheduler": "step",
    "lr_step_size": 10,  # Reduce every 10 epochs
    "lr_gamma": 0.1      # Multiply by 0.1
}

# Cosine annealing (smooth decay)
config = {
    "learning_rate": 0.001,
    "lr_scheduler": "cosine",
    "lr_min": 0.00001  # Minimum learning rate
}

# One-cycle (warmup → max → decay)
config = {
    "learning_rate": 0.001,
    "lr_scheduler": "one_cycle",
    "lr_max": 0.01  # Peak learning rate
}
```

```mermaid
graph LR
    subgraph "Learning Rate Schedules"
        A[Constant] --> B[───────────]
        C[Step] --> D[───┐__┐__]
        E[Cosine] --> F[╲________]
        G[One-Cycle] --> H[╱╲______]
    end
```

Batch Size
Larger batches give more stable gradient estimates and faster training, but batch size is limited by GPU memory.
| GPU Memory | Recommended Batch Size |
|---|---|
| 4 GB | 8-16 |
| 8 GB | 16-32 |
| 16 GB | 32-64 |
| 24 GB+ | 64-128 |
```python
# Adjust batch size to fit GPU memory
config = {
    "batch_size": 32,
    "gradient_accumulation": 2  # Effective batch size = 32 * 2 = 64
}
```

ℹ️ If you get out-of-memory errors, reduce the batch size or use gradient accumulation.
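Gradient accumulation simulates a larger batch by summing gradients over several micro-batches and stepping the optimizer once. A toy 1-D sketch of the idea (illustrative numbers, not the platform's trainer):

```python
def train_with_accumulation(micro_batch_grads, accumulation_steps=2, lr=0.1):
    """Toy 1-D example: average gradients over `accumulation_steps`
    micro-batches before each weight update, mimicking a larger batch."""
    weight, grad_sum, updates = 1.0, 0.0, 0
    for i, grad in enumerate(micro_batch_grads, start=1):
        grad_sum += grad  # accumulate instead of stepping immediately
        if i % accumulation_steps == 0:
            weight -= lr * grad_sum / accumulation_steps  # one "big batch" step
            grad_sum, updates = 0.0, updates + 1
    return weight, updates

# 4 micro-batch gradients, stepping every 2 → only 2 optimizer updates
w, n = train_with_accumulation([0.2, 0.4, 0.1, 0.3], accumulation_steps=2)
print(n)  # 2
```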
Epochs
How many times to iterate through the training data.
| Dataset Size | Suggested Epochs |
|---|---|
| Small (<500) | 30-50 (with early stopping) |
| Medium (500-5000) | 15-30 |
| Large (5000+) | 5-15 |
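The patience-based early stopping used in the configs on this page reduces to a simple counter. A minimal sketch of the usual logic (not this service's internals):

```python
class EarlyStopping:
    """Stop training when the monitored metric hasn't improved
    for `patience` consecutive epochs."""
    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")  # assumes a loss (lower is better)
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
losses = [0.9, 0.7, 0.8, 0.75]  # no improvement after epoch 2
print([stopper.should_stop(l) for l in losses])  # [False, False, False, True]
```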
```python
# With early stopping to prevent overfitting
config = {
    "epochs": 50,
    "early_stopping": True,
    "early_stopping_patience": 5,  # Stop if no improvement for 5 epochs
    "early_stopping_metric": "val_loss"
}
```

Layer Freezing
Control which layers get updated during finetuning.
```mermaid
graph LR
    subgraph "Model Layers"
        A[Input] --> B[Early Layers<br/>General Features]
        B --> C[Middle Layers<br/>Abstract Features]
        C --> D[Late Layers<br/>Task-Specific]
        D --> E[Output]
    end
    style B fill:#fee2e2
    style C fill:#fef3c7
    style D fill:#d1fae5
```

| Strategy | What’s Frozen | When to Use |
|---|---|---|
| freeze=none | Nothing | Large dataset, different domain |
| freeze=early | First ~50% of layers | Default for finetuning |
| freeze=most | First ~80% of layers | Small dataset, similar domain |
| freeze=all_but_head | Everything except final layer | Very small dataset |
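The strategies in the table map to cutoff fractions over the layer list. A sketch of how the frozen set might be selected (layer names and fractions are illustrative, not the platform's internals):

```python
def select_frozen(layers, strategy):
    """Return the layer names to freeze for each strategy
    (fractions match the table above)."""
    if strategy == "all_but_head":
        return layers[:-1]  # everything except the final layer
    cutoffs = {"none": 0.0, "early": 0.5, "most": 0.8}
    n = int(len(layers) * cutoffs[strategy])
    return layers[:n]

layers = ["stem", "block1", "block2", "block3", "block4", "head"]
print(select_frozen(layers, "early"))  # ['stem', 'block1', 'block2']
print(select_frozen(layers, "none"))   # []
```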
```python
# Freeze early layers (recommended default)
config = {
    "base_model": "efficientnet_b0",
    "freeze_layers": "early",
    "learning_rate": 0.001
}

# Progressive unfreezing (advanced)
config = {
    "base_model": "efficientnet_b0",
    "freeze_layers": "progressive",
    "unfreeze_epochs": [5, 10, 15]  # Unfreeze more layers at these epochs
}
```

Data Augmentation
Artificially expand your dataset by applying label-preserving transforms to your training samples.
Image Augmentation
```python
config = {
    "augmentation": {
        # Geometric transforms
        "horizontal_flip": True,    # Flip left-right
        "vertical_flip": False,     # Flip up-down (if it makes sense for your data)
        "rotation": 15,             # Random rotation ±15 degrees
        "scale": [0.8, 1.2],        # Random scaling 80-120%
        "translate": 0.1,           # Random translation ±10%

        # Color transforms
        "brightness": 0.2,          # Random brightness ±20%
        "contrast": 0.2,            # Random contrast ±20%
        "saturation": 0.2,          # Random saturation ±20%
        "hue": 0.1,                 # Random hue shift ±10%

        # Advanced
        "cutout": 0.5,              # Random rectangular cutout
        "mixup": 0.2,               # Mix two images
        "auto_augment": "imagenet"  # Learned augmentation policy
    }
}
```

When to Use Which Augmentation
| Augmentation | Good For | Avoid When |
|---|---|---|
| Horizontal flip | Most images | Text, asymmetric objects |
| Vertical flip | Aerial images, microscopy | Faces, scenes with gravity |
| Rotation | Objects at various angles | Documents, fixed orientation |
| Color jitter | Varying lighting conditions | Color is diagnostic |
| Cutout | Robustness to occlusion | Small objects |
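Mixup, listed under the advanced image augmentations, blends two images and their labels with one shared weight. A minimal sketch (real implementations typically draw the weight from a Beta distribution):

```python
import random

def mixup(image_a, image_b, label_a, label_b, alpha=0.2):
    """Blend two samples: pixels and labels are mixed with the same
    weight `lam` (here drawn uniformly for simplicity)."""
    lam = random.uniform(1 - alpha, 1.0)
    mixed_image = [lam * a + (1 - lam) * b for a, b in zip(image_a, image_b)]
    mixed_label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_image, mixed_label

img, lbl = mixup([1.0, 0.0], [0.0, 1.0], [1, 0], [0, 1], alpha=0.2)
print(round(sum(lbl), 6))  # 1.0 — mixed label weights still sum to 1
```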
Text Augmentation
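The word-level transforms configured in this section are simple list operations. A sketch of random swap and random deletion (probabilities are illustrative):

```python
import random

def random_swap(words, p=0.1):
    """Swap each adjacent word pair with probability `p`."""
    words = list(words)
    for i in range(len(words) - 1):
        if random.random() < p:
            words[i], words[i + 1] = words[i + 1], words[i]
    return words

def random_deletion(words, p=0.1):
    """Drop each word with probability `p`, keeping at least one."""
    kept = [w for w in words if random.random() >= p]
    return kept or [random.choice(words)]

sentence = "the quick brown fox jumps".split()
print(random_deletion(sentence, p=0.3))
```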
```python
config = {
    "augmentation": {
        "synonym_replacement": 0.1,  # Replace 10% of words with synonyms
        "random_insertion": 0.1,     # Insert random synonyms
        "random_swap": 0.1,          # Swap adjacent words
        "random_deletion": 0.1,      # Delete random words
        "back_translation": True     # Translate to another language and back
    }
}
```

Regularization
Prevent overfitting, especially important for small datasets.
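Label smoothing, one of the options below, replaces hard 0/1 targets with softened values. Conventions vary slightly; a common form, sketched here, spreads `epsilon` uniformly over all classes:

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Soften a one-hot target: (1 - epsilon) * one_hot + epsilon / num_classes."""
    k = len(one_hot)
    return [(1 - epsilon) * y + epsilon / k for y in one_hot]

# True class gets 0.925, the rest 0.025 each; the values still sum to 1.
print(smooth_labels([1, 0, 0, 0], epsilon=0.1))
```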
```python
config = {
    # Dropout
    "dropout": 0.3,          # Randomly drop 30% of neurons

    # Weight decay (L2 regularization)
    "weight_decay": 0.01,

    # Label smoothing
    "label_smoothing": 0.1,  # Soften labels (0.9 instead of 1.0)

    # Early stopping
    "early_stopping": True,
    "early_stopping_patience": 5
}
```

Complete Configuration Examples
Image Classification (Balanced)
```python
job = client.create_job(
    dataset_id=dataset.id,
    version_id=version.id,
    name="Product Classifier v1",
    job_type="finetune",
    config={
        # Model
        "base_model": "efficientnet_b0",
        "image_size": 224,
        "freeze_layers": "early",

        # Training
        "epochs": 30,
        "batch_size": 32,
        "learning_rate": 0.001,
        "lr_scheduler": "cosine",

        # Regularization
        "dropout": 0.2,
        "weight_decay": 0.01,
        "label_smoothing": 0.1,
        "early_stopping": True,
        "early_stopping_patience": 5,

        # Augmentation
        "augmentation": {
            "horizontal_flip": True,
            "rotation": 15,
            "brightness": 0.2,
            "contrast": 0.2
        }
    }
)
```

Object Detection
```python
job = client.create_job(
    dataset_id=dataset.id,
    version_id=version.id,
    name="Defect Detector v1",
    job_type="finetune",
    config={
        # Model
        "base_model": "yolov5m",
        "image_size": 640,

        # Training
        "epochs": 100,
        "batch_size": 16,
        "learning_rate": 0.01,
        "lr_scheduler": "cosine",

        # Detection-specific
        "iou_threshold": 0.5,
        "confidence_threshold": 0.25,
        "nms_threshold": 0.45,

        # Augmentation
        "augmentation": {
            "mosaic": True,
            "mixup": 0.1,
            "hsv_h": 0.015,
            "hsv_s": 0.7,
            "hsv_v": 0.4,
            "flip_lr": 0.5,
            "scale": 0.5
        }
    }
)
```

Text Classification
```python
job = client.create_job(
    dataset_id=dataset.id,
    version_id=version.id,
    name="Sentiment Classifier v1",
    job_type="finetune",
    config={
        # Model
        "base_model": "distilbert-base-uncased",
        "max_length": 256,

        # Training
        "epochs": 5,
        "batch_size": 16,
        "learning_rate": 2e-5,
        "lr_scheduler": "linear",
        "warmup_ratio": 0.1,

        # Regularization
        "dropout": 0.1,
        "weight_decay": 0.01
    }
)
```

Configuration Tips
- Start with defaults - Use recommended values, then tune
- One change at a time - Adjust one hyperparameter per experiment
- Monitor validation loss - It tells you if you’re overfitting
- Use early stopping - Don’t overtrain
- Log everything - You’ll want to compare runs later
- Save checkpoints - Keep models from different epochs
Hyperparameter Search (Advanced)
For systematic hyperparameter optimization:
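One common approach is random search over a declared space. Note that `log_uniform` samples the exponent uniformly, so every order of magnitude of learning rate is equally likely. A sketch of drawing one trial (hypothetical helper, not the client's actual sampler):

```python
import math
import random

def sample_trial(space):
    """Draw one hyperparameter trial from a declarative search space."""
    trial = {}
    for name, spec in space.items():
        if spec["type"] == "choice":
            trial[name] = random.choice(spec["values"])
        elif spec["type"] == "uniform":
            trial[name] = random.uniform(spec["min"], spec["max"])
        elif spec["type"] == "log_uniform":
            # Sample the exponent uniformly so each decade is equally likely.
            trial[name] = 10 ** random.uniform(math.log10(spec["min"]),
                                               math.log10(spec["max"]))
    return trial

space = {
    "learning_rate": {"type": "log_uniform", "min": 1e-5, "max": 1e-2},
    "batch_size": {"type": "choice", "values": [16, 32, 64]},
}
print(sample_trial(space))
```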
```python
# Define search space
search_config = {
    "search_space": {
        "learning_rate": {"type": "log_uniform", "min": 1e-5, "max": 1e-2},
        "batch_size": {"type": "choice", "values": [16, 32, 64]},
        "dropout": {"type": "uniform", "min": 0.1, "max": 0.5},
        "freeze_layers": {"type": "choice", "values": ["early", "most", "none"]}
    },
    "num_trials": 20,
    "optimization_metric": "val_accuracy",
    "optimization_direction": "maximize"
}

# Run hyperparameter search
search_job = client.create_hyperparameter_search(
    dataset_id=dataset.id,
    version_id=version.id,
    name="HP Search - Product Classifier",
    base_config={
        "base_model": "efficientnet_b0",
        "epochs": 20,
        "early_stopping": True
    },
    search_config=search_config
)
```

Next Step
With your configuration ready, proceed to Train & Monitor to start training and track progress.