Configure Training

Set up hyperparameters and training configuration for optimal finetuning results.

Key Hyperparameters

| Parameter | What It Controls | Typical Range |
| --- | --- | --- |
| Learning rate | How fast the model learns | 0.0001 - 0.01 |
| Batch size | Samples processed together | 8 - 128 |
| Epochs | Passes through the dataset | 5 - 50 |
| Image size | Input resolution | 224 - 512 |
| Freeze layers | Which layers to keep fixed | none / early / most |

Learning Rate

The most important hyperparameter. Too high = unstable training. Too low = slow convergence.

Recommended Starting Points

| Scenario | Learning Rate |
| --- | --- |
| Finetuning (freeze early layers) | 0.001 |
| Finetuning (all layers) | 0.0001 |
| Training from scratch | 0.01 |
| Large model (ViT, BERT-large) | 0.00001 - 0.0001 |
| Small dataset (<500 samples) | 0.0001 (lower to avoid overfitting) |

Learning Rate Schedulers

```python
# Constant learning rate (simple)
config = {
    "learning_rate": 0.001,
    "lr_scheduler": "constant"
}

# Step decay (reduce every N epochs)
config = {
    "learning_rate": 0.001,
    "lr_scheduler": "step",
    "lr_step_size": 10,      # Reduce every 10 epochs
    "lr_gamma": 0.1          # Multiply by 0.1
}

# Cosine annealing (smooth decay)
config = {
    "learning_rate": 0.001,
    "lr_scheduler": "cosine",
    "lr_min": 0.00001        # Minimum learning rate
}

# One-cycle (warmup → max → decay)
config = {
    "learning_rate": 0.001,
    "lr_scheduler": "one_cycle",
    "lr_max": 0.01           # Peak learning rate
}
```
The four schedules shape the learning rate curve differently:

```
Constant:   ─────────
Step:       ───┐__┐__
Cosine:     ╲________
One-cycle:  ╱╲_______
```
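To see what these schedules do per epoch, here is a minimal numeric sketch in plain Python (illustrative only — the platform computes these internally):

```python
import math

def step_lr(epoch, base_lr=0.001, step_size=10, gamma=0.1):
    """Step decay: multiply the learning rate by gamma every step_size epochs."""
    return base_lr * gamma ** (epoch // step_size)

def cosine_lr(epoch, total_epochs, base_lr=0.001, lr_min=0.00001):
    """Cosine annealing: smooth decay from base_lr down to lr_min."""
    progress = epoch / total_epochs
    return lr_min + 0.5 * (base_lr - lr_min) * (1 + math.cos(math.pi * progress))

# Step decay drops in discrete jumps; cosine decays smoothly to lr_min.
assert abs(step_lr(9) - 0.001) < 1e-9        # still at base_lr before the first drop
assert abs(step_lr(10) - 0.0001) < 1e-9      # reduced by gamma at epoch 10
assert abs(cosine_lr(30, 30) - 1e-5) < 1e-9  # reaches lr_min at the final epoch
```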

Batch Size

Larger batches give more stable gradients and faster training, but batch size is limited by GPU memory.

| GPU Memory | Recommended Batch Size |
| --- | --- |
| 4 GB | 8-16 |
| 8 GB | 16-32 |
| 16 GB | 32-64 |
| 24 GB+ | 64-128 |
```python
# Adjust batch size to fit GPU memory
config = {
    "batch_size": 32,
    "gradient_accumulation": 2  # Effective batch size = 32 * 2 = 64
}
```
> ℹ️ If you get out-of-memory errors, reduce the batch size or use gradient accumulation.
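Gradient accumulation trades memory for time: each optimizer step averages gradients from several smaller forward/backward passes. A toy numeric sketch (plain Python; the numbers stand in for per-sample gradients) shows why the effective batch size multiplies:

```python
def batch_gradient(samples):
    """Toy per-batch gradient: the mean of per-sample 'gradients'."""
    return sum(samples) / len(samples)

def accumulated_gradient(samples, micro_batch, accum_steps):
    """Average micro-batch gradients over accum_steps before stepping —
    equivalent to one large batch of micro_batch * accum_steps samples."""
    grads = []
    for i in range(accum_steps):
        micro = samples[i * micro_batch:(i + 1) * micro_batch]
        grads.append(batch_gradient(micro))
    return sum(grads) / accum_steps

data = list(range(64))                     # 64 per-sample gradients
large = batch_gradient(data)               # batch_size = 64 in one pass
accum = accumulated_gradient(data, 32, 2)  # batch_size = 32, accumulation = 2
assert abs(large - accum) < 1e-9           # same effective update
```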

Epochs

How many times to iterate through the training data.

| Dataset Size | Suggested Epochs |
| --- | --- |
| Small (<500) | 30-50 (with early stopping) |
| Medium (500-5000) | 15-30 |
| Large (5000+) | 5-15 |
```python
# With early stopping to prevent overfitting
config = {
    "epochs": 50,
    "early_stopping": True,
    "early_stopping_patience": 5,    # Stop if no improvement for 5 epochs
    "early_stopping_metric": "val_loss"
}
```
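The patience logic behind `early_stopping_patience` can be expressed in a few lines (an illustrative sketch, not the platform's implementation):

```python
def should_stop(val_losses, patience=5):
    """Return True once val_loss has gone `patience` epochs without a new best."""
    best = float("inf")
    epochs_since_best = 0
    for loss in val_losses:
        if loss < best:
            best = loss
            epochs_since_best = 0
        else:
            epochs_since_best += 1
        if epochs_since_best >= patience:
            return True
    return False

# Still improving: keep training
assert not should_stop([1.0, 0.8, 0.7, 0.65, 0.6])
# Plateaued: stop after 5 epochs without a new best
assert should_stop([1.0, 0.6, 0.7, 0.7, 0.7, 0.7, 0.7])
```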

Layer Freezing

Control which layers get updated during finetuning.

```mermaid
graph LR
    subgraph "Model Layers"
        A[Input] --> B["Early Layers<br/>General Features"]
        B --> C["Middle Layers<br/>Abstract Features"]
        C --> D["Late Layers<br/>Task-Specific"]
        D --> E[Output]
    end
    style B fill:#fee2e2
    style C fill:#fef3c7
    style D fill:#d1fae5
```
| Strategy | What's Frozen | When to Use |
| --- | --- | --- |
| freeze=none | Nothing | Large dataset, different domain |
| freeze=early | First ~50% of layers | Default for finetuning |
| freeze=most | First ~80% of layers | Small dataset, similar domain |
| freeze=all_but_head | Everything except final layer | Very small dataset |
```python
# Freeze early layers (recommended default)
config = {
    "base_model": "efficientnet_b0",
    "freeze_layers": "early",
    "learning_rate": 0.001
}

# Progressive unfreezing (advanced)
config = {
    "base_model": "efficientnet_b0",
    "freeze_layers": "progressive",
    "unfreeze_epochs": [5, 10, 15]  # Unfreeze more layers at these epochs
}
```
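What each strategy freezes can be sketched as a partition over an ordered list of layers (a minimal illustration; the layer names here are hypothetical, and real boundaries depend on the architecture):

```python
def frozen_layers(layer_names, strategy):
    """Map a freeze strategy to the set of layers excluded from updates."""
    n = len(layer_names)
    if strategy == "none":
        return set()
    if strategy == "early":
        return set(layer_names[: n // 2])        # first ~50%
    if strategy == "most":
        return set(layer_names[: int(n * 0.8)])  # first ~80%
    if strategy == "all_but_head":
        return set(layer_names[:-1])             # everything except the final layer
    raise ValueError(f"unknown strategy: {strategy}")

layers = [f"block_{i}" for i in range(10)]       # hypothetical layer names
assert len(frozen_layers(layers, "early")) == 5
assert frozen_layers(layers, "all_but_head") == set(layers[:-1])
```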

Data Augmentation

Artificially expand your dataset by transforming images.

Image Augmentation

```python
config = {
    "augmentation": {
        # Geometric transforms
        "horizontal_flip": True,      # Flip left-right
        "vertical_flip": False,       # Flip up-down (if it makes sense)
        "rotation": 15,               # Random rotation ±15 degrees
        "scale": [0.8, 1.2],          # Random scaling 80-120%
        "translate": 0.1,             # Random translation ±10%

        # Color transforms
        "brightness": 0.2,            # Random brightness ±20%
        "contrast": 0.2,              # Random contrast ±20%
        "saturation": 0.2,            # Random saturation ±20%
        "hue": 0.1,                   # Random hue shift ±10%

        # Advanced
        "cutout": 0.5,                # Random rectangular cutout
        "mixup": 0.2,                 # Mix two images
        "auto_augment": "imagenet"    # Learned augmentation policy
    }
}
```

When to Use Which Augmentation

| Augmentation | Good For | Avoid When |
| --- | --- | --- |
| Horizontal flip | Most images | Text, asymmetric objects |
| Vertical flip | Aerial images, microscopy | Faces, scenes with gravity |
| Rotation | Objects at various angles | Documents, fixed orientation |
| Color jitter | Varying lighting conditions | Color is diagnostic |
| Cutout | Robustness to occlusion | Small objects |
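Of the advanced options, mixup is the least intuitive: it blends two training images and their labels into one sample. A simplified numeric sketch (real implementations draw the mixing weight from a Beta(alpha, alpha) distribution; here it is uniform for brevity):

```python
import random

def mixup(x1, y1, x2, y2, alpha=0.2):
    """Blend two samples and their one-hot labels by a random weight lam."""
    lam = random.uniform(1 - alpha, 1.0)  # simplified; usually Beta(alpha, alpha)
    x = [lam * a + (1 - lam) * b for a, b in zip(x1, x2)]
    y = [lam * a + (1 - lam) * b for a, b in zip(y1, y2)]
    return x, y

random.seed(0)
x, y = mixup([1.0, 0.0], [1, 0], [0.0, 1.0], [0, 1])
assert abs(sum(y) - 1.0) < 1e-9  # the mixed label is still a valid distribution
```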

Text Augmentation

```python
config = {
    "augmentation": {
        "synonym_replacement": 0.1,   # Replace 10% of words with synonyms
        "random_insertion": 0.1,      # Insert random synonyms
        "random_swap": 0.1,           # Swap adjacent words
        "random_deletion": 0.1,       # Delete random words
        "back_translation": True      # Translate to another language and back
    }
}
```
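The simpler of these text augmentations can be sketched directly (illustrative only; production implementations handle stopwords, casing, and tokenization with more care):

```python
import random

def random_deletion(words, p=0.1, rng=random):
    """Drop each word independently with probability p (keep at least one word)."""
    kept = [w for w in words if rng.random() > p]
    return kept or [rng.choice(words)]

def random_swap(words, p=0.1, rng=random):
    """Swap adjacent word pairs with probability p."""
    words = list(words)
    for i in range(len(words) - 1):
        if rng.random() < p:
            words[i], words[i + 1] = words[i + 1], words[i]
    return words

rng = random.Random(42)
sent = "the quick brown fox jumps over the lazy dog".split()
assert len(random_deletion(sent, 0.1, rng)) <= len(sent)
assert sorted(random_swap(sent, 0.1, rng)) == sorted(sent)  # same words, maybe reordered
```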

Regularization

Prevent overfitting, especially important for small datasets.

```python
config = {
    # Dropout
    "dropout": 0.3,                  # Randomly drop 30% of neurons

    # Weight decay (L2 regularization)
    "weight_decay": 0.01,

    # Label smoothing
    "label_smoothing": 0.1,          # Soften labels (0.9 instead of 1.0)

    # Early stopping
    "early_stopping": True,
    "early_stopping_patience": 5
}
```
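Label smoothing's effect on the targets is easy to see numerically (one common convention; frameworks differ slightly in how epsilon is spread across classes):

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Scale each target by (1 - epsilon) and spread epsilon uniformly
    over all classes, so the true class is no longer exactly 1.0."""
    k = len(one_hot)
    return [(1 - epsilon) * y + epsilon / k for y in one_hot]

smoothed = smooth_labels([1, 0, 0, 0], epsilon=0.1)
# True class gets 0.925, the others 0.025 each; the total still sums to 1.
assert abs(smoothed[0] - 0.925) < 1e-9
assert abs(sum(smoothed) - 1.0) < 1e-9
```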

Complete Configuration Examples

Image Classification (Balanced)

```python
job = client.create_job(
    dataset_id=dataset.id,
    version_id=version.id,
    name="Product Classifier v1",
    job_type="finetune",
    config={
        # Model
        "base_model": "efficientnet_b0",
        "image_size": 224,
        "freeze_layers": "early",

        # Training
        "epochs": 30,
        "batch_size": 32,
        "learning_rate": 0.001,
        "lr_scheduler": "cosine",

        # Regularization
        "dropout": 0.2,
        "weight_decay": 0.01,
        "label_smoothing": 0.1,
        "early_stopping": True,
        "early_stopping_patience": 5,

        # Augmentation
        "augmentation": {
            "horizontal_flip": True,
            "rotation": 15,
            "brightness": 0.2,
            "contrast": 0.2
        }
    }
)
```

Object Detection

```python
job = client.create_job(
    dataset_id=dataset.id,
    version_id=version.id,
    name="Defect Detector v1",
    job_type="finetune",
    config={
        # Model
        "base_model": "yolov5m",
        "image_size": 640,

        # Training
        "epochs": 100,
        "batch_size": 16,
        "learning_rate": 0.01,
        "lr_scheduler": "cosine",

        # Detection-specific
        "iou_threshold": 0.5,
        "confidence_threshold": 0.25,
        "nms_threshold": 0.45,

        # Augmentation
        "augmentation": {
            "mosaic": True,
            "mixup": 0.1,
            "hsv_h": 0.015,
            "hsv_s": 0.7,
            "hsv_v": 0.4,
            "flip_lr": 0.5,
            "scale": 0.5
        }
    }
)
```
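The iou_threshold and nms_threshold settings both rely on intersection-over-union between boxes. A minimal reference implementation for axis-aligned (x1, y1, x2, y2) boxes:

```python
def iou(box_a, box_b):
    """Intersection-over-Union for two (x1, y1, x2, y2) boxes."""
    x1 = max(box_a[0], box_b[0])
    y1 = max(box_a[1], box_b[1])
    x2 = min(box_a[2], box_b[2])
    y2 = min(box_a[3], box_b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

# Two 2x2 boxes overlapping by half: intersection 2, union 6 -> IoU = 1/3
assert abs(iou((0, 0, 2, 2), (1, 0, 3, 2)) - 1 / 3) < 1e-9
# Disjoint boxes have IoU 0
assert iou((0, 0, 1, 1), (2, 2, 3, 3)) == 0.0
```

During NMS, a predicted box is suppressed when its IoU with a higher-confidence box exceeds nms_threshold; during evaluation, a prediction counts as correct when its IoU with a ground-truth box exceeds iou_threshold.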

Text Classification

```python
job = client.create_job(
    dataset_id=dataset.id,
    version_id=version.id,
    name="Sentiment Classifier v1",
    job_type="finetune",
    config={
        # Model
        "base_model": "distilbert-base-uncased",
        "max_length": 256,

        # Training
        "epochs": 5,
        "batch_size": 16,
        "learning_rate": 2e-5,
        "lr_scheduler": "linear",
        "warmup_ratio": 0.1,

        # Regularization
        "dropout": 0.1,
        "weight_decay": 0.01
    }
)
```
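The linear scheduler with warmup_ratio ramps the learning rate up over the first 10% of steps, then decays it linearly to zero. A sketch of that shape (illustrative; it mirrors the common transformers-style linear schedule):

```python
def linear_schedule_lr(step, total_steps, base_lr=2e-5, warmup_ratio=0.1):
    """Linear warmup to base_lr, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    return base_lr * (total_steps - step) / (total_steps - warmup_steps)

assert linear_schedule_lr(0, 1000) == 0.0                     # starts at zero
assert abs(linear_schedule_lr(100, 1000) - 2e-5) < 1e-12      # peak at end of warmup
assert abs(linear_schedule_lr(1000, 1000)) < 1e-12            # decays to zero
```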

Configuration Tips

  1. Start with defaults - Use recommended values, then tune
  2. One change at a time - Adjust one hyperparameter per experiment
  3. Monitor validation loss - It tells you if you’re overfitting
  4. Use early stopping - Don’t overtrain
  5. Log everything - You’ll want to compare runs later
  6. Save checkpoints - Keep models from different epochs

Hyperparameter Search (Advanced)

For systematic hyperparameter optimization:

```python
# Define search space
search_config = {
    "search_space": {
        "learning_rate": {"type": "log_uniform", "min": 1e-5, "max": 1e-2},
        "batch_size": {"type": "choice", "values": [16, 32, 64]},
        "dropout": {"type": "uniform", "min": 0.1, "max": 0.5},
        "freeze_layers": {"type": "choice", "values": ["early", "most", "none"]}
    },
    "num_trials": 20,
    "optimization_metric": "val_accuracy",
    "optimization_direction": "maximize"
}

# Run hyperparameter search
search_job = client.create_hyperparameter_search(
    dataset_id=dataset.id,
    version_id=version.id,
    name="HP Search - Product Classifier",
    base_config={
        "base_model": "efficientnet_b0",
        "epochs": 20,
        "early_stopping": True
    },
    search_config=search_config
)
```
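To make the search-space types concrete, here is how a single random-search trial could be drawn from a space in this format (an illustrative sampler, not the platform's search algorithm). Note that log_uniform samples uniformly in log space, so 1e-5 to 1e-4 is as likely as 1e-3 to 1e-2:

```python
import math
import random

def sample_trial(search_space, rng=random):
    """Draw one hyperparameter configuration from a search-space dict."""
    trial = {}
    for name, spec in search_space.items():
        if spec["type"] == "choice":
            trial[name] = rng.choice(spec["values"])
        elif spec["type"] == "uniform":
            trial[name] = rng.uniform(spec["min"], spec["max"])
        elif spec["type"] == "log_uniform":
            # Sample the exponent uniformly, then exponentiate
            trial[name] = math.exp(
                rng.uniform(math.log(spec["min"]), math.log(spec["max"]))
            )
    return trial

space = {
    "learning_rate": {"type": "log_uniform", "min": 1e-5, "max": 1e-2},
    "batch_size": {"type": "choice", "values": [16, 32, 64]},
    "dropout": {"type": "uniform", "min": 0.1, "max": 0.5},
}
trial = sample_trial(space, random.Random(0))
assert 1e-5 <= trial["learning_rate"] <= 1e-2
assert trial["batch_size"] in (16, 32, 64)
assert 0.1 <= trial["dropout"] <= 0.5
```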

Next Step

With your configuration ready, proceed to Train & Monitor to start training and track progress.