Configure Training
Set up hyperparameters and training configuration for optimal finetuning results.
Key Hyperparameters
| Parameter | What It Controls | Typical Range |
|---|---|---|
| Learning rate | How fast the model learns | 0.0001 - 0.01 |
| Batch size | Samples processed together | 8 - 128 |
| Epochs | Passes through the dataset | 5 - 50 |
| Image size | Input resolution | 224 - 512 |
| Freeze layers | Which layers to keep fixed | none / early / most |
Learning Rate
The most important hyperparameter. Too high = unstable training. Too low = slow convergence.
Recommended Starting Points
| Scenario | Learning Rate |
|---|---|
| Finetuning (freeze early layers) | 0.001 |
| Finetuning (all layers) | 0.0001 |
| Training from scratch | 0.01 |
| Large model (ViT, BERT-large) | 0.00001 - 0.0001 |
| Small dataset (<500 samples) | 0.0001 (lower to avoid overfitting) |
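The schedulers in the next section reduce to simple formulas. A pure-Python sketch of step decay and cosine annealing (illustrative, not the training service's implementation; argument names mirror the config keys):

```python
import math

def step_decay(base_lr, epoch, step_size=10, gamma=0.1):
    """Multiply the LR by `gamma` every `step_size` epochs."""
    return base_lr * gamma ** (epoch // step_size)

def cosine_anneal(base_lr, epoch, total_epochs, lr_min=1e-5):
    """Smoothly decay from `base_lr` down to `lr_min` over `total_epochs`."""
    progress = epoch / total_epochs
    return lr_min + 0.5 * (base_lr - lr_min) * (1 + math.cos(math.pi * progress))

# Step decay drops in discrete chunks; cosine glides down smoothly.
print(round(step_decay(0.001, epoch=25), 8))       # 1e-05 (two drops of 10x)
print(round(cosine_anneal(0.001, 0, 30), 8))       # 0.001 (starts at base_lr)
```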
Learning Rate Schedulers
```python
# Constant learning rate (simple)
config = {
    "learning_rate": 0.001,
    "lr_scheduler": "constant"
}

# Step decay (reduce every N epochs)
config = {
    "learning_rate": 0.001,
    "lr_scheduler": "step",
    "lr_step_size": 10,  # Reduce every 10 epochs
    "lr_gamma": 0.1      # Multiply by 0.1
}

# Cosine annealing (smooth decay)
config = {
    "learning_rate": 0.001,
    "lr_scheduler": "cosine",
    "lr_min": 0.00001  # Minimum learning rate
}

# One-cycle (warmup → max → decay)
config = {
    "learning_rate": 0.001,
    "lr_scheduler": "one_cycle",
    "lr_max": 0.01  # Peak learning rate
}
```

```mermaid
graph LR
    subgraph "Learning Rate Schedules"
        A[Constant] --> B[───────────]
        C[Step] --> D[───┐__┐__]
        E[Cosine] --> F[╲________]
        G[One-Cycle] --> H[╱╲______]
    end
```

Batch Size
Larger batches give more stable gradient estimates and faster training, but batch size is limited by GPU memory.
| GPU Memory | Recommended Batch Size |
|---|---|
| 4 GB | 8-16 |
| 8 GB | 16-32 |
| 16 GB | 32-64 |
| 24 GB+ | 64-128 |
```python
# Adjust batch size to fit GPU memory
config = {
    "batch_size": 32,
    "gradient_accumulation": 2  # Effective batch size = 32 * 2 = 64
}
```

ℹ️ If you get out-of-memory errors, reduce the batch size or use gradient accumulation.
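Gradient accumulation simulates a larger batch by summing gradients over several micro-batches and stepping the optimizer once. A toy 1-D sketch of the idea (illustrative numbers, not the platform's trainer):

```python
def train_with_accumulation(micro_batch_grads, accumulation_steps=2, lr=0.1):
    """Toy 1-D example: average gradients over `accumulation_steps`
    micro-batches before each weight update, mimicking a larger batch."""
    weight, grad_sum, updates = 1.0, 0.0, 0
    for i, grad in enumerate(micro_batch_grads, start=1):
        grad_sum += grad  # accumulate instead of stepping immediately
        if i % accumulation_steps == 0:
            weight -= lr * grad_sum / accumulation_steps  # one "big batch" step
            grad_sum, updates = 0.0, updates + 1
    return weight, updates

# 4 micro-batch gradients, stepping every 2 → only 2 optimizer updates
w, n = train_with_accumulation([0.2, 0.4, 0.1, 0.3], accumulation_steps=2)
print(n)  # 2
```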
Epochs
How many times to iterate through the training data.
| Dataset Size | Suggested Epochs |
|---|---|
| Small (<500) | 30-50 (with early stopping) |
| Medium (500-5000) | 15-30 |
| Large (5000+) | 5-15 |
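The patience-based early stopping used in the configs on this page reduces to a simple counter. A minimal sketch of the usual logic (not this service's internals):

```python
class EarlyStopping:
    """Stop training when the monitored metric hasn't improved
    for `patience` consecutive epochs."""
    def __init__(self, patience=5):
        self.patience = patience
        self.best = float("inf")  # assumes a loss (lower is better)
        self.bad_epochs = 0

    def should_stop(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=2)
losses = [0.9, 0.7, 0.8, 0.75]  # no improvement after epoch 2
print([stopper.should_stop(l) for l in losses])  # [False, False, False, True]
```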
```python
# With early stopping to prevent overfitting
config = {
    "epochs": 50,
    "early_stopping": True,
    "early_stopping_patience": 5,  # Stop if no improvement for 5 epochs
    "early_stopping_metric": "val_loss"
}
```

Layer Freezing
Control which layers get updated during finetuning.
```mermaid
graph LR
    subgraph "Model Layers"
        A[Input] --> B[Early Layers<br/>General Features]
        B --> C[Middle Layers<br/>Abstract Features]
        C --> D[Late Layers<br/>Task-Specific]
        D --> E[Output]
    end
    style B fill:#fee2e2
    style C fill:#fef3c7
    style D fill:#d1fae5
```

| Strategy | What’s Frozen | When to Use |
|---|---|---|
| freeze=none | Nothing | Large dataset, different domain |
| freeze=early | First ~50% of layers | Default for finetuning |
| freeze=most | First ~80% of layers | Small dataset, similar domain |
| freeze=all_but_head | Everything except final layer | Very small dataset |
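The strategies in the table map to cutoff fractions over the layer list. A sketch of how the frozen set might be selected (layer names and fractions are illustrative, not the platform's internals):

```python
def select_frozen(layers, strategy):
    """Return the layer names to freeze for each strategy
    (fractions match the table above)."""
    if strategy == "all_but_head":
        return layers[:-1]  # everything except the final layer
    cutoffs = {"none": 0.0, "early": 0.5, "most": 0.8}
    n = int(len(layers) * cutoffs[strategy])
    return layers[:n]

layers = ["stem", "block1", "block2", "block3", "block4", "head"]
print(select_frozen(layers, "early"))  # ['stem', 'block1', 'block2']
print(select_frozen(layers, "none"))   # []
```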
```python
# Freeze early layers (recommended default)
config = {
    "base_model": "efficientnet_b0",
    "freeze_layers": "early",
    "learning_rate": 0.001
}

# Progressive unfreezing (advanced)
config = {
    "base_model": "efficientnet_b0",
    "freeze_layers": "progressive",
    "unfreeze_epochs": [5, 10, 15]  # Unfreeze more layers at these epochs
}
```

Data Augmentation
Artificially expand your dataset by applying label-preserving transforms to your training samples.
Image Augmentation
```python
config = {
    "augmentation": {
        # Geometric transforms
        "horizontal_flip": True,    # Flip left-right
        "vertical_flip": False,     # Flip up-down (if it makes sense for your data)
        "rotation": 15,             # Random rotation ±15 degrees
        "scale": [0.8, 1.2],        # Random scaling 80-120%
        "translate": 0.1,           # Random translation ±10%

        # Color transforms
        "brightness": 0.2,          # Random brightness ±20%
        "contrast": 0.2,            # Random contrast ±20%
        "saturation": 0.2,          # Random saturation ±20%
        "hue": 0.1,                 # Random hue shift ±10%

        # Advanced
        "cutout": 0.5,              # Random rectangular cutout
        "mixup": 0.2,               # Mix two images
        "auto_augment": "imagenet"  # Learned augmentation policy
    }
}
```

When to Use Which Augmentation
| Augmentation | Good For | Avoid When |
|---|---|---|
| Horizontal flip | Most images | Text, asymmetric objects |
| Vertical flip | Aerial images, microscopy | Faces, scenes with gravity |
| Rotation | Objects at various angles | Documents, fixed orientation |
| Color jitter | Varying lighting conditions | Color is diagnostic |
| Cutout | Robustness to occlusion | Small objects |
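Mixup, listed under the advanced image augmentations, blends two images and their labels with one shared weight. A minimal sketch (real implementations typically draw the weight from a Beta distribution):

```python
import random

def mixup(image_a, image_b, label_a, label_b, alpha=0.2):
    """Blend two samples: pixels and labels are mixed with the same
    weight `lam` (here drawn uniformly for simplicity)."""
    lam = random.uniform(1 - alpha, 1.0)
    mixed_image = [lam * a + (1 - lam) * b for a, b in zip(image_a, image_b)]
    mixed_label = [lam * a + (1 - lam) * b for a, b in zip(label_a, label_b)]
    return mixed_image, mixed_label

img, lbl = mixup([1.0, 0.0], [0.0, 1.0], [1, 0], [0, 1], alpha=0.2)
print(round(sum(lbl), 6))  # 1.0 — mixed label weights still sum to 1
```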
Text Augmentation
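The word-level transforms configured in this section are simple list operations. A sketch of random swap and random deletion (probabilities are illustrative):

```python
import random

def random_swap(words, p=0.1):
    """Swap each adjacent word pair with probability `p`."""
    words = list(words)
    for i in range(len(words) - 1):
        if random.random() < p:
            words[i], words[i + 1] = words[i + 1], words[i]
    return words

def random_deletion(words, p=0.1):
    """Drop each word with probability `p`, keeping at least one."""
    kept = [w for w in words if random.random() >= p]
    return kept or [random.choice(words)]

sentence = "the quick brown fox jumps".split()
print(random_deletion(sentence, p=0.3))
```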
```python
config = {
    "augmentation": {
        "synonym_replacement": 0.1,  # Replace 10% of words with synonyms
        "random_insertion": 0.1,     # Insert random synonyms
        "random_swap": 0.1,          # Swap adjacent words
        "random_deletion": 0.1,      # Delete random words
        "back_translation": True     # Translate to another language and back
    }
}
```

Regularization
Prevent overfitting, especially important for small datasets.
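Label smoothing, one of the options below, replaces hard 0/1 targets with softened values. Conventions vary slightly; a common form, sketched here, spreads `epsilon` uniformly over all classes:

```python
def smooth_labels(one_hot, epsilon=0.1):
    """Soften a one-hot target: (1 - epsilon) * one_hot + epsilon / num_classes."""
    k = len(one_hot)
    return [(1 - epsilon) * y + epsilon / k for y in one_hot]

# True class gets 0.925, the rest 0.025 each; the values still sum to 1.
print(smooth_labels([1, 0, 0, 0], epsilon=0.1))
```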
```python
config = {
    # Dropout
    "dropout": 0.3,          # Randomly drop 30% of neurons

    # Weight decay (L2 regularization)
    "weight_decay": 0.01,

    # Label smoothing
    "label_smoothing": 0.1,  # Soften labels (0.9 instead of 1.0)

    # Early stopping
    "early_stopping": True,
    "early_stopping_patience": 5
}
```

Complete Configuration Examples
Image Classification (Balanced)
```python
job = client.create_job(
    dataset_id=dataset.id,
    version_id=version.id,
    name="Product Classifier v1",
    job_type="finetune",
    config={
        # Model
        "base_model": "efficientnet_b0",
        "image_size": 224,
        "freeze_layers": "early",

        # Training
        "epochs": 30,
        "batch_size": 32,
        "learning_rate": 0.001,
        "lr_scheduler": "cosine",

        # Regularization
        "dropout": 0.2,
        "weight_decay": 0.01,
        "label_smoothing": 0.1,
        "early_stopping": True,
        "early_stopping_patience": 5,

        # Augmentation
        "augmentation": {
            "horizontal_flip": True,
            "rotation": 15,
            "brightness": 0.2,
            "contrast": 0.2
        }
    }
)
```

Object Detection
```python
job = client.create_job(
    dataset_id=dataset.id,
    version_id=version.id,
    name="Defect Detector v1",
    job_type="finetune",
    config={
        # Model
        "base_model": "yolov5m",
        "image_size": 640,

        # Training
        "epochs": 100,
        "batch_size": 16,
        "learning_rate": 0.01,
        "lr_scheduler": "cosine",

        # Detection-specific
        "iou_threshold": 0.5,
        "confidence_threshold": 0.25,
        "nms_threshold": 0.45,

        # Augmentation
        "augmentation": {
            "mosaic": True,
            "mixup": 0.1,
            "hsv_h": 0.015,
            "hsv_s": 0.7,
            "hsv_v": 0.4,
            "flip_lr": 0.5,
            "scale": 0.5
        }
    }
)
```

Text Classification
```python
job = client.create_job(
    dataset_id=dataset.id,
    version_id=version.id,
    name="Sentiment Classifier v1",
    job_type="finetune",
    config={
        # Model
        "base_model": "distilbert-base-uncased",
        "max_length": 256,

        # Training
        "epochs": 5,
        "batch_size": 16,
        "learning_rate": 2e-5,
        "lr_scheduler": "linear",
        "warmup_ratio": 0.1,

        # Regularization
        "dropout": 0.1,
        "weight_decay": 0.01
    }
)
```

Configuration Tips
- Start with defaults - Use recommended values, then tune
- One change at a time - Adjust one hyperparameter per experiment
- Monitor validation loss - It tells you if you’re overfitting
- Use early stopping - Don’t overtrain
- Log everything - You’ll want to compare runs later
- Save checkpoints - Keep models from different epochs
Hyperparameter Search (Advanced)
For systematic hyperparameter optimization:
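One common approach is random search over a declared space. Note that `log_uniform` samples the exponent uniformly, so every order of magnitude of learning rate is equally likely. A sketch of drawing one trial (hypothetical helper, not the client's actual sampler):

```python
import math
import random

def sample_trial(space):
    """Draw one hyperparameter trial from a declarative search space."""
    trial = {}
    for name, spec in space.items():
        if spec["type"] == "choice":
            trial[name] = random.choice(spec["values"])
        elif spec["type"] == "uniform":
            trial[name] = random.uniform(spec["min"], spec["max"])
        elif spec["type"] == "log_uniform":
            # Sample the exponent uniformly so each decade is equally likely.
            trial[name] = 10 ** random.uniform(math.log10(spec["min"]),
                                               math.log10(spec["max"]))
    return trial

space = {
    "learning_rate": {"type": "log_uniform", "min": 1e-5, "max": 1e-2},
    "batch_size": {"type": "choice", "values": [16, 32, 64]},
}
print(sample_trial(space))
```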
```python
# Define search space
search_config = {
    "search_space": {
        "learning_rate": {"type": "log_uniform", "min": 1e-5, "max": 1e-2},
        "batch_size": {"type": "choice", "values": [16, 32, 64]},
        "dropout": {"type": "uniform", "min": 0.1, "max": 0.5},
        "freeze_layers": {"type": "choice", "values": ["early", "most", "none"]}
    },
    "num_trials": 20,
    "optimization_metric": "val_accuracy",
    "optimization_direction": "maximize"
}

# Run hyperparameter search
search_job = client.create_hyperparameter_search(
    dataset_id=dataset.id,
    version_id=version.id,
    name="HP Search - Product Classifier",
    base_config={
        "base_model": "efficientnet_b0",
        "epochs": 20,
        "early_stopping": True
    },
    search_config=search_config
)
```

Next Step
With your configuration ready, proceed to Train & Monitor to start training and track progress.