Choose Base Model

Selecting the right pre-trained model is critical for finetuning success. The base model provides the starting point—choose one that’s appropriate for your task, data, and deployment requirements.

Model Selection Criteria

Consider these factors when choosing a base model:

| Factor | Question to Ask |
|---|---|
| Task match | Was it trained on similar data? |
| Size | Can it run on your deployment target? |
| Accuracy | Does it perform well on standard benchmarks? |
| Speed | Is inference fast enough for your use case? |
| License | Can you use it commercially? |
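
Size and license are hard constraints, so they make a natural pre-screening filter before the softer accuracy/speed trade-offs. A minimal sketch — the candidate metadata and field names below are illustrative, not from any real registry:

```python
# Hypothetical candidate metadata; fields mirror the criteria table above.
CANDIDATES = [
    {"name": "mobilenet_v2", "size_mb": 14, "commercial_ok": True},
    {"name": "efficientnet_b0", "size_mb": 21, "commercial_ok": True},
    {"name": "vit_base", "size_mb": 330, "commercial_ok": True},
]

def shortlist(candidates, max_size_mb, need_commercial=True):
    """Drop models that fail the hard constraints (size, license)."""
    return [
        m["name"]
        for m in candidates
        if m["size_mb"] <= max_size_mb
        and (m["commercial_ok"] or not need_commercial)
    ]

print(shortlist(CANDIDATES, max_size_mb=50))  # -> ['mobilenet_v2', 'efficientnet_b0']
```

Models that survive the filter can then be ranked on accuracy and speed.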

Image Classification Models

Model Comparison

| Model | Parameters | Size | Accuracy (ImageNet) | Inference Speed | Best For |
|---|---|---|---|---|---|
| MobileNet v2 | 3.4M | 14 MB | 72% | ⚡⚡⚡⚡⚡ | Mobile, edge, real-time |
| EfficientNet B0 | 5.3M | 21 MB | 77% | ⚡⚡⚡⚡ | Balanced accuracy/speed |
| EfficientNet B2 | 9.2M | 36 MB | 80% | ⚡⚡⚡ | Better accuracy, still efficient |
| ResNet-18 | 11.7M | 45 MB | 70% | ⚡⚡⚡⚡ | Simple, well-understood |
| ResNet-50 | 25.6M | 98 MB | 76% | ⚡⚡⚡ | Good balance |
| EfficientNet B4 | 19.3M | 75 MB | 83% | ⚡⚡ | High accuracy |
| ViT-Base | 86.6M | 330 MB | 85% | ⚡ | Maximum accuracy |
| ConvNeXt-Tiny | 28.6M | 110 MB | 82% | ⚡⚡ | Modern architecture |

Recommendations

```mermaid
graph TD
    A{Deployment Target?} --> B[Mobile / Edge]
    A --> C[Cloud Server]
    A --> D[On-Premise GPU]

    B --> E[MobileNet v2]
    B --> F[EfficientNet B0]

    C --> G{Priority?}
    G --> H[Speed: EfficientNet B0-B2]
    G --> I[Accuracy: EfficientNet B4+]

    D --> J{GPU Memory?}
    J --> K["<8GB: EfficientNet B4"]
    J --> L["8GB+: ViT-Base"]
```
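
The decision tree above can be sketched as a small helper; the target and priority labels are illustrative:

```python
def recommend_image_model(target, priority=None, gpu_gb=None):
    """Encode the deployment decision tree above (labels are illustrative)."""
    if target == "mobile/edge":
        return ["mobilenet_v2", "efficientnet_b0"]
    if target == "cloud":
        # Speed-sensitive cloud workloads stay in the B0-B2 range; otherwise B4+.
        return ["efficientnet_b0-b2"] if priority == "speed" else ["efficientnet_b4+"]
    if target == "on-prem-gpu":
        # ViT-Base needs roughly 8 GB+ of GPU memory per the tree above.
        return ["efficientnet_b4"] if (gpu_gb or 0) < 8 else ["vit_base"]
    raise ValueError(f"unknown deployment target: {target}")

print(recommend_image_model("on-prem-gpu", gpu_gb=16))  # -> ['vit_base']
```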

Example: Choosing for Manufacturing Defect Detection

```python
# For real-time inspection on edge device
config = {
    "base_model": "mobilenet_v2",
    "image_size": 224
}

# For high-accuracy cloud-based analysis
config = {
    "base_model": "efficientnet_b4",
    "image_size": 380
}

# For maximum accuracy (research/offline)
config = {
    "base_model": "vit_base_patch16_224",
    "image_size": 224
}
```
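
In practice the three configs above can be kept side by side and selected by deployment scenario; a small sketch (the scenario keys are illustrative):

```python
# The three defect-detection configs above, keyed by scenario (keys are illustrative).
DEFECT_DETECTION_CONFIGS = {
    "edge_realtime": {"base_model": "mobilenet_v2", "image_size": 224},
    "cloud_accuracy": {"base_model": "efficientnet_b4", "image_size": 380},
    "offline_research": {"base_model": "vit_base_patch16_224", "image_size": 224},
}

def config_for(scenario):
    """Look up the training config for a named deployment scenario."""
    return DEFECT_DETECTION_CONFIGS[scenario]

print(config_for("cloud_accuracy")["base_model"])  # -> efficientnet_b4
```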

Object Detection Models

| Model | Parameters | Speed (GPU) | mAP (COCO) | Best For |
|---|---|---|---|---|
| YOLOv4-tiny | 6M | 3 ms | ~40% | Real-time, edge |
| YOLOv4 | 64M | 12 ms | ~65% | Balanced |
| YOLOv5s | 7M | 4 ms | ~56% | Fast, modern |
| YOLOv5m | 21M | 8 ms | ~64% | Balanced, modern |
| YOLOv5l | 47M | 15 ms | ~68% | Higher accuracy |
| Faster R-CNN | 41M | 50 ms | ~67% | Two-stage, more accurate |

```python
# Real-time detection on camera feed
config = {
    "base_model": "yolov4_tiny",
    "image_size": 416
}

# Production detection with good accuracy
config = {
    "base_model": "yolov5m",
    "image_size": 640
}
```
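
The detection table also supports a simple latency-budget choice: take the most accurate model that fits your per-frame budget. A sketch using the approximate numbers quoted above:

```python
# (latency_ms, mAP) per the comparison table above (approximate values).
DETECTORS = {
    "yolov4_tiny": (3, 0.40),
    "yolov5s": (4, 0.56),
    "yolov4": (12, 0.65),
    "yolov5m": (8, 0.64),
    "yolov5l": (15, 0.68),
    "faster_rcnn": (50, 0.67),
}

def pick_detector(budget_ms):
    """Most accurate detector whose GPU latency fits the budget."""
    fitting = {k: v for k, v in DETECTORS.items() if v[0] <= budget_ms}
    if not fitting:
        raise ValueError("no detector fits this latency budget")
    return max(fitting, key=lambda k: fitting[k][1])

print(pick_detector(10))  # -> yolov5m
```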

Text Classification Models

| Model | Parameters | Size | Speed | Best For |
|---|---|---|---|---|
| DistilBERT | 66M | 250 MB | ⚡⚡⚡⚡ | Fast inference, good accuracy |
| BERT-base | 110M | 420 MB | ⚡⚡⚡ | Standard choice |
| RoBERTa-base | 125M | 480 MB | ⚡⚡⚡ | Better pre-training |
| BERT-large | 340M | 1.3 GB | ⚡⚡ | Higher accuracy |
| DeBERTa-base | 139M | 530 MB | ⚡⚡ | State-of-the-art |

```python
# Production API with latency requirements
config = {
    "base_model": "distilbert-base-uncased",
    "max_length": 256
}

# Best accuracy for offline processing
config = {
    "base_model": "deberta-base",
    "max_length": 512
}
```
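
The Size column roughly tracks parameter count: an fp32 checkpoint stores about 4 bytes per parameter. A quick sanity check of that rule of thumb:

```python
def fp32_size_mb(num_params):
    """Approximate fp32 checkpoint size: 4 bytes per parameter."""
    return num_params * 4 / 1e6

# BERT-base: 110M params -> ~440 MB, close to the ~420 MB listed above.
print(round(fp32_size_mb(110e6)))  # -> 440
```

The same estimate halves for fp16 weights, which is why quantized or half-precision exports are popular for deployment.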

NER Models

| Model | Languages | Speed | Best For |
|---|---|---|---|
| spaCy sm | Per-language | ⚡⚡⚡⚡⚡ | Production, speed |
| spaCy lg | Per-language | ⚡⚡⚡ | Better accuracy |
| BERT-NER | Multilingual | ⚡⚡ | Custom entities |
| Flair | Multilingual | ⚡ | Research, complex NER |

```python
# Fast entity extraction
config = {
    "base_model": "en_core_web_sm",  # spaCy small
}

# Custom entities with BERT
config = {
    "base_model": "bert-base-cased",
    "ner_architecture": "token_classification"
}
```

Checking Available Models

List models available for finetuning:
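
The section doesn't name a specific tool, so as a stand-in, a model-listing call might look like the sketch below; the registry contents and function are illustrative, not a real API:

```python
# Hypothetical registry; real platforms expose an equivalent listing call.
MODEL_REGISTRY = {
    "image_classification": ["mobilenet_v2", "efficientnet_b0", "vit_base_patch16_224"],
    "object_detection": ["yolov4_tiny", "yolov5m"],
    "text_classification": ["distilbert-base-uncased", "deberta-base"],
}

def list_models(task=None):
    """Return available base models, optionally filtered by task."""
    if task is None:
        return sorted(m for models in MODEL_REGISTRY.values() for m in models)
    return MODEL_REGISTRY.get(task, [])

print(list_models("object_detection"))  # -> ['yolov4_tiny', 'yolov5m']
```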

Domain-Specific Base Models

For specialized domains, look for models pre-trained on similar data:

| Domain | Base Models | Why Better |
|---|---|---|
| Medical imaging | MedCLIP, BiomedCLIP | Pre-trained on medical images |
| Satellite imagery | SatMAE, SSL4EO | Understands aerial perspective |
| Documents | LayoutLM, DiT | Understands document structure |
| Scientific text | SciBERT, PubMedBERT | Scientific vocabulary |
| Legal text | LegalBERT | Legal terminology |
| Code | CodeBERT, GraphCodeBERT | Programming languages |

```python
# Medical image classification
config = {
    "base_model": "medclip_vit_base",
    "image_size": 224
}

# Document understanding
config = {
    "base_model": "layoutlm_base",
    "image_size": 224
}
```

Trade-off Decision Guide

Speed vs Accuracy

```
Accuracy
    ↑
    │     ViT-Large
    │       ○
    │   EfficientNet-B4    ViT-Base
    │       ○                ○
    │  EfficientNet-B2
    │       ○
    │  EfficientNet-B0   ResNet-50
    │       ○               ○
    │  MobileNet-v2
    │       ○
    └──────────────────────────→ Speed
```

Decision Matrix

| Priority | Recommendation |
|---|---|
| Maximum accuracy | ViT-Base, EfficientNet-B4 |
| Balanced | EfficientNet-B0/B2, ResNet-50 |
| Fast inference | MobileNet v2, DistilBERT |
| Mobile deployment | MobileNet v2, YOLOv4-tiny |
| Limited GPU memory | Smaller models (B0, DistilBERT) |
| Large dataset (10k+) | Larger models can utilize more data |
| Small dataset (<500) | Smaller models less likely to overfit |
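
The last two rows of the matrix can be expressed as a rough sizing heuristic; the thresholds are the ones quoted above, and the tier names are illustrative:

```python
def model_tier(num_examples):
    """Map dataset size to a model-capacity tier, per the matrix above."""
    if num_examples < 500:
        return "small"   # e.g. MobileNet v2 / DistilBERT: lower overfitting risk
    if num_examples >= 10_000:
        return "large"   # e.g. EfficientNet-B4 / ViT-Base can utilize the data
    return "medium"      # e.g. EfficientNet-B0/B2, BERT-base

print(model_tier(300), model_tier(50_000))  # -> small large
```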

Best Practices

  1. Start with EfficientNet-B0 for images, DistilBERT for text—good defaults
  2. Match input size - Use the size the model was pre-trained on (e.g., 224, 384)
  3. Consider deployment - Don’t train a large model you can’t deploy
  4. Domain models help - If available for your domain, use them
  5. Benchmark first - Test base model on a few examples before committing to training
  6. Larger isn’t always better - Small datasets benefit from smaller models

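Practice 5 ("benchmark first") can be as lightweight as scoring a handful of labelled examples before any training. A sketch with a stand-in predictor (swap in the real base model's inference call):

```python
def quick_benchmark(predict, samples):
    """Accuracy of `predict` on a few (input, label) pairs -- a cheap
    smoke test of a base model before committing to full finetuning."""
    correct = sum(1 for x, y in samples if predict(x) == y)
    return correct / len(samples)

# Stand-in predictor and file names, for illustration only.
samples = [("scratch.png", "defect"), ("clean.png", "ok"), ("dent.png", "defect")]
predict = lambda path: "ok" if "clean" in path else "defect"
print(quick_benchmark(predict, samples))  # -> 1.0
```

If the base model already scores reasonably on a small sample, finetuning is likely to pay off; if it scores near chance, reconsider the task match before spending GPU hours.
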
Next Step

With your base model selected, proceed to Configure Training to set up hyperparameters.