Choose Base Model

Selecting the right pre-trained model is critical for finetuning success. The base model provides the starting point—choose one that’s appropriate for your task, data, and deployment requirements.

Model Selection Criteria

Consider these factors when choosing a base model:

| Factor | Question to Ask |
|---|---|
| Task match | Was it trained on similar data? |
| Size | Can it run on your deployment target? |
| Accuracy | Does it perform well on standard benchmarks? |
| Speed | Is inference fast enough for your use case? |
| License | Can you use it commercially? |
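
Size and license are hard constraints, so they make a natural pre-screening filter before the softer accuracy/speed trade-offs. A minimal sketch — the candidate metadata and field names below are illustrative, not from any real registry:

```python
# Hypothetical candidate metadata; fields mirror the criteria table above.
CANDIDATES = [
    {"name": "mobilenet_v2", "size_mb": 14, "commercial_ok": True},
    {"name": "efficientnet_b0", "size_mb": 21, "commercial_ok": True},
    {"name": "vit_base", "size_mb": 330, "commercial_ok": True},
]

def shortlist(candidates, max_size_mb, need_commercial=True):
    """Drop models that fail the hard constraints (size, license)."""
    return [
        m["name"]
        for m in candidates
        if m["size_mb"] <= max_size_mb
        and (m["commercial_ok"] or not need_commercial)
    ]

print(shortlist(CANDIDATES, max_size_mb=50))  # -> ['mobilenet_v2', 'efficientnet_b0']
```

Models that survive the filter can then be ranked on accuracy and speed.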

Image Classification Models

Model Comparison

| Model | Parameters | Size | Accuracy (ImageNet) | Inference Speed | Best For |
|---|---|---|---|---|---|
| MobileNet v2 | 3.4M | 14 MB | 72% | ⚡⚡⚡⚡⚡ | Mobile, edge, real-time |
| EfficientNet B0 | 5.3M | 21 MB | 77% | ⚡⚡⚡⚡ | Balanced accuracy/speed |
| EfficientNet B2 | 9.2M | 36 MB | 80% | ⚡⚡⚡ | Better accuracy, still efficient |
| ResNet-18 | 11.7M | 45 MB | 70% | ⚡⚡⚡⚡ | Simple, well-understood |
| ResNet-50 | 25.6M | 98 MB | 76% | ⚡⚡⚡ | Good balance |
| EfficientNet B4 | 19.3M | 75 MB | 83% | ⚡⚡ | High accuracy |
| ViT-Base | 86.6M | 330 MB | 85% | ⚡ | Maximum accuracy |
| ConvNeXt-Tiny | 28.6M | 110 MB | 82% | ⚡⚡ | Modern architecture |

Recommendations

```mermaid
graph TD
    A{Deployment Target?} --> B[Mobile / Edge]
    A --> C[Cloud Server]
    A --> D[On-Premise GPU]

    B --> E[MobileNet v2]
    B --> F[EfficientNet B0]

    C --> G{Priority?}
    G --> H[Speed: EfficientNet B0-B2]
    G --> I[Accuracy: EfficientNet B4+]

    D --> J{GPU Memory?}
    J --> K["<8GB: EfficientNet B4"]
    J --> L["8GB+: ViT-Base"]
```
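
The decision tree above can be sketched as a small helper; the target and priority labels are illustrative:

```python
def recommend_image_model(target, priority=None, gpu_gb=None):
    """Encode the deployment decision tree above (labels are illustrative)."""
    if target == "mobile/edge":
        return ["mobilenet_v2", "efficientnet_b0"]
    if target == "cloud":
        # Speed-sensitive cloud workloads stay in the B0-B2 range; otherwise B4+.
        return ["efficientnet_b0-b2"] if priority == "speed" else ["efficientnet_b4+"]
    if target == "on-prem-gpu":
        # ViT-Base needs roughly 8 GB+ of GPU memory per the tree above.
        return ["efficientnet_b4"] if (gpu_gb or 0) < 8 else ["vit_base"]
    raise ValueError(f"unknown deployment target: {target}")

print(recommend_image_model("on-prem-gpu", gpu_gb=16))  # -> ['vit_base']
```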

Example: Choosing for Manufacturing Defect Detection

```python
# For real-time inspection on edge device
config = {
    "base_model": "mobilenet_v2",
    "image_size": 224
}

# For high-accuracy cloud-based analysis
config = {
    "base_model": "efficientnet_b4",
    "image_size": 380
}

# For maximum accuracy (research/offline)
config = {
    "base_model": "vit_base_patch16_224",
    "image_size": 224
}
```
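
In practice the three configs above can be kept side by side and selected by deployment scenario; a small sketch (the scenario keys are illustrative):

```python
# The three defect-detection configs above, keyed by scenario (keys are illustrative).
DEFECT_DETECTION_CONFIGS = {
    "edge_realtime": {"base_model": "mobilenet_v2", "image_size": 224},
    "cloud_accuracy": {"base_model": "efficientnet_b4", "image_size": 380},
    "offline_research": {"base_model": "vit_base_patch16_224", "image_size": 224},
}

def config_for(scenario):
    """Look up the training config for a named deployment scenario."""
    return DEFECT_DETECTION_CONFIGS[scenario]

print(config_for("cloud_accuracy")["base_model"])  # -> efficientnet_b4
```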

Object Detection Models

| Model | Parameters | Speed (GPU) | mAP (COCO) | Best For |
|---|---|---|---|---|
| YOLOv4-tiny | 6M | 3 ms | ~40% | Real-time, edge |
| YOLOv4 | 64M | 12 ms | ~65% | Balanced |
| YOLOv5s | 7M | 4 ms | ~56% | Fast, modern |
| YOLOv5m | 21M | 8 ms | ~64% | Balanced, modern |
| YOLOv5l | 47M | 15 ms | ~68% | Higher accuracy |
| Faster R-CNN | 41M | 50 ms | ~67% | Two-stage, more accurate |

```python
# Real-time detection on camera feed
config = {
    "base_model": "yolov4_tiny",
    "image_size": 416
}

# Production detection with good accuracy
config = {
    "base_model": "yolov5m",
    "image_size": 640
}
```
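
The detection table also supports a simple latency-budget choice: take the most accurate model that fits your per-frame budget. A sketch using the approximate numbers quoted above:

```python
# (latency_ms, mAP) per the comparison table above (approximate values).
DETECTORS = {
    "yolov4_tiny": (3, 0.40),
    "yolov5s": (4, 0.56),
    "yolov4": (12, 0.65),
    "yolov5m": (8, 0.64),
    "yolov5l": (15, 0.68),
    "faster_rcnn": (50, 0.67),
}

def pick_detector(budget_ms):
    """Most accurate detector whose GPU latency fits the budget."""
    fitting = {k: v for k, v in DETECTORS.items() if v[0] <= budget_ms}
    if not fitting:
        raise ValueError("no detector fits this latency budget")
    return max(fitting, key=lambda k: fitting[k][1])

print(pick_detector(10))  # -> yolov5m
```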

Text Classification Models

| Model | Parameters | Size | Speed | Best For |
|---|---|---|---|---|
| DistilBERT | 66M | 250 MB | ⚡⚡⚡⚡ | Fast inference, good accuracy |
| BERT-base | 110M | 420 MB | ⚡⚡⚡ | Standard choice |
| RoBERTa-base | 125M | 480 MB | ⚡⚡⚡ | Better pre-training |
| BERT-large | 340M | 1.3 GB | ⚡⚡ | Higher accuracy |
| DeBERTa-base | 139M | 530 MB | ⚡⚡ | State-of-the-art |

```python
# Production API with latency requirements
config = {
    "base_model": "distilbert-base-uncased",
    "max_length": 256
}

# Best accuracy for offline processing
config = {
    "base_model": "deberta-base",
    "max_length": 512
}
```
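
The Size column roughly tracks parameter count: an fp32 checkpoint stores about 4 bytes per parameter. A quick sanity check of that rule of thumb:

```python
def fp32_size_mb(num_params):
    """Approximate fp32 checkpoint size: 4 bytes per parameter."""
    return num_params * 4 / 1e6

# BERT-base: 110M params -> ~440 MB, close to the ~420 MB listed above.
print(round(fp32_size_mb(110e6)))  # -> 440
```

The same estimate halves for fp16 weights, which is why quantized or half-precision exports are popular for deployment.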

NER Models

| Model | Languages | Speed | Best For |
|---|---|---|---|
| spaCy sm | Per-language | ⚡⚡⚡⚡⚡ | Production, speed |
| spaCy lg | Per-language | ⚡⚡⚡ | Better accuracy |
| BERT-NER | Multilingual | ⚡⚡ | Custom entities |
| Flair | Multilingual | ⚡ | Research, complex NER |

```python
# Fast entity extraction
config = {
    "base_model": "en_core_web_sm",  # spaCy small
}

# Custom entities with BERT
config = {
    "base_model": "bert-base-cased",
    "ner_architecture": "token_classification"
}
```

Checking Available Models

List models available for finetuning:
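
The section doesn't name a specific tool, so as a stand-in, a model-listing call might look like the sketch below; the registry contents and function are illustrative, not a real API:

```python
# Hypothetical registry; real platforms expose an equivalent listing call.
MODEL_REGISTRY = {
    "image_classification": ["mobilenet_v2", "efficientnet_b0", "vit_base_patch16_224"],
    "object_detection": ["yolov4_tiny", "yolov5m"],
    "text_classification": ["distilbert-base-uncased", "deberta-base"],
}

def list_models(task=None):
    """Return available base models, optionally filtered by task."""
    if task is None:
        return sorted(m for models in MODEL_REGISTRY.values() for m in models)
    return MODEL_REGISTRY.get(task, [])

print(list_models("object_detection"))  # -> ['yolov4_tiny', 'yolov5m']
```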

Domain-Specific Base Models

For specialized domains, look for models pre-trained on similar data:

| Domain | Base Models | Why Better |
|---|---|---|
| Medical imaging | MedCLIP, BiomedCLIP | Pre-trained on medical images |
| Satellite imagery | SatMAE, SSL4EO | Understands aerial perspective |
| Documents | LayoutLM, DiT | Understands document structure |
| Scientific text | SciBERT, PubMedBERT | Scientific vocabulary |
| Legal text | LegalBERT | Legal terminology |
| Code | CodeBERT, GraphCodeBERT | Programming languages |

```python
# Medical image classification
config = {
    "base_model": "medclip_vit_base",
    "image_size": 224
}

# Document understanding
config = {
    "base_model": "layoutlm_base",
    "image_size": 224
}
```

Trade-off Decision Guide

Speed vs Accuracy

```
Accuracy
    ↑
    │     ViT-Large
    │       ○
    │   EfficientNet-B4    ViT-Base
    │       ○                ○
    │  EfficientNet-B2
    │       ○
    │  EfficientNet-B0   ResNet-50
    │       ○               ○
    │  MobileNet-v2
    │       ○
    └──────────────────────────→ Speed
```

Decision Matrix

| Priority | Recommendation |
|---|---|
| Maximum accuracy | ViT-Base, EfficientNet-B4 |
| Balanced | EfficientNet-B0/B2, ResNet-50 |
| Fast inference | MobileNet v2, DistilBERT |
| Mobile deployment | MobileNet v2, YOLOv4-tiny |
| Limited GPU memory | Smaller models (B0, DistilBERT) |
| Large dataset (10k+) | Larger models can utilize more data |
| Small dataset (<500) | Smaller models less likely to overfit |
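
The last two rows of the matrix can be expressed as a rough sizing heuristic; the thresholds are the ones quoted above, and the tier names are illustrative:

```python
def model_tier(num_examples):
    """Map dataset size to a model-capacity tier, per the matrix above."""
    if num_examples < 500:
        return "small"   # e.g. MobileNet v2 / DistilBERT: lower overfitting risk
    if num_examples >= 10_000:
        return "large"   # e.g. EfficientNet-B4 / ViT-Base can utilize the data
    return "medium"      # e.g. EfficientNet-B0/B2, BERT-base

print(model_tier(300), model_tier(50_000))  # -> small large
```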

Best Practices

  1. Start with EfficientNet-B0 for images, DistilBERT for text—good defaults
  2. Match input size - Use the size the model was pre-trained on (e.g., 224, 384)
  3. Consider deployment - Don’t train a large model you can’t deploy
  4. Domain models help - If available for your domain, use them
  5. Benchmark first - Test base model on a few examples before committing to training
  6. Larger isn’t always better - Small datasets benefit from smaller models

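Practice 5 ("benchmark first") can be as lightweight as scoring a handful of labelled examples before any training. A sketch with a stand-in predictor (swap in the real base model's inference call):

```python
def quick_benchmark(predict, samples):
    """Accuracy of `predict` on a few (input, label) pairs -- a cheap
    smoke test of a base model before committing to full finetuning."""
    correct = sum(1 for x, y in samples if predict(x) == y)
    return correct / len(samples)

# Stand-in predictor and file names, for illustration only.
samples = [("scratch.png", "defect"), ("clean.png", "ok"), ("dent.png", "defect")]
predict = lambda path: "ok" if "clean" in path else "defect"
print(quick_benchmark(predict, samples))  # -> 1.0
```

If the base model already scores reasonably on a small sample, finetuning is likely to pay off; if it scores near chance, reconsider the task match before spending GPU hours.
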
Next Step

With your base model selected, proceed to Configure Training to set up hyperparameters.