Forge Training Pipeline

Forge takes the preference pairs produced by Refine and trains a domain-specific model. The result is a model that knows your company's domain -- not generic internet data.

How It Works

  1. Import -- Resolved findings from Refine become SFT/DPO training pairs (example shapes after this list)
  2. Train -- SFT injects domain knowledge, DPO refines preferences
  3. Eval -- Before/after comparison prevents regression
  4. Deploy -- Push the trained model to a registry
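
For reference, a resolved finding might import into shapes like the following. The field names here are assumptions -- the exact schema comes from Refine's export -- but they follow the common SFT and DPO conventions.

# Hypothetical pair shapes; field names and content are illustrative,
# not Refine's actual export schema.

# SFT pair: instruction-response, used to inject domain knowledge
sft_pair = {
    "prompt": "Which message broker does the ingestion service use?",
    "response": "The ingestion service publishes events to Kafka.",
}

# DPO pair: the same prompt with a preferred and a rejected answer
dpo_pair = {
    "prompt": "Which message broker does the ingestion service use?",
    "chosen": "The ingestion service publishes events to Kafka.",
    "rejected": "It probably uses RabbitMQ, a common default.",
}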

CLI Commands

# Import resolved data from Refine
forge dataset import

# 2-stage SFT→DPO pipeline (default, recommended)
forge train from-refine --from-refine ./export.jsonl --base qwen2.5:7b --method auto

# Single-method training
forge train from-refine --from-refine ./export.jsonl --base qwen2.5:7b --method sft
forge train from-refine --from-refine ./export.jsonl --base qwen2.5:7b --method dpo

# Immediate local training
forge train from-refine --from-refine ./export.jsonl --base qwen2.5:7b --method auto --run

# Remote training on Modal cloud GPUs
forge train start --dataset <id> --base qwen2.5:7b --method sft --remote

# Evaluate the trained model
forge eval

# Deploy to registry
forge deploy

Training Methods

--method auto (default)

Runs a 2-stage pipeline:

  1. SFT (Supervised Fine-Tuning) -- Injects domain knowledge into the base model
  2. DPO (Direct Preference Optimization) -- Refines the model's preference alignment

This is the recommended approach. SFT alone teaches facts; DPO alone shifts preferences but does not inject knowledge.
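
As a rough sketch, the 2-stage pipeline corresponds to something like the following with Hugging Face TRL. This is illustrative only: Forge's train_sft.py and train_dpo.py may differ, and the dataset files, model id, and hyperparameters here are assumptions.

# Minimal SFT -> DPO sketch with Hugging Face TRL (recent versions).
# Not Forge's actual scripts; paths and the HF model id are assumed.
from datasets import load_dataset
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import SFTConfig, SFTTrainer, DPOConfig, DPOTrainer

base = "Qwen/Qwen2.5-7B-Instruct"  # assumed HF id for qwen2.5:7b

# Stage 1: SFT injects domain knowledge from instruction-response pairs
sft_data = load_dataset("json", data_files="export_sft.jsonl", split="train")
sft_trainer = SFTTrainer(
    model=base,
    train_dataset=sft_data,
    args=SFTConfig(output_dir="sft-out"),
)
sft_trainer.train()
sft_trainer.save_model("sft-out")

# Stage 2: DPO refines preferences on the SFT checkpoint; the dataset
# needs "prompt", "chosen", and "rejected" columns
model = AutoModelForCausalLM.from_pretrained("sft-out")
tokenizer = AutoTokenizer.from_pretrained("sft-out")
dpo_data = load_dataset("json", data_files="export_dpo.jsonl", split="train")
dpo_trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="dpo-out"),
    train_dataset=dpo_data,
    processing_class=tokenizer,
)
dpo_trainer.train()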

--method sft

SFT only. Best for initial domain knowledge injection when you have instruction-response pairs.

--method dpo

DPO only. Best for preference refinement on a model that already has domain knowledge.

Remote Training (Modal)

Forge can dispatch training to Modal cloud GPUs:

forge train start --dataset <id> --base qwen2.5:7b --method sft --remote

Requirements:

  • MODAL_TOKEN_ID and MODAL_TOKEN_SECRET env vars
  • Get credentials via modal token new

GPU auto-selection: T4 for models under 7B parameters, A100 for 7B and larger.
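
The selection rule is simple enough to state in code. This helper is illustrative -- the function name is made up, and how Forge derives the parameter count is not shown here.

def pick_gpu(params_billions: float) -> str:
    # T4 for models under 7B parameters, A100 for 7B and larger
    return "T4" if params_billions < 7.0 else "A100"

pick_gpu(3.0)  # "T4"   -- e.g. Qwen2.5-3B
pick_gpu(7.0)  # "A100" -- e.g. qwen2.5:7b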

Training Results

Results from a real training run (Qwen2.5-3B, 125 pairs, 53 seconds on a Modal T4 GPU):

Metric           | Base model  | Fine-tuned
Domain questions | 0/5 correct | 3/5 correct

The fine-tuned model correctly identifies company tech stack, tools, and domain facts that the base model has no knowledge of.
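
A before/after comparison like the one behind forge eval can be as simple as scoring both models on the same held-out domain questions. This sketch is hypothetical: the scoring rule, question set, and generate functions are assumptions, not Forge's actual eval harness.

def score(generate, qa_pairs):
    # Count answers that mention the expected domain fact
    return sum(expected.lower() in generate(q).lower()
               for q, expected in qa_pairs)

# qa_pairs = [("Which broker does ingestion use?", "Kafka"), ...]
# base_score  = score(generate_with_base_model, qa_pairs)   # 0/5 above
# tuned_score = score(generate_with_tuned_model, qa_pairs)  # 3/5 above
# tuned_score < base_score would flag a regression.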

Architecture

packages/
├── feedback/ # Signal collector, reputation, reward derivation
└── training/ # Dataset registry, model registry, orchestrator

apps/
└── forge/ # CLI: dataset, train, eval, deploy

scripts/
├── train_sft.py
├── train_dpo.py
├── train_grpo.py
└── convert_to_gguf.sh