Forge Training Pipeline
Forge takes the preference pairs produced by Refine and trains a domain-specific model. The result is a model that has absorbed your company's knowledge -- not just generic internet data.
How It Works
- Import -- Resolved findings from Refine become SFT/DPO training pairs
- Train -- SFT injects domain knowledge, DPO refines preferences
- Eval -- Before/after comparison prevents regression
- Deploy -- Push the trained model to a registry
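Concretely, the Import step turns each resolved Refine finding into one JSONL training record. The exact schema is defined by Refine's export; the record below is a hypothetical illustration with assumed field names:

{"prompt": "Which vector database backs our search service?", "chosen": "We run Qdrant, deployed from the platform repo.", "rejected": "Popular options include Pinecone, Weaviate, and Milvus."}

One prompt/chosen/rejected triple serves both stages: SFT trains on prompt + chosen, and DPO contrasts chosen against rejected.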
CLI Commands
# Import resolved data from Refine
forge dataset import
# 2-stage SFT→DPO pipeline (default, recommended)
forge train from-refine --from-refine ./export.jsonl --base qwen2.5:7b --method auto
# Single-method training
forge train from-refine --from-refine ./export.jsonl --base qwen2.5:7b --method sft
forge train from-refine --from-refine ./export.jsonl --base qwen2.5:7b --method dpo
# Immediate local training
forge train from-refine --from-refine ./export.jsonl --base qwen2.5:7b --method auto --run
# Remote training on Modal cloud GPUs
forge train start --dataset <id> --base qwen2.5:7b --method sft --remote
# Evaluate the trained model
forge eval
# Deploy to registry
forge deploy
Training Methods
--method auto (default)
Runs a 2-stage pipeline:
- SFT (Supervised Fine-Tuning) -- Injects domain knowledge into the base model
- DPO (Direct Preference Optimization) -- Refines the model's preference alignment
This is the recommended approach. SFT alone teaches facts; DPO alone shifts preferences but does not inject knowledge.
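For intuition, the sketch below shows what a 2-stage run looks like when written directly against Hugging Face TRL. It is a minimal illustration, not Forge's actual implementation (that lives in scripts/train_sft.py and scripts/train_dpo.py); the model id, output paths, and dataset columns are assumptions.

# Minimal SFT -> DPO sketch with Hugging Face TRL (illustrative only).
# Assumes export.jsonl rows carry prompt/chosen/rejected fields.
from datasets import load_dataset
from trl import DPOConfig, DPOTrainer, SFTConfig, SFTTrainer

pairs = load_dataset("json", data_files="export.jsonl", split="train")

# Stage 1: SFT on the preferred answers injects domain knowledge.
sft_data = pairs.map(lambda row: {"text": row["prompt"] + "\n" + row["chosen"]})
sft = SFTTrainer(
    model="Qwen/Qwen2.5-7B",  # assumed HF id for qwen2.5:7b
    train_dataset=sft_data,
    args=SFTConfig(output_dir="out/sft", max_steps=200),
)
sft.train()

# Stage 2: DPO on (prompt, chosen, rejected) triples aligns preferences.
dpo = DPOTrainer(
    model=sft.model,  # continue from the SFT weights
    args=DPOConfig(output_dir="out/dpo", beta=0.1),
    train_dataset=pairs,
    processing_class=sft.processing_class,
)
dpo.train()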
--method sft
SFT only. Best for initial domain knowledge injection when you have instruction-response pairs.
--method dpo
DPO only. Best for preference refinement on a model that already has domain knowledge.
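For reference, DPO optimizes the standard objective from Rafailov et al. (2023), which only reweights the policy's relative likelihood of chosen vs. rejected responses -- this is why it refines preferences but cannot inject facts the model never saw:

$$\mathcal{L}_{\text{DPO}} = -\,\mathbb{E}_{(x,\,y_w,\,y_l)}\!\left[\log \sigma\!\left(\beta \log \frac{\pi_\theta(y_w \mid x)}{\pi_{\text{ref}}(y_w \mid x)} - \beta \log \frac{\pi_\theta(y_l \mid x)}{\pi_{\text{ref}}(y_l \mid x)}\right)\right]$$

Here y_w and y_l are the chosen and rejected responses, pi_ref is typically the SFT checkpoint, and beta controls how far the policy may drift from it.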
Remote Training (Modal)
Forge can dispatch training to Modal cloud GPUs:
forge train start --dataset <id> --base qwen2.5:7b --method sft --remote
Requirements:
- MODAL_TOKEN_ID and MODAL_TOKEN_SECRET env vars
- Get credentials via modal token new
GPU auto-selection: T4 for models under 7B parameters, A100 for 7B and larger.
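As a sketch of how such a dispatch could look (illustrative; the function names and payloads are assumptions, not Forge's code), a Modal entry point pins the GPU per the rule above:

# Illustrative Modal dispatch sketch -- not Forge's actual implementation.
# Modal reads MODAL_TOKEN_ID / MODAL_TOKEN_SECRET from the environment.
import modal

app = modal.App("forge-train")
image = modal.Image.debian_slim().pip_install("transformers", "trl", "datasets")

def pick_gpu(param_count_billions: float) -> str:
    # Rule from above: T4 under 7B parameters, A100 for 7B and larger.
    return "T4" if param_count_billions < 7 else "A100"

@app.function(image=image, gpu=pick_gpu(7.0), timeout=3600)  # qwen2.5:7b -> A100
def train_remote(dataset_jsonl: str, base_model: str) -> None:
    # Write the dataset to disk and run the SFT/DPO pipeline on the cloud GPU.
    ...

@app.local_entrypoint()
def main():
    train_remote.remote(open("export.jsonl").read(), "qwen2.5:7b")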
Training Results
Results from a real training run (Qwen2.5-3B, 125 pairs, 53 seconds on a Modal T4 GPU):
| Metric | Base model | Fine-tuned |
|---|---|---|
| Domain questions | 0/5 correct | 3/5 correct |
The fine-tuned model correctly identifies the company's tech stack, tools, and domain facts that the base model has no knowledge of.
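A before/after eval of this kind reduces to asking both checkpoints the same domain questions and counting correct answers. A minimal sketch (the questions, expected keywords, and model paths are hypothetical):

# Illustrative before/after eval -- not forge eval's actual logic.
from transformers import pipeline

QUESTIONS = [  # hypothetical (question, expected-keyword) pairs
    ("Which vector database backs our search service?", "qdrant"),
    ("What is our internal deploy tool called?", "shipyard"),
]

def score(model_id: str) -> int:
    generate = pipeline("text-generation", model=model_id)
    hits = 0
    for question, expected in QUESTIONS:
        answer = generate(question, max_new_tokens=64)[0]["generated_text"]
        hits += expected.lower() in answer.lower()
    return hits

print("base:      ", score("Qwen/Qwen2.5-3B"), "/", len(QUESTIONS))
print("fine-tuned:", score("out/dpo"), "/", len(QUESTIONS))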
Architecture
packages/
├── feedback/ # Signal collector, reputation, reward derivation
└── training/ # Dataset registry, model registry, orchestrator
apps/
└── forge/ # CLI: dataset, train, eval, deploy
scripts/
├── train_sft.py
├── train_dpo.py
├── train_grpo.py
└── convert_to_gguf.sh