Fine-tune LLMs, Vision, and Audio models on your Mac
SFT, DPO, GRPO, Vision, TTS, STT, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.
Fine-tune Locally
Train LLMs, vision, and audio models on M1–M5 Macs natively with Apple’s MLX framework. No cloud GPU required.
Unified Memory
Access up to 512GB unified RAM on Mac Studio. Load larger models than discrete GPU VRAM allows.
Unsloth-Compatible API
Your existing Unsloth training scripts run on Apple Silicon. Change the import, keep everything else.
Export Anywhere
Save to HuggingFace format, GGUF for Ollama and llama.cpp, or merged weights for deployment.
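For example (a sketch: `save_pretrained` and `save_pretrained_merged` appear in the quick start below, while `save_pretrained_gguf` is Unsloth's name for GGUF export and is assumed here to carry over):

```python
model.save_pretrained("lora_model")                # LoRA adapters only
model.save_pretrained_merged("merged", tokenizer)  # merged HuggingFace weights

# GGUF for Ollama / llama.cpp; assumes the Unsloth-style helper exists here
model.save_pretrained_gguf("gguf_model", tokenizer, quantization_method="q4_k_m")
```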
Choose your track
Each modality has its own guide with Quick Start, API reference, examples, and tips.
LLM Fine-Tuning
SFT, DPO, GRPO, KTO, SimPO. Chat templates, dataset utilities, GGUF export.
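For example, formatting a dataset with the tokenizer's chat template can use the standard HuggingFace `apply_chat_template` call (a sketch; mlx-tune's own dataset utilities may provide shortcuts for this):

```python
# Convert instruction/output pairs into chat-formatted training text.
# `tokenizer` and `dataset` are the objects from the quick start below.
def to_chat_text(example):
    messages = [
        {"role": "user", "content": example["instruction"]},
        {"role": "assistant", "content": example["output"]},
    ]
    return {"text": tokenizer.apply_chat_template(messages, tokenize=False)}

dataset = dataset.map(to_chat_text)
```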
Vision & Audio Fine-Tuning
Gemma 4 (vision + audio), Qwen3.5 Vision. Image-and-text training plus audio STT/ASR. LoRA on vision, audio, and language layers.
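A rough sketch of what a vision SFT run could look like; `VLMSFTTrainer` is listed in the trainer table below, but the dataset row format and the reuse of `SFTConfig` are assumptions, so check the Vision guide for the exact schema:

```python
from mlx_tune import VLMSFTTrainer, SFTConfig

# Hypothetical row format: an image plus a chat-style transcript.
vision_rows = [
    {"image": "charts/q1.png",
     "messages": [{"role": "user", "content": "Describe this chart."},
                  {"role": "assistant", "content": "Revenue rose 12% in Q1."}]},
]

trainer = VLMSFTTrainer(
    model=model,          # a VLM loaded as in the quick start below
    tokenizer=tokenizer,
    train_dataset=vision_rows,
    args=SFTConfig(output_dir="vlm_out", max_steps=50),
)
trainer.train()
```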
Audio Fine-Tuning
TTS (Orpheus, OuteTTS, Spark, Sesame, Qwen3-TTS) and STT (Whisper, Qwen3-ASR, Canary, Voxtral, Voxtral Realtime, Moonshine, Parakeet TDT). First native audio LoRA on Apple Silicon.
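For speech-to-text, a hedged sketch using `STTSFTTrainer` from the trainer table below; the (audio path, transcript) row format and the reuse of `SFTConfig` are assumptions, so consult the Audio guide for the real schema:

```python
from mlx_tune import STTSFTTrainer, SFTConfig

# Hypothetical row format: audio file path plus reference transcript.
stt_rows = [
    {"audio": "clips/utt_001.wav", "text": "turn on the kitchen lights"},
    {"audio": "clips/utt_002.wav", "text": "set a timer for ten minutes"},
]

trainer = STTSFTTrainer(
    model=model,          # e.g. a Whisper checkpoint loaded as in the quick start
    tokenizer=tokenizer,
    train_dataset=stt_rows,
    args=SFTConfig(output_dir="stt_out", max_steps=50),
)
trainer.train()
```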
Embedding Fine-Tuning
Sentence embeddings with contrastive learning (InfoNCE). BERT, ModernBERT, Qwen3-Embedding, Harrier. Semantic search on Apple Silicon.
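To make the objective concrete, here is a minimal numpy illustration of InfoNCE with in-batch negatives; it shows the loss being optimized in spirit, not the library's actual implementation:

```python
import numpy as np

def info_nce_loss(query_emb, pos_emb, temperature=0.05):
    # Normalize so the dot product is cosine similarity.
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    p = pos_emb / np.linalg.norm(pos_emb, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature            # (batch, batch) similarities
    # Row-wise log-softmax; every other row in the batch is a negative.
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    idx = np.arange(len(q))
    return float(-log_probs[idx, idx].mean())   # cross-entropy on the diagonal
```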
OCR Fine-Tuning
Document OCR with DeepSeek-OCR, VLM-to-OCR with Qwen3.5, handwriting recognition, GRPO with CER reward, and multilingual receipts.
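A CER-based GRPO reward can be as simple as "1 minus the character error rate". The sketch below is illustrative only; mlx-tune's built-in reward functions may differ:

```python
def cer_reward(prediction: str, reference: str) -> float:
    """Turn character error rate into a reward in [0, 1]."""
    m, n = len(prediction), len(reference)
    dp = list(range(n + 1))  # edit distance from the empty prediction prefix
    for i in range(1, m + 1):
        prev, dp[0] = dp[0], i
        for j in range(1, n + 1):
            cur = dp[j]
            dp[j] = min(dp[j] + 1,                                       # delete
                        dp[j - 1] + 1,                                   # insert
                        prev + (prediction[i - 1] != reference[j - 1]))  # substitute
            prev = cur
    cer = dp[n] / max(n, 1)
    return max(0.0, 1.0 - cer)
```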
MoE Fine-Tuning
Mixture of Experts training with per-expert LoRA. Qwen3.5-35B-A3B, Phi-3.5-MoE, Mixtral, DeepSeek — 39+ architectures, same API.
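Per the trainer table below, MoE checkpoints go through the plain `SFTTrainer` path; the sketch below only swaps the checkpoint into the quick-start loader (the repo id shown is hypothetical):

```python
# Same API as the dense quick start; only the checkpoint changes.
# Hypothetical repo id -- substitute a real MoE checkpoint.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mlx-community/Qwen3.5-35B-A3B-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(model, r=16, lora_alpha=16)
```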
Continual Pretraining
Adapt models to new domains, languages, or capabilities. Raw text training with decoupled embedding LR. LFM2, SmolLM2, any model.
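A sketch of a continual-pretraining run: `CPTTrainer` and `CPTConfig` are named in the trainer table below, but the config keys shown (separate LoRA and embedding learning rates) are assumptions, so check the Continual Pretraining guide for the exact names:

```python
from mlx_tune import CPTTrainer, CPTConfig
from datasets import load_dataset

# Raw-text corpus; CPT computes loss on all tokens (no chat masking).
corpus = load_dataset("roneneldan/TinyStories", split="train[:1000]")

trainer = CPTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=corpus,
    args=CPTConfig(
        output_dir="cpt_out",
        learning_rate=2e-4,            # LoRA layers
        embedding_learning_rate=2e-5,  # assumption: decoupled embed/lm_head LR key
        max_steps=200,
    ),
)
trainer.train()
```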
Unsloth Migration
Translate your existing Unsloth scripts. Side-by-side comparisons and config mapping.
One import change. That’s it.
Existing Unsloth training scripts work on Apple Silicon with minimal changes.
```python
# Before: Unsloth on CUDA
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig

# After: mlx-tune on Apple Silicon
from mlx_tune import FastLanguageModel
from mlx_tune import SFTTrainer, SFTConfig
```
The rest of your code stays exactly the same.
Up and running in minutes
A complete fine-tuning pipeline in under 20 lines.
```python
from mlx_tune import FastLanguageModel, SFTTrainer, SFTConfig
from datasets import load_dataset

# Load any HuggingFace model (1B model for quick start)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mlx-community/Llama-3.2-1B-Instruct-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)

# Load a dataset
dataset = load_dataset("yahma/alpaca-cleaned", split="train[:100]")

# Train with SFTTrainer (same API as TRL!)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=SFTConfig(
        output_dir="outputs",
        per_device_train_batch_size=2,
        learning_rate=2e-4,
        max_steps=50,
    ),
)
trainer.train()

# Save (same API as Unsloth!)
model.save_pretrained("lora_model")                # Adapters only
model.save_pretrained_merged("merged", tokenizer)  # Full model
```
Get mlx-tune
Using uv (recommended):

```bash
uv pip install mlx-tune
```

Using pip:

```bash
pip install mlx-tune
```

With audio support (TTS/STT):

```bash
uv pip install 'mlx-tune[audio]'
```

From source (development):

```bash
git clone https://github.com/ARahim3/mlx-tune.git
cd mlx-tune
uv pip install -e .
```
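To verify the install, a one-line smoke test (the import path is the one used throughout this page):

```bash
python -c "from mlx_tune import FastLanguageModel; print('mlx-tune OK')"
```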
Requirements
| Requirement | Minimum |
|---|---|
| Hardware | Apple Silicon Mac (M1 / M2 / M3 / M4 / M5) |
| OS | macOS 13.0+ |
| Memory | 8 GB+ unified RAM (16 GB+ recommended) |
| Python | 3.9+ |
Supported trainers
All trainers use native MLX — no subprocess calls or CUDA wrappers.
| Method | Trainer | Use Case |
|---|---|---|
| SFT | `SFTTrainer` | Instruction fine-tuning |
| DPO | `DPOTrainer` | Preference learning |
| ORPO | `ORPOTrainer` | Combined SFT + odds-ratio preference |
| GRPO | `GRPOTrainer` | Reasoning with multi-generation (DeepSeek R1 style) |
| KTO | `KTOTrainer` + `KTOConfig` | Binary feedback (Kahneman-Tversky optimization) |
| SimPO | `SimPOTrainer` + `SimPOConfig` | No reference model (length-normalized log probs) |
| VLM SFT | `VLMSFTTrainer` | Vision-language model fine-tuning |
| Vision GRPO | `VLMGRPOTrainer` + `VLMGRPOConfig` | Vision-language GRPO reasoning training |
| TTS SFT | `TTSSFTTrainer` | Text-to-speech fine-tuning (Orpheus) |
| STT SFT | `STTSFTTrainer` | Speech-to-text fine-tuning (Whisper, Qwen3-ASR, Canary, Voxtral, Voxtral Realtime, Moonshine, Parakeet TDT) |
| Embedding | `EmbeddingSFTTrainer` | Sentence-embedding fine-tuning (BERT, ModernBERT, Qwen3-Embedding, Harrier; InfoNCE/contrastive) |
| OCR SFT | `OCRSFTTrainer` | Document OCR fine-tuning (DeepSeek-OCR, Qwen3.5 VLM-to-OCR, GLM-OCR; LaTeX, handwriting, multilingual) |
| OCR GRPO | `OCRGRPOTrainer` | OCR reasoning training with CER/edit-distance reward functions |
| MoE | `SFTTrainer` | Mixture-of-Experts fine-tuning (Qwen3.5-MoE, Phi-3.5-MoE, Mixtral, DeepSeek; 39+ architectures) |
| LFM2 | `SFTTrainer` | Liquid Foundation Models (LFM2/LFM2.5 from Liquid AI; hybrid gated-conv + GQA architecture) |
| CPT | `CPTTrainer` + `CPTConfig` | Continual pretraining on raw text (loss on all tokens, optional embed/lm_head training, decoupled LR) |
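As a concrete example of the shared API, a minimal DPO run might look like the sketch below. `DPOTrainer` comes from the table above; the constructor mirrors the SFT quick start and the prompt/chosen/rejected columns follow the TRL convention, so treat both as assumptions until checked against the API reference:

```python
from mlx_tune import FastLanguageModel, DPOTrainer
from datasets import Dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mlx-community/Llama-3.2-1B-Instruct-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16, target_modules=["q_proj", "k_proj", "v_proj", "o_proj"], lora_alpha=16,
)

# TRL-style preference rows: one chosen and one rejected completion per prompt.
pairs = Dataset.from_list([
    {"prompt": "What is 2 + 2?", "chosen": "2 + 2 = 4.", "rejected": "2 + 2 = 5."},
])

trainer = DPOTrainer(model=model, tokenizer=tokenizer, train_dataset=pairs)
trainer.train()
```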
mlx-tune vs Unsloth
| Feature | Unsloth (CUDA) | mlx-tune |
|---|---|---|
| Platform | NVIDIA GPUs | Apple Silicon |
| Backend | Triton Kernels | MLX Framework |
| Memory | VRAM (limited) | Unified (up to 512 GB) |
| API | Original | Drop-in compatible |
| Best For | Production training | Local dev & large models |
mlx-tune is not a replacement for Unsloth. It’s a bridge: prototype on your Mac, then deploy to CUDA with the original Unsloth for production training.