Fine-tune LLMs on your Mac with Apple Silicon
Prototype locally, scale to cloud. Same code, just change the import.
Fine-tune Locally
Train LLMs on M1–M5 Macs natively with Apple’s MLX framework. No cloud GPU required.
Unified Memory
Access up to 512 GB of unified RAM on a Mac Studio. Load larger models than discrete GPU VRAM allows.
Same API as Unsloth
Write once, run on Mac or CUDA. Just change the import line—your training code stays the same.
Export Anywhere
Save to HuggingFace format, GGUF for Ollama and llama.cpp, or merged weights for deployment.
One import change. That’s it.
Your Unsloth training scripts work on Apple Silicon with a single line change.
```python
# Before: Unsloth on CUDA
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig

# After: MLX-Tune on Apple Silicon
from mlx_tune import FastLanguageModel
from mlx_tune import SFTTrainer, SFTConfig
```
Rest of your code stays exactly the same.
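For scripts that move between a Mac and a CUDA box, the swap can even be chosen at runtime. A minimal sketch, assuming the two packages expose the same names as shown above; the `pick_backend` helper is hypothetical and not part of either library:

```python
import platform

def pick_backend() -> str:
    """Return the module name to import on this machine (hypothetical helper)."""
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mlx_tune"  # Apple Silicon: use MLX-Tune
    return "unsloth"       # CUDA machines: use the original Unsloth

# e.g. importlib.import_module(pick_backend()).FastLanguageModel
```

This keeps a single training script portable across both platforms without editing the import line by hand.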
Up and running in minutes
A complete fine-tuning pipeline in under 20 lines.
```python
from mlx_tune import FastLanguageModel, SFTTrainer, SFTConfig
from datasets import load_dataset

# Load any HuggingFace model (1B model for quick start)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mlx-community/Llama-3.2-1B-Instruct-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)

# Load a dataset
dataset = load_dataset("yahma/alpaca-cleaned", split="train[:100]")

# Train with SFTTrainer (same API as TRL!)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=SFTConfig(
        output_dir="outputs",
        per_device_train_batch_size=2,
        learning_rate=2e-4,
        max_steps=50,
    ),
)
trainer.train()

# Save (same API as Unsloth!)
model.save_pretrained("lora_model")                # Adapters only
model.save_pretrained_merged("merged", tokenizer)  # Full model
```
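If you want control over how alpaca-cleaned records become training text, the usual pattern is a prompt-formatting function. A sketch in plain Python; the template is illustrative, the field names match the alpaca-cleaned schema, and passing it via a `formatting_func` keyword is an assumption based on TRL's `SFTTrainer` API, which MLX-Tune claims to mirror:

```python
ALPACA_TEMPLATE = """### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}"""

def format_alpaca(example: dict) -> str:
    """Render one alpaca-cleaned record into a single training string."""
    return ALPACA_TEMPLATE.format(
        instruction=example["instruction"],
        input=example["input"],
        output=example["output"],
    )

sample = {"instruction": "Add the numbers.", "input": "2 and 3", "output": "5"}
text = format_alpaca(sample)
```

Under TRL's API this would be wired in as `SFTTrainer(..., formatting_func=format_alpaca)`; whether MLX-Tune accepts the same keyword is an assumption, not something the page states.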
Get MLX-Tune
Using uv (recommended)

```shell
uv pip install mlx-tune
```

Using pip

```shell
pip install mlx-tune
```

From source (development)

```shell
git clone https://github.com/ARahim3/mlx-tune.git
cd mlx-tune
uv pip install -e .
```
Requirements
| Requirement | Minimum |
|---|---|
| Hardware | Apple Silicon Mac (M1 / M2 / M3 / M4 / M5) |
| OS | macOS 13.0+ |
| Memory | 16 GB+ unified RAM (32 GB+ for 7B+ models) |
| Python | 3.9+ |
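The memory row can be sanity-checked with a back-of-envelope estimate: a 4-bit quantized model needs roughly half a byte per parameter, plus headroom for LoRA adapters, optimizer state, KV cache, and activations. A rough sketch, where the overhead multiplier is a rule of thumb, not a measurement:

```python
def rough_memory_gb(n_params_billion: float, bits: int = 4, overhead: float = 1.5) -> float:
    """Very rough RAM estimate for LoRA fine-tuning a quantized model.

    overhead multiplies the raw weight size to cover adapters, optimizer
    state, KV cache, and activations (rule of thumb, not a measurement).
    """
    weight_gb = n_params_billion * bits / 8  # params (billions) * bytes per param
    return weight_gb * overhead

# A 7B model in 4-bit: ~3.5 GB of weights, ~5.25 GB with this overhead.
# Real training runs need further headroom at longer sequence lengths,
# hence the 32 GB guidance in the table above.
```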
Supported trainers
All trainers use native MLX — no subprocess calls or CUDA wrappers.
| Method | Trainer | Use Case |
|---|---|---|
| SFT | SFTTrainer | Instruction fine-tuning |
| DPO | DPOTrainer | Preference learning |
| ORPO | ORPOTrainer | Combined SFT + odds-ratio preference |
| GRPO | GRPOTrainer | Reasoning with multi-generation (DeepSeek R1 style) |
| KTO | KTOTrainer | Kahneman-Tversky optimization |
| SimPO | SimPOTrainer | Simple preference optimization |
| VLM SFT | VLMSFTTrainer | Vision-language model fine-tuning |
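The preference trainers differ mainly in the record shape they consume. A sketch of the expected shapes, using TRL's conventions; that MLX-Tune accepts the same field names is an assumption based on its API-compatibility claim:

```python
# One DPO/ORPO/SimPO record: a prompt plus a preferred and a rejected completion.
dpo_example = {
    "prompt": "What is the capital of France?",
    "chosen": "The capital of France is Paris.",
    "rejected": "France is a country in Europe.",
}

# One KTO record: a single completion with a binary desirability label
# (Kahneman-Tversky optimization does not need paired completions).
kto_example = {
    "prompt": "What is the capital of France?",
    "completion": "The capital of France is Paris.",
    "label": True,
}
```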
MLX-Tune vs Unsloth
| Feature | Unsloth (CUDA) | MLX-Tune |
|---|---|---|
| Platform | NVIDIA GPUs | Apple Silicon |
| Backend | Triton Kernels | MLX Framework |
| Memory | VRAM (limited) | Unified (up to 512 GB) |
| API | Original | 100% Compatible |
| Best For | Production training | Local dev & large models |
MLX-Tune is not a replacement for Unsloth. It’s a bridge: prototype on your Mac, then deploy to CUDA with the original Unsloth for production training.