v0.4.25

Fine-tune LLMs, Vision, and Audio models on your Mac

SFT, DPO, GRPO, Vision, TTS, STT, and OCR fine-tuning — natively on MLX. Unsloth-compatible API.

$ pip install mlx-tune

Fine-tune Locally

Train LLMs, vision, and audio models on M1–M5 Macs natively with Apple’s MLX framework. No cloud GPU required.

Unified Memory

Access up to 512GB unified RAM on Mac Studio. Load larger models than discrete GPU VRAM allows.

Unsloth-Compatible API

Your existing Unsloth training scripts run on Apple Silicon. Change the import, keep everything else.

Export Anywhere

Save to HuggingFace format, GGUF for Ollama and llama.cpp, or merged weights for deployment.
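
If the save API follows Unsloth's (the quick start below shows save_pretrained and save_pretrained_merged), GGUF export would look roughly like the sketch below. save_pretrained_gguf and its quantization_method argument are Unsloth's names, assumed here to carry over.

# Sketch of the export paths, assuming mlx-tune mirrors Unsloth's save API.
# "model" and "tokenizer" come from a training run like the quick start below.
model.save_pretrained("lora_model")                # LoRA adapters only
model.save_pretrained_merged("merged", tokenizer)  # merged weights for deployment
model.save_pretrained_gguf(                        # GGUF for Ollama / llama.cpp
    "gguf_model", tokenizer, quantization_method="q4_k_m"
)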

Choose your track

Each modality has its own guide with Quick Start, API reference, examples, and tips.

One import change. That’s it.

Existing Unsloth training scripts work on Apple Silicon with minimal changes.

Unsloth (CUDA)

from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig

mlx-tune (Apple Silicon)

from mlx_tune import FastLanguageModel
from mlx_tune import SFTTrainer, SFTConfig

Rest of your code stays exactly the same.

Up and running in minutes

A complete fine-tuning pipeline in under 30 lines of code.

from mlx_tune import FastLanguageModel, SFTTrainer, SFTConfig
from datasets import load_dataset

# Load any HuggingFace model (1B model for quick start)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mlx-community/Llama-3.2-1B-Instruct-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)

# Load a dataset
dataset = load_dataset("yahma/alpaca-cleaned", split="train[:100]")

# Train with SFTTrainer (same API as TRL!)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=SFTConfig(
        output_dir="outputs",
        per_device_train_batch_size=2,
        learning_rate=2e-4,
        max_steps=50,
    ),
)
trainer.train()

# Save (same API as Unsloth!)
model.save_pretrained("lora_model")           # Adapters only
model.save_pretrained_merged("merged", tokenizer)  # Full model
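
To sanity-check the result, Unsloth's inference calls should carry over if the API is fully compatible. FastLanguageModel.for_inference and model.generate are Unsloth/Transformers names, assumed here to work the same way on MLX.

# Quick smoke test on the fine-tuned model. for_inference() and generate()
# are Unsloth/Transformers names; it is assumed the compatibility layer
# exposes the same calls on MLX.
FastLanguageModel.for_inference(model)

# return_tensors="pt" follows the Unsloth convention; the exact tensor
# type expected on MLX is an assumption.
inputs = tokenizer("What is fine-tuning?", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))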

Get mlx-tune

Using uv (recommended)

uv pip install mlx-tune

Using pip

pip install mlx-tune

With audio support (TTS/STT)

uv pip install 'mlx-tune[audio]'

From source (development)

git clone https://github.com/ARahim3/mlx-tune.git
cd mlx-tune
uv pip install -e .

Requirements

Hardware   Apple Silicon Mac (M1 / M2 / M3 / M4 / M5)
OS         macOS 13.0+
Memory     8 GB+ unified RAM (16 GB+ recommended)
Python     3.9+

Supported trainers

All trainers use native MLX — no subprocess calls or CUDA wrappers. A DPO example follows the table.

Method        Trainer                           Use Case
SFT           SFTTrainer                        Instruction fine-tuning
DPO           DPOTrainer                        Preference learning
ORPO          ORPOTrainer                       Combined SFT + odds-ratio preference
GRPO          GRPOTrainer                       Reasoning with multi-generation (DeepSeek-R1 style)
KTO           KTOTrainer + KTOConfig            Binary feedback (Kahneman-Tversky optimization)
SimPO         SimPOTrainer + SimPOConfig        No reference model (length-normalized log probs)
VLM SFT       VLMSFTTrainer                     Vision-language model fine-tuning
Vision GRPO   VLMGRPOTrainer + VLMGRPOConfig    Vision-language GRPO reasoning training
TTS SFT       TTSSFTTrainer                     Text-to-speech fine-tuning (Orpheus)
STT SFT       STTSFTTrainer                     Speech-to-text fine-tuning (Whisper, Qwen3-ASR, Canary, Voxtral, Voxtral Realtime, Moonshine, Parakeet TDT)
Embedding     EmbeddingSFTTrainer               Sentence-embedding fine-tuning (BERT, ModernBERT, Qwen3-Embedding, Harrier — InfoNCE/contrastive)
OCR SFT       OCRSFTTrainer                     Document OCR fine-tuning (DeepSeek-OCR, Qwen3.5 VLM-to-OCR, GLM-OCR — LaTeX, handwriting, multilingual)
OCR GRPO      OCRGRPOTrainer                    OCR reasoning training with CER/edit-distance reward functions
MoE           SFTTrainer                        Mixture-of-Experts fine-tuning (Qwen3.5-MoE, Phi-3.5-MoE, Mixtral, DeepSeek — 39+ architectures)
LFM2          SFTTrainer                        Liquid Foundation Models (LFM2/LFM2.5 from Liquid AI — hybrid gated-conv + GQA architecture)
CPT           CPTTrainer + CPTConfig            Continual pretraining on raw text (loss on all tokens, optional embed/lm_head training, decoupled LR)
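
As a sketch of a second trainer, here is preference tuning with the DPOTrainer from the table above, assuming it mirrors TRL's signature. DPOConfig, its beta field, and the chosen/rejected dataset layout are TRL conventions, assumed here to carry over; the dataset name is illustrative.

from mlx_tune import FastLanguageModel, DPOTrainer, DPOConfig
from datasets import load_dataset

# Same loading and LoRA setup as the SFT quick start.
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mlx-community/Llama-3.2-1B-Instruct-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)

# Any preference dataset with chosen/rejected pairs works here.
dataset = load_dataset("trl-lib/ultrafeedback_binarized", split="train[:100]")

# beta controls the strength of the DPO preference penalty (TRL convention).
trainer = DPOTrainer(
    model=model,
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=DPOConfig(output_dir="dpo_outputs", beta=0.1, max_steps=50),
)
trainer.train()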

mlx-tune vs Unsloth

Feature    Unsloth (CUDA)        mlx-tune
Platform   NVIDIA GPUs           Apple Silicon
Backend    Triton kernels        MLX framework
Memory     VRAM (limited)        Unified (up to 512 GB)
API        Original              100% compatible
Best for   Production training   Local dev & large models
Note

mlx-tune is not a replacement for Unsloth. It’s a bridge: prototype on your Mac, then deploy to CUDA with the original Unsloth for production training.