v0.4.2

Fine-tune LLMs on your Mac with Apple Silicon

Prototype locally, scale to cloud. Same code, just change the import.

$ pip install mlx-tune

Fine-tune Locally

Train LLMs on M1–M5 Macs natively with Apple’s MLX framework. No cloud GPU required.

Unified Memory

Access up to 512GB unified RAM on Mac Studio. Load larger models than discrete GPU VRAM allows.
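A quick back-of-envelope calculation (plain arithmetic, not part of the library) shows why unified memory matters for model size:

```python
def model_memory_gb(n_params: float, bits_per_weight: int) -> float:
    """Approximate weight memory only -- ignores KV cache and activations."""
    return n_params * bits_per_weight / 8 / 1e9

# A 70B model in fp16 needs ~140 GB for weights alone -- beyond any single
# discrete GPU's VRAM, but within a high-end Mac Studio's unified RAM.
print(model_memory_gb(70e9, 16))  # 140.0
print(model_memory_gb(70e9, 4))   # 35.0 with 4-bit quantization
```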

Same API as Unsloth

Write once, run on Mac or CUDA. Just change the import line—your training code stays the same.

Export Anywhere

Save to HuggingFace format, GGUF for Ollama and llama.cpp, or merged weights for deployment.

One import change. That’s it.

Your Unsloth training scripts work on Apple Silicon with a single line change.

Unsloth (CUDA):

    from unsloth import FastLanguageModel
    from trl import SFTTrainer, SFTConfig

MLX-Tune (Apple Silicon):

    from mlx_tune import FastLanguageModel
    from mlx_tune import SFTTrainer, SFTConfig

The rest of your code stays exactly the same.
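If you want one script that runs unchanged on both platforms, the swap can be automated. A minimal sketch, assuming only the two import names shown above (the helper name and dynamic import are illustrative, not part of either library):

```python
import platform

def pick_backend(system=None, machine=None):
    """Return the module to import: 'mlx_tune' on Apple Silicon, else 'unsloth'."""
    system = system or platform.system()
    machine = machine or platform.machine()
    return "mlx_tune" if (system == "Darwin" and machine == "arm64") else "unsloth"

# e.g.: FastLanguageModel = importlib.import_module(pick_backend()).FastLanguageModel
print(pick_backend("Darwin", "arm64"))   # mlx_tune
print(pick_backend("Linux", "x86_64"))   # unsloth
```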

Up and running in minutes

A complete fine-tuning pipeline in about two dozen lines.

from mlx_tune import FastLanguageModel, SFTTrainer, SFTConfig
from datasets import load_dataset

# Load any HuggingFace model (1B model for quick start)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mlx-community/Llama-3.2-1B-Instruct-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)

# Load a dataset
dataset = load_dataset("yahma/alpaca-cleaned", split="train[:100]")

# Train with SFTTrainer (same API as TRL!)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=SFTConfig(
        output_dir="outputs",
        per_device_train_batch_size=2,
        learning_rate=2e-4,
        max_steps=50,
    ),
)
trainer.train()

# Save (same API as Unsloth!)
model.save_pretrained("lora_model")           # Adapters only
model.save_pretrained_merged("merged", tokenizer)  # Full model
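Rows in alpaca-cleaned carry instruction, input, and output columns, while SFT trainers generally want a single text string per example. A formatting helper along these lines can be mapped over the dataset first (the prompt template here is an illustrative assumption, not something the library mandates):

```python
# Hypothetical template -- adjust to match your base model's chat format.
ALPACA_PROMPT = """### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}"""

def to_text(row):
    """Collapse one alpaca-cleaned row into a single training string."""
    return ALPACA_PROMPT.format(
        instruction=row["instruction"],
        input=row.get("input", ""),
        output=row["output"],
    )

row = {"instruction": "Add the numbers.", "input": "2 and 3", "output": "5"}
print(to_text(row))
```

You would then apply it with something like `dataset = dataset.map(lambda r: {"text": to_text(r)})` before handing the dataset to the trainer.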

Get MLX-Tune

Using uv (recommended)

uv pip install mlx-tune

Using pip

pip install mlx-tune

From source (development)

git clone https://github.com/ARahim3/mlx-tune.git
cd mlx-tune
uv pip install -e .

Requirements

Hardware   Apple Silicon Mac (M1 / M2 / M3 / M4 / M5)
OS         macOS 13.0+
Memory     16 GB+ unified RAM (32 GB+ for 7B+ models)
Python     3.9+

Supported trainers

All trainers use native MLX — no subprocess calls or CUDA wrappers.

Method    Trainer         Use Case
SFT       SFTTrainer      Instruction fine-tuning
DPO       DPOTrainer      Preference learning
ORPO      ORPOTrainer     Combined SFT + odds-ratio preference
GRPO      GRPOTrainer     Reasoning with multi-generation (DeepSeek-R1 style)
KTO       KTOTrainer      Kahneman-Tversky optimization
SimPO     SimPOTrainer    Simple preference optimization
VLM SFT   VLMSFTTrainer   Vision-language model fine-tuning
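To give a sense of what the preference trainers optimize, the core DPO objective is small enough to write out directly. A pure-Python sketch of the published loss, independent of any trainer implementation (all arguments are per-example log-probabilities):

```python
import math

def dpo_loss(policy_chosen, policy_rejected, ref_chosen, ref_rejected, beta=0.1):
    """DPO loss: -log sigmoid(beta * ((pi_c - ref_c) - (pi_r - ref_r)))."""
    margin = beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected))
    return -math.log(1.0 / (1.0 + math.exp(-margin)))

# No preference signal yet: loss is ln(2) ~ 0.693
print(round(dpo_loss(-10.0, -10.0, -10.0, -10.0), 3))
# Chosen answer pulled above the reference: loss drops below ln(2)
print(dpo_loss(-8.0, -10.0, -10.0, -10.0) < math.log(2))
```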

MLX-Tune vs Unsloth

Feature    Unsloth (CUDA)        MLX-Tune
Platform   NVIDIA GPUs           Apple Silicon
Backend    Triton kernels        MLX framework
Memory     VRAM (limited)        Unified (up to 512 GB)
API        Original              100% compatible
Best for   Production training   Local dev & large models
Note

MLX-Tune is not a replacement for Unsloth. It’s a bridge: prototype on your Mac, then deploy to CUDA with the original Unsloth for production training.