Fine-tune LLMs on your Mac with Apple Silicon
Prototype locally, scale to cloud. Same code, just change the import.
Fine-tune Locally
Train LLMs on M1–M5 Macs natively with Apple’s MLX framework. No cloud GPU required.
Unified Memory
Access up to 512 GB of unified RAM on a Mac Studio. Load larger models than discrete GPU VRAM allows.
Same API as Unsloth
Write once, run on Mac or CUDA. Just change the import line—your training code stays the same.
Export Anywhere
Save to HuggingFace format, GGUF for Ollama and llama.cpp, or merged weights for deployment.
One import change. That’s it.
Your Unsloth training scripts work on Apple Silicon with a single line change.
```python
# Before: Unsloth on CUDA
from unsloth import FastLanguageModel
from trl import SFTTrainer, SFTConfig

# After: MLX-Tune on Apple Silicon
from mlx_tune import FastLanguageModel
from mlx_tune import SFTTrainer, SFTConfig
```
Rest of your code stays exactly the same.
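For scripts that move between a Mac and a CUDA box, the swap can even be chosen at runtime. A minimal sketch, assuming the two packages expose the same names as shown above; the `pick_backend` helper is hypothetical and not part of either library:

```python
import platform

def pick_backend() -> str:
    """Return the module name to import on this machine (hypothetical helper)."""
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mlx_tune"  # Apple Silicon: use MLX-Tune
    return "unsloth"       # CUDA machines: use the original Unsloth

# e.g. importlib.import_module(pick_backend()).FastLanguageModel
```

This keeps a single training script portable across both platforms without editing the import line by hand.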
Up and running in minutes
A complete fine-tuning pipeline in under 20 lines.
```python
from mlx_tune import FastLanguageModel, SFTTrainer, SFTConfig
from datasets import load_dataset

# Load any HuggingFace model (1B model for quick start)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mlx-community/Llama-3.2-1B-Instruct-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)

# Add LoRA adapters
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
)

# Load a dataset
dataset = load_dataset("yahma/alpaca-cleaned", split="train[:100]")

# Train with SFTTrainer (same API as TRL!)
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=SFTConfig(
        output_dir="outputs",
        per_device_train_batch_size=2,
        learning_rate=2e-4,
        max_steps=50,
    ),
)
trainer.train()

# Save (same API as Unsloth!)
model.save_pretrained("lora_model")                # Adapters only
model.save_pretrained_merged("merged", tokenizer)  # Full model
```
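If you want control over how alpaca-cleaned records become training text, the usual pattern is a prompt-formatting function. A sketch in plain Python; the template is illustrative, the field names match the alpaca-cleaned schema, and passing it via a `formatting_func` keyword is an assumption based on TRL's `SFTTrainer` API, which MLX-Tune claims to mirror:

```python
ALPACA_TEMPLATE = """### Instruction:
{instruction}

### Input:
{input}

### Response:
{output}"""

def format_alpaca(example: dict) -> str:
    """Render one alpaca-cleaned record into a single training string."""
    return ALPACA_TEMPLATE.format(
        instruction=example["instruction"],
        input=example["input"],
        output=example["output"],
    )

sample = {"instruction": "Add the numbers.", "input": "2 and 3", "output": "5"}
text = format_alpaca(sample)
```

Under TRL's API this would be wired in as `SFTTrainer(..., formatting_func=format_alpaca)`; whether MLX-Tune accepts the same keyword is an assumption, not something the page states.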
Get MLX-Tune
Using uv (recommended)

```shell
uv pip install mlx-tune
```

Using pip

```shell
pip install mlx-tune
```

From source (development)

```shell
git clone https://github.com/ARahim3/mlx-tune.git
cd mlx-tune
uv pip install -e .
```
Requirements
| Requirement | Minimum |
|---|---|
| Hardware | Apple Silicon Mac (M1 / M2 / M3 / M4 / M5) |
| OS | macOS 13.0+ |
| Memory | 16 GB+ unified RAM (32 GB+ for 7B+ models) |
| Python | 3.9+ |
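The memory row can be sanity-checked with a back-of-envelope estimate: a 4-bit quantized model needs roughly half a byte per parameter, plus headroom for LoRA adapters, optimizer state, KV cache, and activations. A rough sketch, where the overhead multiplier is a rule of thumb, not a measurement:

```python
def rough_memory_gb(n_params_billion: float, bits: int = 4, overhead: float = 1.5) -> float:
    """Very rough RAM estimate for LoRA fine-tuning a quantized model.

    overhead multiplies the raw weight size to cover adapters, optimizer
    state, KV cache, and activations (rule of thumb, not a measurement).
    """
    weight_gb = n_params_billion * bits / 8  # params (billions) * bytes per param
    return weight_gb * overhead

# A 7B model in 4-bit: ~3.5 GB of weights, ~5.25 GB with this overhead.
# Real training runs need further headroom at longer sequence lengths,
# hence the 32 GB guidance in the table above.
```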
Supported trainers
All trainers use native MLX — no subprocess calls or CUDA wrappers.
| Method | Trainer | Use Case |
|---|---|---|
| SFT | SFTTrainer | Instruction fine-tuning |
| DPO | DPOTrainer | Preference learning |
| ORPO | ORPOTrainer | Combined SFT + odds-ratio preference |
| GRPO | GRPOTrainer | Reasoning with multi-generation (DeepSeek R1 style) |
| KTO | KTOTrainer | Kahneman-Tversky optimization |
| SimPO | SimPOTrainer | Simple preference optimization |
| VLM SFT | VLMSFTTrainer | Vision-language model fine-tuning |
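The preference trainers differ mainly in the record shape they consume. A sketch of the expected shapes, using TRL's conventions; that MLX-Tune accepts the same field names is an assumption based on its API-compatibility claim:

```python
# One DPO/ORPO/SimPO record: a prompt plus a preferred and a rejected completion.
dpo_example = {
    "prompt": "What is the capital of France?",
    "chosen": "The capital of France is Paris.",
    "rejected": "France is a country in Europe.",
}

# One KTO record: a single completion with a binary desirability label
# (Kahneman-Tversky optimization does not need paired completions).
kto_example = {
    "prompt": "What is the capital of France?",
    "completion": "The capital of France is Paris.",
    "label": True,
}
```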
MLX-Tune vs Unsloth
| Feature | Unsloth (CUDA) | MLX-Tune |
|---|---|---|
| Platform | NVIDIA GPUs | Apple Silicon |
| Backend | Triton Kernels | MLX Framework |
| Memory | VRAM (limited) | Unified (up to 512 GB) |
| API | Original | 100% Compatible |
| Best For | Production training | Local dev & large models |
MLX-Tune is not a replacement for Unsloth. It’s a bridge: prototype on your Mac, then deploy to CUDA with the original Unsloth for production training.