Unsloth → MLX-Tune Translation Guide
Everything you learned from Unsloth tutorials works on your Mac. This page shows exactly what to change—and what stays the same.
Three steps to run on Mac
Take any Unsloth tutorial and make it run on Apple Silicon.
1. **Pick an Unsloth notebook.** Find a tutorial in Unsloth’s docs or Colab notebooks. Any SFT, DPO, ORPO, or GRPO example will work.
2. **Change imports and the model name.** Swap the import lines and use `mlx-community` model names instead of `unsloth/` models.
3. **Run on your Mac.** Execute locally, iterate fast, then scale to CUDA with the original Unsloth when ready.
What to change in your imports
Every import maps one-to-one. Replace the left column with the right column.
| Unsloth / TRL | MLX-Tune |
|---|---|
| `from unsloth import FastLanguageModel` | `from mlx_tune import FastLanguageModel` |
| `from trl import SFTTrainer, SFTConfig` | `from mlx_tune import SFTTrainer, SFTConfig` |
| `from trl import DPOTrainer, DPOConfig` | `from mlx_tune import DPOTrainer, DPOConfig` |
| `from trl import ORPOTrainer, ORPOConfig` | `from mlx_tune import ORPOTrainer, ORPOConfig` |
| `from trl import GRPOTrainer, GRPOConfig` | `from mlx_tune import GRPOTrainer, GRPOConfig` |
| `from unsloth import FastVisionModel` | `from mlx_tune import FastVisionModel` |
| `from unsloth.trainer import UnslothVisionDataCollator` | `from mlx_tune import UnslothVisionDataCollator` |
| `from unsloth import get_chat_template` | `from mlx_tune import get_chat_template` |
| `from unsloth import train_on_responses_only` | `from mlx_tune import train_on_responses_only` |
| `from unsloth import to_sharegpt` | `from mlx_tune import to_sharegpt` |
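Because the mapping is one-to-one, converting a notebook's import cell is a mechanical rewrite. As a minimal sketch (this helper is illustrative, not part of either library), every `unsloth`, `unsloth.trainer`, or `trl` import becomes the same names imported from `mlx_tune`:

```python
import re

def rewrite_import(line: str) -> str:
    """Rewrite an Unsloth/TRL import line to its MLX-Tune equivalent.

    Illustrative helper only; the class and function names themselves
    are unchanged, so only the module path needs rewriting.
    """
    return re.sub(
        r"^from (?:unsloth(?:\.trainer)?|trl) import",
        "from mlx_tune import",
        line,
    )

print(rewrite_import("from trl import SFTTrainer, SFTConfig"))
# from mlx_tune import SFTTrainer, SFTConfig
```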
HuggingFace model mapping
Unsloth uses models from the unsloth/ org on HuggingFace. MLX-Tune uses pre-converted models from the mlx-community/ org, which are optimized for Apple’s MLX framework.
| Unsloth Model | MLX-Tune Equivalent |
|---|---|
| `unsloth/Meta-Llama-3.1-8B-bnb-4bit` | `mlx-community/Meta-Llama-3.1-8B-Instruct-4bit` |
| `unsloth/Qwen2.5-7B-bnb-4bit` | `mlx-community/Qwen2.5-7B-Instruct-4bit` |
| `unsloth/gemma-2-9b-it-bnb-4bit` | `mlx-community/gemma-2-9b-it-4bit` |
| `unsloth/Phi-4-bnb-4bit` | `mlx-community/Phi-4-4bit` |
| `unsloth/mistral-7b-v0.3-bnb-4bit` | `mlx-community/Mistral-7B-Instruct-v0.3-4bit` |
| `Qwen/Qwen3.5-0.8B` | `mlx-community/Qwen3.5-0.8B-bf16` |
Find MLX models at huggingface.co/mlx-community. Most popular models are available in 4-bit and 8-bit quantizations.
What changes in training config
Most parameters are identical. A few CUDA-specific options are either replaced or no longer needed.
| Parameter | Unsloth | MLX-Tune | Notes |
|---|---|---|---|
| `per_device_train_batch_size` | Same | Same | |
| `gradient_accumulation_steps` | Same | Same | |
| `learning_rate` | Same | Same | |
| `max_steps` | Same | Same | |
| `optim` | `"adamw_8bit"` | `"adam"` | MLX uses standard Adam |
| `fp16` / `bf16` | `True` | Not needed | MLX handles precision automatically |
| `device_map` | `"auto"` | Not needed | No device mapping on Apple Silicon |
| `dataset_num_proc` | `2` | Not needed | Single-process on Mac |
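The table above can be summarized as: drop the CUDA-specific keys, and map the 8-bit optimizer to plain Adam. A minimal sketch of that translation (an illustrative helper, not an MLX-Tune API):

```python
# Keys that exist only for CUDA training and have no MLX-Tune equivalent.
CUDA_ONLY = {"fp16", "bf16", "device_map", "dataset_num_proc"}

def to_mlx_args(unsloth_args: dict) -> dict:
    """Translate a dict of Unsloth trainer arguments to MLX-Tune form."""
    mlx_args = {k: v for k, v in unsloth_args.items() if k not in CUDA_ONLY}
    if mlx_args.get("optim") == "adamw_8bit":
        mlx_args["optim"] = "adam"  # MLX uses standard Adam
    return mlx_args

print(to_mlx_args({"learning_rate": 2e-4, "optim": "adamw_8bit", "bf16": True}))
# {'learning_rate': 0.0002, 'optim': 'adam'}
```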
Actual Unsloth notebook → MLX-Tune
Here’s what converting Unsloth’s Qwen3.5 Vision notebook to MLX-Tune involves: only a handful of lines change.
Out of ~40 lines of training code, only 8 lines change: imports (2), model name (1), trainer class (1), config class (1), batch size (1), optimizer (1), and removing torch import (1). Everything else — LoRA config, dataset prep, data collator, save — is identical.
Complete SFT script comparison
A full training script in both frameworks. The imports, model name, optimizer, and precision flags are the only differences.
Unsloth (CUDA):

```python
from unsloth import FastLanguageModel, is_bfloat16_supported
from trl import SFTTrainer, SFTConfig
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    "unsloth/Meta-Llama-3.1-8B-bnb-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
dataset = load_dataset("yahma/alpaca-cleaned", split="train")
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=SFTConfig(
        output_dir="outputs",
        per_device_train_batch_size=2,
        learning_rate=2e-4,
        max_steps=100,
        optim="adamw_8bit",
        fp16=not is_bfloat16_supported(),
        bf16=is_bfloat16_supported(),
    ),
)
trainer.train()
model.save_pretrained("lora_model")
```
MLX-Tune (Apple Silicon):

```python
from mlx_tune import FastLanguageModel
from mlx_tune import SFTTrainer, SFTConfig
from datasets import load_dataset

model, tokenizer = FastLanguageModel.from_pretrained(
    "mlx-community/Meta-Llama-3.1-8B-Instruct-4bit",
    max_seq_length=2048,
    load_in_4bit=True,
)
model = FastLanguageModel.get_peft_model(
    model, r=16, lora_alpha=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
)
dataset = load_dataset("yahma/alpaca-cleaned", split="train")
trainer = SFTTrainer(
    model=model,
    train_dataset=dataset,
    tokenizer=tokenizer,
    args=SFTConfig(
        output_dir="outputs",
        per_device_train_batch_size=2,
        learning_rate=2e-4,
        max_steps=100,
        optim="adam",
    ),
)
trainer.train()
model.save_pretrained("lora_model")
```
VLM fine-tuning translation
Vision-Language Model fine-tuning follows the same pattern, with a few MLX-Tune-specific additions.
Import changes
Unsloth:

```python
from unsloth import FastVisionModel
from unsloth.trainer import UnslothVisionDataCollator
from trl import SFTTrainer, SFTConfig
```

MLX-Tune:

```python
from mlx_tune import FastVisionModel
from mlx_tune import UnslothVisionDataCollator
from mlx_tune.vlm import VLMSFTTrainer, VLMSFTConfig
```
Key differences for VLM
| Aspect | Unsloth | MLX-Tune |
|---|---|---|
| Trainer class | `SFTTrainer` | `VLMSFTTrainer` |
| Config class | `SFTConfig` | `VLMSFTConfig` |
| Batch size | Flexible | Must be 1 |
| Data collator | `UnslothVisionDataCollator` | `UnslothVisionDataCollator` |
VLM training in MLX-Tune must use per_device_train_batch_size=1. Images produce variable numbers of vision tokens, so batching is not supported. Use gradient_accumulation_steps to simulate larger effective batch sizes.
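The arithmetic is the same as with any trainer: gradients accumulate over several forward/backward passes before one optimizer step, so the effective batch size is the product of the two settings.

```python
# With the per-device batch size fixed at 1 for VLM training,
# gradient accumulation is the only knob for effective batch size.
per_device_train_batch_size = 1  # required for VLM training in MLX-Tune
gradient_accumulation_steps = 8  # raise this instead of the batch size

effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # 8
```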
Everything else is identical
The vast majority of your Unsloth code works without any changes.
- `get_peft_model()` parameters — `r`, `lora_alpha`, `target_modules`, `lora_dropout`, and all other LoRA settings
- Dataset formats — Alpaca, ShareGPT, and ChatML are auto-detected and converted
- Chat templates — `get_chat_template()` with the same template names (`llama-3`, `qwen2.5`, `gemma`, `phi-4`, `mistral`, etc.)
- Response-only training — `train_on_responses_only()` with the same `instruction_part` and `response_part` parameters
- Save methods — `save_pretrained()`, `save_pretrained_merged()`, and `save_pretrained_gguf()`
- LoRA configuration — adapter loading and saving is fully compatible
- Dataset utilities — `to_sharegpt()`, `apply_column_mapping()`, and `HFDatasetConfig`
- Inference — `FastLanguageModel.for_inference(model)` and streaming generation
If you can fine-tune with Unsloth, you can fine-tune with MLX-Tune. Change the imports, swap the model name, drop the CUDA-specific config—and you’re training on your Mac.