# API Reference
All public APIs exported from `mlx_tune`. Import everything from the top-level package:

```python
from mlx_tune import FastLanguageModel, SFTTrainer, SFTConfig
# ... or any other export listed below
```
## Core Model
### `FastLanguageModel` (`mlx_tune.model`)

Main entry point for loading and configuring language models. Mirrors Unsloth's `FastLanguageModel` API.
#### `FastLanguageModel.from_pretrained()`

Load a pretrained language model from HuggingFace.
| Parameter | Type | Description |
|---|---|---|
| `model_name` | `str` | HuggingFace model ID (e.g., `"mlx-community/Llama-3.2-1B-Instruct-4bit"`) or local path |
| `max_seq_length` | `int`, optional | Maximum sequence length for training/inference |
| `load_in_4bit` | `bool` | Load model with 4-bit quantization (QLoRA) |
| `load_in_8bit` | `bool` | Load model with 8-bit quantization |
#### `FastLanguageModel.get_peft_model()`

Add LoRA adapters to the model for parameter-efficient fine-tuning.
| Parameter | Type | Description |
|---|---|---|
| `model` | `MLXModelWrapper` | Model from `from_pretrained()` |
| `r` | `int` | LoRA rank (higher = more parameters, better quality) |
| `target_modules` | `list[str]`, optional | Modules to apply LoRA to. Default: `["q_proj", "k_proj", "v_proj", "o_proj"]` |
| `lora_alpha` | `int` | LoRA scaling factor. Recommended: equal to `r` |
| `lora_dropout` | `float` | Dropout for LoRA layers |
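The table notes that a higher rank `r` means more trainable parameters. A minimal sketch of that arithmetic, assuming standard LoRA (each adapted module gains two low-rank matrices); the projection shapes below are hypothetical, not taken from any specific model:

```python
# Illustrative arithmetic (not the library's internals): in standard LoRA,
# an adapted module of shape (d_out, d_in) gains two trainable matrices,
# A (r x d_in) and B (d_out x r), so parameter count grows linearly with r.

def lora_param_count(r: int, module_shapes: list[tuple[int, int]]) -> int:
    """Total trainable parameters added by LoRA adapters."""
    return sum(r * (d_in + d_out) for d_out, d_in in module_shapes)

# Hypothetical q/k/v/o projection shapes for a model with hidden size 2048
# and grouped-query attention projecting k/v down to 512 dimensions.
shapes = [(2048, 2048), (512, 2048), (512, 2048), (2048, 2048)]

small = lora_param_count(8, shapes)   # r = 8
large = lora_param_count(16, shapes)  # r = 16 doubles the adapter size
```

Doubling `r` doubles the adapter size, which is why rank is the main knob trading memory for capacity.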
#### `FastLanguageModel.for_inference()`

Enable inference mode: activates KV caching, disables dropout. Always call before generating.
### `MLXModelWrapper`

Internal wrapper providing Unsloth-compatible methods on MLX models. Returned by `FastLanguageModel.from_pretrained()`.

#### Key Methods
## SFT Training
### `SFTTrainer` (`mlx_tune.sft_trainer`)

Supervised fine-tuning trainer. API-compatible with TRL's `SFTTrainer`.
| Parameter | Type | Description |
|---|---|---|
| `model` | `MLXModelWrapper` | Model with LoRA adapters configured |
| `train_dataset` | `Dataset` | HuggingFace dataset or list of dicts |
| `tokenizer` | `Tokenizer` | Tokenizer from `from_pretrained()` |
| `args` | `SFTConfig` | Training configuration |
### `SFTConfig`

Training configuration. Compatible with TRL's `SFTConfig` parameters.
| Parameter | Default | Description |
|---|---|---|
| `output_dir` | `"outputs"` | Directory for checkpoints and logs |
| `per_device_train_batch_size` | `2` | Batch size per device |
| `gradient_accumulation_steps` | `4` | Number of gradient accumulation steps |
| `learning_rate` | `2e-4` | Peak learning rate |
| `max_steps` | `-1` | Total training steps (`-1` = use epochs) |
| `max_seq_length` | `2048` | Maximum sequence length |
| `optim` | `"adam"` | Optimizer (use `"adam"` for MLX) |
| `warmup_steps` | `5` | Linear warmup steps |
| `lr_scheduler_type` | `"linear"` | LR scheduler: `linear`, `cosine`, `constant` |
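How these defaults interact can be sketched in plain Python. The exact scheduler implementation may differ; this assumes the common convention of linear warmup to the peak rate followed by linear decay to zero:

```python
# Plain-Python sketch of how the defaults above combine. The scheduler
# shown assumes linear warmup then linear decay, a common "linear" shape.

def effective_batch_size(per_device: int = 2, grad_accum: int = 4) -> int:
    """Tokens seen per optimizer step: per-device batch x accumulation."""
    return per_device * grad_accum

def lr_at_step(step: int, peak_lr: float = 2e-4,
               warmup_steps: int = 5, total_steps: int = 100) -> float:
    if step < warmup_steps:                      # linear warmup
        return peak_lr * (step + 1) / warmup_steps
    remaining = total_steps - warmup_steps       # linear decay to zero
    return peak_lr * max(0, total_steps - step) / remaining

ebs = effective_batch_size()  # with the defaults: 2 * 4 = 8
```

Gradient accumulation is why a per-device batch of 2 still trains with an effective batch of 8.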
## RL Trainers
### `DPOTrainer` (`mlx_tune.rl_trainers`)

Direct Preference Optimization trainer. Uses the standard DPO loss with log-probability computation over chosen/rejected pairs.
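The loss on a single pair can be written down exactly; this is a self-contained sketch of the standard DPO objective, not the trainer's code:

```python
import math

# Sketch of the DPO objective on one chosen/rejected pair. The trainer
# computes these sequence log-probabilities under the policy and a frozen
# reference model; beta is the usual DPO temperature.

def dpo_loss(policy_chosen_logp: float, policy_rejected_logp: float,
             ref_chosen_logp: float, ref_rejected_logp: float,
             beta: float = 0.1) -> float:
    margin = ((policy_chosen_logp - ref_chosen_logp)
              - (policy_rejected_logp - ref_rejected_logp))
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))  # -log sigmoid

# When policy and reference agree, the margin is 0 and the loss is log(2).
baseline = dpo_loss(-10.0, -12.0, -10.0, -12.0)
```

The loss falls as the policy prefers the chosen response more strongly than the reference does.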
### `ORPOTrainer`

Odds Ratio Preference Optimization. Combines SFT loss with odds-ratio preference alignment.
### `GRPOTrainer`

Group Relative Policy Optimization (DeepSeek R1 style). Generates multiple completions per prompt and optimizes based on relative rewards.
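The "relative rewards" step is simple to state: rewards are standardized within each prompt's group of completions, so no learned value function is needed. An illustrative sketch:

```python
# Sketch of GRPO's group-relative advantage: rewards for completions
# sampled from the same prompt are standardized within the group.

def group_advantages(rewards: list[float], eps: float = 1e-8) -> list[float]:
    mean = sum(rewards) / len(rewards)
    std = (sum((r - mean) ** 2 for r in rewards) / len(rewards)) ** 0.5
    return [(r - mean) / (std + eps) for r in rewards]

advs = group_advantages([1.0, 0.0, 0.0, 1.0])  # two good, two bad completions
```

Completions above the group mean get positive advantage and are reinforced; those below are penalized.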
### `KTOTrainer` & `SimPOTrainer`
## Utilities
## Vision Models
### `FastVisionModel` (`mlx_tune.vlm`)

Vision-Language Model API. Mirrors Unsloth's `FastVisionModel`.
#### `FastVisionModel.from_pretrained()`

Load a vision-language model. Returns model wrapper and processor (not tokenizer).
#### `FastVisionModel.get_peft_model()`

Add LoRA adapters to vision and/or language components.
| Parameter | Type | Description |
|---|---|---|
| `finetune_vision_layers` | `bool` | Apply LoRA to vision encoder |
| `finetune_language_layers` | `bool` | Apply LoRA to language model |
| `finetune_attention_modules` | `bool` | Apply LoRA to attention modules |
| `finetune_mlp_modules` | `bool` | Apply LoRA to MLP modules |
### `VLMSFTTrainer`

Vision-Language model trainer. Batch size is forced to 1 (images produce variable token counts).
### `VLMSFTConfig`
### `UnslothVisionDataCollator`

Handles image preprocessing, vision token insertion, and batch preparation for VLM training.
## Chat Templates
### `get_chat_template()` (`mlx_tune.chat_templates`)

Apply a chat template to the tokenizer. Supports 15 model templates.
| Parameter | Description |
|---|---|
| `tokenizer` | Tokenizer to update |
| `chat_template` | Template name or `"auto"` for auto-detection. Options: `llama-3`, `llama-2`, `gemma`, `qwen-2.5`, `qwen-3`, `phi-3`, `phi-4`, `mistral-7b`, `deepseek`, `command-r`, `neural-chat`, `solar`, `tulu-2`, `zephyr`, `alpaca` |
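For intuition, here is a plain-Python approximation of the layout the `llama-3` template produces, using the special tokens from Meta's published Llama 3 chat format; in practice `get_chat_template()` installs the equivalent template on the tokenizer and `apply_chat_template()` does the rendering:

```python
# Plain-Python approximation of the "llama-3" chat layout (Meta's Llama 3
# special tokens). Illustrative only; the tokenizer's own template is used
# in real training code.

def render_llama3(messages: list[dict],
                  add_generation_prompt: bool = True) -> str:
    out = "<|begin_of_text|>"
    for m in messages:
        out += (f"<|start_header_id|>{m['role']}<|end_header_id|>\n\n"
                f"{m['content']}<|eot_id|>")
    if add_generation_prompt:
        # Open an assistant turn so the model generates the reply.
        out += "<|start_header_id|>assistant<|end_header_id|>\n\n"
    return out

text = render_llama3([{"role": "user", "content": "Hi!"}])
```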
### `train_on_responses_only()`

Modify trainer to compute loss only on assistant response tokens (not prompts). Significantly improves training quality.
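The underlying mechanism is label masking. A minimal sketch (this helper is illustrative, not a library export):

```python
# Sketch of the masking this performs: prompt tokens receive label -100,
# the value cross-entropy losses conventionally ignore, so gradients come
# only from the assistant's response tokens.

def mask_prompt_labels(input_ids: list[int], response_start: int) -> list[int]:
    return [-100 if i < response_start else tok
            for i, tok in enumerate(input_ids)]

labels = mask_prompt_labels([5, 6, 7, 8, 9], response_start=3)
# positions 0-2 (the prompt) are masked; loss covers only the response
```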
## Dataset Utilities
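As an illustration of moving between the Alpaca and ChatML record shapes these utilities work with, a minimal converter (field names assume the conventional Alpaca schema of `instruction`/`input`/`output`; this helper is hypothetical, not a library export):

```python
# Hypothetical converter from an Alpaca-style record to a ChatML-style
# message list, assuming the conventional Alpaca field names.

def alpaca_to_chatml(ex: dict) -> list[dict]:
    user = ex["instruction"]
    if ex.get("input"):                      # optional context field
        user += "\n\n" + ex["input"]
    return [
        {"role": "user", "content": user},
        {"role": "assistant", "content": ex["output"]},
    ]

msgs = alpaca_to_chatml({"instruction": "Add the numbers.",
                         "input": "2 and 3", "output": "5"})
```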
"alpaca", "sharegpt", or "chatml"Template Helpers
### Constants
"llama3" → "llama-3")Loss Functions
### `mlx_tune.losses`

Low-level loss functions for custom training loops.
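For reference, a framework-free sketch of the kind of masked token-level cross-entropy such a module typically provides; the library's own functions presumably operate on MLX arrays rather than Python lists:

```python
import math

# Illustrative masked cross-entropy over per-token logits: positions with
# label -100 are excluded, and the loss is averaged over unmasked tokens.

def masked_cross_entropy(logits: list[list[float]],
                         labels: list[int]) -> float:
    total, count = 0.0, 0
    for row, label in zip(logits, labels):
        if label == -100:            # masked position: skip entirely
            continue
        log_z = math.log(sum(math.exp(x) for x in row))
        total += log_z - row[label]  # -log softmax(row)[label]
        count += 1
    return total / count
```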