# Troubleshooting
Common issues and their solutions.
## Model Not Found

```text
Error: Model 'xyz' not found
```
mlx-tune uses models from the `mlx-community/` organization on HuggingFace. CUDA-specific models, such as `unsloth/` models with the `-bnb-4bit` suffix, won't work.
```python
# Don't use CUDA-specific models
# model_name = "unsloth/Meta-Llama-3.1-8B-bnb-4bit"  # Won't work!

# Use MLX community models instead
model_name = "mlx-community/Meta-Llama-3.1-8B-Instruct-4bit"
```
Browse available models at huggingface.co/mlx-community.
## Out of Memory
Symptom: Process gets killed or system becomes unresponsive during model loading or training.
### Recommended models by RAM
| RAM | Recommended Size | Example Model |
|---|---|---|
| 8 GB | 0.5B–1B, 4-bit | mlx-community/Llama-3.2-1B-Instruct-4bit |
| 16 GB | 1B–3B, 4-bit | mlx-community/Llama-3.2-3B-Instruct-4bit |
| 32 GB | Up to 8B, 4-bit | mlx-community/Meta-Llama-3.1-8B-Instruct-4bit |
| 48 GB | 8B–14B, 4-bit | mlx-community/Qwen2.5-14B-Instruct-4bit |
| 64 GB+ | 14B+ or 8-bit | mlx-community/Meta-Llama-3.1-70B-Instruct-4bit |
### Solutions
- Use a smaller model
- Use 4-bit quantization (`load_in_4bit=True`)
- Reduce `max_seq_length`
- Close other applications (browsers and IDEs consume significant RAM)
- Ensure macOS is up to date for the latest MLX improvements
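The first three fixes combined, as a minimal sketch (the `mlx_tune` import path is an assumption; adjust it to your install):

```python
from mlx_tune import FastLanguageModel  # import path assumed

# Small model, 4-bit weights, shorter context: the three biggest memory savers
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mlx-community/Llama-3.2-1B-Instruct-4bit",
    max_seq_length=1024,  # shorter sequences mean smaller activation memory
    load_in_4bit=True,    # ~4x smaller weights than fp16
)
```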
## Slow Generation
Symptom: Text generation is slower than expected.
### Solutions
1. Always enable inference mode before generating:

```python
# Always do this before inference!
FastLanguageModel.for_inference(model)

# Then generate
from mlx_lm import generate
response = generate(model.model, tokenizer, prompt=prompt, max_tokens=100)
```
2. Use 4-bit quantized models (faster than fp16 for inference)
3. Reduce `max_tokens` in generation calls
4. Keep macOS updated for the latest MLX optimizations
5. Close memory-heavy applications to free unified memory bandwidth
## GGUF Export from Quantized Models
GGUF export (`save_pretrained_gguf`) doesn't work with quantized (4-bit) base models. This is a known mlx-lm limitation, not an mlx-tune bug.
### What works
| Operation | Status |
|---|---|
| Training with quantized models (QLoRA) | Works |
| Saving adapters (`save_pretrained`) | Works |
| Saving merged model (`save_pretrained_merged`) | Works |
| Inference with trained model | Works |
| GGUF export from quantized base | Doesn't work |
### Workaround 1: Use a non-quantized base model
```python
# Use fp16 model instead of 4-bit
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mlx-community/Llama-3.2-1B-Instruct",  # NOT -4bit
    max_seq_length=2048,
    load_in_4bit=False,  # Train in fp16
)

# Train normally, then export to GGUF
model.save_pretrained_gguf("model", tokenizer)  # Works!
```
### Workaround 2: Dequantize during export
```python
model.save_pretrained_gguf("model", tokenizer, dequantize=True)

# Then re-quantize with llama.cpp:
#   ./llama-quantize model.gguf model-q4_k_m.gguf Q4_K_M
```
### Workaround 3: Skip GGUF entirely
If you only need the model for MLX/Python inference, use `save_pretrained_merged()` instead; no GGUF conversion is needed.
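A minimal sketch (the directory name is arbitrary, and the exact signature may vary between mlx-tune versions):

```python
# Merge the LoRA adapters into the base weights and save as a plain MLX model
model.save_pretrained_merged("merged_model", tokenizer)

# Reload it later like any other local model (path-based loading assumed)
model, tokenizer = FastLanguageModel.from_pretrained("merged_model")
```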
## VLM Issues
### Batch size must be 1
VLM training requires `per_device_train_batch_size=1` because images produce variable numbers of vision tokens. The `VLMSFTTrainer` enforces this automatically. Use `gradient_accumulation_steps` to simulate larger batch sizes, as in the sketch below.
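The two relevant settings, shown as a plain dict because the exact config class mlx-tune expects isn't specified here:

```python
# Hypothetical training arguments; only these two fields matter for the
# batch-size constraint described above
training_args = dict(
    per_device_train_batch_size=1,  # required: images yield variable-length vision-token sequences
    gradient_accumulation_steps=8,  # accumulate over 8 steps for an effective batch size of 8
)
```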
### Think tags in output
Qwen3.5 models may produce `<think>...</think>` tags in generated text. mlx-tune's `generate()` method strips these automatically.
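If you generate through `mlx_lm` directly instead, you can strip the tags yourself:

```python
import re

raw_output = "<think>reasoning...</think>The capital of France is Paris."

# Remove <think>...</think> blocks, including multi-line reasoning
clean = re.sub(r"<think>.*?</think>", "", raw_output, flags=re.DOTALL).strip()
print(clean)  # -> "The capital of France is Paris."
```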
### Image format
Images should be PIL `Image` objects. The `UnslothVisionDataCollator` handles conversion from datasets automatically.
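If your records hold file paths or raw bytes instead, convert them to PIL images up front (the field names below are illustrative):

```python
from PIL import Image

# Normalize to RGB; vision collators generally expect 3-channel images
img = Image.open("photo.jpg").convert("RGB")
example = {"image": img, "text": "Describe this image."}  # illustrative schema
```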
### Text-only VLM training
Qwen3.5 can be fine-tuned on text-only data without images. See example 11.
### VLM adapter save/load
If you saved adapters with v0.4.5 or earlier, the `adapter_config.json` may be in mlx-vlm's minimal format (missing fields required for GGUF export or `load_adapter`). Upgrade to v0.4.6+ and re-save:
```python
# Re-save with proper config format
model.save_pretrained("my_adapters")

# Load into a fresh model
model2, processor2 = FastVisionModel.from_pretrained("mlx-community/Qwen3.5-0.8B-bf16")
model2.load_adapter("my_adapters")
```
## Audio Issues (TTS/STT)
### mlx-audio not installed
Audio fine-tuning requires the optional audio dependency group:
```bash
uv pip install 'mlx-tune[audio]'
```
### SNAC codec model not found
TTS fine-tuning uses the SNAC audio codec. Always use the MLX-format model:
```python
# Correct: MLX safetensors format
model, tokenizer = FastTTSModel.from_pretrained(
    "mlx-community/orpheus-3b-0.1-ft-bf16",
    codec_model="mlx-community/snac_24khz",  # MLX format
)
```
Do not use `hubertsiuzdak/snac_24khz`; it only has PyTorch weights (no `model.safetensors`).
### Whisper processor files missing
When loading Whisper models for STT, use the `-asr-fp16` variants from mlx-community, which include the full processor files (`preprocessor_config.json`, `tokenizer.json`):
```python
# Correct: has processor files
model, processor = FastSTTModel.from_pretrained("mlx-community/whisper-tiny-asr-fp16")

# Incorrect: missing preprocessor_config.json
# model, processor = FastSTTModel.from_pretrained("mlx-community/whisper-tiny")
```
### Batch size must be 1
Like VLM training, audio training forces `batch_size=1` due to variable-length audio sequences. Use `gradient_accumulation_steps` to simulate larger batches.
### Sample rate mismatch
Each model expects a specific sample rate. Always cast your dataset to match:
| Model | Sample Rate |
|---|---|
| Orpheus (SNAC) | 24 kHz |
| OuteTTS (DAC) | 24 kHz |
| Spark-TTS (BiCodec) | 16 kHz |
| Sesame/CSM (Mimi) | 24 kHz |
| Whisper (STT) | 16 kHz |
| Moonshine (STT) | 16 kHz |
```python
from datasets import Audio

# TTS: 24 kHz (Orpheus, OuteTTS, Sesame) or 16 kHz (Spark)
dataset = dataset.cast_column("audio", Audio(sampling_rate=24000))

# STT: 16 kHz (Whisper, Moonshine)
dataset = dataset.cast_column("audio", Audio(sampling_rate=16000))
```
### Converting HF models to MLX
If your model only exists in HuggingFace (PyTorch) format, convert it first:
```python
# LLM/TTS models
FastTTSModel.convert("canopylabs/orpheus-3b-0.1-ft", output_dir="./orpheus-mlx")

# STT models
FastSTTModel.convert("openai/whisper-large-v3", output_dir="./whisper-mlx")
```
### FFmpeg not installed (audio loading errors)
Audio datasets require FFmpeg for decoding. If you see errors like `RuntimeError: Failed to load audio` or `soundfile`/`librosa` failures:
```bash
# macOS
brew install ffmpeg

# Verify
ffmpeg -version
```
### datasets version conflict (torchcodec)
`datasets` ≥ 4.0 dropped the `soundfile`/`librosa` audio backends and requires `torchcodec`, which needs specific FFmpeg versions (4–7) that conflict with Homebrew's FFmpeg 8. mlx-tune pins `datasets < 4.0` to avoid this. If you see:
```text
ImportError: torchcodec is required but not installed
# or
RuntimeError: FFmpeg version 8 is not supported by torchcodec
```
Fix by ensuring you have the correct datasets version:
```bash
uv pip install 'datasets>=2.14.0,<4.0.0'
```
### Model-specific codec not loading
Each TTS model uses a different audio codec. mlx-tune auto-detects the codec from the model, but if you see errors about missing codec attributes:
- Orpheus: uses SNAC. Pass `codec_model="mlx-community/snac_24khz"` explicitly if auto-detection fails.
- OuteTTS: uses DAC. The codec is loaded via `AudioProcessor()` internally.
- Spark-TTS: uses BiCodec. Accessed via `model._audio_tokenizer` after loading.
- Sesame/CSM: uses Mimi. Loaded from the model's codec attribute.
## Getting Help
- Check this troubleshooting page first
- Browse the examples for working code
- Open an issue on GitHub
- MLX documentation: ml-explore.github.io/mlx
- mlx-lm issues: github.com/ml-explore/mlx-lm
- mlx-vlm issues: github.com/Blaizzy/mlx-vlm
- mlx-audio issues: github.com/Blaizzy/mlx-audio