# Troubleshooting

Common issues and their solutions.
## Model Not Found

```
Error: Model 'xyz' not found
```
MLX-Tune uses models from the `mlx-community/` organization on Hugging Face. CUDA-specific models (such as those with a `-bnb-4bit` suffix from `unsloth/`) won't work.
```python
# Don't use CUDA-specific models
# model_name = "unsloth/Meta-Llama-3.1-8B-bnb-4bit"  # Won't work!

# Use MLX community models instead
model_name = "mlx-community/Meta-Llama-3.1-8B-Instruct-4bit"
```
Browse available models at huggingface.co/mlx-community.
## Out of Memory

Symptom: The process gets killed or the system becomes unresponsive during model loading or training.
### Recommended models by RAM

| RAM | Recommended Size | Example Model |
|---|---|---|
| 16 GB | 1B–3B, 4-bit | mlx-community/Llama-3.2-1B-Instruct-4bit |
| 32 GB | Up to 7B, 4-bit | mlx-community/Mistral-7B-Instruct-v0.3-4bit |
| 48 GB | 7B–13B, 4-bit | mlx-community/Meta-Llama-3.1-8B-Instruct-4bit |
| 64 GB+ | 13B+ or 8-bit | mlx-community/Meta-Llama-3.1-70B-Instruct-4bit |
### Solutions

- Use a smaller model
- Use 4-bit quantization (`load_in_4bit=True`)
- Reduce `max_seq_length`
- Close other applications (browsers and IDEs consume significant RAM)
- Keep macOS up to date for the latest MLX improvements
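To pick a model that fits, a back-of-the-envelope estimate is parameter count times bytes per parameter (about 0.5 bytes at 4-bit, 2 bytes at fp16), times an overhead multiplier for activations and cache. A minimal sketch; the `overhead_factor` value is an illustrative assumption, not an MLX-Tune API:

```python
def estimate_weight_gb(num_params_billion: float, bits: int = 4,
                       overhead_factor: float = 1.2) -> float:
    """Rough memory estimate: params * bytes/param * overhead multiplier.

    overhead_factor (assumed here) loosely covers activations and cache;
    real usage varies with sequence length and batch size.
    """
    bytes_per_param = bits / 8
    return num_params_billion * 1e9 * bytes_per_param * overhead_factor / (1024 ** 3)

# A 7B model at 4-bit needs roughly 3.9 GB for weights plus overhead
print(round(estimate_weight_gb(7, bits=4), 1))  # → 3.9
```

By this estimate a 7B model at 4-bit fits comfortably in 32 GB, while the same model at fp16 needs around four times as much, which matches the table above.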
## Slow Generation

Symptom: Text generation is slower than expected.

### Solutions

1. Always enable inference mode before generating:
```python
# Always do this before inference!
FastLanguageModel.for_inference(model)

# Then generate
from mlx_lm import generate

response = generate(model.model, tokenizer,
                    prompt=prompt, max_tokens=100)
```
2. Use 4-bit quantized models (faster than fp16 for inference)
3. Reduce `max_tokens` in generation calls
4. Keep macOS updated for the latest MLX optimizations
5. Close memory-heavy applications to free unified memory bandwidth
## GGUF Export from Quantized Models

GGUF export (`save_pretrained_gguf`) doesn't work with quantized (4-bit) base models. This is a known mlx-lm limitation, not an MLX-Tune bug.
### What works

| Feature | Status |
|---|---|
| Training with quantized models (QLoRA) | Works |
| Saving adapters (`save_pretrained`) | Works |
| Saving merged model (`save_pretrained_merged`) | Works |
| Inference with trained model | Works |
| GGUF export from quantized base | Doesn't work |
### Workaround 1: Use a non-quantized base model

```python
# Use an fp16 model instead of 4-bit
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="mlx-community/Llama-3.2-1B-Instruct",  # NOT -4bit
    max_seq_length=2048,
    load_in_4bit=False,  # Train in fp16
)

# Train normally, then export to GGUF
model.save_pretrained_gguf("model", tokenizer)  # Works!
```
### Workaround 2: Dequantize during export

```python
model.save_pretrained_gguf("model", tokenizer, dequantize=True)

# Then re-quantize with llama.cpp:
# ./llama-quantize model.gguf model-q4_k_m.gguf Q4_K_M
```
### Workaround 3: Skip GGUF entirely

If you only need the model for MLX/Python inference, use `save_pretrained_merged()` instead; no GGUF conversion is needed.
## VLM Issues

### Batch size must be 1

VLM training requires `per_device_train_batch_size=1` because images produce variable numbers of vision tokens. The `VLMSFTTrainer` enforces this automatically. Use `gradient_accumulation_steps` to simulate larger batch sizes.
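Gradient accumulation keeps the per-step memory footprint of a batch size of 1 while matching the update statistics of a larger batch: gradients are summed over several forward/backward passes before each optimizer step. The effective batch size is simply the product (illustrative arithmetic, not an MLX-Tune API; the value 8 is an example):

```python
per_device_train_batch_size = 1   # fixed for VLM training
gradient_accumulation_steps = 8   # example value

# Gradients accumulate over 8 micro-batches before each optimizer
# step, so each update behaves like a batch of 8 samples.
effective_batch_size = per_device_train_batch_size * gradient_accumulation_steps
print(effective_batch_size)  # → 8
```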
### Think tags in output

Qwen3.5 models may produce `<think>...</think>` tags in generated text. MLX-Tune's `generate()` method strips these automatically.
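If you call the lower-level mlx-lm `generate()` directly, you can strip the tags yourself. A minimal sketch of the idea (MLX-Tune's actual implementation may differ):

```python
import re

def strip_think_tags(text: str) -> str:
    """Remove <think>...</think> blocks (tags included) from model output."""
    # DOTALL lets .*? span newlines inside the think block
    return re.sub(r"<think>.*?</think>", "", text, flags=re.DOTALL).strip()

print(strip_think_tags("<think>step 1\nstep 2</think>The answer is 4."))
# → The answer is 4.
```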
### Image format

Images should be PIL `Image` objects. The `UnslothVisionDataCollator` handles conversion from datasets automatically.
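If your raw data stores images as file paths or bytes rather than PIL objects, a small normalization helper can coerce them before training. The `ensure_pil` function below is a hypothetical helper for illustration, not part of MLX-Tune:

```python
from io import BytesIO

from PIL import Image


def ensure_pil(image):
    """Coerce a path, raw bytes, or PIL Image into an RGB PIL Image.

    Hypothetical helper: normalizes common dataset image representations.
    """
    if isinstance(image, Image.Image):
        return image.convert("RGB")
    if isinstance(image, (bytes, bytearray)):
        return Image.open(BytesIO(image)).convert("RGB")
    if isinstance(image, str):
        return Image.open(image).convert("RGB")
    raise TypeError(f"Unsupported image type: {type(image)!r}")
```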
### Text-only VLM training

Qwen3.5 can be fine-tuned on text-only data without images. See example 11.
## Getting Help

- Check this troubleshooting page first
- Browse the examples for working code
- Open an issue on GitHub
- MLX documentation: ml-explore.github.io/mlx
- mlx-lm issues: github.com/ml-explore/mlx-lm
- mlx-vlm issues: github.com/Blaizzy/mlx-vlm