LlamaFactory v0.9.5 Brings Support for Qwen3.5, Gemma4, and Transformers v5

By FintechExtra
30 May 2026

LlamaFactory v0.9.5 is out, and it's a big one. The open-source fine-tuning framework now supports Qwen3.5, Qwen3.6, and Gemma4 models out of the box. It also officially works with Transformers v5. This isn't just a routine maintenance release — there are real improvements under the hood that matter for anyone training custom LLMs.

What Changed

The headline feature is “primary support” for Qwen3.5, Qwen3.6, and Gemma4. That means you can now load these architectures directly without patching or custom scripts. For Qwen models, that's the latest 3.5 and 3.6 variants — both known for strong multilingual and reasoning capabilities. Gemma4 is Google's newest lightweight model family, so this opens up fine-tuning on a modern, efficient base.

Equally important is compatibility with Transformers v5. The Hugging Face library recently reached its fifth major version, and LlamaFactory now runs on top of it. That should let you use the latest tokenizers, model implementations, and training utilities without pinning an older release.
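Before upgrading, it can help to confirm which major version of Transformers is installed in your environment. The snippet below is a minimal sketch that uses only the Python standard library. The helper names major_version and installed_major are made up for illustration; they are not part of LlamaFactory or Transformers.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def major_version(version_string: str) -> int:
    """Return the leading integer of a version string such as '5.0.0'."""
    return int(version_string.split(".")[0])


def installed_major(package: str = "transformers") -> Optional[int]:
    """Return the installed major version of `package`, or None if absent."""
    try:
        return major_version(version(package))
    except PackageNotFoundError:
        # The package is not installed in this environment.
        return None


if __name__ == "__main__":
    major = installed_major()
    if major is None:
        print("transformers is not installed")
    elif major >= 5:
        print(f"transformers major version {major}: v5 or newer")
    else:
        print(f"transformers major version {major}: older than v5")
```

The check only looks at the major number, so it says nothing about which minor releases LlamaFactory v0.9.5 actually accepts. The project's own requirements file is the authority on that.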

Beyond those, the changelog lists several targeted fixes. One PR adds FP8 support via the Transformer Engine backend, which can speed up training on NVIDIA GPUs with hardware support for 8-bit floating point. Another fixes a crash that occurred when the architectures field in config.json is empty. And there's explicit support for Youtu-LLM-2B, a Chinese-language model from Tencent.
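To make the config.json fix concrete: a crash of this kind typically shows up when code reads the first entry of the architectures list without checking that one exists. The release notes don't spell out the actual patch, so the following is only a sketch of a defensive lookup, not LlamaFactory's code. The function name first_architecture and the sample values are hypothetical.

```python
import json
from typing import Optional


def first_architecture(config: dict) -> Optional[str]:
    """Return the first entry of `architectures`, or None if it is
    missing, null, or an empty list."""
    architectures = config.get("architectures") or []
    return architectures[0] if architectures else None


# Hypothetical config.json contents covering the cases that matter.
with_arch = json.loads('{"architectures": ["ExampleForCausalLM"]}')
empty_arch = json.loads('{"architectures": []}')
null_arch = json.loads('{"architectures": null}')
missing_arch = json.loads('{"model_type": "example"}')

for cfg in (with_arch, empty_arch, null_arch, missing_arch):
    # An unguarded cfg["architectures"][0] would raise on the last three.
    print(first_architecture(cfg))
```

Returning None leaves the caller to decide on a fallback, such as the model_type field, rather than failing at load time.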

Why It Matters

These changes aren't just a compatibility checkbox. The Qwen3.x and Gemma4 models represent the cutting edge of open-weight LLMs. By supporting them in v0.9.5, LlamaFactory lets practitioners fine-tune state-of-the-art bases without waiting weeks for community forks. And the Transformers v5 bump means you're not stuck on an old version of the ecosystem: you get the latest optimizations and bug fixes.

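The FP8 backend is a subtle but meaningful addition. For teams running large-scale fine-tuning, a tensor stored in 8-bit floating point takes half the memory of the same tensor in a 16-bit format such as BF16, typically with little loss in model quality. How much of that carries over to total memory depends on how much of the training state actually stays in FP8. It's a quiet win for efficiency.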
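The memory arithmetic behind FP8 is easy to check. The sketch below compares the storage needed for the same number of values at different precisions. The 2-billion parameter count is a hypothetical example, and the helper tensor_gib is made up for illustration. It counts only the bytes for the stored values themselves; gradients, optimizer state, and activations may stay in higher precision, so real end-to-end savings are usually smaller than the per-tensor ratio.

```python
# Bytes needed to store one value at each precision.
BYTES_PER_VALUE = {"fp32": 4, "bf16": 2, "fp8": 1}


def tensor_gib(num_values: int, dtype: str) -> float:
    """Return the storage for `num_values` values of `dtype`, in GiB."""
    return num_values * BYTES_PER_VALUE[dtype] / 2**30


params = 2_000_000_000  # hypothetical 2B-parameter model

bf16_gib = tensor_gib(params, "bf16")
fp8_gib = tensor_gib(params, "fp8")

print(f"bf16 weights: {bf16_gib:.2f} GiB")
print(f"fp8 weights:  {fp8_gib:.2f} GiB")
print(f"fp8 / bf16:   {fp8_gib / bf16_gib:.2f}")
```

The ratio is exactly one half for the values held in FP8, which is where the "almost in half" figure comes from as an upper bound.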

Youtu-LLM-2B support? That's niche, but it reflects LlamaFactory's growing reach into Chinese AI development. The ecosystem is increasingly multilingual, and this update acknowledges that reality.

Personally, I think the most important takeaway is the speed of iteration. LlamaFactory landed these new models within weeks of their release. That's a sign of a healthy project — one that prioritizes staying current over perfecting legacy code. If you're fine-tuning LLMs in production, v0.9.5 is likely worth the upgrade.

Official Source: https://github.com/hiyouga/LlamaFactory/releases/tag/v0.9.5