mlx-lm Server Crashes with transformers ≥ 5.0 — Workaround Available | Tool Update
A confirmed compatibility issue causes mlx lm.server to crash on every /v1/chat/completions request when transformers 5.0 or later is installed. The workaround…
Published on MyPrivateClaw
Mar 31, 2026, 6:50 AM UTC
Coverage date
Feb 15, 2026
Last updated
Apr 4, 2026, 5:45 AM UTC
News summary
A bug report filed on February 15, 2026 (GitHub issue 897) revealed that the mlx lm server crashes silently on every /v1/chat/completions request when running with transformers version 5.0 or later. The root cause is a breaking change in transformers 5.0: the apply chat template() function now returns a BatchEncoding dict by default instead of a plain list of integers. A prior fix (PR 691) had updated chat.py, generate.py, and other files to pass return dict=False, but server.py was missed. The result is that stream generate() receives an unexpected dict type and fails with an empty reply — no error message, just a dropped connection. The /v1/completions endpoint is unaffected; only the chat completions path triggers the crash. The fix is a one line change: adding return dict=False to the apply chat template() call in server.py. The issue was closed as completed by maintainer awni on Fe…