December 2025
torch.compile() Performance without torch.compile() Overhead
How we reach state-of-the-art inference speeds by managing CUDA graphs directly.
Engineering insights and research from the NinetyFive team
A deep dive into CPython's garbage collection internals, from reference counting to cyclic collection, with links to the actual source code. We encountered long pauses in the free-threaded build and traced them to a bug.
How we use Fenwick trees to achieve 10x faster tokenization updates for real-time code completion.
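As background for the post above: a Fenwick tree (binary indexed tree) maintains prefix sums over a mutable array in O(log n) per update and query, which is what makes incremental updates cheap compared with recomputing prefix sums from scratch. A minimal generic sketch, not the NinetyFive implementation; the token lengths and the byte-count use case below are illustrative assumptions:

```python
class FenwickTree:
    """Binary indexed tree: O(log n) point updates and prefix sums."""

    def __init__(self, n):
        self.n = n
        self.tree = [0] * (n + 1)  # 1-indexed internally

    def update(self, i, delta):
        """Add delta to element i (0-indexed)."""
        i += 1
        while i <= self.n:
            self.tree[i] += delta
            i += i & -i  # step to the next node covering index i

    def prefix_sum(self, i):
        """Sum of elements 0..i inclusive (0-indexed)."""
        i += 1
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i  # strip the lowest set bit
        return total

# Hypothetical use: track per-token byte lengths so that the byte
# offset of any token prefix stays queryable as tokens change.
ft = FenwickTree(8)
for idx, tok_len in enumerate([3, 1, 4, 1, 5]):
    ft.update(idx, tok_len)
print(ft.prefix_sum(2))  # bytes covered by the first three tokens → 8
```

Editing one token only requires a single `update` with the length delta, rather than rewriting every downstream prefix sum.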
Language models are the latest way to autocomplete text as the user types. However, a naive implementation results in an autocomplete that feels jarring.