Tensormesh raises $4.5M to squeeze more inference out of AI server loads


With the AI infrastructure push reaching staggering proportions, there’s more pressure than ever on AI companies to squeeze as much inference as possible out of the GPUs they have. And for researchers with expertise in a particular technique, it’s a great time to raise funding.

That’s part of the driving force behind Tensormesh, launching out of stealth this week with $4.5 million in seed funding. The investment was led by Laude Ventures, with additional angel funding from database pioneer Michael Franklin.

Tensormesh is using the money to build a commercial version of the open-source LMCache utility, launched and maintained by Tensormesh co-founder Yihua Cheng. Used well, LMCache can reduce inference costs by as much as ten times — a power that’s made it a staple in open-source deployments and drawn in integrations from heavy-hitters like Google and Nvidia. Now, Tensormesh is planning to parlay that academic reputation into a viable business.

At the heart of Tensormesh’s product is the key-value cache (or KV cache), a memory system that lets models process complex inputs more efficiently by condensing them down to their key values. In traditional architectures, the KV cache is discarded at the end of each query, but Tensormesh CEO Junchen Jiang argues that this is an enormous source of inefficiency.

“It’s like having a very smart analyst reading all the data, but they forget what they have learned after each question,” says Jiang.

Instead of discarding that cache, Tensormesh’s systems hold onto it, allowing it to be redeployed when the model executes a similar process in a separate query. Because GPU memory is so precious, this can mean spreading data across several different storage layers, but the reward is significantly more inference power for the same server load.
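The idea of holding onto a KV cache and reusing it across queries can be sketched roughly as follows. This is an illustrative toy in Python only; the class and function names are hypothetical and do not reflect Tensormesh’s or LMCache’s actual APIs, and real systems cache GPU tensors across storage tiers rather than Python lists.

```python
def compute_kv(tokens):
    """Stand-in for the expensive per-token attention key/value computation."""
    return [f"kv({t})" for t in tokens]

class KVCacheStore:
    """Toy store that keeps KV results across queries, keyed by token prefix."""

    def __init__(self):
        self._store = {}  # prefix tuple -> list of KV entries

    def put(self, tokens, kv):
        # Store every prefix so later queries can reuse partial overlap
        # (e.g. a shared system prompt or a growing chat log).
        for n in range(1, len(tokens) + 1):
            self._store[tuple(tokens[:n])] = kv[:n]

    def get_longest_prefix(self, tokens):
        # Find the longest cached prefix of this query's tokens.
        for n in range(len(tokens), 0, -1):
            prefix = tuple(tokens[:n])
            if prefix in self._store:
                return list(prefix), self._store[prefix]
        return [], []

def run_query(store, tokens):
    prefix, cached_kv = store.get_longest_prefix(tokens)
    # Only compute KV entries for the tokens the cache doesn't cover.
    new_kv = compute_kv(tokens[len(prefix):])
    kv = cached_kv + new_kv
    store.put(tokens, kv)
    return kv, len(prefix)  # reused-token count, for illustration
```

In this sketch, a second question against the same document recomputes only the new tokens, which is the source of the savings the company describes; the engineering difficulty lies in doing the equivalent with large GPU tensors spilled across memory and storage tiers without adding latency.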

The change is particularly powerful for chat interfaces, since models need to continually refer back to the growing chat log as the conversation progresses. Agentic systems have a similar issue, with a growing log of actions and goals.

In theory, these are changes AI companies could make on their own, but the technical complexity makes it a daunting task. Given the Tensormesh team’s research background in the technique and the intricacy of the details involved, the company is betting there will be plenty of demand for an out-of-the-box product.

“Keeping the KV cache in a secondary storage system and reused efficiently without slowing the whole system down is a very challenging problem,” says Jiang. “We’ve seen people hire 20 engineers and spend three or four months to build such a system. Or they can use our product and do it very efficiently.”

Russell Brandom has been covering the tech industry since 2012, with a focus on platform policy and emerging technologies. He previously worked at The Verge and Rest of World, and has written for Wired, The Awl and MIT’s Technology Review. He can be reached at russell.brandom@techcrunch.com or on Signal at 412-401-5489.
