Thinking Machines is tackling an important challenge for Large Language Models (LLMs): ensuring that a model returns consistent outputs when users pose the same query. Today, identical questions can produce varying, sometimes unpredictable answers depending on factors such as the model's internal state and changes in its input data.
One proposed solution is careful use of context windows, the fixed amount of information a model draws on when forming its conclusions. By framing each response with the same information available at the time of the question, context windows help an LLM stay consistent. Managing them is not straightforward, however: errors creep in easily, and experience with earlier model generations has shown that keeping contexts perfectly synchronized is practically impossible.
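To make the idea concrete, here is a minimal sketch of fixed-window context management: keep the most recent messages that fit a token budget, always preserving the system prompt. The function names and the four-characters-per-token heuristic are illustrative assumptions, not a real tokenizer or any particular vendor's API.

```python
# Minimal sketch of fixed-context-window management: retain the newest
# messages that fit within a token budget, always keeping the system prompt.
# The 4-characters-per-token estimate is a rough assumption, not a tokenizer.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fit_to_window(system_prompt: str, history: list[str], budget: int) -> list[str]:
    kept: list[str] = []
    used = estimate_tokens(system_prompt)
    # Walk newest-to-oldest so the freshest turns survive truncation.
    for message in reversed(history):
        cost = estimate_tokens(message)
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return [system_prompt] + list(reversed(kept))
```

Because truncation is deterministic, two queries made against the same history are framed by exactly the same context, which is the consistency property the approach is after.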
Maintaining consistent responses is not about reproducing the same output verbatim every time. It is about ensuring a controlled outcome, free of the unexpected variations that leave users with unreliable results for identical inputs.
Routines offer an interesting solution. By providing templates through which an LLM can think through problems framed in familiar patterns, they reduce random responses and errors such as hallucination, and they let the model simplify complex queries by breaking them into more manageable tasks.
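One way to picture a routine is as a fixed template that reframes any incoming question as the same familiar sequence of smaller steps. The step wording and function below are invented for illustration; they are not Thinking Machines' actual mechanism.

```python
# Hypothetical sketch of a "routine": a fixed template that frames every
# complex query as the same sequence of smaller, familiar steps before it
# reaches the model. Step text is illustrative, not a real product feature.

ROUTINE_STEPS = [
    "Restate the question in your own words.",
    "List the facts given in the question.",
    "Answer using only those facts; say 'unknown' if they are insufficient.",
]

def apply_routine(question: str) -> str:
    numbered = "\n".join(f"{i}. {step}" for i, step in enumerate(ROUTINE_STEPS, 1))
    return f"Question: {question}\nFollow these steps:\n{numbered}"
```

Because every query passes through the same scaffold, the model's reasoning path varies less from run to run, which is the point of the pattern.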
Thinking Machines is also working on disambiguation, developing protocols for the problems LLMs face when a term can carry multiple meanings. By identifying the right sense from the available context, the model can avoid confusion and keep its responses consistent.
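A toy sketch of the idea: pick the sense of an ambiguous word whose cue words overlap most with the surrounding text. The sense inventory and cue lists here are invented for illustration; real disambiguation protocols would be far richer.

```python
# Toy sketch of context-based term disambiguation: choose the sense whose
# cue words overlap most with the surrounding text. The sense inventory
# below is an invented example, not a real lexicon.

SENSES = {
    "bank": {
        "financial institution": {"money", "loan", "account", "deposit"},
        "river edge": {"river", "water", "shore", "fishing"},
    }
}

def disambiguate(term: str, context: str) -> str:
    words = set(context.lower().split())
    scores = {sense: len(cues & words) for sense, cues in SENSES[term].items()}
    # Deterministic tie-breaking (dict order) keeps repeated queries consistent.
    return max(scores, key=scores.get)
```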
Collaborative conversations shared by multiple users offer another avenue: a common framework for context management and data sharing among users. This approach should improve consistency across different versions or iterations of a model, helping LLMs strike a healthy balance between flexibility and consistency.
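One way such a framework could look, sketched under assumptions of my own (the class and method names are hypothetical): multiple users append to a single ordered conversation record, so every participant's query is framed by the identical shared context.

```python
# Hypothetical sketch of shared context management: several users read from
# and append to one ordered conversation record, so each query is framed by
# the same agreed-upon context. Names are illustrative assumptions.
from dataclasses import dataclass, field

@dataclass
class SharedConversation:
    entries: list[tuple[str, str]] = field(default_factory=list)  # (user, message)

    def post(self, user: str, message: str) -> None:
        self.entries.append((user, message))

    def context_for_query(self) -> str:
        # Every participant sees the identical, ordered history.
        return "\n".join(f"{u}: {m}" for u, m in self.entries)
```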
Complicating all of this, current LLM architectures are not fully equipped to maintain consistency: they run probabilistic models that are inherently prone to randomness and variety. Efforts to enforce consistency could also come at the cost of the creativity and diversity that make these models useful.
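The tension is easy to see in miniature. Sampling from the same probability distribution can yield different tokens run to run, while greedy (argmax) decoding of the same scores is fully deterministic but forfeits variety. The token scores below are made up for illustration.

```python
# Why probabilistic decoding resists consistency: sampling the same
# distribution varies run to run, while greedy decoding is deterministic
# but always picks the single top token. Scores are invented for the demo.
import random

def greedy(scores: dict[str, float]) -> str:
    return max(scores, key=scores.get)

def sample(scores: dict[str, float], rng: random.Random) -> str:
    tokens = list(scores)
    weights = [scores[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]

scores = {"yes": 0.5, "maybe": 0.3, "no": 0.2}
# greedy(scores) is identical on every call; sample(...) varies with RNG state.
```

Forcing determinism (always greedy) is exactly the kind of consistency gain that trades away diversity, which is the balancing act the article describes.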
In conclusion, Thinking Machines is approaching this issue with innovations such as advanced context windows and routines, aiming to overcome unpredictability and ensure reliable performance in LLMs. Achieving consistency without compromising the strengths of these models, however, remains a delicate balancing act.
What are your thoughts on this? I’d love to hear about your own experiences in the comments below.