January 15, 2025

How should AI agents remember things?


By OpenCall Team


Everyone's had frustrating experiences talking to automated customer service systems.

You ask a simple question, and the AI has to repeat it back to you to make sure it got it right. Each piece of context - your name, address, order number - feels like a hurdle that has to be jumped before you can get to your actual problem. Sometimes you even end up with contradictory answers. The reason behind these annoyances lies in how these systems deal with complex requests: they have to choose between speed and sophistication.

If an AI tries to handle all corner cases in its initial programming, it becomes rigid and struggles to adapt when confronted with unusual queries. On the other hand, generating a response from scratch each time a customer asks something leads to painful delays that feel like an eternity. This challenge is especially acute over the phone, where users expect immediate answers that demonstrate a clear understanding of their particular situations.

Working with Cerebras, we've made significant progress in resolving this dilemma. The key insight is that not every interaction requires re-running the same extensive reasoning path. With Cerebras' inference architecture, our system can pull in relevant information in near real-time, using a chain of reasoning that is highly dependent on the specifics of the user's query. So when someone shares their address, the AI agent can instantly check it against a map, see if it falls within the service area, and link it to their internal account records. All this happens before the system even responds, keeping the conversation flowing smoothly.
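To make the address example concrete, here is a minimal sketch of running those lookups concurrently before the agent replies. The helper functions (`geocode`, `in_service_area`, `find_account`) are hypothetical stand-ins for real integrations - a maps API, a coverage check, a CRM lookup - not OpenCall's actual implementation.

```python
import asyncio

# Hypothetical stubs standing in for real integrations (maps API, coverage
# check, CRM lookup). Each simulates a small amount of network latency.
async def geocode(address: str) -> tuple[float, float]:
    await asyncio.sleep(0.01)
    return (37.77, -122.42)

async def in_service_area(coords: tuple[float, float]) -> bool:
    await asyncio.sleep(0.01)
    return True

async def find_account(address: str) -> dict:
    await asyncio.sleep(0.01)
    return {"id": "acct-123", "address": address}

async def enrich_context(address: str) -> dict:
    """Run independent lookups concurrently so results are ready
    before the agent composes its response."""
    coords, account = await asyncio.gather(geocode(address), find_account(address))
    serviceable = await in_service_area(coords)
    return {"coords": coords, "serviceable": serviceable, "account": account}

context = asyncio.run(enrich_context("123 Main St"))
print(context["serviceable"])  # True with these stubs
```

The point of the pattern is that the independent checks overlap in time, so the total wait is roughly the slowest lookup rather than the sum of all of them.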

Another major upside of this approach is moving away from clunky, monolithic prompts that try to cram every possible piece of data into a single query. Those quickly become hard to maintain and prone to errors. By contrast, our guided reasoning system breaks complex tasks into manageable chunks - checking a specific fact, comparing a new input to the available options, and so on. This modular structure is easier to fine-tune and extend over time.
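The modular structure described above can be sketched as a pipeline of small, independently testable steps. The step names and logic here are purely illustrative, assuming a scheduling scenario; they are not the actual system.

```python
# A minimal sketch of breaking a task into small, composable reasoning steps
# instead of one monolithic prompt. Each step reads and updates shared state.

def check_fact(state: dict) -> dict:
    # e.g. verify the stated order number exists (hardcoded for illustration)
    state["order_valid"] = state.get("order_number") == "A-1001"
    return state

def match_option(state: dict) -> dict:
    # e.g. compare the requested slot against the available options
    available = {"9am", "1pm", "4pm"}
    state["slot_ok"] = state.get("requested_slot") in available
    return state

def recommend(state: dict) -> dict:
    # fold the earlier checks into a single recommendation
    if state["order_valid"] and state["slot_ok"]:
        state["reply"] = "Confirmed for " + state["requested_slot"]
    else:
        state["reply"] = "Let me suggest an alternative."
    return state

PIPELINE = [check_fact, match_option, recommend]

def run(state: dict) -> dict:
    for step in PIPELINE:  # each step is easy to test, tune, or swap in isolation
        state = step(state)
    return state

result = run({"order_number": "A-1001", "requested_slot": "1pm"})
print(result["reply"])  # Confirmed for 1pm
```

Because each step is a plain function over shared state, extending the system means adding or replacing one step rather than rewriting a sprawling prompt.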

All these efficiencies add up to make conversations feel significantly more streamlined. Picture a chat about scheduling an appointment. Done the old way, this might take ten back-and-forth messages as the AI clarifies dates, times, and details. With our guided reasoning approach, the agent can weave the user's preferences and constraints into a single cohesive recommendation, cutting response times by orders of magnitude. The end user sees a snappy, seamless interaction - no robot-like pauses or repetitive questions. Crucially, this improvement isn't really about giving the AI more knowledge, but about giving it the flexibility to focus that knowledge quickly and on-target.

Every improvement we make has real-world implications for the way companies interact with customers. By excising the tedium from customer-service interactions, we can save time for both agents and users. These streamlined conversations translate to better user satisfaction, reduced customer attrition, and fewer incoming queries that need to be escalated to human responders. The upshot of a more adaptive, guided approach is an AI that feels like an informed local expert rather than a rote question-answering bot. You walk away from the chat knowing you were understood and that the agent maximized the chance of resolving your request the first time around - even if it required some in-the-moment research and cross-checking. The typical customer will never notice the gears turning in the background, but they'll appreciate the difference in the front-end experience.

Thanks for reading!
