AI Agents Are Entering Their Architecture Era

Why the most important shift in agent systems is happening below the prompt layer

Published
8 min read
I am an engineer and a developer advocate who is excited about building the future with AI Agents.

AI agent discourse has changed fast.

A few months ago, most of the energy was around demos: autonomous browser flows, coding copilots chaining tools, and shiny benchmark clips that made everything feel one prompt away from automation. Now the interesting conversations are more grounded. People are asking the harder questions: how do you keep state across turns, where should tools execute, what happens when an agent needs to recover from failure, and what kind of protocol actually makes multi-step systems composable instead of brittle?

That shift matters. It usually means a space is growing up.

In this post, I want to unpack what the current wave of AI agent discussion is really pointing toward, and why the next phase is less about “smarter prompts” and more about architecture.

The conversation is moving from demos to systems

The biggest signal right now is that people are getting less impressed by isolated agent tricks.

A model calling a weather API is not a product. A coding assistant editing one file is not a software engineer. A browser agent completing a happy-path task is not the same thing as a reliable business workflow.

That doesn't mean the progress isn't real. It is. But once teams try to put these systems in front of users, the center of gravity changes.

The hard parts start to look familiar:

  • state management across long-lived interactions
  • tool authentication and execution boundaries
  • retries, timeouts, and recovery behavior
  • traceability and debugging
  • contracts between components
  • session restoration and memory handling

In other words, the actual work starts looking a lot like distributed systems and application platform engineering.

That's not bad news. It's a sign we're finally talking about the right layer.

"Agent" is still an overloaded word

One reason the discourse gets messy is that people use the word agent to describe very different things.

Sometimes they mean:

  • a single LLM call with tool access
  • a prompt chain with branching logic
  • a workflow with retries and guardrails
  • a long-lived stateful assistant
  • a multi-agent system with handoffs between specialized workers

Those are not the same category of system.

When you collapse them into one label, you get a lot of confusion. Teams think they're building autonomous systems when they're really building orchestrated workflows. Vendors market “agents” when the implementation is closer to tool-augmented chat. And engineers end up arguing past each other because they are solving different problems.

A more useful framing is to ask a few concrete questions:

  • Does the system keep meaningful state across turns?
  • Can it invoke tools safely and recover from failure?
  • Is there explicit orchestration, or is everything buried in prompts?
  • Can parts of the system be reused or composed with other parts?
  • Is there a clear contract between planning, execution, and handoff?

That framing tells you much more than the word agent ever will.

The orchestration layer is where the real complexity lives

This is the part that gets underestimated over and over again.

People obsess over model quality, but in production systems the model call is often the easy part. The engineering time gets burned on everything around it:

  • routing requests into the right execution path
  • maintaining session state
  • handling tool requests and continuations
  • streaming partial output to clients
  • persisting chat history for restore flows
  • enforcing auth boundaries
  • watching for stuck or failed steps

If you've built anything beyond a toy demo, you've probably felt this already.

This is also why thin abstractions often disappoint. A wrapper that makes the first 20 minutes feel magical can become a liability once you need visibility into execution, explicit control over state, or reliable handling of edge cases.

The useful abstractions are the ones that own orchestration cleanly without hiding the system from you.
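
To make that concrete, here is a minimal sketch of routing kept in plain code, with a trace of every decision. Everything in it is hypothetical: the `Route` names, the keyword-based `classify` stub (standing in for a model call), and `handleMessage` are assumptions for illustration, not part of any SDK.

```typescript
// Hypothetical routes and handlers; a real system would define its own.
type Route = "faq" | "account" | "escalate";

interface TraceEntry {
  step: string;
  detail: string;
}

interface Outcome {
  route: Route;
  reply: string;
  trace: TraceEntry[];
}

// Keyword stub standing in for a model-based classifier.
function classify(message: string): Route {
  const text = message.toLowerCase();
  if (text.includes("human") || text.includes("complaint")) return "escalate";
  if (text.includes("refund") || text.includes("subscription")) return "account";
  return "faq";
}

// One handler per execution path, so control flow is visible in code.
const handlers: Record<Route, (message: string) => string> = {
  faq: () => "Here is the relevant help article.",
  account: () => "Let me look up your account details.",
  escalate: () => "Connecting you with a support specialist.",
};

// Route the message, run the matching handler, and record each decision.
function handleMessage(message: string): Outcome {
  const trace: TraceEntry[] = [];
  const route = classify(message);
  trace.push({ step: "route", detail: route });
  const reply = handlers[route](message);
  trace.push({ step: "respond", detail: reply });
  return { route, reply, trace };
}
```

The point is not the classifier. It is that the routing decision and the path taken are ordinary values you can log, test, and inspect.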

Stateful sessions change what agents can actually do

A lot of “agent” architecture still assumes a mostly stateless request pattern: send the conversation, get a response, repeat.

That works for basic chat. It breaks down fast once you need continuity.

Stateful sessions are what let systems feel coherent over time. They give you a place to accumulate context, store tool outputs, track resources, and resume work without reconstructing everything from scratch on every turn.
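
As a rough illustration of what such a session holds, here is a minimal in-memory sketch. The `Session` shape and the `InMemorySessionStore` class are assumptions made up for this example, not the API of any particular SDK, and a production store would persist to a database rather than a `Map`.

```typescript
// Hypothetical shapes for illustration only.
type Role = "user" | "assistant" | "tool";

interface Message {
  role: Role;
  content: string;
}

interface Session {
  id: string;
  history: Message[];
  variables: Record<string, unknown>;
  updatedAt: number;
}

class InMemorySessionStore {
  private sessions = new Map<string, Session>();
  private nextId = 1;

  // Create a session seeded with initial variables.
  create(variables: Record<string, unknown> = {}): Session {
    const session: Session = {
      id: `session-${this.nextId++}`,
      history: [],
      variables: { ...variables },
      updatedAt: Date.now(),
    };
    this.sessions.set(session.id, session);
    return session;
  }

  // Append a message to an existing session.
  append(id: string, message: Message): Session {
    const session = this.get(id);
    session.history.push(message);
    session.updatedAt = Date.now();
    return session;
  }

  // Store a tool output or other value so later turns can reuse it.
  setVariable(id: string, key: string, value: unknown): void {
    const session = this.get(id);
    session.variables[key] = value;
    session.updatedAt = Date.now();
  }

  // Look up a session; throws if the id is unknown.
  get(id: string): Session {
    const session = this.sessions.get(id);
    if (!session) throw new Error(`Unknown session: ${id}`);
    return session;
  }
}
```

Each turn reads and extends the same record, so nothing has to be reconstructed from the raw transcript.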

In Octavus, sessions are first-class. They store conversation history, track resources and variables, and enable stateful interactions across multiple messages. That sounds simple, but it matters a lot once you move past single-turn use cases.

Here's what session creation looks like with the Server SDK:

import { OctavusClient } from '@octavus/server-sdk';

const client = new OctavusClient({
  baseUrl: process.env.OCTAVUS_API_URL!,
  apiKey: process.env.OCTAVUS_API_KEY!,
});

const sessionId = await client.agentSessions.create('support-chat', {
  COMPANY_NAME: 'Acme Corp',
  PRODUCT_NAME: 'Widget Pro',
  USER_ID: 'user-123',
});

Once the session exists, you attach handlers and execute requests against it:

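// Attach tool handlers to the existing session.
const session = client.agentSessions.attach(sessionId, {
  tools: {
    'get-user-account': async (args) => {
      // `db` stands in for your application's own data layer (not shown here).
      const userId = args.userId as string;
      return await db.users.findById(userId);
    },
  },
});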

const events = session.execute({
  type: 'trigger',
  triggerName: 'user-message',
  input: { USER_MESSAGE: 'Help me understand my subscription' },
});

That session-centric model is a lot closer to how real applications work. It gives you a durable execution context instead of forcing you to rebuild everything around stateless calls.

Tool execution should stay on your infrastructure

This is another topic the market is starting to take more seriously.

The easiest demos run tools “somewhere in the cloud” and abstract away the details. The problem is that real tools usually touch your database, your APIs, your auth model, and your audit boundaries.

That makes tool execution an infrastructure concern, not just a model feature.

Octavus gets this right by treating tools as something that can run on the developer's side. In the Server SDK, when a handler is registered for a tool, the call executes on your server and the run continues automatically. If no handler exists, the call can be forwarded to the client.

The distinction matters:

  • sensitive operations stay inside your environment
  • auth context stays aligned with your existing application
  • you can enforce logging, validation, and timeouts directly
  • you avoid sending critical data into someone else's execution layer

A basic server-side tool looks like this:

const session = client.agentSessions.attach(sessionId, {
  tools: {
    'create-support-ticket': async (args) => {
      const summary = args.summary as string;
      const priority = args.priority as string;

      // `ticketService` and `getEstimatedResponse` are your application's own
      // code (not shown here); the handler runs inside your environment.
      const ticket = await ticketService.create({
        summary,
        priority,
        source: 'ai-chat',
      });

      // The returned object is the tool result.
      return {
        ticketId: ticket.id,
        estimatedResponse: getEstimatedResponse(priority),
      };
    },
  },
});

That pattern is much easier to reason about than outsourcing execution to a provider-owned runtime with unclear auth and data boundaries.
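
Because the handler is just a function in your codebase, validation, timeouts, and logging can be layered on with an ordinary wrapper. The sketch below is one way to do that. `guardTool`, `GuardOptions`, and the log entry shape are hypothetical names invented for this example, not part of the Octavus SDK.

```typescript
// Hypothetical helper; names and shapes are assumptions for illustration.
type ToolHandler = (args: Record<string, unknown>) => Promise<unknown>;

interface GuardOptions {
  name: string;
  timeoutMs: number;
  // Returns an error message when args are invalid, otherwise null.
  validate: (args: Record<string, unknown>) => string | null;
  log?: (entry: Record<string, unknown>) => void;
}

function guardTool(handler: ToolHandler, options: GuardOptions): ToolHandler {
  const log =
    options.log ??
    ((entry: Record<string, unknown>) => console.log(JSON.stringify(entry)));

  return async (args) => {
    // Reject bad arguments before touching any downstream system.
    const problem = options.validate(args);
    if (problem !== null) {
      log({ tool: options.name, status: "rejected", problem });
      throw new Error(`Invalid arguments for ${options.name}: ${problem}`);
    }

    let timer: ReturnType<typeof setTimeout> | undefined;
    const timeout = new Promise<never>((_, reject) => {
      timer = setTimeout(
        () => reject(new Error(`${options.name} timed out after ${options.timeoutMs}ms`)),
        options.timeoutMs,
      );
    });

    const startedAt = Date.now();
    try {
      // Whichever settles first wins; the handler itself is not cancelled.
      const result = await Promise.race([handler(args), timeout]);
      log({ tool: options.name, status: "ok", durationMs: Date.now() - startedAt });
      return result;
    } catch (error) {
      log({ tool: options.name, status: "error", message: String(error) });
      throw error;
    } finally {
      clearTimeout(timer);
    }
  };
}
```

A handler wrapped this way has the same signature as the original, so it can be registered wherever the unwrapped one would go. Note that `Promise.race` only stops waiting. Cancelling the underlying work would need something like an `AbortSignal` passed into the handler.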

The protocol question is becoming impossible to ignore

As soon as you have multiple steps, multiple actors, or multiple tools, you're in protocol design territory whether you admit it or not.

You need to decide:

  • what messages exist in the system
  • how execution state is represented
  • how tool requests are paused and resumed
  • how errors are surfaced
  • how one component hands work to another
  • how clients observe progress in real time

If those contracts live only inside imperative glue code and giant prompts, the system gets fragile fast.

This is why the protocol layer matters so much. It separates the what from the how. It makes behavior inspectable. It makes orchestration easier to reason about. And it gives you a path toward composition instead of endless one-off flows.
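
One lightweight way to make such contracts explicit is a typed event vocabulary plus a pure function that folds events into execution state. The sketch below assumes a made-up set of events (`text-delta`, `tool-request`, and so on) and a made-up `RunState`. A real protocol would define its own messages.

```typescript
// Hypothetical event vocabulary for illustration.
type AgentEvent =
  | { type: "text-delta"; text: string }
  | { type: "tool-request"; callId: string; tool: string; args: Record<string, unknown> }
  | { type: "tool-result"; callId: string; result: unknown }
  | { type: "error"; message: string; retryable: boolean }
  | { type: "finish" };

interface RunState {
  status: "running" | "waiting-for-tool" | "failed" | "done";
  text: string;
  pendingCalls: string[];
  lastError?: string;
}

const initialState: RunState = { status: "running", text: "", pendingCalls: [] };

// Pure state transition: the same events always produce the same state.
function reduce(state: RunState, event: AgentEvent): RunState {
  switch (event.type) {
    case "text-delta":
      return { ...state, text: state.text + event.text };
    case "tool-request":
      // Execution pauses until every pending call has a result.
      return {
        ...state,
        status: "waiting-for-tool",
        pendingCalls: [...state.pendingCalls, event.callId],
      };
    case "tool-result": {
      const pendingCalls = state.pendingCalls.filter((id) => id !== event.callId);
      return {
        ...state,
        pendingCalls,
        status: pendingCalls.length === 0 ? "running" : "waiting-for-tool",
      };
    }
    case "error":
      return { ...state, status: "failed", lastError: event.message };
    case "finish":
      return { ...state, status: "done" };
    default: {
      // Compile-time exhaustiveness check: adding an event type breaks the build here.
      const unreachable: never = event;
      throw new Error(`Unhandled event: ${JSON.stringify(unreachable)}`);
    }
  }
}
```

Because the state is derived purely from the event log, a client can observe progress, and a server can restore a run, by replaying the same events.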

I think the agent ecosystem is still early here. We're closer to the RPC era than the REST era. A lot of integrations are custom, tightly coupled, and awkward to compose. Over time, the winners will be the systems that make these contracts explicit.

The real opportunity is composability, not just autonomy

A lot of marketing still frames the future as one super-agent doing everything.

Maybe that happens in a few narrow cases. In practice, the more interesting pattern is composition.

Think:

  • a planner agent that decides the workflow
  • a retrieval agent that gathers evidence
  • a coding or analysis agent that performs a specialized task
  • a reviewer agent that validates outputs
  • a UI-facing agent that turns results into something useful for the user

That kind of architecture is more modular, easier to debug, and more realistic for production teams.

It also mirrors how software systems usually mature. We move from monoliths of logic toward clearer boundaries, reusable components, and explicit contracts.

The hard part isn't imagining multiple agents. It's designing the handoffs well.
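
A handoff gets easier to reason about once it is a typed value rather than free-form text passed between prompts. Here is a small sketch under that assumption. The `Handoff` envelope, the three stubbed workers, and their payload types are all hypothetical, and the stubs stand in for real model or retrieval calls.

```typescript
// Hypothetical handoff envelope: who sent it, who should receive it, and a typed payload.
interface Handoff<T> {
  from: string;
  to: string;
  payload: T;
}

interface Plan {
  question: string;
  steps: string[];
}

interface Evidence {
  question: string;
  snippets: string[];
}

interface Review {
  approved: boolean;
  reason: string;
}

// Guard at the boundary: refuse work that was addressed to someone else.
function assertAddressedTo(handoff: Handoff<unknown>, expected: string): void {
  if (handoff.to !== expected) {
    throw new Error(`Handoff for ${handoff.to} delivered to ${expected}`);
  }
}

function planner(question: string): Handoff<Plan> {
  return {
    from: "planner",
    to: "retriever",
    payload: { question, steps: ["search", "summarize"] },
  };
}

function retriever(input: Handoff<Plan>): Handoff<Evidence> {
  assertAddressedTo(input, "retriever");
  const snippets = input.payload.steps.map((step) => `stub result for ${step}`);
  return {
    from: "retriever",
    to: "reviewer",
    payload: { question: input.payload.question, snippets },
  };
}

function reviewer(input: Handoff<Evidence>): Handoff<Review> {
  assertAddressedTo(input, "reviewer");
  const approved = input.payload.snippets.length > 0;
  return {
    from: "reviewer",
    to: "ui",
    payload: { approved, reason: approved ? "evidence present" : "no evidence gathered" },
  };
}
```

With this shape, `reviewer(retriever(planner(question)))` type-checks only when each worker's output matches the next worker's input, and a misrouted handoff fails loudly instead of silently.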

What teams should focus on next

If you're building in this space right now, I'd spend less time chasing the most theatrical demos and more time tightening the foundations.

A practical checklist:

  1. Make session state explicit

    • Know what should persist
    • Decide how you'll restore expired or interrupted work
    • Keep UI-ready history if you need reliable resume flows
  2. Treat tool calls like production integrations

    • Validate arguments
    • Set timeouts
    • Log inputs and outputs
    • Keep execution inside your infrastructure when possible
  3. Separate orchestration from prompt content

    • Don't bury system behavior inside string literals scattered through app code
    • Make prompts versionable and editable independently from control flow
  4. Design contracts for continuation and recovery

    • Decide how failures are represented
    • Define how partial work is resumed
    • Make retries intentional instead of accidental
  5. Optimize for observability

    • You need to inspect state, events, and tool behavior
    • If you can't see what's happening, you can't trust the system
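
As one example for item 4, failures can be represented as data and retries made an explicit policy instead of something that happens wherever an exception is caught. The sketch below is an assumption-laden illustration: `StepResult`, `RetryPolicy`, and `runWithRetry` are hypothetical names, and the exponential backoff is just one reasonable choice.

```typescript
// Hypothetical result type: a failure says whether retrying makes sense.
type StepResult<T> =
  | { ok: true; value: T }
  | { ok: false; error: string; retryable: boolean };

type Attempted<T> = StepResult<T> & { attempts: number };

interface RetryPolicy {
  maxAttempts: number;
  baseDelayMs: number;
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Run a step up to maxAttempts times, backing off exponentially between tries.
// Stops early on success or on a failure marked as not retryable.
async function runWithRetry<T>(
  step: (attempt: number) => Promise<StepResult<T>>,
  policy: RetryPolicy,
): Promise<Attempted<T>> {
  let last: StepResult<T> = { ok: false, error: "step never ran", retryable: false };
  for (let attempt = 1; attempt <= policy.maxAttempts; attempt++) {
    last = await step(attempt);
    if (last.ok || !last.retryable) return { ...last, attempts: attempt };
    if (attempt < policy.maxAttempts) {
      await sleep(policy.baseDelayMs * 2 ** (attempt - 1));
    }
  }
  return { ...last, attempts: policy.maxAttempts };
}
```

The caller always gets back a value describing what happened and how many attempts it took, which is also exactly what you want to log for item 5.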

The next phase of agents will look more boring — and that's good

The flashy part of the cycle is useful because it gets people experimenting. But the sustainable part is always a little less glamorous.

The next wave of agent systems will win on things like:

  • clear execution contracts
  • stateful session handling
  • durable tool integrations
  • composable architecture
  • operational visibility
  • sane developer experience

That's not as cinematic as “watch this agent build a startup in 30 seconds.”

But it's a much better foundation for software people can actually ship.

Conclusion

The strongest signal in AI agent discourse right now is not that autonomy is solved. It's that engineers are starting to care about the right problems.

We're moving from prompt theater to system design.

And honestly, that's where the interesting work begins.

If you're building agent products, the edge won't come from calling something an agent. It'll come from how well you handle sessions, orchestration, tool execution, and composition when the happy path ends.

That's the difference between a compelling demo and a system that survives production.