Why agents fail


Author: High Dimensional Research


AI-based agents currently struggle with reliability. As the article notes, "they are really bad at doing things. They can't reliably do even basic addition and multiplication."

The Core Problem: Probabilistic Compounding

The fundamental issue is compounding probability. When an agent must succeed at every one of several sequential steps, the overall success rate falls exponentially as the number of steps grows.

The Formula:

P(n) = p^n

Where p is the per-step success rate and n is the number of steps, assuming each step succeeds independently with the same probability.

Example: the probability of a fair coin landing heads five times in a row is 0.5^5 = 3.125%.
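The formula is a one-liner in code. The minimal Python sketch below makes the same assumption the formula does: steps succeed or fail independently, each with the same probability p. The helper name `success_probability` is hypothetical, and `Fraction` is used only so the coin-flip case comes out exact.

```python
from fractions import Fraction
from typing import Union

Number = Union[float, Fraction]


def success_probability(p: Number, n: int) -> Number:
    """Probability that n independent steps all succeed, each with probability p."""
    if not 0 <= p <= 1:
        raise ValueError("p must be a probability in [0, 1]")
    if n < 0:
        raise ValueError("n must be a non-negative number of steps")
    return p ** n


# Fair coin, five heads in a row: (1/2)^5 = 1/32
coin = success_probability(Fraction(1, 2), 5)
print(coin, float(coin))  # 1/32 0.03125
```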

Real-World Impact

An agent performing at 90% accuracy per step would achieve:

59.05% success on 5-step tasks
34.87% success on 10-step tasks
72.90% success on 3-click web tasks
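These figures follow directly from p^n with p = 0.9. A short, self-contained Python check, using the step counts listed above:

```python
# Per-step accuracy assumed in the example above
P_STEP = 0.9

# Task lengths from the example: 3-click web task, 5-step and 10-step tasks
for steps in (3, 5, 10):
    overall = P_STEP ** steps  # independent steps: success compounds multiplicatively
    print(f"{steps:>2} steps: {overall:.2%}")
```

Running it prints 72.90%, 59.05%, and 34.87% for 3, 5, and 10 steps respectively.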

The article emphasizes: "Software that only works 72.90%, 95%, or even 99% of the time is bad software."

Production infrastructure is held to a far higher bar: AWS S3, for example, is designed for "eleven nines" (99.999999999%) of data durability.
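Inverting P(n) = p^n shows how reliable each step must be to reach a target end-to-end success rate R over n steps: p = R^(1/n). A minimal Python sketch, again assuming independent steps; the helper name `required_step_rate` is hypothetical:

```python
def required_step_rate(target: float, n: int) -> float:
    """Per-step success rate p such that p ** n equals the target overall rate."""
    if not 0 < target <= 1:
        raise ValueError("target must be in (0, 1]")
    if n < 1:
        raise ValueError("n must be at least 1")
    return target ** (1 / n)


# A 99% end-to-end target over 10 steps needs roughly 99.9% accuracy per step
for target in (0.95, 0.99):
    print(f"{target:.0%} over 10 steps -> {required_step_rate(target, 10):.4%} per step")
```

The required per-step rate climbs toward 100% as either the target or the number of steps grows, which is why long agentic chains struggle to meet production-grade expectations.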

When to (Not) Be Agentic

HDR proposes shrinking the probabilistic action space by pairing AI models, used for reasoning, with predetermined, verified actions. The hotel booking example illustrates this: booking websites differ in their details, but their structural flow is similar.
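To see why shrinking the probabilistic action space helps, here is a toy Python model. It is an illustration, not HDR's implementation: it assumes each step is either a predetermined, verified action that (idealistically) always succeeds, or a model-driven decision that succeeds independently with probability p. The `Step` type and `workflow_success` helper are hypothetical.

```python
from dataclasses import dataclass
from math import prod
from typing import List


@dataclass
class Step:
    name: str
    model_driven: bool  # True if a model must choose the action at this step


def workflow_success(steps: List[Step], p_model: float) -> float:
    """Overall success, assuming verified steps never fail and model steps succeed with p_model."""
    return prod(p_model if step.model_driven else 1.0 for step in steps)


# Hypothetical 10-step task: fully agentic vs. only two model-driven steps
fully_agentic = [Step(f"step {i}", model_driven=True) for i in range(10)]
hybrid = [Step(f"step {i}", model_driven=(i in (3, 7))) for i in range(10)]

print(f"fully agentic: {workflow_success(fully_agentic, 0.9):.2%}")
print(f"hybrid:        {workflow_success(hybrid, 0.9):.2%}")
```

With the same 90% per-step model accuracy, the toy success rate rises from 34.87% to 81.00% because the exponent shrinks to the number of model-driven steps. Real verified actions are not perfectly reliable, so this overstates the gain, but the direction of the effect holds.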

Solution components:

Collective Memory Index (search over web trajectories)
Accessibility tools for understanding page structure
Model-assisted reasoning for state-specific decisions
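The memory component, retrieving a previously recorded trajectory for a similar objective so the agent can reuse known-good actions instead of deciding each one from scratch, can be illustrated with a toy in-memory search. Everything here is hypothetical: the `Trajectory` type, the word-overlap (Jaccard) similarity, the 0.3 threshold, and the made-up example entries are illustrative assumptions, not the API or ranking method behind hdr.is/memory.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Trajectory:
    objective: str        # what the recorded run was trying to do
    actions: List[str]    # the sequence of actions that succeeded


def _tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    """Word-overlap similarity between two strings, in [0, 1]."""
    ta, tb = _tokens(a), _tokens(b)
    if not ta and not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def find_trajectory(memory: List[Trajectory], objective: str,
                    threshold: float = 0.3) -> Optional[Trajectory]:
    """Return the most similar stored trajectory, or None if nothing is close enough."""
    best = max(memory, key=lambda t: jaccard(t.objective, objective), default=None)
    if best is None or jaccard(best.objective, objective) < threshold:
        return None
    return best


# Made-up example entries for illustration only
memory = [
    Trajectory("book a hotel room in paris", ["search city", "pick dates", "select room", "checkout"]),
    Trajectory("find flight prices to tokyo", ["open flights", "enter route", "read prices"]),
]
match = find_trajectory(memory, "book a hotel room in rome")
print(match.actions if match else "no match")
```

A production index would likely use learned embeddings rather than word overlap; the point of the sketch is only that a close match lets the agent replay a verified action sequence, leaving the model to handle the state-specific differences.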

Resources mentioned:

hdr.is/memory
Nolita (web automation framework)