# Error Handling

Agent provides a hierarchy of typed exceptions for handling errors gracefully across all providers.

## Error Hierarchy

```
AgentError (base)
├── AuthenticationError
├── ProviderError
│   └── RateLimitError
├── TimeoutError
├── ToolExecutionError
├── SchemaValidationError
├── UnsupportedFeatureError
└── RoutingError
```

## Basic Error Handling

```python
from agent import Agent
from agent.errors import (
    AgentError,
    AuthenticationError,
    ProviderError,
    RateLimitError,
    TimeoutError,
    ToolExecutionError,
    SchemaValidationError,
    UnsupportedFeatureError,
    RoutingError,
)

agent = Agent(provider="openai", model="gpt-4o")

try:
    response = agent.run("Hello!")
except AuthenticationError as e:
    print(f"Invalid API key: {e}")
except RateLimitError as e:
    print(f"Rate limited. Retry after: {e.retry_after}s")
except TimeoutError as e:
    print(f"Request timed out after {e.timeout}s")
except ProviderError as e:
    print(f"Provider error ({e.status_code}): {e}")
except AgentError as e:
    print(f"Agent error: {e}")
```

## Error Types

### AgentError

Base exception for all Agent errors.

```python
class AgentError(Exception):
    message: str   # Error message
    raw: Any       # Original exception (if any)
```

### AuthenticationError

Raised when API authentication fails.

```python
try:
    agent = Agent(provider="openai", model="gpt-4o", api_key="invalid")
    agent.run("Hello")
except AuthenticationError as e:
    print(f"Auth failed: {e.message}")
    # Handle: check API key, refresh credentials
```

### ProviderError

Raised when the provider returns an error.

```python
class ProviderError(AgentError):
    provider: str | None    # Provider name
    status_code: int | None # HTTP status code

try:
    response = agent.run("Hello")
except ProviderError as e:
    print(f"Provider: {e.provider}")
    print(f"Status: {e.status_code}")
    print(f"Message: {e.message}")
    
    if e.status_code and 500 <= e.status_code < 600:
        print("Server error - try again later")
    elif e.status_code and 400 <= e.status_code < 500:
        print("Client error - check request")
```

### RateLimitError

Raised when rate limited by the provider.

```python
class RateLimitError(ProviderError):
    retry_after: float | None  # Seconds to wait

try:
    response = agent.run("Hello")
except RateLimitError as e:
    if e.retry_after:
        print(f"Waiting {e.retry_after}s...")
        time.sleep(e.retry_after)
        # Retry
    else:
        print("Rate limited, using exponential backoff")
        time.sleep(60)
```

### TimeoutError

Raised when a request times out.

```python
class TimeoutError(AgentError):
    timeout: float | None  # Configured timeout

try:
    response = agent.run("Complex query")
except TimeoutError as e:
    print(f"Timed out after {e.timeout}s")
    # Consider: increase timeout, simplify query
```

### ToolExecutionError

Raised when a tool fails to execute.

```python
class ToolExecutionError(AgentError):
    tool_name: str | None  # Name of failed tool

@tool
def risky_tool(data: str) -> str:
    raise ValueError("Something went wrong")

try:
    response = agent.run("Use the risky tool")
except ToolExecutionError as e:
    print(f"Tool '{e.tool_name}' failed: {e.message}")
```

### SchemaValidationError

Raised when structured output fails validation.

```python
class SchemaValidationError(AgentError):
    schema: Any  # The schema that failed
    output: Any  # The invalid output

try:
    response = agent.json("Get data", schema=MyModel)
except SchemaValidationError as e:
    print(f"Invalid output: {e.output}")
    print(f"Schema: {e.schema}")
    # Consider: retry, use different model, adjust schema
```

### UnsupportedFeatureError

Raised when a feature is not supported by the provider.

```python
class UnsupportedFeatureError(AgentError):
    feature: str | None   # Unsupported feature
    provider: str | None  # Provider name

agent = Agent(provider="deepseek", model="deepseek-chat")

try:
    response = agent.run(messages=[image_message])  # DeepSeek doesn't support vision
except UnsupportedFeatureError as e:
    print(f"Provider '{e.provider}' doesn't support '{e.feature}'")
    # Fall back to a different provider
```

### RoutingError

Raised when routing fails across all configured agents.

```python
class RoutingError(AgentError):
    errors: list[Exception]  # Errors from each agent

try:
    response = router.run("Hello")
except RoutingError as e:
    print(f"All {len(e.errors)} agents failed:")
    for i, error in enumerate(e.errors):
        print(f"  Agent {i}: {error}")
```

## Retry Strategies

### Automatic Retries

Agent automatically retries on transient errors:

```python
agent = Agent(
    provider="openai",
    model="gpt-4o",
    max_retries=3,  # Retry up to 3 times
)

# Automatically retries on:
# - Rate limit errors (with backoff)
# - 5xx server errors
# - Connection timeouts
```

### Custom Retry Logic

```python
import time
from agent.errors import RateLimitError, ProviderError

def run_with_retry(agent, prompt, max_attempts=3):
    for attempt in range(max_attempts):
        try:
            return agent.run(prompt)
        except RateLimitError as e:
            if attempt < max_attempts - 1:
                wait = e.retry_after or (2 ** attempt)
                print(f"Rate limited, waiting {wait}s...")
                time.sleep(wait)
            else:
                raise
        except ProviderError as e:
            if e.status_code and e.status_code >= 500:
                if attempt < max_attempts - 1:
                    time.sleep(2 ** attempt)
                else:
                    raise
            else:
                raise  # Don't retry client errors
```

## Error Handling Patterns

### Graceful Degradation

```python
def get_response(prompt: str) -> str:
    """Get response with graceful degradation."""
    
    # Try primary agent
    try:
        return primary_agent.run(prompt).text
    except AgentError:
        pass
    
    # Try backup agent
    try:
        return backup_agent.run(prompt).text
    except AgentError:
        pass
    
    # Return fallback
    return "I'm sorry, I'm having trouble responding right now."
```

### Circuit Breaker

```python
class CircuitBreaker:
    def __init__(self, failure_threshold=5, reset_timeout=60):
        self.failures = 0
        self.threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.last_failure = None
        self.state = "closed"  # closed, open, half-open
    
    def call(self, func, *args, **kwargs):
        if self.state == "open":
            if time.time() - self.last_failure > self.reset_timeout:
                self.state = "half-open"
            else:
                raise CircuitOpenError("Circuit breaker is open")
        
        try:
            result = func(*args, **kwargs)
            if self.state == "half-open":
                self.state = "closed"
                self.failures = 0
            return result
        except AgentError as e:
            self.failures += 1
            self.last_failure = time.time()
            if self.failures >= self.threshold:
                self.state = "open"
            raise

breaker = CircuitBreaker()

try:
    response = breaker.call(agent.run, "Hello")
except CircuitOpenError:
    # Use fallback
    pass
```

### Logging Errors

```python
import logging

logger = logging.getLogger(__name__)

class ErrorLoggingMiddleware(Middleware):
    def on_error(self, request, error):
        logger.error(
            "Agent error",
            extra={
                "error_type": type(error).__name__,
                "error_message": str(error),
                "provider": getattr(error, "provider", None),
                "status_code": getattr(error, "status_code", None),
                "input_preview": (request.input or "")[:100],
            }
        )
        return error

agent = Agent(
    provider="openai",
    model="gpt-4o",
    middleware=[ErrorLoggingMiddleware()],
)
```

## Async Error Handling

```python
import asyncio

async def main():
    agent = Agent(provider="openai", model="gpt-4o")
    
    try:
        response = await agent.run_async("Hello")
    except AuthenticationError:
        print("Check your API key")
    except RateLimitError as e:
        await asyncio.sleep(e.retry_after or 60)
        response = await agent.run_async("Hello")  # Retry
    except AgentError as e:
        print(f"Error: {e}")
```

## Streaming Error Handling

```python
try:
    for event in agent.stream("Hello"):
        if event.type == "error":
            print(f"Stream error: {event.error}")
            break
        if event.type == "text_delta":
            print(event.text, end="")
except AgentError as e:
    print(f"Connection error: {e}")
```

## Best Practices

### 1. Catch Specific Exceptions

```python
# Good - handle specific cases
try:
    response = agent.run(prompt)
except RateLimitError:
    handle_rate_limit()
except AuthenticationError:
    handle_auth_error()
except AgentError:
    handle_generic_error()

# Bad - catch everything
try:
    response = agent.run(prompt)
except Exception:
    pass  # Unknown what went wrong
```

### 2. Include Context in Logs

```python
except AgentError as e:
    logger.error(
        f"Failed to process: {e}",
        extra={
            "prompt_length": len(prompt),
            "provider": agent.provider,
            "model": agent.model,
        }
    )
```

### 3. Don't Expose Raw Errors to Users

```python
# Good - user-friendly message
except AgentError:
    return "I'm having trouble right now. Please try again."

# Bad - expose internal details
except AgentError as e:
    return f"OpenAI API error: {e.raw}"
```

## Next Steps

- [Middleware](middleware.md) - Error handling in middleware
- [Routing](routing.md) - Handle routing failures
- [Configuration](configuration.md) - Configure retry behavior