# iPaaS API Retry and Backoff System

## Overview

The iPaaS system implements automatic retry logic with exponential backoff for all outgoing API requests. This provides resilience against transient failures like rate limiting, server errors, and network issues without requiring manual configuration.

All API requests flow through a single path, `ThrottleManager`, which handles both rate limiting and automatic retry logic.

## Architecture

### Request Flow

```
API Request
    ↓
IpaasHelper::executeThrottlerRequest()
    ↓
Connector::getThrottler()
    ↓
ThrottleManager::throttle(callback)
    ↓
├─> [Optional] Wait for throttle slot
├─> Execute request
├─> Check if success (2xx status)
│   ├─> YES → Return response
│   └─> NO → Check if retriable
│       ├─> YES → Wait with backoff → Retry (max 3 times)
│       └─> NO → Return error response
```

### Core Components

**ThrottleManager** (`src/App/Services/Ipaas/ThrottleManager.php`)
- Unified request handler
- Time-based rate limiting (requests per time window)
- Automatic retry logic with exponential backoff
- Throttle slot timeout protection (60 seconds)

**IpaasHelper** (`src/App/Helpers/IpaasHelper.php`)
- Entry point: `executeThrottlerRequest($parameters, $connectorId)`
- Routes all requests through ThrottleManager

## Retry Configuration

### Global Defaults

All retry behavior is **hardcoded** with sensible defaults:

```php
// Defined in ThrottleManager
private const MAX_RETRIES = 3;           // 3 retry attempts (4 total attempts)
private const BASE_DELAY_SECONDS = 1;    // 1 second base delay
```

**No connector-level configuration needed** - retry logic is always active.

### Retry Attempts

- **Initial attempt**: Original request
- **Retry 1**: After ~1 second (with jitter)
- **Retry 2**: After ~2 seconds (with jitter)
- **Retry 3**: After ~4 seconds (with jitter)

**Total**: 4 attempts maximum

## Retriable Conditions

### HTTP Status Codes

The system handles HTTP status codes as follows:

| Status Code | Description | Retry? |
|------------|-------------|--------|
| **429** | Too Many Requests | ✅ YES |
| **500** | Internal Server Error | ✅ YES |
| **502** | Bad Gateway | ✅ YES |
| **503** | Service Unavailable | ✅ YES |
| **504** | Gateway Timeout | ✅ YES |
| **4xx** (except 429) | Client Errors | ❌ NO |
| **2xx** | Success | ❌ NO (returns immediately) |

### Exceptions

The system automatically retries on these exceptions:

- **RedisException**: Redis connection failures
- **Connection errors**: Any exception with "Connection" in message
- **Timeout errors**: Any exception with "timeout" or "timed out" in message
- **cURL errors**: Any exception with "cURL Error" in message

All other exceptions are **not retriable** and thrown immediately.
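The status and exception rules above reduce to two small predicates. The following is a standalone approximation (PHP 8.0+ for `str_contains`). The function names are hypothetical stand-ins for `ThrottleManager::isRetriableResponse()` and `isRetriableException()`, and case-sensitive substring matching is an assumption taken from the wording of the list above.

```php
<?php
// Sketch only: hypothetical standalone versions of the retriable checks.
// Assumption: message matching is case-sensitive, as literally documented.

const RETRIABLE_STATUS_CODES = [429, 500, 502, 503, 504];

function isRetriableStatus(int $statusCode): bool
{
    // 2xx returns immediately; other 4xx codes are client errors (no retry).
    return in_array($statusCode, RETRIABLE_STATUS_CODES, true);
}

function isRetriableThrowable(\Throwable $e): bool
{
    // `instanceof` against an undefined class is simply false, so this is
    // safe even when the phpredis extension is not installed.
    if ($e instanceof \RedisException) {
        return true;
    }

    $message = $e->getMessage();
    foreach (['Connection', 'timeout', 'timed out', 'cURL Error'] as $needle) {
        if (str_contains($message, $needle)) {
            return true;
        }
    }

    // Everything else is rethrown immediately by the caller.
    return false;
}
```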

## Backoff Strategy

### Exponential Backoff with Jitter

```php
$delay = $baseDelay * (2 ** $attempt);                          // ** is exponentiation (^ is XOR in PHP)
$jitter = $delay * 0.25 * (mt_rand() / mt_getrandmax() * 2 - 1); // random factor in [-1, 1]
$finalDelay = max(0.1, $delay + $jitter);                       // Minimum 100ms
```

**Example delays**:
- Attempt 0 → Retry 1: ~1.0s ± 0.25s (0.75s - 1.25s)
- Attempt 1 → Retry 2: ~2.0s ± 0.5s (1.5s - 2.5s)
- Attempt 2 → Retry 3: ~4.0s ± 1.0s (3.0s - 5.0s)

**Why jitter?**
- Prevents "thundering herd" when multiple requests retry simultaneously
- Distributes retry timing across a range
- Reduces likelihood of synchronized retry storms
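The backoff formula and the delay ranges above can be checked with a small standalone function. This is a sketch (PHP 7.4+), not the ThrottleManager code: the constants mirror the documented defaults, and the injectable `$random` parameter is a hypothetical addition so the jitter can be tested deterministically.

```php
<?php
// Sketch of exponential backoff with ±25% jitter and a 100ms floor.
// The $random parameter is hypothetical; it exists only for testability.

const BASE_DELAY_SECONDS = 1.0;
const JITTER_FRACTION = 0.25;
const MIN_DELAY_SECONDS = 0.1;

/**
 * @param int           $attempt zero-based retry index (0 => delay before retry 1)
 * @param callable|null $random  returns a float in [-1, 1]; defaults to mt_rand()
 */
function backoffDelay(int $attempt, ?callable $random = null): float
{
    $random ??= fn (): float => mt_rand() / mt_getrandmax() * 2 - 1;

    $delay = BASE_DELAY_SECONDS * (2 ** $attempt);   // 1s, 2s, 4s, ...
    $jitter = $delay * JITTER_FRACTION * $random();  // up to ±25% of the delay

    return max(MIN_DELAY_SECONDS, $delay + $jitter);
}

// usleep() takes integer microseconds.
function sleepForBackoff(int $attempt): void
{
    usleep((int) round(backoffDelay($attempt) * 1_000_000));
}
```

Passing a fixed `$random` pins the result to the ends of each documented range: `backoffDelay(1, fn () => 1.0)` gives the 2.5s upper bound for retry 2.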

## Throttle Configuration

Throttle limits are configured per connector:

```php
// On Connector model
$connector->throttle_max_requests = 10;    // Max requests
$connector->throttle_per_seconds = 60;     // Per time window (seconds)
```

**Example**: `10 requests per 60 seconds` = maximum 10 requests per minute

### Throttle + Retry Interaction

When throttle limits are configured:

1. **Check throttle slot** before each attempt (including retries)
2. **Wait for slot** if over limit (max 60 seconds)
3. **Execute request**
4. **Increment counter** (applies to time window)
5. **On failure** → Retry logic activates (slot check happens on retry too)
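A "N requests per window" limit like the one configured above can be modeled as a fixed-window counter. The following in-memory sketch (PHP 8.0+) uses a hypothetical class name and an injected timestamp. It assumes fixed (not sliding) windows and counts a request when its slot is claimed rather than after execution, which keeps check-and-increment in one step. A real multi-worker deployment needs a shared store (for example Redis) so every worker sees the same counter.

```php
<?php
// Hypothetical in-memory fixed-window throttle; not the ThrottleManager code.

final class FixedWindowThrottle
{
    private float $windowStart = 0.0;
    private int $count = 0;

    public function __construct(
        private int $maxRequests, // e.g. throttle_max_requests
        private int $perSeconds,  // e.g. throttle_per_seconds
    ) {
    }

    /** Claims a slot if one is free in the current window. */
    public function tryAcquire(float $now): bool
    {
        // Start a fresh window once the current one has fully elapsed.
        if ($now - $this->windowStart >= $this->perSeconds) {
            $this->windowStart = $now;
            $this->count = 0;
        }

        if ($this->count >= $this->maxRequests) {
            return false; // over the limit: the caller waits and re-checks
        }

        $this->count++;
        return true;
    }
}
```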

### Throttle Timeout Protection

To prevent infinite waits:

- **Maximum wait**: 60 seconds for a throttle slot
- **On timeout**: Logs warning and proceeds anyway
- **Rationale**: Retry logic will handle actual rate limit errors from API

```php
Log::warning('Throttle wait timeout exceeded', [
    'connector_id' => $connectorId,
    'wait_seconds' => 60,
    'action' => 'proceeding_anyway'
]);
```

## Error Handling

### Retry Exhaustion (Response)

When all retries are exhausted for a retriable response:

```php
// Logs a warning first...
Log::warning('API request failed after all retries (response)', [
    'connector_id' => $connectorId,
    'attempts' => 4,
    'response' => $lastResponse
]);

// ...then returns the last error response, e.g.
// ['httpStatusCode' => 429, 'response' => 'Rate limit exceeded']
return $lastResponse;
```

### Retry Exhaustion (Exception)

When all retries are exhausted for a retriable exception:

```php
// Logs an error first...
Log::error('API request failed after all retries (exception)', [
    'connector_id' => $connectorId,
    'attempts' => 4,
    'exception' => get_class($exception),
    'message' => $exception->getMessage()
]);

// ...then rethrows the original exception (e.g. "Connection timeout")
throw $exception;
```
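Putting the retry rules together, the loop below is a minimal sketch (PHP 8.0+) of how retriable responses, retriable exceptions, backoff, and the two exhaustion paths fit. It is not the `ThrottleManager::executeWithRetry()` source: the function name and parameters are hypothetical, the throttle wait is left out, and logging goes through a `$log` callable so the sketch runs without Laravel.

```php
<?php
// Hypothetical retry loop mirroring the documented behavior.

function executeWithRetrySketch(
    callable $request,   // fn(): mixed, performs one attempt
    callable $sleep,     // fn(float $seconds): void
    callable $log,       // fn(string $level, string $message, array $context): void
    int $maxRetries = 3,
    float $baseDelay = 1.0,
): mixed {
    $isRetriableStatus = fn (int $code): bool => in_array($code, [429, 500, 502, 503, 504], true);
    $isRetriableException = function (\Exception $e): bool {
        if ($e instanceof \RedisException) {
            return true;
        }
        foreach (['Connection', 'timeout', 'timed out', 'cURL Error'] as $needle) {
            if (str_contains($e->getMessage(), $needle)) {
                return true;
            }
        }
        return false;
    };
    // Exponential backoff with ±25% jitter and a 100ms floor.
    $delayFor = function (int $attempt) use ($baseDelay): float {
        $delay = $baseDelay * (2 ** $attempt);
        $jitter = $delay * 0.25 * (mt_rand() / mt_getrandmax() * 2 - 1);
        return max(0.1, $delay + $jitter);
    };

    for ($attempt = 0; ; $attempt++) {
        $isLastAttempt = $attempt >= $maxRetries;

        try {
            $response = $request();
        } catch (\Exception $e) {
            if (!$isRetriableException($e)) {
                throw $e; // not retriable: rethrow immediately
            }
            if ($isLastAttempt) {
                $log('error', 'API request failed after all retries (exception)', [
                    'attempts' => $attempt + 1,
                    'exception' => get_class($e),
                    'message' => $e->getMessage(),
                ]);
                throw $e; // rethrow the original exception
            }
            $log('info', 'Retrying request due to retriable exception', ['attempt' => $attempt + 1]);
            $sleep($delayFor($attempt));
            continue;
        }

        $status = (int) ($response['httpStatusCode'] ?? 0);
        if (!$isRetriableStatus($status)) {
            return $response; // success or non-retriable error: return as-is
        }
        if ($isLastAttempt) {
            $log('warning', 'API request failed after all retries (response)', [
                'attempts' => $attempt + 1,
                'response' => $response,
            ]);
            return $response; // the last error response
        }
        $log('info', 'Retrying request due to retriable response', [
            'attempt' => $attempt + 1,
            'http_status' => $status,
        ]);
        $sleep($delayFor($attempt));
    }
}
```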

### Laravel Queue Retry Layer

Jobs calling the ThrottleManager have their own retry layer:

```php
class ProcessApiRequest extends Job
{
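    public $tries = 3;      // Total job attempts (Laravel counts the first run)
    public $timeout = 300;  // 5 minute timeout per attempt
}
```

Because every job attempt re-runs the full ThrottleManager cycle, a single API call can be attempted up to:
- 4 attempts per job run (ThrottleManager)
- × 3 job attempts (Laravel queue)
- = **12 maximum attempts** before final failure

The job-level layer only applies when the job throws (or times out). An error response returned by the ThrottleManager does not fail the job on its own.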

## Usage

### Basic API Request

```php
use App\Helpers\IpaasHelper;

// Prepare parameters
$parameters = [
    'httpMethod' => 'GET',
    'relativeURL' => '/api/resource',
    'headers' => ['Accept' => 'application/json'],
    'contentType' => 'json',
    'requestBody' => '',
    'requestParams' => []
];

// Execute with automatic retry
$response = IpaasHelper::executeThrottlerRequest($parameters, $connectorId);

// Response format
if (is_array($response) && isset($response['httpStatusCode'])) {
    if ($response['httpStatusCode'] >= 200 && $response['httpStatusCode'] < 300) {
        // Success
        $data = json_decode($response['response'], true);
    } else {
        // Failed after retries
        $error = $response['response'];
    }
}
```

### Within Jobs

```php
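namespace App\Jobs;

use App\Helpers\IpaasHelper;
use Exception;
use Illuminate\Support\Facades\Log;

class ProcessApiRequest extends Job
{
    public $tries = 3;
    public $timeout = 300;

    // Set when the job is dispatched
    public $tenantDatabase;
    public $parameters;
    public $connectorId;

    public function handle()
    {
        setupTenantConnection($this->tenantDatabase);

        // This automatically includes retry logic
        $response = IpaasHelper::executeThrottlerRequest($this->parameters, $this->connectorId);

        // Error payload: fail the job so Laravel retries it
        if (is_array($response) && isset($response['error'])) {
            Log::error('API request failed', ['error' => $response['error']]);
            throw new Exception($response['error']); // Triggers job-level retry
        }

        // Non-2xx response returned after ThrottleManager retries were exhausted
        $status = is_array($response) ? ($response['httpStatusCode'] ?? null) : null;
        if ($status !== null && ($status < 200 || $status >= 300)) {
            Log::error('API request failed', ['http_status' => $status]);
            throw new Exception("API request failed with HTTP {$status}"); // Triggers job-level retry
        }

        // Process successful response
        $this->processData($response);
    }
}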
```

## Logging and Monitoring

### Success (No Retry)

No special logging - request completes normally.

### Retry Attempts

```php
Log::info('Retrying request due to retriable response', [
    'connector_id' => $connectorId,
    'attempt' => 2,              // Current attempt number
    'max_retries' => 3,
    'http_status' => 429
]);
```

```php
Log::info('Retrying request due to retriable exception', [
    'connector_id' => $connectorId,
    'attempt' => 1,
    'max_retries' => 3,
    'exception' => 'Exception',
    'message' => 'Connection timeout'
]);
```

### Retry Exhaustion

```php
Log::warning('API request failed after all retries (response)', [
    'connector_id' => $connectorId,
    'attempts' => 4,
    'response' => ['httpStatusCode' => 429, 'response' => 'Rate limited']
]);
```

### Throttle Timeout

```php
Log::warning('Throttle wait timeout exceeded', [
    'connector_id' => $connectorId,
    'wait_seconds' => 60.5,
    'attempts' => 605,           // Polled every 100ms: 605 checks ≈ 60.5s
    'max_wait_seconds' => 60,
    'action' => 'proceeding_anyway'
]);
```

### Debug Logging

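For deep debugging, lower the log channel's level to `debug` (in Laravel's default `config/logging.php`, the level comes from `LOG_LEVEL` in `.env`). Each retry then logs its computed delay: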

```php
Log::debug('Waiting before retry', [
    'connector_id' => $connectorId,
    'attempt' => 1,
    'base_delay' => 2,           // Delay before jitter: 1s × 2^1
    'final_delay' => 2.3         // With jitter applied
]);
```

## Best Practices

### 1. Let the System Handle Retries

Don't implement manual retry logic - the ThrottleManager handles it:

```php
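// ❌ DON'T DO THIS: wraps a second retry loop around the built-in one
// (up to 3 × 4 = 12 attempts plus extra sleeps, each passing through the throttle)
$attempts = 0;
while ($attempts < 3) {
    $response = IpaasHelper::executeThrottlerRequest($params, $connectorId);
    $status = $response['httpStatusCode'] ?? 0;
    if ($status >= 200 && $status < 300) break;
    $attempts++;
    sleep(2);
}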

// ✅ DO THIS
$response = IpaasHelper::executeThrottlerRequest($params, $connectorId);
// Retry logic is automatic
```

### 2. Configure Throttle Limits Appropriately

Match API provider limits:

```php
// NetSuite (illustrative): ~10 requests per second; confirm against your account's limits
$connector->throttle_max_requests = 10;
$connector->throttle_per_seconds = 1;

// Slower API: 100 requests per minute
$connector->throttle_max_requests = 100;
$connector->throttle_per_seconds = 60;
```

### 3. Set Appropriate Job Timeouts

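Jobs must allow time for retries, backoff delays, and throttle waits. Keep the queue connection's `retry_after` value larger than the job's `$timeout`; otherwise Laravel may hand the job to another worker while the first one is still running it.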

```php
class ProcessApiRequest extends Job
{
    // Worst case per job attempt:
    // - up to 60s throttle wait on each of 4 attempts = 240s
    // - up to ~7s backoff (1s + 2s + 4s before jitter)
    // - leaves ~50s for the requests themselves
    public $timeout = 300; // 5 minutes
}
```
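The timeout arithmetic above can be checked with a small helper. This is a sketch with a hypothetical function name; its inputs mirror the documented defaults, it takes each backoff delay at its +25% jitter ceiling, and it deliberately excludes the time the requests themselves take.

```php
<?php
// Hypothetical worst-case estimate for one job attempt (request time excluded).

function worstCaseRetrySeconds(
    int $maxRetries = 3,
    float $baseDelay = 1.0,
    float $maxThrottleWait = 60.0,
    float $jitterFraction = 0.25,
): float {
    $attempts = $maxRetries + 1;

    // Every attempt (initial + retries) may wait the full throttle timeout.
    $throttle = $attempts * $maxThrottleWait;

    // Longest possible backoff: each delay at its jitter ceiling.
    $backoff = 0.0;
    for ($i = 0; $i < $maxRetries; $i++) {
        $backoff += $baseDelay * (2 ** $i) * (1 + $jitterFraction);
    }

    return $throttle + $backoff;
}

$budget = worstCaseRetrySeconds(); // 240s throttle + 8.75s backoff
$jobTimeout = 300;
printf("worst case %.2fs, leaves %.2fs for request execution\n", $budget, $jobTimeout - $budget);
// prints: worst case 248.75s, leaves 51.25s for request execution
```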

### 4. Check Response Structure

Always validate response structure:

```php
$response = IpaasHelper::executeThrottlerRequest($params, $connectorId);

// Handle different response types
if (is_array($response)) {
    if (isset($response['httpStatusCode'])) {
        // Structured HTTP response
        $statusCode = $response['httpStatusCode'];
        $body = $response['response'];
    } elseif (isset($response['error'])) {
        // Error response
        $error = $response['error'];
    }
} elseif (is_string($response)) {
    // Plain string response (rare)
    $data = $response;
}
```
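If several call sites need the checks above, they can be folded into one helper that returns a uniform shape. This is a sketch (PHP 8.0+): the function name and the returned keys are hypothetical, and treating a plain string response as a successful body is an assumption.

```php
<?php
// Hypothetical normalizer for the response shapes listed above.

/** @return array{ok: bool, status: ?int, body: mixed, error: ?string} */
function normalizeIpaasResponse(mixed $response): array
{
    if (is_array($response) && isset($response['httpStatusCode'])) {
        $status = (int) $response['httpStatusCode'];
        $ok = $status >= 200 && $status < 300;

        return [
            'ok' => $ok,
            'status' => $status,
            'body' => $response['response'] ?? null,
            'error' => $ok ? null : 'HTTP ' . $status,
        ];
    }

    if (is_array($response) && isset($response['error'])) {
        return ['ok' => false, 'status' => null, 'body' => null, 'error' => (string) $response['error']];
    }

    if (is_string($response)) {
        // Assumption: a plain string carries no status and is treated as a body.
        return ['ok' => true, 'status' => null, 'body' => $response, 'error' => null];
    }

    return ['ok' => false, 'status' => null, 'body' => null, 'error' => 'Unrecognized response shape'];
}
```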

### 5. Monitor Retry Rates

A sustained high retry rate points to an upstream problem or a throttle limit that does not match the provider. Check the logs:

```bash
# Count retry log entries
grep "Retrying request" storage/logs/laravel.log | wc -l

# Check which connectors retry most
grep "Retrying request" storage/logs/laravel.log | grep -o "connector_id[^,]*" | sort | uniq -c
```
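The same per-connector count can be done in PHP, for example from an Artisan command. This sketch (PHP 8.0+) uses a hypothetical function name and assumes Laravel's default single-line log format, where the context array is appended as JSON (e.g. `... local.INFO: Retrying request ... {"connector_id":12,...}`).

```php
<?php
// Hypothetical retry counter; assumes JSON context on the same log line.

/** @return array<int|string, int> connector_id => retry count, highest first */
function countRetriesByConnector(iterable $lines): array
{
    $counts = [];

    foreach ($lines as $line) {
        if (!str_contains($line, 'Retrying request')) {
            continue;
        }
        // Accept numeric or quoted ids: "connector_id":12 or "connector_id":"abc"
        if (preg_match('/"connector_id":"?([^",}]+)"?/', $line, $m) === 1) {
            $counts[$m[1]] = ($counts[$m[1]] ?? 0) + 1;
        }
    }

    arsort($counts); // keeps keys, sorts by count descending
    return $counts;
}

// Usage against the log file used elsewhere in this document:
// $counts = countRetriesByConnector(new SplFileObject('storage/logs/laravel.log'));
```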

## Performance Characteristics

### Best Case (Success)
- **Attempts**: 1
- **Time**: < 1 second
- **Overhead**: Minimal (throttle check only)

### Worst Case (Exhausted Retries)
- **Attempts**: 4 (1 initial + 3 retries)
- **Time**: ~7-10 seconds (1s + 2s + 4s + request times)
- **With throttle waits**: Up to ~247 seconds (60s × 4 attempts + ~7s backoff), plus request time

### Average Case (Recoverable Error)
- **Attempts**: 2-3
- **Time**: 2-6 seconds
- **Success rate**: High (most transient errors resolve quickly)

## Troubleshooting

### Symptom: Requests Always Timeout

**Possible causes**:
1. Throttle wait timeout (60s) + retry delays
2. API actually down/slow

**Solution**:
```bash
# Check for throttle timeouts
grep "Throttle wait timeout exceeded" storage/logs/laravel.log

# If frequent, increase connector limits or reduce request rate
```

### Symptom: Jobs Failing After All Retries

**Possible causes**:
1. API returning persistent errors (not transient)
2. Authentication issues (not retriable)
3. Invalid requests (4xx errors)

**Solution**:
```bash
# Check what errors are occurring
grep "API request failed after all retries" storage/logs/laravel.log

# Look at the actual error responses
# Fix authentication or request parameters
```

### Symptom: Slow Request Processing

**Possible causes**:
1. Many requests hitting retry logic
2. Throttle limits too restrictive

**Solution**:
```bash
# Count retry occurrences
grep "Retrying request" storage/logs/laravel.log | wc -l

# Review connector throttle settings
# Consider increasing limits if API allows
```

## Implementation Details

### Code Location

- **ThrottleManager**: `src/App/Services/Ipaas/ThrottleManager.php`
- **IpaasHelper**: `src/App/Helpers/IpaasHelper.php`
- **Connector Model**: `src/Domain/Ipaas/Connectors/Models/Connector.php`
- **Tests**: `tests/Unit/Services/Ipaas/ThrottleManagerTest.php`

### Key Methods

```php
// Entry point
IpaasHelper::executeThrottlerRequest(&$parameters, $connectorId): mixed

// Throttle with retry
ThrottleManager::throttle(callable $callback): mixed

// Internal retry logic
ThrottleManager::executeWithRetry(callable $callback): mixed

// Retry conditions
ThrottleManager::isRetriableResponse($response): bool
ThrottleManager::isRetriableException(\Exception $e): bool

// Backoff calculation
ThrottleManager::waitBeforeRetry(int $attempt): void
```

### Test Coverage

**18 tests, 46 assertions, 100% passing**

Test scenarios include:
- Successful requests (no retry)
- Retry on 429, 5xx, network errors
- Retry exhaustion (response and exception)
- Non-retriable errors (4xx except 429)
- Throttle slot waiting and timeout
- Edge cases (null, string responses)
- Integration of throttle + retry

## Migration Notes

### From Old System

If migrating from the deprecated slot-based concurrency system:

1. **Remove config fields**: `concurrency_limit`, `max_retries`, `retry_backoff_strategy`, etc. are no longer used
2. **No code changes needed**: Existing code continues to work
3. **Improved reliability**: No more stuck jobs from slot reservation issues

### Database Cleanup

Run the migration to remove deprecated fields:

```bash
php artisan migrate --path=database/migrations/tenants/2025_10_30_101453_remove_concurrency_config_fields.php
```

## Related Documentation

- [iPaaS Overview](../ipass_overview.md)
- [Connector Configuration](./connectors.md)
- [Job Processing](./jobs.md)

