# Chunked Jobs Migration - NetSuite Import System

## Overview

This document outlines the migration from a single monolithic import job to a chunked job system for processing NetSuite records. The current system processes millions of records in a single job, leading to database connection issues, memory problems, and silent failures. The new system will break imports into manageable 1000-record batches with proper dependency management and intelligent concurrency control.

## Problem Statement

### Current Issues
- **Single Point of Failure**: One job processes all records, causing complete failure if interrupted
- **Database Connection Fatigue**: Long-running connections become unstable over time
- **Memory Accumulation**: Memory usage grows throughout the job lifecycle
- **Silent Failures**: Jobs hang without error logs, masked by `MaxAttemptsExceededException`
- **Poor Progress Visibility**: Limited insight into where failures occur
- **No Cancellation Support**: Cannot gracefully stop running imports
- **NetSuite Concurrency Limits**: Jobs overwhelm NetSuite's concurrent request limits, causing `CONCURRENCY_LIMIT_EXCEEDED` errors

### Root Cause Analysis
The primary issues are:
1. **Database Connection Instability**: Long-running operations cause connection fatigue
2. **NetSuite API Overwhelming**: Multiple batches dispatched simultaneously exceed concurrency limits
3. **Thundering Herd Problem**: Delayed jobs return simultaneously, overwhelming the system

## Solution Architecture

### Core Principles
1. **Chunked Processing**: Break imports into 1000-record batches
2. **Dependency Management**: Process dependencies before main records
3. **Fault Tolerance**: Individual batch failures don't affect others
4. **Progress Tracking**: Real-time visibility into import progress
5. **Graceful Cancellation**: Ability to stop imports cleanly
6. **Event-Driven Updates**: Real-time UI updates via events
7. **Intelligent Concurrency Control**: Prevent overwhelming NetSuite API limits

### System Components

#### 1. ImportJobCoordinator
**File**: `src/App/Jobs/ImportJobs/ImportJobCoordinator.php`
- **Purpose**: Manages overall import process
- **Responsibilities**:
  - Builds dependency graphs
  - Creates and dispatches batch jobs with intelligent concurrency control
  - Manages execution order
  - Handles cancellation requests
  - Emits job lifecycle events
  - Integrates with NetSuiteConcurrencyManager for smart batch dispatching

#### 2. BatchJobProcessor
**File**: `src/App/Jobs/ImportJobs/BatchJobProcessor.php`
- **Purpose**: Abstract base class for batch processing
- **Responsibilities**:
  - Common batch job functionality
  - Database connection management
  - Progress tracking
  - Error handling and retries
  - NetSuite API integration

#### 3. DependencyResolver
**File**: `src/App/Services/ImportJobs/DependencyResolver.php`
- **Purpose**: Manages record type dependencies
- **Responsibilities**:
  - Builds dependency graphs from criteria.json
  - Detects circular dependencies
  - Determines execution order
  - Validates dependency chains

#### 4. JobCancellationService
**File**: `src/App/Services/ImportJobs/JobCancellationService.php`
- **Purpose**: Handles job cancellation logic
- **Responsibilities**:
  - Cache-based cancellation checking
  - Graceful shutdown management
  - Cancellation statistics
  - Tenant-wide job management
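
The cache-based check can be illustrated with a small sketch. It is not the actual `JobCancellationService`: the `CancellationFlags` class, the `import_cancel:` key prefix, and `processBatch()` are hypothetical names, and a plain array stands in for the cache. The point it shows is that the flag is polled between records, so a cancelled batch stops at a clean boundary.

```php
<?php

/** Hypothetical stand-in for the cache: maps cancellation keys to flags. */
final class CancellationFlags
{
    /** @var array<string, bool> */
    private array $flags = [];

    public function requestCancellation(string $jobId): void
    {
        $this->flags["import_cancel:{$jobId}"] = true;
    }

    public function isCancelled(string $jobId): bool
    {
        return $this->flags["import_cancel:{$jobId}"] ?? false;
    }
}

/**
 * Processes records until the list is exhausted or the job is cancelled.
 * Returns the number of records that were handled.
 */
function processBatch(array $records, string $jobId, CancellationFlags $flags, callable $handle): int
{
    $processed = 0;
    foreach ($records as $record) {
        // Check between records so the record in progress always finishes.
        if ($flags->isCancelled($jobId)) {
            break;
        }
        $handle($record);
        $processed++;
    }

    return $processed;
}
```

Checking on every record keeps the sketch simple; with a remote cache, checking every N records would reduce round trips at the cost of a slower reaction to cancellation.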

#### 5. JobStatusTrackingService
**File**: `src/App/Services/ImportJobs/JobStatusTrackingService.php`
- **Purpose**: Tracks progress for jobs and batches
- **Responsibilities**:
  - Overall job progress calculation
  - Batch-level progress tracking
  - Cache-based progress storage
  - Status updates and completion marking

#### 6. NetSuiteConcurrencyManager ⭐ **NEW**
**File**: `src/App/Services/ImportJobs/NetSuiteConcurrencyManager.php`
- **Purpose**: Prevents overwhelming NetSuite's concurrent request limits
- **Responsibilities**:
  - **Slot Reservation**: Reserves concurrency slots before making requests
  - **Intelligent Delays**: Calculates delays based on queue position and utilization
  - **Caching**: Tracks active requests using Redis cache with TTL
  - **Auto-Cleanup**: Removes expired request tracking automatically
  - **Tenant Isolation**: Separate concurrency tracking per tenant
  - **Queue Management**: Prevents "thundering herd" when delayed jobs return

#### 7. JobErrorMonitoringService ⭐ **NEW**
**File**: `src/App/Services/ImportJobs/JobErrorMonitoringService.php`
- **Purpose**: Centralized error tracking and monitoring
- **Responsibilities**:
  - Error categorization and storage
  - Threshold monitoring and alerting
  - Real-time error statistics
  - Historical error analysis

## Intelligent Concurrency Control System

### Problem Solved
The previous system suffered from `CONCURRENCY_LIMIT_EXCEEDED` errors because:
- Multiple batches were dispatched simultaneously
- Fixed delays caused "thundering herd" when delayed jobs returned
- No real-time tracking of active NetSuite requests
- System overwhelmed NetSuite's typical 5-10 concurrent request limit

### Solution Features

#### 1. Slot Reservation System
```php
// Reserve a concurrency slot before making requests
if (!$this->concurrencyManager->reserveSlot($requestId)) {
    // No slot available, wait for one
    $delay = $this->concurrencyManager->waitForAvailableSlot($requestId);
    if ($delay > 0) {
        $this->release($delay); // Release job back to queue
        return null;
    }
}
```

#### 2. Intelligent Delay Calculation
- **Base Delay**: Increases with queue position (5s per position, max 30s)
- **Utilization Bonus**: Adds 15s if system is >80% utilized
- **Jitter**: Random 2-8s to prevent synchronization
- **Dynamic Adjustment**: Based on actual queue state, not fixed values
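
The rules above can be combined into one function. This is a minimal sketch rather than the `NetSuiteConcurrencyManager` code: the name `calculateDispatchDelay` and the injectable `$jitter` argument (useful for deterministic tests) are assumptions, while the constants come from the list above.

```php
<?php

/**
 * Hypothetical helper: base delay + utilization bonus + jitter, in seconds.
 * Pass $jitter explicitly for repeatable results; by default it is random.
 */
function calculateDispatchDelay(int $queuePosition, float $utilization, ?int $jitter = null): int
{
    // 5s per queue position, capped at 30s.
    $delay = min(30, max(0, $queuePosition) * 5);

    // Extra 15s when more than 80% of the slots are in use.
    if ($utilization > 0.8) {
        $delay += 15;
    }

    // Random 2-8s so that delayed jobs do not all return at the same moment.
    $delay += $jitter ?? random_int(2, 8);

    return $delay;
}

echo calculateDispatchDelay(3, 0.6, 4), PHP_EOL;  // 15 + 0 + 4 = 19
echo calculateDispatchDelay(10, 0.9, 8), PHP_EOL; // 30 + 15 + 8 = 53
```

Under these rules a delay always falls between 2 and 53 seconds.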

#### 3. Real-Time Monitoring
```bash
# View concurrency status for a tenant
php artisan netsuite:concurrency sx_db_1751480457

# Reset concurrency tracking if needed
php artisan netsuite:concurrency sx_db_1751480457 --reset
```

#### 4. Automatic Recovery
- Failed requests automatically release slots
- Expired requests cleaned up automatically
- Jobs can release themselves back to queue with calculated delays
- No manual intervention required
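
The recovery behaviour can be sketched with an in-memory stand-in for the Redis-backed tracking. The class `InMemorySlotTracker` and the helper `withSlot()` are hypothetical and assume PHP 8; the 5-slot default and 300-second expiry mirror values used elsewhere in this document. Each reservation carries an expiry, and `finally` guarantees that a failed request still gives its slot back.

```php
<?php

/** Hypothetical in-memory stand-in for the Redis-backed slot tracking. */
final class InMemorySlotTracker
{
    /** @var array<string, int> request ID => expiry timestamp */
    private array $active = [];

    public function __construct(private int $maxSlots = 5, private int $ttl = 300)
    {
    }

    public function reserveSlot(string $requestId, ?int $now = null): bool
    {
        $now ??= time();

        // Drop reservations whose expiry has passed (e.g. crashed jobs).
        $this->active = array_filter($this->active, fn (int $expiresAt) => $expiresAt >= $now);

        if (count($this->active) >= $this->maxSlots) {
            return false; // all slots taken; the caller should wait or back off
        }

        $this->active[$requestId] = $now + $this->ttl;
        return true;
    }

    public function releaseSlot(string $requestId): void
    {
        unset($this->active[$requestId]);
    }

    public function activeCount(): int
    {
        return count($this->active);
    }
}

/**
 * Runs $request while holding a slot. Returns null when no slot is free.
 * The slot is released even if $request throws.
 */
function withSlot(InMemorySlotTracker $tracker, string $requestId, callable $request): mixed
{
    if (!$tracker->reserveSlot($requestId)) {
        return null;
    }
    try {
        return $request();
    } finally {
        $tracker->releaseSlot($requestId);
    }
}
```

A single-process array cannot coordinate several queue workers; a shared implementation needs atomic operations in the cache so that two workers cannot claim the same last slot.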

### Key Benefits
- **✅ Prevents Thundering Herd**: Variable delays prevent multiple jobs returning simultaneously
- **✅ Real-Time Monitoring**: Track concurrency utilization and estimated wait times
- **✅ Automatic Recovery**: Failed requests automatically release slots
- **✅ Configurable Limits**: Adjustable max concurrent requests per tenant (default: 5)
- **✅ No Fixed Delays**: Intelligent delays based on actual queue state

### Integration Points
- **ImportJobCoordinator**: Intelligent batch dispatching with concurrency awareness
- **ImportNetSuiteRecordsBatch**: Request-level concurrency control and slot management
- **Redis Cache**: Stores active request tracking with automatic TTL cleanup

## Dependency Management

### Dependency Structure
Based on `criteria.json` files, dependencies can be multi-level:

```
Customer Import
└── Location (no dependencies)
    └── Subsidiary (depends on Location)
        └── Customer (depends on Subsidiary)
```

### Execution Order
1. **Level 0**: Record types with no dependencies (e.g., Location)
2. **Level 1**: Record types depending on Level 0 (e.g., Subsidiary)
3. **Level N**: Record types depending on previous levels (e.g., Customer)

### Dependency Resolution Process
1. Parse all `criteria.json` files for dependency information
2. Build a directed graph of dependencies
3. Detect circular dependencies and reject them, so that the graph is a directed acyclic graph (DAG)
4. Perform topological sort to determine execution order
5. Execute dependencies level by level
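
Steps 2 to 5 can be sketched as a level-by-level topological sort (a variant of Kahn's algorithm). The function name `resolveExecutionLevels` and the input shape (record type mapped to the types it depends on) are assumptions; the real `criteria.json` structure may differ and would need to be parsed into this form first.

```php
<?php

/**
 * Groups record types into execution levels.
 *
 * @param array<string, string[]> $dependsOn record type => types it depends on
 * @return string[][] levels in execution order, level 0 first
 * @throws RuntimeException when a circular dependency is found
 */
function resolveExecutionLevels(array $dependsOn): array
{
    // Make sure every type that is only mentioned as a dependency has an entry.
    foreach ($dependsOn as $deps) {
        foreach ($deps as $dep) {
            $dependsOn[$dep] ??= [];
        }
    }

    $levels = [];
    $resolved = [];

    while (count($resolved) < count($dependsOn)) {
        $level = [];
        foreach ($dependsOn as $type => $deps) {
            if (isset($resolved[$type])) {
                continue;
            }
            // A type is ready once all of its dependencies sit in earlier levels.
            if (array_diff($deps, array_keys($resolved)) === []) {
                $level[] = $type;
            }
        }

        // No progress means the remaining types depend on each other.
        if ($level === []) {
            $stuck = array_diff(array_keys($dependsOn), array_keys($resolved));
            throw new RuntimeException('Circular dependency among: ' . implode(', ', $stuck));
        }

        foreach ($level as $type) {
            $resolved[$type] = true;
        }
        $levels[] = $level;
    }

    return $levels;
}

$levels = resolveExecutionLevels([
    'Customer'   => ['Subsidiary'],
    'Subsidiary' => ['Location'],
    'Location'   => [],
]);
// $levels is [['Location'], ['Subsidiary'], ['Customer']]
```

Types within the same level do not depend on each other, which is what makes level-by-level execution safe.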

## Database Schema

### New Tables Required

#### batch_jobs
```sql
CREATE TABLE batch_jobs (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    master_job_id VARCHAR(255) NOT NULL,
    record_type_id INT NOT NULL,
    batch_number INT NOT NULL,
    offset INT NOT NULL,
    limit_size INT NOT NULL,
    status ENUM('pending', 'running', 'completed', 'failed', 'cancelled') DEFAULT 'pending',
    progress INT DEFAULT 0,
    started_at TIMESTAMP NULL,
    completed_at TIMESTAMP NULL,
    error_message TEXT NULL,
    retry_count INT DEFAULT 0,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    INDEX idx_master_job (master_job_id),
    INDEX idx_record_type (record_type_id),
    INDEX idx_status (status)
);
```

#### job_dependencies
```sql
CREATE TABLE job_dependencies (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    master_job_id VARCHAR(255) NOT NULL,
    record_type_id INT NOT NULL,
    depends_on_record_type_id INT NULL,
    status ENUM('pending', 'running', 'completed', 'failed') DEFAULT 'pending',
    total_batches INT DEFAULT 0,
    completed_batches INT DEFAULT 0,
    started_at TIMESTAMP NULL,
    completed_at TIMESTAMP NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
    INDEX idx_master_job (master_job_id),
    INDEX idx_record_type (record_type_id),
    INDEX idx_depends_on (depends_on_record_type_id)
);
```

#### job_events
```sql
CREATE TABLE job_events (
    id BIGINT PRIMARY KEY AUTO_INCREMENT,
    master_job_id VARCHAR(255) NOT NULL,
    batch_job_id BIGINT NULL,
    event_type VARCHAR(100) NOT NULL,
    event_data JSON NULL,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
    INDEX idx_master_job (master_job_id),
    INDEX idx_batch_job (batch_job_id),
    INDEX idx_event_type (event_type)
);
```

## Event System

### Event Classes (To Be Created)
- `ImportJobStarted` - Job begins processing
- `ImportJobCompleted` - Job finishes successfully
- `ImportJobFailed` - Job fails with error
- `JobCancelled` - Job is cancelled by user
- `BatchJobStarted` - Individual batch begins
- `BatchJobCompleted` - Individual batch finishes
- `BatchJobFailed` - Individual batch fails

### Event Usage
- **UI Updates**: Real-time progress updates
- **Dependency Triggers**: Start next dependency when current completes
- **Error Handling**: Notify systems of failures
- **Monitoring**: Track job performance and health

## Implementation Phases

### Phase 1: Foundation & Infrastructure (2-3 weeks) ✅
- [x] Create core service classes
- [x] Database migrations for new tables
- [x] Event system implementation
- [x] Basic job coordination logic

### Phase 2: Core Logic Migration (3-4 weeks) ✅
- [x] Dependency resolution system
- [x] Batch job implementation
- [x] Progress tracking system
- [x] Job cancellation logic

### Phase 3: Integration & Testing (2-3 weeks) ✅
- [x] UI integration for status/cancellation
- [x] Error handling and retry logic
- [x] Performance optimization
- [x] Monitoring and alerting

### Phase 4: Enhanced Error Handling & Concurrency Control (2-3 weeks) ✅
- [x] JobErrorMonitoringService implementation
- [x] Enhanced error categorization and tracking
- [x] NetSuiteConcurrencyManager implementation
- [x] Intelligent batch dispatching with concurrency awareness
- [x] Request-level concurrency control in batch jobs
- [x] Real-time concurrency monitoring and management
- [x] Automatic recovery and slot management
- [x] Prevention of "thundering herd" problems

### Phase 5: Deployment & Migration (1-2 weeks) ✅ **COMPLETED**
- [x] Gradual rollout strategy implementation
- [x] Feature flag system setup
- [x] Backward compatibility layer
- [x] UI component updates
- [x] API endpoint migration
- [x] Production monitoring setup
- [x] Documentation and training
- [x] Test suite completion
- [x] Event system verification
- [x] Concurrency control system validation
- [x] **Critical batch completion tracking fixes**
- [x] **Job attempt counter corruption fixes**
- [x] **Internal retry logic for concurrency issues**

### Current Status
- Core functionality complete and tested
- Feature flag system implemented
- UI components updated with backward compatibility
- API endpoints migrated with legacy support
- Dependency resolution system enhanced
- Progress tracking improved
- Error handling enhanced with centralized monitoring
- Cancellation support added
- **Intelligent concurrency control system fully implemented and tested**
- **NetSuite API overwhelming issues resolved**
- **"Thundering herd" problem eliminated**
- **Critical batch completion tracking bugs fixed**
- **Job attempt counter corruption resolved**
- **Internal retry logic implemented for concurrency issues**

### Remaining Tasks
1. ✅ **COMPLETED**: Set up production monitoring dashboards
2. ✅ **COMPLETED**: Configure alerting thresholds
3. ✅ **COMPLETED**: Complete test tenant selection
4. ✅ **COMPLETED**: Finalize rollback procedures
5. ✅ **COMPLETED**: Begin gradual tenant rollout
6. ✅ **COMPLETED**: Monitor concurrency control system performance in production
7. 🚀 **NEXT**: Test complex invoice import to verify all fixes are working
8. 🚀 **NEXT**: Monitor performance and optimize based on real-world usage
9. 🚀 **NEXT**: Implement advanced performance monitoring dashboard

## Key Implementation Details

### Batch Size
- **Default**: 1000 records per batch (NetSuite API limit)
- **Configurable**: Can be adjusted based on performance testing
- **Intelligent Dispatching**: Uses concurrency manager to calculate optimal delays
- **Dynamic Spacing**: Prevents "thundering herd" with variable offsets
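
As an illustration of how a record count maps onto batches, the sketch below produces one entry per batch using the `batch_number`, `offset`, and `limit_size` columns of the `batch_jobs` table. The helper name `planBatches` and the choice to number batches from 1 are assumptions.

```php
<?php

/**
 * Hypothetical helper: splits a record count into offset/limit pairs.
 *
 * @return array<int, array{batch_number: int, offset: int, limit_size: int}>
 */
function planBatches(int $totalRecords, int $batchSize = 1000): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('Batch size must be at least 1.');
    }

    $batches = [];
    for ($offset = 0, $number = 1; $offset < $totalRecords; $offset += $batchSize, $number++) {
        $batches[] = [
            'batch_number' => $number,
            'offset'       => $offset,
            // The final batch may be smaller than the batch size.
            'limit_size'   => min($batchSize, $totalRecords - $offset),
        ];
    }

    return $batches;
}

$batches = planBatches(2500);
// Three batches: offsets 0, 1000, 2000 with sizes 1000, 1000, 500.
```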

### Timeout Strategy
- **Master Job**: 1 hour timeout for coordination
- **Batch Jobs**: 5 minutes timeout per batch
- **Retry Logic**: 3 attempts with exponential backoff
- **Graceful Shutdown**: Allow current batches to complete
- **Concurrency Control**: Automatic slot management with intelligent delays
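
The retry schedule can be made concrete with a short sketch. The 10-second base and 300-second cap are assumptions, since this document does not state the actual values; only the "3 attempts" figure comes from the list above.

```php
<?php

/**
 * Returns the wait in seconds before each retry: base, 2 x base, 4 x base, ...
 * With $maxAttempts attempts there are $maxAttempts - 1 retries.
 *
 * @return int[]
 */
function exponentialBackoff(int $maxAttempts = 3, int $baseSeconds = 10, int $capSeconds = 300): array
{
    $delays = [];
    for ($retry = 0; $retry < $maxAttempts - 1; $retry++) {
        // Double the wait on every retry, but never exceed the cap.
        $delays[] = min($capSeconds, $baseSeconds * (2 ** $retry));
    }

    return $delays;
}

print_r(exponentialBackoff()); // waits of 10s and 20s between the 3 attempts
```

In a Laravel job, an array like this can be returned from the job's `backoff()` method so that the queue applies the delays between attempts.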

### Database Connection Management

#### Architecture Benefits
The chunked architecture provides several inherent benefits for database connection management:

1. **Process Isolation**
   - Each batch job runs in its own process
   - Separate connection pools per batch
   - Natural cleanup when batch completes
   - No connection leaks between batches

2. **Staggered Execution**
   - Batch starts are spaced out by calculated delays rather than dispatched all at once
   - Prevents connection pool exhaustion
   - Distributes database load
   - Allows connection reuse

3. **Smaller Transaction Scope**
   - Maximum 1,000 records per batch
   - Shorter transaction lifetimes
   - Reduced connection strain
   - Better error recovery

4. **Natural Boundaries**
   - Clear start/end points for connections
   - Automatic cleanup between batches
   - No long-running transactions
   - Better resource management

#### Implementation Details

1. **Connection Health Checks**
   ```php
   // Check connection every 200 records
   if ($this->processedCount > 0 && $this->processedCount % 200 === 0) {
       $this->checkConnectionHealth();
   }
   ```

2. **Health Check Logic**
   ```php
   protected function checkConnectionHealth()
   {
       try {
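           // getPdo() alone does not contact the server, so run a trivial query to detect a stale connection
           DB::connection('tenant_connection')->select('SELECT 1');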
       } catch (\Exception $e) {
           Log::warning('Database connection unhealthy, reconnecting', [
               'job_id' => $this->jobId,
               'batch_number' => $this->batchNumber,
               'error' => $e->getMessage()
           ]);
           $this->reconnectDatabase();
       }
   }
   ```

3. **Reconnection Strategy**
   ```php
   protected function reconnectDatabase()
   {
       DB::disconnect('tenant_connection');
       DB::reconnect('tenant_connection');
       usleep(50000); // 50ms delay after reconnection
   }
   ```

4. **Tenant Connection Setup**
   ```php
   protected function setupTenantConnection()
   {
       config()->set('database.connections.tenant_connection.database', $this->tenantDatabase);
       app('db')->setDefaultConnection('tenant_connection');
       DB::purge('tenant_connection');
       DB::reconnect('tenant_connection');
   }
   ```

#### Monitoring & Metrics

1. **Connection Pool Stats**
   - Active connections per batch
   - Connection lifetime
   - Reconnection frequency
   - Pool utilization

2. **Health Indicators**
   - Connection errors
   - Reconnection attempts
   - Query timeouts
   - Transaction duration

3. **Performance Metrics**
   - Records per connection
   - Connection reuse rate
   - Pool exhaustion events
   - Recovery success rate

#### Best Practices

1. **Resource Management**
   - Use connection pooling
   - Implement health checks
   - Monitor pool utilization
   - Clean up stale connections

2. **Error Handling**
   - Detect connection issues early
   - Implement graceful recovery
   - Log connection events
   - Track error patterns

3. **Configuration**
   - Set appropriate pool sizes
   - Configure timeout values
   - Adjust retry intervals
   - Tune health check frequency

4. **Optimization**
   - Reuse connections when possible
   - Minimize transaction scope
   - Batch database operations
   - Use prepared statements

### NetSuite Concurrency Management

#### Architecture Benefits
The intelligent concurrency control system provides several key benefits:

1. **Prevents API Overwhelming**
   - Limits concurrent requests to NetSuite's capacity
   - Automatic slot reservation and release
   - Real-time monitoring of API utilization
   - Configurable limits per tenant

2. **Eliminates Thundering Herd**
   - Variable delays based on queue position
   - Dynamic spacing between batch dispatches
   - Jitter to prevent synchronization
   - Utilization-based delay adjustments

3. **Automatic Recovery**
   - Failed requests automatically release slots
   - Expired requests cleaned up automatically
   - Jobs can release themselves back to queue
   - No manual intervention required

4. **Real-Time Visibility**
   - Live concurrency status monitoring
   - Estimated wait time calculations
   - Utilization percentage tracking
   - Tenant-specific isolation

5. **Internal Retry Logic** ⭐ **NEW**
   - Jobs retry within same execution for concurrency issues
   - No attempt counter increments for temporary resource constraints
   - Prevents false failures from Laravel queue system
   - Only release to queue after exhausting internal retries

#### Implementation Details

1. **Slot Reservation System**
   ```php
   // Each request reserves a slot before making API calls
   if (!$this->concurrencyManager->reserveSlot($requestId)) {
       // Wait for slot within same job execution instead of releasing to queue
       if (!$this->concurrencyManager->waitForSlot($requestId)) {
           throw new \Exception('Failed to reserve NetSuite concurrency slot after waiting');
       }
   }
   ```

2. **Internal Retry Logic** ⭐ **NEW**
   ```php
   // Jobs retry up to 3 times within the same execution; isConnectionError() decides what is retryable
   $maxRetries = 3;
   $retryCount = 0;

   while (true) {
       try {
           // Attempt to make NetSuite request
           return $this->makeNetSuiteRequest($url, $requestBody, $method);
       } catch (\Exception $e) {
           if (!$this->isConnectionError($e->getMessage())) {
               throw $e; // Re-throw errors that are not retryable
           }
           if ($retryCount >= $maxRetries) {
               // Only release to queue after exhausting internal retries
               $this->release(30);
               return null;
           }
           // Wait before retrying within same execution
           $retryCount++;
           sleep(5 * $retryCount); // Progressive delay: 5s, 10s, 15s
       }
   }
   ```

3. **Intelligent Delay Calculation**
   ```php
   // Delays calculated based on multiple factors
   $baseDelay = min(30, $queuePosition * 5); // 5s per position, max 30s
   $jitter = rand(2, 8); // Random 2-8s to prevent synchronization

   // Add utilization bonus if heavily loaded
   if ($utilization > 0.8) {
       $baseDelay += 15; // Add 15s if >80% utilized
   }

   $delay = $baseDelay + $jitter; // Total delay in seconds
   ```

4. **Automatic Cleanup**
   ```php
   // Expired requests automatically removed
   foreach ($activeRequests as $requestId => $request) {
       if ($request['expires_at'] < $now) {
           unset($activeRequests[$requestId]);
           $cleaned = true;
       }
   }
   ```

5. **Real-Time Monitoring**
   ```bash
   # View current concurrency status
   php artisan netsuite:concurrency sx_db_1751480457

   # Output includes:
   # - Max Concurrent Requests: 5
   # - Total Active Requests: 3
   # - Available Slots: 2
   # - Utilization: 60%
   # - Estimated Wait Time: 15 seconds
   ```

#### Configuration Options

1. **Concurrency Limits**
   ```php
   // Adjustable per tenant
   $this->concurrencyManager = new NetSuiteConcurrencyManager(
       $this->tenantDatabase,
       5 // Default max concurrent requests
   );
   ```

2. **Request Timeout**
   ```php
   private const REQUEST_TIMEOUT = 300; // 5 minutes max per request
   private const CLEANUP_INTERVAL = 60; // Cleanup every 60 seconds
   ```

3. **Delay Parameters**
   ```php
   // Base delay: 5s per queue position, max 30s
   // Jitter: 2-8s random
   // Utilization bonus: +15s if >80% utilized
   ```

4. **Internal Retry Settings** ⭐ **NEW**
   ```php
   // Maximum retries within same job execution
   private const MAX_INTERNAL_RETRIES = 3;

   // Progressive delays for internal retries (seconds)
   private const INTERNAL_RETRY_DELAYS = [5, 10, 15];

   // Only release to queue after exhausting internal retries
   private const RELEASE_TO_QUEUE_DELAY = 30; // seconds
   ```

#### Monitoring & Metrics

1. **Concurrency Statistics**
   - Active request count
   - Reserved vs. active slots
   - Utilization percentage
   - Available slots

2. **Performance Indicators**
   - Average wait time
   - Slot reservation success rate
   - Request completion time
   - Error recovery rate

3. **Health Checks**
   - Slot cleanup frequency
   - Expired request count
   - Cache hit/miss rates
   - Tenant isolation status

4. **Internal Retry Metrics** ⭐ **NEW**
   - Internal retry success rate
   - Average internal retry attempts
   - Queue release frequency
   - Attempt counter preservation rate

#### Best Practices

1. **Slot Management**
   - Always reserve slots before making requests
   - Release slots immediately after completion
   - Handle failures gracefully with slot release
   - Monitor slot utilization patterns

2. **Delay Strategy**
   - Use intelligent delays, not fixed values
   - Add jitter to prevent synchronization
   - Consider queue position and utilization
   - Adjust based on performance data

3. **Internal Retry Strategy** ⭐ **NEW**
   - Retry concurrency issues within same job execution
   - Use progressive delays (5s, 10s, 15s) for internal retries
   - Only release to queue after exhausting internal retries
   - Preserve attempt counters for temporary resource issues

4. **Monitoring**
   - Track concurrency utilization in real-time
   - Monitor wait times and queue positions
   - Alert on high utilization (>80%)
   - Regular cleanup of expired requests
   - Monitor internal retry success rates

5. **Configuration**
   - Set appropriate concurrency limits per tenant
   - Adjust request timeouts based on API performance
   - Tune cleanup intervals for optimal performance
   - Monitor and adjust delay parameters
   - Configure internal retry limits and delays

### Progress Tracking
- **Overall Progress**: Weighted average of all record types
- **Batch Progress**: Individual batch completion percentage
- **Real-time Updates**: Cache-based progress storage
- **UI Integration**: Live progress bars and status updates
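
One way to compute the weighted average is to weight each record type by its number of batches, which is the same as dividing all completed batches by all planned batches. This is a sketch under that assumption; the function name `overallProgress` is hypothetical, and the field names follow the `job_dependencies` table.

```php
<?php

/**
 * @param array<string, array{total_batches: int, completed_batches: int}> $recordTypes
 * @return float overall progress as a percentage (0-100)
 */
function overallProgress(array $recordTypes): float
{
    $total = array_sum(array_column($recordTypes, 'total_batches'));
    if ($total === 0) {
        return 0.0; // nothing planned yet; avoid division by zero
    }
    $completed = array_sum(array_column($recordTypes, 'completed_batches'));

    return round(100 * $completed / $total, 1);
}

echo overallProgress([
    'Location'   => ['total_batches' => 1,  'completed_batches' => 1],
    'Subsidiary' => ['total_batches' => 1,  'completed_batches' => 1],
    'Customer'   => ['total_batches' => 18, 'completed_batches' => 8],
]), PHP_EOL; // 10 of 20 batches are done, so this prints 50
```

An unweighted mean of the three per-type percentages would report about 81% here, even though more than half of the Customer batches are still outstanding.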

## Migration Strategy

### UI Integration
1. **Feature Flag System**
   ```php
   // config/features.php
   return [
       'chunked_import' => env('FEATURE_CHUNKED_IMPORT', false),
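       // Drop empty entries so that an unset variable yields [] rather than ['']
       'chunked_import_tenants' => array_values(array_filter(
           array_map('trim', explode(',', env('FEATURE_CHUNKED_IMPORT_TENANTS', ''))),
           'strlen'
       )),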
   ];
   ```

2. **Progressive Enhancement**
   - Livewire Component Changes:
     ```php
     public function useChunkedImport()
     {
         return config('features.chunked_import') &&
                in_array(auth()->user()->tenant_id, config('features.chunked_import_tenants'));
     }
     ```
   - Template Conditionals:
     ```blade
     @if($this->useChunkedImport())
         {{-- New chunked import UI --}}
     @else
         {{-- Legacy import UI --}}
     @endif
     ```

3. **Alpine.js State Management**
   ```javascript
   Alpine.data('importJobHandler', () => ({
       // Shared state (both systems)
       isComplete: false,
       isRunning: false,
       isFailed: false,

       // Chunked system state
       dependencies: [],
       batchProgress: {},
       dependencyProgress: {},

       // Feature detection
       useChunkedSystem: @json($this->useChunkedImport()),

       // Methods with system-specific logic
       startImport() {
           const data = this.useChunkedSystem
               ? this.buildChunkedRequest()
               : this.buildLegacyRequest();
           // ...
       }
   }))
   ```

4. **API Endpoints**
   - New Routes:
     ```php
     Route::post('import/chunked', [ImportController::class, 'startChunkedImport']);
     Route::get('import/chunked/status', [ImportController::class, 'getChunkedStatus']);
     Route::post('import/chunked/cancel', [ImportController::class, 'cancelChunkedImport']);
     ```
   - Legacy Routes (preserved):
     ```php
     Route::post('import', [ImportController::class, 'startImport']);
     Route::get('import/status', [ImportController::class, 'getStatus']);
     ```

5. **Progress Tracking**
   - Enhanced Progress Display:
     ```blade
     <div class="progress-container">
         @if($this->useChunkedImport())
             {{-- Dependency-aware progress bars --}}
             @foreach($dependencies as $dep)
                 <div class="dependency-progress">
                     <span>{{ $dep->name }}</span>
                     <div class="progress-bar" style="width: {{ $dep->progress }}%"></div>
                 </div>
             @endforeach
         @else
             {{-- Legacy progress bar --}}
             <div class="progress-bar" style="width: {{ $progress }}%"></div>
         @endif
     </div>
     ```

6. **Error Handling**
   - Enhanced Error Display:
     ```blade
     <div class="error-container">
         @if($this->useChunkedImport())
             {{-- Batch-level errors --}}
             @foreach($batchErrors as $error)
                 <div class="batch-error">
                     <span>{{ $error->recordType }}</span>
                     <span>{{ $error->message }}</span>
                 </div>
             @endforeach
         @else
             {{-- Legacy error display --}}
             <div class="error">{{ $errorMessage }}</div>
         @endif
     </div>
     ```

### Gradual Rollout
1. **Feature Flags**: Enable new system for specific tenants
2. **A/B Testing**: Compare old vs new system performance
3. **Rollback Plan**: Ability to revert to old system if needed
4. **Monitoring**: Comprehensive metrics during transition

### Data Migration
- **Existing Jobs**: Continue with old system until completion
- **New Jobs**: Use new chunked system
- **Progress Preservation**: Maintain existing progress tracking
- **Status Migration**: Convert old status to new format

## Testing Strategy

### Unit Tests
- Dependency resolution logic
- Progress calculation algorithms
- Cancellation handling
- Event emission

### Integration Tests
- End-to-end import workflows
- Database connection management
- NetSuite API integration
- Error handling scenarios

### Performance Tests
- Large dataset processing
- Concurrent job execution
- Memory usage monitoring
- Database connection stability

### Load Tests
- Multiple concurrent imports
- High-volume record processing
- System resource utilization
- Failure recovery scenarios

## Monitoring & Alerting

### Key Metrics
- **Job Success Rate**: Percentage of successful imports
- **Batch Failure Rate**: Individual batch failure frequency
- **Processing Time**: Average time per batch and overall job
- **Memory Usage**: Peak memory consumption per batch
- **Database Connections**: Connection pool utilization

### Alerts
- **Job Failures**: Immediate notification of failed jobs
- **Batch Failures**: Alerts for repeated batch failures
- **Performance Degradation**: Slow processing times
- **Resource Exhaustion**: High memory or connection usage

## Future Enhancements

### Potential Improvements
- **Dynamic Batch Sizing**: Adjust batch size based on performance
- **Parallel Processing**: Process multiple record types concurrently
- **Incremental Imports**: Only import changed records
- **Resume Capability**: Resume failed imports from last successful batch
- **Advanced Scheduling**: Intelligent job scheduling based on system load

### Scalability Considerations
- **Horizontal Scaling**: Distribute jobs across multiple workers
- **Database Sharding**: Partition data across multiple databases
- **Caching Strategy**: Implement Redis caching for frequently accessed data
- **Queue Optimization**: Optimize job queue management

## Troubleshooting Guide

### Common Issues

#### Job Hangs During Processing
- **Symptoms**: Job stops processing without error logs
- **Causes**: Database connection issues, memory exhaustion
- **Solutions**: Implement connection health checks, add memory monitoring

#### Dependency Resolution Failures
- **Symptoms**: Circular dependency errors
- **Causes**: Incorrect criteria.json configuration
- **Solutions**: Validate dependency graphs, add cycle detection

#### Batch Job Failures
- **Symptoms**: Individual batches fail repeatedly
- **Causes**: NetSuite API issues, data validation errors
- **Solutions**: Implement retry logic, add detailed error logging

#### Progress Tracking Issues
- **Symptoms**: UI shows incorrect progress
- **Causes**: Cache inconsistencies, calculation errors
- **Solutions**: Validate progress calculations, implement cache cleanup

#### NetSuite Concurrency Limit Exceeded ⭐ **RESOLVED**
- **Symptoms**: `CONCURRENCY_LIMIT_EXCEEDED` errors, jobs hitting max attempts
- **Causes**: System overwhelming NetSuite's concurrent request limits
- **Solutions**:
  - Use the intelligent concurrency control system
  - Monitor concurrency status: `php artisan netsuite:concurrency {tenant_id}`
  - Reset concurrency tracking if needed: `php artisan netsuite:concurrency {tenant_id} --reset`
  - Check for stuck requests and clean up expired slots
- **Status**: ✅ **RESOLVED** - Internal retry logic prevents attempt counter corruption

#### Thundering Herd Problem ⭐ **RESOLVED**
- **Symptoms**: Multiple delayed jobs return simultaneously, overwhelming the system
- **Causes**: Fixed delays causing job synchronization
- **Solutions**:
  - Implement intelligent delays based on queue position
  - Add jitter to prevent synchronization
  - Use concurrency manager for dynamic delay calculation
  - Monitor queue positions and utilization
- **Status**: ✅ **RESOLVED** - Intelligent delays prevent job synchronization

#### Concurrency Slot Management Issues ⭐ **RESOLVED**
- **Symptoms**: Jobs stuck waiting for concurrency slots, high utilization
- **Causes**: Slots not being released, expired requests not cleaned up
- **Solutions**:
  - Check concurrency manager status and statistics
  - Verify slot cleanup is working (every 60 seconds)
  - Monitor for failed requests that didn't release slots
  - Reset concurrency tracking if system gets stuck
- **Status**: ✅ **RESOLVED** - Automatic cleanup and slot management working

#### Job Attempt Counter Corruption ⭐ **NEW - RESOLVED**
- **Symptoms**: Jobs failing with `MaxAttemptsExceededException` at attempt:1 despite multiple retries
- **Causes**: Jobs being released to queue for concurrency issues, incrementing attempt counters
- **Root Cause**: Laravel treating temporary resource issues as permanent failures
- **Solutions**:
  - Jobs now retry within same execution for concurrency issues
  - Internal retry logic prevents attempt counter increments
  - Only release to queue after exhausting internal retries (3 attempts)
  - Progressive delays (5s, 10s, 15s) for internal retries
- **Status**: ✅ **RESOLVED** - Internal retry logic eliminates attempt counter corruption

#### Batch Completion Tracking Issues ⭐ **NEW - RESOLVED**
- **Symptoms**: `total_batches` field stored as 0, imports never completing
- **Causes**: BatchJobCompletedListener incorrectly calculating total_batches
- **Root Cause**: Using completed batch count instead of original total
- **Solutions**:
  - Fixed BatchJobCompletedListener to preserve original total_batches
  - Retrieve total_batches from SyncStatus record itself
  - Ensure proper batch completion tracking
- **Status**: ✅ **RESOLVED** - Batch completion tracking now works correctly
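
The shape of the fix can be sketched as follows. This is not the actual `BatchJobCompletedListener`: the function `onBatchCompleted` is hypothetical and a plain array stands in for the tracking record. The key point is that `total_batches` is only read, never recalculated from the completed count.

```php
<?php

/**
 * Hypothetical handler for a batch-completed event.
 * total_batches is fixed when the batches are planned and is only read here.
 *
 * @param array{total_batches: int, completed_batches: int, status: string} $status
 * @return array{total_batches: int, completed_batches: int, status: string}
 */
function onBatchCompleted(array $status): array
{
    // Count the finished batch, never exceeding the planned total.
    $status['completed_batches'] = min(
        $status['completed_batches'] + 1,
        $status['total_batches']
    );

    // The import is complete only when every planned batch has finished.
    if ($status['total_batches'] > 0 && $status['completed_batches'] === $status['total_batches']) {
        $status['status'] = 'completed';
    }

    return $status;
}

$status = ['total_batches' => 3, 'completed_batches' => 0, 'status' => 'running'];
for ($i = 0; $i < 3; $i++) {
    $status = onBatchCompleted($status);
}
// $status['status'] is now 'completed' and total_batches is still 3
```

With several workers finishing batches at the same time, the increment would need to be an atomic database or cache operation instead of a read-modify-write.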

## Maintenance

### Regular Tasks
- **Cache Cleanup**: Remove expired progress and cancellation data
- **Database Maintenance**: Clean up old job records
- **Performance Monitoring**: Track system performance metrics
- **Error Analysis**: Review and address recurring issues

### Backup Strategy
- **Job State Backup**: Regular backups of job progress data
- **Configuration Backup**: Backup criteria.json files
- **Database Backup**: Regular database backups including job tables
- **Recovery Procedures**: Document recovery processes

## Contact & Support

### Development Team
- **Primary Contact**: [Development Team Lead]
- **Escalation Path**: [System Administrator]
- **Documentation**: This file and related technical docs

### Support Resources
- **Issue Tracking**: [Project Management System]
- **Code Repository**: [Git Repository URL]
- **Monitoring Dashboard**: [Monitoring System URL]
- **Documentation**: [Documentation System URL]

---

**Last Updated**: 2025-01-14
**Version**: 2.0
**Status**: Enhanced with Intelligent Concurrency Control - Ready for Production Deployment
