# NetSuite Transaction Lines – Investigation & Current Status

Date: September 17, 2025
Status: Stable under current test settings; phased rollout recommended

---

## Executive Summary
- The recurring timeouts/empty responses on transaction lines were caused by heavy NetSuite query cost and inconsistent filters on SELECT queries.
- Stability was achieved by simplifying queries, enforcing a strict date filter, selecting only necessary columns (`SELECT id`), switching to POST with JSON body, and forcing conservative HTTP transport (HTTP/1.1, IPv4, Expect off, Connection: close).
- Latest runs return data consistently (~8.1s per 100-row page) with no empty responses or cURL 28 errors.

## Business Synopsis (for C-Suite)
- We tuned our NetSuite data pulls to be simpler and more predictable. This reduced timeouts and made imports reliable again.
- We’ll reintroduce fields and relax safeguards gradually, balancing speed with reliability while keeping user impact low.

---

## Root Cause Discovery (Validated)

### ❌ Initial Hypotheses (Eliminated)

#### 1. **PHP Process Corruption**
- **Theory**: PHP process corruption causing small response failures
- **Tests Tried**:
  - Process ID and memory tracking
  - Signal protection (SIGTERM, SIGINT, SIGHUP)
  - Garbage collection clearing
  - Output buffering controls
  - Opcache status monitoring
- **Result**: No correlation found, eliminated as cause

#### 2. **cURL Configuration Issues (alone)**
- **Theory**: cURL settings causing timeout or corruption
- **Tests Tried**:
  - Fresh cURL handles for every request
  - Conservative buffer sizes (8KB → 16KB)
  - HTTP/1.1 forced (vs HTTP/2)
  - Progress tracking for small responses
  - Connection reuse settings
- **Result**: No improvement, eliminated as cause

#### 3. **Response Size/Bandwidth Issues**
- **Theory**: Large response causing memory/buffer problems
- **Tests Tried**:
  - Streaming responses with incremental JSON parsing
  - Generator-based record processing
  - Memory limit increases (512MB)
  - Reduced batch sizes (1000 → 500 records)
- **Result**: No correlation - COUNT response is only 200 bytes vs 2MB+ data responses that work fine

#### 4. **PHP Socket Timeout Override Issues**
- **Theory**: PHP `default_socket_timeout` overriding cURL timeout
- **Tests Tried**:
  - `ini_set('default_socket_timeout', 120)` before requests
  - Increased cURL timeout to 90 → 120 seconds
- **Result**: ✅ **PARTIAL SUCCESS** - Fixed 60-second timeouts but 90-second issues remained

### ✅ Actual Root Cause: NetSuite Query Complexity & Missing Filters

**The real issue**: Complex `COUNT(*)` queries with INNER JOINs on massive datasets hitting NetSuite's **server-side resource limits**.

#### Failed Query (Complex):
```sql
SELECT COUNT(*) as total
FROM transactionline
INNER JOIN transaction ON transactionline.transaction = transaction.ID
WHERE transactionline.mainline = 'F'
AND linecreateddate >= TO_DATE('2025-01-01', 'YYYY-MM-DD')
```

#### Working Query (Simplified):
```sql
SELECT COUNT(*) as total
FROM transactionline
WHERE transactionline.mainline = 'F'
AND linecreateddate >= TO_DATE('2025-01-01', 'YYYY-MM-DD')
```

---

## Solution Applied (Earlier Phase)

### Code Changes Made

#### 1. **Removed INNER JOIN from COUNT Queries**
**File**: `src/App/Jobs/ImportJobs/ImportJobCoordinator.php`
```php
// BEFORE: Complex query with JOIN
$query = "SELECT COUNT(*) as total FROM {$criteria}";
if (!empty($mappingCriteria['where'])) {
    $query .= " " . $mappingCriteria['where'];
}
if (!empty($mappingCriteria['inner'])) {
    $query .= " " . $mappingCriteria['inner'];  // ❌ EXPENSIVE JOIN
}

// AFTER: Simplified query without JOIN
$query = "SELECT COUNT(*) as total FROM {$criteria}";
if (!empty($mappingCriteria['where'])) {
    $query .= " " . $mappingCriteria['where'];
}
// ✅ Removed INNER JOIN for COUNT operations
```

#### 2. **Increased Timeout Buffer**
**File**: `src/Domain/OAuths/Netsuite/Models/NetSuiteOAuth.php`
```php
// cURL timeout: 90s → 120s
curl_setopt($this->curlHandle, CURLOPT_TIMEOUT, 120);

// PHP socket timeout: 120s → 150s
ini_set('default_socket_timeout', 150);
```

#### 3. **Retained Critical Fixes**
- ✅ PHP socket timeout override (prevents 60s PHP timeout)
- ✅ cURL error detection for "0 bytes received"
- ✅ Streaming output buffering protection
- ✅ Persistent cURL handle with proper cleanup

---

## Performance Results (Earlier Phase)

### Before Fix
- ❌ **COUNT Query**: Timeout after 90 seconds, "0 bytes received"
- ❌ **Total Import**: Failed to start (no record count available)

### After Fix
- ✅ **COUNT Query**: 87 seconds, successful response
- ✅ **Record Count**: 4,812,213 transaction lines
- ✅ **Total Import**: Proceeding normally with 500-record batches

### Log Evidence
```json
{
  "record_type_name": "transactionline",
  "total_records": 4812213,
  "query": "SELECT COUNT(*) as total FROM transactionline WHERE transactionline.mainline = 'F' AND linecreateddate >= TO_DATE('2025-01-01', 'YYYY-MM-DD')",
  "attempt": 1,
  "status": "success"
}
```

---

## Key Insights (General)

### 1. **NetSuite Resource Limits**
NetSuite has **different resource limits** for different query types:
- **Simple queries** (even returning large data): High limits
- **Complex aggregation with JOINs**: Strict limits
- **Query complexity** matters more than response size

### 2. **Response Size Paradox**
- **COUNT response**: ~200 bytes ❌ (was failing)
- **500 record batch**: ~2MB ✅ (always worked)
- **Proof**: Issue was server-side query processing, not client-side response handling

### 3. **Timeout Layer Hierarchy**
```
NetSuite Server Processing: ~87 seconds (query execution time)
↓
cURL Timeout: 120 seconds (application timeout)
↓
PHP Socket Timeout: 150 seconds (infrastructure timeout)
↓
Horizon Job Timeout: 900 seconds (queue timeout)
```

### 4. **Terminal/Postman vs PHP Difference**
- **Same query structure**: Both used identical SQL
- **Different JOIN requirements**: COUNT queries don't need JOIN data
- **NetSuite optimization**: May handle external tools differently than API clients

### 5. **ORDER BY Clause Impact**
- Transaction line queries without an `ORDER BY` clause take ~24s to execute, whereas including the `ORDER BY` clause takes ~2.5s.
- For `LIMIT 1000` with `INNER JOIN` and `SELECT *`:
  - With `ORDER BY`: ~3.5s
  - Without `ORDER BY`: ~28s

- Additional measurements (`SELECT transactionline.*` with `INNER JOIN`, date filter applied, `LIMIT 1000`):
  - With `ORDER BY transaction.id ASC`: ~4.5s
  - Without `ORDER BY`: ~10.5s

- Date filter impact relative to `ORDER BY`:
  - When `ORDER BY` is omitted, applying the date filter materially reduces response time.
  - When `ORDER BY` is included, the date filter provides minimal additional benefit.

- Measurement context: These timings were captured in Postman, which tends to be faster than our application.
- Next step: Run these same queries from within the application to validate and compare timings.

---

## Files Modified (Earlier Phase)

### Primary Changes
1. **`src/App/Jobs/ImportJobs/ImportJobCoordinator.php`**
   - Removed INNER JOIN from COUNT query construction
   - Simplified query for performance

2. **`src/Domain/OAuths/Netsuite/Models/NetSuiteOAuth.php`**
   - Increased cURL timeout: 90s → 120s
   - Increased PHP socket timeout: 120s → 150s
   - Retained streaming output buffering protection

### Supporting Configuration
- **Batch size**: Reduced from 1000 → 500 records (for general performance)
- **cURL error detection**: Enhanced logging for "0 bytes received" scenarios

---

## Testing Methodology Applied (General)

### 1. **Systematic Hypothesis Testing**
- Started with most likely causes (timeouts, corruption)
- Used process of elimination
- Each test was measurable and repeatable

### 2. **Comparative Analysis**
- Terminal/Postman vs PHP application
- Working record types vs failing transaction lines
- Response sizes (small COUNT vs large data)

### 3. **Progressive Debugging**
- Added comprehensive logging at each step
- Isolated variables (fresh handles, timeouts, query structure)
- Measured performance impact of each change

### 4. **Root Cause Validation**
- Final fix addressed only the specific issue identified
- Removed all unnecessary complexity added during debugging
- Confirmed fix with production-scale data (4.8M records)

---

## Lessons Learned (General)

### 1. **Query Optimization for Large Datasets**
- COUNT operations should be as simple as possible
- Avoid JOINs in COUNT queries when not needed for the count logic
- NetSuite has internal resource limits that may not be documented

### 2. **Timeout Strategy**
- Use layered timeouts (cURL < PHP socket < Queue)
- Provide adequate buffer based on actual query performance
- PHP socket timeout can override cURL timeout

### 3. **Debugging Complex Issues**
- Response size doesn't always correlate with complexity
- Server-side vs client-side issues require different approaches
- Comparative testing (working vs failing) is crucial

### 4. **NetSuite API Behavior**
- Different query patterns have different resource limits
- Complex aggregations may hit undocumented limits
- External tools (Postman) may be handled differently than API clients

---

## Future Considerations (General)

---

## Current Working Test Configuration (Validated)
- Date filter: `linecreateddate >= TO_DATE('2025-07-01', 'YYYY-MM-DD')`
- Query: `SELECT id FROM transactionline WHERE mainline = 'F' OFFSET {offset} ROWS FETCH NEXT {limit} ROWS ONLY`
- Batch size: 100
- Method: POST with JSON body `{ q: "..." }`
- Transport: HTTP/1.1, IPv4, `Expect:` (disabled), `Connection: close`
- Observed performance: ~8.1s per 100-row page (stable)

## What Worked (Validated Hypotheses)
- Remove JOINs from COUNT: Eliminated COUNT timeouts.
- Apply date filter on SELECT (same as COUNT): Prevented massive scans; reduced first-byte latency.
- Minimal column selection (`SELECT id`): Lowered execution cost; stabilized responses.
- POST with JSON body: Avoided long URL / proxy edge cases.
- Conservative transport (HTTP/1.1, IPv4, Expect off, Connection: close): Removed HTTP/2/keepalive quirks.

## What Didn’t Work / Dead Ends (with reasons)
- “PHP process corruption”: No evidence; behavior matched query cost and filters.
- Fresh cURL handles: No reliable improvement vs persistent handle after fixes.
- Streaming/generators: Helpful patterns but didn’t fix slow first-byte from heavy queries; added complexity.
- GET with querystring: Works but risks long URLs; POST+JSON preferred.
- Large batch sizes (500+): Increased timeout risk; 100 proved reliable.

## Earlier Findings – Revalidated
- Layered timeouts: Keep cURL < PHP socket < queue timeout to avoid socket cutoff.
- COUNT vs SELECT: COUNT can succeed while SELECT fails if SELECT lacks equivalent filters or is more complex.

## Next Steps (Phased Plan)
1) Stabilization (stay in test mode)
   - Keep: `SELECT id`, July 1 filter, POST+JSON, HTTP/1.1/IPv4/Expect off/Connection close, limit=100.
   - Run through many pages and record latency distribution; watch for errors.

2) Gradual capability increase
   - Add minimal additional columns required by upsert (e.g., 5–10). Keep date filter; keep no ORDER BY. Monitor.
   - Incrementally add remaining needed columns. Keep limit=100.
   - Relax transport overrides one-by-one (retain POST+JSON): first remove `Connection: close`, then `Expect:`, then consider IPv6/HTTP/2 only if helpful.

3) Productization
   - Replace hard-coded test path with a feature flag (config-gated) in the query builder.
   - Ensure all SELECT paths consistently apply the same date filter logic used by COUNT.
   - Improve timeout handling: treat cURL 28 as retryable with backoff; avoid classifying timeouts as successful.
   - Consider two-step fetching when dependencies required: fetch parent IDs first, then lines via `IN (...)` batches.

## Post-Test Cleanup Plan (after walking back test code)
- Remove hard-coded transactionline test path; keep an opt-in config/feature flag for minimal-query mode.
- Standardize on POST+JSON for SuiteQL (keep GET only for short queries where safe).
- Keep unified date filter application for COUNT and SELECT.
- Maintain default batch size from config (100); revisit only after sustained stability.
- Preserve improved timeout layering; roll back transport overrides gradually with monitoring.
- Properly classify cURL 28 timeouts as retryable with backoff.

---

## Appendix – Key Log Signals (for quick verification)
- “HARD-CODED TEST: NetSuite transactionline query being executed” — confirms minimal SELECT id + date + paging.
- “HARD-CODED TEST transport” — confirms HTTP/1.1, IPv4, Expect off, Connection close per request.

### 1. **Monitoring**
- Track COUNT query performance across all record types
- Alert on queries approaching timeout thresholds
- Monitor for new record types with similar complexity

### 2. **Optimization Opportunities**
- Review other COUNT queries for unnecessary JOINs
- Consider caching COUNT results for frequently accessed data
- Implement query performance profiling

### 3. **Documentation**
- Document NetSuite query complexity guidelines
- Create reference for timeout troubleshooting
- Share learnings with other teams using NetSuite APIs

---

## Status
- Proceed with Phase 2 (add columns gradually), then Phase 3 (relax transport, de-hardcode with config), with active monitoring.
