# Parent-First Transaction Import Design

## Summary

- Problem: Importing large volumes of NetSuite transaction lines by scanning the `transactionline` table directly causes timeouts and inconsistent performance.
- Finalized Solution: Adopt a parent-first strategy (default for all transaction-based record types). Import parent transactions first (ordered by `transaction.id`). Derive the parent IDs from those results and fetch only their associated transaction lines using batched `IN (...)` queries (ordered by `transactionline.transaction, transactionline.id`). Use a lightweight field-probing step to construct explicit SELECT lists for performance and reliability; reserve `SELECT *` as a last-resort fallback for discovery only.
- Rationale: Measured results show explicit fields with probe/caching provide stable, low-latency pages at larger limits (500–1000) and dramatically outperform `SELECT *` for line queries. Parent `SELECT id` is fastest for ID enumeration; when additional parent fields are needed, explicit lists outperform `SELECT *` at scale and keep payloads small.

## Goals & Non-Goals

- Goals
  - Parent-first imports are the default for all transaction-based record types (e.g., Sales Orders, Invoices, Item Fulfillments, etc.).
  - Scope line fetching to known parent IDs using `IN (...)` batching and RESTlet paging.
  - Integrate field-probe logic to build explicit field lists per tenant/record type and cache results.
  - Maintain alignment with wave coordination, batch upsert, and chunked import architecture.
- Non-Goals
  - No per-tenant feature gating for parent-first; it is the default behavior across tenants (overrides remain possible if needed).
  - No date-windowing (time-frame splitting) for transaction imports; the parent-first model removes the need and avoids added complexity.

## Acceptance Criteria

- Parent-first is the default for all transaction-based record types.
- Parents are fetched ordered by `transaction.id`. Lines are fetched ordered by `transactionline.transaction, transactionline.id` using `IN (...)` with batched parent IDs.
- Probe logic is executed (and cached) to validate non-custom mapped fields and construct explicit SELECT lists; custom (`cust*`) fields from mappings are included without probing.
- Line queries use explicit SELECT lists by default. Parent ID enumeration uses `SELECT transaction.id`. Parent data pages prefer explicit lists when additional fields are required; `SELECT *` only for discovery.
- Waves & Batches: parent batches precede derived line batches; no line jobs are dispatched for failed parents.
- Observability: logs/metrics clearly distinguish parent and line phases, page counts, and coverage (parents imported vs lines imported).

## Probe Logic Integration

- Candidate field set
  - Source: `criteria.json` + `integration_mappings` for the record type.
  - Always include required cores: `transaction.id` (parents), `transactionline.id` and `transactionline.uniquekey` (lines). Inject configured aliases from `config/suiteql.php` (e.g., `transactionline.id AS line_id`, `transactionline.uniquekey AS line_uniquekey`).
  - For non-join line queries, drop any `transaction.*` selections (selection is from `transactionline`).
- Custom fields
  - Identification: any mapped field prefixed with `cust*` (e.g., `custbody_*`, `custcol_*`, `custrecord_*`).
  - Policy: include mapped `cust*` fields unconditionally; do not probe. They are tenant-specific and their presence in the mapping indicates expected availability.
- Probe execution (non-custom fields)
  - Build probe SELECT: `aliases + required + optional(non-custom)` with `WHERE` and enforced `ORDER BY`.
  - Run via RESTlet with `limit=1` and `offset=0`.
  - On success: cache `ok_fields` in Redis per tenant+record-type (TTL 24 hours); use full list. Invalidate cache on mapping changes or deployments.
  - On failure: retry with required-only; if still failing, probe required individually to salvage a minimal set; cache degraded `ok_fields`.
  - Fallback: if nothing passes, use `SELECT *` for a single discovery page (logged) and continue probing on subsequent runs.
  - Logging: probe failures are logged at Notice level (expected, informational).
- Usage after probe
  - Parents explicit: `SELECT {ok_fields} FROM transaction WHERE type='...' ORDER BY transaction.id`.
  - Lines explicit: `SELECT {ok_fields} FROM transactionline WHERE mainline='F' AND transactionline.transaction IN (...) ORDER BY transactionline.transaction, transactionline.id`.

## Query Patterns (via RESTlet)

- Parents
  - ID enumeration (fastest): `SELECT id FROM transaction WHERE type='...' ORDER BY id ASC`
  - Data pages (preferred): `SELECT {ok_fields} FROM transaction WHERE type='...' ORDER BY id ASC`
- Lines (scoped to parents)
  - `SELECT {ok_fields} FROM transactionline WHERE transactionline.mainline='F' AND transactionline.transaction IN (...) ORDER BY transactionline.transaction ASC, transactionline.id ASC`
- Paging
  - Use RESTlet `runSuiteQLPaged` with `offset/limit`. Ensure an explicit `ORDER BY` is present. Target page size 1000; fall back to 500 if tenant performance varies.

## Configuration & Metadata

- Defaults
  - Parent-first enabled for all transaction-based record types.
  - Line fetch page size: 1000 (fallback 500).
  - IN-batch size (parent IDs per line job): 500 (fallback 100 if page latency >10s for 2 consecutive calls).
  - Config location: `config/waves.php` → `line_fetch.in_batch_size`, `line_fetch.page_size`, `line_fetch.page_fallback`.
- Metadata in jobs
  - Parent batch id, list of parent IDs, page size, current offset, and max pages per line job (safety cap).
- Overrides
  - None (defaults above apply globally; tuning is via code/config release).

## Record Types Covered (Default)

- Sales Orders, Invoices, Purchase Orders, Estimates, Item Fulfillments, Item Receipts, Work Orders.

## Wave & Batch Generation

- Dependency graph
  - Maintain non-transaction dependencies (Customer/Item/Subsidiary/Location/...) before parents.
  - For transaction types, remove `TransactionLine` as a global prerequisite and instead generate line jobs as a derived stage from completed parent batches.
- Waves
  - Wave N: parent batches.
  - Wave N+1 (or immediate follow-on): derived line-IN batches from the IDs produced by Wave N.
- Batch generation
  - After each parent batch completes successfully, extract IDs and enqueue one or more line-IN jobs using configured IN-batch size and page size.

## Batch Upsert Behavior

- Parents
  - Upsert parents first. Extract IDs from the same batch results to feed line jobs (no separate ID-only call required in production flow).
- Lines
  - Upsert lines with idempotency using `uniquekey` as both `refid` and `external_id` in SuiteX. `line_id` is stored separately (data field only).
  - Canonical ID mapping: `CanonicalIdExtractor` converts `uniquekey → refid/external_id` while keeping `line_id` separate. Applied after successful transaction upserts and required only for transaction lines.
- Reliability
  - If a parent batch fails, do not enqueue line jobs for those parents; standard retry policies apply. Partial parent success generates lines only for succeeded IDs.

## Chunked Import Logic & Wave Coordination

- Chunked import continues to operate unchanged; wave coordination orchestrates parent-first ordering:
  - Create parent waves, await completion threshold, then dispatch derived line waves.
  - Ensure no line batch executes without its parent batch marked successful (precondition guard enforced in the line job handler).
  - Maintain existing backoff/retry strategies for RESTlet and database operations.

## Error Handling & Retries

- Parent batch failures block corresponding line batches; retry parents first.
- Line batch failures retry independently; upsert idempotency (via `uniquekey`) allows safe replays.
- RESTlet/network errors: progressive backoff retries; surface concise error context in logs.

## Security & Operational Guardrails

- Max IN-list per line job: 500 parent IDs.
- Max pages per line job: none; iterate until complete.
- Retry caps/backoff: use existing internal retry/backoff policies.

## Observability & Coverage

- Metrics/logs per phase: parent fetch duration, line fetch duration/page timings, total line count per parent set.
- Thresholds: target <2000ms/page under normal load; warn if >5000ms/page for 3 consecutive pages.
- Coverage: parents imported vs lines imported, with reconciliation signals when mismatches occur. Parents must complete first.
 - Implementation notes: `ImportNetSuiteRecordsBatch` logs RESTlet durations; TransactionLine pages log warnings if exceeding thresholds and include `limit` and `parent_ids_count` for context.

## Performance Results Summary & Decision

- Parents
  - `SELECT id` for enumeration is consistently fast. When additional fields are required, explicit fields (probe-driven) are faster than `SELECT *` at larger page sizes due to smaller payloads.
- Lines
  - Explicit fields (probe-driven) are materially faster than `SELECT *` across page sizes (500–1000) and remain stable as limits increase.
- Decision
  - Retain the probe approach with Redis caching to produce explicit field lists by default. Use `SELECT *` strictly as a discovery fallback when probing cannot establish a safe field set.

## Implementation Plan (Concise)

1. Make parent-first the default for all transaction-based record types in dependency graph and planning.
2. Integrate probe logic in query builders for transaction and transactionline; cache `ok_fields` per tenant/record type.
3. Generate derived line-IN batches from completed parent batches; enforce preconditions and ordering in waves.
4. Apply defaults (page size 1000; IN-batch size 500 with fallbacks) via wave configuration; remove per-tenant overrides.
5. Enhance logging/metrics for phase timings and coverage.

## Conclusion

Parent-first imports for transaction record types reduce scan breadth, improve stability, and integrate cleanly with existing wave, batch upsert, and chunked import infrastructure. With probe-driven explicit field selection, the system achieves stable low-latency performance at larger page sizes while preserving resilience through idempotent line upserts and robust error handling. This document serves as the single source of truth for the finalized approach and replaces prior testing guidance and date-windowing strategies.
