## Backfill Transaction Subsidiary After Transaction Lines

### Problem

NetSuite transaction responses (e.g., sales orders, invoices) do not include the subsidiary value on the parent transaction. Subsidiary is present on the transaction lines. We insert parent transactions before their lines, so the parent’s `subsidiary` is missing at insert time.

### Goal

Populate the parent transaction’s `subsidiary` after related transaction lines have been inserted, ensuring all transactions have a subsidiary value aligned to each table’s schema type.

### Key Requirements

- Perform a follow-on bulk update once TransactionLine processing completes for a job.
- Update only the affected transactions (when possible) using the job’s captured parent IDs.
- Respect each table’s schema type for `subsidiary` (VARCHAR vs INTEGER) and cast accordingly.
- If multiple line subsidiaries exist for a single transaction, choose the first and set that on the parent. We define “first” deterministically as the line with the lowest `transaction_lines.id` (stable ordering).
- Idempotent, safe to run multiple times.

### Fit in Current Flow

1) Parent transactions are imported first (e.g., SalesOrder, Invoice, ItemFulfillment, Estimate, ItemReceipt, WorkOrder, PurchaseOrder).
2) Transaction lines are imported in derived waves using the parent-first design.
3) After the TransactionLine wave is complete, run a single job to backfill `subsidiary` onto parent transactions.

### Data Types by Table (schema-aligned)

- VARCHAR subsidiary tables (store string): Sales Orders, Invoices, Item Fulfillments, Purchase Orders, Opportunities, Estimates, Item Receipts
- INTEGER subsidiary tables (store integer): Work Orders, Projects, Locations, Transaction Lines (lines already set at insert time)

Note: Actual column types are enforced in service validation. This backfill must cast to the correct type before updating.

### Inputs

- job_id: current import job identifier
- parentIds per record type: we already cache parent IDs during parent fetches (key pattern `parent_ids_{jobId}_{recordTypeId}`) for TransactionLine batching. Use these to scope the backfill.
- Fallback: if no cached IDs, target parents with NULL/empty subsidiary.

### Conflict Strategy (multiple subsidiaries on lines for the same transaction)

- Choose the first line’s subsidiary by lowest `transaction_lines.id` and set that on the parent.
- Log any case where a transaction had more than one distinct line subsidiary for observability.

### Bulk Update Patterns

Use a single bulk `UPDATE … JOIN` per table. The inner selection picks the first line per transaction via `MIN(transaction_lines.id)` and returns that line’s subsidiary for the parent update.

#### VARCHAR subsidiary parents (string storage)

Update pattern (example: `salesorders`):

```sql
UPDATE salesorders so
JOIN (
  SELECT tl.transaction AS id,
         tl.subsidiary,
         cnt.distinct_subs
  FROM transaction_lines tl
  JOIN (
    SELECT transaction,
           MIN(id) AS min_line_id,
           COUNT(DISTINCT subsidiary) AS distinct_subs
    FROM transaction_lines
    WHERE transaction IN (:parentIds)
      AND subsidiary IS NOT NULL
    GROUP BY transaction
  ) cnt ON cnt.transaction = tl.transaction AND tl.id = cnt.min_line_id
) x ON x.id = so.id
SET so.subsidiary = CAST(x.subsidiary AS CHAR)
WHERE (so.subsidiary IS NULL OR so.subsidiary = '');
```

Apply the same shape to: `invoices`, `itemfulfillments`, `purchasesorders`, `estimates`, `itemreceipts`, `workorders`, `opportunities` (table names per schema), adjusting only the parent table in the `UPDATE` and `SET`.

Notes:
- `CAST(… AS CHAR)` ensures compatibility with VARCHAR columns.
- If `:parentIds` is not available, drop the `IN (:parentIds)` predicate and target only parents where `(subsidiary IS NULL OR subsidiary = '')`.

#### INTEGER subsidiary parents (integer storage)

Update pattern (example: `projects`):

```sql
UPDATE projects p
JOIN (
  SELECT tl.transaction AS id,
         tl.subsidiary,
         cnt.distinct_subs
  FROM transaction_lines tl
  JOIN (
    SELECT transaction,
           MIN(id) AS min_line_id,
           COUNT(DISTINCT subsidiary) AS distinct_subs
    FROM transaction_lines
    WHERE transaction IN (:parentIds)
      AND subsidiary IS NOT NULL
    GROUP BY transaction
  ) cnt ON cnt.transaction = tl.transaction AND tl.id = cnt.min_line_id
) x ON x.id = p.id
SET p.subsidiary = x.subsidiary
WHERE p.subsidiary IS NULL;
```

Notes:
- No casting needed for integer columns.

### Covered Transaction Types (per-table recipes)

This backfill applies to all parent transaction types listed below. Use the VARCHAR or INTEGER pattern as indicated by schema.

- Sales Orders (table: `salesorders`) → VARCHAR

```sql
UPDATE salesorders so
JOIN (
  SELECT tl.transaction AS id, tl.subsidiary, cnt.distinct_subs
  FROM transaction_lines tl
  JOIN (
    SELECT transaction, MIN(id) AS min_line_id, COUNT(DISTINCT subsidiary) AS distinct_subs
    FROM transaction_lines
    WHERE transaction IN (:parentIds) AND subsidiary IS NOT NULL
    GROUP BY transaction
  ) cnt ON cnt.transaction = tl.transaction AND tl.id = cnt.min_line_id
) x ON x.id = so.id
SET so.subsidiary = CAST(x.subsidiary AS CHAR)
WHERE (so.subsidiary IS NULL OR so.subsidiary = '');
```

- Purchase Orders (table: `purchaseorders`) → VARCHAR

```sql
UPDATE purchaseorders po
JOIN (
  SELECT tl.transaction AS id, tl.subsidiary, cnt.distinct_subs
  FROM transaction_lines tl
  JOIN (
    SELECT transaction, MIN(id) AS min_line_id, COUNT(DISTINCT subsidiary) AS distinct_subs
    FROM transaction_lines
    WHERE transaction IN (:parentIds) AND subsidiary IS NOT NULL
    GROUP BY transaction
  ) cnt ON cnt.transaction = tl.transaction AND tl.id = cnt.min_line_id
) x ON x.id = po.id
SET po.subsidiary = CAST(x.subsidiary AS CHAR)
WHERE (po.subsidiary IS NULL OR po.subsidiary = '');
``;

- Estimates (table: `estimates`) → VARCHAR

```sql
UPDATE estimates e
JOIN (
  SELECT tl.transaction AS id, tl.subsidiary, cnt.distinct_subs
  FROM transaction_lines tl
  JOIN (
    SELECT transaction, MIN(id) AS min_line_id, COUNT(DISTINCT subsidiary) AS distinct_subs
    FROM transaction_lines
    WHERE transaction IN (:parentIds) AND subsidiary IS NOT NULL
    GROUP BY transaction
  ) cnt ON cnt.transaction = tl.transaction AND tl.id = cnt.min_line_id
) x ON x.id = e.id
SET e.subsidiary = CAST(x.subsidiary AS CHAR)
WHERE (e.subsidiary IS NULL OR e.subsidiary = '');
```

- Opportunities (table: `opportunities`) → VARCHAR

```sql
UPDATE opportunities o
JOIN (
  SELECT tl.transaction AS id, tl.subsidiary, cnt.distinct_subs
  FROM transaction_lines tl
  JOIN (
    SELECT transaction, MIN(id) AS min_line_id, COUNT(DISTINCT subsidiary) AS distinct_subs
    FROM transaction_lines
    WHERE transaction IN (:parentIds) AND subsidiary IS NOT NULL
    GROUP BY transaction
  ) cnt ON cnt.transaction = tl.transaction AND tl.id = cnt.min_line_id
) x ON x.id = o.id
SET o.subsidiary = CAST(x.subsidiary AS CHAR)
WHERE (o.subsidiary IS NULL OR o.subsidiary = '');
```

- Invoices (table: `invoices`) → VARCHAR

```sql
UPDATE invoices i
JOIN (
  SELECT tl.transaction AS id, tl.subsidiary, cnt.distinct_subs
  FROM transaction_lines tl
  JOIN (
    SELECT transaction, MIN(id) AS min_line_id, COUNT(DISTINCT subsidiary) AS distinct_subs
    FROM transaction_lines
    WHERE transaction IN (:parentIds) AND subsidiary IS NOT NULL
    GROUP BY transaction
  ) cnt ON cnt.transaction = tl.transaction AND tl.id = cnt.min_line_id
) x ON x.id = i.id
SET i.subsidiary = CAST(x.subsidiary AS CHAR)
WHERE (i.subsidiary IS NULL OR i.subsidiary = '');
```

- Item Fulfillments (table: `itemfulfillments`) → VARCHAR

```sql
UPDATE itemfulfillments f
JOIN (
  SELECT tl.transaction AS id, tl.subsidiary, cnt.distinct_subs
  FROM transaction_lines tl
  JOIN (
    SELECT transaction, MIN(id) AS min_line_id, COUNT(DISTINCT subsidiary) AS distinct_subs
    FROM transaction_lines
    WHERE transaction IN (:parentIds) AND subsidiary IS NOT NULL
    GROUP BY transaction
  ) cnt ON cnt.transaction = tl.transaction AND tl.id = cnt.min_line_id
) x ON x.id = f.id
SET f.subsidiary = CAST(x.subsidiary AS CHAR)
WHERE (f.subsidiary IS NULL OR f.subsidiary = '');
```

- Item Receipts (table: `itemreceipts`) → VARCHAR

```sql
UPDATE itemreceipts r
JOIN (
  SELECT tl.transaction AS id, tl.subsidiary, cnt.distinct_subs
  FROM transaction_lines tl
  JOIN (
    SELECT transaction, MIN(id) AS min_line_id, COUNT(DISTINCT subsidiary) AS distinct_subs
    FROM transaction_lines
    WHERE transaction IN (:parentIds) AND subsidiary IS NOT NULL
    GROUP BY transaction
  ) cnt ON cnt.transaction = tl.transaction AND tl.id = cnt.min_line_id
) x ON x.id = r.id
SET r.subsidiary = CAST(x.subsidiary AS CHAR)
WHERE (r.subsidiary IS NULL OR r.subsidiary = '');
```

- Work Orders (table: `workorders`) → INTEGER

```sql
UPDATE workorders w
JOIN (
  SELECT tl.transaction AS id, tl.subsidiary, cnt.distinct_subs
  FROM transaction_lines tl
  JOIN (
    SELECT transaction, MIN(id) AS min_line_id, COUNT(DISTINCT subsidiary) AS distinct_subs
    FROM transaction_lines
    WHERE transaction IN (:parentIds) AND subsidiary IS NOT NULL
    GROUP BY transaction
  ) cnt ON cnt.transaction = tl.transaction AND tl.id = cnt.min_line_id
) x ON x.id = w.id
SET w.subsidiary = x.subsidiary
WHERE w.subsidiary IS NULL;
```

### Job Orchestration

Create `UpdateTransactionSubsidiariesJob` with inputs `{ job_id }`. It will:

1) Resolve record types that were processed in this job and have a `subsidiary` column.
2) Load `parentIds` from cache where available, per record type.
3) For each target parent table:
   - Run the corresponding bulk `UPDATE … JOIN` SQL.
   - Collect metrics: rows updated, rows skipped, conflicts (where `distinct_subs > 1`).
4) Log a summary per table and overall totals.

Trigger: Dispatch this job when the TransactionLine wave for the job is marked complete (e.g., from `BatchJobCompletedListener` when lines reach 100%).

#### Orchestration Integration & References

- Orchestration trigger points (existing):
  - `src/App/Listeners/ImportJobs/BatchJobCompletedListener.php`
    - Listens for batch completions and already nudges wave progression. Hook the dispatch of `UpdateTransactionSubsidiariesJob` once the TransactionLine wave reaches 100% for the `job_id`.
  - `src/App/Services/ImportJobs/WaveCoordinator.php`
    - Methods like `checkWaveCompletion()` and `checkAndDispatchDerivedWaves()` determine wave lifecycle. Use these signals to know when all derived (TransactionLine) batches are finished for the job.
  - `src/App/Jobs/ImportJobs/ImportNetSuiteRecordsBatch.php`
    - Source of batch execution. Not the dispatch point, but useful context for when lines complete.
  - `config/waves.php`
    - Thresholds and coordination parameters referenced by the coordinator.

- Recommended placement:
  - In `BatchJobCompletedListener`, after confirming all TransactionLine batches for `job_id` are complete, dispatch `UpdateTransactionSubsidiariesJob` (once per job).

- Related design docs:
  - Parent-first/wave fixes: `/docs/design/import-fixes-sonnet.md`
  - Transactions table backfill (companion): `/docs/designs/backfill-transactionable-table.md`

### Idempotency & Safety

- Updates only fill missing/empty `subsidiary` values; re-runs are safe.
- Scope by `:parentIds` where possible to minimize touched rows.
- Log conflicts (where more than one distinct subsidiary was observed); we still update using the first line’s subsidiary per requirements.

### Performance & Indexes

- Ensure indexes:
  - `transaction_lines(transaction)` for the join
  - Parent tables on `id` (primary key)
- For very large `:parentIds`, chunk the list and run multiple updates.
- Each table is updated with one query (per chunk), minimizing round-trips.

### Validation Compatibility

- VARCHAR tables receive `CAST(x.subsidiary AS CHAR)`; INTEGER tables receive integer values.
- This happens post-insert and does not interfere with upsert-time validation.

### Monitoring & Logging

- Per table, log:
  - `rows_updated`
  - `rows_skipped`
  - `conflict_transactions` (count where `distinct_subs > 1`)
- Emit one summary log for the job.

### Edge Cases

- Transactions with no lines: unchanged (remain NULL/empty subsidiary).
- Lines with NULL subsidiary: excluded by `AND subsidiary IS NOT NULL`.
- Multiple subsidiaries: we pick the first (lowest line id). Still log the conflict count.

### Acceptance Criteria

- All transactions get a subsidiary value populated when any line contains one.
- Updates respect each table’s schema type for `subsidiary`.
- Backfill runs after TransactionLine wave completion and is idempotent.
- Conflicts are logged; the first subsidiary is applied per requirement.


