# SuiteX ↔ NetSuite Sync — Design, Requirements, Deliverables, and Runbook

**Created:** 2025-11-14
**Version:** 1
**Goal:** Reliable, near-real-time (target ≤ 1 minute) bidirectional sync between SuiteX and NetSuite that captures UE-driven events, workflow/poller-detected changes, SuiteX-originated changes, and deletes. Ensure correctness through an event-sourcing/delta model, three-way merging, conflict handling, robust error handling, auditability, and operational controls that respect NetSuite governance.

---

## Executive summary

This document describes an enterprise-grade architecture to merge four streams and sync state between SuiteX and NetSuite:

- UE events (User Event scripts inside NetSuite)
- Poller events (external polling of NetSuite via `lastModifiedDate` / system notes / getDeleted)
- SuiteX-originated outbound changes (create/update/delete)
- SuiteX custom-record staging + RESTlet workers (existing)

Primary principles:

1. **Event-first and deltas**: All changes are emitted as immutable events with `changes` (field-level deltas) and `baseVersion` (logical version). Avoid full-record snapshots as the primary channel.
2. **Single durable event stream**: A central event store (for example, Kafka or SQS with a DB-backed `events` table) partitions events by `(account, recordType, recordId)` to preserve per-record ordering.
3. **Polling is authoritative for missed events**: Polling confirms and fills gaps; UE provides early visibility and reduces latency.
4. **Three-way merge with per-field policies**: Deterministic merges using `Base` (the version event was derived from), `LocalChange` (origin change) and `RemoteCurrent` (target latest). Use configurable per-field conflict policies.
5. **Operational controls**: Adaptive concurrency, batching, debounce/coalescing, and conflict queue/human review UI.

---

## High-level architecture

1. **Producers / Event sources**
   - NetSuite UE script: emits events on UI/API-originated actions (field-level deltas plus metadata). Writes to `events` queue/topic.
   - NetSuite Poller: discovery and detail fetch by `lastModifiedDate` + `(lastInternalId)` cursor, emits events for changes missed by UE. Writes to `events` queue/topic.
   - SuiteX change capture: when SuiteX creates/updates/deletes, produce events (field-deltas) to the same `events` queue/topic.
     - SuiteX change capture is implemented inside the SuiteX application layer (e.g., Laravel domain services). On commit of a transactional change to the tenant database, a `ChangeEvent` is produced and published to the durable backbone with an ordering key derived from `(account, recordType, recordId)`.
     - For SuiteX-originated changes, the workflow engine ensures that the domain write and the event publication are coordinated: either by transactional outbox pattern (preferred) or by explicit retry with idempotency keys.
     - SuiteX adds metadata such as tenant identifier, user context, and correlation IDs so downstream consumers can join events back to SuiteX audit logs and UI.
   - SuiteX snapshot fallback: when deltas can't be created, store a full snapshot and emit `fullSnapshotRef`.
     - Snapshot fallback is performed by SuiteX background workers that read from the same event backbone but are driven by reconciliation or migration jobs. These workers persist snapshots to object storage (for example, GCS) and persist a pointer in `fullSnapshotRef`.
     - SuiteX maintains a per-record configuration describing which fields are part of the synchronized surface area; snapshot producers respect that configuration so that `fullSnapshotRef` remains compatible with NetSuite-facing workloads.

2. **Durable Event Store**
   - Append-only store with partitioning: `account|recordType|recordId`.
   - Store event envelope (eventId, timestamp, source, baseVersion, changes, metadata).
   - In the concrete implementation, SuiteX uses the external backbone (for example, Google Pub/Sub) for ordered delivery and a Cloud SQL-backed `events` table for durable, queryable history and idempotency tracking. Pub/Sub carries the live stream; Cloud SQL provides random-access history and analytics.

3. **Orchestrator / Sync Workers**
   - Consumer(s) read events, group by record and apply deterministic merge logic to the opposite system (NetSuite ↔ SuiteX).
   - Use optimistic concurrency (check remote version) and three-way merge if mismatch.
   - SuiteX implements orchestrator workers as stateless processes (for example, Laravel queue workers) that:
     - Read from Pub/Sub subscriptions bound to `events.raw` and `events.merged`.
     - Resolve tenant context based on `accountId` and SuiteX internal tenant mapping.
     - Persist and update `current_state` and conflict records inside the tenant-aware Cloud SQL database.

4. **Projection / Current State**
   - Maintain `current_state` projection per record (versioned). This is authoritative for the sync service.
   - Projections live inside SuiteX-managed Cloud SQL tables scoped by tenant. SuiteX exposes read-only APIs over this projection to internal services and admin tools, while NetSuite-side components continue to treat SuiteX as a black box accessed via RESTlet/REST APIs.

5. **Conflict Queue & Resolution UI**
   - Conflicts stored for manual or semi-automated resolution with change context and suggested merge.
   - SuiteX hosts the conflict queue and UI. NetSuite remains the source of UE and polling events but does not host any conflict UX. The SuiteX admin UI presents three-way merge context (Base, Local, Remote) and writes human decisions back as resolution events to the backbone.

6. **Monitoring & Alerting**
   - Metrics, traces, SLAs, and alerts for error rates, backlog growth, 429s, and conflict volume.
   - SuiteX-side observability is implemented via standard metrics (for example, Prometheus or GCP Monitoring), structured application logs, and distributed tracing from the Pub/Sub consumers through to SuiteX domain writes and NetSuite RESTlet calls.

7. **Operational Controls**
   - Debounce windows, per-account concurrency, adaptive throttling, and backfills.
   - SuiteX orchestrator workers enforce operational controls at the SuiteX layer by:
     - Using per-account concurrency pools when writing to NetSuite and SuiteX.
     - Applying debounce windows to coalesce high-frequency SuiteX UI updates before emitting outbound NetSuite writes.
     - Respecting NetSuite governance limits by honoring backoff and switching to queued writes when thresholds are exceeded, as detailed later in governance policies.
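
The adaptive, per-account throttling described above could be sketched as an additive-increase / multiplicative-decrease limiter. This is a minimal illustration only: the class name `AdaptiveLimiter`, the helper `limiter_for`, and the default limits are assumptions, not values taken from this design.

```python
from dataclasses import dataclass


@dataclass
class AdaptiveLimiter:
    """Per-account concurrency limit (additive increase, multiplicative decrease)."""

    limit: int = 4        # concurrent NetSuite calls currently allowed (assumed default)
    min_limit: int = 1
    max_limit: int = 16   # hypothetical ceiling; tune per account
    successes: int = 0    # consecutive successes since the last adjustment

    def on_success(self) -> None:
        # After `limit` consecutive successes, allow one more concurrent call.
        self.successes += 1
        if self.successes >= self.limit and self.limit < self.max_limit:
            self.limit += 1
            self.successes = 0

    def on_throttled(self) -> None:
        # On a 429 or governance error, halve the limit (never below min_limit).
        self.limit = max(self.min_limit, self.limit // 2)
        self.successes = 0


# One limiter per NetSuite account, created lazily.
limiters: dict[str, AdaptiveLimiter] = {}


def limiter_for(account_id: str) -> AdaptiveLimiter:
    return limiters.setdefault(account_id, AdaptiveLimiter())
```

A writer would call `on_success()` after each accepted NetSuite write and `on_throttled()` on each 429, then size its worker pool for that account from `limit`.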

---

## Requirements (functional & non-functional)

### Functional
1. Capture all NetSuite updates (UE + workflow changes + mass updates + scheduled scripts + CSV + system) and SuiteX changes.
2. Emit immutable events representing field-level deltas with `baseVersion`.
3. Guarantee at-least-once delivery to the orchestration layer; idempotent application to targets.
4. Merge concurrent changes deterministically; configurable per-field policies.
5. Maintain audit trail and ability to reconstruct history.
6. Provide conflict queue and human-in-the-loop resolution UI.
7. Provide backpressure and throttling to respect NetSuite governance.
8. Support deletes and bulk operations safely.

### Non-functional
1. Target sync latency: ≤ 60s for typical workloads (subject to NetSuite limits). Provide SLAs per-customer profile.
2. System must be horizontally scalable (SuiteX side) with per-account throttles.
3. No silent failures — every error must be logged, retried, or escalated.
4. System must be observable with metrics, traces, and dashboards; retain logs for audits (configurable retention).
5. All external writes must be idempotent with idempotency keys persisted.

---

## Deliverables & Acceptance Criteria (staged)

### Stage 0 — Current-state validation & small refactors
**Deliverables**:
- Inventory of existing UE scripts, RESTlet workers, SuiteX payload table schema, and current snapshot flow.
- Test harness for generating concurrent SuiteX + NetSuite changes.

**Acceptance**:
- Inventory documented.
- Tests run successfully in the sandbox and demonstrate that the race conditions are reproducible.

---

### Stage 1 — Event model & durable event store
**Deliverables**:
- Event schema defined (JSON envelope) and `events` table / Kafka topic created.
- Producers implemented: UE emitter, Poller emitter, SuiteX emitter.
- Lightweight event producer integration tests.

**Acceptance**:
- Events produced by all sources appear in the durable store with required fields (eventId, recordType, recordId, source, timestamp, baseVersion, changes).
- Test demonstrating replayability and ordering per-partition.

---

### Stage 2 — Projections & Versioning
**Deliverables**:
- `current_state` projection table and `version` logic.
- Initial migration of snapshot-based projection into `current_state`.
- API to fetch `current_state` by (account, recordType, recordId).

**Acceptance**:
- Projections reflect authoritative state after replaying events.
- Version increments deterministically after apply.

---

### Stage 3 — Orchestrator + Three-way merge
**Deliverables**:
- Worker that consumes events, fetches remote target state, runs three-way merge, applies merged changes via REST/SOAP, updates projection.
- Per-field conflict policy engine (config-driven).
- Idempotency & optimistic concurrency primitives.

**Acceptance**:
- Automated tests covering: no-concurrency, simple concurrency (non-conflicting fields), conflicting fields with configured policy, conflict queueing.
- End-to-end demo: a SuiteX change and a concurrent NetSuite poller change produce a merged result and a consistent projection.

---

### Stage 4 — Backpressure, batching & adaptive concurrency
**Deliverables**:
- Discovery & detail-fetch pipeline (two-phase) with cursor management.
- Batch partitioning & worker pools.
- Per-account adaptive concurrency/throttling module.
- Debounce/coalescing logic.

**Acceptance**:
- Load test demonstrating graceful handling of a high change rate (e.g., 10k changes/min), with per-account throttling preventing a 429 storm.
- Configurable knobs for batch sizes, concurrency, debounce.

---

### Stage 5 — Conflict UI & human workflows
**Deliverables**:
- Conflict queue UI showing base snapshot, local change, remote current, suggested merge, timestamps, audit trail.
- APIs to resolve conflict (accept SuiteX, accept NetSuite, accept suggested merge, or edit then apply).
- Documented human SLA and escalation policy.

**Acceptance**:
- Manual resolution toggles work end-to-end and result is applied and projection updated.
- System prevents concurrent automated applies while record is in human review (locking semantics tested).

---

### Stage 6 — Observability, runbooks, and cutover
**Deliverables**:
- Dashboards: events/sec, backlog, 429s, conflicts, error rates per account.
- Alerts and runbooks for common failure modes.
- Migration plan for switching RESTlet snapshot workers to delta-driven workers.

**Acceptance**:
- Monitoring tests with synthetic errors trigger alerts and runbook actions.
- Successful cutover on canary account and verification of data consistency.

---

## Detailed design — components & contracts

### Event envelope (canonical)

```jsonc
{
  "eventId": "uuid",
  "accountId": "acct-123",
  "recordType": "customer",
  "recordId": "12345",
  "eventType": "update", // create | update | delete
  "source": "SuiteX|UE|Poller",
  "timestamp": "2025-11-14T13:12:05Z",
  "baseVersion": "v123", // projection version the event was derived from
  "changes": { "fieldA": "newValue" },
  "fullSnapshotRef": "s3://.../snapshots/..." // optional
}
```

**Producer contract**:
- `baseVersion` must be present when available.
- `changes` should be minimal (field-level) when possible.
- Full snapshots allowed for schema changes or blob fields; use sparingly.
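
As an illustration of the envelope and producer contract, the sketch below models the event as an immutable Python dataclass with a small validator. The class name `ChangeEvent` appears elsewhere in this document, but the Python field names, the `validate` helper, and the `|`-joined ordering key format are assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Any, Optional

VALID_EVENT_TYPES = {"create", "update", "delete"}
VALID_SOURCES = {"SuiteX", "UE", "Poller"}


@dataclass(frozen=True)
class ChangeEvent:
    event_id: str
    account_id: str
    record_type: str
    record_id: str
    event_type: str
    source: str
    timestamp: str
    base_version: Optional[str] = None          # present when available
    changes: dict[str, Any] = field(default_factory=dict)
    full_snapshot_ref: Optional[str] = None     # optional fallback pointer

    def ordering_key(self) -> str:
        # Assumed format: one ordered stream per (account, recordType, recordId).
        return f"{self.account_id}|{self.record_type}|{self.record_id}"


def validate(event: ChangeEvent) -> list[str]:
    """Return a list of contract violations; an empty list means the event is valid."""
    problems = []
    if event.event_type not in VALID_EVENT_TYPES:
        problems.append(f"unknown eventType: {event.event_type}")
    if event.source not in VALID_SOURCES:
        problems.append(f"unknown source: {event.source}")
    if event.event_type == "update" and not event.changes and not event.full_snapshot_ref:
        problems.append("update carries neither changes nor fullSnapshotRef")
    return problems
```

Returning a list of problems rather than raising on the first one lets a producer log every violation at once before routing the event to an error queue.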


### Projection / current state

Schema (simplified):

```sql
CREATE TABLE current_state (
  account_id text,
  record_type text,
  record_id text,
  version bigint,
  state jsonb,
  last_modified timestamptz,
  PRIMARY KEY (account_id, record_type, record_id)
);
```

- `version` is monotonic per record (incremented on each authoritative apply).
- `state` contains only synced fields.
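
The version and state rules can be illustrated with a pure function that folds events into a projection row. This is a simplified sketch: the `deleted` flag and the dictionary shapes are assumptions, and a real implementation would persist the result to `current_state` rather than keep it in memory.

```python
from typing import Any, Iterable


def apply_event(projection: dict[str, Any], event: dict[str, Any]) -> dict[str, Any]:
    """Return a new projection row after one event; the input row is not mutated."""
    if event["eventType"] == "delete":
        state: dict[str, Any] = {}
        deleted = True
    else:
        # Field-level deltas overwrite only the fields they mention.
        state = {**projection.get("state", {}), **event.get("changes", {})}
        deleted = False
    return {
        "version": projection.get("version", 0) + 1,  # monotonic per record
        "state": state,
        "deleted": deleted,                            # assumed tombstone flag
    }


def replay(events: Iterable[dict[str, Any]]) -> dict[str, Any]:
    """Rebuild a projection from scratch by applying events in order."""
    projection: dict[str, Any] = {"version": 0, "state": {}, "deleted": False}
    for event in events:
        projection = apply_event(projection, event)
    return projection
```

Because `apply_event` is deterministic, replaying the same ordered events always yields the same version and state, which is the property the Stage 2 acceptance criteria rely on.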


### Events table (immutable store)

Schema (simplified):

```sql
CREATE TABLE events (
  id uuid primary key,
  account_id text,
  record_type text,
  record_id text,
  source text,
  timestamp timestamptz,
  base_version bigint,
  changes jsonb,
  full_snapshot_ref text,
  processed boolean default false
);
```


### Conflict table

```sql
CREATE TABLE conflicts (
  conflict_id uuid primary key,
  account_id text,
  record_type text,
  record_id text,
  base_version bigint,
  our_event_id uuid,
  remote_version bigint,
  remote_snapshot jsonb,
  our_changes jsonb,
  status text,
  created_ts timestamptz default now()
);
```


### UE emitter (NetSuite) — contract & fallback
- UE script will emit field-deltas with `baseVersion` equal to projection version at the time of capture (if available via REST call). If projection version is not retrievable within UE constraints, include `lastModifiedDate` and let orchestrator infer base.
- UE emitter must be performant and only send deltas.


### Poller — discovery & detail fetch
- Cursor: `(lastModifiedDate, lastInternalId)` persisted per `(account, recordType)`.
- Discovery query returns `(internalId, lastModifiedDate)` sorted asc.
- Coalesce duplicate IDs per discovery window; fetch details via getList or batched gets, requesting only the required fields.
- Poller emits events that include `baseVersion` if projection version available; otherwise include `lastModifiedDate` and minimal snapshot.
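
The cursor and coalescing rules above might look like the following sketch. It assumes discovery rows arrive as `(internalId, lastModifiedDate)` pairs with ISO-8601 UTC timestamps in a single format, so that string comparison matches time order; `Cursor` and `next_batch` are hypothetical names.

```python
from typing import Iterable, NamedTuple


class Cursor(NamedTuple):
    last_modified: str      # ISO-8601 UTC, one consistent format
    last_internal_id: int   # tie-breaker for rows sharing a timestamp


def next_batch(rows: Iterable[tuple[int, str]], cursor: Cursor) -> tuple[list[int], Cursor]:
    """Return (internal IDs to detail-fetch, advanced cursor) for one discovery window."""
    # Keep only rows strictly after the cursor, ordered by (timestamp, id).
    fresh = sorted(
        Cursor(last_modified, internal_id)
        for internal_id, last_modified in rows
        if Cursor(last_modified, internal_id) > cursor
    )
    # Coalesce duplicate IDs while preserving first-seen order.
    ids = list(dict.fromkeys(c.last_internal_id for c in fresh))
    new_cursor = fresh[-1] if fresh else cursor
    return ids, new_cursor
```

The compound cursor matters when many records share one `lastModifiedDate` value: advancing on the timestamp alone would either skip or re-read the remainder of that group.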


### SuiteX outbound
- On user/app change, produce delta event to the event store and also insert into SuiteX payload table for compatibility.
- Create events for create/update/delete with `baseVersion` = local projection version.
- SuiteX outbound events are generated inside SuiteX domain services and use a transactional outbox pattern wherever possible:
  - Domain-layer code writes both the business change and an `outbox_events` row in the same database transaction.
  - A dedicated SuiteX background worker reads `outbox_events`, publishes `ChangeEvent` messages to the backbone, and marks outbox rows as dispatched.
  - If publishing fails, the outbox row remains and is retried without duplicating business effects.
- For legacy RESTlet snapshot workers, SuiteX continues to populate the existing payload table, but the new outbound pipeline additionally emits `ChangeEvent` deltas so both systems can run in parallel during migration.
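
The outbox steps above can be sketched with an in-memory SQLite database standing in for the tenant database. The `customers` table, the column names, and the `publish` callback are assumptions for illustration; only the `outbox_events` table name comes from this document.

```python
import json
import sqlite3
import uuid
from typing import Callable


def create_tables(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE customers (id TEXT PRIMARY KEY, email TEXT);
        CREATE TABLE outbox_events (
            event_id TEXT PRIMARY KEY,
            payload TEXT NOT NULL,
            dispatched INTEGER NOT NULL DEFAULT 0
        );
    """)


def update_email(conn: sqlite3.Connection, customer_id: str, email: str) -> None:
    event = {
        "eventId": str(uuid.uuid4()),
        "recordType": "customer",
        "recordId": customer_id,
        "eventType": "update",
        "changes": {"email": email},
    }
    with conn:  # one transaction: both rows commit, or neither does
        conn.execute("INSERT OR REPLACE INTO customers (id, email) VALUES (?, ?)",
                     (customer_id, email))
        conn.execute("INSERT INTO outbox_events (event_id, payload) VALUES (?, ?)",
                     (event["eventId"], json.dumps(event)))


def dispatch_pending(conn: sqlite3.Connection, publish: Callable[[dict], None]) -> int:
    """Publish undispatched rows; failed rows stay pending for the next run."""
    rows = conn.execute(
        "SELECT event_id, payload FROM outbox_events WHERE dispatched = 0").fetchall()
    sent = 0
    for event_id, payload in rows:
        try:
            publish(json.loads(payload))
        except Exception:
            continue  # leave the row pending; the business write is unaffected
        with conn:
            conn.execute("UPDATE outbox_events SET dispatched = 1 WHERE event_id = ?",
                         (event_id,))
        sent += 1
    return sent
```

If the worker crashes between publishing and marking the row, the event is published again on the next run, which is why downstream consumers still need the idempotency keys described in this document.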


### Orchestrator / Merge worker — algorithm summary
1. Read event E from durable store.
2. Load `Base` snapshot by `baseVersion` (from projection if available) or reconstruct from last known projection.
3. Fetch `RemoteCurrent` from target system with version.
4. If `remoteVersion == baseVersion`: apply `E.changes` directly.
5. Else: compute `remoteDelta = diff(Base, RemoteCurrent)` and `ourDelta = E.changes`, then run the three-way merge per field. If any field conflicts under its policy, enqueue the event to `conflicts`; otherwise apply the merged result.
6. On apply success: update `current_state.version` and `current_state.state` atomically.
7. Emit audit log and metrics.
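
Step 5 of the algorithm could be sketched as the per-field merge below. The policy names (`source_wins`, `target_wins`, `manual`) and the return shape are hypothetical; the actual policy engine is config-driven and is not specified in this section.

```python
from typing import Any


def three_way_merge(
    base: dict[str, Any],
    ours: dict[str, Any],       # E.changes: field-level deltas from the event
    remote: dict[str, Any],     # RemoteCurrent: latest state on the target
    policies: dict[str, str],
    default_policy: str = "manual",
) -> tuple[dict[str, Any], dict[str, Any]]:
    """Return (changes to write to the target, conflicts needing review)."""
    merged: dict[str, Any] = {}
    conflicts: dict[str, Any] = {}
    for name, our_value in ours.items():
        base_value = base.get(name)
        remote_value = remote.get(name)
        if remote_value == our_value:
            continue                      # both sides already agree; nothing to write
        if remote_value == base_value:
            merged[name] = our_value      # target untouched since Base; apply ours
            continue
        # Both sides changed the field to different values: consult the policy.
        policy = policies.get(name, default_policy)
        if policy == "source_wins":
            merged[name] = our_value
        elif policy == "target_wins":
            pass                          # keep the target's value
        else:
            conflicts[name] = {"base": base_value, "ours": our_value, "remote": remote_value}
    return merged, conflicts
```

A worker would apply `merged` only when `conflicts` is empty; otherwise it would enqueue the event to `conflicts` with both dictionaries attached as the suggested merge and the fields awaiting review.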


### Idempotency and optimistic concurrency
- Outbound applies must include idempotency keys (derived from eventId) so retries are safe.
- When writing to target, prefer APIs that accept conditional updates (e.g., REST `If-Match` style or check value before patch). If not available, implement read-verify-update pattern with transient lock/backoff.
- Within SuiteX, idempotency is enforced via:
  - `events` table primary keys and a dedicated `idempotency_keys` table recording `(target, key, first_seen_at, status)`.
  - Orchestrator workers checking whether an event has already been applied to SuiteX or NetSuite before attempting a write.
  - Explicitly storing NetSuite request identifiers (for example, in a SuiteX-side audit table) so duplicate NetSuite RESTlet calls can be detected and ignored.
- Optimistic concurrency is implemented by comparing projection versions and, for SuiteX database writes, by relying on version columns or `WHERE version = ?` updates. SuiteX never bypasses NetSuite’s own optimistic concurrency and governance constraints; it only adds SuiteX-side checks on top.
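
A minimal sketch of how the idempotency key and the `WHERE version = ?` check can share one transaction, using in-memory SQLite in place of Cloud SQL. The reduced column sets, the `apply_once` function, and its return values are assumptions for illustration.

```python
import sqlite3


class StaleVersion(Exception):
    """Raised inside the transaction to roll back when the version check fails."""


def create_tables(conn: sqlite3.Connection) -> None:
    conn.executescript("""
        CREATE TABLE current_state (
            account_id TEXT, record_type TEXT, record_id TEXT,
            version INTEGER, state TEXT,
            PRIMARY KEY (account_id, record_type, record_id)
        );
        CREATE TABLE idempotency_keys (
            target TEXT, key TEXT, status TEXT,
            PRIMARY KEY (target, key)
        );
    """)


def apply_once(conn, target, key, record, expected_version, new_state) -> str:
    """Return 'applied', 'duplicate' (key already used), or 'stale' (version moved on)."""
    try:
        with conn:  # commits on success, rolls back on any exception
            conn.execute(
                "INSERT INTO idempotency_keys (target, key, status) VALUES (?, ?, 'completed')",
                (target, key))
            cur = conn.execute(
                "UPDATE current_state SET state = ?, version = version + 1 "
                "WHERE account_id = ? AND record_type = ? AND record_id = ? AND version = ?",
                (new_state, *record, expected_version))
            if cur.rowcount != 1:
                raise StaleVersion()  # also undoes the key insert above
    except sqlite3.IntegrityError:
        return "duplicate"
    except StaleVersion:
        return "stale"
    return "applied"
```

Rolling back the key insert on a stale version keeps the key available, so the same event can be retried after a three-way merge against the newer state.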


### SuiteX workflow engine & sync pipeline (implementation view)

This section describes how the above concepts map to concrete components inside SuiteX without changing any NetSuite responsibilities or constraints.

#### SuiteX workflow engine responsibilities

- Orchestrate the end-to-end flow of events through SuiteX, including:
  - Consuming raw events (`events.raw`) from the backbone.
  - Normalizing, deduplicating, and coalescing them into canonical `MergedEvent` records.
  - Deciding which actions need to be applied to SuiteX, which to NetSuite, and which are informational only.
  - Driving conflict detection and resolution, including locking records in SuiteX while human review is pending.
- Enforce per-tenant isolation by:
  - Identifying the SuiteX tenant from the `accountId` or mapping metadata on each event.
  - Executing all data access through tenant-scoped connections and models, consistent with SuiteX’s multi-tenant architecture.

#### Event ingestion and normalization (SuiteX side)

- SuiteX runs one or more stateless consumer services that:
  - Subscribe to the Pub/Sub `events.raw` topic using an ordering key of `"recordType:recordId"` to guarantee in-order delivery per record.
  - For each message, validate it against the `ChangeEvent` schema.
  - Persist the event into the `events` table inside Cloud SQL for durability and querying.
  - Use the `processEvent` logic (see Appendix F) to merge with any pending changes in an in-memory or Redis-backed buffer keyed by `(accountId, recordType, recordId)`.
- When the normalization window elapses or a flush condition is met (for example, idle time, count threshold, or explicit flag), the workflow engine:
  - Emits a `MergedEvent` onto the `events.merged` topic.
  - Clears the pending buffer for the key while leaving the immutable `events` table history intact.
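
The buffering and flush conditions described above could be sketched as follows. This is not the `processEvent` logic from Appendix F; it is an illustrative in-memory buffer with assumed names (`CoalescingBuffer`, `flush_due`) and assumed default thresholds, with the clock passed in explicitly so the behaviour is easy to test.

```python
from dataclasses import dataclass, field
from typing import Any


@dataclass
class Pending:
    first_seen: float
    last_seen: float
    changes: dict[str, Any] = field(default_factory=dict)
    event_ids: list[str] = field(default_factory=list)


class CoalescingBuffer:
    def __init__(self, window_s: float = 5.0, idle_s: float = 1.0, max_events: int = 50):
        self.window_s = window_s      # maximum time a key may stay buffered
        self.idle_s = idle_s          # flush once a key has been quiet this long
        self.max_events = max_events  # flush once this many events are buffered
        self.pending: dict[str, Pending] = {}

    def add(self, key: str, event_id: str, changes: dict[str, Any], now: float) -> None:
        p = self.pending.get(key)
        if p is None:
            p = self.pending[key] = Pending(first_seen=now, last_seen=now)
        p.changes.update(changes)     # later values win for the same field
        p.event_ids.append(event_id)
        p.last_seen = now

    def flush_due(self, now: float) -> list[dict[str, Any]]:
        """Return merged events for every key whose flush condition is met."""
        merged = []
        for key in list(self.pending):
            p = self.pending[key]
            if (now - p.first_seen >= self.window_s
                    or now - p.last_seen >= self.idle_s
                    or len(p.event_ids) >= self.max_events):
                merged.append({"key": key, "changes": p.changes,
                               "sourceEventIds": p.event_ids})
                del self.pending[key]
        return merged
```

Keeping `sourceEventIds` on the merged event preserves the link back to the immutable rows in the `events` table for audit.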

#### SuiteX sync pipeline to NetSuite

- A dedicated NetSuite writer consumer in SuiteX:
  - Subscribes to `events.merged`.
  - Filters for events that require NetSuite writes (for example, SuiteX-originated updates or reconciliation corrections).
  - Applies governance-aware batching and throttling as per the governance policies described in this document.
  - Issues RESTlet or REST/SOAP calls to NetSuite with idempotency keys derived from `eventId` and record identifiers.
- For each successful write to NetSuite:
  - SuiteX updates the `current_state` projection for the record.
  - Records an audit log entry, including the NetSuite response identifiers, into an `event_audit_log` style table.
  - Marks any related idempotency keys as completed.
- On failure, the writer:
  - Classifies the error (transient, validation, conflict, operational) using the error taxonomy.
  - Routes the event to the appropriate queue (retry, error, conflict) without dropping the original message or violating NetSuite constraints.

#### SuiteX sync pipeline from NetSuite

- For NetSuite-originated UE and polling events:
  - The same normalization flow runs in SuiteX, but the resulting `MergedEvent` is interpreted as a change whose authoritative source is NetSuite.
  - When applying such changes to SuiteX:
    - Orchestrator workers update SuiteX’s tenant databases and `current_state` projections.
    - Any SuiteX-side workflows that should respond to these changes subscribe to internal SuiteX domain events rather than directly to Pub/Sub.
  - SuiteX honors NetSuite’s understanding of record ownership when conflict policies dictate that NetSuite’s values win for specific fields.

#### SuiteX record locking and conflict handling

- When the merge logic detects a conflict that cannot be auto-resolved:
  - SuiteX creates a conflict record in the `conflicts` table and marks the record as locked in a `record_lock` style table.
  - Automated writers for that record are paused in SuiteX (both to NetSuite and to SuiteX itself) while still accepting and queuing new incoming events on the backbone.
  - The admin UI allows SuiteX operators or customer admins to:
    - Inspect Base, SuiteX, and NetSuite values.
    - Choose a resolution strategy or edit a merged payload.
    - Apply the resolution, which generates a resolution event and clears the lock.

#### Relationship to existing SuiteX snapshot pipeline

- During migration:
  - The existing SuiteX snapshot-based RESTlet workers continue to run using their current staging tables.
  - The new event-driven pipeline runs alongside, with SuiteX comparing `current_state` against snapshot-based projections for a canary account.
  - Any drift is surfaced in SuiteX dashboards and can be corrected via reconciliation events.
- After cutover:
  - Snapshot workers become a fallback mechanism used only for recovery or emergency backfills.
  - The durable event pipeline (UE + polling + SuiteX events) becomes the authoritative path, fully implemented within SuiteX’s workflow engine and orchestrator services.


## Error handling & escalation (end-to-end)

### Error classes
1. **Transient errors**: network blips, HTTP 5xx, rate-limits (429) — retry with exponential backoff and jitter.
2. **Permanent errors**: authorization failure, invalid payload (400), missing required fields — route to error queue and notify.
3. **Conflict errors**: detected by three-way merge and not auto-resolvable by policy — route to conflict queue for human review.
4. **Operational errors**: DB failure, event store outages — failover to standby and alert on-call.
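
For HTTP-level failures, the first two classes and the retry delay could be sketched as below. The function names, the queue names, and the base and cap values are assumptions; conflict and operational errors are raised by the merge logic and infrastructure checks rather than by status code, so they are not covered here.

```python
import random


def classify(status: int) -> str:
    """Map an HTTP status from a target write to an error class."""
    if status < 400:
        return "ok"
    if status == 429 or 500 <= status <= 599:
        return "transient"   # retry with backoff
    return "permanent"       # 400, 401, 403, and other 4xx: do not retry blindly


def queue_for(error_class: str) -> str:
    # Hypothetical queue names for illustration.
    return {"transient": "retry", "permanent": "error"}[error_class]


def backoff_delay(attempt: int, base_s: float = 1.0, cap_s: float = 60.0, rng=random) -> float:
    """Exponential backoff with full jitter: uniform in [0, min(cap, base * 2**attempt)]."""
    return rng.uniform(0, min(cap_s, base_s * 2 ** attempt))
```

Full jitter spreads retries from many workers across the whole interval, which helps avoid synchronized retry bursts against an account that is already rate-limited.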


### Guarantees & behavior
- **At-least-once delivery** of events to orchestrator; **idempotent** application to target to avoid duplicate state changes.
- **No silent failures**: every error lands in an appropriate queue (retry/backoff, dead-letter, or conflict) and is logged.
- **Notifications**: defined levels with thresholds (see next).


### Notification & Human Intervention policy
**When to notify immediately (automated alerts):**
- Authorization failures for account writes (e.g., NetSuite token invalid) — immediate notification to account owner & on-call.
- Persistent 429s causing backlog growth beyond threshold (e.g., > 10k unprocessed events or > 1 hour backlog for high-priority record types).
- High conflict rates (> X% of events) or > N critical conflicts within T minutes.
- Permanent 4xx errors for more than Y occurrences (configurable) on the same account.

**Notification channels:** email + Slack (or chosen comms) to account owner; on-call alert.

**Human resolution process (start to finish):**
1. Conflict queued with full context (Base, our changes, remote current, suggested merged payload, links to audit).
2. Notifier triggers if conflict meets threshold or manual review required.
3. Human visits Conflict UI, reviews the differences, chooses action (Accept SuiteX, Accept NetSuite, Accept suggested merge, or Edit & Apply).
4. When human applies resolution: orchestrator applies resolved change to the target(s) and clears conflict. The `current_state.version` is incremented.
5. While waiting for human resolution, any new events for that record are accepted and appended to the event stream but are blocked from automated apply until conflict resolved (record-level lock). They may be merged into the conflict payload for context.
6. If new events occur while waiting, the conflict UI should show a timeline and allow human to rebase suggested merge onto the latest base (or escalate to higher trust/rollback options).

**Escalation**: If a conflict remains unresolved beyond the human SLA window (e.g., 24 hours), escalate to a higher authority and optionally pause automated writes for that account or mark the field(s) read-only until resolved.


## Handling simultaneous changes while waiting for human intervention
- Lock record for automated applies while in state `IN_CONFLICT`.
- Accept new events and append them to event stream (they will not be auto-applied). When human resolves, orchestrator will compute merge relative to the latest remote snapshot and all queued events.
- Optionally enable a policy to auto-resolve low-risk fields while blocking only high-risk fields (granular field-level locking).
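
The record-level variant of this behaviour might be sketched as a small gate that holds events while a record is `IN_CONFLICT` and releases them on resolution. `RecordGate` and its methods are hypothetical names; a production version would keep the lock in a database table rather than in process memory, and field-level locking is not shown.

```python
from collections import defaultdict, deque
from typing import Any


class RecordGate:
    """Blocks automated applies for records that are awaiting human review."""

    def __init__(self) -> None:
        self.in_conflict: set[str] = set()
        self.held: dict[str, deque] = defaultdict(deque)

    def mark_conflict(self, key: str) -> None:
        self.in_conflict.add(key)

    def submit(self, key: str, event: Any) -> bool:
        """Return True if the event may be auto-applied now, False if it was held."""
        if key in self.in_conflict:
            self.held[key].append(event)  # still accepted, just not applied
            return False
        return True

    def resolve(self, key: str) -> list:
        """Clear the lock and return held events, in arrival order, for re-merge."""
        self.in_conflict.discard(key)
        return list(self.held.pop(key, ()))
```

On resolution, the orchestrator would merge the returned events against the latest remote snapshot before applying anything.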


## Backfill & reconciliation
- Periodic full reconciliation job per account or record type (e.g., nightly) that diffs authoritative NetSuite state vs `current_state` projection and emits reconciliation events.
- Provide a customer-initiated resync process and an admin backfill tool to repair projection mismatches.
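
A reconciliation pass for a single record could be sketched as a diff restricted to the synced fields, emitting an event only when drift exists. The `"Reconciliation"` source value and the function names are assumptions; the envelope keys follow the canonical event envelope in this document.

```python
import uuid
from datetime import datetime, timezone
from typing import Any, Iterable, Optional


def diff_synced_fields(netsuite: dict[str, Any], projected: dict[str, Any],
                       synced_fields: Iterable[str]) -> dict[str, Any]:
    """Return the NetSuite values for synced fields that differ from the projection."""
    return {f: netsuite.get(f) for f in synced_fields
            if netsuite.get(f) != projected.get(f)}


def reconciliation_event(account_id: str, record_type: str, record_id: str,
                         netsuite: dict[str, Any], projection_row: dict[str, Any],
                         synced_fields: Iterable[str]) -> Optional[dict[str, Any]]:
    """Build a correction event, or return None when the projection is in sync."""
    changes = diff_synced_fields(netsuite, projection_row["state"], synced_fields)
    if not changes:
        return None
    return {
        "eventId": str(uuid.uuid4()),
        "accountId": account_id,
        "recordType": record_type,
        "recordId": record_id,
        "eventType": "update",
        "source": "Reconciliation",   # hypothetical source label
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "baseVersion": projection_row["version"],
        "changes": changes,
    }
```

Because the correction travels through the same event stream as every other change, it is subject to the same ordering, merge, and audit rules.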


## Operational runbook (common incidents)

1. **Spike in 429s / governance**
   - Auto-scale down per-account concurrency; pause low-priority record types; alert account owner; schedule backfill.
2. **Persistent auth failure**
   - Immediately notify account owner; mark account suspended for outbound writes; retry token refresh flows; block applies until auth fixed.
3. **Conflict surge**
   - Notify product owner; route non-critical conflicts to auto-resolve policies; escalate critical conflicts.
4. **Event store outage**
   - Failover to standby queue; if unavailable, pause producers and queue locally; alert on-call.


## Security & data protection
- All external writes use TLS and stored secrets are rotated.
- Event and projection stores encrypted at rest.
- Access to conflict UI and escalation controls require RBAC and audit trails.


## Metrics & dashboards
- Events/sec, per-source
- Apply latency (event->applied)
- Conflicts/sec and conflict backlog
- Unprocessed backlog per account
- 429/error rate per account
- Retry counts and dead-letter rate


## Migration plan (from current snapshot-style)
1. Implement event store and producers in parallel with current snapshot pipeline.
2. Start producing delta events from SuiteX (and UE/poller) in addition to snapshots.
3. Implement projection from event store and validate parity with snapshot projection on a canary account.
4. Switch RESTlet workers to consume events and perform three-way merges; keep snapshot pipeline as fallback.
5. Monitor metrics, tune, and gradually widen canary roll-out.
6. Retire snapshot-only pathway after X stable days.


## Acceptance criteria (overall)
- For canary account, achieve consistent data parity for 7 days with event-based sync and <1 minute median apply latency for non-throttled record types.
- Conflict rate below acceptable threshold (configurable) or explained by business rules.
- System handles synthetic load test (configurable target e.g., 10k changes/min) without unbounded backlog growth when throttled.
- No silent failures: all errors flow to DLQ or conflict queue and are visible on dashboard.

---

# SuiteX ↔ NetSuite Sync — Technical Supplement

This supplement contains all implementation-level references designed for AI agent consumption, including event backbone details, sequence diagrams (Mermaid format), orchestration pseudocode, ERDs, schemas, configuration specs, and error taxonomy.

## Appendix A — Durable Event Backbone Architecture

## **A1. Purpose**

This appendix describes the proposed event backbone used to merge, normalize, and deliver change events between SuiteX and NetSuite. The main document defines system behavior; this appendix defines the infrastructure supporting that behavior.

The event backbone provides:

* Durable, replayable event storage
* Ordered delivery per record
* Fan-out to multiple consumers (merging, conflict resolution, sync, reconciliation, DLQ handling)
* Unified ingestion for NetSuite UE events, NetSuite polling events, and SuiteX-originated updates

The backbone is hosted **outside** both systems in our cloud environment.

---

## **A2. Platform Selection**

Given the current hosting on Google Cloud (Ubuntu app servers, Cloud SQL, Redis, GCS, etc.), the recommended durable log is:

**Google Pub/Sub**

Justification:

* Managed service with no cluster maintenance
* Horizontal scaling up to millions of messages
* At-least-once delivery with optional ordering keys
* Dead-letter topics
* Ability to replay messages (seek by time or snapshot)
* IAM-integrated security
* Works seamlessly with Cloud Run, GCE, and Cloud Functions

(Kafka or AWS Kinesis would serve the same role if environments changed.)

---

## **A3. Event Producers**

### **NetSuite UE Producer**

Triggered by User Event scripts (afterSubmit).

* UE sends minimal payload to a SuiteScript RESTlet.
* RESTlet publishes event to Pub/Sub via secure HTTPS endpoint.

### **NetSuite Polling Producer**

Runs every minute from SuiteX.

* Polls `lastmodified`-based windows.
* Emits only record deltas (after normalization).
* Publishes change events to the event bus.

### **SuiteX Producer**

On any SuiteX-originated update (create/update/delete):

* Emits a change event to the bus before or after writing to internal DB (depending on desired consistency model).

All three producers publish to the **same topic**.

---

## **A4. Event Schema**

All events use a common envelope:

```
{
  "recordType": "customer" | "item" | "salesorder" | ...,
  "recordId": "12345",
  "source": "netsuite-ue" | "netsuite-poll" | "suitex",
  "operation": "create" | "update" | "delete",
  "timestamp": "2025-11-14T20:11:00Z",
  "orderingKey": "customer:12345",
  "authContext": {...},
  "payload": {
      // Full or partial change set depending on source
  }
}
```

### Ordering Keys

Using `"orderingKey": "<type>:<id>"` ensures all events for the same record stay in sequence.

---

## **A5. Event Consumers**

Pub/Sub supports multiple independent subscribers. Our architecture needs:

### 1. **Merge / Normalize Consumer**

* Reads all events
* Deduplicates
* Coalesces multiple updates within a small window
* Produces a canonical “merged change event” to an internal topic

### 2. **NetSuite Sync Consumer**

* Consumes merged events requiring writes to NetSuite
* Handles governance-aware batching
* Retries with backoff
* Pushes failures to DLQ + SuiteX human-review workflow

### 3. **SuiteX Sync Consumer**

* Applies NetSuite-originated merged changes to SuiteX
* Idempotent updates
* Triggers rehydration on partial failures

### 4. **Reconciliation / Backfill Consumer**

* Runs in background
* Scans for drift
* Writes “correction events” back into the bus

### 5. **DLQ Handler**

* Consumes Pub/Sub dead-letter topic
* Escalates via SuiteX UI notifications (errors requiring human review)
* Locks the affected record to prevent further writes while manual resolution is pending

---

## **A6. Persistence Layer**

Beyond Pub/Sub itself:

### **Cloud SQL**

* Stores sync watermarks
* Tracks polling state
* Logs conflict-resolution decisions
* Stores human-review queues
* Maintains record-level locks during manual intervention

### **GCS/BigQuery** (optional)

* Stores all historic events for audit
* Analytics on sync patterns, volume, failure rates

---

## **A7. Security & IAM**

* Producers publish to Pub/Sub over HTTPS using restricted service accounts
* Least-privilege IAM roles
* All event payloads encrypted in transit (TLS 1.2/1.3) and at rest (GCP KMS-managed)
* Access logs stored in Cloud Logging

---

## **A8. Failure Handling**

### **At the Event Producer**

* If the NetSuite UE script cannot send an event → it logs the error locally and retries via the RESTlet; polling detection is the fallback
* If polling producer fails → next iteration recovers using watermark checkpoint

### **At the Consumer**

* Automatic retry with exponential backoff
* DLQ after threshold exceeded
* Manual intervention process initiated

### **Manual-Review Workflow**

1. Consumer writes failure context to SuiteX “Sync Error” table
2. Record locked from further writes (except UI override)
3. Notification issued via email / in-app notification
4. User resolves conflict or corrects data
5. Unlock flow publishes “resolution event” back into Pub/Sub
6. Sync resumes

---

## **A9. Why This Backbone Is Required**

* UE events are fast but incomplete
* Polling is reliable but slow and high-volume
* SuiteX events happen independently
* You need a single timeline for each record
* NetSuite’s governance limitations require intelligent retry & batching
* Event replay for audits and fixes is only possible with a durable log

Without an external durable log, conflict resolution and merging logic are fragile or impossible.

---

## **A10. Summary**

The durable event backbone is the infrastructure layer enabling enterprise-grade sync behavior between SuiteX and NetSuite:

* Hosted in GCP (not in SuiteX or NetSuite)
* Captures all changes from both systems
* Merges, deduplicates, and sequences events
* Enables consistent, low-latency bidirectional sync
* Supports error recovery, manual review, and replay

## Appendix B — Mermaid Sequence Diagrams

### B1. NetSuite UE → Event Backbone → Merge → Sync

```mermaid
sequenceDiagram
    participant UE as NetSuite UE Script
    participant RL as RESTlet Endpoint
    participant EB as Event Backbone (Pub/Sub)
    participant MG as Merge Service
    participant SX as SuiteX Sync Service
    participant NS as NetSuite Sync Service

    UE->>RL: POST Change Event
    RL->>EB: Publish Event
    EB->>MG: Deliver Event
    MG->>MG: Normalize & Merge
    MG->>SX: Emit SuiteX-bound Updates
    MG->>NS: Emit NetSuite-bound Updates
    SX->>SX: Apply to SuiteX
    NS->>NS: Apply to NetSuite
```

### B2. Polling Cycle

```mermaid
sequenceDiagram
    participant PL as SuiteX Poller
    participant NS as NetSuite Records
    participant EB as Event Backbone
    participant MG as Merge Service

    PL->>NS: Query lastmodified >= watermark
    NS->>PL: Return changed records
    PL->>EB: Publish delta events
    EB->>MG: Merge as normal
```
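
The polling cycle can be sketched as a function that takes the current watermark and returns the new one. The `fetch_changed` and `publish` callables are hypothetical placeholders for the NetSuite query and the Pub/Sub publish call, and `lastModified` is an assumed field name.

```python
from datetime import datetime, timezone
from typing import Any, Callable

Record = dict[str, Any]


def poll_once(fetch_changed: Callable[[datetime], list[Record]],
              publish: Callable[[Record], None],
              watermark: datetime) -> datetime:
    """Run one polling cycle and return the new watermark.

    The caller persists the returned value only after this function returns,
    so a cycle that fails midway is retried from the old checkpoint.
    """
    changed = fetch_changed(watermark)
    newest = watermark
    for record in sorted(changed, key=lambda r: r["lastModified"]):
        publish(record)
        newest = max(newest, record["lastModified"])
    return newest


# Usage with fake data in place of NetSuite and Pub/Sub.
t0 = datetime(2025, 11, 14, 12, 0, tzinfo=timezone.utc)
t1 = datetime(2025, 11, 14, 12, 0, 30, tzinfo=timezone.utc)
rows = [{"recordId": "42", "lastModified": t1}]
published: list[Record] = []
new_watermark = poll_once(
    lambda since: [r for r in rows if r["lastModified"] >= since],
    published.append,
    t0,
)
```

Because the query uses `>=`, a record modified exactly at the watermark is fetched again on the next cycle, so downstream consumers need to tolerate duplicates.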

### B3. SuiteX → NetSuite Write Cycle

```mermaid
sequenceDiagram
    participant SX as SuiteX App
    participant EB as Event Backbone
    participant MG as Merge Service
    participant NS as NetSuite Sync Service

    SX->>EB: Publish create/update/delete event
    EB->>MG: Deliver
    MG->>NS: Emit canonical NetSuite write
    NS->>NS: Perform governance-aware write
```

### B4. Conflict Handling

```mermaid
sequenceDiagram
    participant MG as Merge Service
    participant CQ as Conflict Queue
    participant UI as SuiteX Admin UI
    participant EB as Event Backbone

    MG->>CQ: Submit conflict record
    UI->>CQ: Fetch conflict
    UI->>UI: Human resolution
    UI->>EB: Publish resolution event
    EB->>MG: Resume merging
```

## Appendix C — Canonical Data Models (JSON Schema)

### C1. ChangeEvent Schema

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "ChangeEvent",
  "type": "object",
  "required": ["recordType", "recordId", "source", "operation", "timestamp", "orderingKey", "payload"],
  "properties": {
    "recordType": {"type": "string"},
    "recordId": {"type": "string"},
    "source": {"enum": ["netsuite-ue", "netsuite-poll", "suitex"]},
    "operation": {"enum": ["create", "update", "delete"]},
    "timestamp": {"type": "string", "format": "date-time"},
    "orderingKey": {"type": "string"},
    "payload": {"type": "object"}
  }
}
```
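
To make the schema concrete, here is a sample event plus a small standard-library check that mirrors the required fields and enums above. It is a sketch rather than a full JSON Schema validator; `validate_change_event` is a hypothetical helper, the sample values are made up, and `datetime.fromisoformat` only approximates the `date-time` format.

```python
from datetime import datetime
from typing import Any

REQUIRED = ("recordType", "recordId", "source", "operation",
            "timestamp", "orderingKey", "payload")
SOURCES = {"netsuite-ue", "netsuite-poll", "suitex"}
OPERATIONS = {"create", "update", "delete"}


def validate_change_event(event: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the event passed."""
    errors = [f"missing field: {name}" for name in REQUIRED if name not in event]
    if errors:
        return errors
    if event["source"] not in SOURCES:
        errors.append(f"unknown source: {event['source']}")
    if event["operation"] not in OPERATIONS:
        errors.append(f"unknown operation: {event['operation']}")
    if not isinstance(event["payload"], dict):
        errors.append("payload must be an object")
    try:
        datetime.fromisoformat(event["timestamp"])
    except (TypeError, ValueError):
        errors.append("timestamp must be an ISO 8601 date-time")
    return errors


sample = {
    "recordType": "customer",
    "recordId": "42",
    "source": "netsuite-ue",
    "operation": "update",
    "timestamp": "2025-11-14T12:00:00+00:00",
    "orderingKey": "customer:42",  # assumed key format
    "payload": {"email": "a@example.com"},
}
problems = validate_change_event(sample)
```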

### C2. MergedEvent Schema

```json
{
  "title": "MergedEvent",
  "type": "object",
  "required": ["recordType", "recordId", "changes", "sourceEvents"],
  "properties": {
    "recordType": {"type": "string"},
    "recordId": {"type": "string"},
    "changes": {"type": "object"},
    "sourceEvents": {
      "type": "array",
      "items": {"$ref": "#/definitions/changeEvent"}
    }
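  },
  "definitions": {
    "changeEvent": {
      "type": "object",
      "required": ["recordType", "recordId", "source", "operation", "timestamp", "orderingKey", "payload"],
      "properties": {
        "recordType": {"type": "string"},
        "recordId": {"type": "string"},
        "source": {"enum": ["netsuite-ue", "netsuite-poll", "suitex"]},
        "operation": {"enum": ["create", "update", "delete"]},
        "timestamp": {"type": "string", "format": "date-time"},
        "orderingKey": {"type": "string"},
        "payload": {"type": "object"}
      }
    }
  }
}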
```

### C3. ConflictRecord Schema

```json
{
  "title": "ConflictRecord",
  "type": "object",
  "properties": {
    "recordType": {"type": "string"},
    "recordId": {"type": "string"},
    "field": {"type": "string"},
    "suiteXValue": {},
    "netSuiteValue": {},
    "timestamp": {"type": "string", "format": "date-time"}
  }
}
```

## Appendix D — ERD (Mermaid Format)

```mermaid
erDiagram
    sync_watermark {
        string record_type
        datetime last_polled
        datetime last_event_seen
    }

    event_audit_log {
        string event_id
        string record_type
        string record_id
        string source
        json payload
        datetime created_at
    }

    sync_error_queue {
        string id
        string record_type
        string record_id
        string reason
        json details
        datetime created_at
        string status
    }

    record_lock {
        string record_type
        string record_id
        boolean locked
        datetime locked_at
        string reason
    }
```
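
The entities above could be created as tables along these lines. This sketch uses an in-memory SQLite database purely so it runs anywhere; the primary keys, `NOT NULL` constraints, and text-typed timestamps and JSON columns are assumptions, and a Cloud SQL deployment would use its own dialect's types.

```python
import sqlite3

DDL = """
CREATE TABLE sync_watermark (
    record_type     TEXT PRIMARY KEY,
    last_polled     TEXT,
    last_event_seen TEXT
);
CREATE TABLE event_audit_log (
    event_id    TEXT PRIMARY KEY,
    record_type TEXT NOT NULL,
    record_id   TEXT NOT NULL,
    source      TEXT NOT NULL,
    payload     TEXT NOT NULL,  -- JSON stored as text in this sketch
    created_at  TEXT NOT NULL
);
CREATE TABLE sync_error_queue (
    id          TEXT PRIMARY KEY,
    record_type TEXT NOT NULL,
    record_id   TEXT NOT NULL,
    reason      TEXT,
    details     TEXT,           -- JSON stored as text in this sketch
    created_at  TEXT NOT NULL,
    status      TEXT NOT NULL
);
CREATE TABLE record_lock (
    record_type TEXT NOT NULL,
    record_id   TEXT NOT NULL,
    locked      INTEGER NOT NULL DEFAULT 0,
    locked_at   TEXT,
    reason      TEXT,
    PRIMARY KEY (record_type, record_id)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(DDL)
tables = sorted(row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"))
```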

## Appendix E — Topic & Consumer Map

### Topics

* `events.raw` — UE, polling, SuiteX producers
* `events.merged` — output of merge service
* `events.error` — retry-able errors
* `events.dlq` — dead-letter

### Consumers

* `merge-service` — normalizes and produces canonical merged events
* `netsuite-writer` — governance-aware writing
* `suitex-writer` — apply NS → SX updates
* `reconciliation-service` — background drift correction
* `error-handler` — retries + escalation

## Appendix F — Orchestrator Pseudocode

### F1. Merge Orchestrator

```
function processEvent(event):
    key = event.recordType + ":" + event.recordId

    existing = readPendingChanges(key)

    if existing:
        merged = merge(existing, event)
    else:
        merged = event

    writePendingChanges(key, merged)

    if shouldFlush(key):
        emitMergedEvent(merged)
        clearPending(key)
```
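
The same coalescing loop as a small runnable Python sketch. The `merge` function here simply lets later field values win, which is a simplification and not the three-way merge described in the main design; the `flush` flag stands in for `shouldFlush(key)`, and the in-memory `pending` and `emitted` stores are hypothetical.

```python
from typing import Any

Event = dict[str, Any]

pending: dict[str, Event] = {}
emitted: list[Event] = []


def merge(existing: Event, event: Event) -> Event:
    """Coalesce two events for one record; later field values win."""
    merged = dict(existing)
    merged["payload"] = {**existing["payload"], **event["payload"]}
    merged["timestamp"] = event["timestamp"]
    return merged


def process_event(event: Event, flush: bool) -> None:
    key = f"{event['recordType']}:{event['recordId']}"
    existing = pending.get(key)
    merged = merge(existing, event) if existing else event
    pending[key] = merged
    if flush:  # stands in for shouldFlush(key), e.g. a debounce timer firing
        emitted.append(merged)
        del pending[key]


# Two rapid edits to the same record collapse into one merged event.
process_event({"recordType": "customer", "recordId": "42",
               "timestamp": "2025-11-14T12:00:00+00:00",
               "payload": {"email": "a@example.com"}}, flush=False)
process_event({"recordType": "customer", "recordId": "42",
               "timestamp": "2025-11-14T12:00:05+00:00",
               "payload": {"phone": "555-0100"}}, flush=True)
```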

### F2. NetSuite Writer

```
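function writeToNetSuite(mergedEvent):
    # Locks are keyed by (recordType, recordId), matching the record_lock table
    if recordLocked(mergedEvent.recordType, mergedEvent.recordId):
        queueForLater(mergedEvent)
        return

    try:
        netsuiteApi.update(mergedEvent)
    catch GovernanceLimit:
        # Transient: retry on the backoff schedule
        retryWithBackoff(mergedEvent)
    catch ValidationError e:
        # Needs a human to correct the data
        pushToErrorQueue(mergedEvent, e)
    catch Exception e:
        # Anything unclassified goes to the dead-letter topic
        sendToDLQ(mergedEvent, e)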
```

### F3. Human-Intervention Flow

```
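onErrorQueueItem(item):
    # Locks are keyed by (recordType, recordId), matching the record_lock table
    createLock(item.recordType, item.recordId)
    notifyAdmins(item)

adminResolvesConflict(item, resolution):
    applyResolution(item, resolution)
    publishResolutionEvent(item)
    releaseLock(item.recordType, item.recordId)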
```
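
A minimal Python sketch of the lock lifecycle, keyed by record type and record ID as in the `record_lock` table. The in-memory `locks` dict and `resolution_events` list are hypothetical stand-ins for the database table and the Pub/Sub publish call.

```python
from datetime import datetime, timezone

LockKey = tuple[str, str]  # (record_type, record_id)

locks: dict[LockKey, dict[str, str]] = {}
resolution_events: list[dict[str, str]] = []


def create_lock(record_type: str, record_id: str, reason: str) -> None:
    locks[(record_type, record_id)] = {
        "reason": reason,
        "locked_at": datetime.now(timezone.utc).isoformat(),
    }


def is_locked(record_type: str, record_id: str) -> bool:
    return (record_type, record_id) in locks


def resolve(record_type: str, record_id: str, resolution: str) -> None:
    """Record the resolution event first, then release the lock."""
    resolution_events.append({
        "recordType": record_type,
        "recordId": record_id,
        "resolution": resolution,
    })
    locks.pop((record_type, record_id), None)


create_lock("customer", "42", "validation error")
was_locked = is_locked("customer", "42")
resolve("customer", "42", "keep SuiteX value")
```

Publishing before releasing means writes queued behind the lock cannot overtake the resolution event.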

## Appendix G — Error Taxonomy

### Transient Errors

* NetSuite governance exceeded
* Timeout
* Network issues

### Validation Errors

* Required fields missing
* Business rules violated

### Conflict Errors

* SuiteX vs NetSuite concurrent update collision

### Irrecoverable Errors

* Record deleted but referenced
* Corrupted payload
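
The taxonomy can be expressed as a lookup from error code to class, and from class to handling route. The error codes below are hypothetical labels invented for this sketch (real NetSuite error codes would need to be mapped onto them), and sending unknown codes to the DLQ is an assumed default.

```python
from enum import Enum


class ErrorClass(Enum):
    TRANSIENT = "transient"
    VALIDATION = "validation"
    CONFLICT = "conflict"
    IRRECOVERABLE = "irrecoverable"


# Hypothetical error codes, one per bullet in the taxonomy above.
ERROR_CLASSES = {
    "GOVERNANCE_EXCEEDED": ErrorClass.TRANSIENT,
    "TIMEOUT": ErrorClass.TRANSIENT,
    "NETWORK": ErrorClass.TRANSIENT,
    "REQUIRED_FIELD_MISSING": ErrorClass.VALIDATION,
    "BUSINESS_RULE_VIOLATED": ErrorClass.VALIDATION,
    "CONCURRENT_UPDATE": ErrorClass.CONFLICT,
    "DELETED_BUT_REFERENCED": ErrorClass.IRRECOVERABLE,
    "CORRUPTED_PAYLOAD": ErrorClass.IRRECOVERABLE,
}

ROUTES = {
    ErrorClass.TRANSIENT: "retry",
    ErrorClass.VALIDATION: "error-queue",
    ErrorClass.CONFLICT: "conflict-queue",
    ErrorClass.IRRECOVERABLE: "dlq",
}


def route_error(code: str) -> str:
    """Map an error code to a handling route; unknown codes go to the DLQ."""
    return ROUTES[ERROR_CLASSES.get(code, ErrorClass.IRRECOVERABLE)]


route = route_error("TIMEOUT")
```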

## Appendix H — Governance-Aware NetSuite Policies

* Maximum batch size: 10 writes/operation
* Backoff schedule: 5s, 15s, 30s, 60s, 5m
* Switch to queued writes after 3 consecutive failures
* Cap governance usage at 50% of the daily budget per integration
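
The batch size and backoff schedule translate directly into two small helpers. This is a sketch under the values listed above; the names `backoff_delay` and `batches` are hypothetical, and returning `None` once the schedule is exhausted is an assumed convention for handing the event off to the error path.

```python
from typing import Iterator, Optional, Sequence, TypeVar

T = TypeVar("T")

BACKOFF_SECONDS = (5, 15, 30, 60, 300)  # 5s, 15s, 30s, 60s, 5m
MAX_BATCH_SIZE = 10


def backoff_delay(attempt: int) -> Optional[int]:
    """Delay before retry number `attempt` (1-based), or None when exhausted."""
    if 1 <= attempt <= len(BACKOFF_SECONDS):
        return BACKOFF_SECONDS[attempt - 1]
    return None


def batches(items: Sequence[T], size: int = MAX_BATCH_SIZE) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


batch_sizes = [len(batch) for batch in batches(list(range(25)))]
```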

## Appendix I — Configuration & Environment Spec

* `PUBSUB_TOPIC_RAW`
* `PUBSUB_TOPIC_MERGED`
* `NETSUITE_RESTLET_URL`
* `NETSUITE_ACCOUNT`
* `NETSUITE_SCRIPT_ID`
* `NETSUITE_DEPLOY_ID`
* `POLL_INTERVAL_SECONDS`
* `MAX_RETRY_ATTEMPTS`
* `GOVERNANCE_THRESHOLD`
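
A sketch of loading these settings into a typed object at startup. The value types (integers for the interval and retry count, a float fraction for the governance threshold) are assumptions, as are the `SyncConfig` and `load_config` names and every example value below.

```python
import os
from dataclasses import dataclass
from typing import Mapping


@dataclass(frozen=True)
class SyncConfig:
    pubsub_topic_raw: str
    pubsub_topic_merged: str
    netsuite_restlet_url: str
    netsuite_account: str
    netsuite_script_id: str
    netsuite_deploy_id: str
    poll_interval_seconds: int
    max_retry_attempts: int
    governance_threshold: float


def load_config(env: Mapping[str, str] = os.environ) -> SyncConfig:
    """Read every setting from `env`; a missing key raises KeyError."""
    return SyncConfig(
        pubsub_topic_raw=env["PUBSUB_TOPIC_RAW"],
        pubsub_topic_merged=env["PUBSUB_TOPIC_MERGED"],
        netsuite_restlet_url=env["NETSUITE_RESTLET_URL"],
        netsuite_account=env["NETSUITE_ACCOUNT"],
        netsuite_script_id=env["NETSUITE_SCRIPT_ID"],
        netsuite_deploy_id=env["NETSUITE_DEPLOY_ID"],
        poll_interval_seconds=int(env["POLL_INTERVAL_SECONDS"]),
        max_retry_attempts=int(env["MAX_RETRY_ATTEMPTS"]),
        governance_threshold=float(env["GOVERNANCE_THRESHOLD"]),
    )


# Made-up example values, passed explicitly instead of reading os.environ.
example_env = {
    "PUBSUB_TOPIC_RAW": "events.raw",
    "PUBSUB_TOPIC_MERGED": "events.merged",
    "NETSUITE_RESTLET_URL": "https://example.invalid/restlet",
    "NETSUITE_ACCOUNT": "TEST_ACCOUNT",
    "NETSUITE_SCRIPT_ID": "1",
    "NETSUITE_DEPLOY_ID": "1",
    "POLL_INTERVAL_SECONDS": "30",
    "MAX_RETRY_ATTEMPTS": "5",
    "GOVERNANCE_THRESHOLD": "0.5",
}
config = load_config(example_env)
```

Failing fast on a missing or malformed setting at startup is usually preferable to discovering it mid-sync.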

---

*End of design document.*

## Change Log – V1

- **Version tag**: Updated the document metadata to `Version: 1` while preserving the original title and goals.
- **High-level architecture**: Expanded the SuiteX-specific change capture and snapshot fallback bullets, and clarified that SuiteX uses Pub/Sub plus Cloud SQL for durable event storage while keeping NetSuite producers and constraints unchanged.
- **SuiteX outbound & idempotency**: Elaborated the `SuiteX outbound` section to describe a transactional outbox pattern, SuiteX payload compatibility, and concrete idempotency primitives (idempotency key tracking, audit logs, and projection-aware writes) without altering any NetSuite behavior.
- **Workflow engine & sync pipeline**: Added a new section `SuiteX workflow engine & sync pipeline (implementation view)` detailing SuiteX’s event ingestion, normalization, merge, conflict handling, and bidirectional sync pipelines, explicitly mapping them onto SuiteX’s multi-tenant Cloud SQL, Redis, Pub/Sub, and worker processes.
- **Conflict handling & migration narrative**: Clarified SuiteX’s responsibility for conflict UI, record locking, and coexistence with the existing snapshot pipeline during migration, ensuring the overall pipeline remains technically implementable inside SuiteX and fully compatible with all existing NetSuite-specific sections and constraints.

