# Epic 1: Gap Analysis Against V10 Design Document

**Date:** 2026-03-12
**Scope:** Comparison of all Epic 1 files (`design-spec.md`, `tech-spec.md`, `jira-ticket.md`, `jira-ticket-2.md`) against `docs/designs/data-sync/V10.md`.

---

## Overall Assessment

The Epic 1 files accurately capture the core intent of V10's Anti-Corruption Layer: strict JSON Schema validation, field-level normalization for NetSuite data type quirks, a multi-tenant `field_metadata` table, and two-stage validation. The normalization rules (checkbox, date, multiselect) directly implement V10 Appendix K requirements. The `field_metadata` DDL matches V10 Appendix P2.2.

However, the envelope schema has several field-level discrepancies against V10, and V10 itself is internally inconsistent in three places (Gaps 4, 7, and 8) that the epic needs to resolve explicitly rather than inherit silently. Each gap below includes the V10 reference, the current state in Epic 1, and the required correction.

---

## Gap 1: Missing `orderingKey` Field (Critical)

**V10 Reference:** Appendix C1 ChangeEvent Schema (line 878) lists `orderingKey` as a **required** field. Appendix A4 (line 684) defines its format as `"<type>:<id>"`. Appendix N and the durable merge model depend on this field for Pub/Sub per-record ordering.

**Current State:** The Epic 1 envelope schema in `tech-spec.md` does not include `orderingKey` at all -- not as required, not as optional.

**Impact:** Without `orderingKey`, Pub/Sub cannot enforce per-record message ordering. Events for the same record may arrive out of order, corrupting the `current_state` projection and breaking the three-way merge.

**Required Change:** Add `orderingKey` to the Canonical Envelope Schema as a **required** string field:

```json
"orderingKey": {
  "type": "string",
  "pattern": "^[a-z_]+:[a-zA-Z0-9_-]+$",
  "description": "Pub/Sub ordering key in the format 'recordType:recordId'. Required for per-record serial processing."
}
```

Add `"orderingKey"` to the `required` array.
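
The pattern above constrains only the shape of the key; a draft-07 schema cannot on its own require that `orderingKey` agree with the event's `recordType` and `recordId`. A minimal Python sketch of how an emitter or validator might derive and cross-check the key follows. The function names are hypothetical and not part of V10 or the Epic 1 files.

```python
import re

# Same pattern as the proposed schema fragment for orderingKey.
ORDERING_KEY_PATTERN = re.compile(r"^[a-z_]+:[a-zA-Z0-9_-]+$")


def build_ordering_key(record_type: str, record_id: str) -> str:
    """Build '<recordType>:<recordId>' and check it against the schema pattern."""
    key = f"{record_type}:{record_id}"
    if not ORDERING_KEY_PATTERN.fullmatch(key):
        raise ValueError(f"orderingKey {key!r} does not match the envelope pattern")
    return key


def ordering_key_matches_event(event: dict) -> bool:
    """True when the event's orderingKey agrees with its recordType and recordId."""
    expected = f"{event['recordType']}:{event['recordId']}"
    return event.get("orderingKey") == expected
```

Note that the pattern accepts only lowercase letters and underscores before the colon, so a record type such as `Project` would be rejected unless emitters lowercase it first.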

**Affected Files:** `tech-spec.md` (envelope schema), `design-spec.md` (requirements section A), `jira-ticket.md` (technical requirements).

---

## Gap 2: Missing `reconciliation` Source Value

**V10 Reference:** The durable merge model (lines 355-357) defines four event source classes:
- `suitex`
- `netsuite-ue`
- `netsuite-poll`
- `reconciliation` (originating from SuiteX reconciliation/backfill jobs)

**Current State:** The Epic 1 envelope schema defines `source` as `enum: ["netsuite-ue", "netsuite-poll", "suitex"]` -- three values, missing `reconciliation`.

**Impact:** Reconciliation/backfill events cannot be published to `events.raw` without failing Stage 1 structural validation. V10 grants reconciliation events **highest priority** when correcting drift (line 406), so blocking them at the schema level is a functional gap.

**Required Change:** Update the `source` enum:

```json
"source": {
  "type": "string",
  "enum": ["netsuite-ue", "netsuite-poll", "suitex", "reconciliation"]
}
```

**Affected Files:** `tech-spec.md` (envelope schema), `jira-ticket.md`.

---

## Gap 3: `actorId` Should Be in v1, Not v1.1

**V10 Reference:** The canonical envelope (line 203) includes `actorId: "8832"` as a top-level field alongside all other required fields in the same schema version (`v1.1`).

**Current State:** `jira-ticket-2.md` correctly identifies the missing `actorId` but proposes adding it as a v1.1 schema migration. The base Epic 1 schema in `tech-spec.md` ships v1 without it.

**Impact:** Since neither the schema nor any emitters have been deployed yet, introducing a schema version bump before the first release creates unnecessary versioning complexity. The merge service would need to support both v1 and v1.1 simultaneously from day one. V10's schema versioning strategy (P2.9: "Deploy consumer supporting both versions... Deprecate old version after 30 days") exists to migrate away from a version that is already in production, and there is nothing to deprecate if v1 never shipped.

**Required Change:** Fold `actorId` directly into the v1 schema as a required field. Retire `jira-ticket-2.md` as a separate work item and incorporate its requirements into the base Epic 1 deliverables.

```json
"actorId": {
  "type": "string",
  "minLength": 1,
  "description": "The SuiteX user ID of the actor who triggered the change, or 'system' for automated processes."
}
```

Add `"actorId"` to the `required` array.

**Affected Files:** `tech-spec.md` (envelope schema), `design-spec.md` (requirements section A), `jira-ticket.md` (technical requirements). `jira-ticket-2.md` becomes obsolete.

---

## Gap 4: `eventType` vs `operation` Naming Inconsistency

**V10 Reference:** V10 itself is internally inconsistent on this field name:
- Main document canonical envelope (line 195): `"eventType": "update"`
- Appendix C1 ChangeEvent Schema (line 883): `"operation": {"enum": ["create", "update", "delete"]}`
- Idempotency key format (P2.1, line 2440): `<source>:<recordType>:<recordId>:<eventTimestamp>:<operation>`
- UE script event emission (Appendix J, line 1181): `operation: mapContextType(context.type)`

Three of V10's four references use `operation`. The main envelope example appears to use an earlier name that the later appendices superseded with the more specific `operation`.

**Current State:** Epic 1 uses `eventType` (matching the main envelope example). This works but will create a naming mismatch with the idempotency key format and the UE/polling emitters built in later epics.

**Required Change:** Rename `eventType` to `operation` in the envelope schema for consistency with V10's ChangeEvent schema, idempotency key format, and emitter contracts. The enum values remain `["create", "update", "delete"]`.

```json
"operation": {
  "type": "string",
  "enum": ["create", "update", "delete"],
  "description": "The type of mutation. Aligns with V10 Appendix C1 and the canonical idempotency key format."
}
```

Update the `required` array to reference `"operation"` instead of `"eventType"`.
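
To illustrate why the field name matters downstream, here is a minimal Python sketch of building the P2.1 idempotency key from an envelope. It assumes the envelope's `timestamp` field supplies the `<eventTimestamp>` segment, which V10 as quoted here does not state explicitly; the function name is hypothetical.

```python
def build_idempotency_key(event: dict) -> str:
    """Join envelope fields in the order given by the P2.1 key format:
    <source>:<recordType>:<recordId>:<eventTimestamp>:<operation>
    """
    parts = (
        event["source"],
        event["recordType"],
        event["recordId"],
        event["timestamp"],  # assumption: supplies <eventTimestamp>
        event["operation"],  # raises KeyError if an emitter still sends eventType
    )
    return ":".join(parts)
```

An emitter that still publishes `eventType` fails here with a `KeyError`, which is the naming mismatch this gap describes.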

**Affected Files:** `tech-spec.md` (envelope schema), `design-spec.md` (requirements section A), `jira-ticket.md` (technical requirements).

---

## Gap 5: `source` vs `sourceSystem` Distinction Not Documented

**V10 Reference:** V10 uses two separate fields with different semantics:
- `source` (line 196, C1 line 882): The **physical origin** of event emission -- which technical component published the event. Values: `netsuite-ue`, `netsuite-poll`, `suitex`, `reconciliation`.
- `sourceSystem` (line 201, Circular Update appendix section 2): The **logical attribution** -- who initiated the business change. Values: `suitex`, `netsuite`, `workflow`, `user`.

These are intentionally different. A NetSuite UE script (`source = netsuite-ue`) might emit an event where the logical actor was a workflow (`sourceSystem = workflow`) or a human user (`sourceSystem = user`).

**Current State:** The Epic 1 schema includes both fields with correct enums, but neither `design-spec.md` nor `tech-spec.md` explicitly documents why two seemingly overlapping fields exist or how they differ. A developer implementing emitters could confuse them or assume one is redundant.

**Required Change:** Add a documentation section to `tech-spec.md` explaining the `source` / `sourceSystem` distinction:

> **`source`** identifies the technical emission pathway (which producer published the event). It tells the merge service *how* the event entered the system.
>
> **`sourceSystem`** identifies the logical business actor (who initiated the change). It tells the circular update prevention layer *why* the change happened and whether it should be suppressed.
>
> Example: A NetSuite workflow updates a record. The UE script fires and publishes the event. The event has `source: "netsuite-ue"` (because the UE script emitted it) and `sourceSystem: "workflow"` (because a workflow initiated the change, not a human user or SuiteX).

**Affected Files:** `tech-spec.md`, `design-spec.md`.

---

## Gap 6: `patternProperties` Regex Only Covers One Custom Field Prefix

**V10 Reference:** Appendix K7 (lines 1687-1692) identifies five NetSuite custom field prefixes:
- `custbody_` (body-level / transaction fields)
- `custentity_` (entity-level fields -- Customer, Vendor, Employee)
- `custitem_` (item-level fields)
- `custcol_` (column/line-level fields)
- `custevent_` (CRM event fields)

**Current State:** The tech-spec's Customer example schema only includes `^custentity_[a-zA-Z0-9_]+$` in `patternProperties`. The deliverables scope to Project and ProjectTask, which would use `custbody_` (body-level project fields), not `custentity_`.

**Impact:** Payload schemas for Project and ProjectTask would reject legitimate custom fields with `custbody_` prefix at Stage 1 validation, routing valid events to the DLQ.

**Required Change:** Each payload schema must include `patternProperties` entries for all applicable custom field prefixes for that record type. For Project and ProjectTask (body-level transaction-adjacent records):

```json
"patternProperties": {
  "^custbody_[a-zA-Z0-9_]+$": {
    "description": "Body-level custom fields. Type validation deferred to Stage 2."
  },
  "^custcol_[a-zA-Z0-9_]+$": {
    "description": "Column-level custom fields (sublists). Type validation deferred to Stage 2."
  }
}
```

The tech-spec should include a reference table mapping record types to their applicable custom field prefixes so future payload schemas are built correctly.
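
One way such a reference table could drive schema generation is sketched below in Python. The table contents are an assumption for illustration: the `project` and `projecttask` rows repeat the prefixes proposed in this gap, and the `customer` row repeats the existing example schema. The names `CUSTOM_FIELD_PREFIXES`, `custom_field_patterns`, and `is_allowed_custom_field` are hypothetical.

```python
import re

# Hypothetical lookup table of record type -> applicable custom field prefixes.
CUSTOM_FIELD_PREFIXES = {
    "project": ("custbody_", "custcol_"),
    "projecttask": ("custbody_", "custcol_"),
    "customer": ("custentity_",),
}


def custom_field_patterns(record_type: str) -> list[str]:
    """Return the patternProperties regexes for one record type."""
    prefixes = CUSTOM_FIELD_PREFIXES[record_type]
    return [f"^{prefix}[a-zA-Z0-9_]+$" for prefix in prefixes]


def is_allowed_custom_field(record_type: str, field_id: str) -> bool:
    """True when field_id matches one of the record type's custom field patterns."""
    return any(re.search(p, field_id) for p in custom_field_patterns(record_type))
```

Generating the `patternProperties` keys from a single table keeps each payload schema consistent with the others as new record types are added.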

**Affected Files:** `tech-spec.md` (payload schemas and example).

---

## Gap 7: `baseVersion` Type Mismatch Between V10 Sections

**V10 Reference:** V10 is internally inconsistent on the type of `baseVersion`:
- Main envelope (line 198): `"baseVersion": "v123"` (string with `v` prefix)
- Events table DDL (line 249): `base_version bigint`
- current_state table DDL (line 224): `version bigint`
- Orchestrator algorithm (line 304): Compares `remoteVersion == baseVersion` (implies numeric comparison)

**Current State:** The Epic 1 tech-spec defines `baseVersion` as `"type": "string"`, matching the main envelope example.

**Impact:** If `baseVersion` is a string in the JSON schema but `bigint` in the database, every read/write requires type coercion. More critically, the three-way merge algorithm performs equality and ordering comparisons on versions -- string comparison of `"v123"` vs `"v124"` is lexicographic, not numeric, which breaks version ordering for values like `"v9"` vs `"v10"`.
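
The ordering failure is easy to reproduce. The short Python illustration below uses made-up version values in the `"v<number>"` style of the main envelope example:

```python
# Prefixed string versions compare character by character: '1' sorts before '9',
# so "v10" is ordered ahead of "v9".
string_versions = ["v9", "v10", "v123", "v124"]
print(sorted(string_versions))  # ['v10', 'v123', 'v124', 'v9']
print("v9" < "v10")             # False

# Integer versions compare numerically, matching the bigint columns.
int_versions = [9, 10, 123, 124]
print(sorted(int_versions))     # [9, 10, 123, 124]
print(9 < 10)                   # True
```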

**Required Change:** Align on one type. Given that the database stores versions as `bigint` and the merge algorithm requires numeric comparison, `baseVersion` should be an integer in the JSON schema:

```json
"baseVersion": {
  "type": ["integer", "null"],
  "minimum": 0,
  "description": "The current_state projection version this event was derived from. Null if unavailable (e.g., first sync or UE without projection access). Used for three-way merge base resolution."
}
```

This is a V10 internal inconsistency that Epic 1 must resolve since it is defining the contract that all downstream systems will use.

**Affected Files:** `tech-spec.md` (envelope schema).

---

## Gap 8: `fullSnapshotRef` References S3 but Infrastructure Uses GCS

**V10 Reference:** The main envelope example (line 200) shows `"fullSnapshotRef": "s3://.../snapshots/..."`. However, V10's infrastructure section (Appendix A2, lines 618-620) specifies Google Cloud Platform, and the snapshot fallback description (line 40) says "object storage (for example, GCS)."

**Current State:** Epic 1's tech-spec carries the `fullSnapshotRef` field without specifying the URI scheme. The field description says "Optional URI pointing to a full state backup in cloud storage."

**Impact:** Minor. The `s3://` prefix in V10's example appears to be a placeholder or copy error. Since SuiteX runs on GCP, the actual URI scheme will be `gs://`.

**Required Change:** Document in the tech-spec that `fullSnapshotRef` uses `gs://` URI scheme for GCS:

```json
"fullSnapshotRef": {
  "type": "string",
  "pattern": "^gs://",
  "description": "GCS URI pointing to a full state snapshot. Used when deltas cannot be reliably generated."
}
```

**Affected Files:** `tech-spec.md`.

---

## Gap 9: `field_metadata` DDL Missing Database Placement and Defaults

**V10 Reference:** V10's P2.2 (lines 2449-2461) describes `field_metadata` as a "central field metadata table scoped by tenant." Core project standards (lines 68-74) require tenant-specific migrations in `database/migrations/tenants/` while core tables go in `database/migrations/`.

**Current State:** The tech-spec provides the DDL but does not specify:
- Which database it belongs to (root vs. tenant)
- NOT NULL constraints
- DEFAULT values for boolean columns
- Index strategy for query patterns used by Stage 2 validation

**Impact:** Ambiguity in migration placement. The table is centralized with `account_id` as a discriminator (not one table per tenant), so it belongs in the **root database** (`database/migrations/`), not in tenant migrations.

**Required Change:** Clarify in the tech-spec:
1. Migration placement: `database/migrations/` (root database, centralized)
2. Add NOT NULL constraints and defaults:

```sql
CREATE TABLE field_metadata (
    account_id text NOT NULL,
    record_type text NOT NULL,
    field_id text NOT NULL,
    field_type text NOT NULL,
    is_synced boolean NOT NULL DEFAULT false,
    is_readonly boolean NOT NULL DEFAULT false,
    normalization_rule text,
    conflict_policy text NOT NULL DEFAULT 'manual',
    PRIMARY KEY (account_id, record_type, field_id)
);

CREATE INDEX idx_field_metadata_sync_lookup
    ON field_metadata (account_id, record_type)
    WHERE is_synced = true;
```

The `DEFAULT 'manual'` for `conflict_policy` follows V10's guidance that `manual` is the safe default for critical fields (line 404).

**Affected Files:** `tech-spec.md`, `jira-ticket.md`.

---

## Gap 10: No Mention of Computed/Formula Field Exclusion Enforcement

**V10 Reference:** Appendix K8 (lines 1694-1703) states: "Do not include [formula/computed fields] in outbound change events. Mark as read-only in field mappings. SuiteX should never attempt to write these fields."

**Current State:** The tech-spec briefly mentions "Computed/Formula Fields: System-calculated fields must be excluded entirely from outbound change events." However, there is no enforcement mechanism defined. The `is_readonly` column in `field_metadata` exists, but the spec does not describe how Stage 2 validation uses it to keep computed fields out of outbound `changes` payloads.

**Required Change:** Add a validation rule to the Two-Stage Validation section:

> **Stage 2 — Computed Field Rejection (Outbound Only):** For events with `source = "suitex"`, the validator must cross-reference every key in the `changes` object against `field_metadata`. If any key maps to a field where `is_readonly = true`, the validator must strip that field from the payload and log a warning. This prevents SuiteX from attempting to overwrite NetSuite-computed fields (e.g., transaction totals, formula fields).

This is distinct from DLQ routing -- read-only fields in outbound events are stripped and logged at warning level, not treated as permanent errors.
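
The rule can be sketched in Python as follows. This is a minimal illustration, not the validator's actual design: `readonly_field_ids` stands in for a `field_metadata` lookup of rows with `is_readonly = true` for the event's account and record type, and the function and logger names are hypothetical.

```python
import logging

logger = logging.getLogger("stage2")


def strip_readonly_fields(event: dict, readonly_field_ids: set[str]) -> dict:
    """Return the event with read-only keys removed from `changes`.

    Only outbound events (source == "suitex") are filtered; all other
    events are returned unchanged.
    """
    if event.get("source") != "suitex":
        return event
    changes = event.get("changes", {})
    kept = {k: v for k, v in changes.items() if k not in readonly_field_ids}
    for field_id in sorted(changes.keys() - kept.keys()):
        logger.warning(
            "Stripped read-only field %s from event %s",
            field_id,
            event.get("eventId"),
        )
    # Build a new dict so the caller's original event is not mutated.
    return {**event, "changes": kept}
```

Returning a new dictionary rather than mutating the input keeps the original payload available for audit logging, should the tech-spec require that.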

**Affected Files:** `tech-spec.md`, `design-spec.md` (functional requirements section).

---

## Summary of Required Changes

| Gap | Severity | Files Affected | Change Type |
|-----|----------|----------------|-------------|
| 1. Missing `orderingKey` | **Critical** | tech-spec, design-spec, jira-ticket | Add required field |
| 2. Missing `reconciliation` source | **High** | tech-spec, jira-ticket | Extend enum |
| 3. `actorId` in v1 not v1.1 | **Medium** | tech-spec, design-spec, jira-ticket, jira-ticket-2 | Fold into v1; retire addendum |
| 4. `eventType` → `operation` | **Medium** | tech-spec, design-spec, jira-ticket | Rename field |
| 5. `source`/`sourceSystem` docs | **Medium** | tech-spec, design-spec | Add documentation |
| 6. `patternProperties` coverage | **Medium** | tech-spec | Add prefixes per record type |
| 7. `baseVersion` type alignment | **High** | tech-spec | Change string → integer |
| 8. `fullSnapshotRef` URI scheme | **Low** | tech-spec | Clarify gs:// |
| 9. `field_metadata` DDL details | **Medium** | tech-spec, jira-ticket | Add constraints, placement |
| 10. Computed field enforcement | **Medium** | tech-spec, design-spec | Add validation rule |

---

## Corrected Canonical Envelope Schema (v1)

Incorporating all gaps above, the corrected envelope schema for Epic 1 should be:

```json
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "title": "CanonicalChangeEventEnvelope",
  "description": "The universal wrapper for all SuiteX <-> NetSuite synchronization events. Enforces the Anti-Corruption Layer and circular update prevention.",
  "type": "object",
  "required": [
    "schemaVersion",
    "eventId",
    "accountId",
    "recordType",
    "recordId",
    "operation",
    "source",
    "timestamp",
    "orderingKey",
    "changes",
    "sourceSystem",
    "writeId",
    "actorId"
  ],
  "properties": {
    "schemaVersion": {
      "type": "string",
      "enum": ["v1"],
      "description": "Enables zero-downtime schema evolution."
    },
    "eventId": {
      "type": "string",
      "format": "uuid",
      "description": "Unique identifier for this specific event instance."
    },
    "accountId": {
      "type": "string",
      "description": "The tenant identifier, used to route to the correct database and fetch tenant-specific field_metadata."
    },
    "recordType": {
      "type": "string",
      "description": "The standard NetSuite record type (e.g., 'project', 'projecttask', 'customer')."
    },
    "recordId": {
      "type": "string",
      "description": "The NetSuite internalId or the SuiteX primary key."
    },
    "operation": {
      "type": "string",
      "enum": ["create", "update", "delete"],
      "description": "The type of mutation. Aligns with V10 Appendix C1 and the canonical idempotency key format."
    },
    "source": {
      "type": "string",
      "enum": ["netsuite-ue", "netsuite-poll", "suitex", "reconciliation"],
      "description": "The physical origin of event emission (which producer published the event)."
    },
    "timestamp": {
      "type": "string",
      "format": "date-time",
      "description": "ISO8601 UTC timestamp of when the change occurred."
    },
    "orderingKey": {
      "type": "string",
      "pattern": "^[a-z_]+:[a-zA-Z0-9_-]+$",
      "description": "Pub/Sub ordering key in the format 'recordType:recordId'. Required for per-record serial processing."
    },
    "baseVersion": {
      "type": ["integer", "null"],
      "minimum": 0,
      "description": "The current_state projection version this event was derived from. Null if unavailable."
    },
    "changes": {
      "type": "object",
      "description": "Minimal field-level deltas containing normalized values. Schema details delegated to record-specific payload schemas."
    },
    "fullSnapshotRef": {
      "type": "string",
      "pattern": "^gs://",
      "description": "GCS URI pointing to a full state snapshot. Used when deltas cannot be reliably generated."
    },
    "sourceSystem": {
      "type": "string",
      "enum": ["suitex", "netsuite", "workflow", "user"],
      "description": "Logical attribution: who initiated the business change. Used by circular update prevention. Distinct from 'source' which identifies the technical emission pathway."
    },
    "writeId": {
      "type": "string",
      "format": "uuid",
      "description": "Unique identifier for the atomic write across systems. Used for idempotency and circular loop detection via the Write Ledger."
    },
    "actorId": {
      "type": "string",
      "minLength": 1,
      "description": "The SuiteX user ID of the actor who triggered the change, or 'system' for automated processes."
    },
    "transactionGroupId": {
      "type": ["string", "null"],
      "format": "uuid",
      "description": "Used to group multiple events for batched operations."
    }
  },
  "additionalProperties": false
}
```
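
For orientation, the Python sketch below shows an event that fits the schema above, together with a hand-rolled check that mirrors only the `required` array and `additionalProperties: false`. It is not a substitute for a JSON Schema validator; every value in `sample_event` is made up, and the helper name is hypothetical.

```python
REQUIRED = {
    "schemaVersion", "eventId", "accountId", "recordType", "recordId",
    "operation", "source", "timestamp", "orderingKey", "changes",
    "sourceSystem", "writeId", "actorId",
}
OPTIONAL = {"baseVersion", "fullSnapshotRef", "transactionGroupId"}


def envelope_key_errors(event: dict) -> list[str]:
    """List missing required keys and keys that additionalProperties would reject."""
    keys = set(event)
    missing = sorted(REQUIRED - keys)
    unexpected = sorted(keys - REQUIRED - OPTIONAL)
    return [f"missing: {k}" for k in missing] + [f"unexpected: {k}" for k in unexpected]


# Illustrative event only; identifiers and values are invented.
sample_event = {
    "schemaVersion": "v1",
    "eventId": "3f2b8c1e-6d4a-4f0e-9a57-2c1d9e8b7a60",
    "accountId": "acct-001",
    "recordType": "project",
    "recordId": "4821",
    "operation": "update",
    "source": "netsuite-ue",
    "timestamp": "2026-03-12T10:15:00Z",
    "orderingKey": "project:4821",
    "baseVersion": 123,
    "changes": {"custbody_region": "EMEA"},
    "sourceSystem": "workflow",
    "writeId": "9a1c5e2d-0b3f-4c7a-8e6d-5f4b3a2c1d0e",
    "actorId": "system",
}
```

An event that still carries `eventType` instead of `operation` fails on both counts: the required key is missing and the old key is rejected as an additional property.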

### Key Differences from Current Epic 1 Schema

1. `eventType` renamed to `operation`
2. `orderingKey` added as required
3. `source` enum includes `reconciliation`
4. `actorId` added as required (absorbed from jira-ticket-2)
5. `baseVersion` type changed from `string` to `["integer", "null"]`
6. `fullSnapshotRef` pattern constrained to `gs://`
7. `source` and `sourceSystem` descriptions clarify their distinct roles
