# Tenant-Scoped Search: Per-Tenant Custom Field Indexing

## 1. Overview

- Problem: Global search currently indexes only predefined model fields. Tenants often add dynamic custom columns (commonly `custfield*`) in their own databases and want these searchable without impacting other tenants.
- Proposed solution: Introduce a tenant-scoped search field registry and a schema-first, per-tenant index configuration pipeline that:
  - Registers which tenant-specific columns are searchable for each model/table (supports `custfield*` and non-custfield columns).
  - Extends the Typesense collection schema per tenant and per model to include those fields (and optionally embeddings), without altering global schemas.
  - Enriches `toSearchableArray()` with values for enabled custom fields.
  - Merges tenant custom fields into `query_by` (and aligned weights) at search time, ensuring isolation via tenant-prefixed index names.
- Why now: Tenants rely on rich custom data in SuiteX. Searching those fields increases usability and reduces bespoke reports. The tenant-aware Scout/Typesense foundation already exists; we extend it cleanly.

## 2. Goals and Non-Goals

- Goals
  - Allow each tenant to designate columns per model/table for inclusion in global search (supports `custfield*` and non-custfield).
  - Keep global search fields intact and shared across tenants.
  - Ensure strict isolation: no cross-tenant schema or data pollution.
  - Use a schema-first approach: only index columns that exist in the tenant DB table.
  - Provide tenant-defined field weights; compute aligned `query_by` and `query_by_weights` dynamically.
  - Provide a scalable, cached configuration and a safe failure mode (skip custom fields if invalid; do not block global fields).

- Non-Goals
  - A full UI redesign for custom fields (we will reuse existing Form Fields flows where possible).
  - Cross-tenant shared custom fields (each tenant config is independent).
  - Replacing Typesense or Scout; we integrate with the current stack.

## 3. Current State

- Key components (see `docs/designs/SEARCH.md`):
  - `src/App/Services/SearchService.php`: tenant import orchestration and index flush.
  - `src/App/Services/SemanticSearchService.php`: text/semantic/hybrid search, builds `query_by` and hits tenant-prefixed indices.
  - `src/App/Traits/SearchableTenant.php`: prefixes index names per tenant via `searchableAs()` and dispatches tenant-aware jobs.
  - `src/App/Jobs/TenantAwareMakeSearchable.php`: restores tenant context in Scout jobs.
  - `config/scout.php`: global model `collection-schema` and `search-parameters` definitions.
  - Custom fields lifecycle: Form Fields domain and Database Columns domain add physical `custfield*` columns in tenant DB tables.

- Limitations
  - Global schemas in `config/scout.php` are static; cannot reflect tenant-specific columns.
  - `getSearchableFields()` per model omits tenant fields.
  - `toSearchableArray()` does not include arbitrary `custfield*` values by default.

## 4. Proposed Solution

- High-level
  - Create a tenant-scoped registry of searchable fields per model/table.
  - Index aliasing: All reads/writes use a stable per-tenant/model alias (e.g., `3978045_projects`).
  - Schema changes:
    - Add-only (no removals/embedding change): patch/extend existing collection in place; no recreate, no downtime.
    - Destructive (removals/embedding change): zero-downtime blue/green swap:
      - Create new collection (e.g., `3978045_projects_vN`) with merged schema
      - Stage index all documents into new collection
      - Atomically swap alias to new collection, then drop old collection
  - At document build time, merge custom field values into the searchable array.
  - At query time, merge tenant custom fields into `query_by` and compute aligned weights, querying via alias.
  - Auto-reindex: on registry changes, enqueue a debounced (2–5 minutes) reindex job for the affected tenant/model only.
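
The `_vN` suffix used for blue/green targets can be derived from the alias and the collection it currently points at. The following is a minimal sketch, assuming targets are named `{alias}_v{N}` and that a missing or unversioned target counts as version 0; `nextVersionedCollectionName` is a hypothetical helper name, not an existing method.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: derive the next blue/green collection name for an alias.
 * Assumes targets are named "{alias}_v{N}"; anything else is treated as version 0.
 */
function nextVersionedCollectionName(string $alias, ?string $currentTarget): string
{
    $version = 0;
    $pattern = '/^' . preg_quote($alias, '/') . '_v(\d+)$/';

    if ($currentTarget !== null && preg_match($pattern, $currentTarget, $m) === 1) {
        $version = (int) $m[1];
    }

    return sprintf('%s_v%d', $alias, $version + 1);
}
```

Because readers and writers only ever see the alias, the version number is an internal detail and can grow without affecting callers.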

- New/Updated Files
  - New: `database/migrations/tenants/2025_09_24_000000_create_search_field_configs_table.php`
  - New: `src/App/Services/TenantSearchIndexConfigurator.php`
  - New: `src/App/Services/TenantSearchFieldRegistry.php`
  - New: `src/App/Services/TypesenseAliasManager.php` (create/swap/get alias targets)
  - New: `src/App/Traits/SearchableCustomFields.php` (helper to extend `toSearchableArray()` with custom fields)
  - New: `src/App/Jobs/Search/ReindexTenantModelSearch.php` (queued, debounced reindex per tenant/model)
  - Update: `src/App/Traits/SearchableTenant.php` (`searchableAs()` returns alias)
  - Update: `src/App/Services/SearchService.php` (ensure schema; proceed directly with indexing; no readiness probe)
  - Update: `src/App/Services/SemanticSearchService.php` (use alias; exclude embedding from `query_by`; normalize weights)
  - Optional update: Integrate with Form Fields flows to toggle “Index in Search”, “Embed”, and “Weight”

- Interfaces and responsibilities
  - `TenantSearchFieldRegistry`
    - Load tenant’s enabled fields per model/table
    - Validate against DB schema (schema-first) and cache results
    - Expose metadata for `weight` (int) and `embed` (bool)
  - `TenantSearchIndexConfigurator`
    - Compute merged Typesense collection schema (global + tenant fields; all custom fields as `string` type)
    - Add-only change: patch existing collection fields
    - Destructive change: blue/green staged collection + alias swap; then trigger model-only reindex
    - Merge embedding `from` list with opt-in custom fields (default off; cap 5)
  - `SearchableCustomFields` trait
    - Provide `buildCustomSearchFields(array $allowedColumns): array`
    - Pull model attributes for allowed custom keys and cast to strings safely
  - `ReindexTenantModelSearch` job
    - Debounced scheduling per tenant/model; performs reindex for just that model and tenant
  - `TypesenseAliasManager`
    - `ensureAlias(string $alias, string $target)`
    - `swapAlias(string $alias, string $newTarget)` (atomic)
    - `getAliasTarget(string $alias): string|null`
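
The "cast to strings safely" responsibility of `SearchableCustomFields` could look roughly like the sketch below. It is written as stand-alone functions that take the attribute array explicitly so it runs without Eloquent. The function names and the specific casting rules (empty string for `null`, `true`/`false` for booleans, ISO 8601 for dates, JSON for arrays) are assumptions, not decisions made in this design.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical stand-alone version of the trait helper: takes the model's
 * attributes explicitly so it can run without Eloquent.
 */
function buildCustomSearchFieldsFrom(array $attributes, array $allowedColumns): array
{
    $fields = [];

    foreach ($allowedColumns as $column) {
        if (!array_key_exists($column, $attributes)) {
            continue; // column not present on this model instance
        }
        $fields[$column] = stringifySearchValue($attributes[$column]);
    }

    return $fields;
}

/** Assumed casting rules; every custom field is indexed as a Typesense string. */
function stringifySearchValue(mixed $value): string
{
    if ($value === null) {
        return '';
    }
    if (is_bool($value)) {
        return $value ? 'true' : 'false';
    }
    if ($value instanceof DateTimeInterface) {
        return $value->format(DATE_ATOM);
    }
    if (is_scalar($value) || $value instanceof Stringable) {
        return (string) $value;
    }

    // Arrays and other objects: JSON-encode, falling back to an empty string.
    $json = json_encode($value);

    return $json === false ? '' : $json;
}
```

Inside the trait, `buildCustomSearchFields($allowedColumns)` would pass the model's own attributes as the first argument.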

- Key method pseudo code
  - Registry (schema-first and cached)
    - `TenantSearchFieldRegistry::getEnabledFields(string $tenantId, string $table): array`
```
$cacheKey = "tenant:{$tenantId}:search_fields:{$table}";
return Cache::remember($cacheKey, now()->addMinutes(10), function () use ($table) {
    $rows = DB::connection('tenant_connection')
        ->table('search_field_configs')
        ->where('model_table', $table)
        ->where('enabled', true)
        ->get();

    $existingColumns = Schema::connection('tenant_connection')->getColumnListing($table);
    return $rows
        // Allow custfield* and non-custfield columns; always require column to exist
        ->filter(fn($r) => in_array($r->column, $existingColumns, true))
        ->map(fn($r) => [
            'column' => $r->column,
            'weight' => $r->weight, // nullable -> default later
            'embed'  => (bool) $r->embed,
        ])
        ->values()
        ->all();
});
```

  - Configurator (add-only vs blue/green)
```
public function ensureTenantCollection(string $tenantId, Model $model): void
{
    $table = $model->getTable();
    $alias = $model->searchableAs(); // returns alias like 3978045_projects

    $global = config('scout.typesense.model-settings.' . get_class($model));
    $globalSchema = $global['collection-schema']['fields'] ?? [];

    $tenantFields = $this->registry->getEnabledFields($tenantId, $table);

    $customSchema = [];
    foreach ($tenantFields as $f) {
        $customSchema[] = ['name' => $f['column'], 'type' => 'string'];
    }

    $embedding = $this->buildEmbeddingField(
        $globalSchema,
        $this->mergeEmbeddingFrom($global, $tenantFields) // default off; include only embed=true; cap inside
    );

    $mergedFields = $this->mergeFieldsPreservingEmbedding($globalSchema, $customSchema, $embedding);

    $client = $this->resolveTypesenseClient();
    $target = $this->aliasManager->getAliasTarget($alias) ?? $this->initializeFirstCollection($client, $alias, $mergedFields);
    if ($this->isAddOnlyChange($client, $target, $mergedFields)) {
        $this->patchAddOnlyFields($client, $target, $mergedFields);
        return;
    }
    // Blue/green: create new, stage index, then swap alias
    $newTarget = $this->createVersionedCollection($client, $alias, $mergedFields);
    $this->stageIndexTo($tenantId, get_class($model), $newTarget);
    $this->aliasManager->swapAlias($alias, $newTarget);
    $this->dropOldCollection($client, $target);
}
```
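
The add-only versus destructive decision made by `isAddOnlyChange()` above can be expressed as a pure comparison of field definitions. The sketch below is one possible form, assuming both field lists have already been normalized to the same keys (a live Typesense collection typically reports additional per-field attributes that would need to be stripped first). `isAddOnlySchemaChange` is a hypothetical name.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical pure-function form of isAddOnlyChange(): compares field
 * definitions only, so it can be tested without a Typesense client.
 *
 * @param array<int, array<string, mixed>> $existingFields fields of the live collection
 * @param array<int, array<string, mixed>> $mergedFields   desired global + tenant fields
 */
function isAddOnlySchemaChange(array $existingFields, array $mergedFields): bool
{
    $desired = [];
    foreach ($mergedFields as $field) {
        $desired[$field['name']] = $field;
    }

    foreach ($existingFields as $field) {
        $name = $field['name'];

        // A field that disappeared is a removal: destructive.
        if (!isset($desired[$name])) {
            return false;
        }

        // Any change to an existing definition (type, embedding source list, ...)
        // is treated as destructive. Loose != ignores the order of string keys.
        if ($desired[$name] != $field) {
            return false;
        }
    }

    return true; // identical, or only new fields were added
}
```

Treating every modification of an existing field as destructive is deliberately conservative: it routes anything ambiguous through the blue/green path rather than patching in place.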

  - Search enrichment in `toSearchableArray()` via trait
```
public function toSearchableArray(): array
{
    $base = parent::toSearchableArray() ?? [];
    $tenantId = $this->getCurrentTenantId();
    $allowed = app(TenantSearchFieldRegistry::class)
        ->getEnabledFields($tenantId, $this->getTable());

    $columns = array_map(fn($f) => $f['column'], $allowed);
    return $base + $this->buildCustomSearchFields($columns);
}
```

  - Extend query fields and weights in `SemanticSearchService`
```
protected function buildSearchParams(string $modelClass, string $query, int $perModelLimit, float $alpha): array
{
    $baseFields = method_exists($modelClass, 'getSearchableFields')
        ? $modelClass::getSearchableFields()
        : ['title', 'name', 'id'];

    $model = new $modelClass();
    $tenantId = app(\App\Services\TenantService::class)->getCurrentTenantId();
    $extras = app(TenantSearchFieldRegistry::class)->getEnabledFields($tenantId, $model->getTable());

    $extraFields = array_column($extras, 'column');
    $queryBy = array_values(array_unique(array_merge($baseFields, $extraFields))); // embedding not in query_by

    // Compute weights aligned to query_by (excluding 'embedding' which Typesense ignores for weights)
    $weightsMap = [];
    foreach ($extras as $e) { $weightsMap[$e['column']] = max(1, (int)($e['weight'] ?? 1)); }
    $queryByWeights = [];
    foreach ($queryBy as $field) {
        if ($field === 'embedding') { continue; }
        $queryByWeights[] = (string)($weightsMap[$field] ?? 1);
    }

    $params = [
        'q' => $query,
        'query_by' => implode(',', $queryBy),
        'vector_query' => "embedding:([], alpha: {$alpha}, k: 200)",
        'per_page' => $perModelLimit,
        'exclude_fields' => 'embedding',
        'rerank_hybrid_matches' => true,
    ];

    // Set query_by_weights only when counts align; otherwise omit
    if (count($queryByWeights) === count($queryBy)) {
        $params['query_by_weights'] = implode(',', $queryByWeights);
    }

    return $params;
}
```

- Embedding policy
  - Default off for tenant custom fields; opt-in per field via registry; cap at 5 embedded custom fields.
  - Benefits: meaning-based matches for free-text; Costs: larger index, slower ingest, more CPU.
  - Implementation: `mergeEmbeddingFrom()` adds only fields with `embed=true` up to cap; all fields are `string` typed in Typesense.
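
The opt-in and cap rules can be illustrated with a simplified variant of `mergeEmbeddingFrom()`. This sketch takes the global `from` list directly rather than the full model settings array, and the tie-breaking rule (registry order decides which fields survive the cap) is an assumption; `mergeEmbeddingFromList` is a hypothetical name.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical sketch: global `from` fields first, then at most $cap tenant
 * fields that opted in with embed=true.
 *
 * @param string[] $globalFrom embedding source fields from the global schema
 * @param array<int, array{column: string, weight: ?int, embed: bool}> $tenantFields
 * @return string[]
 */
function mergeEmbeddingFromList(array $globalFrom, array $tenantFields, int $cap = 5): array
{
    $optedIn = [];
    foreach ($tenantFields as $field) {
        if (!$field['embed'] || in_array($field['column'], $globalFrom, true)) {
            continue; // embedding is off by default; skip duplicates of global fields
        }
        $optedIn[] = $field['column'];
    }

    // De-duplicate, keep registry order, and drop anything beyond the cap.
    $optedIn = array_slice(array_values(array_unique($optedIn)), 0, $cap);

    return array_merge($globalFrom, $optedIn);
}
```

Silently truncating at the cap keeps indexing safe; rejecting the sixth field when it is saved to the registry would be the stricter alternative.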

- Auto-reindex (debounced)
  - On registry changes (add/remove/update enable/weight/embed), enqueue `ReindexTenantModelSearch` for the affected tenant/model.
  - Debounce window: 2–5 minutes; coalesce multiple changes into a single reindex.
  - Scope: reindex only that model for that tenant.
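
The coalescing rule can be sketched with a small in-memory debouncer. This is only an illustration: the class name, the key format, and the 180-second window (a value inside the 2–5 minute range) are assumptions, and in the application the marker would need to live in a shared store so that every worker sees it.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical in-memory debouncer illustrating the coalescing rule.
 * Shared-cache wiring across workers is not shown here.
 */
final class ReindexDebouncer
{
    /** @var array<string, int> marker key => unix time at which the window closes */
    private array $windows = [];

    public function __construct(private int $windowSeconds = 180)
    {
    }

    /**
     * Returns true when the caller should enqueue a (delayed) reindex job,
     * false when a job for this tenant/model is already pending.
     */
    public function shouldEnqueue(string $tenantId, string $table, int $now): bool
    {
        $key = "tenant:{$tenantId}:reindex:{$table}";

        if (isset($this->windows[$key]) && $this->windows[$key] > $now) {
            return false; // coalesce into the job that is already scheduled
        }

        $this->windows[$key] = $now + $this->windowSeconds;

        return true;
    }
}
```

When `shouldEnqueue()` returns true, the caller would dispatch `ReindexTenantModelSearch` with a delay equal to the window, so the single job runs after the burst of registry edits has settled.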

- Field state handling
  - Disabled: exclude from `toSearchableArray()` and `query_by`; keep in schema (no recreate required).
  - Deleted (or column dropped): remove from registry; the resulting schema change is destructive, so it triggers the blue/green rebuild (new collection, staged reindex, alias swap).

- Limits and types
  - Max custom fields per model per tenant: 15.
  - Max embedded custom fields per model per tenant: 5.
  - All custom fields indexed as Typesense `string`. Document how to extend to numeric/date later.

## 5. Phased Implementation Plan

- Phase 1: Registry and read-path integration
  - Deliverables:
    - Migration for `search_field_configs` on `tenant_connection`.
    - `TenantSearchFieldRegistry` with cache and schema-first validation.
    - `SearchableCustomFields` trait and integration into at least one primary model used by global search.
    - Extend `SemanticSearchService` to merge tenant fields and compute weights.

- Phase 2: Index configurator, aliases, blue/green, and import integration
  - Deliverables:
    - `TypesenseAliasManager` and alias-based `searchableAs()`
    - `TenantSearchIndexConfigurator` add-only detection; blue/green staged build with alias swap
    - `ReindexTenantModelSearch` job with debounced scheduling (2–5 minutes) per tenant/model.
    - Hook in `SearchService::importForTenant()` to ensure schema, support a staged indexing target, and finalize the alias swap (no readiness probe).
    - Logging and safe fallback: skip only problematic custom fields, preserve global indexing.

- Phase 3: UI toggle and weights
  - Deliverables:
    - Minimal UI surface in Form Fields to toggle “Index in Search”, “Embed”, and set “Weight”.
    - Support `query_by_weights` alignment when tenant weights are set; fallback to uniform if omitted.

## 6. Testing Strategy

Follow `docs/AI/ai_tests.md` patterns strictly: schema-first, explicit `tenant_connection`, file-level `RefreshDatabase`, drop-then-create tables, and no Horizon. Do not mock Scout/Typesense heavily; prefer verifying configuration and payload composition.

- Unit Tests
  - Registry filters only existing columns
```
describe('TenantSearchFieldRegistry', function () {
    beforeEach(function () { /* SQLite tenant_connection setup per ai_tests.md */ });
    it('returns only enabled existing columns', function () {
        // seed: create tenant table with custom_a, custom_b
        // insert configs: custom_a enabled, custom_b disabled, custom_missing enabled
        // expect: only ['custom_a'] returned with flags
    });
});
```
  - Custom fields merged into `toSearchableArray()`
```
describe('SearchableCustomFields', function () {
    it('merges allowed custom fields into search document', function () {
        // model has attributes including custom_x
        // registry returns ['custom_x']
        // expect toSearchableArray() contains ['custom_x' => 'value']
    });
});
```
  - Query fields and weights merged in `SemanticSearchService`
```
describe('SemanticSearchService buildSearchParams', function () {
    it('merges tenant custom fields and aligns weights', function () {
        // registry returns [{column: 'alpha', weight: 3}, {column: 'beta', weight: null}]
        // expect query_by contains base + alpha + beta + embedding
        // expect query_by_weights aligned, defaults applied (1 for missing)
    });
});
```

- Integration Tests
  - Ensure schema before indexing per tenant (Option A)
```
describe('SearchService importForTenant', function () {
    it('ensures tenant collection schema includes custom fields and recreates on change', function () {
        // configure registry to return 2 fields
        // spy configurator called, collectionNeedsUpdate=true -> recreate and enqueue reindex job
        // run importForTenant and assert job enqueued
    });
});
```
  - Auto-reindex debounce
```
describe('Auto-reindex debounce', function () {
    it('debounces multiple registry updates into a single reindex job', function () {
        // simulate rapid successive updates
        // assert only one ReindexTenantModelSearch job enqueued within window
    });
});
```
  - End-to-end document payload composition
```
describe('Indexing payload composition', function () {
    it('includes custom fields in indexed documents', function () {
        // create model instance with custom values
        // call toSearchableArray and assert fields present
    });
});
```
  - Disabled vs deleted behavior
```
describe('Field state handling', function () {
    it('excludes disabled fields without schema changes', function () {
        // disable field -> not in query_by/docs; schema unchanged
    });
    it('removes deleted fields by recreating schema and reindexing', function () {
        // delete field -> collectionNeedsUpdate -> recreate and enqueue reindex
    });
});
```

- Feature Tests (minimal, smoke)
  - Search within tenant respects custom fields and weights
```
describe('Tenant search behavior', function () {
    it('searches on tenant custom fields only within tenant index and applies weights', function () {
        // seed: two tenants, each with different custom fields and weights
        // assert query_by/query_by_weights contain only current tenant fields in correct order
    });
});
```

## 7. Acceptance Criteria

- For a tenant with `custfield_project_lead` enabled on `projects`:
  - The tenant’s `projects` index contains a field named `custfield_project_lead` in its collection schema.
  - `toSearchableArray()` for `Project` includes `custfield_project_lead` values.
  - Hybrid/semantic search `query_by` for `Project` includes `custfield_project_lead` plus `embedding` when embedding is enabled.
  - Disabling the field removes it from `query_by` and payloads without recreating the collection.
  - Deleting the field (or dropping the column) triggers collection recreate and reindex for that tenant/model only.
  - Other tenants remain unaffected (no field in their indices, no query_by entry).
  - Tenant-defined weights are reflected in `query_by_weights` aligned with `query_by` (defaults used when not set).
  - Auto-reindex is debounced; multiple updates within the window enqueue a single job.

## 8. Additional Notes

- Database migrations
  - `tenant_connection.search_field_configs`:
    - `model_table` string, `column` string (any existing column name), `enabled` bool, `weight` nullable int, `embed` bool, timestamps.
    - Unique (`model_table`, `column`).
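
A migration matching the column list above might look like the following sketch. It requires the Laravel framework to run. The `id` primary key and the column defaults (`enabled` true, `embed` false) are assumptions that this document does not specify.

```php
<?php

use Illuminate\Database\Migrations\Migration;
use Illuminate\Database\Schema\Blueprint;
use Illuminate\Support\Facades\Schema;

return new class extends Migration
{
    public function up(): void
    {
        Schema::connection('tenant_connection')->create('search_field_configs', function (Blueprint $table) {
            $table->id();                                // assumed surrogate key
            $table->string('model_table');
            $table->string('column');                    // any existing column name
            $table->boolean('enabled')->default(true);   // assumed default
            $table->integer('weight')->nullable();       // null => default weight at query time
            $table->boolean('embed')->default(false);    // embedding is opt-in
            $table->timestamps();

            $table->unique(['model_table', 'column']);
        });
    }

    public function down(): void
    {
        Schema::connection('tenant_connection')->dropIfExists('search_field_configs');
    }
};
```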

- Alterations to existing functionality
  - Non-breaking: global fields remain intact; per-tenant extensions added at runtime for that tenant’s indices.

- Testing commands
  - `./vendor/bin/pest --filter="TenantSearch"`
  - `php artisan search:import --tenant={id}` for manual verification.

- Metrics, logging, monitoring
  - Log schema ensure operations and skips with context (`tenant_id`, `table`, `column`).
  - Add counters for number of custom fields applied per tenant/model.
  - Alert on repeated failures updating collections (rate-limited logs).
  - Emit logs for auto-reindex scheduling and debounce coalescing.
  - Enforce caps: max 15 custom fields, max 5 embedded.
