# Multi-Query Data Integration Design

**Document Version**: 1.0
**Created**: September 15, 2025
**Status**: 🟡 **DESIGN PHASE** - Approved for Future Implementation
**Priority**: High
**Estimated Effort**: 3-5 days

---

## 📋 **Executive Summary**

This design addresses the need to support complex NetSuite data relationships requiring multiple API queries and data merging within the existing import infrastructure. The current system breaks when attempting to use complex JOIN operations with GROUP BY statements, necessitating a multi-query approach while maintaining configuration-driven flexibility.

**Key Outcomes:**
- Eliminates NetSuite INVALID_PARAMETER errors from complex queries
- Enables rich data relationships (items with pricing, transactions with details)
- Maintains zero hard-coded business logic approach
- Provides backward compatibility with existing single-query configurations

---

## 🚨 **Problem Statement**

### **Current Issue**
NetSuite SuiteQL queries fail with `INVALID_PARAMETER` errors when using complex JOIN and GROUP BY operations:

```sql
-- FAILING: Complex aggregation query
SELECT * FROM item
LEFT JOIN pricing ON pricing.item = item.id
WHERE item.itemType = 'NonInvtPart'
GROUP BY item.id, item.itemid, item.costEstimate, ...
```

**Error**: `"Invalid search query. Detailed unprocessed description follows. Search error occurred: Invalid or unsupported search."`

### **Business Impact**
- **NonInventoryItems** and **ServiceItems** imports failing completely
- **Missing pricing data** for imported items
- **Reduced data richness** in SuiteX platform
- **Manual data entry** required for pricing information

### **Root Cause**
The `SELECT * optimization` removes LISTAGG aggregation functions but preserves JOIN and GROUP BY clauses, creating invalid SQL that NetSuite cannot process.

---

## 🏗️ **Current Architecture Analysis**

### **Existing Flow**
```mermaid
graph LR
    A[criteria.json] --> B[Query Builder]
    B --> C[NetSuite API]
    C --> D[Batch Processor]
    D --> E[Upsert Service]
```

### **Current Assumptions**
- **One criteria.json** = **One query** = **One dataset**
- Single NetSuite API call per record type
- Direct mapping from API response to database records
- Simple field transformations only

### **Breaking Point**
Multi-query requirements break the fundamental 1:1:1 relationship between configuration, query, and dataset.

---

## 🎯 **Proposed Solution: Enhanced Configuration-Driven Architecture**

### **Core Concept**
Extend `criteria.json` to support multiple queries with merge instructions while maintaining backward compatibility and zero hard-coded business logic.

### **Enhanced Configuration Schema**

#### **Multi-Query Configuration Example**
```json
{
  "rest": true,
  "allowMapping": true,
  "type": "noninventoryitem",
  "id": "noninventoryitem",
  "queries": [
    {
      "name": "items",
      "criteria": "item",
      "where": "WHERE item.itemType = 'NonInvtPart'",
      "primary": true,
      "select": "*"
    },
    {
      "name": "pricing",
      "criteria": "pricing",
      "where": "WHERE item IN ({items.ids})",
      "dependsOn": "items",
      "select": "item, pricelevel, unitprice"
    }
  ],
  "merge": {
    "strategy": "leftJoin",
    "primaryKey": "id",
    "foreignKey": "item",
    "aggregations": {
      "price_levels": {
        "source": "pricing.pricelevel",
        "function": "CONCAT",
        "separator": ", "
      },
      "unit_prices": {
        "source": "pricing.unitprice",
        "function": "CONCAT",
        "separator": ", "
      }
    }
  },
  "keep": ["unit_prices", "price_levels"],
  "mapping": {
    "costestimate": "costEstimate",
    "displayname": "displayName"
  }
}
```

#### **Backward Compatibility**
```json
{
  "rest": true,
  "allowMapping": true,
  "type": "discountitem",
  "criteria": "item",
  "where": "WHERE item.itemType = 'Discount'",
  "mapping": { /* existing mapping */ }
}
```

### **Architecture Components**

#### **1. Enhanced Configuration Loader**
```php
interface QueryConfiguration {
    public function isMultiQuery(): bool;
    public function getQueries(): array;
    public function getMergeStrategy(): ?MergeStrategy;
    public function getMapping(): array;
}

class SingleQueryConfiguration implements QueryConfiguration {
    // Existing single-query behavior
}

class MultiQueryConfiguration implements QueryConfiguration {
    // New multi-query behavior
}

class CriteriaLoader {
    public function load(string $recordType): QueryConfiguration {
        $criteria = $this->loadJson($recordType);

        if (isset($criteria['queries'])) {
            return new MultiQueryConfiguration($criteria);
        }

        return new SingleQueryConfiguration($criteria);
    }
}
```

#### **2. Query Executor Enhancement**
```php
class EnhancedQueryExecutor {
    public function execute(QueryConfiguration $config, int $offset, int $limit): array {
        if (!$config->isMultiQuery()) {
            return $this->executeSingle($config, $offset, $limit);
        }

        return $this->executeMulti($config, $offset, $limit);
    }

    private function executeMulti(MultiQueryConfiguration $config, int $offset, int $limit): array {
        $results = [];

        foreach ($config->getQueries() as $query) {
            // Handle dependency resolution
            if ($query->hasDependencies()) {
                $query = $this->resolveDependencies($query, $results);
            }

            // Execute query with pagination context
            $queryResults = $this->queryNetSuite(
                $query,
                $query->isPrimary() ? $offset : 0,
                $query->isPrimary() ? $limit : null
            );

            $results[$query->getName()] = $queryResults;
        }

        // Merge results according to strategy
        return $config->getMergeStrategy()->merge($results);
    }
}
```

#### **3. Merge Strategy Framework**
```php
interface MergeStrategy {
    public function merge(array $queryResults): array;
}

class LeftJoinMergeStrategy implements MergeStrategy {
    private array $config;

    public function merge(array $queryResults): array {
        $primary = $this->findPrimaryResults($queryResults);
        $secondary = $this->findSecondaryResults($queryResults);

        foreach ($primary as &$record) {
            $this->attachRelatedData($record, $secondary);
            $this->applyAggregations($record);
        }

        return $primary;
    }
}

class ConfigurableMergeStrategy implements MergeStrategy {
    // JSON-driven merge configuration
}
```

#### **4. Dependency Resolution System**
```php
class DependencyResolver {
    public function resolve(QueryDefinition $query, array $previousResults): QueryDefinition {
        $resolvedWhere = $this->substituteTemplates(
            $query->getWhere(),
            $previousResults
        );

        return $query->withWhere($resolvedWhere);
    }

    private function substituteTemplates(string $where, array $results): string {
        // Replace {items.ids} with actual comma-separated IDs
        return preg_replace_callback(
            '/\{(\w+)\.(\w+)\}/',
            fn($matches) => $this->extractValues($results, $matches[1], $matches[2]),
            $where
        );
    }
}
```

---

## 🔧 **Implementation Plan**

### **Phase 1: Core Infrastructure (2 days)**

#### **Tasks:**
1. **Create configuration interfaces** and base classes
2. **Implement enhanced criteria loader** with validation
3. **Build dependency resolution system**
4. **Add configuration schema validation**

#### **Deliverables:**
- `QueryConfiguration` interface and implementations
- `CriteriaLoader` with multi-query detection
- `DependencyResolver` class
- JSON schema validation for new format

#### **Success Criteria:**
- Load both single and multi-query configurations
- Validate configuration syntax and dependencies
- Detect circular dependencies

### **Phase 2: Query Execution Enhancement (2 days)**

#### **Tasks:**
1. **Enhance query executor** for multi-query support
2. **Implement merge strategy framework**
3. **Add template substitution** for dependent queries
4. **Build aggregation functions**

#### **Deliverables:**
- `EnhancedQueryExecutor` class
- `MergeStrategy` implementations
- Template substitution engine
- Aggregation function library

#### **Success Criteria:**
- Execute dependent queries in correct order
- Merge results according to configuration
- Handle pagination context correctly

### **Phase 3: Record Type Migration (1 day)**

#### **Tasks:**
1. **Convert NonInventoryItems** to multi-query format
2. **Convert ServiceItems** to multi-query format
3. **Test migration** with existing import jobs
4. **Performance comparison** vs single queries

#### **Deliverables:**
- Updated criteria.json files for failing record types
- Migration validation scripts
- Performance benchmarks

#### **Success Criteria:**
- NoninventoryItems imports succeed with pricing data
- ServiceItems imports succeed with pricing data
- No regression in processing speed

---

## 🚨 **Risk Analysis**

### **High Risk**
| Risk | Impact | Mitigation |
|------|--------|------------|
| **Breaking existing imports** | Critical | Comprehensive backward compatibility testing |
| **Performance degradation** | High | Benchmark testing, query optimization |
| **Complex debugging** | Medium | Enhanced logging for multi-query flows |

### **Medium Risk**
| Risk | Impact | Mitigation |
|------|--------|------------|
| **Configuration complexity** | Medium | Clear documentation, validation schemas |
| **Memory usage increase** | Medium | Streaming processing for large datasets |
| **NetSuite rate limits** | Medium | Intelligent query batching, caching |

### **Low Risk**
| Risk | Impact | Mitigation |
|------|--------|------------|
| **Development timeline** | Low | Phased implementation approach |
| **Testing complexity** | Low | Automated test suite expansion |

---

## 📊 **Success Metrics**

### **Functional Success**
- [ ] **NoninventoryItems** import success rate: 100%
- [ ] **ServiceItems** import success rate: 100%
- [ ] **Pricing data completeness**: >95% of items have pricing
- [ ] **Zero INVALID_PARAMETER errors** from complex queries

### **Performance Success**
- [ ] **Multi-query processing time** ≤ 150% of single query time
- [ ] **Memory usage increase** ≤ 20% during processing
- [ ] **API call efficiency** maintained or improved

### **Operational Success**
- [ ] **Configuration validation** catches errors before deployment
- [ ] **Error logging** provides clear debugging information
- [ ] **Rollback capability** for problematic configurations

---

## 🔄 **Alternative Approaches Considered**

### **Query Adapter Pattern**
**Rejected**: Requires code changes per record type, violates zero hard-coding constraint.

### **Pipeline/Strategy Pattern**
**Rejected**: Over-engineering for current requirements, adds unnecessary complexity.

### **Dependency Graph Extension**
**Rejected**: Mixes import-level and record-level concerns, harder to debug.

---

## 📚 **Dependencies & Prerequisites**

### **System Dependencies**
- Current criteria.json loading system
- NetSuite API query builder
- Batch processing infrastructure
- Wave coordination system

### **Development Prerequisites**
- Understanding of current import flow
- NetSuite SuiteQL query limitations
- JSON schema validation tools
- Performance testing framework

---

## 🚀 **Future Considerations**

### **Potential Enhancements**
- **Parallel query execution** for independent queries
- **Query result caching** for repeated dependent queries
- **Dynamic query optimization** based on data patterns
- **Visual configuration builder** for complex merge strategies

### **Scalability Considerations**
- **Memory-efficient streaming** for large datasets
- **Query batching optimization** for NetSuite limits
- **Configuration inheritance** for similar record types
- **Performance monitoring** and alerting

---

## 📋 **Appendix**

### **Configuration Examples**

#### **Complex Invoice with Transaction Lines**
```json
{
  "type": "invoice_with_lines",
  "queries": [
    {
      "name": "invoices",
      "criteria": "transaction",
      "where": "WHERE type = 'CustInvc' AND voided = 'F'",
      "primary": true
    },
    {
      "name": "lines",
      "criteria": "transactionline",
      "where": "WHERE transaction IN ({invoices.ids}) AND mainline = 'F'",
      "dependsOn": "invoices"
    }
  ],
  "merge": {
    "strategy": "leftJoin",
    "primaryKey": "id",
    "foreignKey": "transaction",
    "aggregations": {
      "line_count": {
        "source": "lines.id",
        "function": "COUNT"
      },
      "total_amount": {
        "source": "lines.amount",
        "function": "SUM"
      }
    }
  }
}
```

### **Testing Strategy**
1. **Unit tests** for each component
2. **Integration tests** with NetSuite sandbox
3. **Performance tests** with production data volumes
4. **Regression tests** for existing single-query configurations

---

**Next Review**: Before implementation begins
**Document Owner**: Development Team
**Stakeholders**: Import Operations, Data Architecture Team

