# 🧹 Node Data Cleanup Guide

## 📋 Overview

The `maintenance:cleanup-node-data` command cleans up old records from the `node_data` table that are no longer needed. This table stores temporary data during flow execution in the iPaaS system.

## 🎯 Objective

Delete obsolete data safely, efficiently, and without affecting running flows.

---

## 🔑 Key Concepts

### 1. **Retention Window**
- **Default value:** 48 hours
- **What it does:** Only deletes records older than this time
- **Why:** Allows time for debugging and analysis of completed flows

### 2. **Safety Window**
- **Default value:** 10 minutes
- **What it does:** Protects recently updated records
- **Why:** Prevents deletion of data from flows that are still running

### 3. **Batch Processing**
- **Default value:** 5,000 records per batch
- **What it does:** Deletes data in small groups
- **Why:** Prevents database locking and performance issues

### 4. **Cache Lock**
- **Key:** `maintenance:cleanup-node-data`
- **Timeout:** 10 minutes
- **What it does:** Prevents multiple instances of the command from running simultaneously
- **Why:** Avoids conflicts and duplicate work

### 5. **Maximum Runtime**
- **Default value:** 300 seconds (5 minutes)
- **What it does:** Stops the command after this time
- **Why:** Prevents the command from running indefinitely
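The defaults above presumably live in `config/maintenance.php` (listed in the References). A hypothetical sketch of what that file could contain — the key and env-variable names here are illustrative assumptions, not confirmed:

```php
<?php

// config/maintenance.php — illustrative sketch; actual key names may differ
return [
    'node_data_cleanup' => [
        'retention_hours' => env('NODE_DATA_RETENTION_HOURS', 48),
        'safety_minutes'  => env('NODE_DATA_SAFETY_MINUTES', 10),
        'batch_size'      => env('NODE_DATA_BATCH_SIZE', 5000),
        'max_runtime'     => env('NODE_DATA_MAX_RUNTIME', 300), // seconds
        'lock_key'        => 'maintenance:cleanup-node-data',
        'lock_ttl'        => 600, // seconds (10-minute lock timeout)
    ],
];
```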

---

## 🚀 Command Usage

### Basic Syntax

```bash
php artisan maintenance:cleanup-node-data [options]
```

### Available Options

| Option | Default Value | Description |
|--------|---------------|-------------|
| `--dry-run` | false | Simulates execution without deleting data |
| `--tenant=` | all | Limits to a specific tenant |
| `--hours=` | 48 | Retention window in hours |
| `--batch=` | 5000 | Deletion batch size |
| `--max-runtime=` | 300 | Maximum runtime in seconds |

---

## 📖 Usage Examples

### 1. Normal Execution (Production)

```bash
php artisan maintenance:cleanup-node-data
```

**Result:** Deletes `node_data` records older than 48 hours across all active tenants.

---

### 2. Simulation (Dry-Run)

```bash
php artisan maintenance:cleanup-node-data --dry-run
```

**Result:** Shows how many records would be deleted WITHOUT deleting anything. **Use this first to verify.**

**Expected output:**
```
🧹 Starting node_data cleanup process...
🔍 DRY-RUN MODE: No data will be deleted
📊 Found 3 active tenants

🔄 Processing tenant: sx_db_tenant1
  📅 Retention cutoff: 2025-09-27 10:00:00
  🛡️  Safety cutoff: 2025-09-29 09:50:00
  📝 Found 12,345 candidate rows
  🔍 DRY-RUN: Would delete 12,345 rows

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
📊 CLEANUP SUMMARY
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  Tenants processed: 3
  Rows scanned:      45,678
  Rows deleted:      0
  Duration:          1,234 ms
  Errors:            0
  Mode:              DRY-RUN (no data deleted)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
```

---

### 3. Limit to Specific Tenant

```bash
# With full prefix
php artisan maintenance:cleanup-node-data --tenant=sx_db_tenant123

# Without prefix (automatically added)
php artisan maintenance:cleanup-node-data --tenant=tenant123
```

**Result:** Only cleans data from the specified tenant.
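The prefix handling can be sketched as a one-line normalization — a hypothetical helper, assuming the `sx_db_` prefix shown in the examples above:

```php
use Illuminate\Support\Str;

// Accept either "tenant123" or "sx_db_tenant123" and normalize to the full name
function normalizeTenantName(string $tenant): string
{
    return Str::startsWith($tenant, 'sx_db_') ? $tenant : 'sx_db_' . $tenant;
}
```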

---

### 4. Change Retention Window

```bash
# Delete only records older than 72 hours
php artisan maintenance:cleanup-node-data --hours=72

# Delete records older than 24 hours (more aggressive)
php artisan maintenance:cleanup-node-data --hours=24
```

⚠️ **Warning:** Using less than 48 hours may delete data that is still useful for analysis.

---

### 5. Adjust Batch Size

```bash
# Smaller batches (less DB impact but slower)
php artisan maintenance:cleanup-node-data --batch=1000

# Larger batches (faster but higher DB impact)
php artisan maintenance:cleanup-node-data --batch=10000
```

---

### 6. Combine Options

```bash
# Dry-run on specific tenant with 72-hour retention
php artisan maintenance:cleanup-node-data --dry-run --tenant=tenant123 --hours=72

# Production execution with custom configuration
php artisan maintenance:cleanup-node-data --hours=36 --batch=2000 --max-runtime=600
```

---

## ⏰ Automatic Scheduling

The command is configured to run **automatically every hour** at minute 25 (10:25, 11:25, 12:25, and so on).

### Configuration in Kernel.php

```php
$schedule->command('maintenance:cleanup-node-data')
    ->hourlyAt(25)
    ->withoutOverlapping();
```

### Why at minute 25?

- Avoids conflicts with other commands that run at the start of the hour (minute 0, 5, 10)
- Reduces load during peak times

---

## 📊 Telemetry and Logs

### Structured Logs

The command generates structured logs for observability:

```json
{
  "category": "node_data_gc",
  "tenants_processed": 5,
  "rows_scanned": 123456,
  "rows_deleted": 45678,
  "duration_ms": 12345,
  "dry_run": false,
  "errors": 0,
  "retention_hours": 48,
  "batch_size": 5000
}
```
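A log entry like this would typically be emitted with Laravel's `Log` facade and a context array — a sketch of the idea, not the exact implementation (variable names are assumptions):

```php
use Illuminate\Support\Facades\Log;

// Emit the run summary as a structured log entry so it can be grepped/parsed
Log::info('Node data cleanup completed', [
    'category'          => 'node_data_gc',
    'tenants_processed' => $tenantsProcessed,
    'rows_scanned'      => $rowsScanned,
    'rows_deleted'      => $rowsDeleted,
    'duration_ms'       => $durationMs,
    'dry_run'           => $dryRun,
    'errors'            => $errors,
    'retention_hours'   => $retentionHours,
    'batch_size'        => $batchSize,
]);
```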

### Log Location

```bash
# View command logs
tail -f storage/logs/laravel.log | grep node_data_gc

# Search recent executions
grep "Node data cleanup completed" storage/logs/laravel.log
```

---

## 🔒 Safety Mechanisms

### 1. **Double Time Filter**

```sql
WHERE updated_at <= '2025-09-27 10:00:00'  -- Retention cutoff (48h)
  AND updated_at <= '2025-09-29 09:50:00'  -- Safety cutoff (10min)
```

Only deletes records that meet **BOTH** conditions. Since both filters apply to `updated_at`, the earlier cutoff is the effective one; the safety cutoff only becomes the stricter filter if the retention window is configured shorter than 10 minutes.
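In Laravel's query builder, the double filter could be expressed as follows — a sketch; the real command may structure this differently, and the connection name mirrors the troubleshooting example below:

```php
use Illuminate\Support\Facades\DB;

$retentionCutoff = now()->subHours(48);
$safetyCutoff    = now()->subMinutes(10);

// Both conditions must hold for a row to be a deletion candidate
$candidates = DB::connection('tenant_connection')
    ->table('node_data')
    ->where('updated_at', '<=', $retentionCutoff)
    ->where('updated_at', '<=', $safetyCutoff);
```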

### 2. **Cache Lock**

```php
// If another process is running, the new process exits immediately
if (!$this->acquireLock()) {
    $this->warn('Another cleanup process is already running. Exiting.');
    return Command::FAILURE;
}
```
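A plausible implementation of `acquireLock()` uses `Cache::add()`, which only writes when the key does not yet exist — a sketch under that assumption, not the confirmed code:

```php
use Illuminate\Support\Facades\Cache;

private function acquireLock(): bool
{
    // Cache::add() returns false if the key already exists, so only one
    // process wins; the TTL matches the 10-minute lock timeout
    return Cache::add('maintenance:cleanup-node-data', getmypid(), now()->addMinutes(10));
}
```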

### 3. **Maximum Runtime**

```php
// Stops the command after 5 minutes (default)
if ($this->isMaxRuntimeExceeded()) {
    $this->warn('Max runtime exceeded. Stopping gracefully.');
    break;
}
```
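`isMaxRuntimeExceeded()` most likely compares elapsed wall-clock time against the configured limit — an illustrative sketch; the property names are assumptions:

```php
private function isMaxRuntimeExceeded(): bool
{
    // $this->startedAt is assumed to be set via microtime(true) at command start
    return (microtime(true) - $this->startedAt) >= $this->maxRuntimeSeconds;
}
```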

### 4. **Delay Between Batches**

```php
usleep(100000); // 100ms wait between batches
```

Reduces pressure on the database.
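Putting the batch size, safety filters, and inter-batch delay together, the deletion loop can be sketched like this (simplified; the actual command also handles dry-run mode, row counting, and telemetry):

```php
use Illuminate\Support\Facades\DB;

do {
    // Delete up to one batch of candidate rows; delete() returns the affected count
    $deleted = DB::connection('tenant_connection')
        ->table('node_data')
        ->where('updated_at', '<=', $retentionCutoff)
        ->where('updated_at', '<=', $safetyCutoff)
        ->limit($batchSize)
        ->delete();

    usleep(100000); // 100 ms pause between batches to reduce DB pressure
} while ($deleted > 0 && !$this->isMaxRuntimeExceeded());
```

Note that `DELETE ... LIMIT` is MySQL-specific; on other databases the batch would need to be selected by primary key first.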

---

## 🛠️ Troubleshooting

### Problem 1: "Another cleanup process is already running"

**Cause:** The cache lock is active (another process is running or a previous one didn't finish correctly).

**Solution:**
```bash
# Check if lock exists
php artisan cache:get maintenance:cleanup-node-data

# Clear lock manually (if you're sure no other process is running)
php artisan cache:forget maintenance:cleanup-node-data
```

---

### Problem 2: Command takes too long

**Cause:** Too many records to delete.

**Solutions:**
1. **Reduce batch size:**
   ```bash
   php artisan maintenance:cleanup-node-data --batch=2000
   ```

2. **Increase maximum runtime:**
   ```bash
   php artisan maintenance:cleanup-node-data --max-runtime=900
   ```

3. **Run for specific tenant:**
   ```bash
   php artisan maintenance:cleanup-node-data --tenant=tenant1
   ```

---

### Problem 3: Database connection errors

**Cause:** Issues with tenant connection.

**Solution:**
```bash
# Verify connection manually
php artisan tinker
>>> DB::connection('tenant_connection')->getPdo();
>>> DB::connection('tenant_connection')->table('node_data')->count();
```

---

### Problem 4: No records deleted

**Cause:** No records meet the criteria.

**Verification:**
```bash
# Use dry-run to see what would happen
php artisan maintenance:cleanup-node-data --dry-run

# Reduce retention window (carefully)
php artisan maintenance:cleanup-node-data --dry-run --hours=24
```

---

## 📈 Monitoring and Metrics

### Important Metrics to Monitor

1. **`rows_deleted`**: Number of records deleted per execution
2. **`duration_ms`**: Command execution time
3. **`errors`**: Errors during execution
4. **`tenants_processed`**: Number of tenants processed

### Useful Queries

```sql
-- Count candidate records for deletion, grouped by day
SELECT 
    COUNT(*) as total_rows,
    DATE(updated_at) as date
FROM node_data
WHERE updated_at <= DATE_SUB(NOW(), INTERVAL 48 HOUR)
    AND updated_at <= DATE_SUB(NOW(), INTERVAL 10 MINUTE)
GROUP BY DATE(updated_at)
ORDER BY date DESC;

-- View node_data table size
SELECT 
    table_name,
    ROUND(((data_length + index_length) / 1024 / 1024), 2) AS "Size (MB)"
FROM information_schema.TABLES
WHERE table_name = 'node_data';

-- View oldest records
SELECT 
    MIN(updated_at) as oldest_record,
    MAX(updated_at) as newest_record,
    COUNT(*) as total_rows
FROM node_data;
```

---

## ⚠️ Warnings and Best Practices

### ✅ **DO**

1. **Always run `--dry-run` first** in production
2. **Monitor logs** after each execution
3. **Adjust retention window** based on business needs
4. **Run during low-load hours** if done manually

### ❌ **DON'T**

1. **Don't reduce retention window to less than 24 hours** without prior analysis
2. **Don't increase batch size beyond 10,000** without testing
3. **Don't run multiple instances manually** at the same time
4. **Don't delete cache lock** while another process is running

---

## 🔄 Execution Flow

```mermaid
graph TD
    A[Start] --> B{Acquire Cache Lock}
    B -->|Fail| C[Exit: Another process running]
    B -->|Success| D{Dry-Run Mode?}
    D -->|Yes| E[Only Count Records]
    D -->|No| F[Get Active Tenants]
    F --> G{For each Tenant}
    G --> H[Connect to Tenant DB]
    H --> I[Calculate Cutoffs]
    I --> J[Count Candidates]
    J --> K{Candidates exist?}
    K -->|No| G
    K -->|Yes| L{Dry-Run Mode?}
    L -->|Yes| M[Report without deleting]
    L -->|No| N[Delete in Batches]
    N --> O{Max Runtime?}
    O -->|Exceeded| P[Stop Gracefully]
    O -->|OK| Q{More Records?}
    Q -->|Yes| N
    Q -->|No| G
    M --> G
    G --> R[Log Telemetry]
    P --> R
    E --> R
    R --> S[Release Cache Lock]
    S --> T[End]
```

---

## 📝 Frequently Asked Questions

### Q: Can I run this command manually?
**A:** Yes, but first use `--dry-run` to see what would happen.

### Q: What happens if the command is interrupted?
**A:** The cache lock is released automatically after 10 minutes. Batches that were already deleted are not rolled back — each batch is its own operation, so an interruption simply leaves the cleanup partially complete, and the next run picks up where it left off.

### Q: Can I change the execution schedule?
**A:** Yes, modify `src/App/Console/Kernel.php`:
```php
// Run daily at 2 AM
$schedule->command('maintenance:cleanup-node-data')->dailyAt('02:00');

// Run every 6 hours
$schedule->command('maintenance:cleanup-node-data')->everySixHours();
```

### Q: How do I know if it's working?
**A:** Check the logs:
```bash
grep "Node data cleanup completed" storage/logs/laravel.log | tail -5
```

### Q: Can I recover deleted data?
**A:** No, deletion is permanent. That's why it's important to use `--dry-run` first.

---

## 🚀 Next Steps (Future Improvements)

### Phase B - Advanced Telemetry
- Add Prometheus/Grafana metrics
- Real-time monitoring dashboard
- Automatic alerts on errors

### Phase C - Completion Signals
- Integrate with flow metrics system
- Delete only when flow is confirmed as completed
- Tag records with `flow_id` and `run_id` for better tracking

---

## 📚 References

- **Implementation File:** `src/App/Console/Commands/MaintenanceNodeDataCleanUp.php`
- **Scheduler Configuration:** `src/App/Console/Kernel.php`
- **Data Table:** `database/migrations/tenants/2025_01_16_162829_create_node_data_table.php`
- **Model:** `src/Domain/Ipaas/Nodes/Models/NodeData.php`
- **Configuration:** `config/maintenance.php`

---

## 📞 Support

If you have problems or questions:

1. Review this guide first
2. Search the logs: `storage/logs/laravel.log`
3. Run `--dry-run` to diagnose
4. Contact the development team with complete logs

---

**Last updated:** October 2, 2025
