# Sentry Configuration & Monitoring Strategy

**Last Updated**: October 30, 2025
**Configuration File**: `/config/sentry.php`
**Documentation**: https://docs.sentry.io/platforms/php/guides/laravel/

## Table of Contents
- [Overview](#overview)
- [Account Limits & Quotas](#account-limits--quotas)
- [Configuration Strategy](#configuration-strategy)
- [Sampling Decisions](#sampling-decisions)
- [Error Filtering](#error-filtering)
- [Performance Monitoring](#performance-monitoring)
- [Expected Usage](#expected-usage)
- [Monitoring & Adjustment](#monitoring--adjustment)
- [Environment Configuration](#environment-configuration)
- [Future Considerations](#future-considerations)
- [Troubleshooting](#troubleshooting)
- [Related Documentation](#related-documentation)
- [Revision History](#revision-history)

---

## Overview

SuiteX uses Sentry for production error tracking and performance monitoring. Our configuration is optimized for **maximum visibility while staying within quota limits**, with specific attention to filtering high-volume, low-value events (Livewire validation errors, queue job repeats).

### Key Principles
1. **Errors**: 100% sampling with intelligent filtering (Sentry groups repeats into a single issue, though every event sent still counts against quota)
2. **Performance**: Dynamic sampling based on transaction type using `traces_sampler`
3. **Noise Reduction**: Filter Livewire validation errors and disable noisy breadcrumbs/spans
4. **Quota Management**: Conservative initial sampling that leaves roughly 70-90% of the error quota and ~95% of the span quota as headroom for growth

### Environments
- **Production**: Full Sentry monitoring (configured below)
- **Staging/Sandbox**: Sentry disabled (no events sent)
- **Local/Development**: Sentry disabled (no events sent)

---

## Account Limits & Quotas

Our Sentry plan includes:

| Resource | Monthly Limit | Daily Estimate | Usage Strategy |
|----------|--------------|----------------|----------------|
| **Errors** | 50,000 | ~1,667/day | 100% sampling with filtering |
| **Spans** | 5,000,000 | ~167,000/day | Dynamic sampling (5-30%) |
| **Logs** | 5 GB | ~170 MB/day | Currently disabled |
| **Replays** | 50 | ~2/day | Not configured |
| **Cron Monitors** | 1 | - | Available |
| **Uptime Monitors** | 1 | - | Available |
| **Attachments** | 1 GB | ~34 MB/day | Available |
| **PAYG Budget** | $50/month | - | Safety net for overages |

---

## Configuration Strategy

### Error Sampling (`sample_rate`)

**Setting**: `1.0` (100%)

**Reasoning**:
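- Sentry groups identical errors into a single issue automatically, which keeps the issue list readable
- With a 50,000-event error quota and a projected 5,000-15,000 events sent per month, 100% sampling provides complete visibility without exhausting quota
- Early detection of production issues is critical
- `before_send` callback filters known noise (see Error Filtering below)

**Trade-off**: Every event that passes the filters counts against quota, even when Sentry groups it into an existing issue, so an error storm is contained by filtering rather than by sampling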

### Performance Tracing (`traces_sampler`)

**Setting**: Dynamic sampling function based on transaction type

**Reasoning**:
- 5,000,000 span quota is substantial - we can afford generous sampling
- Different transaction types have different debugging value
- Queue jobs can generate massive volume with repeat operations
- Static `traces_sample_rate` doesn't account for transaction diversity

**Implementation**: See [Sampling Decisions](#sampling-decisions) below

---

## Sampling Decisions

Our `traces_sampler` function implements transaction-specific sampling rates:

### Sampling Rates by Transaction Type

| Transaction Type | Sample Rate | Reasoning |
|-----------------|-------------|-----------|
| **User Web Routes** | 30% | Critical for UX; captures representative user experience |
| **API Endpoints** | 20% | Balances API visibility with volume control |
| **Queue Jobs** | 5% | High volume with repetitive operations; enough to catch issues |
| **CLI Commands** | 10% | Useful for debugging but not business-critical |
| **Health Checks** | 0% | Expected traffic with no debugging value |
| **Default (Other)** | 15% | Conservative fallback for unexpected transaction types |

### Parent Sampling Inheritance

```php
if ($context->getParentSampled() !== null) {
    return $context->getParentSampled() ? 1.0 : 0.0;
}
```

**Reasoning**:
- Maintains distributed traces across service boundaries
- Prevents broken traces that would make debugging impossible
- Standard Sentry best practice per [documentation](https://docs.sentry.io/platforms/php/guides/laravel/configuration/sampling/#inheritance)

### Transaction Type Detection

The prefixes below are the transaction names this configuration expects. Confirm them against the names shown under `Performance > Transactions`, since naming can differ between SDK versions and integrations.

**Queue Jobs**: `str_starts_with($transactionName, 'queue:')`
- Laravel names queue transactions as `queue:JobClassName`
- Sampled at 5% to prevent quota waste from import jobs that process thousands of records

**API Routes**: `str_starts_with($transactionName, 'api/')`
- Laravel routes under `/api/*` prefix
- Sampled at 20% for balanced visibility

**Web Routes**: HTTP method prefix (`GET `, `POST `, etc.)
- Laravel names web transactions as `GET /path` or `POST /path`
- Sampled at 30% as highest priority for user-facing performance

**CLI Commands**: `str_starts_with($transactionName, 'artisan:')`
- Laravel artisan commands
- Sampled at 10% - useful but not critical
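
Putting the rates and detection rules above together, the decision logic could be sketched as a plain function. This is a minimal sketch rather than the contents of `/config/sentry.php`: the function name `sentrySampleRate()` and the exact-path matching for health checks are assumptions, and the prefixes simply mirror the ones listed above.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of the Sentry SDK): maps a transaction
 * name to a sample rate using the prefixes documented above.
 */
function sentrySampleRate(string $transactionName, ?bool $parentSampled = null): float
{
    // Inherit the upstream decision so distributed traces stay intact.
    if ($parentSampled !== null) {
        return $parentSampled ? 1.0 : 0.0;
    }

    // Strip a leading HTTP method ("GET /path" -> "/path") if present.
    $path = preg_replace('/^(GET|POST|PUT|PATCH|DELETE|HEAD|OPTIONS) /', '', $transactionName, 1, $stripped);
    $isHttp = $stripped === 1;

    // Assumption: health checks are recognized by an exact path match.
    if (in_array($path, ['/up', '/health', '/_health', '/status'], true)) {
        return 0.0;
    }
    if (str_starts_with($transactionName, 'queue:')) {
        return 0.05;
    }
    if (str_starts_with($transactionName, 'artisan:')) {
        return 0.10;
    }
    if (str_starts_with($transactionName, 'api/')) {
        return 0.20;
    }
    if ($isHttp) {
        return 0.30;
    }

    return 0.15; // Default for unrecognized transaction types
}
```

Inside the real `traces_sampler`, the inputs would come from the SDK's sampling context, for example `$context->getTransactionContext()->getName()` and `$context->getParentSampled()`. Keeping the mapping in a pure function makes the rates easy to unit test without the SDK.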

### Profiling (`profiles_sample_rate`)

**Setting**: `0.05` (5% of sampled transactions)

**Reasoning**:
- Profiling provides deep performance insights but generates significant data
- 5% of sampled transactions = 0.05 × (avg 20% sampling) = ~1% overall
- Sufficient for identifying bottlenecks without overwhelming data
- Can be increased if bottleneck analysis needed
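
The ~1% figure can be cross-checked against the traffic assumptions in the Expected Usage section. The sketch below weights each trace sampling rate by its assumed monthly volume instead of using the flat 20% average; the variable names are illustrative only.

```php
<?php

declare(strict_types=1);

// [monthly volume, trace sample rate] per transaction type (figures from this document)
$traffic = [
    'web'   => [100_000, 0.30],
    'api'   => [10_000, 0.20],
    'queue' => [50_000, 0.05],
    'cli'   => [5_000, 0.10],
];

$total = 0;
$sampled = 0.0;
foreach ($traffic as [$volume, $rate]) {
    $total += $volume;
    $sampled += $volume * $rate;
}

$averageTraceRate = $sampled / $total;          // about 0.21
$overallProfileRate = 0.05 * $averageTraceRate; // about 0.011, i.e. roughly 1%

printf("Average trace rate: %.1f%%\n", $averageTraceRate * 100);
printf("Overall profiling rate: %.2f%%\n", $overallProfileRate * 100);
```

With these assumptions the weighted average lands close to the 20% used above, so the ~1% overall estimate holds.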

---

## Error Filtering

### Before Send Callback

**Purpose**: Drop known low-value error events before they are sent to Sentry, which saves quota and keeps the issue list clean

**Filtered Exceptions**:
1. **Livewire Validation Exceptions**
   - User input errors, not application bugs
   - Would generate thousands of duplicate "errors"
   - Pattern: `Livewire\Exceptions\ValidationException`

2. **Non-Critical Livewire Exceptions**
   - Most Livewire exceptions are user-driven (navigation, form issues)
   - Exception: `ComponentNotFoundException` (actual application error)
   - Pattern: `Livewire\Exceptions\*` (except ComponentNotFoundException)

**Implementation**:
```php
'before_send' => function (\Sentry\Event $event): ?\Sentry\Event {
    foreach ($event->getExceptions() as $exception) {
        if (str_contains($exception->getType(), 'Livewire\Exceptions\ValidationException')) {
            return null; // Don't send to Sentry
        }
        // ... additional filtering
    }
    return $event; // Send all other errors
}
```
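
The rule for non-critical Livewire exceptions could be expressed as a small predicate on the exception type. This is a sketch under the assumption that the rule is exactly "drop everything in `Livewire\Exceptions\` except `ComponentNotFoundException`"; `shouldDropExceptionType()` is a hypothetical helper name, not an SDK function.

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper: decide whether an exception class name should be
 * dropped before sending, based on the two rules listed above.
 */
function shouldDropExceptionType(string $type): bool
{
    $type = ltrim($type, '\\');

    // Anything outside the Livewire exception namespace is always sent.
    if (!str_starts_with($type, 'Livewire\\Exceptions\\')) {
        return false;
    }

    // ComponentNotFoundException signals a real application error, so keep it.
    return $type !== 'Livewire\\Exceptions\\ComponentNotFoundException';
}
```

Within `before_send`, this would be called with `$exception->getType()` for each entry of `$event->getExceptions()`, and the callback would return `null` whenever the predicate yields `true`.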

### Ignore Exceptions

**Purpose**: SDK-level filtering (earlier in the pipeline)

**Ignored Classes**:
- `Livewire\Exceptions\ValidationException` - Redundant with `before_send` (belt & suspenders)

### Ignore Transactions

**Purpose**: Don't create performance transactions for known low-value endpoints

**Ignored Routes**:
- `/up` - Laravel default health check
- `/health` - Custom health check
- `/_health` - Alternative health check
- `/status` - Status endpoint

**Reasoning**: Health checks generate high traffic with zero debugging value

---

## Performance Monitoring

### Enabled Tracing Features

#### High Value (Enabled)
- ✅ **SQL Queries** - Critical for identifying N+1 queries and slow database operations
- ✅ **SQL Origin** - Shows where slow queries originate (threshold: 100ms)
- ✅ **Views** - Identifies template rendering performance issues
- ✅ **HTTP Client Requests** - Tracks external API dependencies and latency
- ✅ **Queue Jobs** - Monitors async job performance
- ✅ **Notifications** - Tracks notification delivery
- ✅ **Continue After Response** - Captures complete request lifecycle including afterResponse() jobs

#### Disabled for Noise Reduction
- ❌ **Cache Operations** - Extremely noisy (100+ spans/request), minimal debugging value
- ❌ **Redis Commands** - High volume, enable only for specific Redis debugging
- ❌ **Livewire Components** - Not monitoring Livewire (per requirements)
- ❌ **SQL Bindings** - Adds significant data without proportional value
- ❌ **Missing Routes (404s)** - Expected user behavior, not performance issues

### Breadcrumb Configuration

**Purpose**: Add context to errors without overwhelming data

#### Enabled Breadcrumbs
- ✅ Logs - Critical debugging context
- ✅ SQL Queries - Shows query patterns leading to errors
- ✅ Queue Info - Important for async error context
- ✅ Command Info - Useful for CLI debugging
- ✅ HTTP Client Requests - Tracks external API calls
- ✅ Notifications - Context for communication-related errors

#### Disabled Breadcrumbs
- ❌ Cache - Too noisy, rarely useful
- ❌ Livewire - Not monitoring Livewire
- ❌ SQL Bindings - Adds volume without proportional value
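
The tracing and breadcrumb choices in this section would map onto the `tracing` and `breadcrumbs` arrays of the config file roughly as follows. The key names follow the config file published by `sentry-laravel` as best understood here; treat them as assumptions and verify them against the installed version. The real file typically wraps each value in `env()`, which is omitted to keep the fragment standalone.

```php
<?php

// Sketch of the relevant parts of config/sentry.php (key names are assumptions).
return [
    'tracing' => [
        'sql_queries'             => true,  // N+1 and slow query detection
        'sql_origin'              => true,  // Where slow queries originate
        'sql_origin_threshold_ms' => 100,   // Threshold documented above
        'views'                   => true,
        'http_client_requests'    => true,
        'queue_jobs'              => true,
        'notifications'           => true,
        'continue_after_response' => true,
        'cache'                   => false, // Too noisy
        'redis_commands'          => false, // Enable only for Redis debugging
        'livewire'                => false,
        'sql_bindings'            => false,
        'missing_routes'          => false, // 404s are expected user behavior
    ],
    'breadcrumbs' => [
        'logs'                 => true,
        'sql_queries'          => true,
        'queue_info'           => true,
        'command_info'         => true,
        'http_client_requests' => true,
        'notifications'        => true,
        'cache'                => false,
        'livewire'             => false,
        'sql_bindings'         => false,
    ],
];
```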

---

## Expected Usage

### Traffic Assumptions
Based on estimated production traffic patterns:

- **Web Requests**: ~100,000/month
- **API Calls**: ~10,000/month
- **Queue Jobs**: ~50,000/month (includes import batch jobs)
- **CLI Commands**: ~5,000/month

### Projected Monthly Consumption

#### Errors
```
Total Errors: ~10,000-20,000/month
- Filtered: ~5,000 (Livewire validation, noise)
- Sent to Sentry: ~5,000-15,000
- After deduplication: ~2,000-5,000 unique issues

Quota: 50,000
Usage: 10-30%
Margin: 70-90% remaining ✅
```

#### Spans
```
Web Routes:    100,000 × 30% × 8 spans  = 240,000 spans
API Routes:     10,000 × 20% × 6 spans  =  12,000 spans
Queue Jobs:     50,000 × 5%  × 4 spans  =  10,000 spans
CLI Commands:    5,000 × 10% × 3 spans  =   1,500 spans
                                     Total: ~263,500 spans/month

Quota: 5,000,000
Usage: ~5%
Margin: 95% remaining ✅
```

**Note**: Actual span count per transaction varies based on:
- SQL queries executed
- HTTP client requests made
- View rendering operations
- Notification sends
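
The projection above is simple enough to recompute whenever traffic or sampling rates change. A minimal sketch, using this document's assumed volumes and spans-per-transaction figures (the array layout and variable names are illustrative):

```php
<?php

declare(strict_types=1);

// [monthly transactions, sample rate in percent, average spans per transaction]
$workloads = [
    'Web Routes'   => [100_000, 30, 8],
    'API Routes'   => [10_000, 20, 6],
    'Queue Jobs'   => [50_000, 5, 4],
    'CLI Commands' => [5_000, 10, 3],
];

$spanQuota = 5_000_000;
$totalSpans = 0;

foreach ($workloads as $label => [$transactions, $ratePercent, $spansEach]) {
    // Integer math avoids floating-point rounding in the totals.
    $spans = intdiv($transactions * $ratePercent * $spansEach, 100);
    $totalSpans += $spans;
    printf("%-13s %8d spans\n", $label, $spans);
}

$usage = $totalSpans / $spanQuota;    // Fraction of the monthly quota
$headroom = $spanQuota / $totalSpans; // Growth multiple before the quota is reached

printf("Total: %d spans (%.1f%% of quota, about %.0fx headroom)\n", $totalSpans, $usage * 100, $headroom);
```

Replacing the assumed figures with real numbers from the Sentry usage page after the first two weeks gives a quick check on whether sampling rates can be raised.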

### Headroom for Growth

With current configuration:
- **Errors**: Can handle roughly a 3-10x traffic increase before hitting quota (50,000 quota ÷ 5,000-15,000 events sent)
- **Spans**: Can handle 19x traffic increase before hitting quota
- **Strategy**: Very conservative; allows significant growth or sampling increase

---

## Monitoring & Adjustment

### Initial Monitoring Period (Weeks 1-2)

After deployment, monitor the following metrics in Sentry dashboard:

1. **Actual Consumption vs. Estimates**
   - Navigate to: `Settings > Usage & Billing`
   - Compare actual errors/spans to projections above
   - Identify any unexpected high-volume transactions

2. **Error Patterns**
   - Review: `Issues > All Issues`
   - Verify Livewire errors are filtered
   - Check for new high-frequency errors needing filtering

3. **Performance Insights**
   - Review: `Performance > Transactions`
   - Identify slow transactions
   - Validate sampling is capturing representative data

4. **Sampling Effectiveness**
   - Check: `Performance > Transactions > All Transactions`
   - Verify sampling rates match configuration
   - Ensure critical paths have sufficient data

### Adjustment Scenarios

#### Scenario 1: Under-utilizing Quota (Usage < 20%)
**Symptoms**: Only using 10-15% of error/span quota after 2 weeks

**Actions**:
1. Increase user route sampling from 30% → 40-50%
2. Increase queue job sampling from 5% → 10-15%
3. Enable Redis tracing if Redis performance is a concern
4. Increase profiling from 5% → 10%

**Impact**: Better visibility with still-comfortable margin

#### Scenario 2: Approaching Quota Limits (Usage > 80%)
**Symptoms**: Consuming 80%+ of monthly quota mid-month

**Actions**:
1. Review high-volume transactions in Sentry dashboard
2. Reduce sampling for high-volume, low-value transactions
3. Add more specific filtering in `before_send` callback
4. Consider adding more routes to `ignore_transactions`

**Impact**: Stay within quota limits without losing critical visibility

#### Scenario 3: Missing Critical Errors
**Symptoms**: Discovering production issues that weren't captured

**Actions**:
1. Review `before_send` filters - may be too aggressive
2. Check if filtered transaction types need higher sampling
3. Verify Sentry DSN is correct in production environment
4. Confirm `APP_ENV=production` triggers Sentry

**Impact**: Improve error detection without unnecessary quota waste

#### Scenario 4: Queue Job Performance Issues
**Symptoms**: Need better visibility into specific queue jobs

**Actions**:
1. Temporarily increase queue job sampling from 5% → 25%
2. Enable Redis tracing if job uses Redis heavily
3. Add custom instrumentation for specific problematic jobs
4. Return to 5% sampling after issues identified/resolved

**Impact**: Deep dive into specific performance problems

---

## Environment Configuration

### Production Environment Variables

Required in `.env`:
```bash
# Sentry DSN (required)
SENTRY_LARAVEL_DSN=https://xxx@xxx.ingest.sentry.io/xxx

# Environment identification
SENTRY_ENVIRONMENT=production
APP_ENV=production

# Optional: Override sampling rates without code changes
# SENTRY_SAMPLE_RATE=1.0
# SENTRY_TRACES_SAMPLE_RATE=0.2  # Note: traces_sampler takes precedence
# SENTRY_PROFILES_SAMPLE_RATE=0.05

# Optional: Release tracking (for changelog correlation)
# SENTRY_RELEASE=v1.2.3
```

### Staging/Development (Disabled)

In `.env`:
```bash
# Don't set SENTRY_LARAVEL_DSN - Sentry won't initialize
# Or explicitly disable:
SENTRY_LARAVEL_DSN=

APP_ENV=staging  # or local
```

**Result**: No Sentry data sent from non-production environments

### Runtime Overrides

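Scalar settings can be overridden via environment variables without code changes. The per-transaction rates inside `traces_sampler` are defined in code and are not affected by these variables: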

```bash
# Override error sampling
SENTRY_SAMPLE_RATE=0.5  # 50% of errors

# Override trace sampling (Note: traces_sampler takes precedence)
SENTRY_TRACES_SAMPLE_RATE=0.3  # Only used if traces_sampler not defined

# Override profiling
SENTRY_PROFILES_SAMPLE_RATE=0.1  # 10% of sampled transactions

# Disable specific features
SENTRY_TRACE_SQL_QUERIES_ENABLED=false
SENTRY_TRACE_CACHE_ENABLED=true  # Enable if needed
```

### Closures and Config Caching

**Important**: Laravel's config cache cannot serialize closures.

If you run `php artisan config:cache` while `traces_sampler` or `before_send` are defined as closures, the command will fail.

**Solutions**:
1. **Don't cache config in production** (if using closures) - slight performance impact
2. **Clear config cache** after deployment: `php artisan config:clear`
3. **Replace the closures with callables to static class methods** (e.g. `[SomeClass::class, 'beforeSend']`, where `SomeClass` is a class you define) if config caching is required

See: https://docs.sentry.io/platforms/php/guides/laravel/configuration/laravel-options/#closures-and-config-caching
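
One way to keep `config:cache` usable is to replace the closures with array callables that point at static methods, since arrays of strings can be exported. The sketch below assumes a hypothetical `App\Support\SentryCallbacks` class; the method bodies are placeholders for the logic described earlier in this document.

```php
<?php

declare(strict_types=1);

namespace App\Support; // Hypothetical location

use Sentry\Event;
use Sentry\EventHint;
use Sentry\Tracing\SamplingContext;

final class SentryCallbacks
{
    public static function beforeSend(Event $event, ?EventHint $hint = null): ?Event
    {
        foreach ($event->getExceptions() as $exception) {
            if (str_contains($exception->getType(), 'Livewire\Exceptions\ValidationException')) {
                return null; // Drop validation noise
            }
        }

        return $event;
    }

    public static function tracesSampler(SamplingContext $context): float
    {
        if ($context->getParentSampled() !== null) {
            return $context->getParentSampled() ? 1.0 : 0.0;
        }

        return 0.15; // Placeholder: apply the per-type rates here
    }
}

// In config/sentry.php: string-based callables instead of closures.
$sentryOptions = [
    'before_send'    => [SentryCallbacks::class, 'beforeSend'],
    'traces_sampler' => [SentryCallbacks::class, 'tracesSampler'],
];

// Unlike a closure, this array survives var_export(), which config caching relies on.
var_export($sentryOptions);
```

The trade-off is one extra class to maintain, in exchange for keeping cached configuration in production.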

---

## Future Considerations

### 1. Logs Integration (Currently Disabled)

**Status**: `'enable_logs' => false`

**Consideration**: We have a 5 GB/month log quota available

**Potential Use**:
- Capture Laravel logs automatically (warnings, errors, critical)
- Correlate logs with errors and performance data
- Better debugging context

**Trade-off**:
- Additional quota consumption
- More data to review
- May overlap with existing logging infrastructure

**Recommendation**: Enable if current logging is insufficient for production debugging

### 2. Session Replay (Available)

**Status**: Not configured

**Quota**: 50 replays/month included

**Potential Use**:
- Record user sessions leading to errors
- Visual reproduction of error scenarios
- UX debugging

**Implementation**: Requires frontend Sentry SDK

**Trade-off**:
- Privacy concerns (PII in recordings)
- Very limited quota (50/month)
- Requires JavaScript integration

**Recommendation**: Consider for high-priority, hard-to-reproduce user-facing errors

### 3. Cron Monitoring (Available)

**Status**: Not configured

**Quota**: 1 monitor included

**Potential Use**:
- Monitor scheduled Laravel commands
- Alert on missed executions
- Track execution duration

**Implementation**: Wrap scheduled commands with Sentry check-ins

**Recommendation**: Use for critical scheduled jobs (e.g., daily import, billing)

### 4. Uptime Monitoring (Available)

**Status**: Not configured

**Quota**: 1 monitor included

**Potential Use**:
- External uptime checks
- Alert on downtime
- Geographic availability

**Trade-off**: May duplicate existing uptime monitoring

**Recommendation**: Use if no existing uptime monitoring in place

### 5. Custom Instrumentation

**Current**: Using Laravel's automatic instrumentation

**Potential Additions**:
- Custom spans for business logic boundaries
- Tags for tenant/customer identification (without PII)
- Breadcrumbs for business process steps
- Custom metrics for business KPIs

**Example**:
```php
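// Standalone transaction, e.g. inside a CLI command with no active transaction.
// Name, op, and description are set explicitly; the constructor's second
// argument is the parent sampling decision, not a description.
$context = new \Sentry\Tracing\TransactionContext();
$context->setName('batch-import');
$context->setOp('import.batch');
$context->setDescription('Import 1000 records');

$transaction = \Sentry\startTransaction($context);

// Make it the active span so automatic child spans (SQL, HTTP) attach to it
\Sentry\SentrySdk::getCurrentHub()->setSpan($transaction);

// ... business logic ...

$transaction->finish();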
```

### 6. Sampling Rate Evolution

**Current Strategy**: Conservative (5-30% by transaction type)

**Evolution Path**:
1. **Weeks 1-2**: Monitor baseline, validate assumptions
2. **Month 1**: Adjust sampling based on actual traffic patterns
3. **Month 2+**: Optimize for discovered bottlenecks
4. **Ongoing**: Increase sampling as quota allows

**End State Possibilities**:
- User routes: 50-70%
- API routes: 30-40%
- Queue jobs: 10-20%
- Overall quota usage: 40-60% (comfortable margin for spikes)

---

## Troubleshooting

### Sentry Not Capturing Events

**Check**:
1. `SENTRY_LARAVEL_DSN` is set in production `.env`
2. `APP_ENV` is set to `production`
3. Test with: `php artisan sentry:test`
4. Check Laravel logs for Sentry errors: `storage/logs/laravel.log`

### Too Many Events Being Sent

**Check**:
1. Review high-volume transactions in Sentry dashboard
2. Temporarily reduce sampling rates via environment variables
3. Add filtering in `before_send` callback
4. Verify health check routes are in `ignore_transactions`

### Missing Expected Events

**Check**:
1. Verify transaction type detection in `traces_sampler`
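2. Check whether the event is being dropped: `before_send` applies to error events, while transactions are dropped by `ignore_transactions` or a `0.0` rate from `traces_sampler`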
3. Confirm sampling rate is appropriate for traffic volume
4. Review Sentry's "Rejected Events" in dashboard

### Config Cache Issues

**Error**: `Serialization of 'Closure' is not allowed`

**Solution**:
```bash
php artisan config:clear
# Don't run config:cache in production if using closures
```

---

## Related Documentation

- [Sentry Laravel SDK](https://docs.sentry.io/platforms/php/guides/laravel/)
- [Sentry Sampling Configuration](https://docs.sentry.io/platforms/php/guides/laravel/configuration/sampling/)
- [Sentry Performance Monitoring](https://docs.sentry.io/platforms/php/guides/laravel/tracing/)
- [Sentry Filtering](https://docs.sentry.io/platforms/php/guides/laravel/configuration/filtering/)

---

## Revision History

| Date | Author | Changes |
|------|--------|---------|
| 2025-10-30 | Initial | Created comprehensive Sentry configuration documentation |


