Exclude Internal Email Domains from Analytics: Cleaning Production Metrics

Internal team testing polluted production analytics with non-user activity. Engineers, QA testers, and administrators generated thousands of events that skewed metrics, triggered unnecessary email campaigns, and inflated costs. We implemented domain-based filtering across all external services to isolate real user behavior.

The Problem

Analytics dashboards showed misleading data:

Observed issues:

15% of "active users" were internal team members
Marketing campaigns sent to engineering team
A/B test results contaminated by QA testing
SMS costs included internal phone numbers
Amplitude charts showed dev@alphazed.app completing 200 lessons/day

Impact:

Weekly Active Users (Contaminated)
┌──────────────────────────────────────┐
│ Total: 1,000 users                   │
│ - Real users: 850                    │
│ - Internal team: 150                 │
│                                      │
│ Metrics ~18% inflated (150/850)      │
│ Marketing spend wasted on team       │
│ A/B tests statistically invalid      │
└──────────────────────────────────────┘

Architecture: Before vs After

Before:

Analytics Events
┌──────────────────────────────────────┐
│ user@alphazed.app      → Amplitude   │
│ dev@alphazed.app       → Amplitude   │
│ qa@alphazed.app        → Amplitude   │
│ realuser@gmail.com     → Amplitude   │
└──────────────────────────────────────┘
    Internal activity skews metrics

After:

Analytics Events (Filtered)
┌──────────────────────────────────────┐
│ user@alphazed.app      → SKIP        │
│ dev@alphazed.app       → SKIP        │
│ qa@alphazed.app        → SKIP        │
│ realuser@gmail.com     → Amplitude   │
└──────────────────────────────────────┘
    Clean production-only metrics

Implementation

Step 1: Define Internal Domains

We created a list of internal domains to exclude:

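# src/config/internal_domains.py

# A frozenset gives constant-time membership checks
INTERNAL_DOMAINS = frozenset({
    # Company domains
    "alphazed.app",
    "alphazed.com",
    "alphazed.io",

    # Test domains
    "test.com",
    "example.com",
    "example.org",

    # Development domains
    "localhost",
    "127.0.0.1",
})

# Internal test phone numbers (E.164), used by is_internal_user below
INTERNAL_PHONE_NUMBERS = frozenset({
    "+11234567890",  # Engineering test numbers
    "+10987654321",
})

def is_internal_email(email):
    """Check if email belongs to internal domain."""
    if not email or '@' not in email:
        return False

    # Take the text after the last '@', trimmed and lowercased
    domain = email.rsplit('@', 1)[1].strip().lower()
    return domain in INTERNAL_DOMAINS

def is_internal_user(user):
    """Check if user is internal (email or phone)."""
    # User lookups can return None (e.g. an unknown or deleted ID)
    if user is None:
        return False

    # Check email
    if is_internal_email(user.email):
        return True

    # Check phone (internal test numbers)
    return user.phone in INTERNAL_PHONE_NUMBERS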
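
To try these checks outside the application, here is a self-contained sketch. It replaces the ORM model with a hypothetical StubUser dataclass and, as an addition that is not part of our module, uses the standard library's email.utils.parseaddr so that an address with a display name ("Dev <dev@alphazed.app>") is also recognized:

```python
from dataclasses import dataclass
from email.utils import parseaddr
from typing import Optional

# Same lists as src/config/internal_domains.py, repeated so the sketch runs alone
INTERNAL_DOMAINS = frozenset({
    "alphazed.app", "alphazed.com", "alphazed.io",
    "test.com", "example.com", "example.org",
    "localhost", "127.0.0.1",
})
INTERNAL_PHONE_NUMBERS = frozenset({"+11234567890", "+10987654321"})


@dataclass
class StubUser:
    """Hypothetical stand-in for the app's User model."""
    email: Optional[str] = None
    phone: Optional[str] = None


def email_domain(email: Optional[str]) -> Optional[str]:
    """Return the lowercased domain of an address, or None if it has none."""
    if not email:
        return None
    # parseaddr drops an optional display name: "Dev <dev@x.io>" -> "dev@x.io"
    _, address = parseaddr(email)
    if "@" not in address:
        return None
    return address.rsplit("@", 1)[1].strip().lower()


def is_internal_email(email: Optional[str]) -> bool:
    return email_domain(email) in INTERNAL_DOMAINS


def is_internal_user(user: Optional[StubUser]) -> bool:
    if user is None:
        return False
    return is_internal_email(user.email) or user.phone in INTERNAL_PHONE_NUMBERS


print(is_internal_email("Dev <dev@AlphaZed.app>"))
print(is_internal_user(StubUser(phone="+11234567890")))
print(is_internal_user(StubUser(email="real@gmail.com")))
```

Because the sets are frozensets, lookups stay constant-time no matter how many domains or numbers are added.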

Step 2: Filter Amplitude Analytics

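# src/services/analytics/amplitude_service.py
# (User, the ORM model, and amplitude_client are provided by the application)
import logging

from src.config.internal_domains import is_internal_user

logger = logging.getLogger(__name__)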

class AmplitudeService:
    """Analytics service with internal filtering."""

    def track_event(self, user_id, event_name, event_properties=None):
        """Track event in Amplitude (skip internal users)."""
        user = User.query.get(user_id)

        if is_internal_user(user):
            logger.debug(f"Skipping Amplitude event for internal user: {user.email}")
            return

        # Send to Amplitude
        amplitude_client.track(
            user_id=user_id,
            event_type=event_name,
            event_properties=event_properties or {}
        )
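
Every integration repeats the same guard: load the user, return early if internal. One way to keep that rule in a single place is a decorator. The sketch below is illustrative only: load_user, StubUser, and the sent_events list (standing in for amplitude_client) are hypothetical, so it runs without the app or the Amplitude SDK.

```python
import functools
import logging
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

logger = logging.getLogger(__name__)

INTERNAL_DOMAINS = frozenset({"alphazed.app", "alphazed.com", "alphazed.io"})


@dataclass
class StubUser:
    id: int
    email: str


# Hypothetical user store standing in for User.query.get
USERS: Dict[int, StubUser] = {
    1: StubUser(1, "dev@alphazed.app"),
    2: StubUser(2, "realuser@gmail.com"),
}


def load_user(user_id: int) -> Optional[StubUser]:
    return USERS.get(user_id)


def is_internal_user(user: Optional[StubUser]) -> bool:
    return user is not None and user.email.rsplit("@", 1)[-1].lower() in INTERNAL_DOMAINS


def skip_internal_users(func: Callable) -> Callable:
    """Skip the wrapped call when its first argument is an internal user's ID."""
    @functools.wraps(func)
    def wrapper(user_id, *args, **kwargs):
        if is_internal_user(load_user(user_id)):
            logger.debug("Skipping %s for internal user %s", func.__name__, user_id)
            return None
        return func(user_id, *args, **kwargs)
    return wrapper


sent_events: List[dict] = []  # stands in for amplitude_client


@skip_internal_users
def track_event(user_id, event_name, event_properties=None):
    sent_events.append({"user_id": user_id, "event_type": event_name,
                        "event_properties": event_properties or {}})


track_event(1, "lesson_completed")                 # internal: skipped
track_event(2, "lesson_completed", {"lesson": 3})  # external: recorded
```

functools.wraps keeps the wrapped function's name and docstring, which keeps log lines and stack traces readable.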

Step 3: Filter Drip Email Campaigns

# src/services/drip/drip_service.py
import logging

from src.config.internal_domains import is_internal_email, is_internal_user

logger = logging.getLogger(__name__)

class DripService:
    """Email service with internal filtering."""

    def subscribe_user(self, email, tags=None):
        """Subscribe user to email campaigns (skip internal)."""
        if is_internal_email(email):
            logger.debug(f"Skipping Drip subscription for internal email: {email}")
            return

        # Subscribe to Drip
        drip_client.subscribe({
            'email': email,
            'tags': tags or []
        })

    def send_campaign(self, campaign_id, user_ids):
        """Send email campaign (filter internal users)."""
        users = User.query.filter(User.id.in_(user_ids)).all()

        # Filter out internal users
        external_users = [u for u in users if not is_internal_user(u)]

        logger.info(f"Campaign {campaign_id}: {len(users)} total, {len(external_users)} external")

        for user in external_users:
            drip_client.send_campaign(campaign_id, user.email)
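
send_campaign only needs to know how many recipients were skipped and who remains. A single-pass partition helper makes both easy to log and to test. In this sketch the recipients are plain email strings and domain_is_internal is a simplified, hypothetical stand-in for is_internal_user:

```python
from typing import Callable, Iterable, List, Tuple, TypeVar

T = TypeVar("T")


def partition(items: Iterable[T], is_internal: Callable[[T], bool]) -> Tuple[List[T], List[T]]:
    """Split items into (external, internal) lists in a single pass."""
    external: List[T] = []
    internal: List[T] = []
    for item in items:
        (internal if is_internal(item) else external).append(item)
    return external, internal


def domain_is_internal(email: str) -> bool:
    # Hypothetical predicate for the example; the app would pass is_internal_user
    return email.rsplit("@", 1)[-1].lower() in {"alphazed.app", "alphazed.com"}


recipients = ["qa@alphazed.app", "real@gmail.com", "Dev@AlphaZed.com", "fan@yahoo.com"]
external, internal = partition(recipients, domain_is_internal)
print(f"Campaign welcome-series: {len(recipients)} total, "
      f"{len(external)} external, {len(internal)} internal skipped")
```

Keeping the skipped list, not just its length, makes it easy to log which internal accounts were excluded when debugging a campaign.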

Step 4: Filter Twilio SMS

# src/services/twilio/twilio_service.py
import logging

logger = logging.getLogger(__name__)
INTERNAL_PHONE_NUMBERS = [
    "+11234567890",  # Engineering test numbers
    "+10987654321",
]

class TwilioService:
    """SMS service with internal filtering."""

    def send_sms(self, phone_number, message):
        """Send SMS (skip internal numbers)."""
        if phone_number in INTERNAL_PHONE_NUMBERS:
            logger.debug(f"Skipping SMS for internal number: {phone_number}")
            return

        # Send via Twilio
        twilio_client.messages.create(
            to=phone_number,
            from_=TWILIO_PHONE_NUMBER,
            body=message
        )
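
The check above compares strings exactly, so a number stored as +1 (123) 456-7890 would not match +11234567890. A minimal normalizer to E.164 is sketched below; normalize_phone and is_internal_phone are hypothetical helpers, and treating numbers without a country code as North American is our assumption, not Twilio's behavior. For international numbers, the phonenumbers library (a Python port of Google's libphonenumber) is the more robust choice.

```python
import re
from typing import Optional

INTERNAL_PHONE_NUMBERS = frozenset({"+11234567890", "+10987654321"})


def normalize_phone(raw: Optional[str]) -> Optional[str]:
    """Best-effort E.164 normalization; returns None if the input is unusable."""
    if not raw:
        return None
    has_plus = raw.strip().startswith("+")
    digits = re.sub(r"\D", "", raw)  # drop spaces, dashes, dots, parentheses
    if not digits:
        return None
    if has_plus:
        return "+" + digits  # country code already present
    if len(digits) == 10:
        return "+1" + digits  # assumption: North American number, no country code
    if len(digits) == 11 and digits.startswith("1"):
        return "+" + digits  # North American number with leading 1
    return None  # ambiguous: refuse to guess


def is_internal_phone(raw: Optional[str]) -> bool:
    return normalize_phone(raw) in INTERNAL_PHONE_NUMBERS


print(normalize_phone("+1 (123) 456-7890"))
print(is_internal_phone("123.456.7890"))
```

In send_sms the guard would become if is_internal_phone(phone_number), and the stored list should itself be kept in E.164 form.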

Step 5: Add Configuration Flag

We made filtering configurable per environment:

import os

# Set per environment (see Edge Case 4 for the .env files):
#   Development: false (developers need to test analytics)
#   Staging:     true  (verify filtering behavior)
#   Production:  true  (always filter)
ENABLE_INTERNAL_FILTERING = os.environ.get("ENABLE_INTERNAL_FILTERING", "false").lower() == "true"

def track_event(user_id, event_name, event_properties=None):
    """Track event with environment-aware filtering."""
    if ENABLE_INTERNAL_FILTERING:
        user = User.query.get(user_id)
        if is_internal_user(user):
            logger.debug(f"Filtered internal user: {user.email}")
            return

    amplitude_client.track(
        user_id=user_id,
        event_type=event_name,
        event_properties=event_properties or {}
    )
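
Environment variables are strings, so the "true"/"false" values in the .env files (see Edge Case 4) have to become a boolean somewhere. A hedged sketch of a stricter parser: env_flag and the accepted spellings are our choices, and it raises on unrecognized values so that a typo cannot silently turn filtering off.

```python
import os
from typing import Mapping, Optional

_TRUE_VALUES = {"1", "true", "yes", "on"}
_FALSE_VALUES = {"0", "false", "no", "off"}


def env_flag(name: str, default: bool = False,
             environ: Optional[Mapping[str, str]] = None) -> bool:
    """Read a boolean flag such as ENABLE_INTERNAL_FILTERING from the environment."""
    env = os.environ if environ is None else environ
    raw = env.get(name)
    if raw is None or not raw.strip():
        return default  # unset or empty: fall back to the default
    value = raw.strip().lower()
    if value in _TRUE_VALUES:
        return True
    if value in _FALSE_VALUES:
        return False
    # Fail loudly on typos like "ture" instead of silently disabling filtering
    raise ValueError(f"{name} must be a boolean, got {raw!r}")


ENABLE_INTERNAL_FILTERING = env_flag("ENABLE_INTERNAL_FILTERING", default=False)
```

default=False matches Phase 1 of the rollout (code deployed, filtering off); each environment then opts in explicitly.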

Testing Strategy

Unit Tests

def test_internal_email_detection():
    """Test internal email detection."""
    assert is_internal_email("user@alphazed.app") is True
    assert is_internal_email("dev@alphazed.com") is True
    assert is_internal_email("qa@test.com") is True
    assert is_internal_email("real@gmail.com") is False

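def test_amplitude_filters_internal_users(mocker):
    """Amplitude should skip internal users."""
    # Patch names where amplitude_service looks them up, not where they are defined
    mock_client = mocker.patch(
        'src.services.analytics.amplitude_service.amplitude_client'
    )
    mock_user_model = mocker.patch('src.services.analytics.amplitude_service.User')

    internal_user = User(id=1, email="dev@alphazed.app")
    external_user = User(id=2, email="user@gmail.com")
    # Stub the DB lookup so the test needs no database
    mock_user_model.query.get.side_effect = {1: internal_user, 2: external_user}.get

    amplitude_service = AmplitudeService()

    # Internal user - should not track
    amplitude_service.track_event(internal_user.id, "lesson_completed")
    assert mock_client.track.call_count == 0

    # External user - should track
    amplitude_service.track_event(external_user.id, "lesson_completed")
    assert mock_client.track.call_count == 1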
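
An alternative to patching module globals is to pass the client and the user lookup into the service. This is a design sketch, not our AmplitudeService: InjectedAmplitudeService, its constructor arguments, StubUser, and the simplified is_internal predicate are hypothetical, and unittest.mock.Mock stands in for the Amplitude client.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional
from unittest.mock import Mock


@dataclass
class StubUser:
    id: int
    email: str


class InjectedAmplitudeService:
    """Hypothetical variant of AmplitudeService with injected dependencies."""

    def __init__(self, client, load_user: Callable[[int], Optional[StubUser]],
                 is_internal: Callable[[Optional[StubUser]], bool]):
        self.client = client
        self.load_user = load_user
        self.is_internal = is_internal

    def track_event(self, user_id, event_name, event_properties=None):
        if self.is_internal(self.load_user(user_id)):
            return  # skip internal users
        self.client.track(user_id=user_id, event_type=event_name,
                          event_properties=event_properties or {})


def is_internal(user: Optional[StubUser]) -> bool:
    # Simplified predicate for the sketch: one internal domain
    return user is not None and user.email.endswith("@alphazed.app")


users: Dict[int, StubUser] = {1: StubUser(1, "dev@alphazed.app"),
                              2: StubUser(2, "user@gmail.com")}
client = Mock()  # records calls instead of sending to Amplitude
service = InjectedAmplitudeService(client, users.get, is_internal)

service.track_event(1, "lesson_completed")  # internal: skipped
service.track_event(2, "lesson_completed")  # external: tracked
```

The trade-off is a slightly larger constructor in exchange for tests that need neither a database nor string-based patch targets.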

Integration Tests

@pytest.mark.integration
def test_drip_campaign_filters_internal_users():
    """Drip campaigns should exclude internal users."""
    # Create test users
    internal = User(email="qa@alphazed.app")
    external = User(email="real@gmail.com")
    db.session.add_all([internal, external])
    db.session.commit()

    # Send campaign
    drip_service.send_campaign(
        campaign_id="welcome-series",
        user_ids=[internal.id, external.id]
    )

    # Verify only external user received email
    emails_sent = get_drip_campaign_emails("welcome-series")
    assert len(emails_sent) == 1
    assert emails_sent[0]['email'] == "real@gmail.com"

Edge Cases and Solutions

Edge Case 1: Subdomains

Problem: Should eng.alphazed.app be filtered?

Solution: Filter all subdomains:

def is_internal_email(email):
    """Check if email domain or subdomain is internal."""
    if not email or '@' not in email:
        return False

    domain = email.split('@')[1].lower()

    # Check exact match
    if domain in INTERNAL_DOMAINS:
        return True

    # Check parent domains
    for internal_domain in INTERNAL_DOMAINS:
        if domain.endswith(f".{internal_domain}"):
            return True

    return False

# Tests
assert is_internal_email("user@alphazed.app") is True
assert is_internal_email("user@eng.alphazed.app") is True
assert is_internal_email("user@staging.alphazed.app") is True
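
The loop over INTERNAL_DOMAINS can also be written as set lookups on the domain's own suffixes, which costs one lookup per label instead of a scan over every internal domain. It matches only at label boundaries, as the leading dot in the endswith check does, so lookalikes such as notalphazed.app are not caught. A sketch with a trimmed-down, illustrative domain set:

```python
# Trimmed-down, illustrative set; the real list lives in INTERNAL_DOMAINS
INTERNAL_DOMAINS = frozenset({"alphazed.app", "alphazed.com", "alphazed.io"})


def is_internal_domain(domain: str) -> bool:
    """True if the domain or any parent domain is in INTERNAL_DOMAINS."""
    labels = domain.lower().split(".")
    # "eng.alphazed.app" -> checks "eng.alphazed.app", "alphazed.app", "app"
    return any(".".join(labels[i:]) in INTERNAL_DOMAINS for i in range(len(labels)))


for domain in ["alphazed.app", "eng.alphazed.app", "notalphazed.app",
               "alphazed.app.evil.com"]:
    print(f"{domain:<24}{is_internal_domain(domain)}")
```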

Edge Case 2: Case Sensitivity

Problem: DEV@AlphaZed.APP should be filtered

Solution: Normalize to lowercase:

domain = email.split('@')[1].lower()

Edge Case 3: External Beta Testers

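Problem: Beta testers need to be tracked differently: tagged in analytics, and never filtered, even when they use an internal email address

Solution: Check a beta tester flag before domain filtering, and tag their events:

# Check the beta flag first so beta testers are tracked (and tagged)
# even with internal emails; other internal users are still skipped
if user.is_beta_tester:
    event_properties = {**(event_properties or {}), "beta_tester": True}
elif is_internal_user(user):
    return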
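
Because the order of the checks is the whole fix, it helps to pull the decision into a small pure function that can be tested without a database. A sketch under assumptions: tracking_decision, StubUser, and the one-domain INTERNAL_DOMAINS set are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional, Tuple

INTERNAL_DOMAINS = frozenset({"alphazed.app"})  # illustrative subset


@dataclass
class StubUser:
    email: str
    is_beta_tester: bool = False


def tracking_decision(user: StubUser,
                      event_properties: Optional[dict] = None) -> Tuple[bool, dict]:
    """Return (should_track, properties); the beta flag wins over domain filtering."""
    props = dict(event_properties or {})  # copy: never mutate the caller's dict
    if user.is_beta_tester:
        props["beta_tester"] = True
        return True, props
    domain = user.email.rsplit("@", 1)[-1].lower()
    return domain not in INTERNAL_DOMAINS, props


print(tracking_decision(StubUser("qa@alphazed.app", is_beta_tester=True)))
print(tracking_decision(StubUser("qa@alphazed.app")))
print(tracking_decision(StubUser("fan@gmail.com"), {"lesson": 3}))
```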

Edge Case 4: Analytics in Development

Problem: Developers can't test analytics locally

Solution: Environment-specific filtering:

# .env.local (development)
ENABLE_INTERNAL_FILTERING=false

# .env.production
ENABLE_INTERNAL_FILTERING=true

Results

Metrics Accuracy

Before filtering:

Weekly Active Users: 1,000
├─ Real users: 850 (85%)
└─ Internal: 150 (15%)

Daily Events: 50,000
├─ Real users: 40,000 (80%)
└─ Internal: 10,000 (20%)

After filtering:

Weekly Active Users: 850
├─ Real users: 850 (100%)
└─ Internal: 0 (0%)

Daily Events: 40,000
├─ Real users: 40,000 (100%)
└─ Internal: 0 (0%)

Cost Savings

Amplitude:

Before: 50,000 events/day × $0.0005/event = $25/day
After: 40,000 events/day × $0.0005/event = $20/day
Savings: $5/day = $150/month

Drip:

Before: 1,000 active subscribers × $0.10/subscriber = $100/month
After: 850 active subscribers × $0.10/subscriber = $85/month
Savings: $15/month

Twilio:

Before: 500 SMS/month (including internal tests) × $0.02/SMS = $10/month
After: 400 SMS/month × $0.02/SMS = $8/month
Savings: $2/month

Total savings: $167/month ($2,004/year)
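
The savings can be re-derived from the volumes and unit prices listed above. The sketch assumes a 30-day month (the $150/month Amplitude figure implies the same) and uses Decimal so the currency arithmetic stays exact:

```python
from decimal import Decimal

DAYS_PER_MONTH = 30  # assumption used to convert $/day into $/month

# (before, after) monthly cost per service, using the volumes and prices above
costs = {
    "Amplitude": (50_000 * Decimal("0.0005") * DAYS_PER_MONTH,
                  40_000 * Decimal("0.0005") * DAYS_PER_MONTH),
    "Drip": (1_000 * Decimal("0.10"), 850 * Decimal("0.10")),
    "Twilio": (500 * Decimal("0.02"), 400 * Decimal("0.02")),
}

savings = {name: before - after for name, (before, after) in costs.items()}
monthly = sum(savings.values())  # Decimal keeps the currency math exact

for name, amount in savings.items():
    print(f"{name:<10} ${amount:.2f}/month")
print(f"Total      ${monthly:.2f}/month, ${monthly * 12:.2f}/year")
```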

A/B Test Validity

Before filtering:

A/B Test: New onboarding flow
┌──────────────────────────────────────┐
│ Control: 500 users (50 internal)     │
│ Variant: 500 users (50 internal)     │
│                                      │
│ Control conversion: 68%              │
│ Variant conversion: 72%              │
│ Difference: +4 pts (p=0.08, not sig) │
└──────────────────────────────────────┘
    Internal users skew results

After filtering:

A/B Test: New onboarding flow
┌──────────────────────────────────────┐
│ Control: 450 users (0 internal)      │
│ Variant: 450 users (0 internal)      │
│                                      │
│ Control conversion: 52%              │
│ Variant conversion: 64%              │
│ Difference: +12 pts (p<0.01, sig)    │
└──────────────────────────────────────┘
    Clean results show real impact

Internal users (developers and QA) had artificially high conversion rates (90%+) because they knew the flow. Filtering revealed the true impact.
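
The boxes do not say which test produced the p-values. A pooled two-proportion z-test is one common choice; the sketch below applies it to conversion counts derived from the percentages shown (68% and 72% of 500, 52% and 64% of 450). The result also depends on whether the test is one- or two-sided; this version reports the two-sided p-value.

```python
import math


def two_proportion_z_test(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Pooled two-proportion z-test; returns (difference, z, two-sided p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # 2 * (1 - Phi(|z|))
    return p_b - p_a, z, p_two_sided


before = two_proportion_z_test(340, 500, 360, 500)  # 68% vs 72% of 500
after = two_proportion_z_test(234, 450, 288, 450)   # 52% vs 64% of 450
for label, (diff, z, p) in (("before", before), ("after", after)):
    print(f"{label}: diff={diff:+.1%} z={z:.2f} p={p:.4f}")
```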

Monitoring Internal Activity

While we filter internal activity from external services, we track it separately for operational monitoring:

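# src/services/analytics/internal_analytics.py
import boto3

cloudwatch = boto3.client('cloudwatch')

class InternalAnalytics:
    """Track internal user activity separately."""

    def track_internal_event(self, user_id, event_name, properties=None):
        """Track internal events in separate system (CloudWatch custom metrics)."""
        user = User.query.get(user_id)
        if user is None:
            return

        # properties are not sent: CloudWatch metrics carry only dimensions.
        # Each unique UserId/UserEmail combination becomes its own custom metric.
        cloudwatch.put_metric_data(
            Namespace='InternalActivity',
            MetricData=[{
                'MetricName': event_name,
                'Value': 1,
                'Unit': 'Count',
                'Dimensions': [
                    {'Name': 'UserId', 'Value': str(user_id)},
                    {'Name': 'UserEmail', 'Value': user.email}
                ]
            }]
        )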
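
Sending one put_metric_data call per event with per-user dimensions creates a separate CloudWatch custom metric for every user. A hedged alternative is to count events in memory and publish aggregated data points per team, which matches the "feature usage by team" use case below. The hypothetical build_metric_data only builds the MetricData list, so it can be tested without AWS credentials and then passed to cloudwatch.put_metric_data(Namespace='InternalActivity', MetricData=...); the Team dimension is our assumption.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple


def build_metric_data(events: Iterable[Tuple[str, str]]) -> List[Dict]:
    """Aggregate (event_name, team) pairs into CloudWatch MetricData entries."""
    counts = Counter(events)
    return [
        {
            "MetricName": event_name,
            "Dimensions": [{"Name": "Team", "Value": team}],
            "Value": float(count),
            "Unit": "Count",
        }
        for (event_name, team), count in sorted(counts.items())
    ]


batch = build_metric_data([
    ("lesson_completed", "qa"),
    ("lesson_completed", "qa"),
    ("lesson_completed", "eng"),
    ("login", "qa"),
])
print(batch)
```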

Use cases for internal analytics:

Monitor testing activity
Track feature usage by team
Debug issues with internal accounts
Measure QA coverage

Rollout Strategy

Phase 1: Add Filtering (Don't Enable)

# Add filtering logic but keep disabled
ENABLE_INTERNAL_FILTERING = False

Goal: Deploy filtering code without changing behavior

Phase 2: Enable in Staging

# .env.staging
ENABLE_INTERNAL_FILTERING=true

Goal: Test filtering with real data, verify no issues

Phase 3: Enable in Production

# .env.production
ENABLE_INTERNAL_FILTERING=true

Goal: Clean production metrics

Phase 4: Verify Results

Check Amplitude dashboards (should see 10-15% drop in users)
Verify A/B test results (should see more accurate conversion rates)
Monitor external service costs (should decrease)

Key Takeaways

Internal activity pollutes production metrics - 10-20% of events may be internal
Domain-based filtering is simple and effective - Easy to implement and maintain
Filter at service integration points - Amplitude, Drip, Twilio, etc.
Enable environment-specific filtering - Keep disabled in development for testing
Track internal activity separately - Useful for operational monitoring
Cost savings add up - $2,004/year in our case

Implementation Checklist

[ ] Define internal domains list
[ ] Implement is_internal_email() helper
[ ] Filter Amplitude analytics
[ ] Filter Drip email campaigns
[ ] Filter Twilio SMS
[ ] Add environment-specific configuration
[ ] Write unit tests for filtering logic
[ ] Add integration tests for each service
[ ] Document filtering in team wiki
[ ] Enable in staging and verify
[ ] Enable in production and monitor
[ ] Set up internal activity tracking (optional)

Alternative Approaches

Approach 1: Separate Environments

Use completely separate Amplitude/Drip/Twilio accounts for development and production.

Pros: Complete isolation
Cons: Higher cost, complex configuration management

Approach 2: User Role Filtering

Filter based on user role (admin, developer) instead of email domain.

Pros: More granular control
Cons: Requires maintaining role assignments, misses QA test accounts

Approach 3: Manual Filtering in Dashboards

Filter internal users in analytics dashboards instead of at source.

Pros: Simpler implementation
Cons: Data still pollutes raw analytics, costs still incurred

We chose domain-based filtering at the source for its simplicity and cost savings.

Resources

Amplitude User Privacy
Drip Subscriber Management
Commits: 67126b1, f64f462
Configuration: src/config/internal_domains.py

Implementation date: January-February 2026
Impact: internal users (15% of weekly active users) removed from metrics, $2,004/year cost savings, valid A/B test results
Time investment: 6 hours
ROI: Cost savings alone justify implementation in 18 days