Detect-Secrets Baseline: Preventing Credential Leaks

Accidentally committed secrets cause security incidents that can take hours to remediate. A developer copies an API key into code for testing, commits without thinking, and pushes to the remote repository. Now the secret must be rotated, the git history scrubbed, and all environments updated. One careless commit creates hours of emergency work.

We implemented Yelp's detect-secrets as a pre-commit hook to prevent credential leaks before they reach the repository. The tool scans staged files for patterns that look like secrets—API keys, passwords, private keys, OAuth tokens—and blocks commits containing them.

The Risk: Secrets in Version Control

Secrets in git history create multiple security risks. Public repositories expose credentials to anyone. Private repositories still leak secrets to former employees, contractors, and compromised accounts. Even after removing secrets from the latest commit, they persist in git history unless you rewrite history across all clones.

Common Secret Leak Scenarios:

Hardcoded API Keys:

# Developer testing locally
OPENAI_API_KEY = "sk-proj-abc123..."  # Committed by accident
response = openai.chat.completions.create(...)

Configuration Files:

# config.yaml
database:
  password: "prod_db_password_2024"  # Committed instead of .env
  host: "prod.rds.amazonaws.com"

Test Files:

# test_auth.py
def test_login():
    client.post('/login', json={
        'username': 'admin',
        'password': 'actual_prod_password'  # Copy-pasted from prod
    })

Environment Variables:

# deploy.sh
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG..."  # Committed script

Each scenario represents a real leak we've seen in codebases. The consequences range from minor (leaked staging credentials) to severe (leaked production database passwords enabling unauthorized access).

Before: Manual Secret Detection

Commit Process
┌──────────────────────────────────────┐
│ git add .                            │
│ git commit -m "Add feature"          │
│ git push                             │
│                                      │
│ (Secrets accidentally committed)     │
│                                      │
│ Later: Manual audit finds secret     │
│ Remediation:                         │
│ - Rotate secret ($$$)                │
│ - Update all environments (hours)    │
│ - Scrub git history (complex)        │
│ - Alert security team                │
└──────────────────────────────────────┘

Before we adopted scanning, a committed secret typically went undetected for 3-7 days (about 5 days on average), based on our pre-implementation analysis. By the time we noticed, the secret had propagated to CI/CD systems, developer machines, and potentially external logs.

The Solution: Automated Secret Scanning

Detect-secrets integrates with pre-commit to scan every staged file before the commit is created. When it detects a potential secret, the commit is blocked with a clear error message showing exactly what triggered the alert. The developer reviews the finding, either removes the secret or marks it as a false positive, and commits again.

After: Automated Secret Detection

Commit Process (Protected)
┌──────────────────────────────────────┐
│ git add .                            │
│ git commit -m "Add feature"          │
│   └──> detect-secrets scan           │
│        └──> SECRET DETECTED!         │
│            File: src/config.py       │
│            Line: 23                  │
│            Type: AWS Access Key      │
│            └──> COMMIT BLOCKED ✗     │
│                                      │
│ (Developer removes secret)           │
│ git commit -m "Add feature" ✓        │
└──────────────────────────────────────┘

Detection happens in <100ms, adding negligible overhead to the commit process. The feedback is immediate and actionable—developers see exactly which file and line triggered the alert.
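
The blocking flow above can be sketched in a few lines of Python. This is an illustration of the idea only, not detect-secrets' implementation: the function names are hypothetical, and the single rule shown (AWS access key IDs, which start with "AKIA" followed by 16 uppercase letters or digits) stands in for the many detectors a real scanner ships.

```python
import re

# Illustrative rule only: an AWS access key ID is "AKIA" plus 16 uppercase
# letters or digits. A real scanner combines many such rules.
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_secrets(filename, text):
    """Return (filename, line_number, rule_name) for each matching line."""
    findings = []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if AWS_ACCESS_KEY_ID.search(line):
            findings.append((filename, line_number, "AWS Access Key"))
    return findings


def hook_exit_code(staged_files):
    """Mimic a pre-commit hook: a non-zero exit status blocks the commit."""
    findings = []
    for filename, text in staged_files.items():
        findings.extend(find_secrets(filename, text))
    for filename, line_number, rule_name in findings:
        print(f"{filename}:{line_number}: potential {rule_name}")
    return 1 if findings else 0


# Hypothetical staged content; the key is AWS's documented example value.
staged = {"src/config.py": 'REGION = "us-east-1"\nKEY = "AKIAIOSFODNN7EXAMPLE"\n'}
status = hook_exit_code(staged)  # reports line 2 and returns 1
```

Git treats any non-zero status from a pre-commit hook as "abort the commit", which is why returning 1 is enough to block it.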

Implementation Details

We configured detect-secrets in .pre-commit-config.yaml:

repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']
        exclude: '(package-lock\.json|poetry\.lock|\.secrets\.baseline)$'

The --baseline argument points to .secrets.baseline, a file recording findings that have already been detected and reviewed. The hook ignores anything listed there and blocks only new findings, so intentional "secrets" like example API keys in documentation or test fixtures don't stop commits.

Creating the Baseline:

# Generate initial baseline of existing "secrets" (detect-secrets 1.x syntax)
detect-secrets scan > .secrets.baseline

# Update baseline when adding new false positives
detect-secrets scan --baseline .secrets.baseline

The baseline file is JSON containing hashes of detected secrets:

{
  "version": "1.4.0",
  "results": {
    "src/tests/fixtures/auth.py": [
      {
        "type": "Secret Keyword",
        "line_number": 12,
        "hashed_secret": "a94a8fe5ccb19ba61c4c0873d391e987982fbbd3",
        "is_verified": false
      }
    ]
  }
}

Hashing ensures the baseline doesn't leak actual secrets while still allowing the tool to recognize previously-approved findings.
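
As a rough sketch of how a hashed baseline can still recognize a known finding: hash the candidate string and look the digest up among the entries for that file. The helper names are hypothetical, and the use of SHA-1 over UTF-8 bytes is an assumption here, chosen because it matches the 40-hex-digit hashed_secret values in the baseline.

```python
import hashlib


def hash_secret(secret):
    """Fingerprint a secret so the plaintext never needs to be stored."""
    # Assumption: SHA-1 over UTF-8 bytes, consistent with the 40-hex-digit
    # "hashed_secret" values in the baseline file.
    return hashlib.sha1(secret.encode("utf-8")).hexdigest()


def is_known_finding(baseline, filename, secret):
    """True if this secret's hash is already recorded for the given file."""
    entries = baseline.get("results", {}).get(filename, [])
    known_hashes = {entry["hashed_secret"] for entry in entries}
    return hash_secret(secret) in known_hashes


# Same shape as the baseline excerpt above, trimmed to the relevant fields.
baseline = {
    "results": {
        "src/tests/fixtures/auth.py": [
            {
                "type": "Secret Keyword",
                "line_number": 12,
                "hashed_secret": "a94a8fe5ccb19ba61c4c0873d391e987982fbbd3",
            }
        ]
    }
}

# The example digest is the SHA-1 of the string "test".
approved = is_known_finding(baseline, "src/tests/fixtures/auth.py", "test")  # True
```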

Detection Plugins:

Detect-secrets uses plugins to identify different secret types:

AWSKeyDetector - AWS access keys and secret access keys
BasicAuthDetector - Basic auth credentials in URLs
PrivateKeyDetector - RSA, DSA, EC, and other private keys
SlackDetector - Slack API tokens and webhooks
StripeDetector - Stripe API keys
KeywordDetector - Generic password/secret patterns
Base64HighEntropyString - High-entropy base64 strings
HexHighEntropyString - High-entropy hex strings

Entropy-based detectors catch secrets that don't match specific patterns. A random 40-character string has high entropy and likely represents a token or key, even if it doesn't match known formats.
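
To make "high entropy" concrete, here is a minimal Shannon entropy calculation in bits per character. The threshold and minimum length below are illustrative assumptions, not detect-secrets' configured limits, and looks_random is a hypothetical helper.

```python
import math
from collections import Counter


def shannon_entropy(value):
    """Shannon entropy of a string, in bits per character."""
    if not value:
        return 0.0
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def looks_random(value, threshold=3.0, min_length=20):
    """Flag long strings whose characters are spread out like random data.

    Both threshold and min_length are assumed values for illustration.
    """
    return len(value) >= min_length and shannon_entropy(value) > threshold


print(shannon_entropy("aaaa"))      # a single repeated symbol carries no information
print(shannon_entropy("abcd"))      # four equally likely symbols: 2 bits per character
print(looks_random("password"))     # too short to be flagged
print(looks_random("a94a8fe5ccb19ba61c4c0873d391e987982fbbd3"))  # 40 hex chars, flagged
```

The last call also shows why false positives are unavoidable: a 40-character hex digest is flagged whether it is a token or a harmless hash.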

Configuring Exclusions:

Some files legitimately contain secret-like patterns:

# Exclude specific file types
exclude: |
  (?x)^(
    package-lock\.json|
    poetry\.lock|
    \.secrets\.baseline|
    docs/examples/.*|
    src/tests/fixtures/.*
  )$

This regex excludes lock files (which contain hashes), the baseline file itself, documentation examples, and test fixtures.
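
Before relying on an exclusion pattern, it can help to check it against a few sample paths. The sketch below assumes the pattern is applied with Python's re.search to paths relative to the repository root, which is how pre-commit documents its exclude option; is_excluded is a hypothetical helper.

```python
import re

# Same pattern as the YAML above. (?x) enables verbose mode, so the
# whitespace and line breaks inside the pattern are ignored.
EXCLUDE = re.compile(
    r"""(?x)^(
        package-lock\.json|
        poetry\.lock|
        \.secrets\.baseline|
        docs/examples/.*|
        src/tests/fixtures/.*
    )$"""
)


def is_excluded(path):
    """True if the repo-relative path would be skipped by the pattern."""
    return EXCLUDE.search(path) is not None


for path in (
    "package-lock.json",
    "frontend/package-lock.json",
    "docs/examples/quickstart.md",
    "src/app.py",
):
    print(f"{path}: {'excluded' if is_excluded(path) else 'scanned'}")
```

Because the pattern is anchored with ^, only a lock file at the repository root is excluded; a nested one such as frontend/package-lock.json would still be scanned unless the pattern is extended.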

Handling False Positives

Entropy-based detection creates false positives. A 40-character hex string might be a secret or might be a git commit hash. When detect-secrets blocks a legitimate commit, the developer has two options:

Option 1: Add to Baseline

For persistent false positives (test fixtures, documentation):

# Re-scan so the fixture's finding is recorded in the baseline
detect-secrets scan --baseline .secrets.baseline

# Commit both files
git add .secrets.baseline src/tests/fixtures/example.py
git commit -m "Add test fixture"

Option 2: Inline Pragma

For one-off cases:

# This is a test fixture, not a real secret
TEST_API_KEY = "sk-test-abc123..."  # pragma: allowlist secret

The pragma tells detect-secrets to skip this specific line. Use this sparingly—overuse defeats the purpose of secret scanning.
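
One way to keep pragma usage in check is to count how many lines carry the marker. The sketch below is a hypothetical helper, not part of detect-secrets; it only looks for the literal comment text used in the example above.

```python
ALLOWLIST_MARKER = "pragma: allowlist secret"


def partition_lines(text):
    """Split line numbers into those to scan and those skipped by the pragma."""
    to_scan, skipped = [], []
    for line_number, line in enumerate(text.splitlines(), start=1):
        if ALLOWLIST_MARKER in line:
            skipped.append(line_number)
        else:
            to_scan.append(line_number)
    return to_scan, skipped


source = (
    'TEST_API_KEY = "sk-test-abc123..."  # pragma: allowlist secret\n'
    "TIMEOUT_SECONDS = 30\n"
)
to_scan, skipped = partition_lines(source)
print(f"scanned lines: {to_scan}, allowlisted lines: {skipped}")
```

Running something like this over the repository and tracking the allowlisted count over time gives an early signal if the pragma starts being used as a blanket bypass.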

Integration with CI/CD

We added detect-secrets to our CI pipeline to catch cases where developers bypass local hooks:

# .circleci/config.yml
jobs:
  security-scan:
    docker:
      - image: python:3.11
    steps:
      - checkout
      - run:
          name: Install detect-secrets
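          # Pin the version used by the pre-commit hook
          command: pip install detect-secrets==1.4.0
      - run:
          name: Scan for secrets
          command: |
            # detect-secrets-hook exits non-zero for findings missing from the
            # baseline; plain `scan` only writes a report and exits 0.
            git ls-files -z | xargs -0 detect-secrets-hook --baseline .secrets.baseline || {
              echo "Secrets detected! Review the findings above and fix or baseline them locally."
              exit 1
            }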

If the scan finds secrets not in the baseline, the build fails. This creates defense in depth—local hooks prevent most leaks, and CI catches anything that slips through.
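
Conceptually, the CI check is a set difference: findings in a fresh scan minus findings already in the baseline. The sketch below works on already-parsed JSON with the results layout shown earlier. The function names are hypothetical and this is not how detect-secrets implements the comparison.

```python
def finding_keys(report):
    """Set of (filename, hashed_secret) pairs from a baseline or scan report."""
    return {
        (filename, entry["hashed_secret"])
        for filename, entries in report.get("results", {}).items()
        for entry in entries
    }


def new_findings(baseline, current_scan):
    """Findings present in the fresh scan but absent from the baseline."""
    return sorted(finding_keys(current_scan) - finding_keys(baseline))


# Hypothetical data: one approved fixture finding, one new finding in deploy.sh.
baseline = {"results": {"src/tests/fixtures/auth.py": [{"hashed_secret": "aaa111"}]}}
current = {
    "results": {
        "src/tests/fixtures/auth.py": [{"hashed_secret": "aaa111"}],
        "deploy.sh": [{"hashed_secret": "bbb222"}],
    }
}

unapproved = new_findings(baseline, current)
build_should_fail = bool(unapproved)  # True: deploy.sh has an unapproved finding
```

Keying on the file name plus the hash, rather than the line number, means an approved finding stays approved when unrelated edits shift it up or down in the file.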

Real-World Impact

Before Implementation:

3 credential leaks per quarter (12/year)
Average detection time: 5 days
Average remediation time: 4 hours per incident
Total cost: ~48 hours/year in remediation
Security risk: High (credentials exposed in git history)

After Implementation:

0 credential leaks in 6 months
Average detection time: <1 second (at commit time)
Average remediation time: 30 seconds (remove before commit)
Total cost: ~0 hours in remediation
Security risk: Minimal (prevented at source)

The first week after implementation, detect-secrets blocked 8 attempted commits containing actual secrets:

OpenAI API key in a test file (copy-pasted from production)
Database password in a configuration file (developer testing locally)
AWS access key in a shell script (temporary debugging code)
Stripe test key in application code (should have been in .env)
Private SSH key in a deployment script (accidentally staged)
Google OAuth client secret in auth code (hardcoded for testing)
Twilio auth token in SMS service (temporary workaround)
JWT secret in authentication middleware (forgotten after refactor)

Each block represented a potential security incident prevented. Developers appreciated the immediate feedback—they could fix the issue in seconds rather than dealing with emergency rotation and history rewriting later.

False Positive Rate:

Over 6 months:

Total commits blocked: 147
Actual secrets: 47 (32%)
False positives: 100 (68%)
False positives added to baseline: 23
False positives fixed by refactoring: 77

The 68% false positive rate sounds high, but the cost is low—developers spend 10-20 seconds reviewing each alert. Compare this to the 4-hour cost of remediating a single leaked secret, and the tradeoff is clearly worthwhile.
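
The tradeoff can be checked with the figures above. The helper names are hypothetical; the inputs are the six-month numbers and the 10-20 second review estimate.

```python
def share(part, total):
    """Percentage of total, rounded to the nearest whole number."""
    return round(100 * part / total)


def review_minutes(alerts, seconds_per_alert):
    """Total review time in minutes for a number of alerts."""
    return alerts * seconds_per_alert / 60


blocked, real_secrets, false_positives = 147, 47, 100
remediation_minutes_per_leak = 4 * 60

print(share(real_secrets, blocked), share(false_positives, blocked))  # 32 68

low = review_minutes(false_positives, 10)    # about 16.7 minutes
high = review_minutes(false_positives, 20)   # about 33.3 minutes
print(f"false positive review: {low:.1f}-{high:.1f} minutes over six months")
print(f"one leaked secret: {remediation_minutes_per_leak} minutes of remediation")
```

Six months of false positive review adds up to roughly 17-33 minutes in total, well under the 240 minutes a single leaked secret costs to remediate.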

Developer Feedback

Initial resistance came from developers annoyed by false positives. "This tool blocked my commit for a git commit hash!" After explaining the tradeoff—10 seconds to review a false positive vs. 4 hours to fix a leaked secret—adoption improved.

We created documentation with common false positive patterns and how to handle them:

Common False Positives:

Git commit hashes (40-char hex) → Add to baseline
Hashed passwords in test fixtures → Add to baseline or use pragma
Example secrets in documentation → Exclude docs/ directory
UUID values (high entropy) → Refactor to use generated UUIDs
Base64-encoded images → Exclude from scan

After the first month, false positive complaints dropped to near zero. Developers internalized the patterns and learned to structure code to avoid triggering alerts.

Best Practices

Run baseline scan before implementation:
```
detect-secrets scan > .secrets.baseline
```
This creates a baseline of existing findings so you can focus on preventing new leaks.

Use .env files for secrets:

# Bad
API_KEY = "sk-prod-abc123"

# Good
import os
API_KEY = os.getenv("API_KEY")
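
A slightly stricter variant fails fast when the variable is missing, rather than passing None along to the first API call. This is a sketch with hypothetical names (require_env, MissingSecretError), and it assumes something else, such as the shell, a process manager, or a loader like python-dotenv, has already exported the values from .env into the process environment, since os.getenv does not read .env files by itself.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is not present in the environment."""


def require_env(name):
    """Return the value of an environment variable, or fail with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(
            f"Environment variable {name} is not set; "
            "see .env.example for the expected variables."
        )
    return value


# Typical use at application startup (not executed here):
# API_KEY = require_env("API_KEY")
```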

Add .env to .gitignore:

# .gitignore
.env
.env.local
.env.*.local

Use .env.example for structure:

# .env.example
API_KEY=your_api_key_here
DATABASE_URL=postgresql://user:pass@host:5432/db

Review baseline periodically:

# Audit baseline to ensure false positives are still valid
detect-secrets audit .secrets.baseline
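
A small script can make the periodic review easier by summarizing what the baseline contains. The sketch below counts findings per detector type and lists entries that have not been labeled yet. It assumes audited entries carry an "is_secret" field; treat that field name as an assumption and check it against your own baseline file. load_baseline and summarize are hypothetical helpers.

```python
import json
from collections import Counter


def load_baseline(path=".secrets.baseline"):
    """Read the baseline JSON from disk."""
    with open(path, encoding="utf-8") as handle:
        return json.load(handle)


def summarize(baseline):
    """Return (findings per type, list of (filename, line) not yet audited)."""
    entries = [
        (filename, entry)
        for filename, found in baseline.get("results", {}).items()
        for entry in found
    ]
    by_type = Counter(entry["type"] for _, entry in entries)
    # Assumption: an audited entry has an "is_secret" label attached.
    unaudited = [
        (filename, entry["line_number"])
        for filename, entry in entries
        if "is_secret" not in entry
    ]
    return by_type, unaudited


# Hypothetical baseline content for illustration.
example = {
    "results": {
        "src/tests/fixtures/auth.py": [
            {"type": "Secret Keyword", "line_number": 12, "is_secret": False},
            {"type": "Hex High Entropy String", "line_number": 30},
        ]
    }
}
by_type, unaudited = summarize(example)
print(dict(by_type))
print(unaudited)
```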

Results

The investment in detect-secrets was minimal—two hours to configure and document. The return was immediate and ongoing:

Zero secret leaks in 6 months (down from 6 in the previous 6 months)
48 hours/year saved in remediation time
Reduced security risk from exposed credentials
Improved security culture through immediate feedback
Negligible performance impact (<100ms per commit)

Detect-secrets became an invisible safety net. Developers rarely think about it; it just works, blocking secrets before they reach the repository. False positives are handled in seconds, and in six months no credential leak has made it into the repository.

The cultural shift was subtle but significant. Developers became more conscious of where secrets live in code. Instead of temporarily hardcoding an API key "just for testing," they use environment variables from the start. The tool shaped behavior through immediate feedback, making secure practices the path of least resistance.

Key Takeaways

Prevent problems at the source - Block secrets at commit time, not after they've leaked
Accept false positives - Better to review 100 false positives than miss 1 real secret
Make it automatic - Integrate with pre-commit so developers don't have to remember
Provide clear feedback - Show exactly what triggered the alert and how to fix it
Measure impact - Track leaks prevented and time saved to demonstrate value

Secret scanning is a solved problem. Tools like detect-secrets are mature, fast, and effective. The only question is whether you'll implement one before or after your next credential leak. We chose before, and six months without an incident supports that choice.