Detect-Secrets Baseline: Preventing Credential Leaks
Accidentally committed secrets cause security incidents that can take hours to remediate. A developer copies an API key into code for testing, commits without thinking, and pushes to the remote repository. Now the secret must be rotated, the git history scrubbed, and all environments updated. One careless commit creates hours of emergency work.
We implemented Yelp's detect-secrets as a pre-commit hook to prevent credential leaks before they reach the repository. The tool scans staged files for patterns that look like secrets—API keys, passwords, private keys, OAuth tokens—and blocks commits containing them.
The Risk: Secrets in Version Control
Secrets in git history create multiple security risks. Public repositories expose credentials to anyone. Private repositories still leak secrets to former employees, contractors, and compromised accounts. Even after removing secrets from the latest commit, they persist in git history unless you rewrite history across all clones.
Common Secret Leak Scenarios:
- Hardcoded API Keys:
# Developer testing locally
OPENAI_API_KEY = "sk-proj-abc123..." # Committed by accident
response = openai.chat.completions.create(...)
- Configuration Files:
# config.yaml
database:
  password: "prod_db_password_2024"  # Committed instead of .env
  host: "prod.rds.amazonaws.com"
- Test Files:
# test_auth.py
def test_login():
    client.post('/login', json={
        'username': 'admin',
        'password': 'actual_prod_password'  # Copy-pasted from prod
    })
- Environment Variables:
# deploy.sh
export AWS_SECRET_ACCESS_KEY="wJalrXUtnFEMI/K7MDENG..." # Committed script
Each scenario represents a real leak we've seen in codebases. The consequences range from minor (leaked staging credentials) to severe (leaked production database passwords enabling unauthorized access).
Before: Manual Secret Detection
Commit Process
┌──────────────────────────────────────┐
│ git add . │
│ git commit -m "Add feature" │
│ git push │
│ │
│ (Secrets accidentally committed) │
│ │
│ Later: Manual audit finds secret │
│ Remediation: │
│ - Rotate secret ($$$) │
│ - Update all environments (hours) │
│ - Scrub git history (complex) │
│ - Alert security team │
└──────────────────────────────────────┘
Based on our pre-implementation analysis, a committed secret typically went undetected for 3-7 days (5 days on average). By the time we noticed, the secret had propagated to CI/CD systems, developer machines, and potentially external logs.
The Solution: Automated Secret Scanning
Detect-secrets integrates with pre-commit to scan every staged file before commit. When it detects a potential secret, the commit is blocked with a clear error message showing exactly what triggered the alert. The developer reviews the finding, either removes the secret or marks it as a false positive, and commits again.
After: Automated Secret Detection
Commit Process (Protected)
┌──────────────────────────────────────┐
│ git add . │
│ git commit -m "Add feature" │
│ └──> detect-secrets scan │
│ └──> SECRET DETECTED! │
│ File: src/config.py │
│ Line: 23 │
│ Type: AWS Access Key │
│ └──> COMMIT BLOCKED ✗ │
│ │
│ (Developer removes secret) │
│ git commit -m "Add feature" ✓ │
└──────────────────────────────────────┘
Detection happens in <100ms, adding negligible overhead to the commit process. The feedback is immediate and actionable—developers see exactly which file and line triggered the alert.
Implementation Details
We configured detect-secrets in .pre-commit-config.yaml:
repos:
  - repo: https://github.com/Yelp/detect-secrets
    rev: v1.4.0
    hooks:
      - id: detect-secrets
        args: ['--baseline', '.secrets.baseline']
        exclude: '(package-lock\.json|poetry\.lock|\.secrets\.baseline)$'
The --baseline argument points to .secrets.baseline, a file containing known false positives. This prevents the hook from blocking commits for intentional "secrets" like example API keys in documentation or test fixtures.
Creating the Baseline:
# Generate the initial baseline of existing "secrets"
detect-secrets scan > .secrets.baseline
# Re-scan and update the existing baseline when adding new false positives
detect-secrets scan --baseline .secrets.baseline
The baseline file is JSON containing hashes of detected secrets:
{
  "version": "1.4.0",
  "results": {
    "src/tests/fixtures/auth.py": [
      {
        "type": "Secret Keyword",
        "line_number": 12,
        "hashed_secret": "a94a8fe5ccb19ba61c4c0873d391e987982fbbd3",
        "is_verified": false
      }
    ]
  }
}
Hashing ensures the baseline doesn't leak actual secrets while still allowing the tool to recognize previously-approved findings.
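To make the hashing idea concrete, here is a minimal sketch of how a candidate string could be compared against a baseline entry without storing the secret itself. It assumes `hashed_secret` is the SHA-1 hex digest of the raw secret value, which is consistent with the 40-character hex value in the example above. The helper names `hash_secret` and `is_known_finding` are hypothetical and are not part of the detect-secrets API. The sample hash in the baseline excerpt happens to be the SHA-1 of the string "test", which makes it convenient for illustration.

```python
import hashlib

# Baseline excerpt copied from the example above.
BASELINE = {
    "results": {
        "src/tests/fixtures/auth.py": [
            {
                "type": "Secret Keyword",
                "line_number": 12,
                "hashed_secret": "a94a8fe5ccb19ba61c4c0873d391e987982fbbd3",
                "is_verified": False,
            }
        ]
    }
}


def hash_secret(secret: str) -> str:
    """Return the SHA-1 hex digest of a secret (assumed baseline hashing scheme)."""
    return hashlib.sha1(secret.encode("utf-8")).hexdigest()


def is_known_finding(baseline: dict, filename: str, secret: str) -> bool:
    """Check whether the hash of `secret` is already recorded for `filename`."""
    digest = hash_secret(secret)
    entries = baseline["results"].get(filename, [])
    return any(entry["hashed_secret"] == digest for entry in entries)


if __name__ == "__main__":
    path = "src/tests/fixtures/auth.py"
    print(is_known_finding(BASELINE, path, "test"))       # previously approved value
    print(is_known_finding(BASELINE, path, "new-value"))  # not in the baseline
```

Only the digest is compared, so the baseline can be committed without exposing the values it refers to.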
Detection Plugins:
Detect-secrets uses plugins to identify different secret types:
- AWSKeyDetector - AWS access keys and secret access keys
- BasicAuthDetector - Basic auth credentials in URLs
- PrivateKeyDetector - RSA, DSA, EC, and other private keys
- SlackDetector - Slack API tokens and webhooks
- StripeDetector - Stripe API keys
- KeywordDetector - Generic password/secret patterns
- Base64HighEntropyString - High-entropy base64 strings
- HexHighEntropyString - High-entropy hex strings
Entropy-based detectors catch secrets that don't match specific patterns. A random 40-character string has high entropy and likely represents a token or key, even if it doesn't match known formats.
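The entropy idea can be illustrated with a short Shannon entropy calculation. This is a simplified sketch of the general technique, not the plugins' actual implementation; the function name `shannon_entropy` and the sample strings are chosen for illustration only.

```python
import math
from collections import Counter


def shannon_entropy(text: str) -> float:
    """Return the Shannon entropy of `text` in bits per character."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    # Sum -p * log2(p) over the observed character frequencies.
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


if __name__ == "__main__":
    samples = (
        "aaaaaaaaaaaaaaaa",                  # one repeated character
        "password",                          # ordinary word
        "9f86d081884c7d659a2feaa0c55ad015",  # random-looking hex string
    )
    for sample in samples:
        print(f"{sample!r}: {shannon_entropy(sample):.2f} bits/char")
```

Repetitive or dictionary-like strings score low, while random-looking strings score close to the maximum for their character set. A detector built on this idea flags strings whose score exceeds a configured threshold.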
Configuring Exclusions:
Some files legitimately contain secret-like patterns:
# Exclude specific file types
exclude: |
  (?x)^(
    package-lock\.json|
    poetry\.lock|
    \.secrets\.baseline|
    docs/examples/.*|
    src/tests/fixtures/.*
  )$
This regex excludes lock files (which contain hashes), the baseline file itself, documentation examples, and test fixtures.
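Because pre-commit evaluates `exclude` as a Python regular expression against each repository-relative path, the pattern can be checked locally before committing the config. The sketch below is one way to do that; the helper `is_excluded` and the sample paths are hypothetical.

```python
import re

# Same verbose-mode pattern as in the exclude block above.
EXCLUDE = re.compile(
    r"""(?x)^(
        package-lock\.json|
        poetry\.lock|
        \.secrets\.baseline|
        docs/examples/.*|
        src/tests/fixtures/.*
    )$"""
)


def is_excluded(path: str) -> bool:
    """Return True if the exclude pattern matches the repo-relative path."""
    return EXCLUDE.search(path) is not None


if __name__ == "__main__":
    for path in (
        "package-lock.json",
        "frontend/package-lock.json",
        "docs/examples/quickstart.md",
        "src/tests/fixtures/auth.py",
        "src/config.py",
    ):
        print(f"{path}: {'excluded' if is_excluded(path) else 'scanned'}")
```

One detail worth noticing: because of the leading `^`, a lock file in a subdirectory such as `frontend/package-lock.json` is still scanned. If nested lock files should also be skipped, the pattern needs an optional directory prefix.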
Handling False Positives
Entropy-based detection creates false positives. A 40-character hex string might be a secret or might be a git commit hash. When detect-secrets blocks a legitimate commit, the developer has two options:
Option 1: Add to Baseline
For persistent false positives (test fixtures, documentation):
# Re-scan so the new finding is recorded in the baseline
detect-secrets scan --baseline .secrets.baseline
# Commit both files
git add .secrets.baseline src/tests/fixtures/example.py
git commit -m "Add test fixture"
Option 2: Inline Pragma
For one-off cases:
# This is a test fixture, not a real secret
TEST_API_KEY = "sk-test-abc123..." # pragma: allowlist secret
The pragma tells detect-secrets to skip this specific line. Use this sparingly—overuse defeats the purpose of secret scanning.
Integration with CI/CD
We added detect-secrets to our CI pipeline to catch cases where developers bypass local hooks:
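# .circleci/config.yml
jobs:
  security-scan:
    docker:
      - image: python:3.11
    steps:
      - checkout
      - run:
          name: Install detect-secrets
          command: pip install detect-secrets
      - run:
          name: Scan for secrets
          command: |
            # detect-secrets-hook exits non-zero when it finds secrets missing from the baseline;
            # a plain `detect-secrets scan` only reports findings and does not fail the build.
            if ! git ls-files -z | xargs -0 detect-secrets-hook --baseline .secrets.baseline; then
              echo "Secrets detected! Run 'detect-secrets scan' locally."
              exit 1
            fi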
If the scan finds secrets not in the baseline, the build fails. This creates defense in depth—local hooks prevent most leaks, and CI catches anything that slips through.
Real-World Impact
Before Implementation:
- 3 credential leaks per quarter (12/year)
- Average detection time: 5 days
- Average remediation time: 4 hours per incident
- Total cost: ~48 hours/year in remediation
- Security risk: High (credentials exposed in git history)
After Implementation:
- 0 credential leaks in 6 months
- Average detection time: <1 second (at commit time)
- Average remediation time: 30 seconds (remove before commit)
- Total cost: ~0 hours in remediation
- Security risk: Minimal (prevented at source)
The first week after implementation, detect-secrets blocked 8 attempted commits containing actual secrets:
- OpenAI API key in a test file (copy-pasted from production)
- Database password in a configuration file (developer testing locally)
- AWS access key in a shell script (temporary debugging code)
- Stripe test key in application code (should have been in .env)
- Private SSH key in a deployment script (accidentally staged)
- Google OAuth client secret in auth code (hardcoded for testing)
- Twilio auth token in SMS service (temporary workaround)
- JWT secret in authentication middleware (forgotten after refactor)
Each block represented a potential security incident prevented. Developers appreciated the immediate feedback—they could fix the issue in seconds rather than dealing with emergency rotation and history rewriting later.
False Positive Rate:
Over 6 months:
- Total commits blocked: 147
- Actual secrets: 47 (32%)
- False positives: 100 (68%)
- False positives added to baseline: 23
- False positives fixed by refactoring: 77
The 68% false positive rate sounds high, but the cost is low—developers spend 10-20 seconds reviewing each alert. Compare this to the 4-hour cost of remediating a single leaked secret, and the tradeoff is clearly worthwhile.
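The tradeoff can be checked with a few lines of arithmetic using only the figures reported in this post. The function names are illustrative.

```python
def review_cost_minutes(false_positives: int, seconds_per_review: float) -> float:
    """Total time spent reviewing false positives, in minutes."""
    return false_positives * seconds_per_review / 60


def remediation_hours(leaks: int, hours_per_leak: float) -> float:
    """Total time spent remediating leaked secrets, in hours."""
    return leaks * hours_per_leak


if __name__ == "__main__":
    low = review_cost_minutes(100, 10)   # 100 false positives at 10 s each
    high = review_cost_minutes(100, 20)  # 100 false positives at 20 s each
    print(f"Reviewing 100 false positives: {low:.0f}-{high:.0f} minutes")
    print(f"Remediating one leak: {remediation_hours(1, 4) * 60:.0f} minutes")
    print(f"Pre-implementation remediation per year: {remediation_hours(12, 4):.0f} hours")
```

With these numbers, six months of false-positive review adds up to roughly 17 to 33 minutes in total, which is well under the 240 minutes needed to remediate a single leaked secret.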
Developer Feedback
Initial resistance came from developers annoyed by false positives. "This tool blocked my commit for a git commit hash!" After explaining the tradeoff—10 seconds to review a false positive vs. 4 hours to fix a leaked secret—adoption improved.
We created documentation with common false positive patterns and how to handle them:
Common False Positives:
- Git commit hashes (40-char hex) → Add to baseline
- Hashed passwords in test fixtures → Add to baseline or use pragma
- Example secrets in documentation → Exclude docs/ directory
- UUID values (high entropy) → Refactor to use generated UUIDs
- Base64-encoded images → Exclude from scan
After the first month, false positive complaints dropped to near zero. Developers internalized the patterns and learned to structure code to avoid triggering alerts.
Best Practices
- Run baseline scan before implementation:
detect-secrets scan > .secrets.baseline
This creates a baseline of existing findings so you can focus on preventing new leaks.
- Use .env files for secrets:
# Bad
API_KEY = "sk-prod-abc123"
# Good
import os
API_KEY = os.getenv("API_KEY")
- Add .env to .gitignore:
# .gitignore
.env
.env.local
.env.*.local
- Use .env.example for structure:
# .env.example
API_KEY=your_api_key_here
DATABASE_URL=postgresql://user:pass@host:5432/db
- Review baseline periodically:
# Audit baseline to ensure false positives are still valid
detect-secrets audit .secrets.baseline
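One caveat with the environment-variable practice above: `os.getenv` returns `None` when a variable is unset, so a missing secret may only surface later as a confusing authentication error. A minimal sketch of a fail-fast helper follows; `require_env` and `MissingSecretError` are hypothetical names, not part of any library.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required environment variable is missing or empty."""


def require_env(name: str) -> str:
    """Return the value of environment variable `name`, or fail with a clear error."""
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(f"Environment variable {name} is not set")
    return value


if __name__ == "__main__":
    try:
        api_key = require_env("API_KEY")
        print("API_KEY loaded")  # never print the value itself
    except MissingSecretError as exc:
        print(f"Configuration error: {exc}")
```

Calling a helper like this at startup surfaces a missing variable immediately and keeps the secret value out of the source tree.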
Results
The investment in detect-secrets was minimal—two hours to configure and document. The return was immediate and ongoing:
- Zero secret leaks in 6 months (down from 6 in the previous 6 months)
- 48 hours/year saved in remediation time
- Reduced security risk from exposed credentials
- Improved security culture through immediate feedback
- Negligible performance impact (<100ms per commit)
Detect-secrets became an invisible safety net. Developers rarely think about it: it just works, blocking secrets before they reach the repository. False positives are handled in seconds, and in the six months since rollout no secret has made it past the commit hook into the repository.
The cultural shift was subtle but significant. Developers became more conscious of where secrets live in code. Instead of temporarily hardcoding an API key "just for testing," they use environment variables from the start. The tool shaped behavior through immediate feedback, making secure practices the path of least resistance.
Key Takeaways
- Prevent problems at the source - Block secrets at commit time, not after they've leaked
- Accept false positives - Better to review 100 false positives than miss 1 real secret
- Make it automatic - Integrate with pre-commit so developers don't have to remember
- Provide clear feedback - Show exactly what triggered the alert and how to fix it
- Measure impact - Track leaks prevented and time saved to demonstrate value
Secret scanning is a solved problem. Tools like detect-secrets are mature, fast, and effective. The only question is whether you'll implement it before or after your next credential leak. We chose before, and six months of zero incidents proves the value.