
March 1, 2023
Technical Audit: Identifying Revenue Leakage in Subscription Billing
Business Context
A subscription-based SaaS platform noticed their revenue metrics didn't align with user activity levels. The platform appeared operationally stable - no errors, no user complaints, normal uptime. However, monthly recurring revenue was lower than expected given the number of active users.
The client requested a technical audit to identify any discrepancies between billing system operation and actual revenue collection.
Problems
-
Revenue discrepancy. Monthly recurring revenue was consistently lower than expected based on active user counts and subscription tier distribution.
-
No visible errors. The application operated normally. Monitoring systems showed healthy uptime and no error spikes.
-
Limited diagnostic visibility. The existing system lacked reconciliation between application state (user access) and payment platform state (active subscriptions).
Solution
Audit Approach
I conducted a data reconciliation audit comparing application database state with payment platform (Stripe) records. Rather than starting with code review, I focused on identifying discrepancies between systems as the diagnostic starting point.
Data sources:
- Application database: user accounts, access status, subscription records
- Stripe API: subscription status, payment history, event logs
- Application logs: webhook event processing records
Reconciliation Analysis
I wrote SQL queries to cross-reference user access status with Stripe subscription status:
- Query all users with active premium feature access
- Query corresponding Stripe subscription IDs from application database
- Fetch current subscription status from Stripe API
- Identify discrepancies where access was active but subscription was inactive
The analysis revealed hundreds of accounts with active feature access despite canceled or failed subscriptions.
Root Cause Investigation
I traced the webhook event flow to identify where state synchronization failed:
Event processing gaps:
- Subscription cancellation webhooks not revoking access
- Payment failure events not triggering access restrictions
- Trial expiration events not processed correctly
- Webhook retry failures leaving state inconsistent
The webhook endpoint had several failure modes:
- Network timeouts during high-traffic periods
- Missing event handlers for specific webhook types
- Race conditions in concurrent event processing
- Silent failures (returning 200 OK despite processing errors)
Evidence Documentation
I delivered verifiable evidence to the client:
- SQL queries for independent verification of affected accounts
- List of specific user accounts with access/billing discrepancies
- Webhook event IDs and timestamps showing processing failures
- Technical specifications for corrective webhook handling
This approach allowed the client to verify findings independently and use the evidence for vendor discussions or alternative implementation.
Technical Recommendations
I provided specifications for corrective implementation:
Webhook handling improvements:
- Implement idempotent event processing to handle duplicate delivery
- Add event type handlers for all documented Stripe webhook events
- Return 500 status codes for actual processing failures (enabling Stripe retry)
- Log all webhook processing outcomes for reconciliation auditing
State synchronization:
- Use Stripe as authoritative source for subscription status
- Implement defensive checks before granting access (verify active subscription)
- Add background job for periodic reconciliation of access vs. subscription state
- Monitor webhook processing success rates
Monitoring additions:
- Track percentage of users with access matched to active subscriptions
- Alert on reconciliation discrepancies exceeding threshold
- Monitor webhook event processing latency and failure rates
Results
-
Identified $40K in missed payments through data reconciliation between application state and payment platform records.
-
Diagnosed webhook processing failures preventing subscription state updates from propagating to application database.
-
Delivered actionable remediation plan with SQL queries, technical specifications, and affected customers list.
Conclusion
This audit identified $40K in revenue leakage through data reconciliation between application and payment platform state. Webhook failures caused silent billing degradation while traditional monitoring showed healthy uptime.
Key findings:
- Database reconciliation identified discrepancies invisible to uptime monitoring
- Webhook processing gaps prevented subscription state synchronization
- Delivered verifiable evidence with SQL queries for independent verification
For subscription platforms, this demonstrates data reconciliation as a diagnostic approach and the need for monitoring that validates business outcomes rather than just operational health.