Data Flow Mapping
Follow every data path from input to storage.
User input enters your application at the frontend. It passes through API layers, gets transformed by services, and ends up in a database, a log file, or an external API call. codelake traces every step.
Many security tools analyze one service at a time and lose track of data at service boundaries. codelake follows user input through your entire application stack, identifying where sanitization is missing, where PII is exposed, and where sensitive data leaks to unintended destinations.
Data Flow — User Registration
User Input
POST /api/register { name, email, phone, password }
API Gateway
VALIDATED: Schema validation → rate limiter → route
User Service
NO SANITIZATION: Input passed directly to query builder
Database — users table
name, email (PII), phone (PII), password_hash
How It Works
From entry point to final destination.
codelake performs taint analysis across your entire application. Every user-controlled input is tracked as it moves through function calls, service boundaries, transformations, and storage operations.
Source Identification
codelake identifies all entry points where external data enters your application: HTTP request parameters, headers, file uploads, webhook payloads, message queue consumers, and environment variables.
Propagation Tracking
Each tainted input is followed through variable assignments, function parameters, return values, and inter-service HTTP calls or message passing. codelake tracks data even when it's transformed, serialized, or split into parts.
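To make propagation concrete, here is a minimal runtime sketch. It is not codelake's implementation, and a static analyzer would reason about the code rather than run it. The `Tainted` class and its `source` attribute are hypothetical names chosen for illustration; the point is only that the taint label survives concatenation, transformation, and splitting.

```python
class Tainted(str):
    """A string that remembers which external source it came from (hypothetical helper)."""

    def __new__(cls, value, source):
        obj = super().__new__(cls, value)
        obj.source = source
        return obj

    def __add__(self, other):
        # tainted + anything stays tainted
        return Tainted(str.__add__(self, other), self.source)

    def __radd__(self, other):
        # anything + tainted stays tainted
        return Tainted(str.__add__(other, self), self.source)

    def upper(self):
        # a transformation keeps the label
        return Tainted(str.upper(self), self.source)

    def split(self, sep=None, maxsplit=-1):
        # every part of a split value carries the same label
        return [Tainted(part, self.source) for part in str.split(self, sep, maxsplit)]


email = Tainted("ada@example.com", "POST /api/register:email")
local, domain = email.split("@")
greeting = "Hello " + local.upper()
```

Here `greeting` and `domain` both still point back to the registration request, even though neither is the original value.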
Sanitization Detection
At every step, codelake checks whether proper sanitization, validation, or encoding has been applied. Framework-specific sanitizers are recognized: Laravel's validation, Express validators, Django form cleaners, and more.
Sink Analysis
Data is flagged when it reaches a sensitive sink without proper sanitization: database queries, file writes, log outputs, external API calls, email templates, or OS commands. Each sink type requires its own sanitization: a value encoded for HTML output is still unsafe inside a SQL query.
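The four steps can be sketched as a search over a flow graph. This is a simplified illustration under stated assumptions, not a description of codelake's engine: the `Node` type, the `REQUIRED_SANITIZER` table, and the logging branch in the example graph are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Assumed mapping from sink type to the sanitizer that makes it safe (illustrative only).
REQUIRED_SANITIZER = {"sql": "parameterize", "log": "redact_pii", "shell": "shell_escape"}


@dataclass
class Node:
    sanitizer: Optional[str] = None  # sanitizer applied at this step, if any
    sink: Optional[str] = None       # sink type, if this step is a sensitive sink


def unsafe_paths(nodes: Dict[str, Node], edges: Dict[str, List[str]], source: str) -> List[List[str]]:
    """Return every path from source that reaches a sink without that sink's sanitizer."""
    findings = []
    stack = [(source, [source], frozenset())]
    while stack:
        current, path, applied = stack.pop()
        node = nodes[current]
        if node.sanitizer:
            applied = applied | {node.sanitizer}
        if node.sink and REQUIRED_SANITIZER[node.sink] not in applied:
            findings.append(path)
        for nxt in edges.get(current, []):
            if nxt not in path:  # do not walk around cycles
                stack.append((nxt, path + [nxt], applied))
    return findings


nodes = {
    "POST /api/search": Node(),
    "SearchController": Node(),
    "SearchService": Node(),
    "raw SQL query": Node(sink="sql"),
    "redact": Node(sanitizer="redact_pii"),
    "Logger.error": Node(sink="log"),
}
edges = {
    "POST /api/search": ["SearchController"],
    "SearchController": ["SearchService", "redact"],
    "SearchService": ["raw SQL query"],
    "redact": ["Logger.error"],
}
findings = unsafe_paths(nodes, edges, "POST /api/search")
```

In this toy graph the logging branch passes through a redaction step and is not reported, while the path ending in the raw SQL query is.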
What It Detects
Findings that only flow analysis can produce.
These are real vulnerability patterns that only become visible once you understand how data moves through your application. A scanner that matches patterns one file at a time cannot connect the source to the sink.
Unfiltered Input to Database
User input reaches a database query without sanitization or parameterization at any point in the flow. codelake traces the full path, not just the final query.
POST /api/search → SearchController → SearchService → raw SQL query
Missing Sanitization Layers
Data passes through multiple services where sanitization was expected but is absent. Even if one layer validates, a downstream service may re-parse the data unsafely.
api-gateway (validated) → user-service (re-parsed, no validation) → db
PII Reaching Logs
Personally identifiable information — emails, phone numbers, addresses — flows into log statements, error handlers, or monitoring outputs where it becomes part of your log retention pipeline.
user.email → ErrorHandler → Logger.error(context) → CloudWatch
Cross-Service Data Leaks
Sensitive data crosses service boundaries without encryption or access control. Internal service-to-service calls often skip the security measures applied at the API gateway level.
user-service → HTTP (not HTTPS) → analytics-service (includes PII)
Sensitive Sinks
Every destination where data can cause harm.
codelake understands the security implications of each data destination. Different sinks require different protection strategies.
Database Writes
SQL queries, ORM operations, document store writes. Requires parameterized queries or proper escaping to prevent injection.
Protection: parameterization, ORM, input validation
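As an illustration of why parameterization matters, the sketch below uses Python's standard `sqlite3` module with an in-memory table. The table layout and function names are assumptions made for the example.

```python
import sqlite3


def demo_connection():
    """Create an in-memory users table with two rows (example data only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("ada", "ada@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def find_users_unsafe(conn, name):
    # Vulnerable: user input is concatenated into the SQL text.
    return conn.execute(f"SELECT email FROM users WHERE name = '{name}'").fetchall()


def find_users_safe(conn, name):
    # Parameterized: the value is passed separately from the SQL text.
    return conn.execute("SELECT email FROM users WHERE name = ?", (name,)).fetchall()
```

With the input `x' OR '1'='1`, the unsafe version returns every row in the table, while the parameterized version treats the whole string as a name and returns nothing.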
Log Outputs
Application logs, error handlers, audit trails, monitoring services. PII in logs creates compliance risk and expands the data breach surface.
Protection: PII redaction, structured logging
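One way to apply redaction is a filter on the logging handler, sketched here with Python's standard `logging` module. The regular expressions are deliberately simple assumptions and will not catch every email or phone format; `redact` and `RedactPII` are hypothetical names.

```python
import logging
import re

# Simplified patterns for illustration; real-world formats vary more than this.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact(text):
    """Replace anything that looks like an email or phone number with a placeholder."""
    text = EMAIL.sub("[email]", text)
    return PHONE.sub("[phone]", text)


class RedactPII(logging.Filter):
    """Logging filter that rewrites the formatted message before it is emitted."""

    def filter(self, record):
        record.msg = redact(record.getMessage())
        record.args = ()  # the message is already formatted
        return True
```

Attach it with `handler.addFilter(RedactPII())` so that every record passing through that handler is rewritten before it reaches the log destination.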
External API Calls
Third-party services, webhooks, analytics platforms. Data sent to external APIs leaves your control. Ensure only necessary data is forwarded.
Protection: data minimization, TLS, allowlists
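Data minimization can be as simple as an explicit allowlist applied just before the outbound call. A minimal sketch, assuming a flat payload; the field names and the `ANALYTICS_ALLOWLIST` constant are hypothetical.

```python
# Hypothetical set of fields the analytics service is allowed to receive.
ANALYTICS_ALLOWLIST = {"user_id", "plan", "signup_source"}


def minimize(payload, allowed):
    """Return a copy of payload containing only allowlisted top-level keys."""
    return {key: value for key, value in payload.items() if key in allowed}


user = {"user_id": 42, "email": "ada@example.com", "phone": "+1 555 010 9999", "plan": "pro"}
outbound = minimize(user, ANALYTICS_ALLOWLIST)
```

An allowlist fails closed: a field added to the user record later is not forwarded until someone explicitly permits it.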
File System Writes
File uploads, temp files, export functions. Unvalidated file paths or names can enable path traversal. Uploaded content may contain malicious payloads.
Protection: path validation, content scanning
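Path validation usually means resolving the requested path and checking that it still sits inside the intended directory. The sketch below uses the standard `pathlib` module; `safe_join` and the `/srv/uploads` directory are assumptions for the example.

```python
from pathlib import Path


def safe_join(base_dir, user_path):
    """Resolve user_path under base_dir and reject anything that escapes it."""
    base = Path(base_dir).resolve()
    candidate = (base / user_path).resolve()
    # After resolution, the base directory must be the candidate or one of its parents.
    if candidate != base and base not in candidate.parents:
        raise ValueError(f"path escapes base directory: {user_path!r}")
    return candidate
```

Both `../../etc/passwd` and an absolute path such as `/etc/passwd` are rejected, because the check runs on the resolved path rather than on the raw string.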
PII Detection
Know exactly where personal data flows.
codelake automatically classifies personally identifiable information in your data flows. It identifies fields like email addresses, phone numbers, physical addresses, social security numbers, and financial data — then tracks where that PII travels.
This is essential for GDPR, CCPA, HIPAA, and any privacy regulation that requires you to know where personal data is stored and processed. codelake gives you a complete PII data map without manual documentation.
PII Classification Map
Auto-classified from code analysis · No manual tagging required
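To give a feel for what automatic classification involves, here is a deliberately naive sketch that labels fields by name alone. This is an assumption-laden illustration, not codelake's classifier: the categories mirror the ones listed above, and the `PII_PATTERNS` table and `classify_fields` function are hypothetical.

```python
import re

# Name-based heuristics only; the patterns are illustrative and far from exhaustive.
PII_PATTERNS = {
    "email": re.compile(r"e[-_]?mail", re.IGNORECASE),
    "phone": re.compile(r"phone|mobile", re.IGNORECASE),
    "address": re.compile(r"address|street|postal|zip", re.IGNORECASE),
    "ssn": re.compile(r"ssn|social_security", re.IGNORECASE),
    "financial": re.compile(r"iban|card_number|account_number", re.IGNORECASE),
}


def classify_fields(fields):
    """Map each field name to the first PII category whose pattern matches it."""
    result = {}
    for name in fields:
        for category, pattern in PII_PATTERNS.items():
            if pattern.search(name):
                result[name] = category
                break
    return result


users_table = ["name", "email", "phone", "password_hash"]
pii_map = classify_fields(users_table)
```

Applied to the users table from the registration flow, this marks `email` and `phone` and leaves the other columns unlabeled.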
Know where every byte of user data goes.
Start a free scan and let codelake trace your data flows. See which paths are unprotected, where PII is exposed, and what needs to be fixed first.