Data Flow Mapping

Follow every data path from input to storage.

User input enters your application at the frontend. It passes through API layers, gets transformed by services, and ends up in a database, a log file, or an external API call. codelake traces every step.

Most security tools stop at service boundaries. codelake follows user input through your entire application stack, identifying where sanitization is missing, where PII is exposed, and where sensitive data leaks to unintended destinations.

Data Flow — User Registration

User Input
POST /api/register { name, email, phone, password }

↓

API Gateway (VALIDATED)
Schema validation → rate limiter → route

↓

User Service (NO SANITIZATION)
Input passed directly to query builder

↓

Database: users table
name, email (PII), phone (PII), password_hash

How It Works

From entry point to final destination.

codelake performs taint analysis across your entire application. Every user-controlled input is tracked as it moves through function calls, service boundaries, transformations, and storage operations.

1. Source Identification

codelake identifies all entry points where external data enters your application: HTTP request parameters, headers, file uploads, webhook payloads, message queue consumers, and environment variables.

2. Propagation Tracking

Each tainted input is followed through variable assignments, function parameters, return values, and inter-service HTTP calls or message passing. codelake tracks data even when it's transformed, serialized, or split into parts.

3. Sanitization Detection

At every step, codelake checks whether proper sanitization, validation, or encoding has been applied. Framework-specific sanitizers are recognized: Laravel's validation, Express validators, Django form cleaners, and more.

4. Sink Analysis

Data is flagged when it reaches a sensitive sink without proper sanitization: database queries, file writes, log outputs, external API calls, email templates, or OS commands. Each sink type requires specific sanitization.
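The four steps above can be pictured with a toy taint tracker. This is a minimal sketch under stated assumptions, not codelake's implementation: it tracks values at runtime rather than analyzing code, and the names `Tainted`, `through`, `sanitized`, and `check_sink` are hypothetical.

```python
from dataclasses import dataclass, replace
from typing import Optional


@dataclass(frozen=True)
class Tainted:
    """A user-controlled value plus the trail it has followed (hypothetical model)."""
    value: str
    source: str                         # step 1: where the data entered
    path: tuple = ()                    # step 2: every hop so far
    safe_for: frozenset = frozenset()   # step 3: sink kinds it was sanitized for

    def through(self, step: str, value: Optional[str] = None) -> "Tainted":
        # Propagation: a transformed value stays tainted and keeps its history.
        new_value = self.value if value is None else value
        return replace(self, value=new_value, path=self.path + (step,))

    def sanitized(self, step: str, sink_kind: str) -> "Tainted":
        # Sanitization is sink-specific, so record which sink kind it covers.
        return replace(self, path=self.path + (step,),
                       safe_for=self.safe_for | {sink_kind})


def check_sink(data: Tainted, sink_kind: str, sink_name: str) -> Optional[str]:
    """Step 4: return a finding if tainted data reaches a sink unprotected."""
    if sink_kind in data.safe_for:
        return None
    trail = " → ".join((data.source,) + data.path + (sink_name,))
    return f"unsanitized {sink_kind} sink: {trail}"


# A registration request that is validated at the gateway but never
# sanitized for SQL before it reaches the query builder.
email = Tainted("jane@example.com", source="POST /api/register")
email = email.through("API Gateway").through("User Service")
finding = check_sink(email, "sql", "users table")
print(finding)
```

Because sanitization is recorded per sink kind, a value escaped for SQL would still be reported if it later reached a log or an OS command.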

What It Detects

Findings that only flow analysis can produce.

These vulnerability patterns only become visible once you understand how data moves through your application. Scanners that match patterns in a single file or line cannot see them.

Unfiltered Input to Database

User input reaches a database query without sanitization or parameterization at any point in the flow. codelake traces the full path, not just the final query.

POST /api/search → SearchController → SearchService → raw SQL query
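To make the end of that path concrete, here is a small sketch using Python's built-in `sqlite3` module. The `products` table and the two search functions are invented for illustration; they show the difference between concatenating user input into SQL and passing it as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE products (name TEXT)")
conn.executemany("INSERT INTO products VALUES (?)", [("lamp",), ("desk",)])


def search_unsafe(term: str) -> list:
    # Vulnerable: user input becomes part of the SQL text.
    sql = f"SELECT name FROM products WHERE name = '{term}'"
    return conn.execute(sql).fetchall()


def search_safe(term: str) -> list:
    # Parameterized: the driver passes the term as data, never as SQL.
    return conn.execute(
        "SELECT name FROM products WHERE name = ?", (term,)
    ).fetchall()


payload = "x' OR '1'='1"
print(len(search_unsafe(payload)))  # the injected OR clause matches every row
print(search_safe(payload))         # no product has that literal name
```

The unsafe version looks harmless inside the service method; the risk only shows once you know `term` arrived unmodified from the request.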

Missing Sanitization Layers

Data passes through multiple services where sanitization was expected but is absent. Even if one layer validates, a downstream service may re-parse the data unsafely.

api-gateway (validated) → user-service (re-parsed, no validation) → db

PII Reaching Logs

Personally identifiable information — emails, phone numbers, addresses — flows into log statements, error handlers, or monitoring outputs where it becomes part of your log retention pipeline.

user.email → ErrorHandler → Logger.error(context) → CloudWatch
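One common mitigation for this kind of finding is to redact at the logging layer. The sketch below uses Python's standard `logging.Filter`; the `RedactEmails` class, the logger name, and the deliberately simple email pattern are assumptions for illustration, not a complete PII redactor.

```python
import io
import logging
import re

# Deliberately simple pattern; real redaction needs broader coverage.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


class RedactEmails(logging.Filter):
    """Mask email addresses before a record reaches the handler's output."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the message once, mask it, and drop the raw args.
        record.msg = EMAIL_RE.sub("[redacted-email]", record.getMessage())
        record.args = ()
        return True


stream = io.StringIO()
handler = logging.StreamHandler(stream)
handler.addFilter(RedactEmails())

logger = logging.getLogger("user-service")  # hypothetical logger name
logger.addHandler(handler)
logger.propagate = False  # keep the demo output confined to our handler

logger.error("registration failed for %s", "jane.doe@example.com")
print(stream.getvalue().strip())
```

Attaching the filter to the handler means every record written through that handler is masked, whichever part of the code produced it.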

Cross-Service Data Leaks

Sensitive data crosses service boundaries without encryption or access control. Internal service-to-service calls often skip the security measures applied at the API gateway level.

user-service → HTTP (not HTTPS) → analytics-service (includes PII)
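As a rough illustration of the check involved, the sketch below flags an outbound call when the payload carries PII fields and the URL is not HTTPS. The `PII_FIELDS` set and the `leak_findings` function are hypothetical, and the field list is far from exhaustive.

```python
from urllib.parse import urlsplit

# Illustrative field names only; not codelake's PII classifier.
PII_FIELDS = {"email", "phone", "address"}


def leak_findings(url: str, payload: dict) -> list:
    """Flag PII that is about to cross a service boundary without TLS."""
    parts = urlsplit(url)
    pii = sorted(PII_FIELDS.intersection(payload))
    if pii and parts.scheme != "https":
        return [f"{', '.join(pii)} sent over {parts.scheme} to {parts.hostname}"]
    return []


event = {"email": "jane@example.com", "plan": "pro"}
print(leak_findings("http://analytics-service/events", event))
print(leak_findings("https://analytics-service/events", event))
```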

Sensitive Sinks

Every destination where data can cause harm.

codelake understands the security implications of each data destination. Different sinks require different protection strategies.

Database Writes

SQL queries, ORM operations, document store writes. Requires parameterized queries or proper escaping to prevent injection.

Protection: parameterization, ORM, input validation

Log Outputs

Application logs, error handlers, audit trails, monitoring services. PII in logs creates compliance risk and expands the data breach surface.

Protection: PII redaction, structured logging

External API Calls

Third-party services, webhooks, analytics platforms. Data sent to external APIs leaves your control. Ensure only necessary data is forwarded.

Protection: data minimization, TLS, allowlists
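Data minimization can be as simple as an explicit allowlist applied just before the outbound call. In this sketch, `ALLOWED_FIELDS` and `minimize` are hypothetical names; the point is that fields are dropped unless someone has deliberately approved them.

```python
# Hypothetical allowlist for one analytics integration.
ALLOWED_FIELDS = {"user_id", "plan", "signup_date"}


def minimize(payload: dict) -> dict:
    """Keep only allowlisted fields; anything else never leaves the service."""
    return {key: value for key, value in payload.items() if key in ALLOWED_FIELDS}


user = {"user_id": 42, "plan": "pro", "email": "jane@example.com", "phone": "555-0100"}
print(minimize(user))
```

An allowlist fails closed: a field added to the user record later is withheld by default, whereas a blocklist would forward it until someone noticed.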

File System Writes

File uploads, temp files, export functions. Unvalidated file paths or names can enable path traversal. Uploaded content may contain malicious payloads.

Protection: path validation, content scanning
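Path validation usually means resolving the user-supplied name and confirming the result is still inside the intended directory. The sketch below assumes Python 3.9 or later (for `Path.is_relative_to`); `UPLOAD_ROOT` and `safe_upload_path` are hypothetical names.

```python
from pathlib import Path

UPLOAD_ROOT = Path("/srv/uploads").resolve()  # hypothetical upload directory


def safe_upload_path(filename: str) -> Path:
    """Resolve a user-supplied name and refuse anything outside UPLOAD_ROOT."""
    candidate = (UPLOAD_ROOT / filename).resolve()
    if not candidate.is_relative_to(UPLOAD_ROOT):
        raise ValueError(f"path traversal attempt: {filename!r}")
    return candidate


print(safe_upload_path("avatar.png").name)
try:
    safe_upload_path("../../etc/passwd")
except ValueError as err:
    print(err)
```

Resolving before comparing matters: a plain string prefix check would accept `../` sequences that only escape the directory once the path is normalized.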

PII Detection

Know exactly where personal data flows.

codelake automatically classifies personally identifiable information in your data flows. It identifies fields like email addresses, phone numbers, physical addresses, social security numbers, and financial data — then tracks where that PII travels.

This is essential for GDPR, CCPA, HIPAA, and any privacy regulation that requires you to know where personal data is stored and processed. codelake gives you a complete PII data map without manual documentation.

PII Classification Map

Email Address: 4 flows · 3 sinks · HIGH
Phone Number: 2 flows · 2 sinks · HIGH
Physical Address: 1 flow · 1 sink · MEDIUM
Payment Data: 1 flow · 1 sink · CRITICAL
User Name: 6 flows · 4 sinks · LOW

Auto-classified from code analysis · No manual tagging required
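To give a feel for automatic classification, here is a deliberately naive sketch that labels fields by name alone. The rules and the `classify` function are assumptions for illustration; the source does not describe how codelake classifies PII, and a name-only approach would mislabel fields such as `ip_address`.

```python
import re

# Hypothetical name-based rules, checked in order; the first match wins.
PII_RULES = [
    (re.compile(r"e?mail", re.I), "Email Address", "HIGH"),
    (re.compile(r"phone|mobile", re.I), "Phone Number", "HIGH"),
    (re.compile(r"address|street|zip", re.I), "Physical Address", "MEDIUM"),
    (re.compile(r"card|iban", re.I), "Payment Data", "CRITICAL"),
    (re.compile(r"name", re.I), "User Name", "LOW"),
]


def classify(fields: list) -> dict:
    """Map each field that looks like PII to a (category, risk) pair."""
    result = {}
    for field in fields:
        for pattern, label, risk in PII_RULES:
            if pattern.search(field):
                result[field] = (label, risk)
                break
    return result


columns = ["name", "email", "phone", "password_hash"]
for field, (label, risk) in classify(columns).items():
    print(f"{field}: {label} ({risk})")
```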

Know where every byte of user data goes.

Start a free scan and let codelake trace your data flows. See which paths are unprotected, where PII is exposed, and what needs to be fixed first.