Logging & Telemetry
In one line: Prevention always partially fails, so security depends on seeing malicious activity, and you can only see what you record. Logging is therefore the foundation of all detection: collect the right telemetry from endpoints, network, cloud, and identity, because a log you didn't capture is an attack you'll never detect.
Every previous chapter tried to stop attacks. This chapter accepts a hard truth: some attacks will get through. No prevention is perfect, attackers only need one gap, and zero-days exist. So a mature security program adds a second question to "how do we keep them out?" — namely, "when they get in, how fast do we notice and kick them out?" That's detection and response, the blue team's craft. And it all rests on one humble thing: logs. A security camera you never installed records nothing; a log you never collected can't reveal the break-in. Detection is, at bottom, the art of recording the right signals and then noticing the bad ones in the flood. This lesson is the recording part — what telemetry to gather and why — because everything else in the chapter (SIEM, detections, the SOC) operates on the data you collect here. Skip the right logs and the rest is blind.
Why detection exists: prevention is never enough
The assume-breach mindset, made operational. Three facts force detection into existence:
- Prevention partially fails. Patches lag, misconfigurations happen, new vulnerabilities appear, and humans get phished. Over a long enough timeline, something gets through.
- The attacker's advantage is asymmetric. Defenders must be right everywhere; an attacker needs one success. Pure prevention is a losing bet on perfection.
- The damage is in the dwell time. The gap between an attacker getting in and being caught — dwell time — is where breaches grow from a foothold into a catastrophe (recall the noisy inward journey). Detection's whole purpose is to shrink dwell time: catch the intruder during their loud lateral movement, before they reach the crown jewels.
So detection is not an admission of failure — it's the necessary second layer. Prevention reduces how often you're breached; detection-and-response reduces how badly each breach hurts.
Key terms
- Telemetry — the stream of data systems emit about what they're doing (logs, events, metrics). The raw material of detection.
- Log — a timestamped record of an event (a login, a process start, a network connection, an API call).
- Dwell time — how long an attacker is present before detection. Shorter = less damage. A key program metric.
- Detection — a rule or analytic that identifies suspicious/malicious activity in telemetry.
- EDR (Endpoint Detection & Response) — software on endpoints (laptops, servers) that records detailed activity (processes, files, network) and detects/responds to threats.
- Audit log — a security-relevant record of who did what (especially privileged actions), for detection and forensics.
- Detection in depth — collecting telemetry from multiple layers so an attacker who evades one is caught by another (defense in depth for visibility).
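To make two of these terms concrete, here is a minimal sketch of a log as a timestamped record and of dwell time as simple date arithmetic. The field names, the account, and both timestamps are hypothetical and chosen only for illustration; real log schemas vary by product.

```python
from datetime import datetime, timezone

# A log is a timestamped record of one event. These field names are
# illustrative assumptions, not a standard schema.
example_log = {
    "timestamp": "2024-03-01T02:15:00+00:00",
    "event": "login",
    "user": "jdoe",
    "source_ip": "203.0.113.7",
    "outcome": "success",
}


def dwell_time_days(first_compromise: datetime, detected: datetime) -> float:
    """Return the time between initial compromise and detection, in days."""
    if detected < first_compromise:
        raise ValueError("detection cannot precede compromise")
    return (detected - first_compromise).total_seconds() / 86400


# Hypothetical incident timestamps, both in UTC.
compromise = datetime(2024, 3, 1, 2, 15, tzinfo=timezone.utc)
detection = datetime(2024, 3, 15, 14, 15, tzinfo=timezone.utc)

print(f"dwell time: {dwell_time_days(compromise, detection):.1f} days")
# prints "dwell time: 14.5 days"
```

In practice the compromise timestamp is only known after the fact, from forensics, so this metric can be computed only if the earliest relevant logs still exist.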
The four telemetry sources (where to look)
An attacker's post-exploitation journey crosses several layers, and each layer can record it. Mature detection collects from all four, because attackers who hide in one are often loud in another:
| Source | What it records | Catches (examples) |
|---|---|---|
| Endpoint (EDR) | Process executions, file changes, command lines, local network connections on each host | Malware, privilege escalation, living-off-the-land tool abuse, persistence (new services/tasks) |
| Network | Connections, DNS queries, traffic volumes/flows between hosts | Lateral movement, C2 beaconing, exfiltration (unusual outbound data), SSRF requests to internal services or the cloud metadata endpoint |
| Cloud | API calls / control-plane activity (e.g., who created/changed/deleted resources), via cloud audit logs | Misconfiguration changes, credential abuse, suspicious resource creation, use of instance-metadata credentials from unexpected locations |
| Identity | Authentications, MFA events, privilege grants, role changes | Credential stuffing, impossible-travel logins, suspicious privilege escalation, new admin accounts |
Trace the credential-reuse breach from the offensive chapter through the telemetry:
- Phished login → Identity logs show a successful login from a new country/device at an odd hour ("impossible travel").
- Local privilege escalation → Endpoint (EDR) logs an unusual privileged process and credential-dumping behavior.
- Lateral movement via reused credentials → Network logs a workstation suddenly authenticating to servers it never normally touches; Identity logs the reused account authenticating to systems it has never used before.
- Data theft → Network logs a large, unusual outbound transfer; Cloud logs mass reads from a storage bucket.
The same attack left a trail at four layers. A defender collecting all four has four chances to catch it; one collecting none is blind no matter how good their analysts are. This is detection in depth — and it's exactly why the post-exploitation phase being "noisy" is the defender's opportunity: the noise lands in these logs.
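The trace above can be sketched as a small correlation exercise: merge events from all four sources, order them by time for one account, and count how many layers saw that account. This is an illustrative sketch only. The `Event` type, the account names, and the sample events are hypothetical, and a real pipeline would normalize vendor-specific log formats first.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    timestamp: datetime
    source: str   # "identity", "endpoint", "network", or "cloud"
    entity: str   # user or account the event is attributed to
    summary: str


def timeline_for(events, entity):
    """Return one entity's events from every source, oldest first."""
    return sorted((e for e in events if e.entity == entity),
                  key=lambda e: e.timestamp)


def layers_seen(events):
    """Map each entity to the set of telemetry sources that recorded it."""
    seen = defaultdict(set)
    for e in events:
        seen[e.entity].add(e.source)
    return dict(seen)


# Hypothetical events, deliberately out of order as they might arrive.
events = [
    Event(datetime(2024, 3, 1, 3, 42), "network", "jdoe",
          "workstation authenticates to servers it never touched before"),
    Event(datetime(2024, 3, 1, 2, 15), "identity", "jdoe",
          "successful login from new country and device"),
    Event(datetime(2024, 3, 1, 5, 10), "cloud", "jdoe",
          "mass reads from a storage bucket"),
    Event(datetime(2024, 3, 1, 2, 58), "endpoint", "jdoe",
          "credential-dumping behavior"),
    Event(datetime(2024, 3, 1, 9, 0), "identity", "asmith",
          "routine login"),
]

for e in timeline_for(events, "jdoe"):
    print(e.timestamp.isoformat(), e.source, e.summary)
```

If any one source were missing from `events`, the corresponding step would simply vanish from the timeline, which is the blind spot detection in depth is meant to avoid.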
The hard part: collect the right things, not everything
Logging has a deceptive trap. "Log everything" sounds safe but is a failure mode: infinite volume, crushing cost, and so much noise that real signals drown (and queries crawl). The skill is collecting security-relevant, high-value telemetry at a manageable volume:
- Prioritize security-relevant events. Authentications, privilege changes, process creation, admin actions, and outbound network flows carry far more detection value per byte than, say, every debug line.
- Ensure logs are useful — they need enough context (who, what, when, where, source IP, user, command) to actually investigate. A log saying "error" with no detail is noise.
- Centralize them (the next lesson, SIEM) — scattered logs on individual hosts can't be correlated, and an attacker can delete local logs to cover tracks.
- Protect log integrity. Attackers delete or tamper with logs to hide (anti-forensics). Ship logs off-host to a central, append-only store the attacker can't reach, and monitor for log sources that go quiet (a gap can itself be a signal).
- Mind retention. Breaches are often discovered months later (long dwell time); if you only keep 7 days of logs, the evidence is gone. Balance retention against cost deliberately.
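As a sketch of the "enough context" point above, the following uses Python's standard `logging` module to emit one JSON object per event, and a small helper that reports which context fields an event lacks. The field list in `CONTEXT_FIELDS`, the logger name, and the sample values are assumptions for illustration, not a standard.

```python
import json
import logging
from datetime import datetime, timezone

# Context an investigator typically needs; this list is an assumption.
CONTEXT_FIELDS = ("user", "source_ip", "action", "target", "outcome")


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object with fixed context fields."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(
                record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "message": record.getMessage(),
        }
        # Values passed through `extra=` become attributes on the record.
        for field in CONTEXT_FIELDS:
            payload[field] = getattr(record, field, None)
        return json.dumps(payload)


def missing_context(line: str) -> list[str]:
    """Return the context fields that are absent or empty in a JSON log line."""
    event = json.loads(line)
    return [f for f in CONTEXT_FIELDS if not event.get(f)]


logger = logging.getLogger("security_audit")  # hypothetical logger name
handler = logging.StreamHandler()
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# An investigable event: who, from where, did what, to what, with what result.
logger.info("privilege change", extra={
    "user": "jdoe",
    "source_ip": "203.0.113.7",
    "action": "add_to_group",
    "target": "Domain Admins",
    "outcome": "success",
})
```

A bare `logger.error("error")` passes through the same formatter with every context field set to null, and `missing_context` would list all five, which is one way to spot low-value log lines before they reach central storage.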
This is the foundational constraint of the entire chapter and the next. Detection engineering, SIEM queries, threat hunting, and forensics can only operate on data you captured and kept. The single most common reason an organization can't answer "what did the attacker do?" after a breach is missing logs. So logging decisions made calmly, in advance, determine whether you're blind or sighted during an incident. Get this layer right and everything downstream becomes possible; get it wrong and no amount of clever analysis can compensate.
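The "captured and kept" constraint can be checked with simple date arithmetic. The sketch below asks, for each source, whether logs from the start of an intrusion would still exist on the day it is discovered. The retention periods and dates are hypothetical examples, not recommendations.

```python
from datetime import date, timedelta


def evidence_available(intrusion_start: date, discovered: date,
                       retention_days: int) -> bool:
    """True if logs from the intrusion's first day still exist at discovery."""
    oldest_kept = discovered - timedelta(days=retention_days)
    return oldest_kept <= intrusion_start


# Hypothetical retention settings per telemetry source, in days.
RETENTION_DAYS = {"identity": 365, "cloud": 90, "endpoint": 30, "network": 7}

# Hypothetical incident: discovered 101 days after it began.
intrusion_start = date(2024, 1, 10)
discovered = date(2024, 4, 20)

for source, days in RETENTION_DAYS.items():
    if evidence_available(intrusion_start, discovered, days):
        status = "covers the start of the intrusion"
    else:
        status = "earliest evidence has already expired"
    print(f"{source:<9} {days:>3} days: {status}")
```

With these example numbers only the identity logs reach back far enough, so the first steps of the intrusion could be reconstructed from one layer at most.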
Why it matters
- It's the substrate of all detection and response. SIEM, detections, the SOC, threat hunting, and forensics are all operations on telemetry. No telemetry, no blue team.
- It directly shrinks dwell time. The faster you can see an attacker's noisy steps, the faster you respond — and dwell time is the variable that decides whether a breach is contained or catastrophic.
- It's where the offensive chapter pays off defensively. Every post-exploitation technique you learned generates signals in at least one of these four sources. Knowing the attacker's journey tells you exactly what to log and where to look.
Common pitfalls
- "Log everything." Infinite volume buries signal, balloons cost, and slows queries. Collect high-value, security-relevant telemetry with enough context to investigate.
- Logging only one layer. Endpoint-only (or network-only) leaves blind spots; an attacker who is quiet in the layer you watch may be loud in one you don't. Collect across endpoint, network, cloud, and identity.
- Logs with no context. "Error" or a bare event with no who/what/when/where can't be investigated. Ensure logs carry actionable detail.
- Leaving logs on the host. Local logs can be deleted by the attacker and can't be correlated. Centralize to an append-only store off-host.
- Too-short retention. With months-long dwell times, short retention destroys the evidence before discovery. Set retention against realistic detection timelines.
- Not monitoring for missing logs. Silence can mean an attacker disabled logging. Treat unexpected logging gaps as a signal, not an absence of one.
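The last pitfall lends itself to a simple check: track when each expected source last sent an event and flag any that have been quiet for too long or have never reported. This is a minimal sketch under assumed names; the host names, timestamps, and 15-minute threshold are hypothetical, and a sensible threshold depends on how chatty each source normally is.

```python
from datetime import datetime, timedelta


def silent_sources(expected: set[str], last_seen: dict[str, datetime],
                   now: datetime, max_silence: timedelta) -> list[str]:
    """Return expected sources that never reported or have gone quiet."""
    quiet = []
    for name in sorted(expected):
        seen = last_seen.get(name)
        if seen is None or now - seen > max_silence:
            quiet.append(name)
    return quiet


# Hypothetical inventory of hosts that should be shipping logs.
expected = {"web-01", "db-01", "dc-01", "vpn-01"}

# Timestamp of the most recent event received from each source.
last_seen = {
    "web-01": datetime(2024, 3, 1, 11, 58),
    "db-01": datetime(2024, 3, 1, 9, 5),    # quiet for almost three hours
    "dc-01": datetime(2024, 3, 1, 11, 59),
    # "vpn-01" has never reported at all
}

now = datetime(2024, 3, 1, 12, 0)
print(silent_sources(expected, last_seen, now, timedelta(minutes=15)))
# prints ['db-01', 'vpn-01']
```

Running a check like this from the central store, rather than on the hosts themselves, means an attacker who stops a local logging agent cannot also silence the alert.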
What's next
→ Continue to SIEM — the system that aggregates all this telemetry into one place where it can be correlated, searched, and turned into detections at scale.
→ Going deeper: the attacker behaviors these logs capture are post-exploitation techniques; using this telemetry after a breach is the subject of Incident Response & Forensics.