Episode 23 — Set SOC Metrics That Drive Quality, Not Ticket Volume Theater

In this episode, we zoom in on a quiet force that shapes almost everything your Security Operations Center (S O C) does: metrics. Metrics are not neutral, even when they look objective on a dashboard, because people naturally optimize for what gets measured and rewarded. If you measure the wrong thing, you do not just get bad reporting, you get warped behavior that slowly drifts your program away from real risk reduction. This is why metrics design is not a reporting task; it is a governance and culture task with operational consequences. A mature S O C treats metrics as guardrails that protect quality under pressure, especially when alert volume spikes and leadership wants quick reassurance. The goal is not to create a pretty scorecard, but to create signals that encourage careful investigations, appropriate escalation, and durable improvement over time. When you get this right, metrics stop being performance theater and start being a practical steering wheel.

Before we continue, a quick note: this audio course is a companion to our two course books. The first book covers the exam and provides detailed guidance on how best to pass it. The second book is a Kindle-only eBook containing 1,000 flashcards that you can use on your mobile device or Kindle. Check them both out at Cyber Author dot me, in the Bare Metal Study Guides Series.

The first design principle is simple and sometimes uncomfortable: metrics shape behavior, so you must choose them with intention and anticipate how they will be gamed. A metric that sounds reasonable in a meeting can create harmful shortcuts when analysts are under time pressure. If you only track closure speed, you will quietly incentivize shallow triage and premature case closure, which increases false negatives and hides risk until it becomes an incident. If you only track ticket volume, you will reward noise creation, because the easiest way to produce more output is to split work into more units and escalate uncertain activity without resolving it. The best metrics acknowledge that S O C work is a blend of speed, accuracy, judgment, and communication, and any single number will be incomplete. This means you should treat metrics as a balanced set rather than a single target, and you should regularly inspect whether the set is producing the behaviors you want. When you think adversarially, not only about attackers but also about incentives, you design metrics that are harder to exploit and easier to trust.

Quality metrics are the backbone of that balanced set, and quality must be defined in operational terms rather than as a vague aspiration. One foundational metric is the true positive rate, which requires that you define what counts as a confirmed security issue in your environment and how confirmation is documented. Another foundational metric is investigation depth, which you can assess by examining whether analysts consistently identify affected entities, establish a defensible timeline, and capture supporting evidence across multiple relevant sources. Quality also includes decision appropriateness, meaning that the selected response action matched the risk, the confidence level, and the business constraints at the time. You can also track repeatability, which is whether similar cases are handled consistently by different analysts without wildly different outcomes. These metrics encourage analysts to do complete work rather than fast work, and they encourage detection engineers to refine signals so investigations start from better inputs. The subtle benefit is cultural, because quality metrics communicate that careful thinking and defensible evidence matter as much as speed.
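
To make this concrete, here is a minimal sketch in Python of how a true positive rate and a simple investigation-depth check could be computed from closed cases. The field names, such as disposition, evidence_sources, and has_timeline, are assumptions about your own case records rather than any standard schema, and the depth check is deliberately crude.

# Minimal sketch: quality metrics from closed cases.
# Field names below are assumptions about your case data, not a standard schema.

def true_positive_rate(cases):
    """Share of triaged cases confirmed as real security issues."""
    confirmed = sum(1 for c in cases if c["disposition"] == "true_positive")
    return confirmed / len(cases) if cases else 0.0

def investigation_depth_ok(case, min_sources=2):
    """Crude depth check: multiple evidence sources plus a documented timeline."""
    return len(case["evidence_sources"]) >= min_sources and case["has_timeline"]

cases = [
    {"disposition": "true_positive", "evidence_sources": ["edr", "proxy"], "has_timeline": True},
    {"disposition": "false_positive", "evidence_sources": ["edr"], "has_timeline": False},
]
print(f"True positive rate: {true_positive_rate(cases):.0%}")
print(f"Cases meeting the depth bar: {sum(investigation_depth_ok(c) for c in cases)} of {len(cases)}")

The point is not the code itself, but that both metrics force you to define confirmation and depth explicitly before you can count anything.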

Time metrics still matter, but they must be framed as risk exposure measures rather than as productivity measures. Time to detect is commonly tracked as Mean Time to Detect (M T T D), and time to contain is commonly tracked as Mean Time to Contain (M T T C). These can be powerful when you define them precisely, because they represent how long an attacker or failure condition can persist before you notice and limit damage. Precision is crucial, because otherwise the numbers become political rather than informative. For example, detection time should be anchored to when suspicious activity first occurred or first became observable in your telemetry, not merely when an alert fired, because alerting delays can hide real exposure. Containment time should be anchored to when effective limiting action occurred, not when someone acknowledged a ticket, because acknowledgment is not mitigation. You also want to segment M T T D and M T T C by severity and incident type, because combining everything into a single average will mislead you and invite false confidence. Used correctly, these time metrics help you invest in better telemetry, better detections, and clearer escalation paths.
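
As an illustration of that precision, the sketch below computes M T T D and M T T C per severity from incident records. The timestamp fields first_observed, detected_at, and contained_at are assumptions about how you record incidents, with detection anchored to first observable activity and containment anchored to effective limiting action rather than ticket acknowledgment.

# Minimal sketch: MTTD and MTTC segmented by severity.
# Timestamp field names are assumptions about your incident records.
from collections import defaultdict
from datetime import datetime
from statistics import mean

incidents = [
    {"severity": "high", "first_observed": datetime(2024, 5, 1, 9, 0),
     "detected_at": datetime(2024, 5, 1, 11, 30), "contained_at": datetime(2024, 5, 1, 13, 0)},
    {"severity": "low", "first_observed": datetime(2024, 5, 2, 8, 0),
     "detected_at": datetime(2024, 5, 3, 8, 0), "contained_at": datetime(2024, 5, 4, 8, 0)},
]

by_severity = defaultdict(list)
for inc in incidents:
    by_severity[inc["severity"]].append(inc)

for severity, group in by_severity.items():
    # Detection time runs from first observable activity, not from when an alert fired.
    mttd_hours = mean((i["detected_at"] - i["first_observed"]).total_seconds() / 3600 for i in group)
    # Containment time runs until an effective limiting action, not until acknowledgment.
    mttc_hours = mean((i["contained_at"] - i["detected_at"]).total_seconds() / 3600 for i in group)
    print(f"{severity}: MTTD {mttd_hours:.1f}h, MTTC {mttc_hours:.1f}h over {len(group)} incident(s)")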

Backlog health is another area where metrics can either support quality or accidentally punish prudence. A backlog number on its own does not tell you whether your S O C is unhealthy, because a backlog can contain many low-risk items that are intentionally deprioritized while high-risk items are handled promptly. The danger appears when leadership treats backlog size as a proxy for failure and then pressures the team to close everything quickly, which turns backlog reduction into a ritual rather than a risk-based decision. A better approach is to measure backlog composition and aging, focusing on how many high-severity items exceed a defined age threshold and how often items are reopened due to incomplete investigation. You can also measure queue volatility, which is how quickly new cases arrive relative to the team’s capacity to triage them with sufficient depth. This kind of backlog measurement encourages disciplined prioritization without framing careful work as laziness. It also reveals whether the problem is staffing, signal quality, or process, rather than simply implying that the team should work harder. When backlog metrics respect context, they become a tool for realistic planning instead of a blunt instrument.
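
A minimal sketch of that kind of backlog measurement might look like the following, assuming a ticketing export with severity and opened_at fields and using illustrative age thresholds per severity rather than any standard values.

# Minimal sketch: backlog aging by severity instead of raw backlog size.
# Thresholds are illustrative policy values, not recommendations.
from datetime import datetime, timedelta

AGE_THRESHOLDS = {"high": timedelta(days=2), "medium": timedelta(days=7), "low": timedelta(days=30)}

def backlog_aging(open_cases, now):
    """Count open cases that have exceeded their severity's age threshold."""
    overdue = {sev: 0 for sev in AGE_THRESHOLDS}
    for case in open_cases:
        if now - case["opened_at"] > AGE_THRESHOLDS[case["severity"]]:
            overdue[case["severity"]] += 1
    return overdue

open_cases = [
    {"severity": "high", "opened_at": datetime(2024, 5, 1)},
    {"severity": "low", "opened_at": datetime(2024, 5, 2)},
]
print(backlog_aging(open_cases, now=datetime(2024, 5, 10)))
# -> {'high': 1, 'medium': 0, 'low': 0}: one high-severity case past its threshold,
#    while the low-severity item is aging within policy rather than counting as failure.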

Learning metrics are where many programs miss an opportunity, because learning is the mechanism that turns incident pain into long-term improvement. An S O C that only measures response outputs is like a medical team that only measures how many patients were discharged, not whether treatment reduced repeat visits. Learning metrics can include tuning improvements, such as the number of high-noise detections that were materially improved and validated within a defined period. They can include repeat issue reduction, such as tracking how often the same benign pattern generates alerts after a tuning effort, or how often the same misconfiguration triggers repeated incidents. They can also include detection coverage growth, which might measure how many critical log sources are onboarded with validated parsing and baseline profiles, or how many high-priority tactics are now detected with acceptable fidelity. Another learning metric is post-incident action completion, where you track whether identified improvements are actually implemented rather than celebrated and forgotten. These metrics shift the mindset from handling today’s alerts to making tomorrow’s alerts fewer and more meaningful. Over time, learning metrics are one of the strongest predictors that the S O C will mature rather than stagnate.
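
Two of those learning indicators are simple ratios, as the hedged sketch below shows; the status field and the before-and-after alert counts are assumptions about how your team records tuning work and post-incident actions.

# Minimal sketch: two learning indicators as simple ratios.
# Field names and record formats are assumptions about your own tracking.
def repeat_alert_reduction(before_count, after_count):
    """Fractional drop in alerts from a detection after a tuning effort."""
    if before_count == 0:
        return 0.0
    return (before_count - after_count) / before_count

def action_completion_rate(post_incident_actions):
    """Share of post-incident improvement actions actually implemented."""
    done = sum(1 for action in post_incident_actions if action["status"] == "done")
    return done / len(post_incident_actions) if post_incident_actions else 0.0

print(f"Repeat alert reduction after tuning: {repeat_alert_reduction(120, 18):.0%}")
actions = [{"status": "done"}, {"status": "open"}, {"status": "done"}]
print(f"Post-incident action completion: {action_completion_rate(actions):.0%}")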

A classic pitfall is rewarding ticket count, because ticket volume theater looks productive while quietly undermining both security outcomes and analyst development. When ticket count becomes the goal, the system encourages small, fragmented work units, premature escalation, and minimal narrative quality. Analysts learn to optimize for closure, not for truth, and that is corrosive in a field where uncertainty is normal and evidence quality matters. Ticket count incentives also create a cynical relationship with detection engineering, because noisy detections are no longer a problem to solve, they are a source of work that makes numbers look good. Even worse, volume incentives can drive underreporting of ambiguous but important signals, because ambiguity requires time and careful reasoning that does not translate into rapid closures. The result is a dashboard that looks impressive while the organization remains exposed to real threats that do not fit the easy patterns. This is why a mature metrics program is deliberately skeptical of pure throughput measures, especially when they are not paired with quality inspection. If you must measure throughput, you treat it as a capacity indicator, not as a virtue.

One quick win that can transform your metrics culture is to review case samples and score quality in a structured, repeatable way. This works because it bypasses the limitations of aggregate numbers and forces you to look at real work product. Case sample reviews can evaluate whether the analyst captured key evidence, whether the timeline is coherent, whether the reasoning is explicit, and whether escalation decisions were justified. Over time, you can identify recurring gaps, such as missing identity correlation, weak scoping, or unclear communication with system owners, and you can turn those gaps into targeted training and playbook improvements. This approach also helps calibrate what good looks like across the team, which reduces inconsistency between analysts and shifts the culture toward craftsmanship. It is important that quality scoring is not used as a punishment device, because fear will drive people to hide uncertainty and avoid hard cases. Instead, quality scoring should be framed as coaching data and process feedback, with a focus on lifting the baseline rather than spotlighting individual mistakes. When done consistently, sample review turns the S O C into a learning system rather than a ticket factory.
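
One way to make the scoring structured and repeatable is a small weighted rubric, as in the sketch below; the criteria and weights are illustrative and should be adapted to your own playbooks and evidence standards.

# Minimal sketch: a weighted rubric for scoring sampled cases.
# Criteria and weights are illustrative, not a standard.
RUBRIC = {
    "key_evidence_captured": 3,
    "timeline_coherent": 3,
    "reasoning_explicit": 2,
    "escalation_justified": 2,
}

def score_case(review):
    """Weighted score between 0 and 1 from a reviewer's yes/no judgments."""
    earned = sum(weight for criterion, weight in RUBRIC.items() if review.get(criterion))
    return earned / sum(RUBRIC.values())

sample_review = {"key_evidence_captured": True, "timeline_coherent": True,
                 "reasoning_explicit": False, "escalation_justified": True}
print(f"Case quality score: {score_case(sample_review):.0%}")  # 80% for this sample

Keeping the rubric short and explicit is what makes scores comparable between reviewers, which is the property that turns sampling into coaching data rather than opinion.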

Now consider a scenario where leadership asks for dashboards, often with the implicit hope that charts will provide certainty in a domain that rarely offers it. The wrong move is to comply by surfacing whatever is easiest to count, because you will end up institutionalizing vanity metrics and teaching the organization to equate activity with safety. The better move is to propose dashboards that reflect meaningful outcomes, using a balanced set that includes quality, speed, learning, and stakeholder experience. You can show trends in confirmed detection quality, trends in M T T D and M T T C segmented by severity, and trends in backlog aging for high-risk categories. You can also show learning indicators, such as how many high-noise detections were improved and how repeat issues are trending. Most importantly, you can pair dashboard visuals with plain-language interpretation that explains what the metrics mean and what they do not mean, because dashboards without interpretation invite misuse. This is where seasoned leadership support matters, because a well-designed dashboard invites smarter questions rather than demanding simplistic conclusions. The goal is to help leadership make resourcing decisions and risk decisions, not to create a comforting illusion of control.

Balancing metrics across speed, accuracy, and customer satisfaction is how you prevent your S O C from optimizing one dimension at the expense of the others. Speed metrics like M T T D and M T T C reduce exposure, but speed without accuracy can cause unnecessary disruption, such as isolating systems based on weak evidence. Accuracy metrics like true positive rate protect focus, but accuracy without speed can allow real threats to persist too long. Customer satisfaction metrics, where the customer is internal stakeholders such as I T operations, engineering, and business owners, reflect whether the S O C is a trusted partner or a constant source of friction. Satisfaction here is not about making everyone happy; it is about whether escalations are actionable, communication is clear, and response decisions consider operational reality. You can also measure responsiveness to stakeholders during incidents, such as whether requested context is provided promptly and whether follow-up questions are handled with clarity. A balanced set of metrics encourages the S O C to be both decisive and careful, which is the real operational sweet spot. When metrics are balanced, they act like a stabilizer that keeps the program from swinging wildly between extremes.

Trends are where metrics become decision support, especially when you use them to justify staffing, tools, and training with evidence rather than anecdotes. For staffing, you can show workload patterns over time, including case complexity, interruption rates, and after-hours incident load, to justify changes that reduce burnout and error rates. For tooling, you can show whether detection fidelity improved after onboarding new telemetry or whether investigation time decreased after better enrichment and correlation capability. For training, you can show quality score improvements in case samples, reduced escalation bounce-back, or faster scoping performance after targeted coaching in specific domains. The key is to link investments to measurable outcomes, while being honest about what is correlation and what is causation. Leaders tend to support investments when they can see a plausible chain from spending to risk reduction, and metrics trends can provide that chain when designed thoughtfully. Trends also help you spot early warning signs, such as a rising false positive rate or a widening gap between detection and containment, before those problems turn into crises. In this way, metrics become an early warning system for the S O C itself.
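
Spotting those early warning signs can be as simple as comparing the latest reporting period against the previous one, as in the sketch below; the rollup fields and thresholds are assumptions meant to illustrate the idea, not recommended values.

# Minimal sketch: flag early-warning trends from periodic rollups.
# Rollup field names and thresholds are illustrative assumptions.
def trend_warnings(rollups, fp_rise=0.05, gap_widen_hours=4):
    """Compare the latest period to the prior one and report concerning shifts."""
    warnings = []
    if len(rollups) < 2:
        return warnings
    prev, curr = rollups[-2], rollups[-1]
    if curr["false_positive_rate"] - prev["false_positive_rate"] > fp_rise:
        warnings.append("False positive rate rising: revisit detection tuning.")
    if curr["mttc_minus_mttd_hours"] - prev["mttc_minus_mttd_hours"] > gap_widen_hours:
        warnings.append("Gap between detection and containment widening: review escalation paths.")
    return warnings

monthly_rollups = [
    {"false_positive_rate": 0.30, "mttc_minus_mttd_hours": 6.0},
    {"false_positive_rate": 0.38, "mttc_minus_mttd_hours": 11.5},
]
for warning in trend_warnings(monthly_rollups):
    print(warning)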

A memory anchor worth keeping close is that you measure outcomes, not activity for activity’s sake. Outcomes are the reduction of real risk, the timely detection of meaningful threats, the containment of incidents with appropriate disruption, and the continuous improvement of the system so tomorrow is better than today. Activity is everything that can be counted, including alerts processed, tickets closed, and meetings held, and activity can be impressive while outcomes remain poor. When you align metrics with outcomes, you encourage work that matters even when it is slower, such as a careful investigation that prevents a major incident. You also protect against the temptation to manufacture busywork that looks productive to outsiders but does not improve safety. Outcome-focused measurement is also psychologically healthier for analysts, because it recognizes the value of thoughtful work and reduces the pressure to perform shallow speed. This anchor is especially helpful when the organization is stressed, because stress makes people reach for simple numbers, and simple numbers often lead to harmful incentives. If the metric does not connect to risk reduction or operational effectiveness, it is likely a distraction.

Sharing metrics transparently is another lever that improves quality, because transparency builds trust and enables cooperation across teams. When stakeholders can see how the S O C is performing, including where it is strong and where it is constrained, they are more likely to treat the S O C as a partner rather than as a black box. Transparency also reduces rumor-driven narratives, such as the belief that analysts are slow because they are not working hard, when the real cause is noisy signals or missing telemetry. Transparency must be paired with context, because raw numbers without explanation can be misinterpreted and weaponized. This means you should explain what changed when a trend shifts, such as a major onboarding effort, a new detection set, or a business event that increased risk. It also means acknowledging uncertainty, because some metrics are proxies and some incidents do not fit neat categories. Over time, transparent reporting encourages shared ownership of improvement, because other teams start to see how their logging practices, change discipline, and response collaboration affect outcomes. In mature environments, transparency turns metrics from an internal report into a cross-functional improvement tool.

As a quick mini-review, keep four concrete metrics in mind and what each reveals about the health of your S O C. True positive rate reveals whether your detections are meaningful enough to justify analyst time and whether the team is focusing on real security issues. M T T D reveals how quickly you notice meaningful threats, which is a measure of both telemetry visibility and detection effectiveness. M T T C reveals how quickly you limit damage once something is detected, which reflects escalation clarity, response authority, and collaboration with operational teams. Case sample quality scores reveal whether investigations are deep enough to be defensible and repeatable, and they uncover training and process gaps that aggregates cannot show. You might also remember backlog aging for high-severity work, which reveals whether risk is accumulating due to capacity or signal quality problems. The point of this review is to reinforce that each metric answers a different question, and no single metric should be treated as the truth. When you can explain what each metric reveals and what it cannot reveal, you are less likely to fall into dashboard mythology. That explanatory skill is part of what makes metrics useful rather than performative.

To conclude, replace one vanity metric with a quality metric and observe what changes in behavior over the next reporting cycle. A vanity metric might be raw ticket closures, raw alerts processed, or any number that rewards motion without verifying value. A quality metric might be true positive rate, investigation depth as reflected in case sample scoring, or the percentage of escalations that include required evidence and coherent timelines. The replacement matters because it sends a signal to the team about what leadership values and what good work looks like. When you replace vanity with quality, you also change what conversations sound like, shifting from how many things were done to how well meaningful work was done and what improved as a result. Over time, that shift reduces noise, reduces burnout, and increases trust across the organization, because stakeholders receive clearer, more actionable outcomes. Metrics are one of the few levers that can shape culture without dramatic reorganizations, but only if you treat them as behavioral design rather than as reporting decoration. When you choose quality metrics deliberately and retire volume theater, you give your S O C permission to focus on truth, impact, and continuous improvement.
