April was a busy month for consumers of information security reports, as two highly cited reports released 2016 versions: the Trustwave Global Security Report and the Verizon DBIR. Shortly thereafter, security luminaries started picking them apart for various reasons. One of the challenges with these reports is that the datasets carry some bias. Early on in the DBIR, the bias was substantial because the only data used in the analysis came from Verizon. As the report gained wider distribution, more datasets were included to reduce the bias.
Make no mistake, there is still bias in the data, as it represents only a subset of what is actually happening in the industry. You can even tell how different Trustwave's and Verizon's customer bases are by looking at how each represents compromise data and which items pop to the top.
As you work through the reports, there are two great blog posts that are absolutely worth reading. The first is from Michael Roytman of Kenna, who discusses an enlightened view of top vulnerabilities, brought to his attention by Adrian Sanabria of 451 Research. In this post, Roytman goes through the data sources and how the summaries are constructed. In it, he links to a fantastic post by Jericho, who really digs into the issues with the top 10 vulnerabilities. Jericho also discusses something that is missing from much information security research—the ability to reproduce results.
[…] the report does not explain the methodology for detecting the vulnerabilities, does not include details about the generation of the statistics, and provides a loose definition of what “successfully exploited” means.
Aside from the obvious reasons people hold on to security data, the point is, well, on point. Many practitioners shun academic research, but these are standard tenets for anyone whose work is featured in respected publications. It's the same problem I've had with the Ponemon report all these years: the methodology, data sources, and data dictionary are only loosely discussed, summary statistics are missing, and the results work better as a headline than as a report.
When research is truly representative, the conclusions drawn from the data should be at least similar, if not very close. To Jericho’s point, “Why doesn’t this list remotely match US-CERT’s ‘Top 30 Targeted High Risk Vulnerabilities’ […]?”
Perhaps one of the best features in both reports is that they have removed all stats around what a data breach costs. Page 64 of the Verizon DBIR lays out the rationale perfectly.
As we all digest these reports and try to make sense of what they mean for those of us fighting the good fight every day, remember that we need to consider all of the data available to us and make decisions fully aware of the bias in that data.