
Benchmarking accuracy of automated AppSec testing

Over the past year, data breaches caused by the exploitation of web, business, and mobile applications have continued to run rampant. In 2018, major household names like Ticketmaster, the United States Postal Service (USPS), Air Canada, and British Airways were hit by application-based exploits. To minimize vulnerabilities, and to identify existing ones before they can do this level of damage, application security solutions need to be fast and provide good coverage across all classes of vulnerabilities. More importantly, they need to be highly accurate to be useful to DevOps application development teams. Fast but inaccurate results are counterproductive to an efficient and successful application security program: the time engineers waste triaging false positives far outweighs the benefit of getting results sooner.

Most automated application security testing solutions can scan thousands of applications containing millions of lines of code and produce results containing millions of attack vectors. But every application is different (different functionality, code, size, and complexity), so security findings and their accuracy vary significantly from one application to the next. Moreover, picking the single scanned application with the best accuracy and presenting that figure as the solution’s accuracy is misleading. Even an average would be misleading, because it measures only the limited set of applications that the vendor’s solution scanned and therefore cannot be compared with the accuracy of other solutions. So how do you benchmark and compare the accuracy of application security testing solutions?

OWASP Benchmark

The OWASP Benchmark Project is a vendor-neutral, well-respected measure of accuracy that can be used to compare different solutions. It is a free and open testing project that evaluates how automated software vulnerability detection tools stack up in terms of accuracy. The project is sponsored by the U.S. Department of Homeland Security (DHS) and has created a huge test suite, with over 21,000 test cases, to gauge the true effectiveness of all kinds of application security testing tools. It calculates an overall score for application security solutions, based on both true positive rate (TPR) and false positive rate (FPR).


True Positive Rate (TPR): True Positives / (True Positives + False Negatives). Also referred to as recall or sensitivity.

False Positive Rate (FPR): False Positives / (False Positives + True Negatives).

OWASP Benchmark Score: TPR - FPR, the normalized distance from the “guess line” (the diagonal where TPR equals FPR, which is what random guessing produces).
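
To make these definitions concrete, here is a minimal Python sketch that computes the three quantities from raw counts. The names (ConfusionCounts, benchmark_score, and so on) and the sample numbers are illustrative assumptions, not part of the OWASP Benchmark tooling, and the sketch assumes the test suite contains at least one real vulnerability and at least one safe test case.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Hypothetical container for one tool's results on a test suite."""
    true_positives: int   # real vulnerabilities the tool reported
    false_negatives: int  # real vulnerabilities the tool missed
    false_positives: int  # safe test cases the tool flagged anyway
    true_negatives: int   # safe test cases the tool correctly ignored


def true_positive_rate(c: ConfusionCounts) -> float:
    # TPR = TP / (TP + FN); assumes at least one real vulnerability exists.
    return c.true_positives / (c.true_positives + c.false_negatives)


def false_positive_rate(c: ConfusionCounts) -> float:
    # FPR = FP / (FP + TN); assumes at least one safe test case exists.
    return c.false_positives / (c.false_positives + c.true_negatives)


def benchmark_score(c: ConfusionCounts) -> float:
    # Score = TPR - FPR, as defined in the text above.
    return true_positive_rate(c) - false_positive_rate(c)


if __name__ == "__main__":
    # Made-up counts: the tool finds 75 of 100 real issues
    # and wrongly flags 25 of 100 safe cases.
    tool = ConfusionCounts(75, 25, 25, 75)
    print(f"TPR   = {true_positive_rate(tool):.2f}")   # 0.75
    print(f"FPR   = {false_positive_rate(tool):.2f}")  # 0.25
    print(f"Score = {benchmark_score(tool):.2f}")      # 0.50

    # A tool that flags every test case sits on the guess line.
    flag_everything = ConfusionCounts(100, 0, 100, 0)
    print(f"Flag-everything score = {benchmark_score(flag_everything):.2f}")  # 0.00
```

The second example shows why the score subtracts FPR: a tool that reports every test case as vulnerable reaches a perfect TPR but scores zero, because it is no better than guessing.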


WhiteHat Security, like many other vendors who use OWASP Benchmark, believes in the value of an unbiased score for vulnerability assessment.

According to the most recent static application security testing (SAST) vendor evaluations, WhiteHat Security’s Sentinel Source Standard Edition (SE), covering the deployment phase of the software lifecycle (SLC), scored 77 percent, while WhiteHat Security’s Sentinel Source Essentials Edition (EE), which provides SAST for the DevOps Build/Test phase, received a 42 percent accuracy rating. To put that in perspective, the next highest rating on the OWASP Benchmark accuracy chart was 39 percent.

Ten non-commercial and six commercial SAST solutions have been submitted to the OWASP Benchmark Project for scoring, most of them anonymously. The commercial average was just 26 percent, roughly 50 percentage points below Sentinel Source SE’s rating.

Ref: https://rawgit.com/OWASP/Benchmark/master/scorecard/OWASP_Benchmark_Home.html

This accomplishment supports WhiteHat Security’s mission to help businesses build the most secure applications by providing the deepest coverage, fastest speeds, and highest accuracy available on the market, whereas most SAST solutions provide only one or two of these critical components. Learn more here: https://www.whitehatsec.com/products/static-application-security-testing/.