Artificial intelligence (AI) has become a buzzword, thrown around freely over the past few years. But many companies are making real, game-changing use of it.
At WhiteHat, we use AI to improve both speed and accuracy of our Application Security Platform.
AppSec teams are constantly caught between the need to keep pace with security testing and the pressure to let development teams move at DevOps speed. Our AI software dramatically decreases threat vector identification times and improves the efficiency of false positive identification. As a result, enterprises can alert developers to potential application security vulnerabilities sooner and deliver real-time security risk assessments.
But to fully understand the impact of this technology, we first need to look at what AI actually means.
WHAT ARE ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING?
AI is not just 1,000,000 If-Else statements in a trenchcoat. It is a broad category that can be summed up as “the study of intelligent agents.” However, what we actually use at WhiteHat is a subset of AI, called machine learning (ML).
ML ingests extremely large amounts of labeled data in order to “train” a model that can later accurately classify new information. This is what usually works behind the scenes in applications such as facial recognition.
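To make the train-then-classify workflow concrete, here is a deliberately tiny sketch, not WhiteHat's actual model: a nearest-centroid classifier that learns one average feature vector per label from labeled examples, then assigns new samples to whichever label's centroid is closest. All names and data here are illustrative.

```python
# Toy train/classify loop: the "model" is just one mean feature
# vector (centroid) per label, learned from labeled examples.

def train(labeled_data):
    """labeled_data: list of (features, label) pairs -> model dict."""
    sums, counts = {}, {}
    for features, label in labeled_data:
        counts[label] = counts.get(label, 0) + 1
        acc = sums.setdefault(label, [0.0] * len(features))
        for i, x in enumerate(features):
            acc[i] += x
    # Divide each accumulated sum by its count to get the centroid.
    return {label: [v / counts[label] for v in acc]
            for label, acc in sums.items()}

def classify(model, features):
    """Return the label whose centroid is nearest to the new sample."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(model, key=lambda label: sq_dist(model[label]))

training_set = [
    ([1.0, 1.0], "cat"), ([1.2, 0.8], "cat"),
    ([5.0, 5.0], "dog"), ([4.8, 5.2], "dog"),
]
model = train(training_set)
print(classify(model, [1.1, 0.9]))  # a new, unseen sample -> "cat"
```

Real systems use far richer features and far more sophisticated models, but the shape is the same: labeled examples in, a trained model out, predictions on new data.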
As our senior engineer likes to say, “In a nutshell, machine learning is a tool you use to reverse engineer a statistical model, when building it the classical way would be way too complex or impossible.”
HOW CAN MACHINE LEARNING APPLY TO APPLICATION SECURITY?
At its core, machine learning is a heavy-duty classification engine. It can answer questions like:
- “Whose face is this? Amy’s, Betsy’s or Christie’s?”
- “Is this a picture of a dog or a cat?”
- “Is this thing a risk or not?”
The algorithm then (usually) assigns a weight to each answer: “This picture looks 87% like Amy, 26% like Betsy and 3% like Christie,” for instance. From that, you can conclude that it is probably a picture of Amy.
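Turning those per-label weights into a single conclusion is the simple part; a minimal sketch (scores borrowed from the face example above, everything else illustrative):

```python
# Per-label scores from a hypothetical classifier; they are
# independent weights, so they need not sum to 100%.
scores = {"Amy": 0.87, "Betsy": 0.26, "Christie": 0.03}

# The conclusion is just the highest-scoring label.
best = max(scores, key=scores.get)
print(f"Probably a picture of {best}")  # -> Probably a picture of Amy
```

In practice you would also set a minimum score below which the system answers "not sure" rather than guessing.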
A large part of running a successful security program is having metrics, identifying risks and identifying incidents, all of which are forms of classification. Does this set of signals and responses look more like a risk, or more like normal, everyday traffic?
WHAT DOES THIS MEAN FOR THE USER?
When all is said and done, AI and ML are just supporting technologies. Generally speaking, to the user, saying “our product uses AI” should mean no more than if we said “some of our backend services are written using Go.”
ADDRESSING THE DATA GAP
One of the challenges of any machine learning project is getting the required training data. Generally speaking, the more complex the problem, the more training data you need, and the more training data you have, the more accurate your model will be. The data also has to be labeled in a machine-readable way. It’s not enough to just have 1,000 pictures of Amy, Betsy and Christie; someone has to label the faces in each picture. This can be a major roadblock for many startup projects.
We are lucky at WhiteHat to be able to sidestep this issue. We have almost 19 years’ worth of findings from our scanner, all of which have already been labeled as “vulnerable” or “not vulnerable” by our Threat Research Center. Our engineers were able to take this data and use it to train models that aid our Threat Research Center in verifying findings, resulting in faster overall delivery of service.
Our machine learning team is involved in quite a few research projects, each of which I will explore more deeply in future articles.
The first, and probably most obvious, is to help classify findings (as mentioned above). This will help answer: “Is this finding obviously a false positive or obviously vulnerable?”
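The triage logic this enables can be sketched as follows. This is a hypothetical routine, not WhiteHat's actual pipeline, and the function name and thresholds are assumptions: a model emits a vulnerability score between 0 and 1, only the confident extremes are auto-labeled, and everything in between still goes to a human analyst.

```python
# Hypothetical finding triage: auto-label only the obvious cases,
# route ambiguous scores to human review (thresholds are illustrative).

def triage(score, lo=0.05, hi=0.95):
    """Map a model's vulnerability score in [0, 1] to a disposition."""
    if score <= lo:
        return "false positive"   # obviously not a vulnerability
    if score >= hi:
        return "vulnerable"       # obviously a vulnerability
    return "human review"         # the model is not sure enough

print(triage(0.01))  # -> false positive
print(triage(0.99))  # -> vulnerable
print(triage(0.60))  # -> human review
```

The point of the middle band is that the model never overrules an analyst on an ambiguous finding; it only clears away the cases where it is highly confident.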
The second major focus of research is identifying “landing spaces.” This is a more technical concept: letting the scanner infer metadata about the response surrounding an injection point, much as a human attacker or researcher would.
The third major focus of research is form classification. This will help answer: “Is this a search form or a contact-us form?” “Does this form delete or create anything?” This is necessary to allow for more accurate testing, while still remaining safe in a production environment.
WhiteHat’s use of AI and ML is only the beginning, and we look forward to sharing these upcoming developments with our customers.