
Announcing Support for PHP in WhiteHat Sentinel Source

It is with great pleasure that I formally announce support for the PHP programming language in WhiteHat Sentinel Source! Joining Java and C#, PHP is now a member of the family of languages supported by our static code analysis technology. Effective immediately, our users can schedule static analysis scans of web applications developed in PHP to find security vulnerabilities. Accurately and efficiently supporting the PHP language was a tremendous engineering effort, and it reflects well on our Engine and Research and Development teams.

Static code analysis for security of the PHP programming language is nothing new. Open source projects and commercial vendors already claim support for this language and have done so for quite some time. So what’s so special about the support for PHP in WhiteHat Sentinel Source? The answer centers on our advancement in the ability of static analysis to model and simulate the execution of PHP accurately. Almost every other PHP static code analysis offering is limited by the fact that it cannot overcome the unique challenges presented by dynamic programming languages, such as dynamic typing. Without the ability to accurately model the type information associated with expressions and statements, for example, users are unable to correctly capture the rules needed to identify security vulnerabilities. WhiteHat Sentinel Source’s PHP offering ships with a highly tuned type inference system that piggy-backs off our patented Runtime Simulation algorithm to provide much deeper insight into source code, thereby overcoming the limitations of previous technologies.

Here is a classic PHP vulnerability that most, if not all, static code analysis tools should identify as a Cross-Site Scripting vulnerability:

$unsafe_variable = $_GET['user_input'];
echo "You searched for:<strong>".$unsafe_variable."</strong>";

This code retrieves untrusted data from the HTTP request, concatenates that input with string literals, and finally echoes that dynamically constructed content out to the HTML page. Writing static code analysis rules for this is fairly straightforward for any engine capable of data flow analysis: the return value of the _GET field access is a source of untrusted data and the first argument to the echo function invocation is a sink. Why is this easy to capture? Well… the signatures for the data types represented in the rules are incredibly simple. For example, the signature for echo is “echo(object)” where “echo” is the function name and “(object)” indicates that it accepts one argument whose data type is unknown. We assume all parameters are of type ‘object’ to keep things simple; we cannot possibly know all parameter types for all function invocations without performing richer analysis. Static analysis tools all have their own unique way of capturing security rules using a custom language. To keep the discussion of static analysis security rules vendor agnostic, we will only discuss rule writing and rule matching at the code signature level. Let’s agree on the format of [class name]->[function name]([one or more parameters]).
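To make the signature notation concrete, here is a minimal sketch of how a call could be rendered in that format. The helper name format_signature and its parameters are hypothetical and chosen for illustration only; they are not part of WhiteHat Sentinel Source or any real rule language.

```php
<?php
// Hypothetical helper (illustration only): renders a call in the
// [class name]->[function name]([parameters]) notation used in this post.
function format_signature(?string $className, string $functionName, int $argCount): string
{
    // Every parameter is assumed to be of type 'object', as described above.
    $params = implode(', ', array_fill(0, $argCount, 'object'));

    // Plain functions have no class prefix; methods are prefixed with "Class->".
    $prefix = $className === null ? '' : $className . '->';

    return $prefix . $functionName . '(' . $params . ')';
}

echo format_signature(null, 'echo', 1), PHP_EOL;            // echo(object)
echo format_signature('Anonymous', 'runQuery', 1), PHP_EOL; // Anonymous->runQuery(object)
```

Representing every parameter as ‘object’ keeps the signature independent of argument types, which is exactly the simplification the rules above rely on.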

Let’s make this a little more interesting. Consider the following PHP code snippet that may or may not be indicative of a SQL Injection vulnerability:

$anonymous = new Anonymous();
$anonymous->runQuery($_GET['name']);

This code instantiates a class called “Anonymous” and invokes the “runQuery” method of that class using untrusted data from the HTTP request. Is this vulnerable to SQL Injection? Well, it depends on what runQuery does. Let’s assume that the Anonymous class is part of a 3rd party library for which we do not have the source or which we simply do not want to include in the scan. Let’s also assume that we know runQuery dynamically generates and executes SQL queries and is thus considered a sink. Based on these assumptions, a manual code review would clearly indicate that yes, this is vulnerable to SQL Injection. But how would we do this with static analysis? Here’s where it gets tricky…

We want to mark the “runQuery” method of the “Anonymous” type as a sink for SQL commands such that if untrusted data is used as an argument to this method, then we have SQL Injection. The problem is that our sink rule must capture not only the “runQuery” method name, but also the fact that the method belongs to the “Anonymous” class. The code signature that must be reflected in the security rule would look as follows: Anonymous->runQuery(object).

Unfortunately, basic forms of static analysis are unable to determine with reasonable confidence that the $anonymous variable is of type Anonymous. In fact, $anonymous could be of any type! As a result, the underlying engine is never able to match the security rule to the $anonymous->runQuery($_GET['name']) statement, resulting in a lost vulnerability.

How does WhiteHat Sentinel Source’s PHP offering overcome this problem? It’s simple… in theory. When the engine first runs, it builds a model of the source code and attempts to compare the sink rule against the $anonymous->runQuery($_GET['name']) method invocation. At this point, the only information we have about this statement is the method name and the number of arguments, producing a code signature as follows: ?->runQuery(object). Compare this signature to the signature represented in our sink rule: Anonymous->runQuery(object). Since we cannot know the type of the $anonymous variable at this point, we perform a partial match such that we only compare the method name and arguments to those captured in the security rule. Since the statement’s signature is a partial match to our rule’s signature, we mark the model as such and move on.
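The partial match step can be sketched as follows. This is only an illustration under stated assumptions: the array layout for rules and call sites and the function name is_partial_match are hypothetical, not the engine's actual data model.

```php
<?php
// Hypothetical representation of a sink rule and of a call site whose
// receiver type is still unknown (the "?" in ?->runQuery(object)).
$sinkRule = ['class' => 'Anonymous', 'method' => 'runQuery', 'argCount' => 1];
$callSite = ['class' => null, 'method' => 'runQuery', 'argCount' => 1];

// A partial match deliberately ignores the class and compares only the
// method name and the number of arguments.
function is_partial_match(array $rule, array $call): bool
{
    return $rule['method'] === $call['method']
        && $rule['argCount'] === $call['argCount'];
}

// The statement is only marked as a candidate sink; nothing is flagged yet.
$isCandidateSink = is_partial_match($sinkRule, $callSite);
var_dump($isCandidateSink); // bool(true)
```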

After processing all our security rules, the static code analysis engine will begin performing data flow analysis on top of our patented Runtime Simulation algorithm, looking for security vulnerabilities. The engine will first see the expression “$anonymous = new Anonymous();” and record the fact that the $anonymous variable, at any point in the future, may be of type “Anonymous”. Next, the engine will hit the statement “$anonymous->runQuery($_GET['name']);”. This statement was previously marked with a SQL Injection sink via a partial match. We now have more information about the $anonymous variable and can check whether the statement is a full match for the original security rule. The statement’s signature is now known to be Anonymous->runQuery(object), which fully matches the signature represented by the sink security rule. With a full match realized, the engine treats this statement as a sink and flags the SQL Injection vulnerability correctly!
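Promoting a partial match to a full match might look roughly like the sketch below. The type environment ($typeEnv) and the function is_full_match are hypothetical names used for illustration; the real Runtime Simulation algorithm is not described in enough detail here to reproduce it.

```php
<?php
// Hypothetical sink rule, in the same shape a partial match would use.
$sinkRule = ['class' => 'Anonymous', 'method' => 'runQuery', 'argCount' => 1];

// Hypothetical type environment built during simulation: variable => class.
$typeEnv = [];

// Simulates seeing "$anonymous = new Anonymous();".
$typeEnv['$anonymous'] = 'Anonymous';

// Simulates reaching the previously marked call site and re-checking it,
// this time including the receiver's recorded type.
function is_full_match(array $rule, array $typeEnv, string $receiver, string $method, int $argCount): bool
{
    $receiverType = $typeEnv[$receiver] ?? null; // null means still unknown

    return $receiverType === $rule['class']
        && $method === $rule['method']
        && $argCount === $rule['argCount'];
}

var_dump(is_full_match($sinkRule, $typeEnv, '$anonymous', 'runQuery', 1)); // bool(true)
var_dump(is_full_match($sinkRule, [], '$anonymous', 'runQuery', 1));       // bool(false)
```

The second call shows why the first pass alone is not enough: without a recorded type for the receiver, the rule cannot be matched in full.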

OK, that was a little too easy. Let’s make this more challenging. Consider the following PHP code snippet:

$anonymous = null;

if (funct_a() > funct_b()) {
    $anonymous = new Anonymous();
} elseif (funct_a() == funct_b()) {
    $anonymous = new NotSoAnonymous();
} else {
    $anonymous = new NsaAnonymous();
}

$anonymous->runQuery($_GET['name']);

What the heck does the engine do now when it sees the runQuery statement? The engine will collect all data type assignments for the $anonymous variable and use all such types in an attempt to realize a full match for the runQuery statement. The engine will see three different possible code signatures for the statement. They are as follows:

Anonymous->runQuery(object)

NotSoAnonymous->runQuery(object)

NsaAnonymous->runQuery(object)

The engine will take these three signatures and compare each one to the signature represented by the partially matched security rule. Given that the first of the three signatures fully matches the signature represented by the security rule, the engine will treat the statement as a sink and flag the SQL Injection vulnerability correctly!
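As a rough sketch of that comparison, the snippet below builds one candidate signature per collected type and treats the statement as a sink if any candidate equals the rule's signature. The function candidate_signatures and the variable names are hypothetical and exist only for this example.

```php
<?php
// Hypothetical helper: one signature per type the receiver may have.
function candidate_signatures(array $receiverTypes, string $method, int $argCount): array
{
    $params = implode(', ', array_fill(0, $argCount, 'object'));

    $signatures = [];
    foreach ($receiverTypes as $type) {
        $signatures[] = $type . '->' . $method . '(' . $params . ')';
    }

    return $signatures;
}

// Types collected from every assignment to $anonymous, across all branches.
$collectedTypes = ['Anonymous', 'NotSoAnonymous', 'NsaAnonymous'];
$candidates = candidate_signatures($collectedTypes, 'runQuery', 1);

// A single full match is enough to treat the statement as a sink.
$sinkRuleSignature = 'Anonymous->runQuery(object)';
$isSink = in_array($sinkRuleSignature, $candidates, true);

print_r($candidates);
var_dump($isSink); // bool(true)
```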

You may be asking yourself: “what if $anonymous was of type NotSoAnonymous or NsaAnonymous? Would it still flag a vulnerability?” The answer is a resounding yes. Static analysis technologies do not, and in my opinion should not, attempt to evaluate conditionals as such practice will lead to an overwhelming number of lost vulnerabilities. Static code analysis could support trivial conditionals, such as comparing primitives, but conditionals in real-world code require much more guesswork and various forms of heuristics that ultimately lead to poor results. Even so, is it not fair to say that at some point in the application “funct_a()” will be greater than “funct_b()”? Otherwise, what is the point of the conditional in the first place? Our technology assumes all conditionals will be true at some point in time.

Remember when I said this was easy in theory? Well, this is where it starts to get really interesting. Consider the following code snippet and assume we do not have the source code available for “create_class_1()”, “create_class_2()” and “create_class_3()”:

$anonymous = null;

if (funct_a() > funct_b()) {
    $anonymous = create_class_1();
} elseif (funct_a() == funct_b()) {
    $anonymous = create_class_2();
} else {
    $anonymous = create_class_3();
}

$anonymous->runQuery($_GET['name']);

Now what is the range of possible data types for the $anonymous variable when used in the vulnerable statement? This is where we begin to stress the capabilities of the security rule language itself. WhiteHat Sentinel Source solves this by allowing the security rule writer to explicitly define the data type returned from a function invocation if that data type cannot be determined from the provided source code. For example, the security rule writer could capture a rule stating that the return value of the create_class_3() function invocation is of type Anonymous. The engine would then take this information, propagate data types as before and correctly flag the SQL Injection vulnerability associated with the runQuery method invocation.
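The idea of declaring return types in rules can be sketched like this. The rule table $returnTypeRules and the function types_from_calls are hypothetical; the actual Sentinel Source rule language is not shown in this post, so treat this purely as an illustration of the concept.

```php
<?php
// Hypothetical rule table written by the rule author:
// function name => class the function is declared to return.
$returnTypeRules = ['create_class_3' => 'Anonymous'];

// Resolves the possible types of a variable from the functions whose return
// values were assigned to it. Functions without a rule contribute nothing.
function types_from_calls(array $calledFunctions, array $returnTypeRules): array
{
    $types = [];
    foreach ($calledFunctions as $functionName) {
        if (isset($returnTypeRules[$functionName])) {
            $types[] = $returnTypeRules[$functionName];
        }
    }

    return array_values(array_unique($types));
}

// Functions assigned to $anonymous across the three branches.
$assignedFrom = ['create_class_1', 'create_class_2', 'create_class_3'];
$possibleTypes = types_from_calls($assignedFrom, $returnTypeRules);

print_r($possibleTypes);
var_dump(in_array('Anonymous', $possibleTypes, true)); // bool(true)
```

Once the declared type is in the set of possible types, matching proceeds exactly as in the branch example: one full match is enough to flag the sink.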

WhiteHat Sentinel Source’s type inference system allows us to perform more accurate and efficient analysis of source code in dynamically typed languages. Our type inference system not only allows us to more accurately capture security rules, but it also allows us to more accurately model control flow from method invocations of instances to method declarations of class declarations. Such capability is critical for any real world static code analysis of PHP source code.

I hope you enjoyed this rather technical and slightly lengthy blog post. It was a blast building out support for PHP and I look forward to our Sentinel Source customers benefiting from our newly available technology. Until next time…
