Automated Scanners vs. Low-Hanging Fruit
Low-Hanging Fruit (LHF) are vulnerabilities that are easy to find and exploit. We certainly don't want these types of issues in our websites, especially if they can be mitigated quickly with a small amount of effort. In network security, scanning does the trick for LHF identification. Unfortunately, in website security, scanning is absolutely vital but it is neither that simple nor sufficient. That’s because LHF may be either technical vulnerabilities, which website vulnerability scanners can find, or business logic flaws, which they can barely find at all.

Technical vulnerabilities, including Cross-Site Scripting (XSS) and SQL Injection, can be found in large supply by scanners and can usually be classified as LHF. For instance, when a website echoes user-supplied HTML, that’s a dead giveaway of an XSS vulnerability. The same goes for SQL Injection and the notorious ODBC error messages that dump database statements. These instances are easy to spot and exploit. However, as common as these issues are, they’re not always classifiable as LHF.

New XSS issues in YahooMail, MySpace, Gmail, sla.ckers.org (heh) and other high-profile websites have become significantly harder to come by because so many people have already cherry-picked the easy stuff. Discoveries often rely on clever filter-bypass tricks (XSS Cheat Sheet), complex input encoding techniques (UTF-7 or US-ASCII), or sophisticated combinations of these. SQL Injection exploits frequently have to be performed blind because helpful error messages are suppressed. These instances could comfortably be labeled Mid-tier or even (shall we say) Golden Apples, since they reside far out of the reach of scanners, and of most humans for that matter.

Then we have business logic flaws like Abuse of Functionality and Insufficient Authentication/Authorization. These mostly require humans (security experts) to uncover them, even when they are classifiable as LHF. For example, during the MacWorld 2007 Expo, several people discovered an easy (LHF) way to obtain free Platinum Passes (a $1,695 value with a chance to see Apple's CEO Steve Jobs up close). By viewing the source code of the sign-up web page, they found "hidden" Priority (Discount) Codes freely usable during registration. Unlike humans, scanners wouldn’t recognize the significance of Priority Codes, how to use them, or what the page looks like when they're accepted or denied, let alone pick up the badge to verify the attack succeeded.

WhiteHat Security's engineers continually discover a wide variety of LHF business logic flaws in a majority of the websites they assess. The more sophisticated the business logic flaw, the more expertise is required to identify the vulnerability and its remediation. Anyone can find one or two business logic flaws, but it takes a team of experts to try to find them all, all of the time. That’s a big reason why good, complete website vulnerability management is so hard to achieve.

From my experience, any class of attack can be LHF, Mid-tier, or Golden Apples. And any vulnerability identifiable in a purely automated fashion (by a scanner) can be classified as LHF, since anyone without much skill can buy or download a scanner, find a few technical vulnerabilities, and begin exploiting websites. Still, WhiteHat believes the goal of an effective website security program should be to find and manage all the vulnerabilities, all the time. Weeding out the LHF can be a good first step. There’s no reason to make exploiting websites that easy for the bad guys.
Input Validation or Output Filtering, which is Better?
This question is asked regularly with respect to solutions for Cross-Site Scripting (XSS). The answer is that input validation and output filtering are two different approaches that solve two different sets of problems, XSS included. Both methods should be used whenever possible. However, this answer deserves further explanation.

Input Validation
(aka: sanity checking, input filtering, white listing, etc.)
Input validation is one of those things ranted about incessantly in website security, and for good reason. If input validation were done properly and religiously throughout all website code, we’d wipe out a huge percentage of vulnerabilities, XSS and SQL Injection included. I’m also a believer that developers shouldn’t have to be experts in all the crazy attacks potentially thrown at a website. There’s simply too much to learn, and their primary job should be writing new code, not becoming website hackers. Developers should only have to concern themselves with the solutions required to mitigate any attack, no matter what it might be. This is where input validation comes into play.

Input validation should be performed on any incoming data that is not heavily controlled and trusted. This includes user-supplied data (query data, post data, cookies, referers, etc.), data in YOUR database, data from a third party (web service), or data from anywhere else. Here are the steps that should be performed before any incoming data is used:

Normalize
Decode the incoming data (URL, UTF-7, Unicode, US-ASCII, etc. encodings) into its canonical form.

Character-set checking
Ensure the data only contains characters you expect to receive. The more restrictive the rules are the better.

Length restrictions (min/max)
Ensure the data falls within a restricted minimum and maximum number of bytes. This limits the window of opportunity for an attack, as exploits tend to require lengthy input strings.

Data format
Ensure the structure of the data is consistent with what is expected. Phone numbers should look like phone numbers, email addresses should look like email addresses, etc.
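The four steps can be chained in Java, the language of the samples in this post. This is a minimal sketch under some assumptions: `InputCheck` and `validate` are hypothetical names, normalization is limited to URL decoding plus Unicode NFKC normalization (the JDK has no built-in UTF-7 decoder), and length is counted in UTF-8 bytes.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;
import java.util.Optional;
import java.util.regex.Pattern;

/** Hypothetical helper that applies the four validation steps in order. */
final class InputCheck {
    private InputCheck() {}

    /**
     * Returns the normalized value if it passes every step, otherwise empty.
     * allowedChars is a regex character class such as "[0-9-]".
     */
    static Optional<String> validate(String raw, String allowedChars,
                                     int minBytes, int maxBytes, Pattern format) {
        if (raw == null) {
            return Optional.empty();
        }
        // 1. Normalize: URL-decode, then fold Unicode compatibility characters
        //    (e.g. fullwidth '<') into their canonical forms before checking.
        String value;
        try {
            value = URLDecoder.decode(raw, StandardCharsets.UTF_8);
        } catch (IllegalArgumentException e) {
            return Optional.empty(); // malformed %-escape sequence
        }
        value = Normalizer.normalize(value, Normalizer.Form.NFKC);

        // 2. Character-set check: every character must be in the allow-list.
        if (!value.matches(allowedChars + "*")) {
            return Optional.empty();
        }
        // 3. Length restrictions, measured in UTF-8 bytes.
        int bytes = value.getBytes(StandardCharsets.UTF_8).length;
        if (bytes < minBytes || bytes > maxBytes) {
            return Optional.empty();
        }
        // 4. Data format: the whole value must match the expected structure.
        return format.matcher(value).matches() ? Optional.of(value) : Optional.empty();
    }

    public static void main(String[] args) {
        Pattern phone = Pattern.compile("[0-9]{3}-[0-9]{3}-[0-9]{4}");
        System.out.println(validate("555-555-5555", "[0-9-]", 12, 12, phone));    // Optional[555-555-5555]
        System.out.println(validate("555%3Cscript%3E", "[0-9-]", 12, 12, phone)); // Optional.empty
    }
}
```

Decoding before checking matters: `555%3Cscript%3E` looks harmless until it is decoded into `555<script>`, which the character-set step then rejects.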

Regular expression examples with iteratively more restrictive security:
(These are just samples, not recommended for production use)

Phone number:

/* 555-555-5555 */
String phone = req.getParameter("phone");

/* character-set OK */
String regex1 = "^([0-9\\-]+)$";

/* character-set with length restrictions */
String regex2 = "^([0-9\\-]{12})$";

/* with data format restrictions */
String regex3 = "^([0-9]{3})(\\-)([0-9]{3})(\\-)([0-9]{4})$";

/* getParameter() returns null when the field is missing */
if (phone != null && phone.matches(regex3)) {

/* data is ok, do stuff... */

}

Email Address:

/* user@somehostname.com */
String email = req.getParameter("email");

/* character-set */
String regex1 = "^([0-9a-zA-Z@\\.\\-]+)$";

/* character-set with length restrictions */
String regex2 = "^([0-9a-zA-Z@\\.\\-]{1,128})$";

/* with data format restrictions */
String regex3 = "^([0-9a-zA-Z\\.\\-]{1,64})(@)([0-9a-zA-Z\\.\\-]{1,64})(\\.)([a-zA-Z]{2,3})$";

if (email != null && email.matches(regex3)) {

/* data is ok, do stuff... */

}
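To see how each tier narrows what gets through, the sample expressions can be run side by side. This hypothetical harness writes them with Java string escaping (`\\-`, `\\.`) and the `a-zA-Z` range, since `\-` is not a legal Java string escape and `a-Z` is not a valid regex range.

```java
import java.util.Arrays;

/** Hypothetical harness: checks an input against each regex tier independently. */
final class RegexTiers {
    // Phone tiers: character set, then + length, then + data format.
    static final String[] PHONE = {
        "^([0-9\\-]+)$",
        "^([0-9\\-]{12})$",
        "^([0-9]{3})(\\-)([0-9]{3})(\\-)([0-9]{4})$"
    };
    // Email tiers: character set, then + length, then + data format.
    static final String[] EMAIL = {
        "^([0-9a-zA-Z@\\.\\-]+)$",
        "^([0-9a-zA-Z@\\.\\-]{1,128})$",
        "^([0-9a-zA-Z\\.\\-]{1,64})(@)([0-9a-zA-Z\\.\\-]{1,64})(\\.)([a-zA-Z]{2,3})$"
    };

    /** Returns one flag per tier: true if that tier's regex matches the whole input. */
    static boolean[] check(String[] tiers, String input) {
        boolean[] result = new boolean[tiers.length];
        for (int i = 0; i < tiers.length; i++) {
            result[i] = input.matches(tiers[i]);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String p : new String[] {"555-555-5555", "5-5", "--------1234", "555<script>"}) {
            System.out.println(p + " " + Arrays.toString(check(PHONE, p)));
        }
        for (String e : new String[] {"user@somehostname.com", "@@@", "user@host.museum"}) {
            System.out.println(e + " " + Arrays.toString(check(EMAIL, e)));
        }
    }
}
```

In this run, `5-5` passes only the character-set tier, `--------1234` slips past the first two tiers and is caught by the format tier, and `user@host.museum` fails the format tier because of the two-to-three-letter TLD rule, one more reason these samples are not meant for production.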


Implementation
For a variety of reasons, input validation has proved time-consuming, prone to mistakes, and easy to forget about. The best approach is to define all the expected application data types (account IDs, email addresses, usernames, etc.), abstract them into reusable objects, and make them easily available from inside the development framework. Input validation is then handled behind the scenes, with no need to parse URLs or remember to apply all the relevant business logic rules. The benefit of this approach is that security becomes consistent and predictable. Plus, developers are helped to create software at a faster rate. Security and business goals are in alignment, which is exactly where you want to be.
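Here is one way the reusable-object idea might look in Java. This is a sketch under stated assumptions: `EmailAddress` and `Quantity` are hypothetical types, the quantity limit of 99 is an invented business rule, and the email pattern is a sample in the spirit of the earlier regex (with the top-level-domain length relaxed), not a full RFC 5322 check. Because the constructors are private, any instance that exists has already passed validation, so downstream code never handles the raw string.

```java
import java.util.Optional;
import java.util.regex.Pattern;

final class ValueTypesDemo {
    public static void main(String[] args) {
        System.out.println(EmailAddress.parse("user@somehostname.com").isPresent()); // true
        System.out.println(Quantity.parse("4").map(Quantity::value).orElse(-1));    // 4
        System.out.println(Quantity.parse("4;DROP TABLE").isPresent());             // false
    }
}

/** Hypothetical value type: an instance can only exist if the input was valid. */
final class EmailAddress {
    // Sample rule only; not a complete RFC 5322 address grammar.
    private static final Pattern FORMAT =
        Pattern.compile("[0-9a-zA-Z.\\-]{1,64}@[0-9a-zA-Z.\\-]{1,64}\\.[a-zA-Z]{2,63}");

    private final String value;

    private EmailAddress(String value) {
        this.value = value;
    }

    static Optional<EmailAddress> parse(String raw) {
        if (raw == null || !FORMAT.matcher(raw).matches()) {
            return Optional.empty();
        }
        return Optional.of(new EmailAddress(raw));
    }

    @Override
    public String toString() {
        return value;
    }
}

/** Hypothetical value type for an order quantity between 1 and MAX. */
final class Quantity {
    static final int MAX = 99; // invented business rule for illustration

    private final int value;

    private Quantity(int value) {
        this.value = value;
    }

    static Optional<Quantity> parse(String raw) {
        // Character set and length first (1-2 digits), then the numeric range.
        if (raw == null || !raw.matches("[0-9]{1,2}")) {
            return Optional.empty();
        }
        int n = Integer.parseInt(raw);
        return (n >= 1 && n <= MAX) ? Optional.of(new Quantity(n)) : Optional.empty();
    }

    int value() {
        return value;
    }
}
```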

For example, let’s say you’re in an object-oriented environment working with a product purchase process:

URL:
http://website/purchase.cgi

Post Data:
product=100&quantity=4&cc=4444333322221111&exp=01/08


// Check if the user is properly logged-in and their account is active
if (user.isActive) {

// make sure the product is available in the requested quantity
if (req.product.isAvailable(req.qty)) {

// calculate the total purchase price
var total = req.product.price * req.qty;

// make sure the credit card is valid for the purchase total
if (req.creditcard.isValid(total)) {

// initiate the transaction
processOrder(user, req.product, req.qty, total, req.creditcard);

} else {

// inform the user that their credit card was not accepted, using a consistent message, and log the error to a central database.
requestFailed(req.creditcard.error);

}

} else {

// inform the user that the item is not available, using a consistent message, and log the error to a central database.
requestFailed(req.product.error);

}

} else {

// inform the user that they are not properly logged in, using a consistent message, and log the error to a central database.
requestFailed(user.error);

}



Notice that the example code contains no explicit input validation, direct database calls, or implicit strings. Everything is handled behind the scenes by the objects and methods. This makes mistakes less likely and is extremely helpful in preventing a wide variety of attacks, including XSS, SQL Injection, and more.
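As a rough, hypothetical sketch of what an object like `req.creditcard` might check before any payment-processor call, here is a Java class that accepts 13-19 digits, applies the Luhn checksum, and parses the `MM/YY` expiry used in the sample post data. The class name, the digit range, and the explicit `now` parameter are assumptions; checking the card against the purchase total would still be the payment processor's job and is not shown.

```java
import java.time.DateTimeException;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.Optional;

/** Hypothetical card object: only constructed when the basic format checks pass. */
final class CreditCard {
    // "MM/uu" parses two-digit years into 2000-2099, e.g. "01/08" -> 2008-01.
    private static final DateTimeFormatter EXPIRY = DateTimeFormatter.ofPattern("MM/uu");

    private final String number;
    private final YearMonth expiry;

    private CreditCard(String number, YearMonth expiry) {
        this.number = number;
        this.expiry = expiry;
    }

    static Optional<CreditCard> parse(String number, String exp, YearMonth now) {
        // Character set and length, then checksum.
        if (number == null || exp == null || !number.matches("[0-9]{13,19}") || !luhnValid(number)) {
            return Optional.empty();
        }
        YearMonth expiry;
        try {
            expiry = YearMonth.parse(exp, EXPIRY);
        } catch (DateTimeException e) { // bad format or out-of-range month
            return Optional.empty();
        }
        // Treat the card as usable through the end of its expiry month.
        return expiry.isBefore(now) ? Optional.empty() : Optional.of(new CreditCard(number, expiry));
    }

    /** Luhn checksum: double every second digit from the right, then sum mod 10. */
    static boolean luhnValid(String digits) {
        int sum = 0;
        boolean doubleIt = false;
        for (int i = digits.length() - 1; i >= 0; i--) {
            int d = digits.charAt(i) - '0';
            if (doubleIt) {
                d *= 2;
                if (d > 9) {
                    d -= 9;
                }
            }
            sum += d;
            doubleIt = !doubleIt;
        }
        return sum % 10 == 0;
    }

    /** Last four digits, e.g. for a receipt or log line without the full number. */
    String lastFour() {
        return number.substring(number.length() - 4);
    }

    YearMonth expiry() {
        return expiry;
    }

    public static void main(String[] args) {
        YearMonth now = YearMonth.of(2007, 12); // fixed "current" month for a repeatable demo
        System.out.println(parse("4444333322221111", "01/08", now).isPresent()); // true
        System.out.println(parse("4444333322221112", "01/08", now).isPresent()); // false: checksum
        System.out.println(parse("4444333322221111", "13/08", now).isPresent()); // false: month
    }
}
```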

Output Filtering
When you get right down to it, XSS happens on output, when unfiltered data hits the user’s (victim’s) web browser. Plus, untrusted data may originate from a variety of locations, including your own database. As a developer, you can never be certain that everyone else is doing their job, or that no one is placing potentially malicious data in the DB. Better to play it safe when printing to the screen.
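In Java, encoding untrusted values at the moment they are written to the page might look like this minimal sketch. `HtmlOut.encode` is a hypothetical helper, not a library method; it only covers HTML element content and quoted attribute values, and text placed inside JavaScript, CSS, or URLs needs different, context-specific encoding.

```java
/** Hypothetical minimal HTML encoder for element content and quoted attribute values. */
final class HtmlOut {
    private HtmlOut() {}

    static String encode(String untrusted) {
        if (untrusted == null) {
            return "";
        }
        StringBuilder out = new StringBuilder(untrusted.length() + 16);
        for (int i = 0; i < untrusted.length(); i++) {
            char c = untrusted.charAt(i);
            // Replace each markup-significant character with its HTML entity.
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&#39;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String comment = "<script>alert('xss')</script>"; // e.g. read back from the DB
        System.out.println("<p>" + encode(comment) + "</p>");
        // prints: <p>&lt;script&gt;alert(&#39;xss&#39;)&lt;/script&gt;</p>
    }
}
```

Encoding at output time protects the page even when the value came from your own database and bypassed input validation somewhere upstream.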

Control the output encoding
Don’t let the web browser guess at a web page’s content encoding. Browsers are known for making mistakes that can lead to strange XSS variants. There are two ways to set the encoding: the response header and meta tags. It’s best to use both methods to make certain the browser gets it right.

Response Header:
Content-Type: text/html; charset=utf-8
or
Content-Type: text/html; charset=iso-8859-1

Meta Tags:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
or
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

Removing HTML/JavaScript
Many languages and frameworks have their own methods to convert special characters into their equivalent HTML entities, and it’s probably best to use one of those. If not, here is a Perl regex snippet that can be used or ported. I welcome anyone to comment on libraries they like; I’m not familiar with, or up to date on, all of them. As with input validation, it’s best to abstract this layer and make it second nature for developers.

$data =~ s/(<|>|\"|\'|\(|\)|:)/'