
URLs are content

Justifications for the federal government’s controversial mass surveillance programs have involved the distinction between the contents of communications and associated “meta-data” about those communications. Finding out that two people spoke on the phone requires less red tape than listening to the conversations themselves. While “meta-data” doesn’t sound especially ominous, analysts can use graph theory to draw surprisingly powerful inferences from it. A funny illustration of that can be found in Kieran Healy’s blog post, Using Metadata to find Paul Revere.

On November 10, the Third Circuit Court of Appeals ruled that web browsing histories are “content” under the Wiretap Act. This implies that the government will need a warrant before collecting such browsing histories. Wired summarized the point the court was making:

A visit to “,” for instance, might count as metadata, as Cato Institute senior fellow Julian Sanchez explains. But a visit to “” clearly reveals something about the visitor’s communications with WebMD, not just the fact of the visit. “It’s not a hard call,” says Sanchez. “The specific URL I visit at or or tells you very specifically what the meaning or purport of my communications are.”

Interestingly, the party accused of violating the Wiretap Act in this case wasn’t the federal government. It was Google. The court ruled that Google had collected content in the sense of the Wiretap Act, but that this was okay because you can’t eavesdrop on your own conversation. I’m not an attorney, but the legal technicalities were well explained in the Washington Post.

The technical technicalities are also interesting.

Basically, a cookie is a secret between your browser and an individual web server. The secret takes the form of a key-value pair, like id=12345. Once a cookie is “set,” it will accompany every request the browser sends to the server that set it. If the server makes sure that each browser it interacts with has a different cookie, it can distinguish individual visitors. That’s what it means to be “logged in” to a website: after you prove your identity with a username and password, the server assigns you a “session cookie.” When you return to the site, you see your own profile because the server reads your cookie, and that cookie was tied to your account when you logged in.
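
To make that round trip concrete, here’s a rough sketch of the server side using Node’s built-in http module. The paths and the cookie value are placeholders, and the actual password check is omitted.

    // Minimal sketch: set a session cookie at login, then recognize the
    // same browser on later requests. Paths and values are invented.
    const http = require("http");

    http.createServer((req, res) => {
      if (req.url === "/login") {
        // After the username and password check out (omitted here), tie a
        // fresh session ID to the account and hand it to the browser.
        res.setHeader("Set-Cookie", "session=12345");
        res.end("Logged in");
      } else {
        // The browser sends the cookie back automatically on every request,
        // so the server knows which visitor this is.
        const cookie = req.headers.cookie || "";
        res.end(cookie.includes("session=12345") ? "Your profile" : "Please log in");
      }
    }).listen(8080);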

Cookies can be set in two ways. The browser might request something from a server (HTML, JavaScript, CSS, image, etc.). The server sends back the requested file, and the response contains “Set-Cookie” headers. Alternatively, JavaScript on the page might set cookies using document.cookie. That is, cookies can be set server-side or client-side.
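
In the client-side case, any script running on the page can write to the cookie jar directly. The cookie name and value below are arbitrary.

    // Client-side: JavaScript on the page sets a cookie itself. The effect
    // is the same as the server responding with a
    // "Set-Cookie: theme=dark" header.
    document.cookie = "theme=dark; path=/; max-age=31536000";

    // Reading cookies back: document.cookie is a single string of all the
    // cookies visible to this page, e.g. "theme=dark; id=12345".
    console.log(document.cookie);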

A cookie is nothing more than a place for application developers to store short strings of data. These are some of the common security considerations with cookies:

  • Is inappropriate data being stored in cookies?
  • Can an attacker guess the values of other people’s cookies?
  • Are cookies being sent across unencrypted connections?
  • Should the cookies get a special “HttpOnly” flag that makes them JavaScript-inaccessible, to protect them from potential cross-site scripting attacks?

OWASP has a more detailed discussion of cookie security here.
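
The last two items on that list are typically handled with attributes on the Set-Cookie header itself. A minimal sketch, again in Node with a made-up cookie:

    const http = require("http");

    http.createServer((req, res) => {
      // "Secure" keeps the cookie off unencrypted connections, and
      // "HttpOnly" keeps it out of document.cookie, so a script injected
      // by a cross-site scripting attack can't simply read and steal it.
      res.setHeader("Set-Cookie", "session=12345; Path=/; Secure; HttpOnly");
      res.end("ok");
    }).listen(8443);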

When a user requests a web page and receives an HTML document, that document can instruct their browser to communicate with many different third parties. Should all of those third parties be able to track the user, possibly across multiple websites?
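
Here’s a sketch of how that happens; tracker.example is an invented domain standing in for any third-party tracker.

    // A page on one site can pull in resources from anywhere. This request
    // goes to tracker.example and, unless third-party cookies are blocked,
    // carries whatever cookies tracker.example has previously set in this
    // browser, which is enough to recognize the same visitor across
    // unrelated sites.
    const pixel = document.createElement("img");
    pixel.src = "https://tracker.example/pixel?page=" + encodeURIComponent(location.href);
    document.body.appendChild(pixel);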

Enough people feel uncomfortable with third-party cookies that browsers include options for disabling them. The case before the Third Circuit Court of Appeals was about Google’s practices in 2012, which involved exploiting a browser bug to set cookies in Apple’s Safari browser even when users had explicitly disabled third-party cookies. Consequently, Google was able to track individual browsers across multiple websites. At issue was whether the list of URLs the browser visited constituted “content.” The court ruled that it did.

The technical details of what Google was doing are described here.

Data is often submitted to websites through HTML forms. It’s natural to assume that submitting a form is always intentional, but forms can also be submitted by JavaScript, with no user interaction at all. It’s easy to assume that if a user submits a form to a server, they’ve “consented” to communicate with that server. That assumption led to the bug Google exploited.

Safari prevented third-party servers from setting cookies unless a form was submitted to the third party. Google supplied code that made browsers submit forms to its servers without any user interaction, and in response to those form submissions, tracking cookies were set. This circumvented the user’s instruction that the browser ignore third-party Set-Cookie headers. Incidentally, automatically submitting a form through JavaScript is also how an attacker carries out a cross-site request forgery attack, and it’s a common payload in cross-site scripting attacks.
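
A simplified sketch of that general pattern (not Google’s actual code, and the URL is invented):

    // Create a hidden iframe to receive the response, so the visible page
    // doesn't navigate anywhere.
    const frame = document.createElement("iframe");
    frame.name = "sink";
    frame.style.display = "none";
    document.body.appendChild(frame);

    // Build a form aimed at the third-party domain and submit it with no
    // user interaction. Safari treated the resulting request as if the
    // user had chosen to interact with that domain, so the Set-Cookie
    // headers in the response were honored.
    const form = document.createElement("form");
    form.method = "POST";
    form.action = "https://third-party.example/submit";  // invented URL
    form.target = "sink";
    form.style.display = "none";
    document.body.appendChild(form);
    form.submit();  // no click required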

To recap: Apple and Google had a technical arms race about tracking cookies. There was a lawsuit, and now we’re clear that the government needs a warrant to look at browser histories, because URL paths and query strings are very revealing.

The court suggested that there’s a distinction to be made between the domain and the rest of the URL, but that suggestion was not legally binding.
