Much of the internet operates on HTTP, Hyper Text Transfer Protocol. With HTTP, the user sends a request and the server replies with its response. These requests are like the pneumatic tubes at the bank — a delivery system for the ultimate content. A user clicks a link; a request is sent to the server; the server replies with a response; the response has the content; the content is displayed for the user.
Different kinds of requests (methods) exist for different types of actions, though some types of actions can be requested in more than one way (using more than one method). Here are some of the more common methods:
- POST requests write to the server.
- GET requests read from the server.
- HEAD is similar to GET, but retrieves only headers (headers contain meta-information, while rest of the content is in the response body.)
- PUT requests allow for the creation and replacement of resources on the server.
- DELETE requests delete resources.
Browsers and Crawlers
Browsers and most web crawlers (search engine crawlers or WhiteHat’s scanner or other production safe crawlers) treat method types differently. Production safe crawlers will send some requests and refrain from sending others based on idempotency (see next section) and safety. Browsers will also treat the methods differently; for instance, browsers will cache or store some methods in the history, but not others.
Idempotency and Safety
Idempotency and safety are important attributes of HTTP methods. An idempotent request can be called repeatedly with the same results as if it only had been executed once. If a user clicks a thumbnail of a cat picture and every click of the picture returns the same big cat picture, that HTTP request is idempotent. Non-idempotent requests can change each time they are called. So if a user clicks to post a comment, and each click produces a new comment, that is a non-idempotent request.
Safe requests are requests that don’t alter a resource; non-safe requests have the ability to change a resource. For example, a user posting a comment is using a non-safe request, because the user is changing some resource on the web page; however, the user clicking the cat thumbnail is a safe request, because clicking the cat picture does not change the resource on the server.
Production safe crawlers consider certain methods as always safe and idempotent, e.g. GET requests. Consequently, crawlers will send GET requests arbitrarily without worrying about the effect of repeated requests or that the request might change the resource. However, safe crawlers will recognize other methods, e.g. POST requests, as non-idempotent and unsafe. So, good web crawlers won’t send POST requests.
Why This Matters
While crawlers deem certain methods safe or unsafe, a specific request is not safe or idempotent just because it’s a certain request method. For example, GET requests should always be both idempotent and safe, while POST requests are not required to be either safe or idempotent. It is possible, however, for an unsafe, non-idempotent request to be sent as a GET request. A web site that uses a GET request where a POST request should be required can result in problems. For instance:
- When an unsafe, non-idempotent request is sent as a GET request, crawlers will not recognize the request as dangerous and may call the method repeatedly. If a web site’s “Contact Us” functionality uses GET requests, a web crawler could inadvertently end up spamming the server or someone’s email. If the functionality is accessed by POST requests, the web crawler would recognize the non-idempotent nature of POST requests and avoid it.
- When an unsafe or non-idempotent GET request is used to transmit sensitive data, that data will be stored in the browser’s history as part of the GET request history. On a public computer, a malicious user could steal a password or credit card information merely by looking at the history if that data is sent via GET. The body of a POST request will not be stored in the browser history, and consequently, the sensitive information stays hidden.
It comes down to using the right HTTP method for the right job. If you don’t want a web crawler arbitrarily executing the request or you don’t want the body of the request stored in the browser history, use a POST request. But if the request is harmless no matter how often it’s sent, and does not contain sensitive data, a GET request will work just fine.