Building Bowser – A password cracking story

At Fox-IT we perform a lot of penetration tests. Invariably we encounter hashed versions of passwords that need to be tested for strength. We suspected that with a relatively small investment most passwords could be cracked, regardless of their complexity. It turns out this is true for any password of 8 characters or fewer.

This is a story about passwords. It’s intended for hardware lovers and IT security managers alike. The purpose is to change the views on what requirements a password has to fulfill to be considered “safe”. Basically, this story should convince you that conforming to the default Microsoft Password Complexity Requirements (http://technet.microsoft.com/en-us/library/cc264456.aspx) barely bothers hackers trying to crack your password, and explain why everyone should be switching to passphrases. Don’t bother with taking the first letter of all words in a sentence, just use the whole sentence. The longer the better. Cracking passwords in a short timeframe is something we were able to do on a fairly limited budget, and the approach scales up very nicely with a larger one. Imagine what someone with virtually unlimited funds (intelligence agencies, countries) is able to do to your precious password hashes.

Over the years we’ve done our fair share of penetration tests, both on external services and internal networks. This includes web applications, Windows and Unix networks, mobile applications, etc. During these tests it’s not uncommon to come across hashed passwords in one form or another. While some protocols (like NTLM) are vulnerable to a technique called “passing-the-hash” (essentially using the hash itself as a password), sometimes you just want its plaintext counterpart. For the sake of consistency, this story will focus on NTLM (Windows) passwords.

One of the ways of obtaining the plaintext password is by “cracking” the hash. You can go about this using different techniques. The most basic is a “brute-force” attack: simply try all combinations of letters, numbers, special characters, etc. and check whether the resulting hash matches the hash you are trying to crack.
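
To make that concrete, here is a minimal sketch of a brute-force loop against an NTLM hash, written in Python. It is an illustration only and nowhere near the speed of a real cracking tool. NTLM is MD4 over the UTF-16LE encoded password, and because not every Python build exposes MD4 through hashlib, the sketch carries its own small MD4 following RFC 1320. The function names (`md4`, `ntlm_hash`, `brute_force`) and the tiny character set are our own choices for this example.

```python
import itertools
import struct

def md4(data: bytes) -> bytes:
    """Pure-Python MD4 as described in RFC 1320 (slow, for illustration)."""
    state = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476]
    bit_len = len(data) * 8
    # Padding: 0x80, zeros up to 56 mod 64, then the 64-bit message length.
    data += b"\x80"
    data += b"\x00" * ((56 - len(data) % 64) % 64)
    data += struct.pack("<Q", bit_len)
    round3_order = (0, 8, 4, 12, 2, 10, 6, 14, 1, 9, 5, 13, 3, 11, 7, 15)
    for offset in range(0, len(data), 64):
        x = struct.unpack("<16I", data[offset:offset + 64])
        a, b, c, d = state
        for i in range(48):
            if i < 16:    # round 1
                f, k, s, const = (b & c) | (~b & d), i, (3, 7, 11, 19)[i % 4], 0
            elif i < 32:  # round 2
                j = i - 16
                f = (b & c) | (b & d) | (c & d)
                k, s, const = (j % 4) * 4 + j // 4, (3, 5, 9, 13)[i % 4], 0x5A827999
            else:         # round 3
                f, k = b ^ c ^ d, round3_order[i - 32]
                s, const = (3, 9, 11, 15)[i % 4], 0x6ED9EBA1
            t = (a + f + x[k] + const) & 0xFFFFFFFF
            rotated = ((t << s) | (t >> (32 - s))) & 0xFFFFFFFF
            a, b, c, d = d, rotated, b, c
        state = [(v + w) & 0xFFFFFFFF for v, w in zip(state, (a, b, c, d))]
    return struct.pack("<4I", *state)

def ntlm_hash(password: str) -> str:
    """NTLM hash: MD4 over the UTF-16LE encoding of the password, as hex."""
    return md4(password.encode("utf-16-le")).hex()

def brute_force(target_hex: str, charset: str, max_len: int):
    """Try every combination up to max_len characters; return the match or None."""
    target = target_hex.lower()
    for length in range(1, max_len + 1):
        for combo in itertools.product(charset, repeat=length):
            candidate = "".join(combo)
            if ntlm_hash(candidate) == target:
                return candidate
    return None

# Toy run: a 3-character password over a 6-character set.
print(brute_force(ntlm_hash("ab1"), "abc123", 4))  # prints ab1
```

A pure-Python loop like this is many orders of magnitude slower than a dedicated cracking tool; it only shows the principle of hashing every candidate and comparing.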

Another technique is using wordlists. These are huge lists containing hundreds, thousands and even millions of words depending on the size of the list. By simply hashing each word on the list one by one, and seeing if the resulting hash matches your target hash you can try and “crack” it this way.

Modern tools allow you to use these methods in smarter ways, by combining wordlists with brute-force techniques, or applying rules to words on the list (adding capitals, appending numbers or special characters, etc.). Still, this can be a time-consuming process when using regular desktop hardware.
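
As a rough sketch of what “applying rules” and “combining wordlists with brute-force” can mean in practice, the Python below expands a small wordlist into candidates. This is not the rule syntax of any particular tool: the function names (`apply_rules`, `hybrid`) and the suffix list are hypothetical, chosen just for this example. Each generated candidate would then be hashed and compared against the target hash.

```python
from itertools import product
from typing import Iterable, Iterator

# Hypothetical example suffixes; real rule sets are far larger.
SUFFIXES = ("", "1", "123", "!", "2014")

def apply_rules(words: Iterable[str]) -> Iterator[str]:
    """Yield each word in a few case variants, each with a few suffixes."""
    seen = set()
    for word in words:
        for base in (word, word.capitalize(), word.upper()):
            for suffix in SUFFIXES:
                candidate = base + suffix
                if candidate not in seen:  # skip duplicates across rules
                    seen.add(candidate)
                    yield candidate

def hybrid(words: Iterable[str], charset: str, n: int) -> Iterator[str]:
    """Wordlist plus brute-force: append every n-character combination."""
    for word in words:
        for tail in product(charset, repeat=n):
            yield word + "".join(tail)

candidates = list(apply_rules(["winter", "password"]))
print(len(candidates), candidates[:6])
```

Two words already become 30 candidates here, and a two-digit hybrid tail multiplies each word by 100, which is why these attacks get heavy quickly on desktop hardware.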

Enter the video card or GPU (Graphics Processing Unit). While designed for running games, these cards have power in all the right places for use in password cracking. I could go on explaining about the how, why and other technical mumbo-jumbo, but sometimes pictures indeed are better than words. This screenshot was taken from the password cracking tool hashcat (http://www.hashcat.net) and shows the speed of cracking an NTLM hash while running on two cores of an Intel i7 3840QM CPU:

image001

Almost 34 million words per second (34MH/s). Sounds like a lot, right? Well, it’s not, really. At this rate, trying all 8 character passwords (uppercase letters, lowercase letters, digits and special characters) takes roughly six years.

Let’s try cracking that same password using oclHashcat, hashcat’s big brother, designed for use with GPUs. This screenshot was taken while running oclHashcat on our old cracking machine with a single AMD Radeon 7970 GPU:

image003

Wow, a single GPU can do almost 18000 MH/s. Compared to the 34MH/s the CPU can do, that’s around 53000% more per second. At this rate, trying all 8 character passwords will take somewhere around five days. We are impatient people though, so we wanted more speed.
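
The arithmetic behind these estimates is easy to check. The sketch below assumes a character set of 95 printable ASCII characters (26 uppercase, 26 lowercase, 10 digits, 32 symbols and the space); that number is our assumption, the screenshots do not state it. Under that assumption the CPU rate works out to about six years for all 8 character passwords, and the single GPU to a little over four days.

```python
CHARSET_SIZE = 95   # assumption: printable ASCII including space
LENGTH = 8

def exhaust_seconds(rate_hps: float,
                    charset_size: int = CHARSET_SIZE,
                    length: int = LENGTH) -> float:
    """Seconds needed to try every password of exactly `length` characters."""
    return charset_size ** length / rate_hps

cpu_rate = 34e6        # 34MH/s, the CPU figure quoted above
gpu_rate = 18_000e6    # 18000MH/s, the single-GPU figure quoted above

cpu = exhaust_seconds(cpu_rate)
gpu = exhaust_seconds(gpu_rate)

print(f"keyspace: {CHARSET_SIZE ** LENGTH:,} candidates")
print(f"CPU:      {cpu / 86400 / 365:.1f} years")
print(f"GPU:      {gpu / 86400:.1f} days")
print(f"speed-up: {gpu_rate / cpu_rate:.0f}x")
```

These are worst-case numbers for exhausting the whole keyspace; on average a password is found after about half that time.
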

This is when we decided to build a new dedicated server, designed for one thing, and one thing only: cracking password hashes. This new server, codename “Bowser”, would be a 4U server chassis from Tyan (FT72B7015), have two Intel Xeon CPUs, two 1TB SSDs, 4x8GB of RAM and most importantly, eight AMD Radeon R9 290X GPUs. Time to place some orders!

The first things to arrive were the GPUs. Just look at these beauties:

The next day, the CPUs came in:

image012

The rest however, took a while. The Tyan chassis wasn’t in stock and had to be back-ordered from Taiwan. Almost two weeks went by, until:

Finally! Time to start building. Instead of boring you with a wall of text, I’ll just let the pictures do the talking:

Quite something, right? Now time for the hard part: making the system secure enough for customer data. We first installed Debian 7, but the 3.2 kernel it ships with did not get along with our hardware: random kernel panics and freezes everywhere. So, on to Ubuntu Server 12.04.3.

The first 1024MB of the first SSD was partitioned as an unencrypted /boot partition. The first 1024MB of the second SSD was left as unused space. This was done because the rest of both SSDs was configured to run as a RAID0 volume. Why? It’s not business critical data, and large wordlists on slow disks really impact the cracking performance.

On the newly created RAID0 volume, a LUKS encrypted partition was made. Using LVM, a volume group was created on that partition. This volume group contained two logical volumes: the root file system (/) and some swap space.

After OS installation was done, it was time for some benchmarks! Firstly, we made sure that the BIOS switch on all the cards was set to “Über mode” (http://www.hardocp.com/article/2013/10/23/amd_radeon_r9_290x_video_card_review/2#.Uukqp3VdXZg). Then we installed a tool called od6config (http://epixoip.github.io/od6config/) to fix the fan speed of the cards at 100%, the GPU clock at 1000MHz and the memory clock at 1250MHz. Because a running X session is necessary to use the GPUs, we also installed Ubuntu Desktop and the AMD 13.12 driver package.

The firewall was configured to only allow communication needed for remote management. All other inbound and outbound traffic was disallowed.

Finally, benchmark time! Feast your eyes on this:

image059

145GH/s! That is roughly eight times the speed of the single card in our old machine. This means that we are able to retrieve all 8 character NTLM hashed passwords (uppercase, lowercase, digits and special characters) within 24 hours! All the while, thanks to the insane cooling capabilities of the Tyan chassis, temperatures stay well within acceptable ranges:

image061

Note: adapter 1 has a higher temperature than the rest, because this adapter is responsible for showing the desktop.

After some tuning, we managed to get a 100MHz core overclock on all the cards. This resulted in a 20GH/s increase in performance with a total of almost 190GH/s (while cracking a single NTLM hash):

image062

That’s the equivalent of 9 of these cards running at stock speeds. The temperatures after about 19 hours seem almost unaffected. The card responsible for showing the desktop peaked at 76°C, while the other cards average about 65°C.
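
To put the stock benchmark figure of 145GH/s in perspective, and to show why length beats complexity, the small Python sketch below estimates how long it takes to exhaust all passwords of a given length. As before, the 95-character set (printable ASCII including space) is our assumption, and the function name `exhaust_hours` is made up for this example.

```python
RATE = 145e9    # 145GH/s, the stock benchmark figure for NTLM
CHARSET = 95    # assumption: printable ASCII including space

def exhaust_hours(length: int, rate: float = RATE, charset: int = CHARSET) -> float:
    """Hours needed to try every password of exactly `length` characters."""
    return charset ** length / rate / 3600

for length in range(8, 13):
    hours = exhaust_hours(length)
    print(f"{length:2d} chars: {hours:18,.0f} hours ({hours / 24 / 365:14,.2f} years)")
```

Every extra character multiplies the work by 95. Eight characters fall in about half a day on this machine, nine take weeks, and ten already take more than a decade, which is the whole argument for passphrases.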

It does take a considerable amount of power to run this machine though:

image064 image066

2500 Watts, 12 Amps. Nice.

I can hear you thinking “Why not just use rainbow tables and get it over with”. While it’s true that rainbow tables can be an effective way of cracking passwords, they are not a magical solution. First off, the tables themselves require a lot of storage space (for example: around 1TB for alpha-numeric passwords, length one to eight). Secondly, they take a very long time to generate and every type of hash needs a new table. Lastly, with a large list of hashes to crack (say, a dump of all Active Directory passwords) the pre-calculation when using rainbow tables can take much longer than just using wordlists or brute-force attacks. In most cases, it is faster to just brute-force a hash than to use rainbow tables on it. Besides, the combination of wordlists, rules and brute-force can usually crack more than 80% of the hashes already. If you really need a specific hash cracked, you can always resort to rainbow tables after brute forcing the bulk of the hashes.
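
For readers who have never looked inside a rainbow table, here is a toy sketch in Python of the idea: chains of alternating hash and reduce steps, of which only the start and end are stored. Everything in it is an assumption for illustration. SHA-256 stands in for the real hash (it is always available in hashlib), the passwords are three lowercase letters, and the names `stand_in_hash`, `reduce_hash`, `build_table` and `lookup` are ours. It is not how any production table is laid out.

```python
import hashlib

ALPHABET = "abcdefghijklmnopqrstuvwxyz"
PW_LEN = 3        # toy password length
CHAIN_LEN = 50    # hash/reduce steps per chain

def stand_in_hash(password: str) -> bytes:
    # Stand-in for the real target hash; a table is only valid for one hash type.
    return hashlib.sha256(password.encode()).digest()

def reduce_hash(digest: bytes, step: int) -> str:
    # Map a digest back into the password space; mixing in the step number
    # gives every column of the table a different reduction.
    n = int.from_bytes(digest[:8], "big") + step
    chars = []
    for _ in range(PW_LEN):
        n, r = divmod(n, len(ALPHABET))
        chars.append(ALPHABET[r])
    return "".join(chars)

def build_table(starts):
    # Pre-calculation: walk each chain fully, keep only endpoint -> start.
    table = {}
    for start in starts:
        pw = start
        for step in range(CHAIN_LEN):
            pw = reduce_hash(stand_in_hash(pw), step)
        table[pw] = start
    return table

def lookup(table, target: bytes):
    # Guess which column the target hash sits in, walk to the chain end,
    # and on an endpoint match replay that chain from its start.
    for pos in range(CHAIN_LEN - 1, -1, -1):
        pw = reduce_hash(target, pos)
        for step in range(pos + 1, CHAIN_LEN):
            pw = reduce_hash(stand_in_hash(pw), step)
        if pw in table:
            candidate = table[pw]
            for step in range(CHAIN_LEN):
                if stand_in_hash(candidate) == target:
                    return candidate
                candidate = reduce_hash(stand_in_hash(candidate), step)
    return None  # not covered by the table, or lost to a chain merge

table = build_table(["aaa", "fox", "pwd", "zzz"])
print(len(table), "chains stored")
```

Note the cost per target hash: in the worst case the lookup performs on the order of CHAIN_LEN squared over two hash operations before it gives up, and that is paid again for every hash in the list. Swapping `stand_in_hash` for another hash type invalidates the whole table.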

Well, that’s it. Feel free to leave any questions you might have in the comments, and happy cracking!

Donny Maasland, Pentester at Fox-IT

17 thoughts on “Building Bowser – A password cracking story”

  1. By the way, I see that you used to work at Intertoys, Donny. I like that. Maybe we can play ‘GoldenEye’ on the Nintendo 64 sometime. Much more fun than a Fox-IT rig 😉

  2. Anyway, really cool article and I hope to see more of this. What I really meant, by the way, is: in wartime, if there was a true need for speed, would anyone choose the unencrypted version of the disks, or not? That’s why I thought it was relevant.

  3. @DMaasland: Snip:”The first 1024MB of the first SSD was partitioned as unencrypted /boot partition. The first 1024MB of the second SSD was partitioned as unused space. This was done because the rest of both SSD’s were configured to run as a RAID0 volume. Why? It’s not business critical data, and large wordlists on slow disks really impact the cracking performance.”

    This is the bottleneck I mean, as mentioned in the above snip from the article. Just as Oscar asked.

    First you question the existence of a bottleneck, and when Oscar asks, you explain. Huh? I guess only the speed of reading the file into the GPU memory is relevant if the file is very large. Once it is in the memory of the GPU, I would consider that speed to be the bottleneck. I just would have loved to see the stats on an unencrypted vs encrypted setup, to measure the loss you get from this specific disk encryption. I run a GTX 560 with 448 cores and it overclocks very well. But if you set the fans to 100%, then with the Fox rig I surely understand why they remote into it. It’s noisy 😉

  4. True, that is why we usually use a combination of wordlists, rules, brute-force and loopback of previously cracked passwords (with rules).

  5. More I/O would probably help with wordlist-only performance, because with wordlists you always need to transport words from files to the GPUs, whereas brute-force work happens on the GPUs only. That would be the real bottleneck with regard to using wordlists. So a combination of the SSDs, DMA and the kernel, I guess. PCIe is barely a bottleneck, as PCIe 1x provides enough bandwidth for full-speed cracking.
    Bruteforcing simply is much faster. Think kH/s for wordlists vs MH/s for bruteforce.

  6. That’s not really what Oscar is asking. He’s asking if you can look up what the slowest part of the machine is. If those SSDs delivered more words/s: would it be faster? If the words got into the GPU memory faster: would it be faster? Or is the oclHashcat algorithm the bottleneck?

  7. Nice article and great setup. The little thing I want to add here is that it is not only what kind of cracking rig you have but how you are using it. In my opinion, the best and most effective way to crack passwords is by using effective wordlists plus rules. I am not claiming anything on anyone’s behalf. I just want to express my point of view.
    Thanks
    m3g9tr0n

  8. Well, that would decrease the number of attempts per second by one million, but that also means a legitimate login attempt will be slowed down by a factor of a million. Luckily for us, Windows still uses good old NTLM :).

  9. I’m not quite sure what bottleneck you are referring to, but the SSD’s do around 600MB/s read / write. Keep in mind though that the SSD’s are encrypted, resulting in a lower performance.

  10. So far, the only bottleneck seems to be budget. Especially with the new oclHashcat (with distributed cracking support) coming up. It would be trivial to build another one of these boxes and have them share the load.

  11. What do you think the new bottleneck is? The speed of data from disks to GPUs? The CPUs? The memory bus or DMA? PCIe? The kernel? And did it meet your expectations?

  12. I didn’t see any stats of the SSD throughput in this article – my box does 1 GB with 2 Samsung 830 in RAID0. Which is slow. Did you measure this dear Fox-IT, as you mentioned the bottleneck issue?
