Cryptohaze.com

by **Jazsun** » Fri Nov 19, 2010 8:15 pm

Work is interested in building a dedicated box to throw in our lab to be used exclusively for password cracking. (We do lots of pen tests)

I am in charge of spec-ing out a setup which will run around the $1000-$1500 range (hopefully). We can't blow it all on the video card since we need to build a new case, CPU and mobo to go with it. I am looking for what you guys consider is one of the best (priced around $500) price->value cards as far as # of cores and clock rate goes. This card would also need stable drivers for a Linux environment (BackTrack). From my limited research I have landed on the EVGA GeForce GTX 580 (Fermi) 1.5GB 384-bit 512 cores.

Questions:

1) I have run Multiforcer on my Windows box, but we would likely run it on BackTrack. How important is CPU utilization going to be? Should a quad core (i5) be the minimum?
2) How important are motherboard RAM requirements (non-video)? My desktop didn't seem to have a problem with 4GB, but I was not entirely sure how Multiforcer functions or how important non-GPU RAM speeds would be for performance.
3) Should I be looking at cards which are SLI capable for future upgrades to Multiforcer which will be capable of dual cards?

Thanks a lot!

by **Sc00bz** » Sat Nov 20, 2010 1:02 am

Jazsun wrote:I am looking for what you guys consider is one of the best (priced around $500) price->value cards as far as # of cores and clock rate goes.

The best price/performance is the GTX 460 (768 MiB): at $142.55 it's 57% the speed of a GTX 580, so buying two would be cheaper and faster. All the other GTX 400s besides the GTX 480 have better price/performance than the GTX 580. Also note that the GTX 580's memory is 1002 MHz vs. 3000+ MHz for the GTX 400s (I don't know if this will impact performance in Multiforcer, but it doesn't look good for GTX 580s). 768 MiB might not be enough for some things, since Multiforcer uses 512 MiB plus at least 16 bytes per hash and the OS uses some video memory.

Sorted by performance:

| GPU | Memory | Memory clock | TDP | Price | Performance | Performance/Watt | Performance/Cost |
|---|---|---|---|---|---|---|---|
| GTX 580 | 1536 MiB | 1002 MHz | 244 Watts | $519.99 | 791 | 3.2399 | 1.5203 |
| GTX 480 | 1536 MiB | 3696 MHz | 250 Watts | $449.99 | 672 | 2.6899 | 1.4944 |
| GTX 470 | 1280 MiB | 3348 MHz | 215 Watts | $249.99 | 544 | 2.5317 | 2.1774 |
| GTX 460 | 768 MiB | 3600 MHz | 150 Watts | $142.55 | 454 | 3.0240 | 3.1820 |
| GTX 460 | 1024 MiB | 3600 MHz | 160 Watts | $197.55 | 454 | 2.8350 | 2.2961 |
| GTX 465 | 1024 MiB | 3206 MHz | 200 Watts | $197.55 | 428 | 2.1384 | 2.1649 |

Sorted by performance/watt:

| GPU | Memory | Memory clock | TDP | Price | Performance | Performance/Watt | Performance/Cost |
|---|---|---|---|---|---|---|---|
| GTX 580 | 1536 MiB | 1002 MHz | 244 Watts | $519.99 | 791 | 3.2399 | 1.5203 |
| GTX 460 | 768 MiB | 3600 MHz | 150 Watts | $142.55 | 454 | 3.0240 | 3.1820 |
| GTX 460 | 1024 MiB | 3600 MHz | 160 Watts | $197.55 | 454 | 2.8350 | 2.2961 |
| GTX 480 | 1536 MiB | 3696 MHz | 250 Watts | $449.99 | 672 | 2.6899 | 1.4944 |
| GTX 470 | 1280 MiB | 3348 MHz | 215 Watts | $249.99 | 544 | 2.5317 | 2.1774 |
| GTX 465 | 1024 MiB | 3206 MHz | 200 Watts | $197.55 | 428 | 2.1384 | 2.1649 |

Sorted by performance/cost:

| GPU | Memory | Memory clock | TDP | Price | Performance | Performance/Watt | Performance/Cost |
|---|---|---|---|---|---|---|---|
| GTX 460 | 768 MiB | 3600 MHz | 150 Watts | $142.55 | 454 | 3.0240 | 3.1820 |
| GTX 460 | 1024 MiB | 3600 MHz | 160 Watts | $197.55 | 454 | 2.8350 | 2.2961 |
| GTX 470 | 1280 MiB | 3348 MHz | 215 Watts | $249.99 | 544 | 2.5317 | 2.1774 |
| GTX 465 | 1024 MiB | 3206 MHz | 200 Watts | $197.55 | 428 | 2.1384 | 2.1649 |
| GTX 580 | 1536 MiB | 1002 MHz | 244 Watts | $519.99 | 791 | 3.2399 | 1.5203 |
| GTX 480 | 1536 MiB | 3696 MHz | 250 Watts | $449.99 | 672 | 2.6899 | 1.4944 |

Performance is number of cores times shader clock rate in GHz.
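For anyone who wants to redo the numbers with current prices, here is a minimal sketch of that formula. The core counts and shader clocks are reference-card specs that are not stated in this thread (treat them as assumptions); the TDP and price values are the ones posted in the table.

```python
# performance = cores * shader clock (GHz), as described in the post.
# Core counts and shader clocks are reference-card specs (ASSUMED, not from the thread).
# TDP (watts) and price (USD) are the values posted in the table.
CARDS = {
    "GTX 580":          (512, 1.544, 244, 519.99),
    "GTX 480":          (480, 1.401, 250, 449.99),
    "GTX 470":          (448, 1.215, 215, 249.99),
    "GTX 460 768 MiB":  (336, 1.350, 150, 142.55),
    "GTX 460 1024 MiB": (336, 1.350, 160, 197.55),
    "GTX 465":          (352, 1.215, 200, 197.55),
}


def metrics(cores, shader_ghz, tdp_watts, price_usd):
    """Return (performance, performance per watt, performance per dollar)."""
    perf = cores * shader_ghz
    return perf, perf / tdp_watts, perf / price_usd


def ranked(key_index):
    """Rows of (name, perf, perf/watt, perf/cost), sorted descending by one column.

    key_index: 1 = performance, 2 = performance/watt, 3 = performance/cost.
    """
    rows = [(name, *metrics(*spec)) for name, spec in CARDS.items()]
    return sorted(rows, key=lambda row: row[key_index], reverse=True)


if __name__ == "__main__":
    for name, perf, per_watt, per_dollar in ranked(3):
        print(f"{name:<17} {perf:6.0f} {per_watt:7.4f} {per_dollar:7.4f}")
```

Changing the argument to `ranked()` gives the other two sort orders.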

Jazsun wrote:1) I have run Multiforcer on my Windows box, but we would likely run it on BackTrack. How important is CPU utilization going to be? Should a quad core (i5) be the minimum?

You really only need a single core, but it is nice to have one CPU core per GPU. You can always run multiple instances that cover different key spaces to use all your GPUs.
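To illustrate the idea of instances covering different key spaces, here is a rough sketch that splits a brute-force space into one contiguous chunk per instance. It only shows the arithmetic; how you would actually tell Multiforcer to restrict itself to a chunk is not covered here, and the function names are made up for the example.

```python
import string


def split_keyspace(total, parts):
    """Split range(total) into `parts` contiguous half-open (start, end) chunks."""
    base, extra = divmod(total, parts)
    chunks, start = [], 0
    for i in range(parts):
        # The first `extra` chunks take one additional candidate each.
        end = start + base + (1 if i < extra else 0)
        chunks.append((start, end))
        start = end
    return chunks


def index_to_candidate(index, charset, length):
    """Map an index in [0, len(charset)**length) to a fixed-length candidate."""
    chars = []
    for _ in range(length):
        index, remainder = divmod(index, len(charset))
        chars.append(charset[remainder])
    return "".join(reversed(chars))


if __name__ == "__main__":
    charset = string.ascii_lowercase
    length = 4
    total = len(charset) ** length
    # One chunk per GPU / per running instance (2 here, purely as an example).
    for instance, (start, end) in enumerate(split_keyspace(total, 2)):
        first = index_to_candidate(start, charset, length)
        last = index_to_candidate(end - 1, charset, length)
        print(f"instance {instance}: {first}..{last} ({end - start} candidates)")
```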

Jazsun wrote:2) How important are motherboard RAM requirements (non-video)? My desktop didn't seem to have a problem with 4GB, but I was not entirely sure how Multiforcer functions or how important non-GPU RAM speeds would be for performance.

It's not that important because it only sends a little bit of data back and forth.

Jazsun wrote:3) Should I be looking at cards which are SLI capable for future upgrades to Multiforcer which will be capable of dual cards?

You don't need SLI. If you have it turned on you won't be able to use the other GPUs.

by **Bitweasil** » Sat Nov 20, 2010 2:31 pm

Don't get the 460s. They are a different core that is difficult to extract CUDA performance from.

Go with 2x470s if possible. They're around $250 and two of them will outrun a 580. I'm nearly done with multigpu support. Another few weeks, at least for Linux...

I'll write up more later, posting from my phone currently.

Host RAM doesn't matter for the Multiforcer. If you're using the rainbow tables, more RAM is better; 8 GB would be good.

by **Bitweasil** » Sat Nov 20, 2010 9:55 pm

Alright, now I'm on a keyboard, so hardware thoughts.

For the multiforcer, all other things being equal (will expand on this later), performance scales linearly with (stream processor count * shader clock). This is where the "performance number" comes from in the code - and it's mostly accurate.

However, the "all things equal" applies to the 460s. They've got a different dispatch unit setup that reduces CUDA efficiency on non-vector code. So, basically, the 460s behave like a 256 stream processor card under *most* workloads.
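As a back-of-the-envelope check on the two claims so far (a 460 behaving like a 256 stream processor card, and two 470s outrunning a 580), here is a small sketch using the same cores-times-clock estimate. The shader clocks and nominal core counts are reference-card specs that are not given in this thread, so treat them as assumptions; the 256 figure is the rough estimate from the paragraph above, not a measurement.

```python
def performance(cores, shader_ghz):
    """Thread's performance estimate: stream processors * shader clock (GHz)."""
    return cores * shader_ghz


# Reference-card specs (ASSUMED, not stated in the thread).
GTX460_NOMINAL = performance(336, 1.350)
GTX470 = performance(448, 1.215)
GTX580 = performance(512, 1.544)

# 256 is the "behaves like" figure from the post, not a benchmark result.
GTX460_EFFECTIVE = performance(256, 1.350)

if __name__ == "__main__":
    print(f"GTX 460 nominal:   {GTX460_NOMINAL:.1f}")
    print(f"GTX 460 effective: {GTX460_EFFECTIVE:.1f}")
    print(f"2x GTX 470:        {2 * GTX470:.1f}")
    print(f"1x GTX 580:        {GTX580:.1f}")
```

This is only the linear estimate; real benchmarks on a 460 would settle it.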

Details here: http://www.anandtech.com/show/3809/nvid ... 200-king/2

I don't know how my code runs on a 460... benchmarks welcome. They may work well, but I'd not expect great performance off them. There's a possibility I could tune my code to perform better on the 460, and that may happen around the time I go OpenCL, but for now, unless benchmarks prove otherwise, I would consider the 460 to be a suboptimal choice for running my code.

The 470s, on the other hand, are pretty sweet. $250/card, decent clocking, and generally are respectable units.

As for host CPU/RAM/etc: Optimum performance *right now* would happen with one CPU per GPU, running at 100% in a spin wait for the GPU, so it can fire off the new kernel ASAP.

Long term, this will change. With the multiGPU support, it's not hard at all to add support for CPU cracking. So, shortly, having a good CPU will be beneficial, and I'll tune the code to interrupt on complete instead of spin wait - I'll lose a bit of GPU performance, but more than gain it in the CPU thread performance.
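The spin wait vs. interrupt-on-complete tradeoff can be shown with a host-side analogy. This is not CUDA code and not how the Multiforcer is written: the "kernel" below is just a thread that sleeps, and all names are invented for the example. It only illustrates why polling burns a full CPU core while a blocking wait leaves it free.

```python
import threading
import time


def fake_kernel(done, seconds):
    """Stand-in for a GPU kernel: signals `done` after `seconds`."""
    time.sleep(seconds)
    done.set()


def spin_wait(done):
    """Poll continuously: reacts as soon as possible, but pins a CPU core."""
    polls = 0
    while not done.is_set():
        polls += 1
    return polls


def blocking_wait(done):
    """Sleep until signalled: slightly slower to react, but the core stays free."""
    done.wait()


def run(waiter, seconds=0.05):
    """Launch the fake kernel, wait with `waiter`, return (result, CPU seconds used)."""
    done = threading.Event()
    worker = threading.Thread(target=fake_kernel, args=(done, seconds))
    cpu_before = time.process_time()
    worker.start()
    result = waiter(done)
    worker.join()
    return result, time.process_time() - cpu_before


if __name__ == "__main__":
    polls, cpu_spin = run(spin_wait)
    _, cpu_block = run(blocking_wait)
    print(f"spin wait:     {polls} polls, {cpu_spin:.3f}s CPU time")
    print(f"blocking wait: {cpu_block:.3f}s CPU time")
```

With the blocking version, the CPU time saved is what a CPU cracking thread could use instead.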

However, GPUs are still the primary contributor. So, in a tradeoff between money spent on GPUs and money spent on CPUs, go with the faster GPUs.

Also, while I don't recommend this, overclocking well-cooled GPUs will gain a good bit of performance... if you have random water blocks or something lying around.

As for RAM, it really doesn't matter. As long as it's a reasonable amount, you're fine.

For the rainbow table systems, you'll want a fast GPU, a lot of RAM, and fast disk. However, SSDs don't benefit my code as much as one might hope, because with the indexes in place, the table search is just a bunch of quick linear reads, which spindle disks do well. If you don't have indexes, yeah, SSDs are beneficial, but I did write my code with big spindle disk arrays in mind. I *suspect* (but haven't proven) that a RAID1 mirror with a few disks would be the fastest option, since it can read from multiple places at once. I'm also thinking of adding a prefetching thread which would help even more with this.

by **Jazsun** » Mon Nov 22, 2010 6:35 pm

Thanks guys, I really appreciate all the great information. I think I am going to go w/ the 470s as recommended. We may start off with a single card and go to two later on, pending budget. I also think we are going to go with a dual-core Xeon as a CPU, starting off with 4GB of DDR3, expandable up to 8GB. After browsing Nvidia's site it appears the 470s have an x64 Linux driver, so I assume this setup (Multiforcer + x64 BackTrack + GTX 470) should work well?

I do have one more quick question: you stated dual card support is coming soon. Since this would be a lab server, it can be accessed by anyone internationally. What would happen if one user was remotely running a cracking session and another remote user also decided to start a cracking session? Is there anything inside of Multiforcer which would prevent this, or would it allow them both to run at half the performance? Would it be beneficial to block more than one concurrent cracking session? I'm unsure what the best method to detect this would be.

Thanks again!

by **Bitweasil** » Mon Nov 22, 2010 8:14 pm

Jazsun wrote:Thanks guys, I really appreciate all the great information. I think I am going to go w/ the 470s as recommended. We may start off with a single card and go to two later on, pending budget. I also think we are going to go with a dual-core Xeon as a CPU, starting off with 4GB of DDR3, expandable up to 8GB. After browsing Nvidia's site it appears the 470s have an x64 Linux driver, so I assume this setup (Multiforcer + x64 BackTrack + GTX 470) should work well?

I do have one more quick question: you stated dual card support is coming soon. Since this would be a lab server, it can be accessed by anyone internationally. What would happen if one user was remotely running a cracking session and another remote user also decided to start a cracking session? Is there anything inside of Multiforcer which would prevent this, or would it allow them both to run at half the performance? Would it be beneficial to block more than one concurrent cracking session? I'm unsure what the best method to detect this would be.

Thanks again!

That sounds like a good option. I at least know Linux x64 + GTX470 works wonderfully, that's my primary dev environment.

As for multiple people using the cards, they sort of will share tasks, but it's not very good. I expect the tools would run at about half speed each, but this is something I'm leaving up to the end users to sort out. It's pretty easy to tell if a multiforcer is running with top or ps.
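If you do want to enforce one session at a time on a shared Linux box, one common approach is a wrapper that takes an exclusive lock before launching the tool. This is a sketch only: the lock path is hypothetical, nothing like this is built into the Multiforcer, and the actual launch command is left out.

```python
import fcntl
import os
import tempfile

# Hypothetical lock location; pick whatever path suits your lab server.
LOCK_PATH = os.path.join(tempfile.gettempdir(), "multiforcer.lock")


def try_acquire(path=LOCK_PATH):
    """Return an open file holding an exclusive lock, or None if already locked."""
    handle = open(path, "w")
    try:
        # LOCK_NB makes flock fail immediately instead of waiting for the lock.
        fcntl.flock(handle, fcntl.LOCK_EX | fcntl.LOCK_NB)
    except BlockingIOError:
        handle.close()
        return None
    return handle


if __name__ == "__main__":
    lock = try_acquire()
    if lock is None:
        raise SystemExit("another cracking session appears to be running")
    try:
        # Launch the cracking tool here (command line omitted on purpose).
        print("lock held; safe to start a session")
    finally:
        lock.close()  # closing the file releases the lock
```

The lock is released automatically if the wrapper process dies, so a crashed session will not block the next user.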

by **hackajar** » Tue Nov 23, 2010 8:34 am

CUDA Multiforcer runs very well under Ubuntu 10.10 with the amd64 kernel. This is much better than the BackTrack route if you plan to make this a dedicated GPU cracking system. There is no need for all the extra kernel mods BackTrack tosses in, and my numbers show a _slight_ performance uptick. Not sure if it is the updated C++ libs, the slimmer kernel, or all of the above.

In the end, with the fast turnaround of tweaks to the CUDA MF app right now, coupled with the latest 64-bit glibc improvements, you would be best served by 10.04 (for long-term support) or the cutting-edge 10.10.

-hackajar

by **Jazsun** » Mon Nov 29, 2010 4:20 pm

Thanks again. Hackajar, originally I was thinking BackTrack since we use a lot of other tools already packaged within BackTrack, but I too think an Ubuntu server route is probably the better option for speed and reliability.

One last question I have concerns what kinds of speeds we can expect from one GTX 470. I need some sort of proven benchmarks to help sell this implementation budget over our existing cracking solution (obviously system specs play a role; let's assume a multi-core CPU). We currently use a multi-core box with John the Ripper. Below are some benchmarks from one of our Core 2 Duo setups using John. Does anyone happen to have some numbers for a one-GPU setup (I assume two would double), perhaps with a screenshot of the speeds you're seeing?

From my understanding the step rate is the most comparable rate to John's c/s? And the search rate = the step rate * # of hashes in the list? For example, on the homepage you list a step rate of 390M/sec with a list of 10 hashes. With only one Hash that would be about 38M/sec (or probably higher due to new prioritization)?

Thanks a lot!

Traditional DES [64/64 BS MMX]
Many salts: 994021 c/s
Only one salt: 928655 c/s

BSDI DES (x725) [64/64 BS MMX]
Many salts: 32307 c/s
Only one salt: 32000 c/s

FreeBSD MD5 [32/32]
Raw: 6423 c/s

OpenBSD Blowfish (x32) [32/32]
Raw: 394 c/s

Kerberos AFS DES [48/64 4K MMX]
Short: 311852 c/s
Long: 832317 c/s

NT LM DES [64/64 BS MMX]
Raw: 9496K c/s

by **Bitweasil** » Mon Nov 29, 2010 5:22 pm

Jazsun wrote:One last question I have concerns what kinds of speeds we can expect from one GTX 470. I need some sort of proven benchmarks to help sell this implementation budget over our existing cracking solution (obviously system specs play a role; let's assume a multi-core CPU). We currently use a multi-core box with John the Ripper. Below are some benchmarks from one of our Core 2 Duo setups using John. Does anyone happen to have some numbers for a one-GPU setup (I assume two would double), perhaps with a screenshot of the speeds you're seeing?

From my understanding the step rate is the most comparable rate to John's c/s? And the search rate = the step rate * # of hashes in the list? For example, on the homepage you list a step rate of 390M/sec with a list of 10 hashes. With only one Hash that would be about 38M/sec (or probably higher due to new prioritization)?

On measured speeds:
There are two ways to measure speeds: Passwords tested per second, and passwords checked per second (excuse the confusing wording, I'll expand). Passwords *tested* per second counts the actual number of unique passwords being checked. This is what I report. For my 390M/sec rate, this is how many unique passwords I'm testing per second. If there are 10 passwords in the list, this is 3900M password checks per second (10 checks per password). So, for my tools, for the *most* part, multiply the displayed rate by the number of hashes to get the effective rate. I stopped displaying that with the 0.80 version since it was getting stupid for large hash lists.

... this is only valid for unsalted hashes. For salted hashes, you typically only get the measured checks per second, since it's unique per password.
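The relationship between the displayed rate and the effective rate described above boils down to one multiplication. A minimal sketch, with a made-up function name, using the 390M/sec and 10-hash figures from the post:

```python
def effective_checks_per_second(displayed_rate, num_hashes, salted=False):
    """Convert the displayed rate (unique passwords tested/sec) to hash checks/sec.

    Unsalted: every candidate is compared against every hash in the list,
    so the effective rate is displayed_rate * num_hashes.
    Salted: per the post, you typically only get the measured rate.
    """
    if salted:
        return displayed_rate
    return displayed_rate * num_hashes


if __name__ == "__main__":
    rate = 390_000_000  # unique passwords tested per second, from the post
    print(effective_checks_per_second(rate, 10))               # unsalted, 10 hashes
    print(effective_checks_per_second(rate, 10, salted=True))  # salted
```

So a single-hash list does not drop the displayed rate to a tenth; the displayed number is already the per-password rate.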

Jazsun wrote:Traditional DES [64/64 BS MMX]
Many salts: 994021 c/s
Only one salt: 928655 c/s

BSDI DES (x725) [64/64 BS MMX]
Many salts: 32307 c/s
Only one salt: 32000 c/s

FreeBSD MD5 [32/32]
Raw: 6423 c/s

OpenBSD Blowfish (x32) [32/32]
Raw: 394 c/s

Kerberos AFS DES [48/64 4K MMX]
Short: 311852 c/s
Long: 832317 c/s

NT LM DES [64/64 BS MMX]
Raw: 9496K c/s

So, here's the trick:

Not all of those are implemented on GPUs. Actually, very, very few are implemented on GPUs.

As such, it's rather hard to get benchmark data.

What types of hashes are you specifically looking to attack? I'm certainly open to trades of donations or hardware for specifically needed algorithms. ;-)

Give me a few weeks and there's a good chance of DES/LM, though.

by **Jazsun** » Wed Dec 01, 2010 9:13 pm

Thanks Bitweasil. I think as of now, all we really need is NTLM, LM (maybe), and MD5 (BSD) for most of the things we do. The stats I posted above are some test speeds a coworker provided me from John the Ripper.

Building a dedicated Multiforcer box
