Cryptohaze.com GPU Rainbow Cracker

Overview

The Cryptohaze GPU Rainbow Cracker is a fully GPU accelerated (CUDA and OpenCL) set of rainbow table tools. Unlike the existing CUDA accelerated rainbow table tools that simply accelerate the stock RainbowCrack tables, the Cryptohaze rainbow tables are a ground-up implementation. Of major significance is that the reduction function is now well suited to GPUs and high speed CPU implementations. While the RainbowCrack reduction function was very good, it was not very fast, and did not scale well onto video cards. The new reduction function is significantly faster and allows for some truly impressive speeds.
Additionally, throughout development, feedback from users has been taken, and the Cryptohaze GPU Rainbow Cracker supports many often-requested features, including table headers and the ability to specify a large number of tables to search (until a result is found).
The Cryptohaze tools support working on lists of hashes, and also support using as many GPUs as you have in the system to speed cracking.

Downloads

Download the latest versions from http://sourceforge.net/projects/cryptohaze/files/Cryptohaze-Combined/
The Rainbow Table tools require a 64-bit OS. This is due to memory mapping the table files into the process space for rapid searching. There's no way around this unless you use tiny files. Sorry.

New Features

The Cryptohaze GPU Rainbow Cracker has a number of brand new features that make it even more powerful and easy to use, including:

Full GPU acceleration for table generation and cracking, using nVidia CUDA.

ALL the metadata including the character set is now stored in each table file - they can be named anything you want and still work!

The cracking tool can be pointed to a directory or list of files, and will try each one in order until a result is found or all the tables are searched.

The package includes tools to merge multiple tables together - you can use as many systems and GPUs as you want to generate tables, then merge/perfect the results into one large table.

A CPU based table verification tool allows you to check tables for correctness and corruption.

Cracking no longer requires storing tables in RAM - if you can fit the table on your disk, you can use it!

Advanced table indexing to improve cracking speeds - in some cases, hashes can be cracked in under 2 minutes!

Cracking code is optimized for performance on large RAID arrays - cracking performance does not depend on table size!

Performance

It's fast! Table generation on a GTX295 core (so half of a GTX295) for MD5 proceeds at around 430M links/sec.
Cracking speaks for itself (these times are on something like the Optimum Platform, just with a GTX295):

Password (between the ' marks): 'Pa5$w0d'
Hex: 0x50613524773064

real    2m12.367s
user    1m23.420s
sys     0m16.510s

--------

Password (between the ' marks): 'K#n&r4Z'
Hex: 0x4B236E2672345A

real    1m51.962s
user    1m4.740s
sys     0m15.320s
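
The "Hex" lines in the output above are just the password bytes written out in hexadecimal. A minimal Python sketch that reproduces them (the function name password_hex is made up for illustration, and it assumes ASCII passwords):

```python
def password_hex(password):
    """Return the password bytes as an uppercase hex string with a 0x prefix."""
    # Assumption: the password is plain ASCII, as in the examples above.
    return "0x" + password.encode("ascii").hex().upper()

for pw in ("Pa5$w0d", "K#n&r4Z"):
    print(f"'{pw}' -> {password_hex(pw)}")
```

This is handy for checking results that contain characters your terminal mangles.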

GPU Performance
This is a comparison of GPU performance for the GPU accelerated parts, in links per second. Also noted are the times to calculate all the candidate hashes for chains of length 100k and 200k, computed as (chain length * chain length / 2) / (stepping rate), where the stepping rate is the links/s figure. Other data is welcome!

GPU	SPs	Frequency	links/s	100k regen	200k regen
nVidia GTX295 (single core)	240	1.24 GHz	436M	11.5s	45.9s
nVidia GTX260	216	1.24 GHz	394M	12.7s	50.7s
nVidia 8800 GTX OC	128	1.46 GHz	272M	18.4s	73.5s
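
To estimate the regen time for your own card or chain length, the formula above can be sketched in Python like this. The function name regen_seconds is just for illustration; the rates are the ones from the table:

```python
def regen_seconds(chain_length, links_per_second):
    """Estimated time to calculate all candidate hashes for one target hash."""
    # Formula from the text: (chain length * chain length / 2) / (stepping rate)
    total_links = chain_length * chain_length / 2
    return total_links / links_per_second

# links/s figures taken from the table above
rates = {
    "GTX295 (single core)": 436e6,
    "GTX260": 394e6,
    "8800 GTX OC": 272e6,
}

for gpu, rate in rates.items():
    t100 = regen_seconds(100_000, rate)
    t200 = regen_seconds(200_000, rate)
    print(f"{gpu}: 100k {t100:.1f}s, 200k {t200:.1f}s")
```

Note that doubling the chain length quadruples the regen time, which is worth keeping in mind when picking --chainlength.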

Table search rate
This is a comparison of table search rates on different systems. Performance is heavily tied to linear read performance. Times are with an empty filesystem cache.

Hard drive configuration	Read rate (hdparm -t)	Indexed rate	100k indexed	200k indexed
1x Maxtor 6B250S0 7200 RPM	54MB/s	640 h/s	156s	312s
6x Seagate ST31500341AS 7200 RPM (RAID6)	390MB/s	4375 h/s	23s	46s
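
The search times in the table are consistent with one lookup per chain position, so the time is roughly chain length divided by the indexed rate. A small Python sketch under that assumption (reading "h/s" as candidate hashes looked up per second is my interpretation of the table, and search_seconds is a made-up name):

```python
def search_seconds(chain_length, hashes_per_second):
    """Estimated time to look up every candidate hash in one table."""
    # Assumption: one candidate end hash per chain position is searched.
    return chain_length / hashes_per_second

# Indexed rates taken from the table above
for label, rate in (("single 7200 RPM disk", 640), ("6-disk RAID6", 4375)):
    print(f"{label}: 100k {search_seconds(100_000, rate):.0f}s, "
          f"200k {search_seconds(200_000, rate):.0f}s")
```

Unlike the GPU regen step, this part scales linearly with chain length.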

Supported Platforms

The Cryptohaze GPU Rainbow Cracker supports Windows, Linux, and Mac OS X. A 64-bit OS is required - playing with 300GB files on a 32-bit OS makes Bitweasil cry. Supported environments are anything that supports CUDA or OpenCL - so pretty much anything. While you can use the OpenCL tools with nVidia cards, it is recommended to use the CUDA tools, as they are faster and better supported on nVidia. You will need an OpenCL runtime for any of the OpenCL tools. Obviously fast GPUs are recommended.

Optimum Platform

The ideal system to run this on would be something very, very powerful. The cracker will use all the GPUs in your system (so load up with 580s or 6970s if performance is critical), but the table searching is a significant portion of the cracking time. SSDs are perfect, except for the size limitations - the tables get very, very big (100s of GB or larger). If SSDs aren't large enough, the next best option is a big RAID array with high linear read speeds. The cracking code is tuned for spindle disk RAID arrays. I suspect a multi-disk mirrored array (4-5 disks mirroring each other) would be very fast as well, but have not tried it. Get a lot of RAM too - the more RAM you have, the bigger index files you can have, which will improve cracking speed. The increased RAM performance of an i5/i7 board is useful.

Manual

File formats

The only real file you need to worry about is the character set file used when creating tables (if you're not using pre-generated tables).
The character set file is very simple: Just the character set in a file, followed by a newline:

Single charset file (-c parameter)
abcdefghijklmnopqrstuvwzyx0123456789

While the rainbow tables have their own format, it is supported by all the tools, so you don't need to worry about it!
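
If you want to script charset file creation, something like the following Python sketch works. It writes the character set followed by a newline, as described above, and also reports how many passwords a table of a given length has to cover (charset size to the power of the password length). The function names and the output filename are made up for illustration, and the duplicate check is my own addition, not something the tools are known to require:

```python
import os
import tempfile

def write_charset_file(path, charset):
    """Write the charset on a single line, followed by a newline."""
    # Assumption: a repeated character is a mistake, so reject it early.
    if len(set(charset)) != len(charset):
        raise ValueError("charset contains duplicate characters")
    with open(path, "w", encoding="ascii", newline="\n") as f:
        f.write(charset + "\n")

def keyspace(charset, password_len):
    """Number of distinct passwords of exactly password_len characters."""
    return len(charset) ** password_len

charset = "abcdefghijklmnopqrstuvwzyx0123456789"
path = os.path.join(tempfile.gettempdir(), "charset-example")  # hypothetical filename
write_charset_file(path, charset)
print(f"{len(charset)} characters, length 7 keyspace: {keyspace(charset, 7)}")
```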

Command line parameters

Table Generation
-v / --verbose (optional) Verbose output.
-h / --hashtype {MD4,MD5,NTLM,SHA1} (required) This specifies the hash type for this table.
-c / --charsetfile <filename> (required) This specifies the charset file.
-l / --passwordlen (required) The length of passwords to generate. Note that these tables only work with a single password length per table.
-i / --tableindex (required) The table index. This is used to create different "versions" of tables to improve cracking probabilities.
--chainlength (required) The length of each chain. Ranges of 100000 to 400000 are reasonable for this tool.
--numchains (required) The number of chains to generate for each table.
--numtables (optional) How many tables to generate. All the generated tables will be compatible and can be merged together with the merge tool. Default 1.
-s / --seed (optional) The seed to initialize the random number generator with. This should not be used under normal conditions (default is a random seed).
-d / --device (optional) If you have multiple CUDA-enabled video cards, this allows you to select which card to use. The current card is printed on program execution. Default is 0 (the first CUDA GPU in the system).
-m / --ms (optional) This specifies the target kernel time, in milliseconds (1/1000th of a second). When using a system with a GUI, lower times will allow better display response, but will lower performance. See below for more details. The default is 50ms, which should not interfere with general system use.
-b / --blocks (optional) Force a certain block count (default 128).
-t / --threads (optional) Force a certain thread count (default 64).
The outputs from table generation will be put in the "parts/" directory.
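
As a rough illustration, here is a Python sketch that assembles a table generation command line from the required parameters listed above. The executable path is a placeholder (substitute whatever the generation binary is called in your download), and the parameter values are only examples, not recommendations:

```python
import shlex

HASH_TYPES = {"MD4", "MD5", "NTLM", "SHA1"}

def generation_args(executable, hashtype, charsetfile, passwordlen,
                    tableindex, chainlength, numchains, numtables=1):
    """Build the argument list for a table generation run."""
    if hashtype not in HASH_TYPES:
        raise ValueError(f"unsupported hash type: {hashtype}")
    return [
        executable,
        "--hashtype", hashtype,
        "--charsetfile", charsetfile,
        "--passwordlen", str(passwordlen),
        "--tableindex", str(tableindex),
        "--chainlength", str(chainlength),
        "--numchains", str(numchains),
        "--numtables", str(numtables),
    ]

# "path/to/table-generator" is a placeholder, not the real binary name.
args = generation_args("path/to/table-generator", "NTLM", "charset-example",
                       passwordlen=7, tableindex=0,
                       chainlength=200000, numchains=1000000)
print(shlex.join(args))
```

The resulting list can be handed to subprocess.run() once the placeholder is replaced with the real binary.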

Table Merging
Note: The table merge code does not require a GPU. It is entirely CPU based.
-o / --outputdir (optional) Specify the output directory. Default is "./output/"
--buildfull (optional) Build a full table (merge all input tables with no perfecting)
--buildperfect (optional) Build a perfect table (only represent each end hash once)
<files> (required) A list of files to merge (example, "parts/*")
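
To make the difference between --buildfull and --buildperfect concrete, here is a toy Python sketch. It is not the merge tool's actual code or file format: chains are modelled as (start, end hash) tuples, the output is sorted by end hash, and the first chain seen for a given end hash is the one kept. All three of those are assumptions for illustration only.

```python
def merge_full(tables):
    """Merge all chains from all tables, keeping every chain."""
    chains = [chain for table in tables for chain in table]
    return sorted(chains, key=lambda chain: chain[1])

def merge_perfect(tables):
    """Merge all tables, keeping only one chain per end hash."""
    start_by_end = {}
    for table in tables:
        for start, end in table:
            # Keep the first chain seen for this end hash, drop later ones.
            start_by_end.setdefault(end, start)
    return sorted(((start, end) for end, start in start_by_end.items()),
                  key=lambda chain: chain[1])

part_a = [(b"aaa", b"\x02"), (b"bbb", b"\x01")]
part_b = [(b"ccc", b"\x02")]  # same end hash as the first chain in part_a

print(len(merge_full([part_a, part_b])), "chains in the full table")
print(len(merge_perfect([part_a, part_b])), "chains in the perfect table")
```

Two chains that reach the same end hash have merged somewhere along the way, so dropping the duplicate saves disk space without losing much coverage.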

Table Indexing
Note: The table indexing code does not require a GPU. It is entirely CPU based.
-b / --bits (required) The number of bits of index to build.
<files> (required) A list of files to index. Indexes will be written to <filename>.idx
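
The general idea behind a prefix index can be sketched in Python as below. This is only an illustration of the technique, not the actual .idx format: it assumes the table is sorted by end hash, that hashes are at least 4 bytes, and that the index maps the top "bits" bits of each end hash to the first table position with that prefix. All names here are hypothetical.

```python
import bisect

def prefix(end_hash, bits):
    """Top `bits` bits (1 to 32) of a hash given as bytes."""
    value = int.from_bytes(end_hash[:4], "big")
    return value >> (32 - bits)

def build_index(sorted_ends, bits):
    """Map each prefix to the first position in the sorted table."""
    index = {}
    for pos, end in enumerate(sorted_ends):
        index.setdefault(prefix(end, bits), pos)
    return index

def lookup(sorted_ends, index, bits, candidate):
    """Return the table position of candidate, or -1 if it is absent."""
    p = prefix(candidate, bits)
    if p not in index:
        return -1  # No chain ends with this prefix: no table read needed.
    start = index[p]
    pos = bisect.bisect_left(sorted_ends, candidate, start)
    if pos < len(sorted_ends) and sorted_ends[pos] == candidate:
        return pos
    return -1

ends = sorted(bytes.fromhex(h) for h in
              ("00112233", "80aabbcc", "80ffeedd", "f0000001"))
index = build_index(ends, 8)
print(lookup(ends, index, 8, bytes.fromhex("80ffeedd")))
```

More index bits means a bigger index but a smaller region of the table to read per lookup, which is the tradeoff to keep in mind when choosing --bits.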

Rainbow Cracking
-v / --verbose (optional) Verbose output.
-h / --hashtype {MD4,MD5,NTLM,SHA1} (required) This specifies the hash type for this table.
-s / --hashstring (required) The hash to crack.
<files> (required) A list of table files to test against
-d / --device (optional) If you have multiple CUDA-enabled video cards, this allows you to select which card to use. The current card is printed on program execution. Default is 0 (the first CUDA GPU in the system).
-m / --ms (optional) This specifies the target kernel time, in milliseconds (1/1000th of a second). When using a system with a GUI, lower times will allow better display response, but will lower performance. See below for more details. The default is 50ms, which should not interfere with general system use.
-b / --blocks (optional) Force a certain block count (default 128).
-t / --threads (optional) Force a certain thread count (default 64).

FAQ/Troubleshooting

What does the -m parameter do?

The -m parameter sets the target execution time, in ms. When running CUDA code on a system with an active display, the display cannot be updated while a kernel is running. This requires the work to be broken into small chunks, such that the display can update. However, the smaller the work unit, the less efficient the kernel is (as seen in the performance tables). This allows tuning of the kernel execution time to take this into account. A target time of 10-15ms will leave your display effectively "normal" and allow typical desktop activities, watching movies, and possibly light gaming. However, this will be very slow and may not work at all on Vista or Windows 7. Longer target execution times of 100ms or greater will dramatically affect the screen update, but will provide better performance. On a system that is not being used, or a headless system (or on a GPU that does not have any monitors attached), target execution times of 500ms or greater will allow the maximum performance.
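
To get a feel for what the target time means in terms of work per kernel launch, here is a back-of-the-envelope Python sketch. It is not the tool's actual scheduling code. It assumes every thread advances one chain by a fixed number of links per launch and that the links/s rate is already known; the function names are made up.

```python
def steps_per_launch(links_per_second, target_ms, total_threads):
    """Links each thread can compute per launch within the target time."""
    links_per_launch = links_per_second * target_ms / 1000.0
    return max(1, int(links_per_launch // total_threads))

def launches_needed(chain_length, steps):
    """Kernel launches needed to walk a full chain, rounded up."""
    return -(-chain_length // steps)

rate = 436e6           # links/s, GTX295 single core from the performance table
threads = 128 * 64     # default --blocks times default --threads

for target_ms in (10, 50, 500):
    steps = steps_per_launch(rate, target_ms, threads)
    count = launches_needed(200_000, steps)
    print(f"-m {target_ms}: about {steps} links per thread per launch, "
          f"{count} launches for a 200k chain")
```

A smaller -m value means many more launches for the same work, and each launch carries fixed overhead, which is why short target times cost performance.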

It doesn't run!

Yes... this is a common problem with bleeding edge software. This is also the reason for the support forum. First steps would be to ensure you're running the most recent nVidia provided driver, and then to see if you can run /any/ CUDA enabled programs. CUDA requires an nVidia Geforce 8000 series or above (or some of the Quadros). Additionally, if you are using a large amount of video RAM, there may not be sufficient memory remaining for the kernels to launch. Debugging for this will be added in future versions, but if you are getting vague errors, try rebooting and running the code with no other applications open. If this doesn't help, post details of your error and configuration in the forum and I'll try to help.

Why is it slower on Vista/Win7 than on Linux?

Classy as it is to say "Blame Microsoft," it's true. They changed the driver model for video cards, and as a result, kernel launches take longer, and the GUI does not let the GPU do whatever it wants. Try passing in a larger -m value and see if it helps.

Forget desktop response - how do I make this run as fast as possible?

The best option is a headless Linux server with a very high end GPU in it. If that's not an option, try passing --threads 512 --blocks 512 -m 500 to it. You may see improvements with an even longer kernel execution time - it depends on the system. Your display will be nearly unusable when this is running. If you have a low end GPU and this does not launch, try reducing the thread count to 256.

It doesn't run on my ATI card! When will you support ATI?

I would like to support ATI, but I don't have any ATI cards, and don't have the spare funds to go out and buy a few top of the line ATI cards to test on. Perhaps you should consider donating to my GPU fund!

I have another question...

The forum is linked in the nav on the left.

~Bitweasil