Rainbow Tables
| Contents | 
Overview
Rainbow tables are a concept developed by Philippe Oechslin as a time-memory tradeoff attack on passwords. They involve the generation of large, pre-computed sets of data that speed password cracking by a large amount. The Cryptohaze implementation is a GPU accelerated version with a modified reduction function that is faster on GPUs (and CPUs). This leads to a very significant improvement in generation and cracking speed over other implementations. The Cryptohaze tools are also designed to work with the full 128 bits of hash instead of only 64 bits. This leads to fewer chain merges and the ability to support password spaces over 64 bits.
More information is available on the Wikipedia page: http://en.wikipedia.org/wiki/Rainbow_table
Concepts
There are several concepts that are required to fully understand rainbow tables (or at least understand them well enough to understand the process). I'll try to explain them here.
Hash function
The hash function is simply your target hash function. It can be MD5, NTLM, SHA1, LM, some funky algorithm I've never heard of, etc. All that is required of it is that it be usable as a hash function. Arbitrary data (or password) goes in, random data comes out. Awesome. Simple!
Reduction function
The reduction function is a function that takes as inputs a hash, the current position in a chain, the table index, and the key space. It deterministically converts these inputs into a password. Note that it does NOT make the same password as was used to create the hash (or this function would solve password cracking). The chain position index is used to help reduce chain merges, and the index is used to help prevent merges between tables.
Chains
A chain is simply a series of passwords & hashes, generated by the hash function and the reduction function feeding each other. The chain starts with a password, which is then hashed to form a hash of that password. That hash is fed into the reduction function along with the table index and the position in the chain, which generates another password. This password is hashed, and the process continues for the desired chain length. At the end of the chain, the final hash is stored. So, a chain consists of the starting password & the end hash, stored together in a unit.
Tables
A table is nothing more than a collection of chains in sorted order (sorted by hash). This allows rapid searching of the table during the cracking process. The Cryptohaze tables also include an 8192 byte header that contains the table information - the table filename can be changed at will without any impact on the table behavior (unlike some other tools, which were dependent on their filename).
Perfect Table
A perfect table is one in which there is only one instance of a certain chain end hash value present in the table. Given enough chains, there will eventually be merging chains (chains that start with different passwords but end with the same hash). This reduces the space efficiency of the table as there are large numbers of duplicate passwords stored in the merging chains (after the merge point). By removing these, it increases the amount of work required to generate a table, but reduces the space needed to store a table of a given efficiency for cracking passwords. This does NOT mean there are not duplicate hashes in the table! As long as the duplicates occur at different positions in the chain, the chains will not merge.
Table Index
A table index is nothing more than a value that is stored with a table, used as an input to the reduction function. This means that in two different tables (with different indexes), a given hash at the same position in the chain will reduce to a different password. Annoyingly, this term is also used in the Cryptohaze tools to refer to table indexes used for speeding table searching. They are two different concepts.
Example Table Chains
An examples will help illustrate these concepts.
This demonstrates the generation of a table with 4 chains of length 3 (3 hash operations). The 4 chains start with random passwords, and are hashed into their corresponding hashes. They are then run through the reduction function for position 0 and another series of passwords is generated. This series of passwords is hashed. Note that in both Password 0 & Password 1, "myPass" hashes to the same value. This is correct, as the hash of a given string should always be the same. However, in the next reduction function, it is run for position 1, and the 0xAA12 hash turns into something different. Also, in Password 2, we see that two different hashes have been reduced to the same password. At this point, the chains are merged, and will continue to the same end value. Note at the bottom that there are two chains with the same hash value. In a perfect table, only one of these would be present (either one is acceptable).
However, note that if Chain 2 is removed to make the table perfect, the password/hash combination "myPass/0xAA12" is present twice in the table. A perfect table may contain duplicate hashes, but as long as they occur at different locations, the chains will not merge.
Cracking with Rainbow Tables
Now that the tables are generated, the goal is to go about actually cracking passwords with them. For this example, we want to crack the hash 0x1135 - which, looking at the generated table, we can see corresponds to password "J2nK4z"
There are three stages to cracking: Candidate hash generation, table search, and chain regeneration (or false alarm checking, though the two are combined with the Cryptohaze tools).
Candidate hash generation creates a number of hashes that may be present in the target table file. Once generated, the table search stage involves searching the table for these hashes (present in a chain), and returning all the chains that match. Finally, chain regeneration involves running all these found chains from their start point, looking for the target hash. If it is found, the corresponding password is reported.
Candidate Hash Generation
The process for candidate hash generation, as the reduction function changes with every chain position, involves asking, "If the target hash was at position 0, what would the final hash be? If the target hash was at position 1, what would the final hash be? If the target hash was at position 2, what would the final hash be?" for all possible positions.
For each possible position, we generate the chain starting with the target hash, and determine the final hash for this. These final hashes are the candidate hashes. Note that the target hash is present in the list of candidate hashes - this is important when considering the WebTables approach, although I have implemented protections against this.
The candidate hashes must be generated for every separate table index (so for every complete table being used for cracking).
The complexity of generating candidate hashes is O(chain length ^ 2) - it requires (chain length ^ 2) / 2 hash/reduce steps to generate all the candidate hashes. Doubling the chain length quadruples the candidate hash generation time.
At the end of the candidate hash generation, there exists a set of candidate hashes equal in number to the chain length of the table.
Table Search
Table searching involves looking through the table file for each of the candidate hashes. There are ways that this can be sped up, but it typically involves, at minimum, a number of disk seeks equal to the number of candidate hashes. On a typical desktop hard drive with a ~5ms seek time, this means that 200 searches per second can happen. For a table of length 200 000, this means 1000 seconds is required to search the table, or just under 17 minutes. Faster drives help a lot here. If a chain is found, the starting password of the chain is stored for regeneration.
In this case, the chain found will be the one with the final hash 0xDEAD. However, there were two chains generated with this endpoint. If the one containing 0x1135 is present, the password will be found, but if that chain were removed, the password would not be found in this particular table.
Chain regeneration
Once the table searching is complete, a set of chains to be regenerated exists. This set of chains is then regenerated, and at each stage the target hash is checked against the currently generated hash. If there is a match, the password is reported. Once all the candidates are regenerated, the search for the table is complete. If there are any unfound passwords, the process continues with the next table.
In this example, the chains starting with "Hacker" and "Secure" would be regenerated, and the password would be found.
Table Parameters
The only major parameter to tune with table generation is the chain length (--chainlength). This parameter works as follows:
Storage space is reduced by a factor of the chain length. If the chain length is doubled, the size of the table will be cut roughly in half.
Cracking time is increased with the chain length according to a few factors. Candidate hash generation time goes up with the square of chain length, table searching time goes up linearly with chain length, and candidate hash regeneration time goes up roughly linearly with chain length (though it could be chain length squared in a worst case - more candidate hashes means more chains, each of which are longer).
Tune the chain length to your needs. Typically, a chain length of 200 000 is being used for large tables, though some people are generating length 8 tables with chains of length 422 000. In general, the shortest chain length you can use to fit your table will lead to the best performance. However, GPUs and CPUs continue to increase in speed, so longer chain lengths that are a bit long to crack now will be less of an issue in the future. It's probably best to generate tables with a chain length that is slightly long for today's technology, with the understanding that increases in computer speed will bring this down to much more reasonable times in the future.



