Adding a hash type to the MFN Framework

From Cryptohaze Project Wiki
Revision as of 02:38, 11 May 2012 by Bitweasil (Talk | contribs)

One of the more common tasks is to add a hash type to the MFN framework. As I'm in the middle of adding SHA256 kernel types, I'm documenting the process!

Overview of the process

The basic process of adding a hash type is fairly simple.

  • Implement the hash. This should be done in a manner suited to the GPUs, which is a topic well beyond this short post.
  • Add the host side C++ classes for CUDA and OpenCL (and CPU, if it's being added).
  • Add the GPU kernels for CUDA and OpenCL.
  • Add the hash type to the make system, and add it to the hash class launcher so it knows about it.

That's it if you're adding a "supported" hash type (Plain, Salted, etc).

We're going to add a plain hash right now. No salt, SHA256, 32 bytes (8 words) long output.

Hash Overview

You need to know a few things about the hash to implement it (other than the actual hash function). One of the most important is whether the hash is "little endian" or "big endian." This may not be the exact term for the trait I'm about to describe, so if not, please correct me. :) Every GPU and CPU architecture currently supported is little endian. Hashes like MD4 and MD5 map nicely to this. Hashes in the SHA family, on the other hand, are "big endian" hashes, and things need to be reversed in a few places. The easiest way to tell is to look at how the padding is set up for a null (empty input) hash. If the padding byte goes into b0 as (0x80 << 0), you've got a little endian hash. If it goes in as (0x80 << 24), it's a big endian hash. It matters a lot, as we don't do endian reversal anymore - we just deal with it natively. We'll use this later.
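To make the padding test concrete, here is a minimal sketch of what b0 looks like for a null hash under each convention. The function names are hypothetical and are not part of the MFN source; they only illustrate the byte packing.

```cpp
#include <cassert>
#include <cstdint>

// MD4/MD5 style: the first message byte lands in the LOW 8 bits of the word.
uint32_t packLittleEndianSketch(const uint8_t bytes[4]) {
    assert(bytes != nullptr);
    return (uint32_t)bytes[0] | ((uint32_t)bytes[1] << 8) |
           ((uint32_t)bytes[2] << 16) | ((uint32_t)bytes[3] << 24);
}

// SHA style: the first message byte lands in the HIGH 8 bits of the word.
uint32_t packBigEndianSketch(const uint8_t bytes[4]) {
    assert(bytes != nullptr);
    return ((uint32_t)bytes[0] << 24) | ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8) | (uint32_t)bytes[3];
}

// b0 for an empty message: there are no message bytes, so the 0x80
// padding marker is the very first byte of the block.
uint32_t nullHashB0Sketch(bool bigEndianHash) {
    const uint8_t firstBytes[4] = {0x80, 0x00, 0x00, 0x00};
    return bigEndianHash ? packBigEndianSketch(firstBytes)
                         : packLittleEndianSketch(firstBytes);
}
```

nullHashB0Sketch(false) gives 0x00000080, which is (0x80 << 0), and nullHashB0Sketch(true) gives 0x80000000, which is (0x80 << 24).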

Checking for prerequisites

The first prerequisite for implementing a hash type is a hash file class. In this case, it's a plain hash of length 32 (bytes), so the CHHashFileVPlain class will work perfectly (called with a constructor telling it that the hashes are 32 bytes long).

The second is that you will need a general hash type class for the hash. Since we're implementing a plain, unsalted hash, CHHashTypePlain will work as a base to derive our classes from (again, needing to know the size of the hash in bytes).

Let's get started!

CUDA Classes

The first step will be the CUDA classes. The easiest way to create a new hash type class is to copy an old, similar one, and modify it as needed. This is what I do in most cases, so that's what we'll do here. We're copying from MFNHashTypePlainCUDA_SHA1 - this is another "big endian" hash class, so it does a lot of what we need.

Create the new files

Create three files for the CUDA class.

  • inc/MFN_CUDA_host/MFNHashTypePlainCUDA_SHA256.h - the header file for the host class
  • src/MFN_CUDA_host/MFNHashTypePlainCUDA_SHA256.cpp - the source file for the host class
  • src/MFN_CUDA_device/MFNHashTypePlainCUDA_SHA256.cu - the device functions and "glue" wrappers

Populate the class header file

Easiest way? Copy the contents of the CUDA SHA1 header into the SHA256 header, and do a global search/replace in the file for "SHA1" -> "SHA256" - this is just about all that needs to be done. This is how I usually do it. Save it and you're done. Easy!

Populate the class source file

Not quite as easy as the header, but not far. Start with a global search & replace for SHA1 -> SHA256. This will get the variable and class names.

The variables are all named after the hash type, as opposed to having standard names, because all the CUDA files are compiled into one binary. There can be issues with multiple .cu files defining constants with the same name, so this just avoids any possibility of a collision.

The most important thing is to update the constructor to call the parent constructors with the correct hash length. In this case, the hash length is 32 bytes, so we do:

MFNHashTypePlainCUDA_SHA256::MFNHashTypePlainCUDA_SHA256() :  MFNHashTypePlainCUDA(32) {...

The next step is to look at the hash pre-processing and post-processing. These steps convert the hash from hashfile format into device format, and then back into hashfile format after it has been found on the device. They are used to reverse the byte ordering if needed, and to "unwind" hashes (see my upcoming talk at Defcon, if I'm selected). In this case, we will be subtracting out the final additions, but we do this later, after verifying everything works. The only change needed from SHA1 to SHA256 is to increase the number of words reversed. Right now, it's 5. It needs to be 8. So, do something like this for preProcess:

    hash32[0] = reverse(hash32[0]);
    hash32[1] = reverse(hash32[1]);
    hash32[2] = reverse(hash32[2]);
    hash32[3] = reverse(hash32[3]);
    hash32[4] = reverse(hash32[4]);
    hash32[5] = reverse(hash32[5]);
    hash32[6] = reverse(hash32[6]);
    hash32[7] = reverse(hash32[7]);

And, comment out the subtraction lines for now. We'll come back to these.

For the post process step, we comment out the addition lines (for now) and add the additional reversals.
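If it helps to see the two steps side by side, here is a rough sketch of what the pre-process and post-process pair does. Everything here is a stand-in: reverseWordSketch plays the role of reverse(), the function names and the unwind flag are hypothetical, and the order of reversal versus subtraction is my assumption rather than something taken from the MFN source. The eight constants are the standard SHA256 initial hash values, which are what the "final additions" add for a single-block message.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Swap the byte order of one 32-bit word (stand-in for reverse()).
uint32_t reverseWordSketch(uint32_t x) {
    return (x >> 24) | ((x >> 8) & 0x0000ff00u) |
           ((x << 8) & 0x00ff0000u) | (x << 24);
}

// Standard SHA256 initial hash values (FIPS 180-4).
static const uint32_t SHA256_INITIAL_SKETCH[8] = {
    0x6a09e667u, 0xbb67ae85u, 0x3c6ef372u, 0xa54ff53au,
    0x510e527fu, 0x9b05688cu, 0x1f83d9abu, 0x5be0cd19u
};

// Hashfile format -> device format: reverse all 8 words, then optionally
// "unwind" by subtracting out the final additions.
void preProcessSketch(uint8_t hash[32], bool unwind) {
    assert(hash != nullptr);
    uint32_t hash32[8];
    std::memcpy(hash32, hash, sizeof(hash32));
    for (int i = 0; i < 8; i++) {
        hash32[i] = reverseWordSketch(hash32[i]);
        if (unwind) hash32[i] -= SHA256_INITIAL_SKETCH[i];
    }
    std::memcpy(hash, hash32, sizeof(hash32));
}

// Device format -> hashfile format: the exact inverse, in reverse order.
void postProcessSketch(uint8_t hash[32], bool unwind) {
    assert(hash != nullptr);
    uint32_t hash32[8];
    std::memcpy(hash32, hash, sizeof(hash32));
    for (int i = 0; i < 8; i++) {
        if (unwind) hash32[i] += SHA256_INITIAL_SKETCH[i];
        hash32[i] = reverseWordSketch(hash32[i]);
    }
    std::memcpy(hash, hash32, sizeof(hash32));
}
```

The important property is that postProcessSketch undoes preProcessSketch exactly, with or without unwinding, so a found hash prints the same way it appeared in the hash file.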

That does it for the host CUDA class.

Rough in the CUDA device file

Now, we need to build a device file. Again, this is an awful lot of copy-paste. I copy the source from MFNHashTypePlainCUDA_SHA1.cu and do a general search and replace on SHA1 with SHA256.

Since I don't have the kernel defined yet, I rip out the actual call to the kernel (the SHA1_PARTIAL_ROUNDS() call) for now. This effectively noops the kernel, but that's OK for now. Note that the kernel is a define, so it needs a backslash at the end of every line to indicate that the definition continues.

I also change the checkHashList160BE function to be a 256 bit wide function. This involves replacing the multipliers of value 5 with value 8 (for the larger hashes), and a few other tweaks.
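To show where the 5 -> 8 multiplier comes from, here is a simplified host-side sketch of looking up a 256-bit hash in a packed list of words. This is not the real checkHashList function (which I'm not reproducing here); the name findHash256Sketch and the plain linear scan are assumptions made to keep the indexing visible.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// 256 bits / 32 bits per word. For a 160-bit hash like SHA1 this would be 5.
const int WORDS_PER_HASH_SKETCH = 8;

// Return the index of the hash matching `candidate`, or -1 if none does.
// Hash i occupies words [i * 8, i * 8 + 8) of the packed list.
long findHash256Sketch(const uint32_t *hashList, size_t numberOfHashes,
                       const uint32_t *candidate) {
    assert(candidate != nullptr);
    assert(hashList != nullptr || numberOfHashes == 0);
    for (size_t i = 0; i < numberOfHashes; i++) {
        const uint32_t *entry = hashList + i * WORDS_PER_HASH_SKETCH;
        bool match = true;
        for (int w = 0; w < WORDS_PER_HASH_SKETCH; w++) {
            if (entry[w] != candidate[w]) {
                match = false;
                break;
            }
        }
        if (match) return (long)i;
    }
    return -1;
}
```

Every place the 160-bit version multiplies an index by 5 or compares 5 words has to become 8, otherwise neighbouring hashes overlap and nothing is ever found.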

I also add the additional hash variables (f, g, h) to the defines, and update a few other functions. This is only needed when using a new hash width - it's not needed otherwise.

At this point, I should try compiling everything to check for obvious syntax errors.

Since "make" builds all the files, I just run it in the root directory. It won't link the new files in yet, but it will try to compile them.

Lots of undeclared variables in the .cu file, which is fine (I'm not doing anything), and everything else builds clean. Awesome.

Add the glue to link the new hash class into the runtime environment

We have a new class, and a new (null function) kernel. It's time to link it into the environment so I can run the kernel and debug the hash algorithm while developing it.

There are a few things that need to be done here.

Add to the makefile/project

First, the new files need to be added to the makefile/project. For Linux, this involves editing the root Makefile, and adding two filenames.

To NEW_MULTIFORCER_CPP_FILES, add the .cpp file - MFNHashTypePlainCUDA_SHA256.cpp

And, to NEW_MULTIFORCER_CU_FILES, add the .cu file - MFNHashTypePlainCUDA_SHA256.cu

Now, a "make binaries/New-Multiforcer" should complete without errors.

Add a hash type define

In inc/MFN_Common/MFNDefines.h, be sure to add the needed defines. You'll need a hash type define, and possibly a hash file define (if it's not using an existing size/type). For this, we need both.

I add the following (one for the hash type, one for a 32-byte wide plain hash file):

#define MFN_HASHTYPE_SHA256 0x1800
#define CH_HASHFILE_PLAIN_32 0x4002

Add a hash identifier

We need to be able to call the hash from the command line with the "-h" flag. In this case, we want to use the logical "SHA256" option.

Open src/MFN_Common/MFNHashIdentifiers.cpp.

Copy an existing hash define block, and set it up for the new hash type. In this case, it looks like this:

    // 32-byte SHA256 hash
    NewHash.HashDescriptor = "SHA256";
    NewHash.HashID = MFN_HASHTYPE_SHA256;
    NewHash.HashDetails = "Unsalted SHA256 hashes.";
    NewHash.HashAlgorithm = "sha256($pass)";
    NewHash.DefaultWorkunitSize = 32;
    NewHash.MinSupportedLength = 1;
    NewHash.MaxSupportedLength = 8;
    NewHash.NetworkSupportEnabled = 0;
    NewHash.MaxHashCount = 0;
    NewHash.HasCPUSupport = 1;
    NewHash.HasCUDASupport = 1;
    NewHash.HasOpenCLSupport = 1;
    NewHash.HashTypeIdentifier = MFN_HASHTYPE_SHA256;
    NewHash.HashFileIdentifier = CH_HASHFILE_PLAIN_32;
    this->SupportedHashTypes.push_back(NewHash);

Adding the hash file to the class factory

The hash file is built by the class factory.

Open src/MFN_Common/MFNMultiforcerClassFactory.cpp

In createHashfileClass, add your hashfile like so:

        case CH_HASHFILE_PLAIN_32:
            this->HashfileClass = new CHHashFileVPlain(32);
            break;
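For anyone new to the pattern, here is a stripped-down sketch of what a switch-based factory like this does. The classes are stand-ins (HashFileSketch and PlainHashFileSketch are hypothetical, not the real CHHashFileVPlain), and the function name is made up; only the 0x4002 value and the length of 32 come from the steps above.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>

// Same value as the CH_HASHFILE_PLAIN_32 define added earlier.
const int HASHFILE_PLAIN_32_SKETCH = 0x4002;

// Stand-in for the hash file base class.
struct HashFileSketch {
    virtual ~HashFileSketch() {}
    virtual size_t hashLengthBytes() const = 0;
};

// Stand-in for a plain hash file that is told its hash length up front.
struct PlainHashFileSketch : HashFileSketch {
    explicit PlainHashFileSketch(size_t lengthBytes) : lengthBytes_(lengthBytes) {
        assert(lengthBytes > 0);
    }
    size_t hashLengthBytes() const override { return lengthBytes_; }
private:
    size_t lengthBytes_;
};

// Map a hash file identifier to a concrete hash file object.
std::unique_ptr<HashFileSketch> createHashFileSketch(int hashFileId) {
    switch (hashFileId) {
        case HASHFILE_PLAIN_32_SKETCH:
            return std::unique_ptr<HashFileSketch>(new PlainHashFileSketch(32));
        default:
            return nullptr;  // unknown identifier: nothing to build
    }
}
```

The point is that the length passed to the constructor must match the hash: get it wrong here and the hash file will parse the input lines incorrectly long before the kernel runs.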

Adding the CUDA class to the thread launcher

Finally, we need to tell the thread launcher about the shiny new CUDA class.

Open src/MFN_Common/MFNHashClassLauncher.cpp

Add the new class header file at the top.

And, in the addCUDAThread function, add a new line for a SHA256 kernel.

        case MFN_HASHTYPE_SHA256:
            newCUDAThread = new MFNHashTypePlainCUDA_SHA256();
            break;

Testing!

Finally, this should all come together.

Do a "make binaries/New-Multiforcer" again. If all goes well, it will actually build!

You can now do a quick functional test. Since you've not implemented the kernel yet, it's not going to find anything, but it should at least run.

Something like:

./binaries/New-Multiforcer -h SHA256 -c shared/charsets/charsetall -f shared/test_hashes/Hashes-SHA256-Full.txt --debug --min 5 --max 5 --noopencl --nocpu --cudadevice 0

You should see sane debug output:

Got 2 CUDA devices!
Device 0: CUDA, device 0
MFND: Setting hash name SHA256
MFND: Setting password len 5
MFND: Status: Starting pw len 5
MFND: Setting password len 5
MFND: WU status: 0/2
MFND: Hash status: 0/13
MFND: Total crack rate 0.00 /s
MFND: Total rate: 0.00 
MFND: Status: Td 0: CID 17767.
MFND: Thread 0 setting rate to 18.14B
MFND: WU status: 0/2
MFND: Hash status: 0/13
MFND: Total crack rate 18.14B/s
MFND: Total rate: 18.14B
MFND: WU status: 1/2
MFND: Hash status: 0/13
MFND: Total crack rate 18.14B/s
MFND: Total rate: 18.14B
MFND: Thread 0 setting rate to 18.30B
MFND: WU status: 1/2
MFND: Hash status: 0/13
MFND: Total crack rate 18.30B/s
MFND: Total rate: 18.30B
MFND: WU status: 2/2
MFND: Hash status: 0/13
MFND: Total crack rate 18.30B/s
MFND: Total rate: 18.30B
MFND: Status: Td 0: out of WU.
MFND: Thread 0 setting rate to 0.00 
MFND: WU status: 2/2
MFND: Hash status: 0/13
MFND: Total crack rate 0.00 /s
MFND: Total rate: 0.00 

Awesome. Other than the insanely fast rate of doing nothing, it looks good.

Now to implement the hash type...

Implement your hash

This is left as an exercise to the reader. :)

Put it as a define, and try to avoid using memory unless required. This is where the art goes.
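Purely as a reference point (not the tuned kernel, which is still your exercise), here is a minimal host-side sketch of single-block SHA256 with the round written as a multi-line define. The names are hypothetical, it only handles inputs of up to 55 bytes, and it uses a w[64] array for readability where a GPU kernel would try to keep the message words in registers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// SHA256 round constants (FIPS 180-4).
static const uint32_t SHA256_K_SKETCH[64] = {
    0x428a2f98,0x71374491,0xb5c0fbcf,0xe9b5dba5,0x3956c25b,0x59f111f1,0x923f82a4,0xab1c5ed5,
    0xd807aa98,0x12835b01,0x243185be,0x550c7dc3,0x72be5d74,0x80deb1fe,0x9bdc06a7,0xc19bf174,
    0xe49b69c1,0xefbe4786,0x0fc19dc6,0x240ca1cc,0x2de92c6f,0x4a7484aa,0x5cb0a9dc,0x76f988da,
    0x983e5152,0xa831c66d,0xb00327c8,0xbf597fc7,0xc6e00bf3,0xd5a79147,0x06ca6351,0x14292967,
    0x27b70a85,0x2e1b2138,0x4d2c6dfc,0x53380d13,0x650a7354,0x766a0abb,0x81c2c92e,0x92722c85,
    0xa2bfe8a1,0xa81a664b,0xc24b8b70,0xc76c51a3,0xd192e819,0xd6990624,0xf40e3585,0x106aa070,
    0x19a4c116,0x1e376c08,0x2748774c,0x34b0bcb5,0x391c0cb3,0x4ed8aa4a,0x5b9cca4f,0x682e6ff3,
    0x748f82ee,0x78a5636f,0x84c87814,0x8cc70208,0x90befffa,0xa4506ceb,0xbef9a3f7,0xc67178f2
};
// Standard SHA256 initial hash values.
static const uint32_t SHA256_IV_SKETCH[8] = {
    0x6a09e667,0xbb67ae85,0x3c6ef372,0xa54ff53a,0x510e527f,0x9b05688c,0x1f83d9ab,0x5be0cd19
};

#define ROTR(x, n)   (((x) >> (n)) | ((x) << (32 - (n))))
#define CH(x, y, z)  (((x) & (y)) ^ (~(x) & (z)))
#define MAJ(x, y, z) (((x) & (y)) ^ ((x) & (z)) ^ ((y) & (z)))
#define BSIG0(x)     (ROTR(x, 2) ^ ROTR(x, 13) ^ ROTR(x, 22))
#define BSIG1(x)     (ROTR(x, 6) ^ ROTR(x, 11) ^ ROTR(x, 25))
#define SSIG0(x)     (ROTR(x, 7) ^ ROTR(x, 18) ^ ((x) >> 3))
#define SSIG1(x)     (ROTR(x, 17) ^ ROTR(x, 19) ^ ((x) >> 10))

// One round as a define: note the backslash on every continued line.
#define SHA256_ROUND_SKETCH(a, b, c, d, e, f, g, h, k, w) do { \
    uint32_t t1 = (h) + BSIG1(e) + CH(e, f, g) + (k) + (w); \
    uint32_t t2 = BSIG0(a) + MAJ(a, b, c); \
    (h) = (g); (g) = (f); (f) = (e); (e) = (d) + t1; \
    (d) = (c); (c) = (b); (b) = (a); (a) = t1 + t2; \
} while (0)

// Hash a message of at most 55 bytes (one 64-byte block after padding).
void sha256SingleBlockSketch(const uint8_t *msg, size_t len, uint32_t out[8]) {
    assert(len <= 55);
    uint8_t block[64] = {0};
    for (size_t i = 0; i < len; i++) block[i] = msg[i];
    block[len] = 0x80;                       // padding marker
    uint64_t bits = (uint64_t)len * 8;       // big endian bit length at the end
    for (int i = 0; i < 8; i++) block[63 - i] = (uint8_t)(bits >> (8 * i));

    uint32_t w[64];
    for (int i = 0; i < 16; i++)             // big endian word load
        w[i] = ((uint32_t)block[4 * i] << 24) | ((uint32_t)block[4 * i + 1] << 16) |
               ((uint32_t)block[4 * i + 2] << 8) | (uint32_t)block[4 * i + 3];
    for (int i = 16; i < 64; i++)
        w[i] = SSIG1(w[i - 2]) + w[i - 7] + SSIG0(w[i - 15]) + w[i - 16];

    uint32_t a = SHA256_IV_SKETCH[0], b = SHA256_IV_SKETCH[1];
    uint32_t c = SHA256_IV_SKETCH[2], d = SHA256_IV_SKETCH[3];
    uint32_t e = SHA256_IV_SKETCH[4], f = SHA256_IV_SKETCH[5];
    uint32_t g = SHA256_IV_SKETCH[6], h = SHA256_IV_SKETCH[7];
    for (int i = 0; i < 64; i++)
        SHA256_ROUND_SKETCH(a, b, c, d, e, f, g, h, SHA256_K_SKETCH[i], w[i]);

    // The "final additions" that the unwinding step subtracts out later.
    out[0] = a + SHA256_IV_SKETCH[0]; out[1] = b + SHA256_IV_SKETCH[1];
    out[2] = c + SHA256_IV_SKETCH[2]; out[3] = d + SHA256_IV_SKETCH[3];
    out[4] = e + SHA256_IV_SKETCH[4]; out[5] = f + SHA256_IV_SKETCH[5];
    out[6] = g + SHA256_IV_SKETCH[6]; out[7] = h + SHA256_IV_SKETCH[7];
}
```

Feeding it "abc" should give the standard test vector, ba7816bf ... f20015ad, which gives you known-good values to compare against while debugging the kernel.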

Testing your hash

Now comes the fun part - figuring out why it doesn't work.

Uncomment the following lines that are likely part of the .cu file:

#include "CUDA_Common/cuPrintf.cu"

cudaPrintfInit();

cudaPrintfDisplay(stdout, true);
cudaPrintfEnd();

Congratulations. You've destroyed your kernel's performance.

And can now put cuPrintf statements in the right places to debug it.

I suggest something like this:

cuPrintf("input: %08x %08x ... %08x\n", b0, b1, b15); \
CUDA_SHA256_FULL(); \
cuPrintf("output: %08x %08x %08x %08x...\n", a, b, c, d); \

Now, you don't want a gazillion threads outputting stuff, so run it with --threads 1 --blocks 1

Glacial, but easy to read.

Once you've fixed your typos, remove all the cuPrintf lines (and the includes/cuPrintfInit!) and test speed/verify correctness.

Final work for CUDA

After you've pulled out the cuPrintf lines (seriously - pull them out or they kill performance!), test to make sure you find hashes. Create a makeTestHashes.php script in shared/test_hashes/ if you want - it's a good idea to make 1000 or so hashes, or 10000, and verify that Every. Single. One. is found.
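A tiny checker makes the "Every. Single. One." part painless. The sketch below assumes you kept the list of plaintexts your test script generated and can read the found passwords back from the cracker's output; the function name is hypothetical and the file parsing is left out.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Return every expected plaintext that does not appear in the found list.
// An empty result means every single hash was cracked.
std::vector<std::string> findMissingSketch(const std::vector<std::string> &expected,
                                           const std::vector<std::string> &found) {
    std::set<std::string> foundSet(found.begin(), found.end());
    std::vector<std::string> missing;
    for (const std::string &plaintext : expected) {
        if (foundSet.count(plaintext) == 0) {
            missing.push_back(plaintext);
        }
    }
    assert(missing.size() <= expected.size());
    return missing;
}
```

Anything that shows up in the missing list is worth hashing by hand: a pattern in the misses (all one length, all sharing a first character) usually points straight at the bug.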

Then commit!

OpenCL Classes

Same thing, different kernel type!

This process looks a lot like the CUDA steps, just with OpenCL. There's a bit less work, since the hash type define, the hash identifier, and the hash file class are already in place.

Create the new files

Create three files for the OpenCL class.

  • inc/MFN_OpenCL_host/MFNHashTypePlainOpenCL_SHA256.h - the header file for the host class
  • src/MFN_OpenCL_host/MFNHashTypePlainOpenCL_SHA256.cpp - the source file for the host class
  • src/MFN_OpenCL_device/MFNHashTypePlainOpenCL_SHA256.cl - the device kernel

Populate the class header file

Easiest way? Copy the contents of the OpenCL SHA1 header into the SHA256 header, and do a global search/replace in the file for "SHA1" -> "SHA256" - this is just about all that needs to be done. This is how I usually do it. Save it and you're done. Easy!

Populate the class source file

This is slightly more complex than for the CUDA version, because there's some metaprogramming going on.

But, as always, search/replace from the old hash type to the new hash type, and make sure the constructor is getting called with the right bytes per hash.

As in the CUDA host class, you'll need to increase the number of words reversed from 5 to 8 for both preProcess and postProcess.

Ensure that in getHashFileNames, you point to all the source files you'll need.

Finally, getDefineStrings is the metaprogramming module that creates on-the-fly defines for the kernel. This is better described elsewhere, but the defaults should be mostly OK. This gets into kernel programming.

In this case, going from SHA1 to SHA256, we alter the makeBitmapLookup password check function from "CheckPassword160" to "CheckPassword256".
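If the idea of on-the-fly defines is new, here is a rough sketch of the general technique: the host builds a string of -D options, and the OpenCL compiler sees them as if they were #define lines at the top of the kernel. The function name and the macro names (PASSWORD_LENGTH, VECTOR_WIDTH, CHECK_PASSWORD) are made up for illustration and are not the ones getDefineStrings actually emits.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Build a compiler option string for an OpenCL kernel. Each "-D NAME=value"
// behaves like "#define NAME value" inside the kernel source.
std::string buildDefineStringSketch(int passwordLength, int vectorWidth,
                                    const std::string &checkFunction) {
    assert(passwordLength > 0 && vectorWidth > 0);
    std::ostringstream defines;
    defines << "-D PASSWORD_LENGTH=" << passwordLength << " ";
    defines << "-D VECTOR_WIDTH=" << vectorWidth << " ";
    defines << "-D CHECK_PASSWORD=" << checkFunction;
    return defines.str();
}
```

Because the kernel is compiled at runtime with these values baked in, things like the password length become compile-time constants, and the compiler can unroll and optimize accordingly.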

You might want to make sure things compile at this point...

Rough in the OpenCL device file

Copy-paste, as usual. Replace SHA1 with SHA256.

You'll need to redefine the CheckPassword160 function as CheckPassword256 (and make the needed changes to the offsets).

Add the glue to link the new hash class into the runtime environment

We have a new class, and a new (null function) kernel. It's time to link it into the environment so I can run the kernel and debug the hash algorithm while developing it.

There are a few things that need to be done here.

Add to the makefile/project

First, the new host file needs to be added to the makefile/project. For Linux, this involves editing the root Makefile, and adding one filename.

To NEW_MULTIFORCER_CPP_FILES, add the .cpp file - MFNHashTypePlainOpenCL_SHA256.cpp

Now, a "make binaries/New-Multiforcer" should complete without errors.

Adding the OpenCL class to the thread launcher

Finally, we need to tell the thread launcher about the shiny new OpenCL class.

Open src/MFN_Common/MFNHashClassLauncher.cpp

Add the new class header file at the top.

And, in the addOpenCLThread function, add a new line for a SHA256 kernel.

        case MFN_HASHTYPE_SHA256:
            newOpenCLThread = new MFNHashTypePlainOpenCL_SHA256();
            break;

Testing!

Finally, this should all come together.

Try running a "make binaries/New-Multiforcer" - it should succeed. If not, fix it.

As before, try running the empty kernel to check basic function.

./binaries/New-Multiforcer -h SHA256 -c shared/charsets/charsetall -f ./sha256_1000 --min 5 --max 5 --nocuda --openclplatform=1 --nocpu --debug

This is good news:

Got 1 OpenCL GPUs
Device 0: OpenCL, p:1, d:0
MFND: Setting hash name SHA256
MFND: Setting password len 5
MFND: Status: Starting pw len 5
MFND: Setting password len 5
MFND: WU status: 0/2
MFND: Hash status: 0/1000
MFND: Total crack rate 0.00 /s
MFND: Total rate: 0.00 
MFND: Status: Td 0: CID 17767.
MFND: Thread 0 setting rate to 31.42B
MFND: WU status: 0/2
MFND: Hash status: 0/1000
MFND: Total crack rate 31.42B/s
MFND: Total rate: 31.42B
MFND: WU status: 1/2
MFND: Hash status: 0/1000
MFND: Total crack rate 31.42B/s
MFND: Total rate: 31.42B
MFND: Thread 0 setting rate to 30.34B
MFND: WU status: 1/2
MFND: Hash status: 0/1000
MFND: Total crack rate 30.34B/s
MFND: Total rate: 30.34B
MFND: WU status: 2/2
MFND: Hash status: 0/1000
MFND: Total crack rate 30.34B/s
MFND: Total rate: 30.34B
MFND: Status: Td 0: out of WU.
MFND: Thread 0 setting rate to 0.00 
MFND: WU status: 2/2
MFND: Hash status: 0/1000
MFND: Total crack rate 0.00 /s
MFND: Total rate: 0.00 

Implement your hash

This is left as an exercise to the reader. :)

Put it as a define, and try to avoid using memory unless required. This is where the art goes.

Testing your hash

Now comes the fun part - figuring out why it doesn't work.

You should be able to use "printf" in the kernel, and have it output from the GPU - at least on an AMD GPU. Try the "--threads 1 --blocks 1" commandline params again to reduce the output to a sane amount.

Final work for OpenCL

Test it to ensure it finds everything over a wide range of lengths, and you're done!

CPU kernels

If you've made it this far, you should have a good idea of how to generate CPU kernels...

The only real difference is that you don't have a device file. All the code is contained in the CPP file - which should be built with -msse2 for SSE2 ops. If you have a good reason for using newer than SSE2, please do, and adjust the makefile accordingly.
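As a taste of what the SSE2 side looks like, here is a small sketch of a 32-bit rotate applied to four lanes at once, which is the building block you end up writing first since SSE2 has no rotate instruction. The function names are hypothetical; the intrinsics are the standard SSE2 ones from emmintrin.h, and this assumes an x86 target with SSE2 enabled.

```cpp
#include <cassert>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Rotate each of the four 32-bit lanes right by n bits (0 < n < 32).
// SSE2 has no rotate, so it is built from two shifts and an OR.
static inline __m128i rotateRight4Sketch(__m128i x, int n) {
    assert(n > 0 && n < 32);
    return _mm_or_si128(_mm_srli_epi32(x, n), _mm_slli_epi32(x, 32 - n));
}

// Convenience wrapper: rotate four words from memory, e.g. the same
// state word of four different password candidates.
void rotateFourWordsSketch(const uint32_t in[4], int n, uint32_t out[4]) {
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i *>(in));
    v = rotateRight4Sketch(v, n);
    _mm_storeu_si128(reinterpret_cast<__m128i *>(out), v);
}
```

The same idea carries through the whole hash: every variable becomes an __m128i holding that variable for four candidates, and every operation becomes its four-lane equivalent.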
