OpenCL/C++ GRT code

Discussion of the upcoming GPU-accelerated rainbow table implementation

OpenCL/C++ GRT code

Postby Bitweasil » Sat Oct 30, 2010 7:31 pm

I've sort of been off the grid lately, as I'm sure some people have noticed. I've had a bunch of stuff going on in my life, and have been working on the work/life/code balance. I swung hard towards code for a while, and lost some stuff in my life that was of much value to me. I swung life-ways for a while, and now am trying to find some balance in the middle.

Anyway, in that vein, I'm starting to code on things again.

The current codebase is C/CUDA/(ugly and not expandable).

I'm working on converting all my existing code to a new paradigm: C++/OpenCL. The reasons are as follows:

- OpenCL is now usable, and decently quick. It's damned fast on ATI as well, which is my primary focus here. ATI + bitalign = "ZOMGFAST on crypto." It also means easier CPU code, if OpenCL does a decent job vectorizing things. This opens the code up to a larger target audience.
- C++ lets me do things more cleanly, and more easily accept new code from people in modular form. To expand:
-- By going to C++, I can use factory functions to switch between functionally equivalent implementations (see the sketch after this list). I can, for instance, support a tighter RT format as well as my current format, and simply switch based on the table version in the header. I can also support a 32-bit compatible table search that works on Windows.
-- If all the functionality for a hash type is encapsulated in a C++ class & .cl files, it should be very easy to add new hash types. And, importantly, it should be easy for other people to submit hash types to plug in.
-- The code should be much easier to read if it's broken up into class files that encapsulate sane functionality.
-- C++ lets me use Boost for cross-platform networking calls, among other things. This will again make cross-platform code easier to write.
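
To give a rough idea of the factory approach, here's a placeholder sketch. It is not the real class layout; names like TableFormat and makeTableFormat are made up for illustration, and the same shape would work for plugging in new hash types alongside their .cl kernels.

Code: Select all
// Placeholder sketch only: illustrates the factory-function idea, not the
// actual GRT classes. All names here are hypothetical.
#include <stdexcept>
#include <stdint.h>
#include <string>

class TableFormat {                          // one subclass per on-disk format
public:
    virtual ~TableFormat() {}
    virtual void load(const std::string &path) = 0;
    virtual bool findEndpoint(uint64_t endpoint) const = 0;
};

class TableFormatV1 : public TableFormat {   // current, full-size format
public:
    void load(const std::string &) { /* read header + chains */ }
    bool findEndpoint(uint64_t) const { return false; }
};

class TableFormatV2 : public TableFormat {   // hypothetical tighter format
public:
    void load(const std::string &) { /* read packed chains */ }
    bool findEndpoint(uint64_t) const { return false; }
};

// Factory: pick the implementation from the version field in the table header.
// Caller owns the returned object.
TableFormat *makeTableFormat(uint32_t headerVersion) {
    switch (headerVersion) {
        case 1:  return new TableFormatV1();
        case 2:  return new TableFormatV2();
        default: throw std::runtime_error("unknown table version");
    }
}

int main() {
    TableFormat *table = makeTableFormat(1);   // version read from the header
    table->load("md5_len6.grt");               // hypothetical file name
    delete table;
    return 0;
}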

The goal is a single codebase that compiles sanely on Windows/Linux/OSX, and functions the same on all platforms. If I do OpenCL properly, the same code can also run on CPU hosts at a sane speed. Not *fast*, mind you, but usable. Hopefully. At least for smaller table formats.

I'm also planning to write my code in a manner that utilizes all the CPUs/GPUs on a system. This will take some work, but should be doable. It should help immensely on laptops that don't have any single powerful device, but do have, between 2 CPU cores and 1 or 2 GPUs, a respectable amount of processing power.
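
Very roughly, the plan is to walk every OpenCL platform and device on the box, then give each its own context, queue, and slice of the chain workload. A minimal sketch using the plain OpenCL 1.x C API (nothing GRT-specific in here):

Code: Select all
// Enumerate every OpenCL device (CPU + GPU) on the host. Each one would get
// its own context/queue and a share of the chains, weighted by speed.
#include <CL/cl.h>      // <OpenCL/opencl.h> on OSX
#include <cstdio>
#include <vector>

int main() {
    cl_uint numPlatforms = 0;
    clGetPlatformIDs(0, NULL, &numPlatforms);
    std::vector<cl_platform_id> platforms(numPlatforms);
    if (numPlatforms) clGetPlatformIDs(numPlatforms, &platforms[0], NULL);

    for (cl_uint p = 0; p < numPlatforms; ++p) {
        cl_uint numDevices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 0, NULL, &numDevices);
        std::vector<cl_device_id> devices(numDevices);
        if (numDevices) clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL,
                                       numDevices, &devices[0], NULL);
        for (cl_uint d = 0; d < numDevices; ++d) {
            char name[256] = {0};
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            std::printf("platform %u, device %u: %s\n", p, d, name);
        }
    }
    return 0;
}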

And, related, I'm going to be setting up a wiki, bugtracking system, and other useful utilities to help turn this into a better quality project.

So, I guess, just keep your eyes open. I'm starting up the new codebase now, and should have something usable in a month or two for testing. I especially could use some ATI people to test my code on that platform.

Enjoy! :D

Re: OpenCL/C++ GRT code

Postby blazer » Sun Oct 31, 2010 11:35 am

Indeed, life is so much more important; don't work too hard.
As usual, I'll be looking forward to updates, and perhaps to purchasing an ATI card; if the 6870 drops to 250 I just might get it.

It seems the ATI cards are roughly 2.5-3x better in performance (crypto-wise), and the supposedly lower power consumption and heat output do make them damn appealing.

Re: OpenCL/C++ GRT code

Postby blazer » Mon Nov 01, 2010 1:38 pm

Hey, got a question:

What are the benefits of using the GRT reduction function over the standard RT one?
Can the standard one somehow be optimized to run well on GFX cards?

It's just that I've noticed that the GRTs I make seem to fall well short of the expected unique chains, as I previously posted.

I'm just kind of siding with what PB said at FRT: there's no point in it being 5x faster when 5x more work is needed. (Though I guess it does mean it cracks 5x faster.)

Re: OpenCL/C++ GRT code

Postby Bitweasil » Mon Nov 01, 2010 5:24 pm

blazer wrote:Hey, got a question:

What are the benefits of using the GRT reduction function over the standard RT one?
Can the standard one somehow be optimized to run well on GFX cards?

It's just that I've noticed that the GRTs I make seem to fall well short of the expected unique chains, as I previously posted.

I'm just kind of siding with what PB said at FRT: there's no point in it being 5x faster when 5x more work is needed. (Though I guess it does mean it cracks 5x faster.)


The reduction function I use is a mix of bit shifts, bit masks, and lookup tables. It works very well on GPUs, as observed. I'm not aware of any way to make integer division or multiplication anywhere near as fast, and that's what the stock function does a lot of.
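
To give a rough idea of the difference, here's an illustrative sketch. This is not the actual GRT reduction function; the charset, bit widths, and step fold-in are simplified for the example.

Code: Select all
// Illustrative only: the general mask/shift/lookup shape versus the stock
// divide/modulo reduction. Not the real GRT code.
#include <stdint.h>
#include <stdio.h>

static const char kCharset[96] =              // hypothetical 95-char "ALL" set
    " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~";

static char g_lut[128];                       // 7-bit value -> character
static void initLut() {
    // Entries 95..127 wrap around; that slight non-uniformity is the price
    // paid for avoiding integer division in the inner loop.
    for (int i = 0; i < 128; ++i) g_lut[i] = kCharset[i % 95];
}

// Mask/shift/lookup reduction: cheap on GPU integer units (no division).
static void reduceMaskShift(const uint32_t hash[4], uint32_t step, char out[6]) {
    uint64_t v = ((uint64_t)hash[0] << 32 | hash[1]) ^ step;   // fold in step
    for (int i = 0; i < 6; ++i) {
        out[i] = g_lut[v & 0x7F];             // mask off 7 bits, table lookup
        v >>= 7;                              // shift to the next 7 bits
    }
}

// Stock-style reduction for comparison: repeated divide/modulo by the charset
// size, which is exactly the integer work that is slow on current GPUs.
static void reduceDivMod(const uint32_t hash[4], uint32_t step, char out[6]) {
    uint64_t v = ((uint64_t)hash[0] << 32 | hash[1]) + step;
    for (int i = 0; i < 6; ++i) {
        out[i] = kCharset[v % 95];
        v /= 95;
    }
}

int main() {
    initLut();
    uint32_t hash[4] = {0x9af67b2bu, 0x4378f58eu, 0x96291113u, 0x341cbb05u};
    char a[6], b[6];
    reduceMaskShift(hash, 42, a);
    reduceDivMod(hash, 42, b);
    printf("mask/shift: %.6s   div/mod: %.6s\n", a, b);
    return 0;
}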

How far short of the expected unique chains are you? I know it's not *quite* as ideal as the stock reduction function, but it's significantly faster (not having seen a CUDA rtgen when I was working on this code, I wasn't sure how much faster, other than "a good bit"). I'm still a bit weak on the RT math; I need to sit down with it some weekend. Could you play with large table numbers for both setups and see which comes out ahead, assuming a 5x speed difference?

I'm also open to playing with new reduction functions. They're non-trivial to create in a manner that works well.

Re: OpenCL/C++ GRT code

Postby blazer » Fri Nov 05, 2010 4:42 am

Well, this is with the first set I made of the Len6 ALL tables, with a chain length of 150,000:

Generated: 146 000 000 chains
After perfection: 7 805 394
Perfected table prob: 79.66% success

Expected unique: 9 184 644
Expected table prob: 84.65%

So I'm nearly 1.4 million chains short.

That was at a super-overkill 29.79x work factor.
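
For reference, those figures line up with the usual single-table math if you assume a 95^6 keyspace for "Len6 ALL". A standalone sanity-check sketch (not GRT code; the recurrence and approximations are the standard ones):

Code: Select all
// Standalone sanity check of the numbers above, assuming a 95^6 keyspace
// ("ALL" charset, length 6) and the standard rainbow-chain merge model.
#include <cmath>
#include <cstdio>

int main() {
    const double N  = std::pow(95.0, 6.0);  // keyspace: 735,091,890,625
    const double t  = 150000.0;             // chain length
    const double m0 = 146000000.0;          // chains generated

    // Work factor: chain links computed versus keyspace size.
    std::printf("work factor      : %.2f\n", m0 * t / N);            // ~29.79

    // Expected unique chains after perfection: iterate the merge recurrence
    // m <- N * (1 - exp(-m/N)) over the chain length. (The closed-form
    // approximation 1 / (1/m0 + t/(2N)) gives nearly the same answer.)
    double m = m0;
    for (int i = 0; i < (int)t; ++i)
        m = N * (1.0 - std::exp(-m / N));
    std::printf("expected unique  : %.0f\n", m);                      // ~9.18M

    // Single perfect-table success probability: 1 - exp(-m*t/N).
    std::printf("expected success : %.2f%%\n",
                100.0 * (1.0 - std::exp(-m * t / N)));                // ~84.6%
    std::printf("actual success   : %.2f%%\n",
                100.0 * (1.0 - std::exp(-7805394.0 * t / N)));        // ~79.7%
    return 0;
}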

Re: OpenCL/C++ GRT code

Postby Bitweasil » Fri Nov 05, 2010 2:16 pm

What do those numbers end up looking like for NTLM len8 tables? Is the loss in table efficiency enough at that point to outweigh the faster generation speed?

I am working on a reduction function analysis tool, so hopefully I can play with stuff & find something even better soon.

Re: OpenCL/C++ GRT code

Postby blazer » Fri Nov 05, 2010 3:34 pm

Hmm, good question. I guess some small-scale tests would be appropriate; I'll hopefully get around to it.

Though, with the above stats, it seems unique chains are capped just under 8 million, so it doesn't seem possible to hit the needed 99.96% using 4 tables.

It probably won't even be possible to get 99.99% using 5 tables, I don't think.
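
(As a rough check with the standard single-table model and the same 150,000 / 95^6 parameters, assuming independent tables: a cap of about 8 million chains gives at best 1 - e^(-8,000,000 * 150,000 / 95^6), roughly 80.5% per table, so four tables top out around 1 - 0.195^4, about 99.85%, and five around 1 - 0.195^5, about 99.97%.)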

Re: OpenCL/C++ GRT code

Postby Sc00bz » Fri Nov 05, 2010 4:55 pm

blazer wrote:Well, this is with the first set I made of the Len6 ALL tables, with a chain length of 150,000:

Generated: 146 000 000 chains
After perfection: 7 805 394
Perfected table prob: 79.66% success

Expected unique: 9 184 644
Expected table prob: 84.65%

So I'm nearly 1.4 million chains short.

That was at a super-overkill 29.79x work factor.

I'm thinking the real success rate is not 79.66%. Can you test it with, say, 100 random passwords in the key space?
PHP script for generating random MD5s: http://hashkiller.com/index.php?topic=3 ... 2#msg12552

Then can you test these 20 passwords? I picked these because I think they should have a low success rate, and there are several others like them: for each of the first 15 lines there are 719 other permutations, and you can also shift each letter by plus or minus a few (b -> a or c); there's more leeway on longer passwords:
Code: Select all
hash:password
5a3d4fdda6a00bc65522c565806a2930: 0@P`p
fb8bab517553ed9b94c22e34ac2ff705:!1AQaq
9af67b2b4378f58e96291113341cbb05:"2BRbr
fd740ba3a28f95be94ab970c0c1879a3:#3CScs
253a0bda23418aa0d520bd3aad4102cf:$4DTdt
e8883a35052d67cb1d9fc8df028355ab:%5EUeu
526b8f710cee4b2b051e4f5009ed15c0:&6FVfv
262540e4c6874363488a42647df483dc:'7GWgw
0624ff546ccd687474307c8b9a05644f:(8HXhx
371813a9f06e63a4330a1b830ed708c4:)9IYiy
3cfd27b16f3e6145a3d247f0582cfcbe:*:JZjz
ab0759b2f28f778d77ac690db3a3e99d:+;K[k{
391347fc1e8374d6833872d8b5e68286:,<L\l|
4c8f7d5968d6b6e6665398ba3c5de609:-=M]m}
6741bfdc543f2ae89fef7a0f47d3bddd:.>N^n~
ec1626fa06b46e81e41954439704898b:>N^n~.
d7a16075848e2786bcb80ee28d259e0c:N^n~.>
c30400859210bc7b82f51f40ef2677fb:^n~.>N
7f016c2c760274d8289f09302ec8e5fc:n~.>N^
ad9a412aa83ef58f40c268da3be911df:~.>N^n


If you still want to test more, try passwords that contain only one unique letter (aaaaaa, bbbbbb, ...); these should have a higher success rate.

Re: OpenCL/C++ GRT code

Postby Bitweasil » Fri Nov 05, 2010 5:00 pm

Interesting. I'll play with some of those numbers when I get home. What's the maximum limit for the rcrack reduction function?

Re: OpenCL/C++ GRT code

Postby blazer » Sat Nov 06, 2010 2:27 am

Hey Sc00bz, 100% hits for those hashes using the MD5 L6 tables Bitweasil generated.

The ones at the 100k chain length.

Roughly 53-58k hits per hash per index.
Only indexes 1, 2, and 3 were needed to look up those hashes; index 4 wasn't even needed.

Index 1: 9915810
Index 2: 9915646
Index 3: 9916806
Index 4: 9916585

Success: 99.54%

It was quite tedious copying and pasting each hash; I really need a list mode.

What are some good numbers for very small-scale tests to look at the current merge rate?
BTW, the NTLM step rate is at about 1.1 billion for me on the GTX 470; hehe, just under half of FRT's current one. Makes winrtgen look like my old 486.
