OpenCL/C++ GRT code

Discussion of the upcoming GPU-accelerated rainbow table implementation

OpenCL/C++ GRT code

Postby Bitweasil » Sat Oct 30, 2010 7:31 pm

I've sort of been off the grid lately, as I'm sure some people have noticed. I've had a bunch of stuff going on in my life, and have been working on the work/life/code balance. I swung hard towards code for a while, and lost some stuff in my life that was of much value to me. I swung life-ways for a while, and now am trying to find some balance in the middle.

Anyway, in that vein, I'm starting to code on things again.

The current codebase is C/CUDA/(ugly and not expandable).

I'm working on converting all my existing code to a new paradigm: C++/OpenCL. The reasons are as follows:

- OpenCL is now usable, and decently quick. It's damned fast on ATI as well, which is my primary focus here. ATI + bitalign = "ZOMGFAST on crypto." It also means easier CPU code, if OpenCL does a decent job vectorizing things. This opens the code up to a larger target audience.
- C++ lets me do things more cleanly, and more easily accept new code from people in modular form. To expand:
-- By going to C++, I can use factory functions to switch between functionally equivalent implementations (see the sketch after this list). I can, for instance, support a tighter RT format as well as my current format, and simply switch based on the table version in the header. I can also support a 32-bit compatible table search that works on Windows.
-- If all the functionality for a hash type is encapsulated in a C++ class & .cl files, it should be very easy to add new hash types. And, importantly, it should be easy for other people to submit hash types to plug in.
-- The code should be much easier to read if it's broken up into class files that encapsulate sane functionality.
-- C++ lets me use Boost for cross-platform networking calls, among other things. This will again make cross-platform code easier to write.
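
To give a rough idea of the factory approach, here's a placeholder sketch. It is not the real class layout; names like TableFormat and makeTableFormat are made up for illustration, and the same shape would work for plugging in new hash types alongside their .cl kernels.

Code: Select all
// Placeholder sketch only: illustrates the factory-function idea, not the
// actual GRT classes. All names here are hypothetical.
#include <stdexcept>
#include <stdint.h>
#include <string>

class TableFormat {                          // one subclass per on-disk format
public:
    virtual ~TableFormat() {}
    virtual void load(const std::string &path) = 0;
    virtual bool findEndpoint(uint64_t endpoint) const = 0;
};

class TableFormatV1 : public TableFormat {   // current, full-size format
public:
    void load(const std::string &) { /* read header + chains */ }
    bool findEndpoint(uint64_t) const { return false; }
};

class TableFormatV2 : public TableFormat {   // hypothetical tighter format
public:
    void load(const std::string &) { /* read packed chains */ }
    bool findEndpoint(uint64_t) const { return false; }
};

// Factory: pick the implementation from the version field in the table header.
// Caller owns the returned object.
TableFormat *makeTableFormat(uint32_t headerVersion) {
    switch (headerVersion) {
        case 1:  return new TableFormatV1();
        case 2:  return new TableFormatV2();
        default: throw std::runtime_error("unknown table version");
    }
}

int main() {
    TableFormat *table = makeTableFormat(1);   // version read from the header
    table->load("md5_len6.grt");               // hypothetical file name
    delete table;
    return 0;
}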

The goal is a single codebase that compiles sanely on Windows/Linux/OSX, and functions the same on all platforms. If I do OpenCL properly, the same code can also run on CPU hosts at a sane speed. Not *fast*, mind you, but usable. Hopefully. At least for smaller table formats.

I'm also planning to write my code in a manner that utilizes all the CPUs/GPUs on a system. This will take some work, but should be doable. It should help immensely on laptops that don't have any single powerful device, but do have, between 2 CPU cores and 1 or 2 GPUs, a respectable amount of processing power.
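
Very roughly, the plan is to walk every OpenCL platform and device on the box, then give each its own context, queue, and slice of the chain workload. A minimal sketch using the plain OpenCL 1.x C API (nothing GRT-specific in here):

Code: Select all
// Enumerate every OpenCL device (CPU + GPU) on the host. Each one would get
// its own context/queue and a share of the chains, weighted by speed.
#include <CL/cl.h>      // <OpenCL/opencl.h> on OSX
#include <cstdio>
#include <vector>

int main() {
    cl_uint numPlatforms = 0;
    clGetPlatformIDs(0, NULL, &numPlatforms);
    std::vector<cl_platform_id> platforms(numPlatforms);
    if (numPlatforms) clGetPlatformIDs(numPlatforms, &platforms[0], NULL);

    for (cl_uint p = 0; p < numPlatforms; ++p) {
        cl_uint numDevices = 0;
        clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL, 0, NULL, &numDevices);
        std::vector<cl_device_id> devices(numDevices);
        if (numDevices) clGetDeviceIDs(platforms[p], CL_DEVICE_TYPE_ALL,
                                       numDevices, &devices[0], NULL);
        for (cl_uint d = 0; d < numDevices; ++d) {
            char name[256] = {0};
            clGetDeviceInfo(devices[d], CL_DEVICE_NAME, sizeof(name), name, NULL);
            std::printf("platform %u, device %u: %s\n", p, d, name);
        }
    }
    return 0;
}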

And, related, I'm going to be setting up a wiki, bugtracking system, and other useful utilities to help turn this into a better quality project.

So, I guess, just keep your eyes open. I'm starting up the new codebase now, and should have something usable in a month or two for testing. I especially could use some ATI people to test my code on that platform.

Enjoy! :D

Re: OpenCL/C++ GRT code

Postby blazer » Sun Oct 31, 2010 11:35 am

Indeed, life is so much more important; don't work too hard.
As usual, I'll be looking forward to updates, and perhaps to purchasing an ATI card; if the 6870 drops to 250 I just might get it.

It seems the ATI cards are roughly 2.5-3x better in performance (crypto-wise), and the supposedly lower power consumption and heat output do make them damn appealing.

Re: OpenCL/C++ GRT code

Postby blazer » Mon Nov 01, 2010 1:38 pm

Hey, got a question:

What are the benefits of using the GRT reduction function over the standard RT one?
Can the standard one somehow be optimized to run well on GFX cards?

It's just that I've noticed that the GRTs I make seem to fall well short of the expected unique chains, as I previously posted.

I'm just kind of siding with what PB said at FRT: there's no point in it being 5x faster when 5x more work is needed. (Though I guess it does mean it cracks 5x faster.)

Re: OpenCL/C++ GRT code

Postby Bitweasil » Mon Nov 01, 2010 5:24 pm

blazer wrote:Hey, got a question:

What are the benefits of using the GRT reduction function over the standard RT one?
Can the standard one somehow be optimized to run well on GFX cards?

It's just that I've noticed that the GRTs I make seem to fall well short of the expected unique chains, as I previously posted.

I'm just kind of siding with what PB said at FRT: there's no point in it being 5x faster when 5x more work is needed. (Though I guess it does mean it cracks 5x faster.)


The reduction function I use is a mix of bit shifts, bit masks, and lookup tables. It works very well on GPUs, as observed. I'm not aware of any way to make integer division or multiplication anywhere near as fast, and that's what the stock function does a lot of.
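
To give a rough idea of the difference, here's an illustrative sketch. This is not the actual GRT reduction function; the charset, bit widths, and step fold-in are simplified for the example.

Code: Select all
// Illustrative only: the general mask/shift/lookup shape versus the stock
// divide/modulo reduction. Not the real GRT code.
#include <stdint.h>
#include <stdio.h>

static const char kCharset[96] =              // hypothetical 95-char "ALL" set
    " !\"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "[\\]^_`abcdefghijklmnopqrstuvwxyz{|}~";

static char g_lut[128];                       // 7-bit value -> character
static void initLut() {
    // Entries 95..127 wrap around; that slight non-uniformity is the price
    // paid for avoiding integer division in the inner loop.
    for (int i = 0; i < 128; ++i) g_lut[i] = kCharset[i % 95];
}

// Mask/shift/lookup reduction: cheap on GPU integer units (no division).
static void reduceMaskShift(const uint32_t hash[4], uint32_t step, char out[6]) {
    uint64_t v = ((uint64_t)hash[0] << 32 | hash[1]) ^ step;   // fold in step
    for (int i = 0; i < 6; ++i) {
        out[i] = g_lut[v & 0x7F];             // mask off 7 bits, table lookup
        v >>= 7;                              // shift to the next 7 bits
    }
}

// Stock-style reduction for comparison: repeated divide/modulo by the charset
// size, which is exactly the integer work that is slow on current GPUs.
static void reduceDivMod(const uint32_t hash[4], uint32_t step, char out[6]) {
    uint64_t v = ((uint64_t)hash[0] << 32 | hash[1]) + step;
    for (int i = 0; i < 6; ++i) {
        out[i] = kCharset[v % 95];
        v /= 95;
    }
}

int main() {
    initLut();
    uint32_t hash[4] = {0x9af67b2bu, 0x4378f58eu, 0x96291113u, 0x341cbb05u};
    char a[6], b[6];
    reduceMaskShift(hash, 42, a);
    reduceDivMod(hash, 42, b);
    printf("mask/shift: %.6s   div/mod: %.6s\n", a, b);
    return 0;
}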

How far short of the expected unique chains are you? I know it's not *quite* as ideal as the stock reduction function, but it's significantly faster (not having seen a CUDA rtgen when I was working on this code, I wasn't sure how much faster, other than "a good bit"). I'm still a bit weak on the RT math; I need to sit down with it some weekend. Could you play with large table numbers for both setups and see which comes out ahead, assuming a 5x speed difference?

I'm also open to playing with new reduction functions. They're non-trivial to create in a manner that works well.

Re: OpenCL/C++ GRT code

Postby blazer » Fri Nov 05, 2010 4:42 am

Well, this is with the first set I made of the Len6 ALL tables, with a chain length of 150,000:

Generated: 146 000 000 chains
After perfection: 7 805 394
Perfected table prob: 79.66% success

Expected unique: 9 184 644
Expected table prob: 84.65%

So I'm nearly 1.4 million chains short.

That was at a super-overkill 29.79x work factor.
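
For reference, those figures line up with the usual single-table math if you assume a 95^6 keyspace for "Len6 ALL". A standalone sanity-check sketch (not GRT code; the recurrence and approximations are the standard ones):

Code: Select all
// Standalone sanity check of the numbers above, assuming a 95^6 keyspace
// ("ALL" charset, length 6) and the standard rainbow-chain merge model.
#include <cmath>
#include <cstdio>

int main() {
    const double N  = std::pow(95.0, 6.0);  // keyspace: 735,091,890,625
    const double t  = 150000.0;             // chain length
    const double m0 = 146000000.0;          // chains generated

    // Work factor: chain links computed versus keyspace size.
    std::printf("work factor      : %.2f\n", m0 * t / N);            // ~29.79

    // Expected unique chains after perfection: iterate the merge recurrence
    // m <- N * (1 - exp(-m/N)) over the chain length. (The closed-form
    // approximation 1 / (1/m0 + t/(2N)) gives nearly the same answer.)
    double m = m0;
    for (int i = 0; i < (int)t; ++i)
        m = N * (1.0 - std::exp(-m / N));
    std::printf("expected unique  : %.0f\n", m);                      // ~9.18M

    // Single perfect-table success probability: 1 - exp(-m*t/N).
    std::printf("expected success : %.2f%%\n",
                100.0 * (1.0 - std::exp(-m * t / N)));                // ~84.6%
    std::printf("actual success   : %.2f%%\n",
                100.0 * (1.0 - std::exp(-7805394.0 * t / N)));        // ~79.7%
    return 0;
}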

Re: OpenCL/C++ GRT code

Postby Bitweasil » Fri Nov 05, 2010 2:16 pm

What do those numbers end up looking like for NTLM len8 tables? Is the loss in table efficiency enough at that point to outweigh the faster generation speed?

I am working on a reduction function analysis tool, so hopefully I can play with stuff & find something even better soon.

Re: OpenCL/C++ GRT code

Postby blazer » Fri Nov 05, 2010 3:34 pm

Hmm, good question. I guess some small-scale tests would be appropriate; I'll hopefully get around to it.

Though, with the above stats, it seems unique chains are capped just under 8 million, so it doesn't seem possible to hit the needed 99.96% using 4 tables.

It probably won't even be possible to get 99.99% using 5 tables, I don't think.
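
(As a rough check with the standard single-table model and the same 150,000 / 95^6 parameters, assuming independent tables: a cap of about 8 million chains gives at best 1 - e^(-8,000,000 * 150,000 / 95^6), roughly 80.5% per table, so four tables top out around 1 - 0.195^4, about 99.85%, and five around 1 - 0.195^5, about 99.97%.)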

Re: OpenCL/C++ GRT code

Postby Sc00bz » Fri Nov 05, 2010 4:55 pm

blazer wrote:Well, this is with the first set I made of the Len6 ALL tables, with a chain length of 150,000:

Generated: 146 000 000 chains
After perfection: 7 805 394
Perfected table prob: 79.66% success

Expected unique: 9 184 644
Expected table prob: 84.65%

So I'm nearly 1.4 million chains short.

That was at a super-overkill 29.79x work factor.

I'm thinking the real success rate is not 79.66%. Can you test it with, say, 100 random passwords in the key space?
PHP script for generating random MD5s: http://hashkiller.com/index.php?topic=3 ... 2#msg12552

Then can you test these 20 passwords? I picked these because I think they should have a low success rate, and there are several others like them: for each of the first 15 lines there are 719 other permutations, and you can also shift each letter by plus or minus a few (b -> a or c); there's more leeway on longer passwords:
Code: Select all
hash:password
5a3d4fdda6a00bc65522c565806a2930: 0@P`p
fb8bab517553ed9b94c22e34ac2ff705:!1AQaq
9af67b2b4378f58e96291113341cbb05:"2BRbr
fd740ba3a28f95be94ab970c0c1879a3:#3CScs
253a0bda23418aa0d520bd3aad4102cf:$4DTdt
e8883a35052d67cb1d9fc8df028355ab:%5EUeu
526b8f710cee4b2b051e4f5009ed15c0:&6FVfv
262540e4c6874363488a42647df483dc:'7GWgw
0624ff546ccd687474307c8b9a05644f:(8HXhx
371813a9f06e63a4330a1b830ed708c4:)9IYiy
3cfd27b16f3e6145a3d247f0582cfcbe:*:JZjz
ab0759b2f28f778d77ac690db3a3e99d:+;K[k{
391347fc1e8374d6833872d8b5e68286:,<L\l|
4c8f7d5968d6b6e6665398ba3c5de609:-=M]m}
6741bfdc543f2ae89fef7a0f47d3bddd:.>N^n~
ec1626fa06b46e81e41954439704898b:>N^n~.
d7a16075848e2786bcb80ee28d259e0c:N^n~.>
c30400859210bc7b82f51f40ef2677fb:^n~.>N
7f016c2c760274d8289f09302ec8e5fc:n~.>N^
ad9a412aa83ef58f40c268da3be911df:~.>N^n


If you still want to test more, try passwords that contain only one unique letter (aaaaaa, bbbbbb, ...); these should have a higher success rate.

Re: OpenCL/C++ GRT code

Postby Bitweasil » Fri Nov 05, 2010 5:00 pm

Interesting. I'll play with some of those numbers when I get home. What's the maximum limit for the rcrack reduction function?

Re: OpenCL/C++ GRT code

Postby blazer » Sat Nov 06, 2010 2:27 am

Hey Sc00bz, 100% hits for those hashes using the MD5 L6 tables Bitweasil generated.

The ones at the 100k chain length.

Roughly 53-58k hits per hash per index.
Only indexes 1, 2, and 3 were needed to look up those hashes; index 4 wasn't even needed.

Index 1: 9915810
Index 2: 9915646
Index 3: 9916806
Index 4: 9916585

Success: 99.54%

It was quite tedious copying and pasting each hash; I really need a list mode.

What are some good numbers for very small-scale tests to look at the current merge rate?
BTW, the NTLM step rate is at about 1.1 billion for me on the GTX 470; hehe, just under half of FRT's current one. Makes winrtgen look like my old 486.
