New design paradigm discussion: Massively Multiprocess

Discussion and support for the CUDA Multiforcers (Windows and Linux)
  • Ads

New design paradigm discussion: Massively Multiprocess

Postby Bitweasil » Mon May 18, 2009 1:20 pm

I've been thinking about this for the past few weeks and wanted to submit this idea for discussion.

The "old guard" of password cracking is John the Ripper, Ophcrack, Cain & Abel, etc. They are mostly designed for single CPU systems or low numbers of CPU cores. There is a MPI enabled John, and there is a distributed John, but they aren't particularly easy to use, and still rely on the old design system. John, specifically, makes extensive use of large in-memory structures to effectively search human-likely password space, but it's not suited to many new architectures.

The new way of hardware, as made clear by the recent advances in GPGPU computing, the Cell processor, desktop and laptop processor core counts, and Intel's upcoming Larrabee chip, is massively multicore. Instead of one or two powerful cores, there will be dozens to hundreds of less powerful cores, with a few powerful cores to manage them. We see this with nVidia CUDA, with ATI's Stream SDK, and to an extent with SSE2 and other vector extensions. Doing the same thing to a bunch of data in parallel is the new way to extract peak performance out of today's processing cores.

This explosion of power requires a new approach to effectively utilize it for password cracking. The rates achievable are now fast enough to properly brute force password space, and, in some cases, to do it far faster than the old tools were with their optimized routines. Additionally, mutating dictionary attacks can be done at a much higher rate.

My proposal is a new open source framework for password cracking that takes advantage of the new hardware.

The core would be a small, easily extendable daemon that handles taking crack tasks. It does no computation, but hands out and manages work units for a variety of compute clients. The daemon listens on a network port, allowing both local and remote systems to contribute.

The compute clients are where the work takes place. They grab a work unit, process it as they can, and submit results. The advantage of this model is that clients can be built for any architecture - SSE2, CUDA, Stream SDK, Cell, Larabee, etc. As long as they conform to the established communication API, and do the requested work, they can connect & contribute to the brute forcing.

Additionally, this modular design will make it easy for people who are skilled in optimization to create optimized clients - and it will be easy to compare compute clients to one another, as they will be taking part in the same tasks.

I intend to create this as an open source project, and document it well so expanding the project is easy.

Thoughts? Feedback? Suggestions?
Bitweasil
Site Admin
 
Posts: 912
Joined: Tue Jan 20, 2009 4:26 pm

Re: New design paradigm discussion: Massively Multiprocess

Postby foobar2342 » Mon May 18, 2009 4:23 pm

i am working on a similar thing: a c++ framework for device independent
rainbow table generation. as i have a pretty large codebase (10k lines),
and already a sound architecture you might want to merge your effort
with mine. problem is i cannot release the source right now.
although i do not support bruteforcing right now and
have to change a bit to generate plaintexts on the gpu instead of
the host cpu (as you can do when you generate tables and need
only a few start values to crunch numbers for a while) all this is planned
for the future.
this is not a network oriented program but rather a traditional framework
where classes are written to extend it.
the algorithm i have implemented already is not a password related one,
but i intend to implement SHA1 at least in the near future and port
to the cell.
source code to be released in a month or so.

there are lots of cool features already implemented, like
a template metaprogramming algorithm that makes a cross product
of round functions, chain abort conditions and algorithm
implementations (as a template class), so you can have modularity and still
inline functions or the interleaved scheduler, that takes 2 implementations
of the cruncher as arguments (one that uses global memory on the GPU, one that
uses shared memory) and interleaves their execution (in a single call
to the cuda kernel) (like 25% with DRAM access)
for a small speed advantage to only the shared memory algorithm.

i was following your work for the last weeks and just wanted to let you
know that there is another work in progress to build a flexible framework
(ranging from bruteforcers to rainbow tables) for all kinds of gpu/crypto
related stuff and yes, there is also network distribution/collection of
work planned for the not so near future.
i can send you some parts of the source for evaluation.
foobar2342
 
Posts: 17
Joined: Sun Apr 05, 2009 7:41 pm

Re: New design paradigm discussion: Massively Multiprocess

Postby Bitweasil » Mon May 18, 2009 4:58 pm

Interesting. I would be interested in talking to you more about the project. I can be found on irc.freenode.net as Bitweasil, or PM me here.

Are you looking to release your framework, or is it for specific purposes? If you are doing non-password algorithms, I have my thoughts on what you might be doing, but the process is similar for all tables.

What's the best way to get in touch with you? There appear to be multiple people working on the same thing in parallel, so it may be good to coordinate - if not combine code, at least discuss ideas.
Bitweasil
Site Admin
 
Posts: 912
Joined: Tue Jan 20, 2009 4:26 pm

Re: New design paradigm discussion: Massively Multiprocess

Postby neinbrucke » Mon May 18, 2009 6:12 pm

i agree, i think it's time for a new brute forcing framework. I like the way you think about multi processing, doing it network wise... this will immediately support distributed attacks. It might need an extra layer (of security) in the future to even support online collaboration as well, but that wouldn't be much of a problem.

it would be nice to also have some (optional) bare templates to start with for several platforms, containing code for plaintext generation, code for comparisons and such. This might speed up some people who'd like to contribute and just focus on the optimized algorithm code.
neinbrucke
 
Posts: 18
Joined: Mon Feb 16, 2009 8:09 pm

Re: New design paradigm discussion: Massively Multiprocess

Postby Bitweasil » Mon May 18, 2009 6:24 pm

neinbrucke wrote:i agree, i think it's time for a new brute forcing framework. I like the way you think about multi processing, doing it network wise... this will immediately support distributed attacks. It might need an extra layer (of security) in the future to even support online collaboration as well, but that wouldn't be much of a problem.


I intend to eventually support an online cracking system like that, but verifying that clients aren't "cheating" is much more difficult with brute force than with rainbow tables.

it would be nice to also have some (optional) bare templates to start with for several platforms, containing code for plaintext generation, code for comparisons and such. This might speed up some people who'd like to contribute and just focus on the optimized algorithm code.


Yes, I intend to create samples that can be "filled in" with appropriate code. You're one of the target developer types - "Good at algorithm optimization."
Bitweasil
Site Admin
 
Posts: 912
Joined: Tue Jan 20, 2009 4:26 pm

Re: New design paradigm discussion: Massively Multiprocess

Postby vampyr » Mon May 18, 2009 8:48 pm

I feel the sudden need to dump my ideas here, of which about 60% are already implemented in my code, which is unfortunately:( windows-specific.

--Extension support. By using dynamic loading of dll's with a common interface, you can extend the code to support virtually any architecture. Is there any way to port the use of dll's to *nix?
--Verification: Add random hashes at random places in the cracker task. If the server fails to break these, there was likely an error in the processing of the set, so we'll assign it to another computer or retry.
--Ati support: Got the code done, i'll be lurking on IRC if you need me :P

So personally i prefer an extension mechanism in the form of dll's/ a network daemon, as this allows ANYONE to code extensions to the network.
What's needed now is a p2p architecture that DOES NOT require a central server.

About the verification. Say you give the cracker a set of sha1 hashes.
5% of these hashes, for instance, are randomly generated hashes between plaintext_start and plaintext_end.
So for instance, if your server is to scan for hashes between aaaaaa and zzzzzz , the set of hashes to crack would contain hashes like aabaaa.
If the server fails to report the hash for aabaaa as cracked, and/or fails to return the proper plaintext, this bit of work was corrupted.

Using small work units, is, obviously essential here.
vampyr
 
Posts: 9
Joined: Mon May 18, 2009 11:23 am

Re: New design paradigm discussion: Massively Multiprocess

Postby Bitweasil » Mon May 18, 2009 9:30 pm

I like that verification idea. It's essentially the reverse of what I'd be using for rainbow tables, but I agree that it has a high likelyhood of success.

I'm not at all sure how a P2P cracking system would work, though... I'm intrigued, though - this would potentially be the "final cracking solution" for things.
Bitweasil
Site Admin
 
Posts: 912
Joined: Tue Jan 20, 2009 4:26 pm

Re: New design paradigm discussion: Massively Multiprocess

Postby vampyr » Mon May 18, 2009 10:56 pm

Think of torrents. You 'd have essentially a lot of (simple) trackers tracking the IP adresses of nodes.
Nodes get sent work items (5-10 items of 150ms for instance) by nodes that have hashes for cracking.
Ip adresses get reps for work done, and lose them for posting hashes to the network.

Think: torrents, but with mandatory seeding :)
vampyr
 
Posts: 9
Joined: Mon May 18, 2009 11:23 am

Re: New design paradigm discussion: Massively Multiprocess

Postby Bitweasil » Mon May 18, 2009 11:33 pm

How would one keep track of cheating with that system? Also, I think significantly longer units would be required for making effective use of bandwidth - passing around a ton of tiny units would be inefficient. I was thinking more along the lines of 5-10 minute units, especially if a large hash list was being used. It is an interesting concept, though...
Bitweasil
Site Admin
 
Posts: 912
Joined: Tue Jan 20, 2009 4:26 pm

Re: New design paradigm discussion: Massively Multiprocess

Postby foobar2342 » Tue May 19, 2009 1:12 am

Bitweasil wrote:Interesting. I would be interested in talking to you more about the project. I can be found on irc.freenode.net as Bitweasil, or PM me here.

Are you looking to release your framework, or is it for specific purposes? If you are doing non-password algorithms, I have my thoughts on what you might be doing, but the process is similar for all tables.

What's the best way to get in touch with you? There appear to be multiple people working on the same thing in parallel, so it may be good to coordinate - if not combine code, at least discuss ideas.


The guy i am working for/with wants a specialized program to generate rainbow tables and i decided to use the opportunity
to build a general purpose framework and release the source.
so for the next weeks i am focused on getting the algo to work, but with a little bit of rework the code will also support
other algos. parts which are already completely independent of the algo are for example disk file management,
the composition of the algorithm aspects (algo, condition, round function) including the configuration of the components,
the highlevel functionality (generate, test, benchmark) and the code that interconnects all those components.
foobar2342
 
Posts: 17
Joined: Sun Apr 05, 2009 7:41 pm

Next

Return to CUDA Multiforcers

Who is online

Users browsing this forum: No registered users and 1 guest

cron