New design paradigm discussion: Massively Multiprocess

I've been thinking about this for the past few weeks and wanted to submit this idea for discussion.
The "old guard" of password cracking is John the Ripper, Ophcrack, Cain & Abel, etc. They are mostly designed for single-CPU systems or small numbers of CPU cores. There is an MPI-enabled John, and there is a distributed John, but they aren't particularly easy to use, and they still rely on the old design. John, specifically, makes extensive use of large in-memory structures to search the human-likely password space effectively, which makes it a poor fit for many of the new architectures.
The direction hardware is heading, as made clear by the recent advances in GPGPU computing, the Cell processor, rising desktop and laptop core counts, and Intel's upcoming Larrabee chip, is massively multicore. Instead of one or two powerful cores, there will be dozens to hundreds of less powerful cores, with a few powerful cores to manage them. We see this with nVidia CUDA, with ATI's Stream SDK, and to an extent with SSE2 and other vector extensions. Doing the same operation on a large batch of data in parallel is the new way to extract peak performance out of today's processors.
This explosion of power requires a new approach if it is going to be used effectively for password cracking. The rates achievable are now fast enough to brute force the password space outright, and, in some cases, to do it far faster than the old tools could with their optimized routines. Additionally, mutating dictionary attacks can be run at a much higher rate.
My proposal is a new open source framework for password cracking that takes advantage of the new hardware.
The core would be a small, easily extensible daemon that accepts cracking tasks. It does no computation itself, but splits each task into work units, hands them out to a variety of compute clients, and tracks the results. The daemon listens on a network port, allowing both local and remote systems to contribute.
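To make the work-unit idea concrete, here is a minimal sketch of the bookkeeping the daemon might do, with the networking left out. It assumes a brute-force task can be described as a range of candidate indices; the names WorkUnit, WorkQueue, checkout, submit and requeue are made up for illustration and are not an established API.

```python
from collections import deque
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WorkUnit:
    unit_id: int
    start: int  # first candidate index (inclusive)
    end: int    # last candidate index (exclusive)


class WorkQueue:
    """Hypothetical daemon-side bookkeeping: no cracking happens here."""

    def __init__(self, keyspace_size: int, unit_size: int):
        # Split the keyspace [0, keyspace_size) into fixed-size index ranges.
        self._pending = deque(
            WorkUnit(i, start, min(start + unit_size, keyspace_size))
            for i, start in enumerate(range(0, keyspace_size, unit_size))
        )
        self._in_flight = {}  # unit_id -> (client_id, WorkUnit)
        self.found = []       # results reported by clients

    def checkout(self, client_id: str) -> Optional[WorkUnit]:
        """Hand the next pending unit to a client, or None if none are left."""
        if not self._pending:
            return None
        unit = self._pending.popleft()
        self._in_flight[unit.unit_id] = (client_id, unit)
        return unit

    def submit(self, unit_id: int, hits: list) -> None:
        """Record a finished unit and whatever the client found in it."""
        self._in_flight.pop(unit_id)
        self.found.extend(hits)

    def requeue(self, unit_id: int) -> None:
        """Put a unit back at the front, e.g. after a client disconnects."""
        _, unit = self._in_flight.pop(unit_id)
        self._pending.appendleft(unit)

    def done(self) -> bool:
        return not self._pending and not self._in_flight
```

A real daemon would wrap something like this in a network protocol and add timeouts for clients that go quiet, but the point is that the daemon only deals in index ranges and results, never in hashes per second.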
The compute clients are where the work takes place. They grab a work unit, process it as fast as they can, and submit the results. The advantage of this model is that clients can be built for any architecture - SSE2, CUDA, Stream SDK, Cell, Larrabee, etc. As long as they conform to the established communication API and do the requested work, they can connect and contribute to the brute forcing.
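As a rough illustration of the client side, the sketch below processes one work unit on a plain CPU. It assumes a work unit is a range of candidate indices over a fixed charset and password length, and it uses MD5 from Python's hashlib purely as a stand-in hash. The names CHARSET, index_to_candidate and process_unit are hypothetical; a CUDA or Cell client would do the same job with a very different inner loop.

```python
import hashlib

CHARSET = "abcdefghijklmnopqrstuvwxyz"  # assumed charset for this example


def index_to_candidate(index: int, length: int, charset: str = CHARSET) -> str:
    """Map an index in [0, len(charset)**length) to a fixed-length candidate.

    The index is treated as a base-len(charset) number, most significant
    character first, so index 0 is "aaa..." and the last index is "zzz...".
    """
    chars = []
    for _ in range(length):
        index, rem = divmod(index, len(charset))
        chars.append(charset[rem])
    return "".join(reversed(chars))


def process_unit(start: int, end: int, length: int, target_hashes: set) -> list:
    """Try every candidate with index in [start, end); return (password, hash) hits."""
    hits = []
    for i in range(start, end):
        candidate = index_to_candidate(i, length)
        digest = hashlib.md5(candidate.encode()).hexdigest()
        if digest in target_hashes:
            hits.append((candidate, digest))
    return hits
```

Because every client would receive the same kind of work unit, timing process_unit (or its GPU equivalent) over identical ranges gives a direct way to compare implementations.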
Additionally, this modular design will make it easy for people who are skilled in optimization to create optimized clients - and it will be easy to compare compute clients to one another, as they will be taking part in the same tasks.
I intend to create this as an open source project, and document it well so expanding the project is easy.
Thoughts? Feedback? Suggestions?