
Speed War :)

Posted: Thu Mar 15, 2012 1:03 pm
by Sc00bz
So I did some optimizations and yours is about 6.5% faster with a compute capability 1.1 card (probably same for all 1.x), 24% faster with a compute capability 2.1 card, and probably somewhere in between for compute capability 2.0.

md5_loweralpha-numeric#1-7
Code:
9800 GTX+ (128 cores, 1836 MHz, compute capability 1.1)
  330 MLinks/sec 1.065x CryptoHaze (generation)
  310 MLinks/sec 1.00x  mine (generation)
  240 MLinks/sec 0.77x  rcrack's GPU version (pre-work 100k)
   82 MLinks/sec 0.26x  rcracki_mt 0.7b (pre-work 100k)

GTS 450 (192 cores, 1566 MHz, compute capability 2.1)
  360 MLinks/sec 1.24x CryptoHaze (generation)
  290 MLinks/sec 1.00x mine (generation)
  200 MLinks/sec 0.69x rcrack's GPU version (pre-work 100k)
   91 MLinks/sec 0.31x rcracki_mt 0.7b (pre-work 100k)

Re: Speed War :)

Posted: Thu Mar 15, 2012 3:52 pm
by Bitweasil
What algorithm are you using for your reduction function, since this is the main factor to consider?

//EDIT: And Atom has apparently gotten into a speed war with me after I proved I could outrun hashcat in multihash brute forcing on nVidia.

Re: Speed War :)

Posted: Fri Mar 16, 2012 4:22 am
by Sc00bz
This is the standard rcrack method. I guess it was more apparent in the context when I posted it on FRT because I had it next to CPU benchmarks.

This is a single 32-bit thread on a 2.5 GHz Q9300.
It looks like the winner is "divcfl-3" for the CPU version:
Code:
  10.24 MLinks/sec  md5_loweralpha-numeric#1-6
   9.35 MLinks/sec  md5_alpha-space#1-9
   9.87 MLinks/sec  md5_loweralpha#1-10
   9.54 MLinks/sec  md5_loweralpha-numeric-space#1-8
   9.56 MLinks/sec  md5_loweralpha-numeric-space#1-9
   9.70 MLinks/sec  md5_loweralpha-numeric-symbol32-space#1-7
  10.40 MLinks/sec  md5_loweralpha-numeric-symbol32-space#1-8
   9.32 MLinks/sec  md5_loweralpha-space#1-9
  10.37 MLinks/sec  md5_mixalpha-numeric#1-8
  10.28 MLinks/sec  md5_mixalpha-numeric-all-space#1-7
  10.69 MLinks/sec  md5_mixalpha-numeric-all-space#1-8
   9.89 MLinks/sec  md5_mixalpha-numeric-space#1-7
  10.38 MLinks/sec  md5_mixalpha-numeric-space#1-8
   8.95 MLinks/sec  md5_numeric#1-12
   9.03 MLinks/sec  md5_numeric#1-14
   8.97 MLinks/sec  md5_hybrid3(omni6.txt)#0-0
   8.70 MLinks/sec  md5_hybrid3(omni7.txt)#0-0

Re: Speed War :)

Posted: Fri Mar 16, 2012 4:27 am
by Bitweasil
Damn. Using the multiply to divide trick?

Re: Speed War :)

Posted: Fri Mar 16, 2012 7:31 am
by Sc00bz
Yes. I'm so glad I was too lazy to finish and properly test the fixed point multiply (FPM) reduction function I came up with. I basically couldn't decide if I should do a 32 bit or a 24 bit multiply. Now the difference in speed is probably negligible, but FPM is less uniformly distributed. It also has problems when a sub key space is small compared to the total key space, which is why I dropped password lengths 1-4.
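For readers who haven't seen the two ideas mentioned here, a rough host-side sketch follows. It is not taken from ARTGen or CryptoHaze: the names `Reciprocal`, `make_reciprocal`, `mod_by_multiply` and `fpm_reduce` are made up for illustration, and the 32 bit widths are an assumption (real reduction functions usually work on a 64 bit index). The first function replaces `x % d` with a multiply by a precomputed reciprocal plus one correction step; the second is one common form of a fixed point multiply reduction that scales a 32 bit hash word into `[0, n)`.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: precomputed reciprocal for a fixed divisor d.
struct Reciprocal {
    std::uint32_t d;  // divisor, e.g. a key space size
    std::uint64_t m;  // floor(2^32 / d); equals 2^32 when d == 1, so 64 bits
};

inline Reciprocal make_reciprocal(std::uint32_t d) {
    assert(d != 0);  // division by zero is not meaningful here
    return Reciprocal{d, (std::uint64_t{1} << 32) / d};
}

// "Multiply to divide": compute x % d without a divide in the hot path.
// The quotient estimate q is either exact or one too small, so a single
// conditional subtraction fixes the remainder.
inline std::uint32_t mod_by_multiply(std::uint32_t x, const Reciprocal& r) {
    std::uint32_t q = static_cast<std::uint32_t>((x * r.m) >> 32);
    std::uint32_t rem = x - q * r.d;  // q * d <= x, so this cannot underflow
    if (rem >= r.d) rem -= r.d;
    return rem;
}

// Fixed point multiply reduction: treat h as a fraction h / 2^32 and
// scale it by n, giving a value in [0, n) with no division at all.
inline std::uint32_t fpm_reduce(std::uint32_t h, std::uint32_t n) {
    return static_cast<std::uint32_t>((static_cast<std::uint64_t>(h) * n) >> 32);
}
```

In a generator the reciprocal would be computed once per key space and reused for every chain link, so only the multiply, shift and compare sit in the inner loop.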

Re: Speed War :)

Posted: Thu Apr 05, 2012 5:03 am
by Sc00bz
ARTGen 0.1a

GTS 450:
273 MLinks/second for md5_mixalpha-numeric#1-9
282 MLinks/second for md5_loweralpha-numeric#1-7

Too lazy to swap out the GPU for the 9800 GTX+. Also, if you use a 1.x compute capability card, you should recompile it without defining USE___fmul_rd, since that should be faster. On that note, does anyone know how to tell at compile time which compute capability a .cu file is being compiled for?
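On the compile time question: nvcc defines the `__CUDA_ARCH__` macro while compiling device code, with a value of major * 100 + minor * 10 (110 for compute capability 1.1, 210 for 2.1). It is not defined during the host compilation pass, so it can only steer device code. A minimal sketch of how the switch could be automated is below; the `SKETCH_*` names are hypothetical stand-ins for the USE___fmul_rd define, and the 2.0 threshold is only an assumption taken from the advice above.

```cpp
// nvcc defines __CUDA_ARCH__ only while compiling device code.
// In host code (or with a plain C++ compiler) it is undefined.
#if defined(__CUDA_ARCH__)
#define SKETCH_TARGET_ARCH __CUDA_ARCH__
#else
#define SKETCH_TARGET_ARCH 0  // host pass: no device architecture
#endif

// Assumption from the post: use the __fmul_rd path on 2.x and newer,
// skip it on 1.x cards. SKETCH_USE_FMUL_RD is a hypothetical stand-in.
#if SKETCH_TARGET_ARCH >= 200
#define SKETCH_USE_FMUL_RD 1
#else
#define SKETCH_USE_FMUL_RD 0
#endif

// Small accessors so the chosen values can be inspected from code.
constexpr int sketch_target_arch() { return SKETCH_TARGET_ARCH; }
constexpr bool sketch_use_fmul_rd() { return SKETCH_USE_FMUL_RD != 0; }
```

When a binary is built for several architectures at once, each device pass sees its own `__CUDA_ARCH__` value, so the same source can pick a different path per target.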

Re: Speed War :)

Posted: Fri Apr 25, 2014 6:25 am
by TheLostMind
Sc00bz wrote:
ARTGen 0.1a

GTS 450:
273 MLinks/second for md5_mixalpha-numeric#1-9
282 MLinks/second for md5_loweralpha-numeric#1-7

Too lazy to swap out the GPU for the 9800 GTX+. Also, if you use a 1.x compute capability card, you should recompile it without defining USE___fmul_rd, since that should be faster. On that note, does anyone know how to tell at compile time which compute capability a .cu file is being compiled for?


Your program is good, but it has a bug:

artgen rt MD5 div alpha-numeric#8-8 10 48000 1048576 0 .\ test 0 0
pause

Error: gpudivsb.cpp(468) : getLastCudaError() CUDA error : Failed kernel launch.
: (7) too many resources requested for launch.


If you can fix this, it will be a good rainbow table generator.