LM-opencl benchmark much slower than actual cracking #4381

@solardiz

Description

Default benchmark:

$ john -te -form=lm-opencl
Device 1: Tesla V100-SXM2-16GB
Benchmarking: LM-opencl [DES BS OpenCL/mask accel]... LWS=128 GWS=131072 DONE
Raw:    6224M c/s real, 6025M c/s virtual

Different mask:

$ john -te -form=lm-opencl -mask='?a?a?a?a?a?a?a'
Device 1: Tesla V100-SXM2-16GB
Benchmarking: LM-opencl (length 7) [DES BS OpenCL/mask accel]... LWS=128 GWS=524288 DONE
Raw:    8057M c/s real, 7455M c/s virtual

Also a longer benchmark (it didn't make a difference):

$ john -te=60 -form=lm-opencl -mask='?a?a?a?a?a?a?a'
Device 1: Tesla V100-SXM2-16GB
Benchmarking: LM-opencl (length 7) [DES BS OpenCL/mask accel]... LWS=128 GWS=524288 DONE
Raw:    8041M c/s real, 7480M c/s virtual

Actual cracking:

$ john sample-hashes-windows -form=lm-opencl -mask='?a' -min-len=7 -max-len=7
Device 1: Tesla V100-SXM2-16GB
Using default input encoding: UTF-8
Using default target encoding: CP850
Loaded 2996 password hashes with no different salts (LM-opencl [DES BS OpenCL])
Remaining 254 password hashes with no different salts
LWS=128 GWS=524288
Press 'q' or Ctrl-C to abort, almost any other key for status
0g 0:00:00:10 0.20% (ETA: 16:47:27) 0g/s 1497Mp/s 1497Mc/s 11068046TC/s AAY=-0A
0g 0:00:00:20 2.11% (ETA: 15:40:23) 0g/s 7862Mp/s 7862Mc/s 2767010TC/s AA?/_FE
0g 0:00:00:31 4.22% (ETA: 15:36:49) 0g/s 10145Mp/s 10145Mc/s 1190110TC/s AAZ4R^1
0g 0:00:00:43 6.50% (ETA: 15:35:37) 0g/s 11261Mp/s 11261Mc/s 9437866TC/s AAL!@ZO
0g 0:00:00:51 8.05% (ETA: 15:35:09) 0g/s 11746Mp/s 11746Mc/s 13021228TC/s AAV01!N
0g 0:00:01:00 9.75% (ETA: 15:34:51) 0g/s 12106Mp/s 12106Mc/s 15679730TC/s AA$UJ=R
0g 0:00:01:12 12.03% (ETA: 15:34:34) 0g/s 12446Mp/s 12446Mc/s 18190537TC/s AA5*>4S
Session aborted

So even when comparing against 254 loaded hashes, we got much better speed than the benchmark achieved with the same mask over the same running time. (For some reason, the speed starts out low and keeps growing; the average speed would be even higher for a longer run.)

Checking nvidia-smi, I see that GPU utilization is somewhat low during actual cracking (around 75%) and even lower during the benchmark: after auto-tuning completes, it fluctuates between 0% and 80%, averaging perhaps 40%.
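One way to quantify these utilization averages rather than eyeballing nvidia-smi is to sample its CSV output while john runs and average the readings. A minimal sketch, assuming the output format of `nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader` (lines like `75 %`); the helper `parse_gpu_util` is hypothetical, not part of john or nvidia-smi:

```python
import statistics

def parse_gpu_util(lines):
    """Parse nvidia-smi CSV utilization lines like '75 %' into ints."""
    return [int(line.split()[0]) for line in lines if line.strip()]

# In a real run, collect one sample per second in another terminal with:
#   nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader -l 1
# Here, example values resembling the fluctuation seen during the benchmark:
samples = parse_gpu_util(["0 %", "80 %", "40 %", "35 %"])
print(statistics.mean(samples))  # average utilization over the sampled window
```

Averaging over the whole session (rather than glancing at instantaneous readings) also smooths out the 0%-to-80% fluctuation observed after auto-tuning.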

The lower GPU utilization during the benchmark explains the speed difference, but I am puzzled as to why utilization is lower there. We could also look into improving GPU utilization during actual cracking, and switch to a more suitable default mask for benchmarks.
