As a reference, the performance of dirhash is benchmarked against the shell command:

```shell
find path/to/folder -type f -print0 | sort -z | xargs -0 md5 | md5
```

(the top answer to the SO question: Linux: compute a single hash for a given folder & contents?)
Each test case contains 1 GiB of random data, split equally into 8, 1k or 32k files, arranged either in a flat structure or nested in a binary tree of depth 8.
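
As an illustration only (this is not the actual `run.py`, and the helper name and file layout are made up for the example), a test case of this form could be generated roughly like this:

```python
import os


def write_random_files(root, total_bytes, n_files, depth=0):
    """Write `n_files` files of random data summing to `total_bytes` under `root`.

    With depth=0 all files go directly into `root` (flat layout); with depth > 0
    they are spread over the leaves of a binary tree of that depth (nested layout).
    """
    file_size = total_bytes // n_files
    leaves = [root]
    for _ in range(depth):
        leaves = [os.path.join(d, sub) for d in leaves for sub in ("0", "1")]
    for d in leaves:
        os.makedirs(d, exist_ok=True)
    for i in range(n_files):
        leaf = leaves[i % len(leaves)]
        with open(os.path.join(leaf, f"file_{i}.bin"), "wb") as f:
            f.write(os.urandom(file_size))


# e.g. a "flat_1k_1MB"-style case: 1 GiB split into 1024 files of 1 MiB each
write_random_files("flat_1k_1MB", total_bytes=2**30, n_files=2**10, depth=0)
```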
For a fair comparison, the CLI version of dirhash was used (including startup
time for loading Python modules, etc.).
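
For orientation, the corresponding calls through the Python API look roughly like the sketch below. This is a sketch, not the benchmark script itself; the `jobs` argument (parallel file hashing) is taken from the current `dirhash` API and may differ between versions.

```python
from dirhash import dirhash

# Single-process hash of the directory, comparable to the shell reference above
checksum = dirhash("path/to/folder", "md5")

# Same hash computed with 8 worker processes (the "8 workers" rows below)
checksum_parallel = dirhash("path/to/folder", "md5", jobs=8)

# Parallelism does not change the resulting hash
assert checksum == checksum_parallel
```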
For full details and reproducibility, see (or run) the run.py script; its output is
found in results.csv. The results below were generated on a MacBook Pro (2018):
- 2.2 GHz Intel Core i7 (`sysctl -n hw.physicalcpu hw.logicalcpu` -> 6, 12)
- 16 GB 2400 MHz DDR4
- APPLE SSD AP0512M
| Implementation      | Test Case       | Time (s) | Speed up |
|---------------------|-----------------|----------|----------|
| shell reference     | flat_1k_1MB     | 2.29     | -> 1.0   |
| dirhash             | flat_1k_1MB     | 1.67     | 1.36     |
| dirhash (8 workers) | flat_1k_1MB     | 0.48     | 4.73     |
| shell reference     | nested_32k_32kB | 6.82     | -> 1.0   |
| dirhash             | nested_32k_32kB | 3.43     | 2.00     |
| dirhash (8 workers) | nested_32k_32kB | 1.14     | 6.00     |