search you tried in the issue tracker
check-hooks-apply
describe your actual problem
I was looking at the implementation of check_hooks_apply and noticed that while the filenames_for_hook function returns multiple matching files, the caller is only interested in whether the tuple is empty or not. I was curious if short-circuiting would have any benefit:
I prototyped a function Classifier.any_filenames_for_hook which short-circuits:
def any_filenames_for_hook(self, hook: Hook) -> bool:
include_re, exclude_re = re.compile(hook.files), re.compile(hook.exclude)
types, types_or, exclude_types = (
frozenset(hook.types),
frozenset(hook.types_or),
frozenset(hook.exclude_types),
)
for i, name in enumerate(self.filenames):
if include_re.search(name) and not exclude_re.search(name):
return True
tags = self._types_for_file(name)
if (
tags >= types
and (not types or tags & types_or)
and not tags & exclude_types
):
return True
return False
for hook in all_hooks(config, Store()):
if hook.always_run or hook.language == 'fail':
continue
- elif not classifier.filenames_for_hook(hook):
+ elif not classifier.any_filenames_for_hook(hook):
print(f'{hook.id} does not apply to this repository')
retv = 1
Just to try it out, I applied this change in my pre-commit install directory and profiled with hyperfine on a local repo. The repo has about 12000 files. The short-circuit version ran about 20% faster. I also added logging and saw the short-circuit version cut out after ~26 files, whereas the existing version looks at all 12000.
Old
$ hyperfine 'pre-commit run check-hooks-apply --all-files'
Benchmark 1: pre-commit run check-hooks-apply --all-files
Time (mean ± σ): 1.253 s ± 0.066 s [User: 0.546 s, System: 0.695 s]
Range (min … max): 1.193 s … 1.419 s 10 runs
New
$ hyperfine 'pre-commit run check-hooks-apply --all-files'
Benchmark 1: pre-commit run check-hooks-apply --all-files
Time (mean ± σ): 969.9 ms ± 65.6 ms [User: 429.0 ms, System: 535.0 ms]
Range (min … max): 873.1 ms … 1067.6 ms 10 runs
pre-commit --version
3.2.2
search you tried in the issue tracker
check-hooks-apply
describe your actual problem
I was looking at the implementation of
check_hooks_applyand noticed that while thefilenames_for_hookfunction returns multiple matching files, the caller is only interested in whether the tuple is empty or not. I was curious if short-circuiting would have any benefit:I prototyped a function
Classifier.any_filenames_for_hookwhich short-circuits:for hook in all_hooks(config, Store()): if hook.always_run or hook.language == 'fail': continue - elif not classifier.filenames_for_hook(hook): + elif not classifier.any_filenames_for_hook(hook): print(f'{hook.id} does not apply to this repository') retv = 1Just to try it out, I applied this change in my pre-commit install directory and profiled with
hyperfineon a local repo. The repo has about 12000 files. The short-circuit version ran about 20% faster. I also added logging and saw the short-circuit version cut out after ~26 files, whereas the existing version looks at all 12000.Old
New
pre-commit --version
3.2.2