Skip to content

Short-circuit the check-hooks-apply hook #2935

Description

@mxr

search you tried in the issue tracker

check-hooks-apply

describe your actual problem

I was looking at the implementation of check_hooks_apply and noticed that while the filenames_for_hook function returns multiple matching files, the caller is only interested in whether the tuple is empty or not. I was curious if short-circuiting would have any benefit:

I prototyped a function Classifier.any_filenames_for_hook which short-circuits:

	def any_filenames_for_hook(self, hook: Hook) -> bool:
	    include_re, exclude_re = re.compile(hook.files), re.compile(hook.exclude)
	    types, types_or, exclude_types = (
	        frozenset(hook.types),
	        frozenset(hook.types_or),
	        frozenset(hook.exclude_types),
	    )
	
	    for i, name in enumerate(self.filenames):
	        if include_re.search(name) and not exclude_re.search(name):
	            return True
	
	        tags = self._types_for_file(name)
	        if (
	            tags >= types
	            and (not types or tags & types_or)
	            and not tags & exclude_types
	        ):
	            return True
	
	    return False
     for hook in all_hooks(config, Store()):
         if hook.always_run or hook.language == 'fail':
             continue
-        elif not classifier.filenames_for_hook(hook):
+        elif not classifier.any_filenames_for_hook(hook):
             print(f'{hook.id} does not apply to this repository')
             retv = 1

Just to try it out, I applied this change in my pre-commit install directory and profiled with hyperfine on a local repo. The repo has about 12000 files. The short-circuit version ran about 20% faster. I also added logging and saw the short-circuit version cut out after ~26 files, whereas the existing version looks at all 12000.

Old

$ hyperfine 'pre-commit run check-hooks-apply --all-files'
Benchmark 1: pre-commit run check-hooks-apply --all-files
  Time (mean ± σ):      1.253 s ±  0.066 s    [User: 0.546 s, System: 0.695 s]
  Range (min … max):    1.193 s …  1.419 s    10 runs

New

$ hyperfine 'pre-commit run check-hooks-apply --all-files'
Benchmark 1: pre-commit run check-hooks-apply --all-files
  Time (mean ± σ):     969.9 ms ±  65.6 ms    [User: 429.0 ms, System: 535.0 ms]
  Range (min … max):   873.1 ms … 1067.6 ms    10 runs

pre-commit --version

3.2.2

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type
    No fields configured for issues without a type.

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions