multiprocessing

Collection of parallelizable examples

Feature counts

This example builds a dict of word counts. It showcases three different approaches, found in:

feature_counts_dict_serial.py
feature_counts_dict_joblib_parallel_and_reduce.py
feature_counts_dict_joblib_parallel_and_reduce_in_chunks.py

Serial version (bottlenecked by the completely serial operation)

python feature_counts_dict_serial.py

num docs = 1131400

time overall  117.3199 sec

len(vocabulary.items())---> 130107
(vocabulary['from'], vocabulary['gift'])---> (2267000, 6600)
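As a rough illustration of what the serial approach amounts to, here is a minimal sketch of a single loop that updates one shared dict. It is not the code in feature_counts_dict_serial.py: the function names, the whitespace tokenizer, and the three toy documents are assumptions made for the example.

```python
from collections import Counter


def tokenize(doc):
    # Hypothetical tokenizer: lowercase, then split on whitespace.
    return doc.lower().split()


def count_features_serial(docs):
    """Build one vocabulary dict by scanning every document in a single loop."""
    vocabulary = Counter()
    for doc in docs:
        # Each document is processed only after the previous one has finished.
        vocabulary.update(tokenize(doc))
    return dict(vocabulary)


# Toy corpus, invented for the example.
docs = ["a gift from me", "mail from you", "from a to b"]
vocabulary = count_features_serial(docs)
print(len(vocabulary), (vocabulary["from"], vocabulary["gift"]))
```

Because every document passes through the same loop, the running time grows with the corpus and no extra cores are used.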

Parallel version working one element at a time (bottlenecked by the reduce step)

python feature_counts_dict_joblib_parallel_and_reduce.py

num docs = 1131400

time build vocabularies  42.0592 sec
time aggregate vocabularies  35.4708 sec
time overall  77.5302 sec

len(partial_vocabularies)---> 1131400
len(vocabulary.items())---> 130107
(vocabulary['from'], vocabulary['gift'])---> (2267000, 6600)
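The timings above show why this version stalls: the map step returns one partial dict per document (1131400 of them), and merging them is a serial loop in the parent process. The sketch below reproduces that structure under stated assumptions. It uses the standard library's concurrent.futures in place of joblib, a thread pool so that it runs anywhere (a ProcessPoolExecutor would be the natural choice for CPU-bound counting), and hypothetical function names and toy documents.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_one(doc):
    # Map step: one partial vocabulary per document
    # (hypothetical whitespace tokenizer).
    return Counter(doc.lower().split())


def reduce_vocabularies(partial_vocabularies):
    # Reduce step: runs serially, once per partial dict.
    vocabulary = Counter()
    for partial in partial_vocabularies:
        vocabulary.update(partial)  # Counter.update adds counts
    return dict(vocabulary)


def count_features_per_doc(docs, executor):
    """Submit one task per document, then merge every partial result."""
    partial_vocabularies = list(executor.map(count_one, docs))
    return partial_vocabularies, reduce_vocabularies(partial_vocabularies)


# Toy corpus, invented for the example.
docs = ["a gift from me", "mail from you", "from a to b"]
with ThreadPoolExecutor(max_workers=2) as executor:
    partial_vocabularies, vocabulary = count_features_per_doc(docs, executor)

print(len(partial_vocabularies), len(vocabulary))
```

The number of partial dicts equals the number of documents, so the cost of the reduce step scales with the corpus rather than with the number of workers.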

Parallel version working in minibatches

Note that in this implementation the reduce step is no longer a meaningful bottleneck: aggregating the vocabularies takes 0.236 sec of the 26.9552 sec total, so almost all of the time is spent in the parallel part.

python feature_counts_dict_joblib_parallel_and_reduce_in_chunks.py

num docs = 1131400

time build vocabularies  26.7191 sec
time aggregate vocabularies  0.236 sec
time overall  26.9552 sec

len(partial_vocabularies)---> 12
len(vocabulary.items())---> 130107
(vocabulary['from'], vocabulary['gift'])---> (2267000, 6600)
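The drop from 1131400 partial dicts to 12 is what makes the aggregation cheap. A minimal sketch of the chunked idea follows, again as an assumption-laden illustration rather than the repository's code: it swaps joblib for the standard library's concurrent.futures, uses a thread pool so that it runs anywhere (a ProcessPoolExecutor would suit CPU-bound work), and the helper names, chunking rule, and toy documents are hypothetical.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(docs, n_chunks):
    # Contiguous slices of near-equal size; yields at most n_chunks chunks.
    size = max(1, -(-len(docs) // n_chunks))  # ceiling division
    return [docs[i:i + size] for i in range(0, len(docs), size)]


def count_chunk(chunk):
    # Map step: each task counts a whole chunk and returns a single dict
    # (hypothetical whitespace tokenizer).
    vocabulary = Counter()
    for doc in chunk:
        vocabulary.update(doc.lower().split())
    return vocabulary


def count_features_in_chunks(docs, n_chunks, executor):
    """Count per chunk in the pool, then merge only n_chunks partial dicts."""
    chunks = split_into_chunks(docs, n_chunks)
    partial_vocabularies = list(executor.map(count_chunk, chunks))
    vocabulary = Counter()
    for partial in partial_vocabularies:
        vocabulary.update(partial)
    return partial_vocabularies, dict(vocabulary)


# Toy corpus, invented for the example.
docs = ["a gift from me", "mail from you", "from a to b", "gift for you"]
with ThreadPoolExecutor(max_workers=2) as executor:
    partial_vocabularies, vocabulary = count_features_in_chunks(docs, 2, executor)

print(len(partial_vocabularies), len(vocabulary))
```

With this layout the reduce step merges one dict per chunk, so its cost depends on the number of chunks and the vocabulary size, not on the number of documents.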

