Make SDG batch size configurable via system profile #3157
mergify[bot] merged 1 commit into instructlab:main
cdoern left a comment:
looks good! Technically, if you are using the default value you don't need to add this to all of the profiles manually. One comment about the default in the click flag.
    default=None,
    help="Number of elements to process in each batch through the SDG pipeline. Enabled by default for the vLLM serving backend, with a batch size of 8 chosen based on experiments to optimize for throughput. Use 0 to disable.",
    cls=clickext.ConfigOption,
    default=DEFAULTS.BATCH_SIZE,
the default comes from the config when you set cls=... so no default is necessary here.
Thanks @cdoern. Hmm, I think I am missing something. If I run ilab data generate --debug-params, every parameter value is correct except the batch size. If I remove the default, I get:
batch_size: None [type: None, src: default]
Force-pushed from 576aa76 to f4e8c50
    default=DEFAULTS.BATCH_SIZE,
After removing this line and reinstalling, I get the following:
ilab data generate --debug-params
Parameters:
model_path: '/Users/charliedoern/.cache/instructlab/models/mistral-7b-instruct-v0.2.Q4_K_M.gguf' [type: str, src: default_map]
num_cpus: 10 [type: int, src: default_map]
chunk_word_count: 1000 [type: int, src: default_map]
num_instructions: -1 [type: int, src: default]
sdg_scale_factor: 30 [type: int, src: default_map]
taxonomy_path: '/Users/charliedoern/.local/share/instructlab/taxonomy' [type: str, src: default_map]
taxonomy_base: 'origin/main' [type: str, src: default_map]
output_dir: '/Users/charliedoern/.local/share/instructlab/datasets' [type: str, src: default_map]
quiet: False [type: bool, src: default]
endpoint_url: None [type: None, src: default]
api_key: 'no_api_key' [type: str, src: default]
yaml_rules: None [type: None, src: default]
server_ctx_size: 4096 [type: int, src: default]
tls_insecure: False [type: bool, src: default]
tls_client_cert: '' [type: str, src: default]
tls_client_key: '' [type: str, src: default]
tls_client_passwd: '' [type: str, src: default]
model_family: None [type: None, src: default]
pipeline: 'full' [type: str, src: default_map]
batch_size: 8 [type: int, src: default_map]
enable_serving_output: False [type: bool, src: default]
gpus: None [type: None, src: default_map]
max_num_tokens: 4096 [type: int, src: default_map]
detached: False [type: bool, src: default]
Thanks, it did work once I reinstalled.
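The behavior discussed in this thread can be illustrated with a minimal, standalone sketch. It models only the lookup order that click documents for option values (command line first, then the context's default_map, then the option's own default) and leaves out environment variables. It is not InstructLab's clickext.ConfigOption implementation; the function names resolve_param and debug_line and the lowercase source labels are hypothetical and exist only for this illustration.

```python
from typing import Any, Mapping, Tuple


def resolve_param(
    name: str,
    cli_value: Any,
    default_map: Mapping[str, Any],
    option_default: Any = None,
) -> Tuple[Any, str]:
    """Return (value, source) using a simplified click-style precedence.

    Assumption: a cli_value of None means the flag was not passed.
    """
    if cli_value is not None:
        return cli_value, "commandline"
    if name in default_map:
        # Value supplied by the loaded config (for example a system profile).
        return default_map[name], "default_map"
    # Nothing in the config: fall back to the option's own default.
    return option_default, "default"


def debug_line(name: str, value: Any, source: str) -> str:
    """Format one line in the style of the --debug-params output above."""
    type_name = "None" if value is None else type(value).__name__
    return f"{name}: {value!r} [type: {type_name}, src: {source}]"


if __name__ == "__main__":
    # Config provides batch_size, so no option default is needed.
    value, source = resolve_param("batch_size", None, {"batch_size": 8})
    print(debug_line("batch_size", value, source))

    # Config does not provide batch_size (for example, a stale install).
    value, source = resolve_param("batch_size", None, {})
    print(debug_line("batch_size", value, source))
```

Under this model, a value that shows up with src: default and type None means the config never supplied the key, which is consistent with the fix reported above (reinstalling so that the updated config defaults are picked up).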
@Mergifyio rebase
✅ Branch has been successfully rebased
Force-pushed from f4e8c50 to bb2eb90
Signed-off-by: Nikhil Palaskar <[email protected]>
Force-pushed from bb2eb90 to 4a309ce
@mergify backport release-v0.24
✅ Backports have been created
@mergify backport release-v0.23
✅ Backports have been created
…-3157 Make SDG batch size configurable via system profile (backport #3157)
…3207) Currently, the batch size for SDG is only configurable via the CLI, but a single batch size across all hardware profiles is not optimal. Different hardware configurations have varying capabilities, and using a fixed batch size can lead to under-utilization or over-utilization of resources during the SDG process.
To ensure efficient performance across different hardware, we should set the batch sizes independently in each system profile.
This is an automatic backport of pull request #3157 done by [Mergify](https://mergify.com).
Approved-by: courtneypacheco
Approved-by: ktdreyer
Currently, the batch size for SDG is only configurable via the CLI, but a single batch size across all hardware profiles is not optimal. Different hardware configurations have varying capabilities, and using a fixed batch size can lead to under-utilization or over-utilization of resources during the SDG process.
To ensure efficient performance across different hardware, we should set the batch sizes independently in each system profile.
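As a rough illustration of the idea in the description, the sketch below picks a batch size per profile and splits work into batches of that size. The profile names and their values are hypothetical and are not taken from InstructLab's actual system profiles. Only the default of 8 and the meaning of 0 as "disabled" come from the option's help text. Treating "disabled" as "process everything in one pass" is an assumption made for this example.

```python
from typing import Dict, Iterator, List, Sequence, TypeVar

T = TypeVar("T")

# Hypothetical profiles and values, for illustration only.
PROFILE_BATCH_SIZES: Dict[str, int] = {
    "small-laptop": 0,   # batching disabled
    "single-gpu": 8,
    "multi-gpu": 32,
}

# Default of 8 as stated in the option's help text.
DEFAULT_BATCH_SIZE = 8


def batch_size_for(profile: str) -> int:
    """Look up the batch size for a profile, falling back to the default."""
    return PROFILE_BATCH_SIZES.get(profile, DEFAULT_BATCH_SIZE)


def iter_batches(items: Sequence[T], batch_size: int) -> Iterator[List[T]]:
    """Yield consecutive batches of at most batch_size items.

    Assumption: batch_size == 0 disables batching, modeled here as a
    single pass over all items.
    """
    if batch_size < 0:
        raise ValueError("batch_size must be >= 0")
    if batch_size == 0:
        yield list(items)
        return
    for start in range(0, len(items), batch_size):
        yield list(items[start:start + batch_size])


if __name__ == "__main__":
    samples = list(range(20))
    for profile in ("small-laptop", "single-gpu", "unknown-profile"):
        size = batch_size_for(profile)
        batches = list(iter_batches(samples, size))
        print(f"{profile}: batch_size={size}, batches={len(batches)}")
```

Keeping the lookup separate from the batching loop means a value given on the command line could still override the profile value before iter_batches is called.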