chore: remove transformers cap and raise training floor #3264

mergify[bot] merged 2 commits into instructlab:main
Conversation
booxter left a comment:

Please include transformers cap removal.
@booxter updated, and also removed the comment above it.

More issues in the training library with transformers?
@booxter It looks like the issue is that the new transformers version changed its API. You can see how the current implementation of the full train pipeline is getting the output. So the solution here is either to change the logic to just read the model output as a tuple, or to rename the parameter. Long term, we should just move this function to live in the training library instead, so we aren't duplicating this logic.
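The two output conventions described above can be sketched as follows. This is an illustrative sketch, not the actual pipeline code: `extract_loss` is a hypothetical helper, and `SimpleNamespace` stands in for a transformers `ModelOutput`-style object.

```python
from types import SimpleNamespace

def extract_loss(output):
    """Handle both conventions: a plain tuple with the loss first (older
    API), or an object exposing a named `.loss` attribute (newer API)."""
    if isinstance(output, tuple):
        loss = output[0]  # tuple-style: loss is the first element
    else:
        loss = getattr(output, "loss", None)  # attribute-style output
    if loss is None:
        raise ValueError("Loss is None. Ensure the model's output contains a valid loss.")
    return loss

# tuple-style output
print(extract_loss((0.25, "logits")))           # 0.25
# attribute-style output (stand-in for a ModelOutput)
print(extract_loss(SimpleNamespace(loss=0.5)))  # 0.5
```

Reading the output defensively like this keeps the pipeline working across both transformers API generations until the logic is consolidated into the training library.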
Signed-off-by: Jaideep Rao <[email protected]>
Currently, full training relies on some APIs from instructlab/training which aren't actually helping with CPU legacy training. This commit resolves that by moving the function needed for loss correction when training with gradient accumulation to live inside the full train function, since it only uses a reduced amount of instructlab/instructlab capability. Signed-off-by: Oleg Silkin <[email protected]>
    student_model_arch = get_model_arch(pathlib.Path(params["model_path"]))
    if ctx.obj.config.general.use_legacy_tmpl:
        train_args.use_legacy_tmpl = True
    else:
Does this change mean that legacy tokenizer config won't be detected anymore?

We don't need the change, or we don't need support for legacy auto-detection? And is this change related to the transformers cap removal / training dependency update?
    loss = None
    if isinstance(output, tuple):
        loss = output[0]
        if len(output[0].shape) != 0:
nit: `if len(loss.shape) != 0`? (Or even `if loss.shape`?)
        loss = output.loss
    if loss is None:
        raise ValueError(
            "Loss is None. Ensure the model's output contains a valid loss."
These errors will bubble up to users through a CLI error. I am not sure they have the context to interpret the errors meaningfully. As a user, how do I "ensure the model's output contains a valid loss"? Are you asking the user to check their model inputs / config, perhaps? If so, that is what should be communicated.

This applies to the other errors here too, I think?

@booxter This shouldn't be happening, so if it does happen, we want users to report it to us. They are also welcome to look at the internals of the CLI and potentially contribute back. We shouldn't assume that our users are incapable of being technical.

I see. So it's always a bug in the code and not in the inputs from the user? If so, the suggestion to "ensure the model's output" seems misplaced, and we may instead want to direct the user to report the issue. (And maybe dump some more info to include with the report?)

It could be a few things: the model implementation could have changed, or the transformers API itself could have changed (as in this case). So when that happens, we just have this as a safeguard.
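The earlier nit about checking `loss.shape` once `loss` is bound, instead of re-indexing `output[0]`, can be illustrated with a minimal stand-in tensor. In the real code this would be a torch tensor, and the `.mean()` reduction is an assumption about what the elided branch does, not copied from the diff:

```python
class FakeTensor:
    """Minimal stand-in for a torch tensor: just enough surface area
    (`shape` and `mean`) to demonstrate the scalar-vs-vector check."""
    def __init__(self, values):
        self.values = list(values)

    @property
    def shape(self):
        return (len(self.values),)

    def mean(self):
        return sum(self.values) / len(self.values)

loss = FakeTensor([1.0, 3.0])
# Per the nit: test the bound name, not output[0] again.
if len(loss.shape) != 0:   # non-scalar loss (assumed reduction follows)
    loss = loss.mean()
print(loss)  # 2.0
```

Using the already-bound `loss` avoids silently diverging if the binding above it ever changes.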
Removes the temporary cap placed on the transformers library and bumps the training dependency to >=v0.8.1.
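A rough sketch of the dependency change this PR describes, as it might appear in a requirements file. The package names, the previous cap value, and the previous floor are assumptions for illustration; only the new `>=0.8.1` floor comes from the PR description:

```
# before (illustrative; the actual cap value is not shown on this page)
transformers<X.Y              # temporary cap
instructlab-training>=0.7.0   # hypothetical previous floor

# after
transformers                  # cap removed
instructlab-training>=0.8.1   # new floor per the PR description
```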
Checklist:
- conventional commits.