fix: localmode subprocess parent process not sending SIGTERM to child#2613
Conversation
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
268357f to
7b07b90
Compare
|
@ahsan-z-khan This PR doesn't contain and new dependencies |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
|
@shreyapandit Hi! can you have a look if the failing tests are relevant to the PR? And assign someone to review it? |
|
@gilinachum any updates? |
Sorry about that. Checking again :\ |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
|
|
||
| def down(self): | ||
| """Placeholder docstring""" | ||
| if os.name != "nt": |
There was a problem hiding this comment.
what is this check for?
There was a problem hiding this comment.
It's checking if the os is a windows machine, since I'm using unix commands to kill the processes
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
|
Hi @ranzvi, thanks for your contribution! Please follow the one time setup for enabling git hooks, and this should allow you to reproduce the errors, and fix the tests cc @ranzvi @gilinachum. The unit tests are failing on linting issues. |
shreyapandit
left a comment
There was a problem hiding this comment.
As outlined in comments
…zvi/sagemaker-python-sdk into fix-localmode-subprocess-termination
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
AWS CodeBuild CI Report
Powered by github-codebuild-logs, available on the AWS Serverless Application Repository |
|
@shreyapandit Thanks for the insights. I fixed the linting issues and executed the unittest locally and they pass. Do you have any idea why the rest of the ci is still failing? |
|
@ranzvi I am restarting your tests |
shreyapandit
left a comment
There was a problem hiding this comment.
LGTM! Thanks for your contribution
…aws#2613) Co-authored-by: Shreya Pandit <[email protected]> Co-authored-by: Jeniya Tabassum <[email protected]>
|
@ranzvi did this fix your issue? Interestingly, this new code introduces an error log for us during Commenting out these lines fixes our issue. For us, I'd be interesting in potentially rolling back these changes if that was OK with you. |
|
@wcarpenter1-godaddy This did fix our issue, we had a problem with the |
|
@ranzvi what os are you on? |
|
@wcarpenter1-godaddy MacOS, but this works on |
Issue #, if available:
Description of changes:
Due to an unknown issue, on some machines executing
predictor.delete_endpoint()simply hangs.This is because the
SIGTERMsent to the subprocess doesn't propagate to the child process.E.g pid
7521doesn't propagateSIGTERMto child7522.This means that
self.container.join()is waiting for a container which hasn't received a signal to stop execution and hangs indefinitely.By sending
SIGTERMto all child processes of thesubprocess.Popenprocess, we can guarantee that the container will be Gracefully stopped...Testing done:
Executed all unit and integration tests
Verified behaviour remains the same on machines without the bug
Verified bug fixed on faulty machines by executing multiple endpoints sequentially
Testing done:
Merge Checklist
Put an
xin the boxes that apply. You can also fill these out after creating the PR. If you're unsure about any of them, don't hesitate to ask. We're here to help! This is simply a reminder of what we are going to look for before merging your pull request.General
Tests
unique_name_from_baseto create resource names in integ tests (if appropriate)By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.