Skip to content

Revised sync mode WebClient/RTMClient to address concurrency issues#662

Merged
seratch merged 11 commits into
slackapi:masterfrom
seratch:new-sync-mode
May 15, 2020
Merged

Revised sync mode WebClient/RTMClient to address concurrency issues#662
seratch merged 11 commits into
slackapi:masterfrom
seratch:new-sync-mode

Conversation

@seratch

@seratch seratch commented Apr 27, 2020

Copy link
Copy Markdown
Contributor

Summary

WebClient and RTMClient with run_async=False have been having many issues such as #497 #530 #569 #630 #631 #633 #645 . This pull request fixes the following issues by revising the internals of WebClient and RTMClient when run_async=False.

The revised WebClient never relies on aiohttp when run_async=False (the default). In the case, the API client simply sends HTTP requests utilizing the Python standard APIs (urllib). If a user would like to fall back to the previous behavior using aiohttp in a blocking way, it's still possible to use it by setting use_sync_aiohttp=True in addition to run_async=False. But I strongly recommend switching to the new one.

RTMClient still tightly depends on asyncio for WebSocket management. Some error handling issues #558 #611 #522 are still unfixed. I'll address those separately.

As I mentioned above, #558 #611 #522 are outside of the scope of this pull request. They may be fixed in the forthcoming pull requests.

Requirements (place an x in each [ ])

@seratch seratch added Priority: High Version: 2x bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented semver:minor area:concurrency Issues and PRs related to concurrency rtm-client web-client labels Apr 27, 2020
@seratch seratch self-assigned this Apr 27, 2020
@codecov

codecov Bot commented Apr 27, 2020

Copy link
Copy Markdown

Codecov Report

Merging #662 into master will decrease coverage by 0.91%.
The diff coverage is 80.75%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master     #662      +/-   ##
==========================================
- Coverage   86.19%   85.28%   -0.92%     
==========================================
  Files          17       17              
  Lines        2413     2568     +155     
  Branches      198      237      +39     
==========================================
+ Hits         2080     2190     +110     
- Misses        262      284      +22     
- Partials       71       94      +23     
Impacted Files Coverage Δ
slack/web/__init__.py 100.00% <ø> (+40.90%) ⬆️
slack/web/base_client.py 75.63% <77.51%> (-5.04%) ⬇️
slack/web/slack_response.py 98.00% <87.50%> (-2.00%) ⬇️
slack/rtm/client.py 83.33% <94.44%> (+0.16%) ⬆️
slack/web/client.py 95.22% <0.00%> (-1.47%) ⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update af79b19...227f949. Read the comment docs.

Comment thread slack/rtm/client.py
callback, rtm_client=self, web_client=web_client, data=data
)

while future.running():

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Removing this part addresses #569

@RTMClient.run_on(event="message")
# even though run_async=False, handlers for RTM events can be a coroutine
async def send_reply(**payload):
def send_reply(**payload):

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

coroutines no longer work when run_async=False. I think it's much more valid.

self.web_client = WebClient(
token=self.bot_token,
run_async=False,
loop=asyncio.new_event_loop(), # TODO: this doesn't work without this

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unnecessary as run_async=False no longer uses an event loop internally



# This doesn't work
# Fixed in 2.6.0: This doesn't work

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WebClient w/ run_async=False is now thread-safe.

Comment thread slack/rtm/client.py Outdated
):
self.token = token.strip()
self.run_async = run_async
self.thread_pool_executor = ThreadPoolExecutor(

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is not so critical but it's just a minor improvement

Comment thread slack/web/__init__.py Outdated
import slack.version as ver


def get_user_agent():

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extracted to reuse in UrllibWebClient - the method needs to be outside the BaseClient to avoid circular import issues.

Comment thread slack/web/base_client.py
# Using this is no longer recommended - just keep this for backward-compatibility
return self._event_loop.run_until_complete(future)
else:
return self._sync_send(api_url=api_url, req_args=req_args)

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is the new way

Comment thread slack/web/slack_response.py Outdated
"Use WebClient with run_async=False and use_sync_aiohttp=False."
)
raise e.SlackRequestError(msg)
response = self._client._sync_request(

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this method is not a coroutine, using sync client also for run_async=True clients. Regarding run_async=True, it works anyways but we can revisit this to make it completely non-blocking in the future.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The changes here fixes #626

Comment thread slack/web/urllib_client.py Outdated
self.default_headers = default_headers
self.web_client = web_client

def api_call(

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's also possible to use this method for any API calls. As described in slack_response.py, pagination iterator doesn't work when directly using this class. To use the feature, developers should use WebClient with run_async=False. The reason I gave up supporting the interaction with this class is circular import issues with BaseClient.

Comment thread tests/helpers.py
"status_code": 200,
}
coro.return_value = SlackResponse(**data)
corofunc = Mock(name="mock_rtm_response", side_effect=asyncio.coroutine(coro))

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I removed some existing mock utilities depending on asyncio. The dependency caused the difficulties for detecting potential concurrency issues when run_async=False.

@seratch

seratch commented Apr 28, 2020

Copy link
Copy Markdown
Contributor Author

I've merged a fix for #650 in this pull request.

@seratch seratch mentioned this pull request Apr 28, 2020
9 tasks
self.fail("Raising an error here was expected")
except Exception as e:
self.assertEqual(str(e), "The server responded with: {'ok': False, 'error': 'invalid_auth'}")
self.assertEqual(

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

#662 fixes both #530 and #613

Comment thread slack/rtm/client.py Outdated
)
else:
self._execute_in_thread(callback, data)
await self._execute_in_thread(

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this change makes handling errors consistent for both run_async=True and False. This addresses both #530 and #613

@juan-vg

juan-vg commented Apr 29, 2020

Copy link
Copy Markdown

Looks good at high level 👍

@aoberoi aoberoi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@stevengill and i did a collaborative review, and i'm submitting it for the both of us. we didn't spend any time looking through the tests, but we did look at all of the implementation.

@seratch this looks like a great step towards resolving many of our concurrency issues. so excited to see this land! there are a few questions and comments in here that i think would be important to address before we land/release this change.

Comment thread slack/rtm/client.py

if inspect.iscoroutinefunction(callback):
if self.run_async or inspect.iscoroutinefunction(callback):
await callback(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if i understand this correctly, when run_async=True but the callback is not a coroutine, then the callback will be invoked with the await keyword. This seems to be a problem (ref):

This means that synchronous and asynchronous functions/callables are different types - you can't just mix and match them. Try to await a sync function and you'll see Python complain, forget to await an async function and you'll get back a coroutine object rather than the result you wanted.

Maybe we should change the or to an and, and also have another case for the situation I described above that throws an explicit error. Or maybe this isn't necessary because we expect developers to understand the runtime error (complaining as the author above put it) and know how to deal with it. IMHO having our own explicit error with a readable description would be easier to debug.

Another solution could be that we want to just call callback without await if we detect that its not a coroutine, no matter what run_async is set to. What do you think about this?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a great catch. In future major releases, we may be able to clearly say "if you go with run_sync=True, all callbacks must be coroutines." but it's not that timing when we release a minor version. Also, we don't need to have this change to resolve the existing concurrency issues.

I will just revert this change.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

reverted by d9aa238

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

with this change reverted, when run_async=False the callback will still be invoked as a coroutine (using the await). that will also lead to problems. it seems like the way to fix this would be to detect when run_async=True and !inspect.iscoroutinefunction(callback) to throw an error.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, letting developers notice doing wrong seems a nice addition. I've added some tests and added an error in 227f949

Comment thread slack/rtm/client.py Outdated
Comment thread slack/rtm/client.py Outdated
)
else:
self._execute_in_thread(callback, data)
await self._execute_in_thread(

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

From my understanding, we always block on the return of the callback when run_async=False. If that is the case, why are we using the ThreadPoolExecutor to invoke the callback on another thread? It seems that the executor will only have one worker/thread at any given time (since we always block on their completion within _execute_on_thread()). The same behavior could be accomplished by simply running the callback on the current thread, right?

Is there something about the consistency of _dispatch_event() that changes when we don't await on anything (by calling callback() synchronously on the same thread)?

Comment thread slack/rtm/client.py Outdated
Comment thread slack/rtm/client.py Outdated
Comment thread slack/web/urllib_client.py Outdated
Comment thread slack/web/urllib_client.py Outdated
Comment thread slack/web/slack_response.py Outdated
Comment thread slack/web/base_client.py Outdated
Comment thread slack/web/base_client.py Outdated

# If you see the following errors with #stop() method calls, call `RTMClient#async_stop()` instead
#
# /python3.8/asyncio/base_events.py:641:

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is unrelated to the code review suggestions. The tests have been passed but I overlooked this warning for two test cases.

Comment thread slack/rtm/client.py
if (
self.auto_reconnect
and not self._stopped
and error_code != "invalid_auth" # "invalid_auth" is unrecoverable

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By correcting the behavior of run_async=False descried in #530 and #613 , the RTMClient started doing exponential retries with unrecoverable errors. This condition is added to prevent it for the cases with invalid tokens.

@seratch

seratch commented May 13, 2020

Copy link
Copy Markdown
Contributor Author

@aoberoi @stevengill Thanks for your insightful review. I've updated this pull request and now it's ready for view again.

seratch added 10 commits May 14, 2020 13:30
* #530 Fixed by changing _execute_in_thread to be a coroutine
* #569 Resolved by removing a blocking loop (while future.running())
* #645 WebClient(run_async=False) no longer depends on asyncio by default
* #633 WebClient(run_async=False) doesn't internally depend on aiohttp
* #631 When run_async=True, RTM listner can be a normal function and WebClient is free from the event loop
* #630 WebClient no longer depends on aiohttp when run_async=False
* #497 Fixed when run_async=False / can be closed as we don't support run_async=True for this use case (in Flask)
* Get rid of thread pool executor as we no longer need threads internally
* Add async_stop() method for safer termination of RTMClient for the cases having unexpected exceptions in callbacks
* Revert the behavior of run_async=True to allow using non-async methods
* Simplify the Retry-After header value extraction code
* Merge UrllibWebClient's functionalities into BaseClient not to increase unnecesesary complexity such as circular import issues
* Call show_2020_01_deprecation() only once
* Test if values are dict and they're empty
* Rename _sync_request to _request_for_pagination to be clearer
@seratch

seratch commented May 14, 2020

Copy link
Copy Markdown
Contributor Author

I've rebased this branch on the latest master branch. It's ready to merge once I get reviewers' approvals.

@seratch seratch mentioned this pull request May 14, 2020
2 tasks

@aoberoi aoberoi left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just one comment that needs attention here: #662 (comment).

I don't think its critical, but probably worth looking at once more. Approved!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:concurrency Issues and PRs related to concurrency bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented rtm-client semver:minor Version: 2x web-client

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants