Revised sync mode WebClient/RTMClient to address concurrency issues by seratch · Pull Request #662 · slackapi/python-slack-sdk

seratch · 2020-04-27T15:03:46Z

Summary

WebClient and RTMClient with run_async=False have had many issues, such as #497 #530 #569 #630 #631 #633 #645. This pull request fixes the issues listed below by revising the internals of WebClient and RTMClient when run_async=False.

The revised WebClient never relies on aiohttp when run_async=False (the default). In that case, the API client simply sends HTTP requests using the Python standard library (urllib). If a user would like to fall back to the previous behavior, which uses aiohttp in a blocking way, that is still possible by setting use_sync_aiohttp=True in addition to run_async=False. But I strongly recommend switching to the new one.
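As a rough illustration of what a urllib-only call path can look like, here is a minimal sketch. It is not the SDK's implementation: `build_request` and `send` are hypothetical names, and the form-encoded body with a Bearer token header is an assumption about how one might call the Web API.

```python
import json
import urllib.parse
import urllib.request


def build_request(token: str, api_method: str, params: dict) -> urllib.request.Request:
    """Build a form-encoded POST request for a Web API method (hypothetical helper)."""
    url = f"https://slack.com/api/{api_method}"
    body = urllib.parse.urlencode(params).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/x-www-form-urlencoded",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def send(req: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request on the calling thread; no event loop is involved."""
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


# Building the request does not touch the network; only send() does.
req = build_request("xoxb-test", "chat.postMessage", {"channel": "C123", "text": "hi"})
```

Because each call creates its own request and connection, nothing here is shared between threads, which is the property the sync mode is after.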

RTMClient still tightly depends on asyncio for WebSocket management. Some error handling issues #558 #611 #522 are still unfixed. I'll address those separately.

* #530 Sync client swallows auth errors: fixed by changing _execute_in_thread to be a coroutine
* #569 Python RTMClient Causes 100% CPU: resolved by removing a blocking loop (while future.running())
* #645 Unclosed client session: WebClient(run_async=False) no longer depends on asyncio by default
* #633 Web client and rtm client should be fully split, and web client should not use asyncio: WebClient(run_async=False) doesn't internally depend on aiohttp
* #631 AsyncIO loop is not being shared from RTMClient to WebClient: when run_async=True, an RTM listener can be a normal function and WebClient is free from the event loop
* #630 Getting concurrent.futures._base.TimeoutError while using channels_invite api method: WebClient no longer depends on aiohttp when run_async=False
* #497 How to make simultaneous Slack API calls without reinitializing the client?: fixed when run_async=False; with this fix, the issue can be closed, as we don't support run_async=True for this use case (in Flask)
* #626 asyncio RuntimeError stacktrace in SlackResponse, when getting paginated responses: fixed by changing the internals of SlackResponse to always use UrllibWebClient

As I mentioned above, #558 #611 #522 are outside of the scope of this pull request. They may be fixed in the forthcoming pull requests.

Requirements (place an `x` in each `[ ]`)

I've read and understood the Contributing Guidelines and have done my best effort to follow them.
I've read and agree to the Code of Conduct.

codecov · 2020-04-27T15:12:48Z

Codecov Report

Merging #662 into master will decrease coverage by 0.91%.
The diff coverage is 80.75%.

@@            Coverage Diff             @@
##           master     #662      +/-   ##
==========================================
- Coverage   86.19%   85.28%   -0.92%     
==========================================
  Files          17       17              
  Lines        2413     2568     +155     
  Branches      198      237      +39     
==========================================
+ Hits         2080     2190     +110     
- Misses        262      284      +22     
- Partials       71       94      +23

Impacted Files	Coverage Δ
slack/web/__init__.py	`100.00% <ø> (+40.90%)`	⬆️
slack/web/base_client.py	`75.63% <77.51%> (-5.04%)`	⬇️
slack/web/slack_response.py	`98.00% <87.50%> (-2.00%)`	⬇️
slack/rtm/client.py	`83.33% <94.44%> (+0.16%)`	⬆️
slack/web/client.py	`95.22% <0.00%> (-1.47%)`	⬇️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update af79b19...227f949. Read the comment docs.

seratch · 2020-04-27T15:05:50Z

-                callback, rtm_client=self, web_client=web_client, data=data
-            )
-
-            while future.running():


Removing this part addresses #569
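To show why a polling loop like this can pin a CPU core, here is a small self-contained sketch. The loop body is not visible in the diff above, so the `pass` below is an assumption, and all function names are hypothetical.

```python
import time
from concurrent.futures import Future, ThreadPoolExecutor


def slow_callback() -> str:
    time.sleep(0.05)
    return "done"


def wait_by_spinning(future: Future) -> str:
    # Anti-pattern: the loop re-checks the future as fast as it can, so the
    # calling thread burns CPU for as long as the callback is running.
    while future.running():
        pass  # assumed loop body
    return future.result()


def wait_by_blocking(future: Future) -> str:
    # result() parks the calling thread until the callback finishes.
    return future.result()


with ThreadPoolExecutor(max_workers=1) as pool:
    spun = wait_by_spinning(pool.submit(slow_callback))
    blocked = wait_by_blocking(pool.submit(slow_callback))
```

Both helpers return the same value; the difference is only in how the waiting thread spends its time.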

seratch · 2020-04-27T15:07:16Z

        @RTMClient.run_on(event="message")
-        # even though run_async=False, handlers for RTM events can be a coroutine
-        async def send_reply(**payload):
+        def send_reply(**payload):


coroutines no longer work when run_async=False. I think it's much more valid.

seratch · 2020-04-27T15:07:46Z

            self.web_client = WebClient(
                token=self.bot_token,
                run_async=False,
-                loop=asyncio.new_event_loop(),  # TODO: this doesn't work without this


unnecessary as run_async=False no longer uses an event loop internally

seratch · 2020-04-27T15:08:19Z



-# This doesn't work
+# Fixed in 2.6.0: This doesn't work


WebClient w/ run_async=False is now thread-safe.

seratch · 2020-04-27T15:08:43Z

    ):
        self.token = token.strip()
        self.run_async = run_async
+        self.thread_pool_executor = ThreadPoolExecutor(


this is not so critical but it's just a minor improvement

seratch · 2020-04-27T15:12:20Z

+import slack.version as ver
+
+
+def get_user_agent():


Extracted to reuse in UrllibWebClient - the method needs to be outside the BaseClient to avoid circular import issues.
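For readers unfamiliar with the pattern, a module-level helper of this kind might look like the sketch below. The exact format of the SDK's User-Agent string is not shown in this thread, so the layout, `CLIENT_NAME`, and `CLIENT_VERSION` here are placeholders rather than the real values.

```python
import platform
import sys

CLIENT_NAME = "slackclient"  # placeholder name
CLIENT_VERSION = "0.0.0"     # placeholder; the real code reads slack.version


def get_user_agent() -> str:
    """Build a User-Agent string from client, Python, and OS details."""
    client = f"{CLIENT_NAME}/{CLIENT_VERSION}"
    python = "Python/{v.major}.{v.minor}.{v.micro}".format(v=sys.version_info)
    system = f"{platform.system()}/{platform.release()}"
    return " ".join([client, python, system])


user_agent = get_user_agent()
```

Keeping the function in its own module with no imports from the client classes means any client can import it without creating an import cycle.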

seratch · 2020-04-27T15:12:59Z

+                # Using this is no longer recommended - just keep this for backward-compatibility
+                return self._event_loop.run_until_complete(future)
+        else:
+            return self._sync_send(api_url=api_url, req_args=req_args)


this is the new way

seratch · 2020-04-27T15:16:04Z

+                        "Use WebClient with run_async=False and use_sync_aiohttp=False."
+                    )
+                    raise e.SlackRequestError(msg)
+                response = self._client._sync_request(


As this method is not a coroutine, it uses the sync client for run_async=True clients as well. With run_async=True it works anyway, but we can revisit this to make it completely non-blocking in the future.

The changes here fix #626

seratch · 2020-04-27T15:17:48Z

+        self.default_headers = default_headers
+        self.web_client = web_client
+
+    def api_call(


It's also possible to use this method for any API calls. As described in slack_response.py, pagination iterator doesn't work when directly using this class. To use the feature, developers should use WebClient with run_async=False. The reason I gave up supporting the interaction with this class is circular import issues with BaseClient.
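For context, cursor-based pagination of the kind the iterator performs can be sketched as follows. This is a simplified stand-in, not the SlackResponse code: `iterate_pages` and `fake_fetch` are hypothetical, and the canned pages replace real API calls.

```python
from typing import Callable, Dict, Iterator, Optional


def iterate_pages(fetch_page: Callable[[Optional[str]], Dict]) -> Iterator[Dict]:
    """Yield pages until the response carries no next_cursor."""
    cursor = None
    while True:
        page = fetch_page(cursor)
        yield page
        cursor = page.get("response_metadata", {}).get("next_cursor")
        if not cursor:
            break


# Canned responses standing in for two pages of an API listing.
PAGES = {
    None: {"ok": True, "members": ["U1", "U2"], "response_metadata": {"next_cursor": "c2"}},
    "c2": {"ok": True, "members": ["U3"], "response_metadata": {"next_cursor": ""}},
}


def fake_fetch(cursor: Optional[str]) -> Dict:
    return PAGES[cursor]


members = [m for page in iterate_pages(fake_fetch) for m in page["members"]]
```

The iterator needs a way to issue the follow-up request for each cursor, which is why it has to hold a reference to a client that can make synchronous calls.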

seratch · 2020-04-27T15:23:05Z

-        "status_code": 200,
-    }
-    coro.return_value = SlackResponse(**data)
-    corofunc = Mock(name="mock_rtm_response", side_effect=asyncio.coroutine(coro))


I removed some existing mock utilities that depended on asyncio. That dependency made it difficult to detect potential concurrency issues when run_async=False.

seratch · 2020-04-28T06:20:15Z

I've merged a fix for #650 in this pull request.

seratch · 2020-04-28T09:10:52Z

            self.fail("Raising an error here was expected")
        except Exception as e:
-            self.assertEqual(str(e), "The server responded with: {'ok': False, 'error': 'invalid_auth'}")
+            self.assertEqual(


#662 fixes both #530 and #613

seratch · 2020-04-28T09:12:22Z

                    )
                else:
-                    self._execute_in_thread(callback, data)
+                    await self._execute_in_thread(


this change makes handling errors consistent for both run_async=True and False. This addresses both #530 and #613
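The effect of adding the await can be seen in a small sketch that is independent of the SDK. All names are hypothetical; the point is only that an exception raised in a worker thread stays hidden unless something waits on the future.

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


class CallbackError(Exception):
    """Hypothetical error raised by a user callback."""


def failing_callback() -> None:
    raise CallbackError("invalid_auth")


async def dispatch_fire_and_forget(pool: ThreadPoolExecutor) -> str:
    # The returned Future is dropped, so the exception stored in it is never raised.
    pool.submit(failing_callback)
    return "no error seen"


async def dispatch_awaited(pool: ThreadPoolExecutor) -> str:
    loop = asyncio.get_running_loop()
    # Awaiting the executor future re-raises the callback's exception here.
    await loop.run_in_executor(pool, failing_callback)
    return "no error seen"


def run_both():
    with ThreadPoolExecutor(max_workers=1) as pool:
        first = asyncio.run(dispatch_fire_and_forget(pool))
        try:
            second = asyncio.run(dispatch_awaited(pool))
        except CallbackError as err:
            second = f"raised: {err}"
    return first, second


outcome = run_both()
```

In the first variant the caller never learns that the callback failed; in the second the same failure surfaces at the await.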

juan-vg · 2020-04-29T13:57:29Z

Looks good at high level 👍

aoberoi

@stevengill and i did a collaborative review, and i'm submitting it for the both of us. we didn't spend any time looking through the tests, but we did look at all of the implementation.

@seratch this looks like a great step towards resolving many of our concurrency issues. so excited to see this land! there are a few questions and comments in here that i think would be important to address before we land/release this change.

aoberoi · 2020-05-12T21:40:00Z


-                if inspect.iscoroutinefunction(callback):
+                if self.run_async or inspect.iscoroutinefunction(callback):
                    await callback(


if i understand this correctly, when run_async=True but the callback is not a coroutine, then the callback will be invoked with the await keyword. This seems to be a problem (ref):

This means that synchronous and asynchronous functions/callables are different types - you can't just mix and match them. Try to await a sync function and you'll see Python complain, forget to await an async function and you'll get back a coroutine object rather than the result you wanted.

Maybe we should change the or to an and, and also have another case for the situation I described above that throws an explicit error. Or maybe this isn't necessary because we expect developers to understand the runtime error (complaining as the author above put it) and know how to deal with it. IMHO having our own explicit error with a readable description would be easier to debug.

Another solution could be that we want to just call callback without await if we detect that its not a coroutine, no matter what run_async is set to. What do you think about this?

This is a great catch. In future major releases, we may be able to clearly say "if you go with run_async=True, all callbacks must be coroutines," but a minor release is not the right time for that. Also, we don't need this change to resolve the existing concurrency issues.

I will just revert this change.

reverted by d9aa238

with this change reverted, when run_async=False the callback will still be invoked as a coroutine (using the await). that will also lead to problems. it seems like the way to fix this would be to detect when run_async=True and !inspect.iscoroutinefunction(callback) to throw an error.

Thanks, letting developers know they're doing something wrong seems like a nice addition. I've added some tests and added an error in 227f949
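The check discussed in this thread could be sketched roughly like this. It is a simplified model, not the code in 227f949: `dispatch` and `CallbackTypeError` are hypothetical names, and the sync branch calls the callback directly instead of going through the SDK's own machinery.

```python
import asyncio
import inspect


class CallbackTypeError(Exception):
    """Hypothetical error for a plain function registered with run_async=True."""


async def dispatch(callback, run_async: bool, **payload):
    if inspect.iscoroutinefunction(callback):
        return await callback(**payload)
    if run_async:
        # Fail with a readable message instead of awaiting a non-awaitable result.
        raise CallbackTypeError(
            f"{callback.__name__} must be defined with 'async def' when run_async=True"
        )
    return callback(**payload)


async def async_handler(**payload):
    return "async"


def sync_handler(**payload):
    return "sync"


results = [
    asyncio.run(dispatch(async_handler, run_async=True)),
    asyncio.run(dispatch(sync_handler, run_async=False)),
]
try:
    asyncio.run(dispatch(sync_handler, run_async=True))
    rejected = False
except CallbackTypeError:
    rejected = True
```

An explicit error of this kind tells the developer which handler is at fault, rather than leaving them to interpret a generic TypeError from the await.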

aoberoi · 2020-05-12T22:17:55Z

                    )
                else:
-                    self._execute_in_thread(callback, data)
+                    await self._execute_in_thread(


From my understanding, we always block on the return of the callback when run_async=False. If that is the case, why are we using the ThreadPoolExecutor to invoke the callback on another thread? It seems that the executor will only have one worker/thread at any given time (since we always block on their completion within _execute_in_thread()). The same behavior could be accomplished by simply running the callback on the current thread, right?

Is there something about the consistency of _dispatch_event() that changes when we don't await on anything (by calling callback() synchronously on the same thread)?

seratch · 2020-05-13T03:22:54Z

+
+            # If you see the following errors with #stop() method calls,  call `RTMClient#async_stop()` instead
+            #
+            # /python3.8/asyncio/base_events.py:641:


This is unrelated to the code review suggestions. The tests passed, but I overlooked this warning in two test cases.

seratch · 2020-05-13T05:39:44Z

+                if (
+                    self.auto_reconnect
+                    and not self._stopped
+                    and error_code != "invalid_auth"  # "invalid_auth" is unrecoverable


Once the behavior of run_async=False described in #530 and #613 was corrected, the RTMClient started doing exponential retries on unrecoverable errors. This condition is added to prevent that for cases with invalid tokens.
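The condition in the diff above can be read as a small predicate. The sketch below restates it as a standalone function; `should_reconnect` and `UNRECOVERABLE_ERRORS` are hypothetical names, and only "invalid_auth" is taken from the diff.

```python
UNRECOVERABLE_ERRORS = {"invalid_auth"}  # only this code appears in the diff


def should_reconnect(auto_reconnect: bool, stopped: bool, error_code: str) -> bool:
    """Retry only when reconnection is enabled, the client is running,
    and the error is one a retry could plausibly fix."""
    return auto_reconnect and not stopped and error_code not in UNRECOVERABLE_ERRORS


decisions = {
    "transient": should_reconnect(True, False, "internal_error"),
    "bad_token": should_reconnect(True, False, "invalid_auth"),
    "stopped": should_reconnect(True, True, "internal_error"),
}
```

Retrying with an invalid token can never succeed, so skipping the backoff loop in that case avoids pointless reconnect attempts.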

seratch · 2020-05-13T06:17:55Z

@aoberoi @stevengill Thanks for your insightful review. I've updated this pull request and now it's ready for review again.

* #530 Fixed by changing _execute_in_thread to be a coroutine
* #569 Resolved by removing a blocking loop (while future.running())
* #645 WebClient(run_async=False) no longer depends on asyncio by default
* #633 WebClient(run_async=False) doesn't internally depend on aiohttp
* #631 When run_async=True, RTM listener can be a normal function and WebClient is free from the event loop
* #630 WebClient no longer depends on aiohttp when run_async=False
* #497 Fixed when run_async=False / can be closed as we don't support run_async=True for this use case (in Flask)

* Get rid of thread pool executor as we no longer need threads internally
* Add async_stop() method for safer termination of RTMClient for the cases having unexpected exceptions in callbacks
* Revert the behavior of run_async=True to allow using non-async methods
* Simplify the Retry-After header value extraction code

* Merge UrllibWebClient's functionalities into BaseClient not to increase unnecessary complexity such as circular import issues
* Call show_2020_01_deprecation() only once
* Test if values are dict and they're empty
* Rename _sync_request to _request_for_pagination to be clearer

seratch · 2020-05-14T04:37:35Z

I've rebased this branch on the latest master branch. It's ready to merge once I get reviewers' approvals.

aoberoi

Just one comment that needs attention here: #662 (comment).

I don't think its critical, but probably worth looking at once more. Approved!

seratch added Priority: High Version: 2x bug M-T: A confirmed bug report. Issues are confirmed when the reproduction steps are documented semver:minor area:concurrency Issues and PRs related to concurrency rtm-client web-client labels Apr 27, 2020

seratch requested review from aoberoi, shaydewael and stevengill April 27, 2020 15:03

seratch self-assigned this Apr 27, 2020

seratch commented Apr 27, 2020


seratch added this to the 2.6.0 milestone Apr 27, 2020

seratch mentioned this pull request Apr 28, 2020

Deprecation warnings to channels/groups/mpim/im API method calls #650

Closed

9 tasks

seratch mentioned this pull request Apr 28, 2020

Allow boolean kwargs #560

Closed

9 tasks

seratch commented Apr 28, 2020


seratch mentioned this pull request Apr 30, 2020

Fix #611 - stop propagating user exceptions to the connection management layer #665

Closed

2 tasks

This was referenced May 7, 2020

Fix #670 by removing all None values from dict #671

Merged

Fix #611 - stop propagating user exceptions to the connection management layer #679

Merged

seratch added a commit that referenced this pull request May 11, 2020

Add test data files for #662

057b0b7

This was referenced May 11, 2020

Fix #650 Deprecation warnings to channels/groups/mpim/im API method calls #680

Merged

Fix #560 Allow bool kwargs #681

Merged

aoberoi suggested changes May 13, 2020


seratch commented May 13, 2020


seratch mentioned this pull request May 14, 2020

Web client and rtm client should be fully split, and web client should not use asyncio #633

Closed

9 tasks

seratch added 10 commits May 14, 2020 13:30

Modify unit tests not to depend on asyncio

86a15f9

Change marks for fixed integration tests

f9de16b

Change marks for fixed integration tests

baef25d

Add test cases for pagination with use_sync_aiohttp=True

8ed73d2

Add invalid_auth handling in RTMClient

2bea8d7

Improve debug logging

130f3a0

Add a formatter change

c8866d4

seratch mentioned this pull request May 14, 2020

Add v2.6.0rc1 release note #687

Merged

2 tasks

aoberoi approved these changes May 15, 2020


Raise an error when trying to run a normal function with run_async=True

227f949

seratch merged commit 58134fe into slackapi:master May 15, 2020

seratch deleted the new-sync-mode branch October 18, 2020 22:49

davies-w mentioned this pull request Nov 8, 2023

How do I get the actual response when an error raised? #1422

Closed
