Skip to content

Handle timeout required by sdbus API#73

Closed
ofacklam wants to merge 1 commit into
python-sdbus:masterfrom
ofacklam:master
Closed

Handle timeout required by sdbus API#73
ofacklam wants to merge 1 commit into
python-sdbus:masterfrom
ofacklam:master

Conversation

@ofacklam
Copy link
Copy Markdown

This PR adds correct handling of the timeout behavior mandated by the sdbus API.

According to the docs, two conditions must be met:

  • the file descriptor returned by sd_bus_get_fd() should be polled for the events indicated by sd_bus_get_events()
  • and the I/O call should block for that up to the timeout returned by sd_bus_get_timeout()

However, since loop.add_reader doesn't support a timeout parameter, we create a separate timeout event with loop.call_later.

More precisely, depending on the return value of sd_bus_get_timeout():

  • if UINT64_MAX: no timeout is created
  • if 0: processing called immediately with loop.call_soon
  • else: timeout created with loop.call_later

@igo95862
Copy link
Copy Markdown
Collaborator

Hello @ofacklam

I am aware that the sd-bus timeout is not implemented correctly right now: #53 (reply in thread)

However, I want to do more research in to how it is best to integrate with Python asyncio in a generic way.

@ofacklam
Copy link
Copy Markdown
Author

Hi @igo95862 !

Thanks for the quick reply and for the library in general, really handy!

Personally I didn't see any other way to integrate this with asyncio. If I understand correctly, the problem is not that you want to control the asyncio event loop's select policy, but rather you want to tell it to call your sdbus_process callback at the latest at that point in time.

However I would be glad to discuss other solutions as well :)

In our current project we are using the library with this patch since we were running into issues with the newer version 0.12.0. There are cases were both sd_bus_get_events() and sd_bus_get_timeout() return 0 to signal that data is already available and processing should be done immediately. In that case the current implementation deadlocks since no reader or writer will be registered anymore.

@igo95862
Copy link
Copy Markdown
Collaborator

I didn't know there was a deadlock which makes it a much bigger issue.

Looking at the sd-bus source code:

https://github.com/systemd/systemd/blob/941a12dcba57f6673230a9c413738c51374d2998/src/libsystemd/sd-bus/sd-bus.c#L2610

Looks like sd_bus_get_events returns POLLIN if its read queue is empty.

https://github.com/systemd/systemd/blob/941a12dcba57f6673230a9c413738c51374d2998/src/libsystemd/sd-bus/sd-bus.c#L2651

sd_bus_get_timeout returns zero when the read queue is not empty.

I wonder why calling the sd_bus_process repeatedly until it returns zero can leave some messages in the read queue?

@ofacklam
Copy link
Copy Markdown
Author

Looking at the code again, it seems I did some synchronous calls on the same connection previously (a kind of app initialization logic) before starting the async calls. That might be the reason behind the strange behavior, and I guess it's not really an expected use case. So the issue is probably less severe than I thought.

Still I would argue that relying on an implementation detail of sdbus regarding timing behavior seems somewhat dangerous.

@igo95862
Copy link
Copy Markdown
Collaborator

igo95862 commented Aug 2, 2024

Looking at the code again, it seems I did some synchronous calls on the same connection previously (a kind of app initialization logic) before starting the async calls.

Is there some initialization that cannot be done with async code? Will the deadlock still happen if all code is properly async?

I will look in to this timeout integration this weekend. I have a few ideas how to make it more native.

@igo95862
Copy link
Copy Markdown
Collaborator

igo95862 commented Aug 4, 2024

I looked in to the Linux APIs and asyncio and I have an interesting idea that I want to try.

I also want to setup benchmarks. One of my concerns is that constantly arming and disarming asyncio timer handles will have a negative effect on performance.

@igo95862
Copy link
Copy Markdown
Collaborator

I added initial benchmarks in 7e970a6.

There definitely is performance impact of this change:

Before:

sdbus_async_ping: Mean +- std dev: 36.5 us +- 3.2 us
sdbus_async_ping_gather: Mean +- std dev: 14.0 us +- 1.0 us

After:

sdbus_async_ping: Mean +- std dev: 41.6 us +- 3.8 us
sdbus_async_ping_gather: Mean +- std dev: 17.3 us +- 0.4 us

My plan is to use timerfd because it seems to be reconfigurable meaning the timer can be set multiple times without having to create any new Python objects.

@ofacklam
Copy link
Copy Markdown
Author

Your approach with timerfd sounds very promising. I'm looking forward to trying that out. Let me know if I can provide any assistance :)

@igo95862
Copy link
Copy Markdown
Collaborator

igo95862 commented Aug 18, 2024

I created initial draft in 6bee560. @ofacklam could you give it a test.

The performance impact seems to be much smaller:

.....................
sdbus_async_ping: Mean +- std dev: 37.0 us +- 2.3 us
.....................
sdbus_async_ping_gather: Mean +- std dev: 14.8 us +- 0.8 us

I also added a test there I set bus timeout value to 0.01 seconds and check that the DbusNoReplyError gets raised in time. This ensures that the sd-bus timeout is properly used.

Would be interesting adding a test that replicates your deadlock.

@ofacklam
Copy link
Copy Markdown
Author

Hi @igo95862

I cherry-picked your commit 6bee560 onto the 0.12.0 release.

With this change, my "deadlock" seems to have disappeared, so it solves the problem just as well as my proposal 🎉

Regarding normal timeouts of method calls, I can't forcibly reproduce it on my end (it only happens due to very rare hardware issues), however your unit test seems to cover that quite well.

Thanks for that fix, it looks really nice! Just one note: you rely on timerfd == 0 for checking whether it's initialized. Are you sure 0 can never be a valid timerfd value?

@igo95862
Copy link
Copy Markdown
Collaborator

Just one note: you rely on timerfd == 0 for checking whether it's initialized. Are you sure 0 can never be a valid timerfd value?

Looks like if you close stdin file descriptor 0 then 0 can be reused for any other file descriptor so it definitely can be an issue.

I originally wanted to use -1 but I encountered issues reliably setting that in SdBus init method. I will revisit this before releasing final version.

@igo95862
Copy link
Copy Markdown
Collaborator

igo95862 commented Sep 1, 2024

I added a PyObject* pointer to the bus struct which holds the timer fd Python integer in dcc61aa. This way it can reliably tell if the timer was initialized by comparing it to NULL. The struct will have both Python and C integer representation so that it can be easily accessed from both.

I think I try to make more progress this week and release the 0.12.1 soon with this fix.

@igo95862
Copy link
Copy Markdown
Collaborator

Alright I merged the fe49edb that addresses the bus timeouts.

@igo95862 igo95862 closed this Sep 22, 2024
@ofacklam
Copy link
Copy Markdown
Author

The latest commit looks very nice, I'm looking forward to testing with it.
I just left one comment on the commit regarding cleanup :)

@igo95862
Copy link
Copy Markdown
Collaborator

I will probably switch to the sd_event_get_fd in the future version as it provides a single file descriptor specifically designed to integrate with existing event loops.

https://man.archlinux.org/man/sd_bus_attach_event.3.en

https://man.archlinux.org/man/sd_event_get_fd.3.en

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants