Add best practices when using load balancer #325
|
I wouldn’t list that as required; I’d rather present it as the easiest path. Session management is ultimately at the discretion of whoever implements the MCP server. Storing sessions in Redis and sharing them across instances can also solve the problem, but it introduces a more complex architecture. |
The session management mentioned here is not simply a matter of sharing some data. According to the MCP protocol definition, establishing a session takes a three-step handshake: the client sends a request to the SSE endpoint, the server returns the message endpoint information over SSE, and the client sends a notification to the message endpoint. This establishes a channel (the client sends through the message endpoint, and the server sends through SSE), and this channel is what we call a session. The problem is that if the load balancer sends a message request to a server that does not hold the SSE connection (that is, one that cannot recognize the sessionId), the request will fail. |
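To make the failure mode concrete, here is a minimal Python sketch of the flow described above. It is a toy in-memory model, not the MCP SDK: `Instance`, `open_sse`, and `post_message` are hypothetical names, and the 202/404 status codes are assumptions chosen for illustration.

```python
import itertools
import uuid


class Instance:
    """Toy stand-in for one MCP server replica (hypothetical, not the MCP SDK)."""

    def __init__(self, name):
        self.name = name
        self.sessions = set()  # session IDs whose SSE stream lives on this replica

    def open_sse(self):
        # Handshake steps 1 and 2: the client opens the SSE stream and the server
        # answers with the message endpoint, which carries the new session ID.
        session_id = uuid.uuid4().hex
        self.sessions.add(session_id)
        return session_id

    def post_message(self, session_id):
        # Step 3 and every later message only succeed on the replica that holds
        # the stream. The status codes are assumptions for illustration.
        return 202 if session_id in self.sessions else 404


replicas = [Instance("a"), Instance("b")]
round_robin = itertools.cycle(replicas)

# The load balancer sends the SSE request to the first replica ...
sse_replica = next(round_robin)
session_id = sse_replica.open_sse()

# ... and, treating requests as stateless, sends the message to the next one.
message_replica = next(round_robin)
status_round_robin = message_replica.post_message(session_id)

# With stickiness, the message goes back to the replica that owns the stream.
status_sticky = sse_replica.post_message(session_id)

print(status_round_robin, status_sticky)
```

In this model the round-robin request fails because replica `b` has never seen the session ID, while the sticky request succeeds.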
|
According to the current spec: https://modelcontextprotocol.io/specification/2025-03-26/basic/transports#session-management, the session ID is returned during the init phase. This implies that the ID is not negotiated anywhere else in the protocol. Unless I’m missing something? |
Yes. So when I say session stickiness is required, I mean that a load balancer that claims to support MCP should ideally provide stickiness without having to parse SSE events to extract the MCP sessionId, for example with something like the sticky cookie in nginx. |
|
I agree with you on LB/Gateway — ultimately, it's not required, just recommended. That said, the solution I mentioned earlier works fine as well. Where I’ve landed with this discussion is that we need an MCP Gateway specification, and it should be defined here. |
Yes, we need a specification to guide how load balancers or gateways support MCP. I have seen many different implementations so far. What they have in common is that they all use the gateway as the MCP server and communicate with business services through Redis pub/sub, other event buses, or message middleware, which does not help build a standard, uniform ecosystem. |
|
We have the same problem here. Redis pub/sub is doable, but it is too much work to wire it into an existing application running on 10/20 pods ONLY to support the MCP protocol. For now, we solved it by dedicating a single pod to MCP connections; in most cases this is more than enough to support internal chat calls. Also, it's worth mentioning that an HTTP stateless proposal is coming, and hopefully it will make this a lot easier: |
|
This is a really good discussion - do you think it would be a good idea to collaborate on this within the Hosting Working Group to publish the best practices for the community here?: https://github.com/modelcontextprotocol-community/working-groups |
I will look at it later, after my holiday :) |
|
so |
A working solution is to use a sticky cookie (https://nginx.org/en/docs/http/ngx_http_upstream_module.html#sticky_cookie) to make sure the client stays on the selected instance. A special reminder: the client should support a cookie jar to persist this information between requests. |
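As a rough illustration of the cookie jar requirement on the client side, here is a small sketch using only the Python standard library. The host `lb.example.test`, the cookie name `route`, and the `FakeResponse` helper are made up for this example; the load balancer's reply is simulated offline rather than fetched over the network.

```python
import email.message
import http.cookiejar
import urllib.request


class FakeResponse:
    """Minimal response object exposing only what CookieJar.extract_cookies reads."""

    def __init__(self, headers):
        self._headers = headers

    def info(self):
        return self._headers


jar = http.cookiejar.CookieJar()
# For real traffic, send every request through this opener so the jar is used.
opener = urllib.request.build_opener(urllib.request.HTTPCookieProcessor(jar))

# Simulate the load balancer's reply to the SSE request (hypothetical host and cookie).
sse_request = urllib.request.Request("http://lb.example.test/sse")
reply_headers = email.message.Message()
reply_headers["Set-Cookie"] = "route=instance-a; Path=/"
jar.extract_cookies(FakeResponse(reply_headers), sse_request)

# The jar attaches the stored cookie to the follow-up message request.
message_request = urllib.request.Request("http://lb.example.test/messages")
jar.add_cookie_header(message_request)
sticky_header = message_request.get_header("Cookie")

# A client without a jar sends no cookie, so the balancer cannot pin it.
bare_request = urllib.request.Request("http://lb.example.test/messages")
bare_header = bare_request.get_header("Cookie")

print(sticky_header, bare_header)
```

The point is only that the cookie set on the SSE response has to be replayed on the message requests; any HTTP client with an equivalent cookie store would do.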
|
I agree with what @Joffref said:
So I will close, but thank you for the pull request! 😃 |
Motivation and Context
When an MCP server is deployed with multiple replicas (especially as an external provider), there is usually a load balancer in front of it. Unless a persistent connection is used, the load balancer typically assumes that requests are stateless. However, MCP is a stateful protocol. When communicating over HTTP (SSE), the instance that receives a message request must be the instance that established the SSE connection.
Therefore, as a best practice, when using a load balancer, session stickiness must be guaranteed, and when the instance that established the SSE connection no longer exists, the request must be rejected (which means that consistent hashing is not suitable in this scenario, since it would route the request to a different instance).
In addition, since session stickiness is commonly implemented with cookies, MCP clients may need to store cookies and send them back on later requests.
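The difference between rejecting and remapping can be sketched as follows. This is an illustration under assumptions, not how any particular load balancer works: `pick_by_hash` uses a plain modulo hash as a simplified stand-in for consistent hashing (both always return some live instance), and `pick_sticky` and the instance names are hypothetical.

```python
import hashlib


def pick_by_hash(session_id, instances):
    """Hash-based selection: always returns some live instance."""
    digest = hashlib.sha256(session_id.encode()).digest()
    return instances[int.from_bytes(digest[:8], "big") % len(instances)]


def pick_sticky(pinned, live_instances):
    """Sticky selection: return the pinned instance, or None to signal rejection."""
    return pinned if pinned in live_instances else None


instances = ["a", "b", "c"]
session_id = "session-123"

# The replica that established the SSE connection for this session.
owner = pick_by_hash(session_id, instances)

# That replica goes away (crash, scale-down, redeploy).
survivors = [name for name in instances if name != owner]

# Hashing silently remaps the session to a replica that never held the stream.
remapped = pick_by_hash(session_id, survivors)

# Stickiness notices that the pinned replica is gone, so the request can be rejected.
sticky_result = pick_sticky(owner, survivors)

print(owner, remapped, sticky_result)
```

With hashing the request still reaches a replica, but one that cannot recognize the session; with the sticky lookup the load balancer can fail the request early so the client can re-establish the session.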
How Has This Been Tested?
As described above, with an MCP server deployed with multiple replicas, I used OpenResty as the load balancer to compare Round Robin with cookie-based session stickiness, together with a transport whose HTTP client supports a cookie jar. Round Robin did not work with multiple sessions, while cookie-based session stickiness worked well.
Breaking Changes
None
Types of changes
Checklist
Additional context