This project is a simple proxy for Gitlab API. It is a Spring Boot application that forwards requests to Gitlab API and returns the response to the client.
The main goal of this project is to provide quick responses to the client by caching the responses from Gitlab API. Refreshing the cache can be done by using a refresh flag in the request.
Currently, the proxy has only one endpoint /groups that returns the list of groups from Gitlab API.
The client layer is split into two parts: the client and the retriable client. The client is responsible for preparing the requests to Gitlab API and caching the responses. The retriable client is responsible for retrying requests that fail. Retries are done using Spring Retry.
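To make the role of the retriable client more concrete, here is a minimal plain-Java sketch of a fixed-delay retry. It is not Spring Retry and not the project's code: the class `SimpleRetry`, its method `call`, and the parameters `maxAttempts` and `pauseMillis` are hypothetical names chosen for illustration.

```java
import java.util.function.Supplier;

/** Hypothetical helper: retries a call up to maxAttempts times with a fixed pause. */
final class SimpleRetry {
    private SimpleRetry() {}

    static <T> T call(Supplier<T> action, int maxAttempts, long pauseMillis) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();            // success: stop retrying
            } catch (RuntimeException e) {
                last = e;                       // remember the failure
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(pauseMillis);  // fixed pause before the next attempt
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt();
                    throw new IllegalStateException("Interrupted while waiting to retry", ie);
                }
            }
        }
        throw last;                             // all attempts failed
    }
}
```

In the real project the same behaviour would come from Spring Retry configuration rather than a hand-written loop; the sketch only shows what "retry on error" means for a single call.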
The proxy fetches the full list of groups from Gitlab API and caches the response. It uses keyset pagination because there are more than 50000 groups and Gitlab API does not support offset-based pagination for lists of that size.
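A keyset traversal can be sketched as follows, assuming the next page is advertised in a `Link` response header with `rel="next"`, which is how Gitlab API exposes keyset pages (the first request would typically carry `pagination=keyset&per_page=100&order_by=name&sort=asc`). The class `KeysetPaging`, the record `Page`, and the `fetchPage` function are hypothetical; the actual HTTP call is left to the caller.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;
import java.util.function.Function;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class KeysetPaging {
    /** Hypothetical page model: the group names plus the raw Link header of the response. */
    record Page(List<String> groupNames, String linkHeader) {}

    private static final Pattern NEXT = Pattern.compile("<([^>]+)>\\s*;\\s*rel=\"next\"");

    private KeysetPaging() {}

    /** Extracts the URL marked rel="next" from a Link header, if there is one. */
    static Optional<String> nextUrl(String linkHeader) {
        if (linkHeader == null) {
            return Optional.empty();
        }
        Matcher m = NEXT.matcher(linkHeader);
        return m.find() ? Optional.of(m.group(1)) : Optional.empty();
    }

    /** Follows rel="next" links until the last page and collects all group names. */
    static List<String> fetchAll(String firstUrl, Function<String, Page> fetchPage) {
        List<String> all = new ArrayList<>();
        Optional<String> url = Optional.of(firstUrl);
        while (url.isPresent()) {
            Page page = fetchPage.apply(url.get());
            all.addAll(page.groupNames());
            url = nextUrl(page.linkHeader());
        }
        return all;
    }
}
```

Passing the page fetcher as a function keeps the traversal independent of the HTTP client, so the retry layer can wrap `fetchPage` without changing the loop.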
A full cycle through the groups takes a long time. There are more than 600000 groups, so at 100 groups per request the full list needs about 6000 requests. At 1-2 seconds per request this is 6000-12000 seconds, i.e. roughly 2 to 3 hours for the full list.

The entire list is not needed for most use cases. A more common use case is to get a paginated list of groups filtered by a specific keyword. Currently, the proxy supports filtering by name, which is done by adding a query parameter to the request. This requires a traversal of the entire list, which results in O(n) time complexity. It can be optimized by adding an index: the solution is to add a treeset and to store the group entries in the cache individually, using the group name as the key. This way a lookup by name can be done in O(log(n)) time.
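The index idea can be illustrated with a small sketch built on `java.util.TreeSet`. It assumes that filtering means matching a name prefix: in a sorted set all names with the same prefix are adjacent, so the first match is found in O(log(n)) and only the matches themselves are iterated. A "contains" search would still need a scan. The class `GroupNameIndex` and its methods are hypothetical names, and names are lower-cased here to make the lookup case-insensitive.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.NavigableSet;
import java.util.TreeSet;

/** Hypothetical sorted index of group names; not thread-safe in this form. */
final class GroupNameIndex {
    private final NavigableSet<String> names = new TreeSet<>();

    void add(String name) {
        names.add(name.toLowerCase(Locale.ROOT));
    }

    /** Returns up to limit names starting with prefix, in sorted order. */
    List<String> findByPrefix(String prefix, int limit) {
        String from = prefix.toLowerCase(Locale.ROOT);
        List<String> result = new ArrayList<>();
        // tailSet positions at the first name >= prefix in O(log n)
        for (String name : names.tailSet(from, true)) {
            if (!name.startsWith(from) || result.size() >= limit) {
                break; // left the block of matching names, or the page is full
            }
            result.add(name);
        }
        return result;
    }
}
```

Because the background refresh and client requests would touch the index concurrently, a real implementation would more likely use `ConcurrentSkipListSet`, which offers the same `tailSet` navigation.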
The proxy can work in three modes: normal, fallback, and bulkhead. It starts in bulkhead mode and switches to normal mode when the cache is fully populated.
The client will constantly cycle through the groups in the background and update the treeset. The treeset will be used to serve filtered and paginated results to the client. When an entry becomes obsolete, it will no longer show up in the results, and after a configured time it will be removed from the treeset by a dedicated listener of eviction events. In case of a refresh request, a direct request to Gitlab API will be made and the affected entries will also be updated.
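The lifecycle of an entry (fresh, obsolete but still stored, evicted) can be sketched with two time limits. This is an assumption-laden illustration, not the project's eviction listener: the class `GroupFreshness` and the parameters `obsoleteAfter` and `evictAfter` are hypothetical, and time is passed in explicitly to keep the logic easy to test.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

/** Hypothetical tracker of when each group was last seen by the background cycle. */
final class GroupFreshness {
    private final Map<String, Instant> lastSeen = new ConcurrentHashMap<>();
    private final Duration obsoleteAfter; // older entries are hidden from results
    private final Duration evictAfter;    // older entries are removed entirely

    GroupFreshness(Duration obsoleteAfter, Duration evictAfter) {
        this.obsoleteAfter = obsoleteAfter;
        this.evictAfter = evictAfter;
    }

    /** Called whenever the background cycle or a refresh request sees the group. */
    void markSeen(String name, Instant now) {
        lastSeen.put(name, now);
    }

    /** An entry is visible while it is known and not yet obsolete. */
    boolean isVisible(String name, Instant now) {
        Instant seen = lastSeen.get(name);
        return seen != null && Duration.between(seen, now).compareTo(obsoleteAfter) <= 0;
    }

    /** Removes expired entries and returns their names so the index can drop them too. */
    List<String> evictExpired(Instant now) {
        List<String> evicted = new ArrayList<>();
        Iterator<Map.Entry<String, Instant>> it = lastSeen.entrySet().iterator();
        while (it.hasNext()) {
            Map.Entry<String, Instant> entry = it.next();
            if (Duration.between(entry.getValue(), now).compareTo(evictAfter) > 0) {
                String name = entry.getKey();
                it.remove();
                evicted.add(name);
            }
        }
        return evicted;
    }
}
```

The names returned by `evictExpired` play the role of eviction events: whatever listens to them would remove the same names from the treeset.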
If an error occurs while accessing Gitlab API, the proxy switches to fallback mode and returns the last known state of the cache. If the service is still unavailable after a specified time, the proxy switches to bulkhead mode. This time should not exceed the eviction time minus the time elapsed since the start of the last successful full refresh. Alternatively, evicted entries can be marked as stale and the proxy can continue to serve the stale data. Refresh is not possible in fallback mode.
If an error occurs while accessing Gitlab API and there has been no successful full refresh yet, i.e. the cache is not fully populated, the proxy uses bulkhead mode. In this mode the proxy passes requests directly to Gitlab API, as this is the only way to get the full data. The full cache is not used in this mode, but caching of single pages and filtered requests can be added to improve performance.
In both fallback and bulkhead mode, the proxy will try to access Gitlab API in the background and switch back to normal mode when the service is available again. The person or team responsible for the service should be notified about the error.
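The mode transitions described above can be summarized as a small state machine. The sketch below is one possible reading of the rules, not the project's implementation: the class `ModeTracker`, its event methods, and the `fallbackLimit` parameter are hypothetical, and it assumes that recovery returns to normal mode only if a full refresh has succeeded at some point.

```java
import java.time.Duration;
import java.time.Instant;

/** Hypothetical tracker for the normal / fallback / bulkhead modes. */
final class ModeTracker {
    enum Mode { NORMAL, FALLBACK, BULKHEAD }

    private final Duration fallbackLimit; // how long stale cache data may be served
    private Mode mode = Mode.BULKHEAD;    // the proxy starts in bulkhead mode
    private boolean cachePopulated = false;
    private Instant failingSince = null;

    ModeTracker(Duration fallbackLimit) {
        this.fallbackLimit = fallbackLimit;
    }

    synchronized Mode mode() {
        return mode;
    }

    /** A full cycle through all groups finished successfully. */
    synchronized void onFullRefreshSucceeded() {
        cachePopulated = true;
        failingSince = null;
        mode = Mode.NORMAL;
    }

    /** A request to Gitlab API failed at the given time. */
    synchronized void onGitlabError(Instant now) {
        if (!cachePopulated) {
            mode = Mode.BULKHEAD; // nothing to fall back to
            return;
        }
        if (failingSince == null) {
            failingSince = now;
        }
        boolean tooLong = Duration.between(failingSince, now).compareTo(fallbackLimit) > 0;
        mode = tooLong ? Mode.BULKHEAD : Mode.FALLBACK;
    }

    /** A background probe reached Gitlab API again. */
    synchronized void onGitlabRecovered() {
        failingSince = null;
        mode = cachePopulated ? Mode.NORMAL : Mode.BULKHEAD;
    }
}
```

A notification to the responsible team could be triggered from `onGitlabError` whenever the mode actually changes.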
Docker images can be built using the Dockerfile or using gradle. The proxy can be run using docker-compose or using docker.
The proxy can also be used to add more features like rate limiting, security, monitoring, etc. It is also a simple project that can be used as a starting point to build more complex projects. See the section Further improvements for more details.
- Gitlab Proxy
- Next steps
- Table of Contents
- Example of a target architecture
- Build and run the application
- Test the application
- Debug
- Further improvements
- Distributed cache scenario
- Add more features like rate limiting, etc
- Add more endpoints to the proxy like /projects, /users, etc
- Add more tests like integration tests, contract tests, etc
- Add more security like authentication, authorization, etc
- Add more monitoring like metrics, alerts, etc
- Add more CI/CD like pipelines, deployments, etc
- Add more logging like structured logs, log aggregation, etc
- Add more error handling like circuit breakers, rate limiters, etc
- Add more performance improvements like async, batching, etc
- Add more configurations like timeouts, retries, etc
- Add more environments like dev, test, prod, etc
- Add more examples like how to use the proxy in a real project
- Add more comments in the code
- Add built-in support for OpenAPI, Swagger, etc
- Further features
It will be run from the ghcr.io image.

    docker compose up
    docker buildx build --platform amd64 -t gitlab-proxy .
    gradle bootBuildImage --imageName=gitlab-proxy
    docker run -p 8080:8080 --platform amd64 -ti gitlab-proxy
    curl http://localhost:8080/groups
    docker buildx build --platform amd64 --progress=plain -t gitlab-proxy --no-cache .
    docker run -p 8080:8080 --platform amd64 -ti --entrypoint /bin/sh gitlab-proxy

The proxy can be used in a distributed cache scenario. For example, the proxy can be used to cache the responses from Gitlab API in a Redis cluster. It can be done using a tool like Redisson, etc. Alternatively, distributed synchronization can be achieved using Terracotta, which provides clustering capabilities for Ehcache.
Rate limiting can be used to prevent abuse of the proxy. For example, the proxy can limit the number of requests per second, per minute, per user, per IP, etc. It can be done using a tool like the RateLimiter from Guava or Resilience4j, etc.
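As an illustration of what such a limiter does internally, here is a minimal token bucket in plain Java. It is a sketch of the general technique rather than the API of any of the libraries mentioned: the class `TokenBucket` and its parameters are hypothetical, and the current time is passed in as nanoseconds (for example from `System.nanoTime()`).

```java
/** Hypothetical token bucket: allows bursts up to capacity, refilled at a steady rate. */
final class TokenBucket {
    private final long capacity;
    private final double refillPerNano;
    private double tokens;
    private long lastRefillNanos;

    TokenBucket(long capacity, double refillPerSecond, long nowNanos) {
        this.capacity = capacity;
        this.refillPerNano = refillPerSecond / 1_000_000_000.0;
        this.tokens = capacity;        // start with a full bucket
        this.lastRefillNanos = nowNanos;
    }

    /** Takes one token if available; returns false if the request should be rejected. */
    synchronized boolean tryAcquire(long nowNanos) {
        long elapsed = Math.max(0L, nowNanos - lastRefillNanos);
        tokens = Math.min(capacity, tokens + elapsed * refillPerNano);
        lastRefillNanos = Math.max(lastRefillNanos, nowNanos);
        if (tokens >= 1.0) {
            tokens -= 1.0;
            return true;
        }
        return false;
    }
}
```

Limits per user or per IP could be built on top of this by keeping one bucket per key, for example in a `ConcurrentHashMap`.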
Authentication and authorization can be added using a tool like Spring Security, OAuth, etc.
Monitoring can be done using a tool like Prometheus, Grafana, etc. For example, the proxy can expose metrics like the number of requests, the response time, the number of errors, etc., using a tool like Micrometer.
CI/CD pipelines and deployments can be set up using a tool like Jenkins, Gitlab CI, Github Actions, etc.
Logging can be improved using a tool like Logback, Log4j, etc. Structured logs are important because they make it easier to search, filter, etc. These logs can also be sent to a log aggregator like ELK, Splunk, etc. Errors can be aggregated using a tool like Sentry, Rollbar, etc.
Error handling can be extended using a tool like Resilience4j, Hystrix, etc. Resilience4j is a lightweight fault tolerance library inspired by Netflix Hystrix, but designed for functional programming. Resilience4j provides higher-order functions (decorators) to enhance any functional interface, lambda expression or method reference with a Circuit Breaker, Rate Limiter or Bulkhead. Bulkhead is a pattern used to prevent a single failing component from bringing down the entire system.
Performance can be improved using a tool like Reactor, RxJava, etc. Reactor is a fourth-generation Reactive library for building non-blocking applications on the JVM based on the Reactive Streams Specification. It could be used to improve the performance of the proxy by making the requests to Gitlab API asynchronous. Batch requests can be used to reduce the number of requests to Gitlab API. For example, instead of making 10 requests to Gitlab API to get the details of 10 groups, the proxy can make a single request that returns the details of all 10 groups.
Configuration of timeouts, retries, etc. can be externalized using a tool like Spring Cloud Config, Consul, etc. Spring Cloud Config provides server and client-side support for externalized configuration in a distributed system. With the Config Server, you have a central place to manage external properties for applications across all environments. The concepts on both client and server map identically to the Spring Environment and PropertySource abstractions, so they fit very well with Spring applications but can be used with any application. Another option is to use Consul. Consul is a distributed, highly available, and data center-aware solution to connect and configure applications across dynamic, distributed infrastructure. Consul provides a flexible solution for service discovery, configuration, and segmentation that can be used with any application.
Separate environments such as dev, test, and prod can be set up using a tool like Docker, Kubernetes, etc. Another option is to use Spring Profiles.
More examples can be added, for example how to use the proxy in a frontend application, how to use the proxy in a backend application, etc.
It is always good to have comments in the code to explain why the code is there, what the code is doing, etc.
OpenAPI support can be added using a tool like Springdoc, Swagger, etc. It is important to have documentation of the API to make the API easier to use.
Pagination can be used to limit the number of groups returned by the proxy. For example, the proxy can return the first 10 groups, the next 10 groups, etc. It can be done using Spring Data's Pageable.
Sorting can be used to sort the groups returned by the proxy. For example, the proxy can sort the groups by name, by id, etc. It can be done using Spring Data's Sort.
Filtering can be used to filter the groups returned by the proxy. For example, the proxy can return only the groups that contain a specific word in the name, etc. It can be done using Spring Data's Specification.
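The combination of filtering, sorting, and pagination can be sketched in plain Java with streams. This is only an in-memory illustration of the three steps and does not use the Spring Data types mentioned above; the class `GroupQuery`, the record `Group`, and the method `page` with a zero-based `pageIndex` are hypothetical.

```java
import java.util.Collection;
import java.util.Comparator;
import java.util.List;
import java.util.Locale;

final class GroupQuery {
    /** Hypothetical minimal group model. */
    record Group(long id, String name) {}

    private GroupQuery() {}

    /** Filters by a case-insensitive substring, sorts by name then id, returns one page. */
    static List<Group> page(Collection<Group> groups, String nameContains,
                            int pageIndex, int pageSize) {
        String needle = nameContains.toLowerCase(Locale.ROOT);
        Comparator<Group> byName =
                Comparator.comparing(Group::name, String.CASE_INSENSITIVE_ORDER);
        return groups.stream()
                .filter(g -> g.name().toLowerCase(Locale.ROOT).contains(needle)) // filtering
                .sorted(byName.thenComparingLong(Group::id))                     // sorting
                .skip((long) pageIndex * pageSize)                               // pagination
                .limit(pageSize)
                .toList();
    }
}
```

Sorting before `skip` and `limit` matters: without a stable order, consecutive pages could overlap or miss entries.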
