foobarops/gitlab-proxy

Gitlab Proxy

This project is a simple proxy for the GitLab API. It is a Spring Boot application that forwards requests to the GitLab API and returns the responses to the client.

The main goal of this project is to respond to clients quickly by caching responses from the GitLab API. The cache can be refreshed by setting a refresh flag in the request.
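The caching behaviour described above can be sketched as follows. This is only an illustration under stated assumptions: the class name `RefreshableCache` and the `upstream` supplier (which stands in for the call to the GitLab API) are hypothetical, and the project itself may rely on a cache library instead of a hand-written class.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical illustration, not the project's actual class.
class RefreshableCache {
    private final Supplier<List<String>> upstream; // stands in for the call to the GitLab API
    private volatile List<String> cached;          // null until the first load

    RefreshableCache(Supplier<List<String>> upstream) {
        this.upstream = upstream;
    }

    // Returns the cached value unless it is missing or the caller asked for a refresh.
    synchronized List<String> get(boolean refresh) {
        if (refresh || cached == null) {
            cached = List.copyOf(upstream.get());
        }
        return cached;
    }
}
```

Calling `get(false)` repeatedly hits the upstream only once, while `get(true)` always goes back to the upstream and replaces the cached value.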

Currently, the proxy has only one endpoint, /groups, which returns the list of groups from the GitLab API. The client layer is split into two parts: the client and the retriable client. The client prepares the requests to the GitLab API and caches the responses. The retriable client retries the requests in case of errors. Retries are done using Spring Retry.
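To illustrate what the retriable client does, here is a plain-Java retry loop. It is a stand-in, not the project's code: the project uses Spring Retry, and the `SimpleRetry` class, its `call` method and the `maxAttempts` parameter are hypothetical names chosen for this sketch.

```java
import java.util.function.Supplier;

// Hypothetical stand-in for the retriable client; the project itself uses Spring Retry.
class SimpleRetry {
    // Calls the action up to maxAttempts times and rethrows the last failure.
    static <T> T call(Supplier<T> action, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        RuntimeException last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return action.get();
            } catch (RuntimeException e) {
                last = e; // remember the failure and try again
            }
        }
        throw last;
    }
}
```

A production retry would normally also wait between attempts (backoff) and retry only on errors that are likely to be temporary.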

Performance considerations

The proxy gets the full list of groups from the GitLab API and caches the response. It uses keyset pagination to get the list because there are more than 50,000 groups, and the GitLab API does not support offset-based pagination for lists of that size.
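With keyset pagination, each response tells the client where the next page starts, so the traversal is a loop that follows cursors until none is left. The sketch below shows only that loop. `KeysetPage`, `KeysetTraversal` and the `fetchPage` function are hypothetical; a real client would issue HTTP requests and read the next-page link from the response.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical types for illustration; the real client would issue HTTP requests.
class KeysetPage {
    final List<String> items;
    final String nextCursor; // null when there are no more pages

    KeysetPage(List<String> items, String nextCursor) {
        this.items = items;
        this.nextCursor = nextCursor;
    }
}

class KeysetTraversal {
    // Follows cursors from the first page (cursor == null) until no next page is reported.
    static List<String> fetchAll(Function<String, KeysetPage> fetchPage) {
        List<String> all = new ArrayList<>();
        String cursor = null;
        do {
            KeysetPage page = fetchPage.apply(cursor);
            all.addAll(page.items);
            cursor = page.nextCursor;
        } while (cursor != null);
        return all;
    }
}
```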

A full cycle through the groups takes a long time. The approximate number of groups is above 600,000. At 1-2 seconds per request of 100 groups, that is at least 6,000 requests, so getting the full list takes several hours (roughly 1.7 to 3.3 hours for 600,000 groups).

The entire list is not needed for most use cases. A more common use case is to get the list of groups filtered by a specific keyword, and the result should be paginated. Currently, the proxy supports filtering by name, which is done by adding a query parameter to the request. This requires traversing the entire list, which results in O(n) time complexity. It can be optimized by adding an index: a treeset, with the group entries stored in the cache individually using the group name as the key. This way a lookup by name or name prefix can be done in O(log(n)) time.
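A minimal sketch of such an index, assuming the filter is a name prefix (a sorted set cannot speed up a substring search, which would still need a scan). `GroupNameIndex` and `findByPrefix` are hypothetical names, and the `offset` and `limit` parameters are one possible way to express pagination.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;

// Hypothetical index over group names; assumes the filter is a name prefix.
class GroupNameIndex {
    private final TreeSet<String> names = new TreeSet<>();

    void add(String name) {
        names.add(name);
    }

    void remove(String name) {
        names.remove(name);
    }

    // Seeks to the first name >= prefix in O(log n), then reads matches in sorted order.
    List<String> findByPrefix(String prefix, int offset, int limit) {
        List<String> page = new ArrayList<>();
        int skipped = 0;
        for (String name : names.tailSet(prefix, true)) {
            if (!name.startsWith(prefix)) {
                break; // left the contiguous range of matching names
            }
            if (skipped < offset) {
                skipped++;
                continue;
            }
            if (page.size() == limit) {
                break;
            }
            page.add(name);
        }
        return page;
    }
}
```

`TreeSet` is not thread-safe. If a background task updates the index while requests read from it, `ConcurrentSkipListSet` offers the same sorted-set operations in a thread-safe form.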

Modes of the proxy:

The proxy can work in three modes: ready (normal), fallback, and bulkhead. It starts in bulkhead mode and switches to ready mode when the cache is fully populated.

Ready

The client constantly cycles through the groups in the background and updates the treeset. The treeset is used to serve filtered and paginated results to the client. When an entry becomes obsolete, it no longer shows up in the results, and after a configured time a listener of eviction events removes it from the treeset. On a refresh request, a direct request is made to the GitLab API and the affected entries are updated as well.
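The two-stage lifecycle of an entry (first hidden, later removed) can be sketched as below. This is a simplified stand-in: the described design reacts to eviction events from the cache, while this sketch keeps a timestamp per name and sweeps explicitly. `ExpiringGroupIndex`, `obsoleteAfter` and `evictAfter` are hypothetical names.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical sketch: group name -> time the background cycle last saw it.
class ExpiringGroupIndex {
    private final ConcurrentSkipListMap<String, Instant> lastSeen = new ConcurrentSkipListMap<>();
    private final Duration obsoleteAfter; // older entries are hidden from results
    private final Duration evictAfter;    // older entries are removed from the index

    ExpiringGroupIndex(Duration obsoleteAfter, Duration evictAfter) {
        this.obsoleteAfter = obsoleteAfter;
        this.evictAfter = evictAfter;
    }

    // Called by the background cycle whenever it sees the group again.
    void touch(String name, Instant now) {
        lastSeen.put(name, now);
    }

    // Names that are still fresh enough to be served, in sorted order.
    List<String> visible(Instant now) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, Instant> e : lastSeen.entrySet()) {
            if (Duration.between(e.getValue(), now).compareTo(obsoleteAfter) <= 0) {
                result.add(e.getKey());
            }
        }
        return result;
    }

    // Stands in for the eviction listener: drops entries older than evictAfter.
    void evictExpired(Instant now) {
        lastSeen.entrySet().removeIf(
                e -> Duration.between(e.getValue(), now).compareTo(evictAfter) > 0);
    }

    int size() {
        return lastSeen.size();
    }
}
```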

Fallback (to be implemented)

If an error occurs while accessing the GitLab API, the proxy returns the last known state of the cache. If the service is still unavailable after a specified time, the proxy switches to bulkhead mode. This time should not exceed the eviction time minus the time elapsed since the start of the last successful full refresh. Alternatively, evicted entries can be marked as stale and the proxy can continue to serve the stale data. Refresh is not possible in this mode.
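The upper bound on the fallback time stated above is simple arithmetic: eviction time minus the time elapsed since the last successful full refresh started. A small sketch, with the hypothetical names `FallbackWindow` and `maxFallback`, and with the result clamped to zero once the eviction time has already passed (the clamping is an assumption of this sketch):

```java
import java.time.Duration;
import java.time.Instant;

// Hypothetical helper for the fallback time limit.
class FallbackWindow {
    // evictionTime - (now - lastRefreshStart), never negative.
    static Duration maxFallback(Duration evictionTime, Instant lastRefreshStart, Instant now) {
        Duration elapsed = Duration.between(lastRefreshStart, now);
        Duration remaining = evictionTime.minus(elapsed);
        return remaining.isNegative() ? Duration.ZERO : remaining;
    }
}
```

For example, with an eviction time of 6 hours and a last full refresh that started 2 hours ago, the proxy may stay in fallback mode for at most 4 hours.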

Bulkhead

If an error occurs while accessing the GitLab API and there has been no successful full refresh, i.e. the cache is not fully populated, the proxy switches to bulkhead mode. In this mode the proxy passes requests directly to the GitLab API, as this is the only way to get the full data. The full cache is not used in this mode, but caching of single pages and filtered requests can be added to improve performance.

In both fallback and bulkhead modes, the proxy tries to access the GitLab API in the background and switches back to ready (normal) mode when the service is available again. The person responsible should be notified about the error.
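The mode transitions described in this section can be summarised as a small state machine. The sketch below is one possible reading, not the project's implementation: `ProxyMode`, `ModeTracker` and the event method names are hypothetical, and it assumes that leaving bulkhead mode requires a successful full refresh, and that an expired fallback window means the cache no longer counts as fully populated.

```java
// Hypothetical names; one possible reading of the transitions described above.
enum ProxyMode { READY, FALLBACK, BULKHEAD }

class ModeTracker {
    private ProxyMode mode = ProxyMode.BULKHEAD; // start here until the cache is populated
    private boolean fullyPopulated = false;

    synchronized ProxyMode mode() {
        return mode;
    }

    // A full cycle through all groups completed successfully.
    synchronized void onFullRefreshSucceeded() {
        fullyPopulated = true;
        mode = ProxyMode.READY;
    }

    // The GitLab API returned an error: serve from cache if possible, else pass through.
    synchronized void onUpstreamError() {
        mode = fullyPopulated ? ProxyMode.FALLBACK : ProxyMode.BULKHEAD;
    }

    // The allowed fallback time ran out while the service was still unavailable.
    synchronized void onFallbackWindowExpired() {
        if (mode == ProxyMode.FALLBACK) {
            fullyPopulated = false; // assumption: cached entries have been evicted by now
            mode = ProxyMode.BULKHEAD;
        }
    }

    // A background probe reached the GitLab API again.
    synchronized void onUpstreamRecovered() {
        if (fullyPopulated) {
            mode = ProxyMode.READY;
        }
    }
}
```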

Docker images can be built using the Dockerfile or Gradle. The proxy can be run using docker compose or docker.

The proxy can also be extended with features such as rate limiting, security, and monitoring. It is also a simple project that can serve as a starting point for more complex projects. See the section Further improvements for more details.

Example of a target architecture:

[Diagram: target architecture]

Build and run the application

Run using docker-compose:

This runs the image published on ghcr.io.

docker compose up

Build and run using docker:

Build

Build docker image using the Dockerfile:

docker buildx build --platform amd64 -t gitlab-proxy .

Build docker image with gradle:

gradle bootBuildImage --imageName=gitlab-proxy

Run:

docker run -p 8080:8080 --platform amd64 -ti gitlab-proxy

Test the application

curl http://localhost:8080/groups

Debug

Debug building of the docker image:

docker buildx build --platform amd64 --progress=plain -t gitlab-proxy --no-cache .

Inspect built container:

docker run -p 8080:8080 --platform amd64 -ti --entrypoint /bin/sh gitlab-proxy

Further improvements

Distributed cache scenario

The proxy can be used in a distributed cache scenario. For example, the proxy can be used to cache the responses from Gitlab API in a Redis cluster. It can be done using a tool like Redisson, etc. Alternatively distributed synchronization can be achieved using Terracotta, which provides clustering capabilities for Ehcache.

Add more features like rate limiting, etc

Rate limiting can be used to prevent abuse of the proxy. For example, the proxy can limit the number of requests per second, per minute, per user, per IP, etc. It can be done using a tool like RateLimiter, etc.
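As an illustration of per-key rate limiting, here is a minimal fixed-window limiter in plain Java. It is a sketch, not a recommendation over an existing library; `FixedWindowRateLimiter` and `tryAcquire` are hypothetical names, the key could be a user name or an IP address, and the current time is passed in explicitly to keep the example easy to test.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical fixed-window limiter: at most maxRequests per key per window.
class FixedWindowRateLimiter {
    private final int maxRequests;
    private final long windowMillis;
    private final Map<String, long[]> windows = new HashMap<>(); // key -> {windowStart, count}

    FixedWindowRateLimiter(int maxRequests, long windowMillis) {
        if (maxRequests < 1 || windowMillis < 1) {
            throw new IllegalArgumentException("limits must be positive");
        }
        this.maxRequests = maxRequests;
        this.windowMillis = windowMillis;
    }

    // Returns true if the request is allowed, false if the key exceeded its limit.
    synchronized boolean tryAcquire(String key, long nowMillis) {
        long[] window = windows.computeIfAbsent(key, k -> new long[] {nowMillis, 0});
        if (nowMillis - window[0] >= windowMillis) {
            window[0] = nowMillis; // start a new window
            window[1] = 0;
        }
        if (window[1] >= maxRequests) {
            return false;
        }
        window[1]++;
        return true;
    }
}
```

A fixed window allows short bursts at window boundaries. Token-bucket or sliding-window limiters smooth this out at the cost of a little more bookkeeping.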

Add more endpoints to the proxy like /projects, /users, etc

Add more tests like integration tests, contract tests, etc

Add more security like authentication, authorization, etc

It can be done using a tool like Spring Security, OAuth, etc.

Add more monitoring like metrics, alerts, etc

It can be done using a tool like Prometheus, Grafana, etc. For example, the proxy can expose metrics like the number of requests, the response time, the number of errors, etc. It can be done using a tool like Micrometer, etc.

Add more CI/CD like pipelines, deployments, etc

It can be done using a tool like Jenkins, GitLab CI, GitHub Actions, etc.

Add more logging like structured logs, log aggregation, etc.

It can be done using a tool like Logback, Log4j, etc. Structured logs are important because they are easier to search and filter. These logs can then be sent to a log aggregator like ELK, Splunk, etc. Errors can be aggregated using a tool like Sentry, Rollbar, etc.

Add more error handling like circuit breakers, rate limiters, etc

It can be done using a tool like Resilience4j, Hystrix, etc. Resilience4j is a lightweight fault tolerance library inspired by Netflix Hystrix, but designed for functional programming. Resilience4j provides higher-order functions (decorators) to enhance any functional interface, lambda expression or method reference with a Circuit Breaker, Rate Limiter or Bulkhead. Bulkhead is a pattern used to prevent a single failing component from bringing down the entire system.

Add more performance improvements like async, batching, etc

It can be done using a tool like Reactor, RxJava, etc. Reactor is a fourth-generation Reactive library for building non-blocking applications on the JVM based on the Reactive Streams Specification. It could be used to improve the performance of the proxy by making the requests to Gitlab API asynchronous. Batch requests can be used to reduce the number of requests to Gitlab API. For example, instead of making 10 requests to Gitlab API to get the details of 10 groups, the proxy can make a single request to Gitlab API to get the details of 10 groups.
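To show the idea of asynchronous requests without pulling in a reactive library, the sketch below uses the JDK's `CompletableFuture` instead of Reactor or RxJava. `ParallelFetch` and the `fetchOne` function (standing in for a single request to the GitLab API) are hypothetical.

```java
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical helper: runs one lookup per id concurrently and keeps the input order.
class ParallelFetch {
    static <T> List<T> fetchAll(List<Integer> ids, Function<Integer, T> fetchOne) {
        // Start all lookups first so that they can run concurrently on the common pool.
        List<CompletableFuture<T>> futures = ids.stream()
                .map(id -> CompletableFuture.supplyAsync(() -> fetchOne.apply(id)))
                .collect(Collectors.toList());
        // Then wait for each result; join() rethrows failures as CompletionException.
        return futures.stream()
                .map(CompletableFuture::join)
                .collect(Collectors.toList());
    }
}
```

Blocking HTTP calls would normally run on a dedicated executor rather than the common pool, so that slow requests do not starve other tasks.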

Add more configurations like timeouts, retries, etc

It can be done using a tool like Spring Cloud Config, Consul, etc. Spring Cloud Config provides server and client-side support for externalized configuration in a distributed system. With the Config Server, you have a central place to manage external properties for applications across all environments. The concepts on both client and server map identically to the Spring Environment and PropertySource abstractions, so they fit very well with Spring applications but can be used with any application. Another option is to use Consul. Consul is a distributed, highly available, and data center-aware solution to connect and configure applications across dynamic, distributed infrastructure. Consul provides a flexible solution for service discovery, configuration, and segmentation that can be used with any application.

Add more environments like dev, test, prod, etc

It can be done using a tool like Docker, Kubernetes, etc. Another option is to use Spring Profiles.

Add more examples like how to use the proxy in a real project

For example, how to use the proxy in a frontend application, how to use the proxy in a backend application, etc.

Add more comments in the code

It is always good to have comments in the code that explain why the code is there, what it is doing, etc.

Add built-in support for OpenAPI, Swagger, etc

It can be done using a tool like Springdoc, Swagger, etc. It is important to document the API to make it easier to use.

Further features

Further features on endpoint /groups

Add pagination

Pagination can be used to limit the number of groups returned by the proxy. For example, the proxy can return the first 10 groups, the next 10 groups, etc. It can be done using a tool like Pageable, etc.

Add sorting

Sorting can be used to sort the groups returned by the proxy. For example, the proxy can sort the groups by name, by id, etc. It can be done using a tool like Sort, etc.

Add filtering

Filtering can be used to filter the groups returned by the proxy. For example, the proxy can return only the groups that contain a specific word in the name, etc. It can be done using a tool like Specification, etc.
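The three features above compose naturally: filter first, then sort, then cut out one page. The sketch below does this with Java streams on an in-memory list of group names. It does not use Pageable, Sort or Specification; `GroupQuery` and its parameters are hypothetical, and the filter is assumed to be a case-insensitive substring match.

```java
import java.util.Comparator;
import java.util.List;
import java.util.Locale;
import java.util.stream.Collectors;

// Hypothetical helper combining filtering, sorting and pagination of group names.
class GroupQuery {
    // page is zero-based; size is the maximum number of names per page.
    static List<String> query(List<String> names, String keyword, boolean ascending,
                              int page, int size) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        Comparator<String> order = ascending
                ? Comparator.naturalOrder()
                : Comparator.reverseOrder();
        return names.stream()
                .filter(name -> name.toLowerCase(Locale.ROOT).contains(needle))
                .sorted(order)
                .skip((long) page * size)
                .limit(size)
                .collect(Collectors.toList());
    }
}
```

This scans the whole list on every call, which is fine for small inputs but is the O(n) behaviour that an index is meant to avoid for large ones.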

Links

https://docs.gitlab.com/ee/api/groups.html
