
Do Not Merge: POC code to show how objects and not just CFs could be used as return values #3323

Closed
bbakerman wants to merge 7 commits into master from support_objects_or_cfs

Conversation

@bbakerman (Member) commented Sep 4, 2023

This is a POC to show how we could return Objects and not just CompletableFutures from the data fetchers and then beyond

This was driven from my investigations into Java 21 virtual threads, where a DF can in theory return a value that's async BUT is materialised by the time the DF returns (e.g. it ran on a VT inside the data fetcher).

CFs are cheapish but pure objects are (in theory) cheaper.

The price of this PR is the loss of type safety in our code and in callbacks such as Instrumentation.

We have to handle an Object return type as CompletableFuture<Foo> | Foo - so the compiler is not helping our code.

This also breaks the API of ExecutionStrategy BUT I am comfortable with that. On reflection, I think ExecutionStrategy should be @Internal code.

I will spend some time to see if this leads to better performance.

ALGORITHM

The general algorithm here is that at key points we have either a List<CompletableFuture<Foo>> or a List<Foo> - or the single-arity versions of those - and since Java can't represent a union type, this ends up being methods like

protected /* CompletableFuture<FieldValueInfo> | FieldValueInfo */ Object resolveFieldWithInfo(...)

So when these values need to be consumed you end up with 2 paths via if (obj instanceof CompletableFuture) else.
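A minimal sketch of this two-path consumption (a hypothetical helper, not the PR's actual code) might look like:

```java
import java.util.concurrent.CompletableFuture;
import java.util.function.Consumer;

// Hypothetical helper showing the two-path consumption of a value whose
// static type is Object but whose runtime type is CompletableFuture<T> | T.
public class PolymorphicConsume {
    @SuppressWarnings("unchecked")
    public static <T> void consume(Object value, Consumer<T> onValue) {
        if (value instanceof CompletableFuture) {
            // async path - chain onto the future
            ((CompletableFuture<T>) value).thenAccept(onValue);
        } else {
            // materialised path - consume directly, no scheduling involved
            onValue.accept((T) value);
        }
    }
}
```

Every consumer of such a value has to repeat this instanceof check, which is exactly the maintenance cost called out below.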

I have called the non-promise value a materialised object - I am not sure this is the right nomenclature.

The other key part is our async wait code. Since we can't know up front if a result object will be async or materialised, we have to use a mechanism to discover that.

In theory we could have a polymorphic list of values like CF1, object2, CF3, CF4, object5, object6, object7, say. That list would require us to async wait on 3 objects while having 4 other materialised values we don't have to wait on.

The Async.CombinedBuilder code was retrofitted with an awaitPolymorphic() (I don't love the naming). If ANY of the objects to wait for are CFs then it will truly await and return a CompletableFuture<List<Foo>>. If they are all materialised it can skip that and return a List<Foo>.

Again the consumer of that needs to check via instanceof whether it's a CF or not. If it's not, it can be consumed directly; otherwise it must be consumed as a CF chain.
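As a rough sketch (an assumed shape, not the real Async.CombinedBuilder code), awaitPolymorphic() could be:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CompletableFuture;

// Sketch of the awaitPolymorphic() idea: only pay for CF machinery when at
// least one value in the list is actually a CompletableFuture.
public class AwaitSketch {

    /** Returns a List<Object> if every input is materialised,
     *  otherwise a CompletableFuture<List<Object>>. */
    @SuppressWarnings("unchecked")
    public static Object awaitPolymorphic(List<Object> values) {
        boolean anyAsync = values.stream().anyMatch(v -> v instanceof CompletableFuture);
        if (!anyAsync) {
            return values; // fast path: no CF allocated at all
        }
        List<CompletableFuture<Object>> futures = new ArrayList<>();
        for (Object v : values) {
            futures.add(v instanceof CompletableFuture
                    ? (CompletableFuture<Object>) v
                    : CompletableFuture.completedFuture(v));
        }
        // slow path: truly await everything and hand back a CF of the list
        return CompletableFuture.allOf(futures.toArray(new CompletableFuture[0]))
                .thenApply(ignored -> {
                    List<Object> out = new ArrayList<>();
                    futures.forEach(f -> out.add(f.join()));
                    return out;
                });
    }
}
```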

Just like CompletableFuture, this is viral - the method return types become Object - which is bad for clean code understanding, but not impossible to understand.

POSSIBLE IMPROVEMENTS

Right now we wrap completed values into ExecutionResult objects and then unwrap them and throw that away. I think this is because when we complete an Object type we call back to executionStrategy.execute() and this returns an ExecutionResult.

This is wasteful for scalars, enums and even lists in many ways. It does however provide a common shape to FieldValueInfo, where the value is today a CF<ExecutionResult> (or possibly an ExecutionResult in this PR).

An ExecutionResult can carry errors and data BUT we never use that, so again a waste. It would be a memory saving if we stopped wrapping values in ERs.

IMPACT

The BIG code impact is in InstrumentationContext, which I didn't change (yet):

public interface InstrumentationContext<T> {
    void onDispatched(CompletableFuture<T> result);
    void onCompleted(T result, Throwable t);
}

Nominally the field fetch context is no longer a CompletableFuture but my union of Object or CF.

DataLoader code needs this onDispatched to work BUT it does not use the value at all.

This calls for a breaking change, but how? TBD.

Summary

So there is real food for thought here.

It's faster if we don't use CFs, but it's only a little bit faster, the code is more complicated, and the InstrumentationContext is a real pain in a breaking-code sense.

There is less memory pressure if we don't allocate CFs, which is something. I didn't spend time on that memory pressure, but it's 3-4 CFs per returned field value (as we compose CFs).

@bbakerman added labels Sep 4, 2023: "breaking change" (requires a new major version to be released) and "Not to be merged" (spikes or other stuff that should never or not yet be merged)
return (FetchedValue) join;
} else {
return (FetchedValue) fetchedField;
}
Epic, you are planning some great things!

@bbakerman (Member, Author) commented Sep 4, 2023

I have done some benchmarks on my machine.

If anything... it's slower in ms/op but faster in throughput ops/s. But not by much, and not to any great degree.

Master - before change

Benchmark                                    Mode  Cnt    Score    Error  Units
BenchMark.benchMarkSimpleQueriesThroughput  thrpt   15  176.481 ± 11.509  ops/s
BenchMark.benchMarkSimpleQueriesAvgTime      avgt   15    5.426 ±  0.283  ms/op


This branch - with changes

Benchmark                                    Mode  Cnt    Score    Error  Units
BenchMark.benchMarkSimpleQueriesThroughput  thrpt   15  177.025 ± 13.248  ops/s
BenchMark.benchMarkSimpleQueriesAvgTime      avgt   15    6.161 ±  0.407  ms/op

This branch - with changes taking out the onDispatch() CF creation

BenchMark.benchMarkSimpleQueriesThroughput  thrpt   15  178.441 ± 7.796  ops/s
BenchMark.benchMarkSimpleQueriesAvgTime      avgt   15    6.119 ± 2.048  ms/op

The above benchmarks test sync-only results. There is nothing async in them at all.

I had expected it to be quicker but it's not - interesting...

Suspect

I suspect the culprit might be the completeObject / completeList methods, which jump us back into CF land.

If you have a List<Foo> source object to run over in graphql then it ends up running the list as a set of field fetches in CF land:

CompletableFuture<List<ExecutionResult>> resultsFuture = Async.each(fieldValueInfos, FieldValueInfo::getFieldValue);

So while each object in the list might be fetched as a non-CF object, it ends up being "completed" as a CF no matter what.

I think we would have to take this PR further and see if we can have a polymorphic list of CFs or materialised objects come back in the completeValue methods

Taking it further

I took this further and made completeList / completeObject all return an Object or CF. At present that does not pass all tests.

The benchmarking is better however

Benchmark                                    Mode  Cnt    Score   Error  Units
BenchMark.benchMarkSimpleQueriesThroughput  thrpt   15  178.598 ± 6.397  ops/s
BenchMark.benchMarkSimpleQueriesAvgTime      avgt   15    5.441 ± 0.220  ms/op

but not amazingly better.

if (exception != null) {
ExecutionResult executionResult = handleNonNullException(executionContext, overallResult, exception);
completeListCtx.onCompleted(executionResult, exception);
return;
@bbakerman (Member, Author)

This is the completeList code - before it did

CompletableFuture<List> resultsFuture = Async.each(fieldValueInfos, FieldValueInfo::getFieldValue);

but this now gets a bit more complicated because the items in the list can be CFs or not.

@bbakerman (Member, Author)

So we unwrap that Async.each and do our CF or Object code

@@ -662,7 +737,7 @@ protected CompletableFuture<ExecutionResult> completeValueForEnum(ExecutionConte
} catch (NonNullableFieldWasNullException e) {
return exceptionallyCompletedFuture(e);
}
@bbakerman (Member, Author)

Notice here we still use CFs - why? Because they can represent exceptions in themselves.

So if the serialisation fails, we return an exceptional CF; otherwise the materialised value.
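A sketch of that pattern (hypothetical names; the real completeValueForEnum does GraphQL coercion):

```java
import java.util.concurrent.CompletableFuture;

// Hypothetical sketch: return the materialised value on the happy path, but
// fall back to an exceptionally completed CF when serialisation throws,
// because a plain Object cannot carry an exception the way a CF can.
public class SerialiseSketch {

    public static Object serialiseOrFail(Object raw) {
        try {
            return serialise(raw); // materialised value, no CF allocated
        } catch (RuntimeException e) {
            CompletableFuture<Object> failed = new CompletableFuture<>();
            failed.completeExceptionally(e);
            return failed; // only the exceptional case pays for a CF
        }
    }

    // stand-in for a coercing/serialising step that may fail
    private static Object serialise(Object raw) {
        if (raw == null) {
            throw new IllegalArgumentException("cannot serialise null");
        }
        return raw.toString();
    }
}
```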

// Calling this from the executionContext to ensure we shift back from mutation strategy to the query strategy.

return executionContext.getQueryStrategy().execute(executionContext, newParameters);
return executionContext.getQueryStrategy().executePolymorphic(executionContext, newParameters);
@bbakerman (Member, Author)

This means that a materialised object can be completed via recursively calling back to the ES, but via a special method that allows CFs or materialised values.

@bbakerman (Member, Author)

All tests now passing

@rrva commented Sep 6, 2023

Awesome to see work in this area 👍 💯

@rrva commented Sep 6, 2023

I can provide real-world benchmarks on this branch vs master if it executes my queries

@rrva commented Sep 6, 2023

I can provide real-world benchmarks on this branch vs master if it executes my queries

Heh, "real-world" and benchmark in the same sentence. What I meant: I can run our load tests on it which involve real queries and provide some comparative indications

@bbakerman (Member, Author)

I can provide real-world benchmarks on this branch vs master if it executes my queries

Heh, "real-world" and benchmark in the same sentence. What I meant: I can run our load tests on it which involve real queries and provide some comparative indications

Please do - we would love to see this in a more real world situation


@bbakerman (Member, Author)

I would like to change this PR again to remove the ExecutionResult wrapping that happens. I suspect this was caused by the old reentrant call back to the executionStrategy when resolving an object, but this PR changes that by having a special method for the callback that knows it's inside the gate.

I think this would also reduce memory pressure by not allocating a wrapper object for enums, scalars and so on - that is, one for every value.

@rrva commented Oct 6, 2023

I can provide real-world benchmarks on this branch vs master if it executes my queries

Heh, "real-world" and benchmark in the same sentence. What I meant: I can run our load tests on it which involve real queries and provide some comparative indications

Please do - we would love to see this in a more real world situation

We measured this branch using some load tests involving real queries and compared this to 21.1. With a quick load test we typically use (10 mins), I am afraid we also do not see any notable improvements in throughput or latency up to P99. However, fewer CompletableFutures are created, as expected (about a third). GC pauses go down.

graphql-java 21.1 baseline

memory allocation distribution [image]

GC pressure and GC pauses [images]

this branch

memory allocation distribution [image]

GC pressure and GC pauses [images]

@rrva commented Oct 6, 2023

I would need to spend more time load testing or measure latency up to P99.9 to really see a difference in throughput/latency. GC is coping right now. But we do see differences in the GC metrics.

@rrva commented Oct 6, 2023

I would need to spend more time load testing or measure latency up to P99.9 to really see a difference in throughput/latency. GC is coping right now. But we do see differences in the GC metrics.

These tests were on Java 21, 16Gb heap, half of which can be used for query processing and G1 GC.

@bbakerman (Member, Author)

Wow thanks so much for these numbers. This really helps put some perspective on the work.

Reducing memory pressure is always a good thing however I think most people would trade more memory usage for less latency and in this case it's just not there.

This code is problematic in the sense of code maintenance. It's entering into a scripting-language world where an Object value can actually be a CompletableFuture<Foo> | Foo and needs to be checked with instanceof whenever used. In fact this is how the graphql-js reference code base does its work: it returns a union type that needs to be checked when used.

It also breaks Instrumentation quite a lot because nominally they are

public interface InstrumentationContext<T> {
    void onDispatched(CompletableFuture<T> result);
    void onCompleted(T result, Throwable t);
}

So in theory one could "do work" on the passed-in CF during onDispatched, but this code doesn't want to give them out, and it defeats the purpose to allocate one for instrumentation when we took such efforts to avoid them in the engine.

The Instrumentation by design of course always locks in your implementation so maybe that's just par for the course.

I am more bullish on #3330 as a way to reduce memory pressure, because there truly is a completely redundant ExecutionResult wrapper on each value. I see that breaking change as more achievable, BUT the combination of both would reduce memory pressure.

@github-actions (Bot) commented Dec 7, 2023

Hello, this pull request has been inactive for 60 days, so we're marking it as stale. If you would like to continue working on this pull request, please make an update within the next 30 days, or we'll close the pull request.

@github-actions (Bot) added the Stale label Dec 7, 2023
@samuelAndalon

@bbakerman any chance we could release a snapshot version of this branch? I am very interested in trying this in graphql-kotlin. We have graphql servers with a high load of traffic plus very large queries. We recently turned off Datadog automatic instrumentation as it was creating spans for every single field, and for very large queries it was creating a lot of objects, causing the minor collection time of the GC to take a lot. After turning off the DD instrumentation we saw this improvement:

[image]

I suspect there will be even more improvement by avoiding creating CFs for fields that are already in memory after being synchronously resolved.

@bbakerman (Member, Author)

any chance we could release a snapshot version of this branch?

I am asking the team if we are able to do this.

@bbakerman (Member, Author)

@samuelAndalon

I have kicked off a custom release : https://github.com/graphql-java/graphql-java/actions/runs/7927320202

The artifact will be called 0.0.0-2024-02-16T00-00-00-support_objects_or_cfs


@samuelAndalon commented Feb 16, 2024

Oh right - so I tried the snapshot and was able to see that now we are not creating CFs for values that are already in memory and can be synchronously resolved. The problem here, as properly mentioned in the code comments, is the InstrumentationContext interface, which will cause issues with the other hooks and some of the default graphql-java instrumentations, like batching.

What I would like to propose is to create a new InstrumentationContext that will be used for beginField, beginFieldFetch, beginFieldComplete and beginFieldListComplete. All of these can use and share a custom interface, just like ExecutionStrategyInstrumentationContext. This is going to be problematic, yes; it is going to be a breaking change, yes; but I absolutely feel that the performance of the engine will improve.

In terms of whether the instrumentations are going to break, I don't think they will; the batching instrumentation only needs to know if a field was dispatched, it actually does not use the value.

@bbakerman (Member, Author)

Yup this code is very much a Proof of Concept (POC)

And yes, I think the InstrumentationContext should be changed anyway to not send in the CF to be completed. Semantically the same thing happens at onCompleted.

public interface InstrumentationContext<T> {

    void onDispatched(CompletableFuture<T> result);

    void onCompleted(T result, Throwable t);

}

could become

public interface InstrumentationContext<T> {

    void onDispatched();

    void onCompleted(T result, Throwable t);

}

In theory, if you were composing on that CF during onDispatched, you could do the same via onCompleted.
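For example, a timing instrumentation (a hypothetical one, purely for illustration) could do the same work across the two callbacks of the proposed interface without ever touching a CF:

```java
// Hypothetical timing instrumentation showing that work once chained onto the
// CF passed to onDispatched can instead be done in onCompleted, which fires
// with the final value (or throwable) anyway.
public class TimingContext<T> {
    private long startNanos;
    private long elapsedNanos = -1;

    public void onDispatched() {
        startNanos = System.nanoTime(); // mark dispatch without needing a CF
    }

    public void onCompleted(T result, Throwable t) {
        elapsedNanos = System.nanoTime() - startNanos;
    }

    public long elapsedNanos() {
        return elapsedNanos;
    }
}
```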

@dondonz (Member) commented Apr 16, 2024

Merged into master in separate PR

@dondonz dondonz closed this Apr 16, 2024