New ref_ptr #1702

Closed
AnyOldName3 wants to merge 3 commits into vsg-dev:master from AnyOldName3:new-ref_ptr

Conversation

@AnyOldName3 (Contributor)

This PR implements a new kind of ref_ptr that attempts to combine the best of both worlds from std::shared_ptr and classic intrusive reference-counted smart pointers like the existing implementation of vsg::ref_ptr.

vsg::ref_ptr has the advantages that:

  • It's half the size of std::shared_ptr as it only needs to hold a pointer to the object itself, not a separate reference count. This is good for cache locality as each cache line can fit twice as many pointers.
  • If you're given a plain C pointer to a reference-counted object, you can expect it to work properly if you store it in a ref_ptr, whereas std::shared_ptr would make a second control block unless you explicitly used std::enable_shared_from_this, which makes it intrusive after all.

std::shared_ptr has the advantages that:

  • It's a standard C++ feature, so people already know how to use it, and someone else maintains it.
  • It works with any normal type, even if it wasn't written with shared_ptr in mind.
  • std::weak_ptr can be locked using only atomic operations instead of requiring a mutex. On typical desktop hardware, this is far, far faster: on x86, for example, correctly-aligned loads and stores of eight bytes or fewer are always atomic, so with relaxed ordering, atomic operations can be essentially free.
    • We've seen at least one VSG-based app where most of the frame time was burned on locking vsg::observer_ptrs because they'd been ported from std::weak_ptr, and no one expected such a huge performance difference, so no one checked whether it was causing problems.
  • It supports custom allocators per instance, and type-erases them so they can be passed to functions that expect a regular shared_ptr instead of a weird subclass.
    • When doing the intersector optimisation work, it didn't take long for allocation of runtime-sized temporaries to become the dominant cost. Swapping some from the vsg::Allocator for the type being allocated to an appropriate std::pmr::memory_resource subclass gave significant improvements. I bodged that with a vsg::ref_ptr subclass that deallocates using the right memory resource, but that's not suitable for real-world use: if any other ref_ptr to the object had been the last one destroyed, it'd have used the wrong deallocation function. With std::shared_ptr I could have just called std::allocate_shared with a std::pmr::polymorphic_allocator and everything would work properly.
    • In other situations, an application may know something about access patterns etc. that makes its own strategy better than relying on the VSG affinity system, e.g. knowing that some instances are only temporary, so avoiding fragmenting the memory used to hold instances used over multiple frames, or having multiple threads construct things in parallel that aren't going to be accessed at the same time but which have the same affinity, so would otherwise end up co-located. With the ability to set the allocator per instance rather than per type, these situations can be addressed.
  • It supports custom deleters per instance, and type-erases them so they can be passed to functions that expect a regular shared_ptr instead of a weird subclass.
    • It might be a more idiomatic way to do the things that vsg::Object::attemptDelete does (although I've not seen that actually be used for anything in a VSG app and it's unused in the examples, so can't be sure).
    • This can be used, e.g. to return objects to a pool for reuse rather than actually destroying them.
    • Most of the use cases I can think of or have used STL smart pointer custom deleters for in the past aren't particularly relevant for the VSG, e.g. the go-to example is managing an object returned from a C API that needs a particular function called to destroy it without having to wrap it in a RAII class with a destructor.
  • It supports fancy tricks like aliased pointers, so e.g. a function can take a std::shared_ptr<int> as a parameter and be passed a pointer to an int that's really a member of an instance of a class managed by a shared pointer, and keep the class instance alive.

The implementation in this PR:

  • Keeps the sizeof(vsg::ref_ptr<T>) == sizeof(T*) property by not pointing to the control block in the pointer instance but instead doing it from the pointed-to instance. This doesn't make the pointed-to type bigger because there were already four bytes of padding after the reference count in vsg::Object, so holding a pointer instead of an atomic integer just fills in that padding.
  • Keeps the adoption of existing instances working, as there's no new control block created when making a ref_ptr to something that's already reference counted.
  • Resembles the STL smart pointers a little more closely, so existing C++ programmers adopting the VSG should be surprised a bit less. Also, I used existing STL code as a reference point, so hopefully we'll benefit from it being so widely used that most bugs have already been found and fixed.
  • Makes vsg::observer_ptrs convertible to vsg::ref_ptr without locking a mutex, using atomic operations instead. This should be much faster in apps that use a lot of observer_ptrs (or allow apps that don't use them to start doing so).
  • Supports custom allocators per instance.
    • In my testing, this is as fast as the dodgy PMR-specific subclass but without any of the fragility or huge potential for misuse.
    • When not using a custom allocator, the overhead should be zero over the existing implementation - there's a virtual function call that wasn't there before, but the existing virtual call to attemptDelete is gone (because I don't believe that feature is necessary any more).
  • Supports custom deleters per instance.
    • This came basically for free once the support for custom allocators was implemented, as it's just calling operator() on a template argument.
    • This Should™ be a suitable alternative to the attemptDelete system, but again, there don't seem to be any public use cases for it, so I can't demonstrate how to port them. Obviously, we need to discuss this.
  • Does not support aliased pointers - you obviously can't have an intrusive reference count in a type like int, so the main use case can't work.

There are some limitations and unanswered questions, some of which are potentially easily addressed and/or already existed:

  • To adopt objects created with new, I had to make ref_ptr able to directly access the _referenceCount member, so the ref/unref methods of vsg::Object became apparently redundant, and I've commented them out for now. I'm not entirely sure adopting objects created with new is particularly important, though, as everything could just use vsg::Whatever::create or the new vsg::make_referenced<Whatever>. Forcing that would be a breaking change, so I didn't want to include it if I didn't have to, but it might be a good idea. It could also make it less error-prone to stack-allocate VSG classes, as a null reference count could be interpreted as meaning an instance isn't heap-allocated. That's something that could be checked at runtime, and if anything was misused, it'd be an easily-debugged immediate segfault rather than delayed memory corruption like the existing implementation would cause.

  • There's some heavily templated code that makes things work. By modern C++ standards, it's not that bad (and I've gone out of my way to make it more readable than some of the STL code it was based on), but it's more complicated than it was before.

    • I left out the empty-base-class/compressed-pair optimisations that all STL implementations use, as they make things much less readable. If the VSG used C++20, they could be replaced by the [[no_unique_address]] attribute, which is much more readable. This shouldn't be a big deal for now, as it's only relevant when an empty type is used as a template parameter, e.g. passing in a plain function as a deleter or using a stateless allocator, and for now, the use case I'm addressing has a std::pmr::polymorphic_allocator, which isn't stateless.

    • I've tucked some of the templatey stuff away into a vsg::detail namespace so it doesn't pollute the main vsg namespace.

    • The code ended up triggering an intentionally-enabled warning with MSVC (but it does the thing the warning warns about on purpose because it's the right thing in this specific situation), and a buggy warning with GCC, so there are some #pragmas to suppress these, which makes some things look more complicated than they are.

    • Because many VSG classes have a non-public destructor so they can't be stack allocated:

      • lots of the C++ machinery around allocators doesn't work, so things that should be one-liners are more complicated
      • many of the facilities that would simplify this only arrived with C++20, so things are complicated twice over
      • there's some machinery to make sure the destructor can be called where it's needed.

      As I think the justification for the vsg::Object destructor being protected has been reduced with this PR, I'd be in favour of making it public, and then these could be tidied up.

  • std::enable_shared_from_this doesn't work when called from an object's constructor (it must have been fully constructed and adopted by a std::shared_ptr first). On the other hand, vsg::Object subclasses can and do pass this to ref_ptr/observer_ptr in their constructors. There's no entirely automatic way to preserve this ability (or at least not one that's reliable - there are lots of possible hacks that would only work sometimes).

    The most sensible compromise was to make a custom type trait that controls whether the reference count is passed to the constructor, as that way, only a handful of classes needed to be changed and the rest would keep working exactly as they always have. I've gone through and adjusted all the affected classes I can find in the public VSG projects, which all turned out to be in the main VSG repo, so this shouldn't cause problems. Affected applications can copy the change from DeviceMemory etc. as it's not complicated.

    The two biggest potential issues are:

    • it's not obvious at a glance what will be affected, as a basic search for this obviously turns up lots of places it isn't used in a constructor. I grepped with PCRE subroutines to search for the this keyword only within constructors, so I believe I've got everything, but I'm pretty sure some people's brains would melt if they tried to read the regular expressions I wrote. If a search like that isn't done, then problems will only present themselves at runtime, which isn't ideal.
    • In the vsg::Object copy constructor, there was a branch where this is used and assigned to a ref_ptr. As that's the base class for everything, if I kept that, we'd need to take the ref count in the copy constructors for everything, which would be a huge pain. After investigating the branch, though, it looks like it's only reachable if a CopyOp is going to assign the same object to the same pointer once its constructor has finished anyway, so I don't think it's accomplishing anything productive. I've therefore commented it out for now, but it was originally added after the CopyOp stuff that I think makes it redundant. It would definitely be good to have a second pair of eyes on this.
  • vsg::Auxiliary is managed with a plain C pointer and explicit reference adjustment in vsg::Object, which I think is a bit odd as it looks like a vsg::ref_ptr would be fine. If that's right, then some things could be simplified a little by making that change.

  • As there's now a std::shared_ptr-style control block rather than just an atomic uint in each object, that obviously needs to live somewhere. The obvious solution is to allocate it as its own object, but that means an extra allocator call, and it was the allocator taking too long that motivated me to start this. std::make_shared and std::allocate_shared use an alternative approach - they define a templated struct that includes both the reference count and the managed object, which means a single allocator call can deal with both.
    Technically that's not required by the standard, but it does recommend it and all extant implementations do so, so the organisations maintaining STL implementations seem to have decided that it's faster in the general case.

    The VSG isn't necessarily the general case, though. Putting the control block in the same allocation as the managed object would, on the face of it, be beneficial when the reference count is typically touched whenever the object is (e.g. it's usually assigned to a fresh ref_ptr on access), or when allocation itself is the most expensive operation in an object's lifetime, and counterproductive when objects are typically accessed without their reference counts being touched (e.g. during the record traversal), as the ref count occupying part of a cache line means something else isn't in it.

    The vsggroups example exists and on paper, would be an excellent tool for testing this kind of change.
    However, as my post the other day explains, its results are very sensitive to things that they ideally wouldn't be. Initially, testing on Windows with MSVC in the RelWithDebInfo configuration, I was seeing no performance degradation during traversal from collocating the control block, which was a pleasant surprise, but when I came to do final thorough testing, I noticed some odd results, e.g.:

    • MSVC RelWithDebInfo was actually consistently faster with this PR...
    • but MSVC Release was consistently slower (even though the sequence of instructions executed and memory locations accessed was identical to RelWithDebInfo).
    • GCC on Windows and Linux showed about the impact I'd expected, so I tried with separately-allocated control blocks...
    • which left GCC still showing a performance change (although now sometimes an improvement) despite the instructions executed and memory addresses accessed during traversal being the same with and without the PR...
    • and the gap between different MSVC builds shrinking, but not disappearing.

    This prompted my investigation into the vsggroups example, and it turned out that I could induce impacts much bigger than this PR produces by doing things that shouldn't make any difference to anything, e.g. adding some functions to various source files and only calling them after the traversal benchmark has finished.

    As far as I can tell, the factor causing the differences is really just whether the compiler/linker happen to put functions called during traversal into adjacent memory addresses that end up mapping to the same instruction cache line. This is really easy to influence in a small benchmark app like vsggroups (usually making things worse). However, in a real application with many more functions, it's much less likely to get lucky enough to have a meaningful number of functions that are called consecutively and small enough to share a cache line actually end up in the same cache line, and especially not also end up the only functions executed on a hot path.

    Depending on the machine I test on, after I've attempted to exclude the effect from binary layout, the impact on pure traversal time of collocating the control block seems to be around 3% on a machine with fast RAM and 5% on a machine with slower RAM, but these are fairly rough figures because the binary layout effects can be double digit percentages, so it's hard to isolate what's what and whether one result is more or less directly comparable to the baseline than another is.

    I've therefore opted not to collocate the control block by default so the cache pressure from nodes themselves is unchanged by this PR, and for simplicity, it's allocated with a regular new call. That will have a (likely small) performance impact in some specific cases, e.g. allocating a lot of VSG objects (but it's already been decided that it's okay for doing that to not be super fast, hence it being fine that the VSG allocator is significantly slower than plain malloc), or when a lot of things are refed/unrefed in quick succession. Sometimes, the impact will be positive, though, e.g. if a lot of objects are unrefed on different threads, as even on architectures like x86 where small memory accesses are always atomic, they're relaxed atomics rather than strict, so writes still need to stall the thread if any other core has the target address cached. Reducing the chances that atomics accessed without std::memory_order_relaxed end up in the same cache line can therefore be helpful.

    If we knew that decrementing reference counts could afford to be slow, or could predict which objects were likely to be unrefed from one thread while another thread was using them, there'd be easy gains here from allocating the reference count with std::pmr::synchronized_pool_resource, so that might be worth pursuing. It might not make any difference one way or the other to real apps, though, so there's no point until we've confirmed it matters with a profiler.

  • There might be places in the VSG where using the new abilities this provides ends up giving a performance improvement, e.g. anywhere temporary vsg::Object subclass instances are made and then discarded before the next frame. I've only investigated using it with vsg::Intersector and it won't give a statistically significant improvement there without the other changes.

Obviously, this is a fairly invasive change as it affects some of the core classes that everything uses, so it needs more careful consideration than a typical PR.

Possible alternatives

Before doing all this, I experimented with an alternative where vsg::Auxiliary held a std::optional<std::function>, which could optionally be used as a deleter instead of the delete operator.
That was much less invasive, but:

  • There are comments that imply a vsg::Auxiliary instance isn't necessarily unique to a particular vsg::Object instance, and that opens a can of worms with using the wrong deleter with the wrong memory. Potentially, this could be shored up a bit, though.
  • Performing a second allocation for the vsg::Auxiliary becomes mandatory, and that undid the performance improvement from using a specialised allocator in the first place. Potentially, something could be implemented to allow constructing a vsg::Auxiliary in the same allocation as its vsg::Object, though.
  • vsg::Auxiliary has a std::map member, and the C++ specification permits std::map to require a node to be allocated even for an empty map to represent the end node. libc++ and libstdc++ put the end node within the std::map instance, but the Microsoft STL allocates it on the heap, and Windows with MSVC is a target platform of the client app I was originally trying to optimise, so this third allocation ends up making everything slower than it started. Potentially, this could be mitigated by making vsg::Auxiliary::userObjects a std::optional<ObjectMap> instead of an ObjectMap, though, and only initialise it if it's used.
  • As this is a feature intended for things that are particularly performance-sensitive, it's a bit backwards to force them to opt into extra operations and an associated performance penalty.
  • It doesn't address the secondary goal of making vsg::observer_ptr stop being surprisingly much slower than std::weak_ptr, which we know has caught people out in the past.

It's worth having a brief mention of the idea of using std::shared_ptr instead of vsg::ref_ptr, but:

  • We know it's slower, even if we avoid using std::make_shared/std::allocate_shared to mitigate the putting-48-bytes-of-control-block-between-every-pair-of-nodes problem that undermines the vsg::IntrusiveAllocator gains.
  • Now that the VSG does things that involve passing this pointers around during constructors, those are going to be a pain to unpick.

We therefore wouldn't want to switch to std::shared_ptr without a compelling reason, and this PR reduces the number of reasons.

Next steps

I'd hoped to make a quick proof-of-concept for this and have a discussion about it at an earlier stage, but between solving edge cases that needed to work before most of the examples would run and then investigating why the performance numbers were weird, getting it to a stage where it was worth discussing has taken longer than I'd have liked.

In terms of testing this, on paper, it should be reliable, but I've only got limited access to real VSG apps, so don't want to make a blanket statement that everything will just work. The impact on framerates should be zero for apps that don't do a lot of allocations and deallocations within a frame, and if they do that, there should be new optimisations possible using the new features here that will allow better performance than before. Operations like initial loading that do a lot of allocations are expected to be a little slower, but hopefully not too much. If it's a problem, then we can look at changing how control blocks are allocated. Any traversal performance changes reported by vsggroups are more likely to be caused by its unreliability than this PR.

There's obviously some tidying to do once some of the points to discuss have been resolved, too.

@robertosfield (Collaborator)

Feels like you have too much time on your hands and not enough useful work to keep you occupied. Extra complexity and I'm not clear on what the actual benefit is.

Possibly faster observer_ptr<> to ref_ptr<> conversion, but folks shouldn't be writing code with lots of observer_ptr<> that are accessed in a tight loop. I'd rather figure out why folks are over-using observer_ptr<>/weak_ptr<>, understand that usage case, create a unit test for it, then look at the bottlenecks of the existing solution and at optimizing if required.

As a general note vsggroups was an interesting performance test in the early days of the VSG project before I had any proper large scene graphs to test against, things have changed now I have access to big representative scene graphs. Creating such scene graph procedural would be really cool.

Another note about vsggroups: before I changed the default alignment in the vsg::Allocator to 8 bytes, it was 4 on x86_64 machines, which let slightly more objects fit in the cache and helped vsggroups show a bigger performance gap between shared_ptr<> and ref_ptr<>. As the original shared_ptr<> test case didn't use vsg::Allocator, it wasn't affected by the change to defaulting to an alignment of 8.

For real scene graphs you see a significant improvement in performance when using vsg::Allocator over std::new/delete and this is where it's critical.

If we want to look at performance of ref_ptr<> then the optimization we should be looking at is using techniques like Tracy uses to store smaller-than-8-byte pointers. Potentially we could use 4 bytes and then shift the value in ref_ptr<> by the alignment to give the final place in memory and still have a large enough addressable space for most VSG applications. Making this an option via CMake would allow users to decide when they want to use this optimization and accept a smaller addressable space.

@AnyOldName3 (Contributor, author)

The core problem that I'm trying to solve is the same as it was months ago. Some scenegraph operations, like intersection, inherently require things to be allocated and deallocated, and in the specific case of intersection, once I'd altered the implementation to only try intersecting a ray with the relevant subset of triangles in a mesh, more time was spent allocating and deallocating things than everything else put together. Some of them didn't really need allocating and deallocating and could just be reused, so I made those changes, but the rest needed allocation to stop being so tremendously slow.

As the VSG currently is, there's no way to opt an individual instance of a vsg::Object subclass out of the VSG allocator. This PR adds one. The intersection optimisation branch has a different one, but it's a hack to provide a proof of concept rather than something reliable enough to use in production code.

The basic idea here didn't take long to come up with or to implement: it's just "do exactly what std::shared_ptr does, but hold the pointer to the control block in the object rather than in every single pointer", because the documentation justifying ref_ptr's existence blames exactly that. The complexity is either identical to a typical std::shared_ptr implementation, and so nothing that should confuse anyone, or there to work around VSG-specific oddities that make it incompatible with std::shared_ptr (and the PR description suggests things we could do to make the VSG more normal that would make these unnecessary).

Possibly faster observer_ptr<> to ref_ptr<> conversion, but folks shouldn't be writing code with lots of observer_ptr<> that are accessed in a tight loop. I'd rather figure out why folks are over-using observer_ptr<>/weak_ptr<>, understand that usage case, create a unit test for it, then look at the bottlenecks of the existing solution and at optimizing if required.

Speeding up observer_ptr was essentially just a side effect of being more like the STL pointers rather than the main goal, but it's been on my mind for a while as it's a potent footgun.

It's not possible to overuse std::weak_ptr, as in general, it's just as fast as std::shared_ptr. Overuse is specific to vsg::observer_ptr, and that's not something that's documented anywhere - everything describing it just says it's like weak_ptr, so the only way to find out that they have to be used sparingly is to introduce a performance problem, notice you've done so, and then have to undo it. We know people have been caught out by this.

For the STL type, "I don't want to manually deal with the lifetime" is a completely valid use case, as weak_ptr doesn't have any particularly negative performance implications versus shared_ptr unless it's really egregiously misused in ways where using shared_ptr would also be misuse.

For real scene graphs you see a significant improvement in performance when using vsg::Allocator over std::new/delete and this is where it's critical.

I'll have to take your word for this in the general case. Obviously, it's faster in simple apps like vsgviewer, but for the limited few real VSG applications I've had access to, so little time is spent doing the record traversal compared to application logic that doubling the time the record traversal took wouldn't have a statistically significant impact on frame times, so ditching the custom allocator system and switching to std::shared_ptr wouldn't hurt at all. For one of them, the initial startup time was impacted enough by the allocator being slower than malloc/new that it would even be an improvement. I don't expect that the apps I've seen are representative of everything, but we only need one outlier to know it's not a universal truth that an optimisation is always beneficial.

If we want to look at performance of ref_ptr<> then the optimization we should be looking at is using techniques like Tracy uses to store smaller-than-8-byte pointers. Potentially we could use 4 bytes and then shift the value in ref_ptr<> by the alignment to give the final place in memory and still have a large enough addressable space for most VSG applications. Making this an option via CMake would allow users to decide when they want to use this optimization and accept a smaller addressable space.

That wouldn't do anything about the situations that this PR aims to improve, where ref_ptr's existing differences to off-the-shelf smart pointers make it slower, so the things that I'm trying to make less slow would still be just as slow.

Whether or not that idea has merit, it's still too easy to get into situations where the VSG allocator is unavoidable and unacceptably slow. All the STL smart pointers and containers recognise that applications can have knowledge about allocation and usage patterns, and allow a more suitable allocator to be specified per instance. That's true of most C++ libraries where performance is a goal (unless they manage to avoid allocation and deallocation altogether). Obviously, temporaries have different allocation and usage patterns to any of the existing VSG allocator affinities, and there are plenty of types used both as long-lived objects and as temporaries, so a solution can't just rely on affinities per type. The problem I'm hitting is a common one for high-performance C++ applications, so it has well-known and widely used solutions, including in the standard library; they're just not usable here because of existing VSG-specific complexities like ref_ptr and the allocator affinity system via operator new overloading.

@robertosfield (Collaborator)

FYI, the allocator groups by AllocatorAffinity, there isn't any reason why this can't be extended to specifically handle frequently allocated/deallocated object types and then handle these in a special way. This would require minimal changes to existing code.

As a general rule though, avoiding lots of allocations/deallocations is best way to get good performance.

@AnyOldName3 (Contributor, author)

As I said, the specific things that are a problem (vsg::ArrayState and potentially user-defined subclasses) can't avoid being allocated, and can't be assigned an affinity based on their type. What you're suggesting would solve related, hypothetical problems, but not the real problem that I've encountered and am fixing here.

@robertosfield (Collaborator)

As I said, the specific things that are a problem (vsg::ArrayState and potentially user-defined subclasses) can't avoid being allocated, and can't be assigned an affinity based on their type. What you're suggesting would solve related, hypothetical problems, but not the real problem that I've encountered and am fixing here.

When I do another review of the intersection work I'll brainstorm approaches that can streamline the functionality currently handled by ArrayState.

@AnyOldName3 (Contributor, author)

I'm fairly sure there's nothing that would achieve the same job without being wildly more complicated or breaking thread safety. We can't know what Vulkan state user-defined draw operations care about (especially anything that comes from future extensions or that cares about user-defined uniforms etc.) that might be set up elsewhere in the scenegraph, so we need to allow a user-defined type to take responsibility, and it needs to be a separate instance of that type so intersection can be parallelised without each thread's state tracking trampling over that of other threads. If there's something to streamline things, it'll involve breaking existing functionality.

@robertosfield (Collaborator)

I've done another code review and it's a non-starter. If I had wanted to use a std::shared_ptr<> approach I would have used std::shared_ptr<>. Trying to mix ref_ptr<> and shared_ptr<> to avoid the 16-byte size of shared_ptr<> introduces more complexity and an extra pointer dereference.

There is no way this will be as fast as proper intrusive reference counting, which we already have, and it's more complicated, which will make it harder to maintain, and fewer folks will look at the code and actually understand what is being done and why.

This really isn't good engineering: it's changing stuff, making it less fast and less maintainable, with a theoretical benefit in niche usage cases, ignoring the fact that there are other, better ways to solve those problems.

A better solution for slow memory allocations and deletions is to refine the vsg::Allocator. It's not been written for performance of allocations and deletions, it's been written for performance of access of data.

A better solution for slow observer_ptr<> performance can come from application-level design refinements that avoid over-use of observer_ptr<>, or if that can't be done, then tackling it specifically as a performance optimization task.

For both the memory allocation/deletion speed and observer_ptr<> issues, the right way to tackle them is to create a performance test case that is added to vsgExamples/tests, then look at the bottlenecks and how best to address them, then use these test cases when evaluating which different approaches might work best.

I do have ideas how both these areas might be improved, neither of which have anything close to the approach taken in this PR, this PR isn't just a bit rough around the edges and needing refinement, it's just not a design that actually resolves any of the actual problems we have.
