Optimize memory usage of Datom type#263

Closed
ronbu wants to merge 1 commit into tonsky:master from ronbu:master
ronbu commented May 19, 2018

As I would like to use Datascript for large datasets, I am trying to optimise the memory usage.
This patch should reduce memory usage by at least 50%.

[tuned-datascript "0.16.5-SNAPSHOT"]

There are some changes to the API:

  • Attributes are stored as integers, so (d/datom 0 :attr :val) is no longer allowed
  • Queries return attributes as integers, not keywords
  • You need to define the attribute order in the schema: {:attr {:db/order 0}}
  • Transaction IDs are created from 2^24 - 1 downwards
  • The new range for entity IDs is from 0 to 2^24
  • The new range for valid transaction IDs is from 2^24 down to 2^24 - 2^20

Internally, the new Datom type is implemented as a pointer to :v plus a single number that stores the entity ID and attribute ID in its lower 32 bits. The remaining 21 bits of the 53 integer bits that a double can represent exactly are used for the added flag and the transaction ID.
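To make the bit budget concrete, here is a minimal sketch of how four fields could share one 53-bit integer. It is not the code from the patch: the names `pack` and `unpack`, the 24/8 split between entity ID and attribute ID inside the lower 32 bits, and the position of the added flag are all assumptions made for illustration. It uses plain arithmetic rather than bitwise operators, because JavaScript bitwise operators truncate their operands to 32 bits.

```clojure
;; Hypothetical layout, assumed for illustration only:
;;   bits 0-23   entity id          (24 bits)
;;   bits 24-31  attribute id       (8 bits)
;;   bits 32-51  transaction number (20 bits)
;;   bit  52     added flag         (1 bit)
(def e-range   16777216)    ; 2^24
(def a-range   256)         ; 2^8
(def tx-range  1048576)     ; 2^20
(def low-range 4294967296)  ; 2^32

(defn pack
  "Packs the four fields into one non-negative integer below 2^53."
  [e a tx added?]
  {:pre [(< -1 e e-range) (< -1 a a-range) (< -1 tx tx-range)]}
  (+ e
     (* a e-range)
     (* tx low-range)
     (if added? (* tx-range low-range) 0)))

(defn unpack
  "Recovers the fields from a number produced by pack."
  [n]
  (let [low  (mod n low-range)    ; lower 32 bits: e and a
        high (quot n low-range)]  ; upper 21 bits: tx and added flag
    {:e      (mod low e-range)
     :a      (quot low e-range)
     :tx     (mod high tx-range)
     :added? (>= high tx-range)}))
```

With every field at its maximum the packed value is 2^53 - 1, the largest integer a double holds without losing precision, which is why 32 + 21 bits is the ceiling for this approach.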

The JS wrapper still needs some work, and I could not figure out how to benchmark Datascript.

Owner

tonsky commented May 21, 2018

Wow, lots of effort here! I don’t think I see how something like this could be merged into DataScript database. My concerns are existing clients, breaking changes, compatibility and JS portability. But I’m happy to see it exists as a separate project on its own.

Couple of questions though, purely out of curiosity:

  1. Why double? Wouldn’t simply using long give you more bits to work with?

This patch should reduce memory usage by at least 50%.

  2. Have you actually measured it delivers on that goal?

I could not figure out how to benchmark Datascript.

Add to profiles.clj:

:bench  { :source-paths ["bench/src"] }

Then run (in datascript dir):

lein with-profiles +bench trampoline run -m datascript.bench/bench-all

Author

ronbu commented May 22, 2018

Thanks for the feedback. I will close this PR.
If anybody is interested, my fork can be found at https://github.com/ronbu/datascript

Why double? Wouldn’t simply using long give you more bits to work with?

A double is the only number type available on JavaScript engines.
Using two 32-bit number values is another option I need to investigate.
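As a rough idea of what the two-value alternative might look like, the sketch below keeps the entity ID and attribute ID in one 32-bit word and the transaction number and added flag in a second one. Nothing here comes from the fork: `pack-words`, `unpack-words` and the field widths are hypothetical. Because each word fits in 32 bits, ordinary bitwise operators are enough.

```clojure
;; Hypothetical layout, assumed for illustration only:
;;   low word:  bits 0-23 entity id, bits 24-31 attribute id
;;   high word: bits 0-19 transaction number, bit 20 added flag
(defn pack-words
  "Returns a [low high] pair of 32-bit words."
  [e a tx added?]
  [(bit-or e (bit-shift-left a 24))
   (bit-or tx (if added? (bit-shift-left 1 20) 0))])

(defn unpack-words
  "Recovers the fields from a [low high] pair produced by pack-words."
  [[low high]]
  {:e      (bit-and low 0xFFFFFF)
   ;; unsigned shift plus mask, so a set top bit cannot leak a sign
   :a      (bit-and (unsigned-bit-shift-right low 24) 0xFF)
   :tx     (bit-and high 0xFFFFF)
   :added? (pos? (bit-and high (bit-shift-left 1 20)))})
```

The trade-off is one extra field per datom in exchange for 11 spare bits in the second word and cheaper field extraction.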

Have you actually measured it delivers on that goal?

I measured memory usage in Chrome with the following code:

Regular Datascript (167 MB)

(def db
  (d/init-db (for [i (range 1e6)]
               (d/datom i :name i))
             {:name {:db/order 0}}))

With optimized Datom type (76 MB)

(def db
  (d/init-db (for [i (range 1e6)]
               (d/datom i 0 i))
             {:name {:db/order 0}}))
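The two figures can be checked against the "at least 50%" claim with a little arithmetic. The helper names below are hypothetical, and the per-datom numbers assume 1 MB = 10^6 bytes and include whatever index overhead Chrome attributed to the database, so treat them as rough.

```clojure
(def datom-count  1000000)
(def regular-mb   167)
(def optimized-mb 76)

(defn reduction-pct
  "Percentage of memory saved going from before to after."
  [before after]
  (* 100.0 (/ (- before after) before)))

(defn bytes-per-datom
  "Rough bytes per datom, assuming 1 MB = 10^6 bytes."
  [mb n]
  (/ (* mb 1e6) n))

(reduction-pct regular-mb optimized-mb)     ; roughly 54.5
(bytes-per-datom regular-mb datom-count)    ; 167.0
(bytes-per-datom optimized-mb datom-count)  ; 76.0
```

For this particular dataset the measured saving is about 54%, which is consistent with the stated goal.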

ronbu closed this May 22, 2018