Optimize entities #227

rauhs · 2017-06-07T19:18:28Z

Also partially dicussed on Slack:

Currently an attribute lookup on an entity does a search into the btset and then caches the result. This is inefficient since the all the attribute values are right next to each other in the eavt index.
Getting them all at once and storing them in a cache makes sense, but has huge problem:

What if the entity has a many ref with many many references?
What if the entity has MANY attributes

I think 2) is unlikely a use-case and can be ignored. However 1) is an issue.

Ideas:

Save an Iter instance that represents (Datom. eid nil nil nil nil), ie, all Datoms belonging to an entity. This is fast to get.
Enhance Iter to allow fast searching within an Iter. This would mean we can avoid the cache of an entity and just lookup in the Iter.
For avoiding performance problems with 1) we could add a heuristic to fall back to the current implementation when the count of an Iter is "too large" (> 20??). For this implement bounded-count for Iter. See #226

thedavidmeister · 2017-06-17T03:04:21Z

👍

rauhs · 2017-12-21T13:07:51Z

Just want to continue dumping my thoughts: I ended up implementing this for my datascript fork (which adds a few other features) in a slightly different way:

Instead of bounded counting an Iter (which still walks over 1-2 btset leaves) and then again iterating over it when the bounded count was <=20, I now just iterate over the Iter right away but stop after 20 attributes (so as to avoid realizing an entity which has many attributes or --more commonly-- an entity that has a ref-many with many references).

So entity/lookup-entity is now roughly:

If in cache -> return
If not. (cond touched ...)
a) if touched is nil: This is the very first attribute access, do the (db/-search ...) and start reducing over the Iter. Add each attribute to the cache but STOP once we reach 20 attributes or the Iter is reduced over. If we hit the limit of 20: We now have to see if we were in the middle of adding a mutli-ref attribute, since we have to remove these again from the cache if we were. Set the touched to either true (didnt hit the limit) or false (we did hit the limit and there was more in the btset Iter).
b) If touched is true => not found
c) If touched is false => get the attribute with an Iter and add to cache.

For the majority of Entities, this means we only once search the btset and get all the attributes in the cache quickly. For entities with many attributes this is still fast since we're still at near-constant time here.

tonsky / datascript

Optimize entities #227

Optimize entities #227

rauhs commented Jun 7, 2017

thedavidmeister commented Jun 17, 2017

rauhs commented Dec 21, 2017

tonsky / datascript

Sponsor tonsky/datascript

Join GitHub today

Optimize entities #227

Optimize entities #227

Comments

rauhs commented Jun 7, 2017

thedavidmeister commented Jun 17, 2017

rauhs commented Dec 21, 2017

Essential cookies

Always active

Analytics cookies