Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize entities #227

Open
rauhs opened this issue Jun 7, 2017 · 2 comments
Open

Optimize entities #227

rauhs opened this issue Jun 7, 2017 · 2 comments

Comments

@rauhs
Copy link
Contributor

@rauhs rauhs commented Jun 7, 2017

Also partially dicussed on Slack:

Currently an attribute lookup on an entity does a search into the btset and then caches the result. This is inefficient since the all the attribute values are right next to each other in the eavt index.
Getting them all at once and storing them in a cache makes sense, but has huge problem:

  1. What if the entity has a many ref with many many references?
  2. What if the entity has MANY attributes

I think 2) is unlikely a use-case and can be ignored. However 1) is an issue.

Ideas:

  • Save an Iter instance that represents (Datom. eid nil nil nil nil), ie, all Datoms belonging to an entity. This is fast to get.
  • Enhance Iter to allow fast searching within an Iter. This would mean we can avoid the cache of an entity and just lookup in the Iter.
  • For avoiding performance problems with 1) we could add a heuristic to fall back to the current implementation when the count of an Iter is "too large" (> 20??). For this implement bounded-count for Iter. See #226
@thedavidmeister
Copy link

@thedavidmeister thedavidmeister commented Jun 17, 2017

👍

@rauhs
Copy link
Contributor Author

@rauhs rauhs commented Dec 21, 2017

Just want to continue dumping my thoughts: I ended up implementing this for my datascript fork (which adds a few other features) in a slightly different way:

Instead of bounded counting an Iter (which still walks over 1-2 btset leaves) and then again iterating over it when the bounded count was <=20, I now just iterate over the Iter right away but stop after 20 attributes (so as to avoid realizing an entity which has many attributes or --more commonly-- an entity that has a ref-many with many references).

So entity/lookup-entity is now roughly:

  1. If in cache -> return
  2. If not. (cond touched ...)
    a) if touched is nil: This is the very first attribute access, do the (db/-search ...) and start reducing over the Iter. Add each attribute to the cache but STOP once we reach 20 attributes or the Iter is reduced over. If we hit the limit of 20: We now have to see if we were in the middle of adding a mutli-ref attribute, since we have to remove these again from the cache if we were. Set the touched to either true (didnt hit the limit) or false (we did hit the limit and there was more in the btset Iter).
    b) If touched is true => not found
    c) If touched is false => get the attribute with an Iter and add to cache.

For the majority of Entities, this means we only once search the btset and get all the attributes in the cache quickly. For entities with many attributes this is still fast since we're still at near-constant time here.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Linked pull requests

Successfully merging a pull request may close this issue.

None yet
2 participants
You can’t perform that action at this time.