The two key benefits that I constantly hear lauded about entity systems are 1) the easy construction of new kinds of entities due to not having to tangle with complex inheritance hierarchies, and 2) cache efficiency.
Note that (1) is a benefit of component-based design, not just ES/ECS. You can use components in many ways that do not have the "systems" part and they work just fine (and plenty of both indie and AAA games use such architectures).
Systems need to be able to work with more than one component; i.e., both the rendering and physics systems need to access the transform component.
Most ECS sort their component containers by Entity ID, meaning that the corresponding components in each group will be in the same order.
This means that if you're linearly iterating over the graphics components, you're also linearly iterating over the corresponding transform components. You might skip some of the transforms (since you may have physics trigger volumes that you don't render, for example), but since you're always skipping forward in memory (and usually not by particularly large distances) you're still going to see efficiency gains.
This is similar to why Structure of Arrays (SoA) is the recommended approach in HPC. The CPU and cache can deal with multiple linear arrays almost as well as with a single linear array, and far better than with random memory access.
You can have components store pointers to other components, or pointers to entities that store pointers to components.
Somewhat unrelated, but... don't do that. Use handles, not pointers.
You can ensure that every component array is 'n' large, where 'n' is the number of entities alive in the system.
Since you use handles and not pointers, you can resize the arrays without fear of breaking object references.
You can also use a "chunked array" approach (an array of arrays), similar to many common std::deque implementations (though without the pitifully small chunk size of those implementations), if you want stable pointers for some reason or if you have measured problems with array-resize performance.
Secondly, this is all assuming that entities are processed linearly in a list every frame/tick, but in reality this is often not the case.
It depends on the entity. Yes, for many use cases, it's not true. Indeed, this is why I so strongly stress the difference between component-based design (good) and entity-system (a specific form of CBD).
Some of your components will certainly be easy to process linearly. Even in normally "tree heavy" use cases we have definitely seen performance increases from using tightly packed arrays (mostly in cases involving an N of a few hundred at most, like AI agents in a typical game).
Say you use a sector/portal renderer or an octree to perform occlusion culling. You might be able to store entities contiguously within a sector/node, but you're going to be jumping around whether you like it or not.
You'd be surprised how much the array still helps. You're jumping around in a much smaller region of memory than "anywhere" and even with all the jumping you're still much more likely to end up in something in cache. With a tree of a certain size or less, you might even be able to prefetch the whole thing into cache and not ever have a cache miss on that tree.
There are also tree structures that are built to live in tightly packed arrays. For instance, with your octree, you can use a heap-like structure (parents before children, siblings next to each other) and ensure that even when you "drill down" the tree you're always iterating forward in the array, which helps the CPU optimize the memory accesses / cache lookups.
Which is an important point to make. An x86 CPU is a complex beast: it runs an optimizer at runtime on your machine code, breaking instructions into smaller micro-ops, reordering them, predicting memory access patterns, and so on. Data access patterns matter more than may be readily apparent if all you have is a high-level understanding of how the CPU or cache works.
Then you have other systems, which might prefer entities stored in some other order.
You could store them multiple times. Once you strip down your arrays to the bare minimum details, you might find you actually save memory (since you've removed your 64-bit pointers and other nonsense and can use small 16-bit indices or so on) with this approach.
You could interleave your entity array instead of keeping separate arrays, but you're still wasting memory.
This is antithetical to good cache usage. If all you care about are the transforms and graphics data, why make the machine spend time pulling in all that other crap for physics and AI and input and debug and so on?
That's the point usually made in favor of ECS vs monolithic game objects (though not really applicable when comparing to other component-based architectures).
AI is pointless if it can't affect the transform or animation state used for an entity's rendering.
Just because AI can't efficiently access transform data linearly doesn't mean that no other system can use that data layout optimization effectively. You can use a packed array for transform data without stopping game logic systems from doing things the ad hoc way game logic systems usually do things.
You're also forgetting the code cache. When you use the systems approach of ECS (unlike some more naive component architectures), you're guaranteeing that you're running the same small loop of code rather than jumping back and forth through virtual function tables to an assortment of random Update functions strewn all over your binary. So in the AI case, you really want to keep all your different AI components (and certainly you have more than one, so that you can compose behaviors!) in separate buckets and process each list separately in order to get the best code cache usage.
With a delayed event queue (where a system generates a list of events but doesn't dispatch them until it finishes processing all entities), you can ensure that your code cache is used well while still supporting events.
Using an approach where each system knows which event queues to read out of for the frame, you can even make reading events fast. Or faster than without, at least.
Remember, performance isn't absolute. You don't need to eliminate every last single cache miss to start seeing the performance benefits of good data-oriented design.
Or perhaps there's a hybrid approach that everyone's using but nobody's talking about.
This is what I advocate: use good data-oriented approaches to components where performance is critical. Use simpler architecture where simplicity improves development time. Don't shoe-horn every single component into a strict over-definition of componentization like ECS proposes. Develop your component architecture in such a way that you can easily use ECS-like approaches where they make sense and simpler component structures where ECS-like approaches don't make sense (or make less sense than a tree structure, and so on).
You might note that a lot of people talk about ECS without really understanding what it even is. I see Unity referred to as an ECS with depressing frequency, illustrating that too many game developers equate "ECS" with "Components" and pretty much ignore the "Entity System" part entirely. You see a lot of love heaped on ECS on the Internet when a large portion of the people are really just advocating component-based design, not an actual ECS. At this point it's almost pointless to argue it; ECS has been corrupted from its original meaning into a generic term and you might as well accept that "ECS" doesn't mean the same thing as "data-oriented ECS". :/