PostgreSQL - Create and join array from multiple rows based on ID

Question

I have two tables:

The table LINKS has:

LINK_ID --- integer, unique ID
FROM_NODE_X -- numbers/floats, indicating a geographical position
FROM_NODE_Y --
FROM_NODE_Z --
TO_NODE_X --
TO_NODE_Y --
TO_NODE_Z --

The table LINK_COORDS has:

LINK_ID --- integer, refers to above UID
ORDER --- integer, indicating order
X ---
Y ---
Z ---

Logically each LINK consists of a number of waypoints. The final order is:

FROM_NODE , 1 , 2 , 3 , ... , TO_NODE

A link has at least two waypoints (FROM_NODE, TO_NODE), but can have a variable number of waypoints in between (0 to 100+).

I now would need a way to aggregate, sort and store the waypoints of each link in an array which later will be used to draw a line.

I'm struggling with the LINK_COORDS being available as individual rows. Having the start and end positions in the other (LINKS) table doesn't help either. If I had a way to at least get all the LINK_COORDS joined/updated to the LINKS table I probably could work out the rest myself again. So if you have an idea on how to get that far, it'd be much appreciated already.

Considering performance would be nice (the tables have somewhere between 500k to 1mio entries now and will have multiples of that later), but is not essential for now.

EDIT: Thanks for the suggestion, a-horse-with-no-name. I chose to create the point geometries (PostGIS) for each XYZ before this step, so in the end there's only an array of points to create from the individual points. The adapted SQL

UPDATE "Link"
SET "POINTS" = 
    array_append(
        (array_prepend(
            "FROM_POINT",
            (SELECT array_agg(lc."POINT" ORDER BY lc."COUNT")
                FROM "LinkCoordinate" lc 
                WHERE lc."LINK_ID" = "Link"."LINK_ID")))
        , "TO_POINT")

however is running extremely slow: Running it exemplary on 10 links required ~120 seconds. Running it for all the 1,3mio links and plenty more linkcoords would probably take somewhere around half a year. Not really ideal.

How can I figure out where this immense slowness originates from?

If I get the source data in a pre-ordered format (so linkcoordinates of each link_ID), would this allow me to significantly speed up the SQL query?

EDIT: It appears the main slowdown originates from the SELECT subquery used in the array_agg() function. Everything else (incl. ordering) does not really cause any slowdown.

My current guess is that the SELECT query iterates over the entirety of "LinkCoordinate" for each and every link, making it work much harder than it has to, as all LinkCoordinates belonging to a Link are always stored in 'blocks' of rows. A single, sequential processing of the LinkCoordinates would be sufficient, really.

user330315user330315 · Accepted Answer · 2014-03-30 11:00:21Z

something like this maybe:

select l.link_id, 
       min(l.from_node_x) as from_node_x, 
       min(l.from_node_y) as from_node_y, 
       min(l.from_node_z) as from_node_z,
       array_agg(lc.x order by lc."ORDER") as points_x,
       array_agg(lc.y order by lc."ORDER") as points_y,
       array_agg(lc.z order by lc."ORDER") as points_z,
       min(l.to_node_x) as to_node_x, 
       min(l.to_node_y) as to_node_y, 
       min(l.to_node_z) as to_node_z
from links l 
  join link_coords lc on lc.link_id = l.link_id
group by l.link_id;

The min() is necessary due to the group by but won't change the result as all values from the links are the same anyway.

Another possibility is to use a scalar subquery. I'm unsure which of them is faster though - but the join/group by is probably more efficient.

select l.link_id, 
       l.from_node_x, 
       l.from_node_y, 
       l.from_node_z,
       (select array_agg(lc.x order by lc."ORDER") from link_coords lc where lc.link_id = l.link_id) as points_x,
       (select array_agg(lc.y order by lc."ORDER") from link_coords lc where lc.link_id = l.link_id) as points_y,
       (select array_agg(lc.z order by lc."ORDER") from link_coords lc where lc.link_id = l.link_id) as points_z,
       l.to_node_x, 
       l.to_node_y, 
       l.to_node_z
from links l

Collectives™ on Stack Overflow

PostgreSQL - Create and join array from multiple rows based on ID

1 Answer 1

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

Your Answer

Sign up or log in

Post as a guest

Related