2

I need some help parsing json so I can perform analysis on it. It's remote station measurement data that comes with location.

I've got to answer two simple questions and Ive got so far but it's giving me a brain block trying to figure some of the behaviour.

Nested arrays (four deep) is the main issue. I can't work out why lateral joins on jsonb_array_elements calls aren't doing the job.

I'm using postgres 9.4.4

Sample json is below as is the pg/sql I'm trying it with.

Here's the sample json. Anatomy/spec of the data and a pretty version can be found below:

{ "SiteRep": { "Wx": { "Param": [ { "name": "D", "units": "compass", "$": "Wind Direction" }, { "name": "H", "units": "%", "$": "Screen Relative Humidity" } ] }, "DV": { "dataDate": "2014-09-23T20:00:00Z", "type": "O", "Location": [ { "i": "3002", "lat": "60.749", "lon": "-0.854", "name": "BALTASOUND", "country": "SCOTLAND", "continent": "EUROPE", "elevation": "15.0", "Period": [ { "type": "Day", "value": "2014-09-22Z", "Rep": [ { "D": "SSW", "H": "88.5", "$": "1200" }, { "D": "S", "H": "91.6", "$": "1260" }, { "D": "SSW", "H": "92.8", "$": "1320" }, { "D": "W", "H": "92.8", "$": "1380" } ] }, { "type": "Day", "value": "2014-09-23Z", "Rep": [ { "D": "W", "H": "88.5", "$": "0" }, { "D": "W", "H": "86.7", "$": "60" }, { "D": "W", "H": "86.2", "$": "120" }, { "D": "WSW", "H": "87.9", "$": "180" } ] } ] }, { "i": "3005", "lat": "60.139", "lon": "-1.183", "name": "LERWICK (S. SCREEN)", "country": "SCOTLAND", "continent": "EUROPE", "elevation": "82.0", "Period": [ { "type": "Day", "value": "2014-09-22Z", "Rep": [ { "D": "SSW", "H": "95.3", "$": "1200" }, { "D": "SSW", "H": "97.4", "$": "1260" }, { "D": "SW", "H": "97.3", "$": "1320" }, { "D": "W", "H": "94.1", "$": "1380" } ] }, { "type": "Day", "value": "2014-09-23Z", "Rep": [ { "D": "WNW", "H": "89.1", "$": "0" }, { "D": "WNW", "H": "88.0", "$": "60" }, { "D": "W", "H": "90.9", "$": "120" }, { "D": "W", "H": "90.9", "$": "180" } ] } ] } ] } } }

Which I'm loading in with this code:

drop table if exists foo;
create table public.foo (data jsonb);

truncate foo;
copy foo from <example json as above>
DELIMITERS e'\x02';

First specific task is to count the number of "D" keys which have value "S".
Second task is to group these by different values of "$" key.

So I'm just trying to parse the json into pg tables with this:

select
"locations_entries"."date" as "location_date",
"locations_entries"."value" -> 'i' as "i",
"locations_entries"."value" -> 'lat' as "lat",
"locations_entries"."value" -> 'lon' as "lon",
-- "locations_entries"."value" -> 'Period' as "Period",
"locations_entries"."value" -> 'Period' -> 'value' as "p_value",
"locations_entries"."value" -> 'Period' -> 'type' as "type",
left("locations_entries"."value"::text,64) || '...'  as "value"
from
(
select "data" -> 'SiteRep' -> 'DV' -> 'dataDate' as "date",
"data" -> 'SiteRep' -> 'DV' -> 'type' as "type",
"location_entry"."value"
from foo,
lateral jsonb_array_elements(("data" -> 'SiteRep' -> 'DV' -> 'Location')::jsonb) location_entry --,
--lateral jsonb_array_elements("data" -> 'SiteRep' -> 'DV' -> 'Location' -> 'Period' ) period_entry
) "locations_entries"
order by "i"

Note the second lateral join is commented out. More on that in a second

The call above returns two rows:

    location_date      |   i    |   lat    |   lon    | p_value | type |                                value                                
------------------------+--------+----------+----------+---------+------+---------------------------------------------------------------------
"2014-09-23T20:00:00Z" | "3002" | "60.749" | "-0.854" |         |      | {"i": "3002", "lat": "60.749", "lon": "-0.854", "name": "BALTASO...
"2014-09-23T20:00:00Z" | "3005" | "60.139" | "-1.183" |         |      | {"i": "3005", "lat": "60.139", "lon": "-1.183", "name": "LERWICK...

Now if I uncomment the second lateral join it returns zero rows:

 location_date | i | lat | lon | value | type | value 
---------------+---+-----+-----+-------+------+-------
(0 rows)

Yet if I look at what the value of "data" -> 'SiteRep' -> 'DV' -> 'Location' -> 'Period' is to see what I'm calling jsonb_array_elements on it's:

   i    |                               Period                                
--------+---------------------------------------------------------------------
 "3002" | [{"Rep": [{"$": "1200", "D": "SSW", "H": "88.5"}, {"$": "1260", ...
 "3005" | [{"Rep": [{"$": "1200", "D": "SSW", "H": "95.3"}, {"$": "1260", ...

Which (I think) should be fine.

Any ideas what I'm doing wrong and how I can parse the multiple nested arrays into tables?

I tried here and here but can't apply these solutions to my json as they don't go deep enough or I'm misunderstanding my json.

1
  • The anatomy of the json can be found at the bottom of the page here Commented Sep 22, 2015 at 16:10

0

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.