I'm trying to run an update query on the column answer_date
of a table P
. I want to fill each row of answer_date
of P
with the unique date from create_date
column of H
where P.ID1
matches with H.ID1
and where P.acceptance_date
is not empty.
The query takes a long while to run, so I check the interim changes in answer_date
but the entire column is empty like it was created.
Btree indices exists on all the mentioned columns.
Is there something wrong with the query?
UPDATE P
SET answer_date = subquery.date
FROM (SELECT DISTINCT H.create_date as date
FROM H, P
where H.postid=P.acceptance_id
) AS subquery
WHERE P.acceptance_id is not null;
Table schema is as follows:
Table "public.P"
Column | Type | Modifiers | Storage | Stats target | Description
-----------------------+-----------------------------+-----------+----------+--------------+-------------
id | integer | not null | plain | |
acceptance_id | integer | | plain | |
answer_date | timestamp without time zone | | plain | |
Indexes:
"posts_pkey" PRIMARY KEY, btree (id)
"posts_accepted_answer_id_idx" btree (acceptance_id) WITH (fillfactor='100')
and
Table "public.H"
Column | Type | Modifiers | Storage | Stats target | Description
-------------------+-----------------------------+-----------+----------+--------------+-------------
id | integer | not null | plain | |
postid | integer | | plain | |
create_date | timestamp without time zone | not null | plain | |
Indexes:
"H_pkey" PRIMARY KEY, btree (id)
"ph_creation_date_idx" btree (create_date) WITH (fillfactor='100')
Table P
as 70 million rows and H
has 220 million rows.
Postgres version is 9.6 Hardware is a Windows laptop with 8Gb of RAM.
DISTINCT
is not a function! It's a part ofSELECT DISTINCT
, and applies to the whole selected rows. (select distinct (c1), c2...
eq.select distinct c1, c2...
eq.select distinct c1, (c2)...
etc.) – jarlh yesterdayJOIN
syntax. Easier to write (without errors), easier to read (and maintain), and easier to convert to outer join if needed. – jarlh yesterday