I am not super experienced with databases, just have a college level course under my belt. I am trying to see if an existing query can be optimized. Currently on average takes over 300ms. There is a table of assessment responses (dbname.responses) which is about 14 million records. The query tries to obtain a district average for a particular assessment. So for instance average score on exam 1, for all the lastest responses from students taking that exam. The response table essentially contains all the responses for assessments. only the latest response from a student is factored into the score. Here it goes:
SELECT internal_scorable_key,
sum(score),
sum(case when response_type = 'NOT_SCORED_RESPONSE' then 1 else 0 end) AS "not_valid_responses",
sum(case when response_type = 'IS_SCORED_RESPONSE' then 1 else 0 end) AS "is_valid_responses"
FROM (
SELECT DISTINCT ON (administration_key, internal_scorable_key)
administration_key, internal_scorable_key, assesment_response_key, score, response_type
FROM dbname.response
JOIN dbname.assessment_admin USING (administration_key)
WHERE
is_disabled=false
AND internal_school_yr_key=5
AND internal_inst_key_of_assessment IN (4564)
AND internal_assess_key = (SELECT internal_assess_key FROM dbname.internal_assess WHERE ext_id = '11300-2')
ORDER BY administration_key, internal_scorable_key, assesment_response_key DESC
) all_the_things
GROUP BY internal_scorable_key
dbname.response Table...
response_key bigint NOT NULL,
administration_key bigint NOT NULL,
internal_scorable_key bigint NOT NULL,
score integer NOT NULL,
response_type character varying(255) NOT NULL,
client_create_date timestamp with time zone NOT NULL,
upload_date timestamp with time zone NOT NULL,
response_source character varying(255) NOT NULL,
print_id character varying(255) NOT NULL,
assessment_administration_page_key bigint,
constructed_response_key bigint,
answer_indexes text NOT NULL,
dbname.assessment_admin Table...
administration_key bigint NOT NULL,
internal_assess_key bigint NOT NULL,
internal_student_key bigint NOT NULL,
administration_id bigint NOT NULL,
internal_inst_key_of_assessment bigint,
account_key bigint,
internal_school_yr_key bigint NOT NULL,
is_disabled boolean NOT NULL,
dbname.internal_assess Table...
internal_assess_key bigint NOT NULL,
ext_id character varying(255) NOT NULL,
created_at timestamp with time zone NOT NULL DEFAULT now(),
here is the postgres explain image:
"HashAggregate (cost=51.89..52.51 rows=62 width=28)"
" -> Unique (cost=49.87..50.34 rows=62 width=44)"
" InitPlan 1 (returns $0)"
" -> Seq Scan on internal_assess (cost=0.00..1.77 rows=1 width=8)"
" Filter: ((external_id)::text = '2568-1'::text)"
" -> Sort (cost=48.10..48.25 rows=62 width=44)"
" Sort Key: response.administration_key, response.internal_scorable_key, response.response_key"
" -> Nested Loop (cost=4.37..46.25 rows=62 width=44)"
" -> Bitmap Heap Scan on assessment_admin (cost=4.37..18.34 rows=3 width=8)"
" Recheck Cond: (internal_assess_key = $0)"
" Filter: ((NOT is_disabled) AND (internal_school_yr_key = 1) AND (internal_inst_key_of_assessment = 1))"
" -> Bitmap Index Scan on student_assessment_administration_id_key (cost=0.00..4.37 rows=16 width=0)"
" Index Cond: (internal_assess_key = $0)"
" -> Index Scan using idx_response_assessmentadministrationkey on response (cost=0.00..9.03 rows=27 width=44)"
" Index Cond: (administration_key = assessment_admin.administration_key)"
the only indexes are on the responses table:
CREATE INDEX idx_response_assessmentadministrationkey
ON response
USING btree
(administration_key);
and
CREATE INDEX idx_response_uploaddate
ON response
USING btree
(upload_date);
EXPLAIN ANALYZE
output, or better yet, paste it into explain.depesz.com and post the link generated after you hit "submit". Also, show the indexes. Have you runVACUUM ANALYZE
or even just plainANALYZE
on those tables before running your query? What version of Postgresql are you using? – bma Oct 30 at 4:34