Segfault with pg_stat_wait_history on replicas #1

dev1ant · 2015-07-21T09:46:37Z

pgtest02f/postgres R # show shared_preload_libraries ;
                shared_preload_libraries
---------------------------------------------------------
 pg_stat_statements,pg_stat_kcache,pg_stat_wait,repl_mon
(1 row)

Time: 0.366 ms
pgtest02f/postgres R # select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

Time: 0.715 ms
pgtest02f/postgres R # select * from pg_stat_wait_current ;
  pid  |           sample_ts           | class_id | class_name | event_id | event_name | wait_time | p1 | p2 | p3 | p4 | p5
-------+-------------------------------+----------+------------+----------+------------+-----------+----+----+----+----+----
 28182 | 2015-07-08 11:56:18.611896+03 |        4 | Latch      |        0 | Latch      |    841512 |  0 |  0 |  0 |  0 |  0
 28260 | 2015-07-08 11:56:18.611896+03 |        4 | Latch      |        0 | Latch      |     33178 |  0 |  0 |  0 |  0 |  0
 28259 | 2015-07-08 11:56:18.611896+03 |        4 | Latch      |        0 | Latch      |  29602538 |  0 |  0 |  0 |  0 |  0
 28270 | 2015-07-08 11:56:18.611896+03 |        5 | Network    |        0 | READ       |    763131 |  0 |  0 |  0 |  0 |  0
 28266 | 2015-07-08 11:56:18.611896+03 |        5 | Network    |        0 | READ       |    425323 |  0 |  0 |  0 |  0 |  0
(5 rows)

Time: 5.924 ms
pgtest02f/postgres R # select * from pg_stat_wait_history ;
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Time: 831.582 ms
psql: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
pgtest02f/ R >

Here is the backtrace from the core dump:

(gdb) bt
#0  SetLatch (latch=0x0) at pg_latch.c:518
#1  0x00007f99c51a8b73 in pg_stat_wait_get_history (fcinfo=0x7ffffedff180) at pg_stat_wait.c:558
#2  0x000000000059ed67 in ExecMakeTableFunctionResult (funcexpr=0x161c630, econtext=0x161be58, argContext=<value optimized out>, expectedDesc=0x161d9e8, randomAccess=0 '\000') at execQual.c:2196
#3  0x00000000005b07f2 in FunctionNext (node=0x161bd48) at nodeFunctionscan.c:95
#4  0x000000000059f94e in ExecScanFetch (node=0x161bd48, accessMtd=0x5b0540 <FunctionNext>, recheckMtd=0x5afe70 <FunctionRecheck>) at execScan.c:82
#5  ExecScan (node=0x161bd48, accessMtd=0x5b0540 <FunctionNext>, recheckMtd=0x5afe70 <FunctionRecheck>) at execScan.c:132
#6  0x0000000000598878 in ExecProcNode (node=0x161bd48) at execProcnode.c:426
#7  0x00000000005b1c19 in ExecSort (node=0x161bad8) at nodeSort.c:103
#8  0x00000000005987e8 in ExecProcNode (node=0x161bad8) at execProcnode.c:468
#9  0x00000000005adc68 in ExecMergeJoin (node=0x15f0a00) at nodeMergejoin.c:730
#10 0x0000000000598818 in ExecProcNode (node=0x15f0a00) at execProcnode.c:453
#11 0x00000000005973e2 in ExecutePlan (queryDesc=0x15f3f90, direction=<value optimized out>, count=0) at execMain.c:1490
#12 standard_ExecutorRun (queryDesc=0x15f3f90, direction=<value optimized out>, count=0) at execMain.c:319
#13 0x00007f99c55b317b in pgss_ExecutorRun (queryDesc=0x15f3f90, direction=ForwardScanDirection, count=0) at pg_stat_statements.c:875
#14 0x000000000068c2d7 in PortalRunSelect (portal=0x1505c10, forward=<value optimized out>, count=0, dest=<value optimized out>) at pquery.c:946
#15 0x000000000068d501 in PortalRun (portal=0x1505c10, count=9223372036854775807, isTopLevel=1 '\001', dest=0x1652f50, altdest=0x1652f50, completionTag=0x7ffffedffa70 "") at pquery.c:790
#16 0x0000000000689b8e in exec_simple_query (query_string=0x14f8f80 "select * from pg_stat_wait_history ;") at postgres.c:1072
#17 0x000000000068b208 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x14df980 "postgres", username=<value optimized out>) at postgres.c:4074
#18 0x0000000000633fdd in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4164
#19 BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:3829
#20 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1597
#21 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1244
#22 0x00000000005cbc28 in main (argc=3, argv=0x14deaa0) at main.c:228
(gdb)

All of this happens on the waits_monitoring_94 branch.
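
Frame #0 of the backtrace shows SetLatch() being called with latch=0x0 from pg_stat_wait_get_history(), so the backend dereferences a NULL latch pointer. The trace does not show why the pointer is NULL on a replica. Below is a minimal sketch of the kind of guard that would turn this crash into a handled condition. All types and function names (DemoLatch, demo_set_latch, demo_wake_collector) are hypothetical stand-ins; this is not the code from the branch and not necessarily what the eventual fix does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for PostgreSQL's Latch; not the real definition. */
typedef struct DemoLatch
{
    bool is_set;
} DemoLatch;

/*
 * Like the SetLatch() call in frame #0, this dereferences its argument,
 * so passing NULL is a bug in the caller.
 */
static void
demo_set_latch(DemoLatch *latch)
{
    assert(latch != NULL);
    latch->is_set = true;
}

/*
 * Hypothetical guard for a caller such as pg_stat_wait_get_history():
 * wake the collector only if it has published a latch. Returns false
 * when there is no latch to set, so the caller can raise an error or
 * return an empty result instead of crashing the backend.
 */
static bool
demo_wake_collector(DemoLatch *collector_latch)
{
    if (collector_latch == NULL)
        return false;

    demo_set_latch(collector_latch);
    return true;
}
```

With a guard like this, calling demo_wake_collector(NULL) returns false rather than faulting, and the SQL-callable function could report that history is unavailable on this server.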

dev1ant · 2015-10-26T08:52:26Z

Fixed in 4c54dc0.


Fix some corner-case issues in REFRESH MATERIALIZED VIEW CONCURRENTLY.

refresh_by_match_merge() has some issues in the way it builds a SQL query to construct the "diff" table:

1. It doesn't require the selected unique index(es) to be indimmediate.
2. It doesn't pay attention to the particular equality semantics enforced by a given index, but just assumes that they must be those of the column datatype's default btree opclass.
3. It doesn't check that the indexes are btrees.
4. It's insufficiently careful to ensure that the parser will pick the intended operator when parsing the query. (This would have been a security bug before CVE-2018-1058.)
5. It's not careful about indexes on system columns.

The way to fix #4 is to make use of the existing code in ri_triggers.c for generating an arbitrary binary operator clause. I chose to move that to ruleutils.c, since that seems a more reasonable place to be exporting such functionality from than ri_triggers.c.

While #1, #3, and #5 are just latent given existing feature restrictions, and #2 doesn't arise in the core system for lack of alternate opclasses with different equality behaviors, #4 seems like an issue worth back-patching. That's the bulk of the change anyway, so just back-patch the whole thing to 9.4 where this code was introduced.

Discussion: https://postgr.es/m/[email protected]


Redesign the partition dependency mechanism.

The original setup for dependencies of partitioned objects had serious problems:

1. It did not verify that a drop cascading to a partition-child object also cascaded to at least one of the object's partition parents. Now, normally a child object would share all its dependencies with one or another parent (e.g. a child index's opclass dependencies would be shared with the parent index), so that this oversight is usually harmless. But if some dependency failed to fit this pattern, the child could be dropped while all its parents remain, creating a logically broken situation. (It's easy to construct artificial cases that break it, such as attaching an unrelated extension dependency to the child object and then dropping the extension. I'm not sure if any less-artificial cases exist.)
2. Management of partition dependencies during ATTACH/DETACH PARTITION was complicated and buggy; for example, after detaching a partition table it was possible to create cases where a formerly-child index should be dropped and was not, because the correct set of dependencies had not been reconstructed.

Less seriously, because multiple partition relationships were represented identically in pg_depend, there was an order-of-traversal dependency on which partition parent was cited in error messages. We also had some pre-existing order-of-traversal hazards for error messages related to internal and extension dependencies. This is cosmetic to users but causes testing problems.

To fix #1, add a check at the end of the partition tree traversal to ensure that at least one partition parent got deleted. To fix #2, establish a new policy that partition dependencies are in addition to, not instead of, a child object's usual dependencies; in this way ATTACH/DETACH PARTITION need not cope with adding or removing the usual dependencies.

To fix the cosmetic problem, distinguish between primary and secondary partition dependency entries in pg_depend, by giving them different deptypes. (They behave identically except for having different priorities for being cited in error messages.) This means that the former 'I' dependency type is replaced with new 'P' and 'S' types.

This also fixes a longstanding bug that after handling an internal dependency by recursing to the owning object, findDependentObjects did not verify that the current target was now scheduled for deletion, and did not apply the current recursion level's objflags to it. Perhaps that should be back-patched; but in the back branches it would only matter if some concurrent transaction had removed the internal-linkage pg_depend entry before the recursive call found it, or the recursive call somehow failed to find it, both of which seem unlikely.

Catversion bump because the contents of pg_depend change for partitioning relationships.

Patch HEAD only. It's annoying that we're not fixing #2 in v11, but there seems no practical way to do so given that the problem is exactly a poor choice of what entries to put in pg_depend. We can't really fix that while staying compatible with what's in pg_depend in existing v11 installations.

Discussion: https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com

dev1ant closed this Oct 26, 2015

postgrespro / postgres
