Segfault with pg_stat_wait_history on replicas #1

Closed
dev1ant opened this issue Jul 21, 2015 · 1 comment

@dev1ant dev1ant commented Jul 21, 2015

pgtest02f/postgres R # show shared_preload_libraries ;
                shared_preload_libraries
---------------------------------------------------------
 pg_stat_statements,pg_stat_kcache,pg_stat_wait,repl_mon
(1 row)

Time: 0.366 ms
pgtest02f/postgres R # select pg_is_in_recovery();
 pg_is_in_recovery
-------------------
 t
(1 row)

Time: 0.715 ms
pgtest02f/postgres R # select * from pg_stat_wait_current ;
  pid  |           sample_ts           | class_id | class_name | event_id | event_name | wait_time | p1 | p2 | p3 | p4 | p5
-------+-------------------------------+----------+------------+----------+------------+-----------+----+----+----+----+----
 28182 | 2015-07-08 11:56:18.611896+03 |        4 | Latch      |        0 | Latch      |    841512 |  0 |  0 |  0 |  0 |  0
 28260 | 2015-07-08 11:56:18.611896+03 |        4 | Latch      |        0 | Latch      |     33178 |  0 |  0 |  0 |  0 |  0
 28259 | 2015-07-08 11:56:18.611896+03 |        4 | Latch      |        0 | Latch      |  29602538 |  0 |  0 |  0 |  0 |  0
 28270 | 2015-07-08 11:56:18.611896+03 |        5 | Network    |        0 | READ       |    763131 |  0 |  0 |  0 |  0 |  0
 28266 | 2015-07-08 11:56:18.611896+03 |        5 | Network    |        0 | READ       |    425323 |  0 |  0 |  0 |  0 |  0
(5 rows)

Time: 5.924 ms
pgtest02f/postgres R # select * from pg_stat_wait_history ;
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Time: 831.582 ms
psql: server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
pgtest02f/ R >

Here is the backtrace from the core dump:

(gdb) bt
#0  SetLatch (latch=0x0) at pg_latch.c:518
#1  0x00007f99c51a8b73 in pg_stat_wait_get_history (fcinfo=0x7ffffedff180) at pg_stat_wait.c:558
#2  0x000000000059ed67 in ExecMakeTableFunctionResult (funcexpr=0x161c630, econtext=0x161be58, argContext=<value optimized out>, expectedDesc=0x161d9e8, randomAccess=0 '\000') at execQual.c:2196
#3  0x00000000005b07f2 in FunctionNext (node=0x161bd48) at nodeFunctionscan.c:95
#4  0x000000000059f94e in ExecScanFetch (node=0x161bd48, accessMtd=0x5b0540 <FunctionNext>, recheckMtd=0x5afe70 <FunctionRecheck>) at execScan.c:82
#5  ExecScan (node=0x161bd48, accessMtd=0x5b0540 <FunctionNext>, recheckMtd=0x5afe70 <FunctionRecheck>) at execScan.c:132
#6  0x0000000000598878 in ExecProcNode (node=0x161bd48) at execProcnode.c:426
#7  0x00000000005b1c19 in ExecSort (node=0x161bad8) at nodeSort.c:103
#8  0x00000000005987e8 in ExecProcNode (node=0x161bad8) at execProcnode.c:468
#9  0x00000000005adc68 in ExecMergeJoin (node=0x15f0a00) at nodeMergejoin.c:730
#10 0x0000000000598818 in ExecProcNode (node=0x15f0a00) at execProcnode.c:453
#11 0x00000000005973e2 in ExecutePlan (queryDesc=0x15f3f90, direction=<value optimized out>, count=0) at execMain.c:1490
#12 standard_ExecutorRun (queryDesc=0x15f3f90, direction=<value optimized out>, count=0) at execMain.c:319
#13 0x00007f99c55b317b in pgss_ExecutorRun (queryDesc=0x15f3f90, direction=ForwardScanDirection, count=0) at pg_stat_statements.c:875
#14 0x000000000068c2d7 in PortalRunSelect (portal=0x1505c10, forward=<value optimized out>, count=0, dest=<value optimized out>) at pquery.c:946
#15 0x000000000068d501 in PortalRun (portal=0x1505c10, count=9223372036854775807, isTopLevel=1 '\001', dest=0x1652f50, altdest=0x1652f50, completionTag=0x7ffffedffa70 "") at pquery.c:790
#16 0x0000000000689b8e in exec_simple_query (query_string=0x14f8f80 "select * from pg_stat_wait_history ;") at postgres.c:1072
#17 0x000000000068b208 in PostgresMain (argc=<value optimized out>, argv=<value optimized out>, dbname=0x14df980 "postgres", username=<value optimized out>) at postgres.c:4074
#18 0x0000000000633fdd in BackendRun (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:4164
#19 BackendStartup (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:3829
#20 ServerLoop (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1597
#21 PostmasterMain (argc=<value optimized out>, argv=<value optimized out>) at postmaster.c:1244
#22 0x00000000005cbc28 in main (argc=3, argv=0x14deaa0) at main.c:228
(gdb)

This all happens on the waits_monitoring_94 branch.
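
Frame #0 of the backtrace shows SetLatch being called with latch=0x0 from pg_stat_wait_get_history. One plausible reading is that the pointer handed to SetLatch was never set on this standby. The sketch below illustrates a defensive NULL check for that situation. It is not the fix from 4c54dc0, which is not shown in this thread and may differ, and every name in it (DemoLatch, DemoCollectorHeader, demo_set_latch, demo_request_history) is a hypothetical stand-in rather than the PostgreSQL or pg_stat_wait API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for a latch; only the field this sketch needs (hypothetical). */
typedef struct DemoLatch
{
    bool is_set;
} DemoLatch;

/* Hypothetical shared header: a collector publishes its latch here. */
typedef struct DemoCollectorHeader
{
    DemoLatch *latch;   /* NULL until a collector has attached */
} DemoCollectorHeader;

/* Like the crashing call, this requires a valid pointer. */
void demo_set_latch(DemoLatch *latch)
{
    assert(latch != NULL);
    latch->is_set = true;
}

/*
 * Wake the collector only if it has published a latch.
 * Returns true if the latch was set, false if there was nothing to wake.
 */
bool demo_request_history(DemoCollectorHeader *hdr)
{
    if (hdr == NULL || hdr->latch == NULL)
        return false;   /* caller can report an error instead of crashing */

    demo_set_latch(hdr->latch);
    return true;
}
```

With a guard of this kind, a caller that gets false back can raise an ordinary SQL error instead of taking down the backend.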

@dev1ant dev1ant commented Oct 26, 2015

Fixed in 4c54dc0.

@dev1ant dev1ant closed this Oct 26, 2015
asp437 pushed a commit that referenced this issue Apr 5, 2018
refresh_by_match_merge() has some issues in the way it builds a SQL
query to construct the "diff" table:

1. It doesn't require the selected unique index(es) to be indimmediate.
2. It doesn't pay attention to the particular equality semantics enforced
by a given index, but just assumes that they must be those of the column
datatype's default btree opclass.
3. It doesn't check that the indexes are btrees.
4. It's insufficiently careful to ensure that the parser will pick the
intended operator when parsing the query.  (This would have been a
security bug before CVE-2018-1058.)
5. It's not careful about indexes on system columns.

The way to fix #4 is to make use of the existing code in ri_triggers.c
for generating an arbitrary binary operator clause.  I chose to move
that to ruleutils.c, since that seems a more reasonable place to be
exporting such functionality from than ri_triggers.c.

While #1, #3, and #5 are just latent given existing feature restrictions,
and #2 doesn't arise in the core system for lack of alternate opclasses
with different equality behaviors, #4 seems like an issue worth
back-patching.  That's the bulk of the change anyway, so just back-patch
the whole thing to 9.4 where this code was introduced.

Discussion: https://postgr.es/m/[email protected]
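
The fourth point in the commit message above concerns making sure the parser picks the intended operator. PostgreSQL's SQL syntax allows an operator to be schema-qualified as `OPERATOR(schema.op)`, which removes the dependence on search_path. The following is a minimal sketch of building such a clause as text. The function name demo_format_operator_clause and the column names in the usage note are hypothetical, this is not the code that was moved to ruleutils.c, and the sketch assumes the caller has already quoted identifiers and expressions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: writes "left OPERATOR(schema.op) right" into buf.
 * Returns 0 on success, -1 if the result would not fit in buflen bytes.
 * Assumes left, right, and the schema name are already safely quoted.
 */
int demo_format_operator_clause(char *buf, size_t buflen,
                                const char *left,
                                const char *op_schema,
                                const char *op_name,
                                const char *right)
{
    int n;

    assert(buf != NULL && buflen > 0);

    n = snprintf(buf, buflen, "%s OPERATOR(%s.%s) %s",
                 left, op_schema, op_name, right);

    /* snprintf reports the length it wanted; treat truncation as failure. */
    if (n < 0 || (size_t) n >= buflen)
        return -1;
    return 0;
}
```

For example, calling it with "newdata.id", "pg_catalog", "=", and "mv.id" produces `newdata.id OPERATOR(pg_catalog.=) mv.id`, so an operator named `=` in another schema on the search path cannot be picked up by accident.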
glukhovn pushed a commit that referenced this issue Feb 14, 2019
The original setup for dependencies of partitioned objects had
serious problems:

1. It did not verify that a drop cascading to a partition-child object
also cascaded to at least one of the object's partition parents.  Now,
normally a child object would share all its dependencies with one or
another parent (e.g. a child index's opclass dependencies would be shared
with the parent index), so that this oversight is usually harmless.
But if some dependency failed to fit this pattern, the child could be
dropped while all its parents remain, creating a logically broken
situation.  (It's easy to construct artificial cases that break it,
such as attaching an unrelated extension dependency to the child object
and then dropping the extension.  I'm not sure if any less-artificial
cases exist.)

2. Management of partition dependencies during ATTACH/DETACH PARTITION
was complicated and buggy; for example, after detaching a partition
table it was possible to create cases where a formerly-child index
should be dropped and was not, because the correct set of dependencies
had not been reconstructed.

Less seriously, because multiple partition relationships were
represented identically in pg_depend, there was an order-of-traversal
dependency on which partition parent was cited in error messages.
We also had some pre-existing order-of-traversal hazards for error
messages related to internal and extension dependencies.  This is
cosmetic to users but causes testing problems.

To fix #1, add a check at the end of the partition tree traversal
to ensure that at least one partition parent got deleted.  To fix #2,
establish a new policy that partition dependencies are in addition to,
not instead of, a child object's usual dependencies; in this way
ATTACH/DETACH PARTITION need not cope with adding or removing the
usual dependencies.

To fix the cosmetic problem, distinguish between primary and secondary
partition dependency entries in pg_depend, by giving them different
deptypes.  (They behave identically except for having different
priorities for being cited in error messages.)  This means that the
former 'I' dependency type is replaced with new 'P' and 'S' types.

This also fixes a longstanding bug that after handling an internal
dependency by recursing to the owning object, findDependentObjects
did not verify that the current target was now scheduled for deletion,
and did not apply the current recursion level's objflags to it.
Perhaps that should be back-patched; but in the back branches it
would only matter if some concurrent transaction had removed the
internal-linkage pg_depend entry before the recursive call found it,
or the recursive call somehow failed to find it, both of which seem
unlikely.

Catversion bump because the contents of pg_depend change for
partitioning relationships.

Patch HEAD only.  It's annoying that we're not fixing #2 in v11,
but there seems no practical way to do so given that the problem
is exactly a poor choice of what entries to put in pg_depend.
We can't really fix that while staying compatible with what's
in pg_depend in existing v11 installations.

Discussion: https://postgr.es/m/CAH2-Wzkypv1R+teZrr71U23J578NnTBt2X8+Y=Odr4pOdW1rXg@mail.gmail.com
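
The fix for the first problem in the commit message above is a check that a drop reaching a partition child also reaches at least one of its partition parents. The idea can be sketched as a set-membership test. This is an illustration under stated assumptions, not the findDependentObjects code: DemoOid, demo_is_scheduled, and demo_child_drop_allowed are hypothetical names, and plain arrays stand in for the real dependency structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int DemoOid;   /* stand-in for an object identifier */

/* True if oid appears among the objects scheduled for deletion. */
bool demo_is_scheduled(const DemoOid *doomed, size_t ndoomed, DemoOid oid)
{
    for (size_t i = 0; i < ndoomed; i++)
    {
        if (doomed[i] == oid)
            return true;
    }
    return false;
}

/*
 * A partition child may be dropped only if at least one of its partition
 * parents is being dropped as well. An object with no partition parents
 * has nothing to verify, so the check passes.
 */
bool demo_child_drop_allowed(const DemoOid *parents, size_t nparents,
                             const DemoOid *doomed, size_t ndoomed)
{
    if (nparents == 0)
        return true;

    for (size_t i = 0; i < nparents; i++)
    {
        if (demo_is_scheduled(doomed, ndoomed, parents[i]))
            return true;
    }
    return false;   /* every parent survives: refuse to drop the child */
}
```

In the artificial case the commit message describes, dropping an extension attached only to the child leaves every parent outside the deletion set, so a check like this returns false.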