PG-1059: Fix check-world failures #10

dutow · 2024-09-24T22:27:30Z

No description provided.

The failover slots ensure a seamless transition of a subscriber after the standby is promoted. But the docs for it also explain the behavior of asynchronous replication which can confuse the readers. Reported-by: Masahiro Ikeda Backpatch-through: 17 Discussion: https://postgr.es/m/OS3PR01MB6390B660F4198BB9745E0526B18B2@OS3PR01MB6390.jpnprd01.prod.outlook.com

This does not make sense. It would write the output of the USING clause into the converted column, which would violate the generation expression. This adds a check to error out if this is specified. There was a test for this, but that test errored out for a different reason, so it was not effective. Reported-by: Jian He <jian.universality@gmail.com> Reviewed-by: Yugo NAGATA <nagata@sraoss.co.jp> Discussion: https://www.postgresql.org/message-id/flat/c7083982-69f4-4b14-8315-f9ddb20b9834%40eisentraut.org

If an ORDER BY item in SELECT is a bare identifier, the parser first seeks it as an output column name of the SELECT (for SQL92 compatibility). However, ruleutils.c is expecting the SQL99 interpretation where such a name is an input column name. So it's possible to produce an incorrect display of a view in the (admittedly pretty ill-advised) case where some other column is renamed in the SELECT output list to match an ORDER BY column.

This can be fixed by table-qualifying such names in the dumped view text. To avoid cluttering less-ill-advised queries, we'd like to do so only when there's an actual name conflict. That requires passing the current get_query_def call's resultDesc parameter down to get_variable, so that it can determine what the output column names are. In hopes of reducing rather than increasing notational clutter in ruleutils.c, I moved that value into the deparse_context struct and removed it from the parameter lists of get_query_def's other subroutines.

I made a few other cosmetic changes while at it:

* Likewise move the colNamesVisible parameter into deparse_context.
* Rename deparse_context's windowTList field to targetList, since it's no longer used only in connection with WINDOW clauses.
* Replace the special_exprkind field with a bool inGroupBy, since that was all it was being used for, and the apparent flexibility of storing a ParseExprKind proved to be illusory. (We need a separate varInOrderBy field to make this patch work.)
* Remove useless save/restore logic in get_select_query_def.

In principle, this bug is quite old. However, it seems unreachable before 1b4d280, because before that the presence of "new" and "old" entries in a view's rangetable caused us to always table-qualify every Var reference in dumped views. Hence, back-patch to v16 where that came in. Per bug #18589 from Quynh Tran.

Discussion: https://postgr.es/m/18589-70091cb81db1a3f1@postgresql.org
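The "qualify only on an actual name conflict" rule from this commit message can be sketched as follows. This is a simplified Python model, not the C code in ruleutils.c: the function name `deparse_order_by_var` and the dictionary describing the output columns are hypothetical and exist only for illustration.

```python
def deparse_order_by_var(table, column, output_columns):
    """Return the text to print for an input column used in ORDER BY.

    output_columns maps each SELECT output name to the (table, column)
    it was computed from. This mapping is an assumption of the sketch.
    """
    for out_name, source in output_columns.items():
        # Conflict: some *other* column was renamed to this name, so a
        # bare name would be read back as that output column.
        if out_name == column and source != (table, column):
            return f"{table}.{column}"
    # No conflict: keep the dumped view text uncluttered.
    return column


# SELECT t.a AS b, t.b AS x FROM t ORDER BY <input column b>
print(deparse_order_by_var("t", "b", {"b": ("t", "a"), "x": ("t", "b")}))
# SELECT t.a, t.b FROM t ORDER BY <input column b>
print(deparse_order_by_var("t", "b", {"a": ("t", "a"), "b": ("t", "b")}))
```

In the first call the name is table-qualified because output column `b` is really `t.a`; in the second call the bare name is safe.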

Commit 2489d76 removed some logic from pullup_replace_vars() that avoided wrapping a PlaceHolderVar around a pulled-up subquery output expression if the expression could be proven to go to NULL anyway (because it contained Vars or PHVs of the pulled-up relation and did not contain non-strict constructs). But removing that logic turns out to cause performance regressions in some cases, because the extra PHV blocks subexpression folding, and will do so even if outer-join reduction later turns it into a no-op with no phnullingrels bits. This can for example prevent an expression from being matched to an index.

The reason for always adding a PHV was to ensure we had someplace to put the varnullingrels marker bits of the Var being replaced. However, it turns out we can optimize in exactly the same cases that the previous code did, because we can instead attach the needed varnullingrels bits to the contained Var(s)/PHV(s).

This is not a complete solution --- it would be even better if we could remove PHVs after reducing them to no-ops. It doesn't look practical to back-patch such an improvement, but this change seems safe and at least gets rid of the performance-regression cases.

Per complaint from Nikhil Raj. Back-patch to v16 where the problem appeared.

Discussion: https://postgr.es/m/CAG1ps1xvnTZceKK24OUfMKLPvDP2vjT-d+F2AOCWbw_v3KeEgg@mail.gmail.com

This test occasionally shows

+WARNING: could not get result of cancel request due to timeout

which appears to be because the cancel request is sometimes unluckily sent to the remote session between queries, and then it's ignored. This patch tries to make that less probable in three ways:

1. Use a test query that does not involve remote estimates, so that no EXPLAINs are sent.
2. Make sure that the remote session is ready-to-go (transaction started, SET commands sent) before we start the timer.
3. Increase the statement_timeout to 100ms, to give the local session enough time to plan and issue the query.

We might have to go higher than 100ms to make this adequately stable in the buildfarm, but let's see how it goes.

Back-patch to v17 where this test was introduced.

Jelte Fennema-Nio and Tom Lane

Discussion: https://postgr.es/m/578934.1725045685@sss.pgh.pa.us

This change improves the description of the restrict_nonsystem_relation_kind parameter in guc_table.c and the documentation for better clarity. Backpatch to 12, where this GUC parameter was introduced. Reviewed-by: Peter Eisentraut Discussion: https://postgr.es/m/6a96f1af-22b4-4a80-8161-1f26606b9ee2%40eisentraut.org Backpatch-through: 12

The first test was sensitive to the insert LSN after setting up the catalogs, which depended on environmental things like the locales on the OS and usernames. Switch to a new WAL file before the first test, as a simple way to put every computer into the same state. Back-patch to all supported releases. Reported-by: Anton Voloshin <a.voloshin@postgrespro.ru> Reported-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Nathan Bossart <nathandbossart@gmail.com> Discussion: https://postgr.es/m/b26aeac2-cb6d-4633-a7ea-945baae83dcf%40postgrespro.ru

Commit b5a9b18 introduced block streaming infrastructure with a special fast path for all-cached scans, and commit b7b0f3f connected the infrastructure up to sequential scans. One of the fast path micro-optimizations had an unintended consequence: it interfered with parallel sequential scan's block range allocator (from commit 56788d2), which has its own ramp-up and ramp-down algorithm when handing out groups of pages to workers. A scan of an all-cached table could give extra blocks to one worker, when others had finished. In some plans (probably already very bad plans, such as the one reported by Alexander), the unfairness could be magnified. An internal buffer of 16 block numbers is removed, keeping just a single block buffer for technical reasons. Back-patch to 17. Reported-by: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/63a63690-dd92-c809-0b47-af05459e95d1%40gmail.com

This is confusing, as it exports twice the same variable. Oversight in 6782709 that has spread in more places afterwards. Reported-by: Alvaro Herrera, Tom Lane Discussion: https://postgr.es/m/202408201630.mn6vbohjh7hh@alvherre.pgsql Backpatch-through: 17

These tests depend on the test module injection_points to be installed, but it may not be available as the contents of src/test/modules/ are not installed by default. This commit adds a workaround based on a scan of pg_available_extensions to check if the extension is available, skipping the test if it is not. This allows installcheck to work transparently. There are more tests impacted by this problem on HEAD, but for now this addresses only the tests that exist on HEAD and v17 as the release is close by. Reported-by: Maxim Orlov Discussion: https://postgr.es/m/CACG=ezZkoT-pFz6a9XnyToiuR-Wg8fGELqHLoyBodr+2h-77qA@mail.gmail.com Backpatch-through: 17
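The availability check described above could look roughly like this. The sketch is Python, not the test suite's own code, and the `run_query` callable, the `should_skip` name, and the exact SQL text are assumptions made for illustration; only the use of `pg_available_extensions` comes from the commit message.

```python
from typing import Callable

# Asks the server whether the extension is installed and available.
CHECK_SQL = (
    "SELECT count(*) > 0 FROM pg_available_extensions "
    "WHERE name = 'injection_points';"
)


def should_skip(run_query: Callable[[str], str]) -> bool:
    """Return True when the test has to be skipped.

    run_query is a hypothetical helper that sends SQL to the server
    and returns the single result value as text ('t' or 'f').
    """
    return run_query(CHECK_SQL).strip() != "t"


# Stubbed servers: one with the extension available, one without.
print(should_skip(lambda sql: "t"))
print(should_skip(lambda sql: "f"))
```

Passing the query runner in as a parameter keeps the decision logic testable without a running server.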

This term was using an inconsistent casing between the code and the documentation, using "CommitTsSLRU" in wait_event_names.txt and "CommitTSSLRU" in the code. Let's update the term in the code to reflect what's in the documentation, "CommitTs" being more commonly used, so as pg_stat_activity shows the same term as the documentation. Oversight in 53c2a97. Author: Alexander Lakhin Discussion: https://postgr.es/m/f7e514cf-2446-21f1-a5d2-8c089a6e2168@gmail.com Backpatch-through: 17

Since commit 2549f06, we reject an identifier immediately following a numeric literal (without separating whitespace), because that risks ambiguity with hex/octal/binary integers. However, that patch used token patterns like "{integer}{ident_start}", which is problematic because {ident_start} matches only a single byte. If the first character after the integer is a multibyte character, this ends up with flex reporting an error message that includes a partial multibyte character. That can cause assorted bad-encoding problems downstream, both in the report to the client and in the postmaster log file.

To fix, use {identifier} not {ident_start} in the "junk" token patterns, so that they will match complete multibyte characters. This seems generally better user experience quite aside from the encoding problem: for "123abc" the error message will now say that the error appeared at or near "123abc" instead of "123a".

While at it, add some commentary about why these patterns exist and how they work.

Report and patch by Karina Litskevich; review by Pavel Borisov. Back-patch to v15 where the problem came in.

Discussion: https://postgr.es/m/CACiT8iZ_diop=0zJ7zuY3BXegJpkKK1Av-PU7xh0EDYHsa5+=g@mail.gmail.com
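The single-byte problem can be reproduced outside flex with byte-oriented regular expressions. The following is a Python sketch, not the scanner's actual rules: the character classes approximate {ident_start} and {identifier}, the integer pattern is simplified to plain decimal digits, and all names are chosen for illustration.

```python
import re

# Approximations of the scanner's byte classes (assumed, simplified).
IDENT_START = rb"[A-Za-z\x80-\xff_]"
IDENT_CONT = rb"[A-Za-z\x80-\xff_0-9$]"
INTEGER = rb"[0-9]+"

# Old junk pattern: integer followed by exactly one identifier byte.
OLD_JUNK = re.compile(INTEGER + IDENT_START)
# New junk pattern: integer followed by a whole identifier.
NEW_JUNK = re.compile(INTEGER + IDENT_START + IDENT_CONT + b"*")


def junk_token(pattern, sql):
    """Return the bytes the junk pattern would report for this input."""
    match = pattern.match(sql.encode("utf-8"))
    return match.group(0) if match else b""


# U+00E9 is two bytes in UTF-8, so the old pattern cuts it in half.
print(junk_token(OLD_JUNK, "123\u00e9"))
print(junk_token(NEW_JUNK, "123\u00e9").decode("utf-8"))
print(junk_token(OLD_JUNK, "123abc"), junk_token(NEW_JUNK, "123abc"))
```

The old pattern's token for the accented input ends in a lone lead byte and cannot be decoded as UTF-8, which is the bad-encoding hazard the commit describes; the new pattern captures the complete character.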

The updated comment provides more helpful guidance by mentioning that escontext should be set when soft error handling is needed. Reported-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com Backpatch-through: 17

The deparsing code in get_json_expr_options() unnecessarily emitted the default column-specific ON ERROR / EMPTY behavior when the top-level ON ERROR behavior in JSON_TABLE was set to ERROR. Fix that by not overriding the column-specific default, determined based on the column's JsonExprOp in get_json_table_columns(), with JSON_BEHAVIOR_ERROR when that is the top-level ON ERROR behavior. Note that this only removes redundancy; the current deparsing output is not incorrect, just redundant. Reviewed-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com Backpatch-through: 17

Use EMPTY ARRAY instead of EMPTY. This change does not affect the runtime behavior of JSON_TABLE(), which continues to return an empty relation ON ERROR. It only alters whether the default ON ERROR behavior is shown in the deparsed output. Reported-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com Backpatch-through: 17

When the ON ERROR / ON EMPTY behavior is to return NULL, returning NULL directly from ExecEvalJsonExprPath() suffices. Therefore, there's no need to create separate steps to check the error/empty flag or those to evaluate the constant NULL expression. This speeds up common cases because the default ON ERROR / ON EMPTY behavior for JSON_QUERY() and JSON_VALUE() is to return NULL. However, these steps are necessary if the RETURNING type is a domain, as constraints on the domain may need to be checked.

Reported-by: Jian He <jian.universality@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com
Backpatch-through: 17

Naeem-Akhter · 2024-09-25T18:57:48Z

I guess there would be no need for these changes, since we decided in the meeting that we won't change the existing --version string but will instead add another flag/identifier, '--with-extra-version', to identify the Percona build and package number.

Your thoughts, @dutow?

1. TruncateMultiXact() performs the SLRU truncations in a critical section. Deleting the SLRU segments calls ForwardSyncRequest(), which will try to compact the request queue if it's full (CompactCheckpointerRequestQueue()). That in turn allocates memory, which is not allowed in a critical section. Backtrace:

    TRAP: failed Assert("CritSectionCount == 0 || (context)->allowInCritSection"), File: "../src/backend/utils/mmgr/mcxt.c", Line: 1353, PID: 920981
    postgres: autovacuum worker template0(ExceptionalCondition+0x6e)[0x560a501e866e]
    postgres: autovacuum worker template0(+0x5dce3d)[0x560a50217e3d]
    postgres: autovacuum worker template0(ForwardSyncRequest+0x8e)[0x560a4ffec95e]
    postgres: autovacuum worker template0(RegisterSyncRequest+0x2b)[0x560a50091eeb]
    postgres: autovacuum worker template0(+0x187b0a)[0x560a4fdc2b0a]
    postgres: autovacuum worker template0(SlruDeleteSegment+0x101)[0x560a4fdc2ab1]
    postgres: autovacuum worker template0(TruncateMultiXact+0x2fb)[0x560a4fdbde1b]
    postgres: autovacuum worker template0(vac_update_datfrozenxid+0x4b3)[0x560a4febd2f3]
    postgres: autovacuum worker template0(+0x3adf66)[0x560a4ffe8f66]
    postgres: autovacuum worker template0(AutoVacWorkerMain+0x3ed)[0x560a4ffe7c2d]
    postgres: autovacuum worker template0(+0x3b1ead)[0x560a4ffecead]
    postgres: autovacuum worker template0(+0x3b620e)[0x560a4fff120e]
    postgres: autovacuum worker template0(+0x3b3fbb)[0x560a4ffeefbb]
    postgres: autovacuum worker template0(+0x2f724e)[0x560a4ff3224e]
    /lib/x86_64-linux-gnu/libc.so.6(+0x27c8a)[0x7f62cc642c8a]
    /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x85)[0x7f62cc642d45]
    postgres: autovacuum worker template0(_start+0x21)[0x560a4fd16f31]

To fix, bail out in CompactCheckpointerRequestQueue() without doing anything, if it's called in a critical section. That covers the above call path, as well as any other similar cases where RegisterSyncRequest might be called in a critical section.

2. After fixing that, another problem became apparent: Autovacuum process doing that truncation can deadlock with the checkpointer process. TruncateMultiXact() sets "MyProc->delayChkptFlags |= DELAY_CHKPT_START". If the sync request queue is full and cannot be compacted, the process will repeatedly sleep and retry, until there is room in the queue. However, if the checkpointer is trying to start a checkpoint at the same time, and is waiting for the DELAY_CHKPT_START processes to finish, the queue will never shrink.

More concretely, the autovacuum process is stuck here:

    #0 0x00007fc934926dc3 in epoll_wait () from /lib/x86_64-linux-gnu/libc.so.6
    #1 0x000056220b24348b in WaitEventSetWaitBlock (set=0x56220c2e4b50, occurred_events=0x7ffe7856d040, nevents=1, cur_timeout=<optimized out>) at ../src/backend/storage/ipc/latch.c:1570
    #2 WaitEventSetWait (set=0x56220c2e4b50, timeout=timeout@entry=10, occurred_events=<optimized out>, occurred_events@entry=0x7ffe7856d040, nevents=nevents@entry=1, wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:1516
    #3 0x000056220b243224 in WaitLatch (latch=<optimized out>, latch@entry=0x0, wakeEvents=wakeEvents@entry=40, timeout=timeout@entry=10, wait_event_info=wait_event_info@entry=150994949) at ../src/backend/storage/ipc/latch.c:538
    #4 0x000056220b26cf46 in RegisterSyncRequest (ftag=ftag@entry=0x7ffe7856d0a0, type=type@entry=SYNC_FORGET_REQUEST, retryOnError=true) at ../src/backend/storage/sync/sync.c:614
    #5 0x000056220af9db0a in SlruInternalDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1495
    #6 0x000056220af9dab1 in SlruDeleteSegment (ctl=ctl@entry=0x56220b7beb60 <MultiXactMemberCtlData>, segno=segno@entry=11350) at ../src/backend/access/transam/slru.c:1566
    #7 0x000056220af98e1b in PerformMembersTruncation (oldestOffset=<optimized out>, newOldestOffset=<optimized out>) at ../src/backend/access/transam/multixact.c:3006
    #8 TruncateMultiXact (newOldestMulti=newOldestMulti@entry=3221225472, newOldestMultiDB=newOldestMultiDB@entry=4) at ../src/backend/access/transam/multixact.c:3201
    #9 0x000056220b098303 in vac_truncate_clog (frozenXID=749, minMulti=<optimized out>, lastSaneFrozenXid=749, lastSaneMinMulti=3221225472) at ../src/backend/commands/vacuum.c:1917
    #10 vac_update_datfrozenxid () at ../src/backend/commands/vacuum.c:1760
    #11 0x000056220b1c3f76 in do_autovacuum () at ../src/backend/postmaster/autovacuum.c:2550
    #12 0x000056220b1c2c3d in AutoVacWorkerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/autovacuum.c:1569

and the checkpointer is stuck here:

    #0 0x00007fc9348ebf93 in clock_nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
    #1 0x00007fc9348fe353 in nanosleep () from /lib/x86_64-linux-gnu/libc.so.6
    #2 0x000056220b40ecb4 in pg_usleep (microsec=microsec@entry=10000) at ../src/port/pgsleep.c:50
    #3 0x000056220afb43c3 in CreateCheckPoint (flags=flags@entry=108) at ../src/backend/access/transam/xlog.c:7098
    #4 0x000056220b1c6e86 in CheckpointerMain (startup_data=<optimized out>, startup_data_len=<optimized out>) at ../src/backend/postmaster/checkpointer.c:464

To fix, add AbsorbSyncRequests() to the loops where the checkpointer waits for DELAY_CHKPT_START or DELAY_CHKPT_COMPLETE operations to finish.

Backpatch to v14. Before that, SLRU deletion didn't call RegisterSyncRequest, which avoided this failure. I'm not sure if there are other similar scenarios on older versions, but we haven't had any such reports.

Discussion: https://www.postgresql.org/message-id/ccc66933-31c1-4f6a-bf4b-45fef0d4f22e@iki.fi
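Both fixes can be illustrated with a toy model of the sync request queue. This is a Python sketch under stated assumptions, not PostgreSQL's C code: the class and function names are hypothetical, compaction is modelled as simply dropping duplicate requests, and the deadlock is represented by a bounded number of retry rounds.

```python
from collections import deque


class SyncRequestQueue:
    """Toy stand-in for the checkpointer's bounded sync request queue."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.requests = deque()

    def compact(self, in_critical_section):
        """Drop duplicate requests; return True if that freed a slot."""
        if in_critical_section:
            # Fix 1: compaction allocates memory, so do nothing here.
            return False
        before = len(self.requests)
        self.requests = deque(dict.fromkeys(self.requests))
        return len(self.requests) < before

    def forward(self, request, in_critical_section=False):
        """Enqueue a request; False means the caller must sleep and retry."""
        if len(self.requests) >= self.capacity:
            if not self.compact(in_critical_section):
                return False
        self.requests.append(request)
        return True

    def absorb(self):
        """Checkpointer side: take every queued request off the queue."""
        drained = list(self.requests)
        self.requests.clear()
        return drained


def worker_finishes(queue, pending, absorb_while_waiting, max_rounds=10):
    """Model a worker that delays the checkpoint until all of its
    requests are queued. Returns False if it is still stuck after
    max_rounds, which stands in for the deadlock."""
    pending = deque(pending)
    for _ in range(max_rounds):
        # Worker: queue as much as fits, from inside a critical section.
        while pending and queue.forward(pending[0], in_critical_section=True):
            pending.popleft()
        if not pending:
            return True
        # Checkpointer: waiting for the worker's delay flag to clear.
        if absorb_while_waiting:
            queue.absorb()  # Fix 2: keep draining while waiting.
    return False
```

With a queue that starts full of distinct requests, `worker_finishes(..., absorb_while_waiting=False)` never makes progress, while the same call with `absorb_while_waiting=True` completes because the waiting side keeps emptying the queue.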

dutow and others added 30 commits August 28, 2024 20:23

Added a Percona specific definition to pg_config, allowing our

8b86257

extension to see if custom features can be used

Make XLog storage extensible

5b0d92a

and allow extensions to override it. For now, it extends `pread` and `pwrite` from/into segment files. This is the minimum we need for full XLog encryption with pg_tde.

Added pg_tde as submodule following the smgr branch

d3b6712

added pg_tde to meson build

087cf92

Applied patch

11271a1

Downloaded smgr patch

5dad05c

fixing rebase bug

c3c1ae6

Removed fsync_checker extension from patch as we do not want to

c239a8c

include it

Updated pg_tde submodule reference

5786e24

Added pg_tde to makefile

0388294

PG-981: Renamed PERCONA_FORK to PERCONA_EXT

7573d1b

PG-847: exclude pg_tde* files from checksum validation

3ceb503

Merge pull request postgres#3 from dutow/nofork

ecb2aa7

PG-981: Renamed PERCONA_FORK to PERCONA_EXT

Merge pull request postgres#4 from Percona-Lab/PG-847

370259b

PG-847: exclude pg_tde* files from checksum validation

Message style improvements

8f33264

Correct name in list of acknowledgments

5995795

Reported-by: Etsuro Fujita <etsuro.fujita@gmail.com>

Remove duplicate name from list of acknowledgments

8427215

Reported-by: m.zhilin@postgrespro.ru

Update list of acknowledgments in release notes

4913ff0

current through df80b1d

Translation updates

92b91cd

Source-Git-URL: https://git.postgresql.org/git/pgtranslation/messages.git Source-Git-Hash: d0110df9f34c2d32cb2652d4477c3135dabe84f7

Fix rarely-run test for message wording change

73adaba

fixup for 2e6a804 Reported-by: Nazir Bilal Yavuz <byavuz81@gmail.com>

Fix warnings from msgfmt

c8e0480

/usr/bin/msgfmt: po/fr.po: warning: PO file header fuzzy
warning: older versions of msgfmt will give an error on this

Apparently, not all versions of msgfmt produce this. Quick fix for now, more to be researched later.

Stamp 17rc1.

b6d662d

michaelpq and others added 20 commits September 6, 2024 17:21

doc PG 17 relnotes: remove tab complete for MERGE/SPLIT partit.

cf60739

commit 60ae37a Backpatch-through: 17 only

Revert recent SQL/JSON related commits

899c071

Reverts c88ce38, 5067c23, and e4e2797, because a few BF animals didn't like one or all of them.

PG-1008: Change product name and version number

bf7e2c3

Updated product name from PostgreSQL to Percona Server for PostgreSQL and incremented the version number from 17rc1 to 17rc1.1.

Basic test for PG17

1d5761a

This commit adds two test runners which build Postgres with make and meson, and run the basic regression tests. Tests are executed for every PR, and also every day using the latest pg_tde code.

PERCONA_EXT is now a configuration option, not hardcoded

d280540

This way the build scripts are aware of its value, and we can use it to add tests specifically to tde_heap, or other percona features.

Merge pull request postgres#8 from dutow/cistuff

a3f691a

Basic test for PG17

Merge pull request postgres#9 from dutow/dyn_ext

1a98df1

PERCONA_EXT is now a configuration option, not hardcoded

Merge pull request postgres#7 from Percona-Lab/pg-1008

6aba5b5

PG-1008: Change product name and version number

PG-1059: Generalize upgrade version string matcher

85edf92

PG-1059: Fix pgbench test check

a09e481

dAdAbird approved these changes Sep 25, 2024

dutow force-pushed the TDE_REL_17_STABLE branch from 6aba5b5 to 4879037 Compare September 25, 2024 19:51

dutow closed this Sep 30, 2024
