Fix background worker not restarting after crash-and-restart cycle.

Previously, if a background worker crashed (e.g., due to a SIGKILL) and
the server restarted due to restart_after_crash being enabled,
the worker was not restarted as expected. Background workers without
the never-restart flag should automatically restart in this case.

This issue was introduced in commit 28a520c0b7, which failed to reset
the rw_pid field in the RegisteredBgWorker struct for the crashed worker.

This commit fixes the problem by resetting rw_pid for all eligible
background workers during the crash-and-restart cycle.
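
As an illustration only, the failure mode can be sketched with a
simplified model. DemoWorker, demo_can_start() and
demo_reset_after_crash() are hypothetical names, not PostgreSQL
code, and the sketch assumes a worker is launched only when its
recorded PID is zero. The reset_pid argument switches between the old
behavior (stale PID kept) and the behavior after this fix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for RegisteredBgWorker. */
typedef struct DemoWorker
{
	int		rw_pid;			/* 0 means "not running" */
	long	rw_crashed_at;	/* 0 means "no crash recorded" */
	bool	never_restart;	/* models the never-restart flag */
} DemoWorker;

/* Assumption: a worker is launched only if no PID is recorded for it. */
static bool
demo_can_start(const DemoWorker *w)
{
	assert(w != NULL);
	return w->rw_pid == 0;
}

/*
 * Models the reset done during a crash-and-restart cycle.  With
 * reset_pid == false the stale PID survives (the buggy behavior); with
 * reset_pid == true it is cleared.  Never-restart workers are simply
 * skipped in this sketch.
 */
static void
demo_reset_after_crash(DemoWorker *workers, size_t n, bool reset_pid)
{
	assert(workers != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
	{
		DemoWorker *w = &workers[i];

		if (w->never_restart)
			continue;

		w->rw_crashed_at = 0;
		if (reset_pid)
			w->rw_pid = 0;
	}
}
```

In this model, a worker whose rw_pid still holds the PID of the killed
process is never considered startable, which mirrors the symptom
described above.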

Back-patched to v18, where the bug was introduced.

Bug fix patches were proposed by Andrey Rudometov and ChangAo Chen,
but this commit uses a different approach.

Reported-by: Andrey Rudometov <unlimitedhikari@gmail.com>
Reported-by: ChangAo Chen <cca5507@qq.com>
Author: Andrey Rudometov <unlimitedhikari@gmail.com>
Author: ChangAo Chen <cca5507@qq.com>
Co-authored-by: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: ChangAo Chen <cca5507@qq.com>
Reviewed-by: Shveta Malik <shveta.malik@gmail.com>
Discussion: https://postgr.es/m/CAF6JsWiO=i24qYitWe6ns1sXqcL86rYxdyU+pNYk-WueKPSySg@mail.gmail.com
Discussion: https://postgr.es/m/tencent_E00A056B3953EE6440F0F40F80EC30427D09@qq.com
Backpatch-through: 18
Fujii Masao
2025-07-25 18:38:36 +09:00
parent f7dfccf960
commit 75f633f54a
2 changed files with 8 additions and 0 deletions

@@ -613,6 +613,7 @@ ResetBackgroundWorkerCrashTimes(void)
* resetting.
*/
rw->rw_crashed_at = 0;
rw->rw_pid = 0;
/*
* If there was anyone waiting for it, they're history.

@@ -2630,6 +2630,13 @@ CleanupBackend(PMChild *bp,
}
bp = NULL;
/*
* In a crash case, exit immediately without resetting background worker
* state. However, if restart_after_crash is enabled, the background
* worker state (e.g., rw_pid) still needs to be reset so the worker can
* restart after crash recovery. This reset is handled in
* ResetBackgroundWorkerCrashTimes(), not here.
*/
if (crashed)
{
HandleChildCrash(bp_pid, exitstatus, procname);