Muhammad Usama [Mon, 14 Oct 2019 10:03:08 +0000 (15:03 +0500)]
Fix for miscellaneous watchdog issues.
This commit addresses the following watchdog reports:
-- pgpool-general: 6672 Query (watchdog split brain)
-- 0000547: We need to do arping again after recovering from split-brain.
To solve these issues, the commit makes the following behavioral changes in
watchdog:
1-- If life check reports a watchdog node is dead but the watchdog core is still
able to connect and communicate with it without any problem, the watchdog core
was getting into the dilemma of whether to consider it as a lost or an alive node.
Fix:
With this commit, nodes reported as lost by life-check (external or internal)
are treated as hard failures even when the node is reachable from the watchdog
core. Such a node can only become alive again if it is restarted or if
life-check reports that the node has become alive again.
The second type of node failure that can mark a watchdog node as lost occurs
when the node fails to respond to the messages sent to it or when frequent
errors occur on its socket. These errors are detected by the watchdog core
itself and are recovered automatically whenever that node becomes responsive
again. The commit also improves the detection of such internal errors.
2-- Standby nodes were too aggressive in reacting to a coup attempt
(when another watchdog node tries to become a master while a valid master
already exists in the cluster) or to the possibility of split-brain (when a
standby receives an "I AM MASTER" message from a node that is not the master
according to the standby's own record). In both situations, the standby nodes
used to re-join the cluster in the hope of finding the true master.
That did not prove to be a good strategy, since it can influence the selection
of the true master when the true master and the fake master fight to retain
master status. One of the things the true and fake masters compare is the
number of connected standby nodes. If a standby leaves the current master
during the fight to re-join the cluster, the standby count on the true master
becomes lower than it actually is, which reduces its chances of winning the
election.
Fix:
The commit makes the standby nodes more laid-back: they actively reject
nodes that try to become a master in the presence of the true master
and avoid re-joining the cluster until it is absolutely necessary.
3-- The third problem occurred with network partitioning or partial
life-check failure (when node A thinks node B is lost but node B thinks
node A is not lost). In this case the kicked-out standby node was too
aggressive in trying to connect to the master or to become a master itself,
which could put an unnecessary burden on the network and the
cluster nodes.
Fix:
The fix is to make the isolated node calmer and have it wait between
attempts to connect to the master or to become a master. For that purpose, a new
WD_NETWORK_ISOLATION state is added to the state machine, which just adds a
delay between successive attempts to become a master. A node can only enter
this state when it learns that it is marked as lost on the current
master. It leaves the state when it gets an indication that life-check on the
remote node has reported the node alive again, or otherwise after 10 seconds.
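The entry and exit rules of the isolation state can be pictured with a small model. This is only a sketch in Python, not the watchdog's C implementation: the class, its method names, and the STANDBY and JOINING labels are hypothetical; only the WD_NETWORK_ISOLATION name and the 10-second limit come from the commit message.

```python
import enum


class WdState(enum.Enum):
    STANDBY = "STANDBY"                          # hypothetical label, for illustration
    NETWORK_ISOLATION = "WD_NETWORK_ISOLATION"   # state added by this commit
    JOINING = "JOINING"                          # hypothetical label, for illustration


ISOLATION_TIMEOUT_SEC = 10  # from the commit message: leave after 10 seconds anyway


class IsolatedNodeModel:
    """Toy model of a node that learned it is marked as lost on the master."""

    def __init__(self):
        self.state = WdState.STANDBY
        self.entered_at = None

    def on_marked_lost_by_master(self, now):
        # The node enters isolation only when it learns the master considers it lost.
        self.state = WdState.NETWORK_ISOLATION
        self.entered_at = now

    def on_remote_lifecheck_alive(self):
        # Remote life-check reports this node alive again: leave at once.
        if self.state is WdState.NETWORK_ISOLATION:
            self.state = WdState.JOINING

    def on_tick(self, now):
        # Otherwise leave after the fixed delay and retry.
        if (self.state is WdState.NETWORK_ISOLATION
                and now - self.entered_at >= ISOLATION_TIMEOUT_SEC):
            self.state = WdState.JOINING
```

The point of the model is that the node stays quiet while isolated, so retries happen at most once per timeout instead of continuously.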
Finally, because creating partial or complete network failure scenarios is
very difficult, the commit also adds a watchdog debug aid, similar to the
health-check debug aid, to simulate different scenarios.
To enable the watchdog debug aid, Pgpool-II needs to be compiled with the
"WATCHDOG_DEBUG=1" flag ($make WATCHDOG_DEBUG=1).
Once compiled with the debug aid enabled, you can put commands in the
pgpool_logdir->watchdog_debug_requests file to test different scenarios.
e.g.
$ echo "KILL_ALL_COMMUNICATION" > logdir/watchdog_debug_requests
$ echo "KILL_ALL_SENDERS" >> logdir/watchdog_debug_requests
The commands currently supported by the watchdog debug aid are:
DO_NOT_REPLY_TO_BEACON <= Standby node stops replying to master node beacon messages
while this line is present in the watchdog_debug_requests file
DO_NOT_SEND_BEACON <= Master node stops sending beacon messages to standby nodes
while this line is present in the watchdog_debug_requests file
KILL_ALL_COMMUNICATION <= Watchdog stops all communication with all nodes
while this line is present in the watchdog_debug_requests file
KILL_ALL_RECEIVERS <= Watchdog ignores messages from all nodes
while this line is present in the watchdog_debug_requests file
KILL_ALL_SENDERS <= Watchdog stops sending messages to all nodes
while this line is present in the watchdog_debug_requests file
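Since each command is active only while its line is present in the file, the set of active commands is a function of the file's current contents. The following Python sketch illustrates that idea; how the watchdog actually reads and parses watchdog_debug_requests is not stated in the commit message, so the function name and the whitespace handling are assumptions.

```python
# Command names as listed in the commit message.
KNOWN_COMMANDS = {
    "DO_NOT_REPLY_TO_BEACON",
    "DO_NOT_SEND_BEACON",
    "KILL_ALL_COMMUNICATION",
    "KILL_ALL_RECEIVERS",
    "KILL_ALL_SENDERS",
}


def active_debug_commands(file_text):
    """Return the known commands that currently appear as lines in the file text.

    Hypothetical helper: unknown lines are ignored and surrounding
    whitespace is stripped, which is an assumption of this sketch.
    """
    lines = {line.strip() for line in file_text.splitlines()}
    return lines & KNOWN_COMMANDS
```

Removing a line from the file (or truncating the file) makes the corresponding behavior stop in this model, which matches the "while this line is present" wording.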
Bo Peng [Thu, 10 Oct 2019 07:36:43 +0000 (16:36 +0900)]
Doc: Update document "8.3. Pgpool-II + Watchdog Setup Example".
Bo Peng [Thu, 10 Oct 2019 05:52:49 +0000 (14:52 +0900)]
Update pgpool.spec to support PostgreSQL 12.
Bo Peng [Thu, 10 Oct 2019 05:43:57 +0000 (14:43 +0900)]
Start pgpool using "postgres" user by default.
Currently Pgpool-II is started as the "root" user.
For security reasons, starting Pgpool-II
as a non-root user is recommended.
For this reason, I modified the "pgpool.service" file to start
Pgpool-II as the "postgres" user, and allowed the "postgres" user to
run "if_up/down_cmd" and "arping_cmd" with sudo without a password.
The default setting of "if_up/down_cmd" and "arping_cmd" is changed:
if_up_cmd = '/usr/bin/sudo /sbin/ip addr add $_IP_$/24 dev eth0 label eth0:0'
if_down_cmd = '/usr/bin/sudo /sbin/ip addr del $_IP_$/24 dev eth0'
arping_cmd = '/usr/bin/sudo /usr/sbin/arping -U $_IP_$ -w 1 -I eth0'
If if_up/down_cmd or arping_cmd starts with "/",
the setting specified in "if_cmd_path" or "arping_path" will be ignored.
This patch also changes the permissions of all config files to 600.
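The rule for commands that start with "/" can be sketched as follows. This is a Python illustration, not Pgpool-II's C code: the function name is hypothetical, and joining the path and the command with a single "/" is an assumption about how the relative case behaves.

```python
def resolve_command(cmd, cmd_path):
    """Return the command line to run for if_up_cmd, if_down_cmd or arping_cmd.

    cmd      -- the configured command string
    cmd_path -- the matching if_cmd_path or arping_path setting
    """
    if cmd.startswith("/"):
        # Absolute path given: the *_path setting is ignored.
        return cmd
    # Relative command: prefix it with the configured directory (assumed join rule).
    return cmd_path.rstrip("/") + "/" + cmd
```

With the new defaults, which begin with /usr/bin/sudo, the path settings therefore have no effect unless the commands are changed back to relative ones.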
Bo Peng [Thu, 10 Oct 2019 04:54:03 +0000 (13:54 +0900)]
Update pgpool_socket_dir.patch.
Bo Peng [Thu, 10 Oct 2019 04:30:55 +0000 (13:30 +0900)]
Fix sample scripts typo.
Tatsuo Ishii [Thu, 10 Oct 2019 01:01:08 +0000 (10:01 +0900)]
Fix assorted ancient v2 protocol bugs.
- In the v2 code path, extract_message() pfrees memory which was returned
from pool_read_string(). This is plain wrong and could cause a segfault,
since the memory returned by it is managed by the pool_stream
modules.
- In the v2 code path, pool_process_notice_message_from_one_backend() added
"NOTICE:" to the log message. This is not necessary as that part is
already included in the message.
- In the v2 code path, pool_extract_error_message() did not prepare unread
data correctly. This caused the subsequent
pool_process_notice_message_from_one_backend() to produce an empty
message, and read_kind_from_backend() to fail.
Tatsuo Ishii [Tue, 8 Oct 2019 07:56:57 +0000 (16:56 +0900)]
Make enable_shared_relcache to on by default.
enable_shared_relcache enhances performance in most cases without
downsides. So let's enable the feature by default.
Takuma Hoshiai [Tue, 8 Oct 2019 07:00:23 +0000 (16:00 +0900)]
Fix extended query communication in do_query()
do_query() didn't send Describe message to PostgreSQL.
It also used strcasecmp() instead of strncasecmp() when checking whether the query is a SELECT.
Takuma Hoshiai [Tue, 8 Oct 2019 04:04:46 +0000 (13:04 +0900)]
Fix problem that syslog_facility doesn't change on reload
The cause was a mistake in a macro definition. This fix unifies the macro definitions and deletes old test code to use vsyslog().
Reported in bug 548.
Tatsuo Ishii [Fri, 4 Oct 2019 04:52:19 +0000 (13:52 +0900)]
Fix inappropriate ereport call in VALID_BACKEND.
VALID_BACKEND (more precisely pool_virtual_master_db_node_id) macro
emitted message if pgpool is performing failover/failback:
ereport(WARNING,
(errmsg("failover/failback is in progress"),
errdetail("executing failover or failback on backend"),
errhint("In a moment you should be able to reconnect to the database")));
This could be called within signal handlers, and
POOL_SETMASK(&BlockSig)/POOL_SETMASK(&UnBlockSig) was called to block
interrupts because ereport is not reentrant. However, it is possible
that callers have already called POOL_SETMASK, and this could result
in an unwanted signal unblock.
The fix is to remove ereport and POOL_SETMASK altogether. This
removes the message above, but we have no choice.
I found the problem while investigating regression
055.backend_all_down failure but of course the bug could have bitten
users in other places.
Muhammad Usama [Thu, 3 Oct 2019 20:25:17 +0000 (01:25 +0500)]
Fix for Coverity warning '1395047 Resource leak'
Muhammad Usama [Thu, 3 Oct 2019 15:33:22 +0000 (20:33 +0500)]
Fix for Coverity warnings in pool_auth
Muhammad Usama [Thu, 3 Oct 2019 14:53:44 +0000 (19:53 +0500)]
Fix for Coverity warnings in watchdog and lifecheck
Tatsuo Ishii [Thu, 3 Oct 2019 12:33:09 +0000 (21:33 +0900)]
Fix signal unblock leak in failover.
When a failover event occurs, register_node_operation_request() gets
called to enqueue failover/failback requests. If the request queue is
full, this function returns false after unlocking the semaphore, but it
forgot to unblock the signal mask. This left all signals blocked,
including SIGTERM, which made pgpool fail to shut down.
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2019-October/003449.html
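The class of bug fixed here is an early return that skips restoring the signal mask. A minimal Python analog, assuming a Unix platform, is sketched below; the queue, its capacity, and the function name are hypothetical stand-ins, and Pgpool-II itself uses POOL_SETMASK in C rather than anything shown here. The sketch restores the saved mask in a finally clause so that the "queue full" path cannot leave signals blocked.

```python
import signal

MAX_REQUESTS = 2      # hypothetical queue capacity
request_queue = []    # stands in for the shared failover request queue


def register_request(req):
    """Queue a request with signals blocked; restore the mask on every return path."""
    old_mask = signal.pthread_sigmask(
        signal.SIG_BLOCK, {signal.SIGTERM, signal.SIGINT})
    try:
        if len(request_queue) >= MAX_REQUESTS:
            # Queue full: this early return is where the original bug lived.
            return False
        request_queue.append(req)
        return True
    finally:
        # Runs for both the success and the failure path.
        signal.pthread_sigmask(signal.SIG_SETMASK, old_mask)
```

Restoring the previously saved mask, rather than unconditionally unblocking, also avoids unblocking signals that a caller had blocked on purpose.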
Muhammad Usama [Mon, 30 Sep 2019 09:55:06 +0000 (14:55 +0500)]
Fixing a compiler warning and a tiny coding mistake
Muhammad Usama [Sat, 28 Sep 2019 19:51:07 +0000 (00:51 +0500)]
Fix for bug-545: Quorum lost and not recovered
The master watchdog node was not adding a lost standby node back to its list
of valid standby nodes after it was rediscovered by the lifecheck. The fix is to ask the
node to rejoin the master node when it gets rediscovered by the lifecheck.
As part of this commit, I have also added the watchdog data version and Pgpool-II
version to the watchdog info packet to make future extensions to the watchdog
messages easier.
Thanks to Guille (reporter of this bug) for providing lots of help in testing the fix.
Tatsuo Ishii [Wed, 25 Sep 2019 05:22:21 +0000 (14:22 +0900)]
Fix memory leak in replication mode.
Per coverity.
Tatsuo Ishii [Wed, 25 Sep 2019 01:42:06 +0000 (10:42 +0900)]
Update version to Pgpool-II 4.2 devel.
Tatsuo Ishii [Tue, 24 Sep 2019 23:49:48 +0000 (08:49 +0900)]
Fix memory leak while attempting to connect to backend.
If no backend is up and running, memory for the copy of the startup packet
will be lost. This was introduced by commit cdb49d3b7. Per Coverity.
Tatsuo Ishii [Tue, 24 Sep 2019 07:50:25 +0000 (16:50 +0900)]
Fix coverity warnings.
Tatsuo Ishii [Tue, 24 Sep 2019 06:26:45 +0000 (15:26 +0900)]
Fix coverity warnings.
Muhammad Usama [Mon, 23 Sep 2019 20:30:30 +0000 (01:30 +0500)]
Adding new wd_cli utility
wd_cli makes it easier to integrate external health check systems with
Pgpool-II. It provides a convenient, easy-to-use interface to interact with
watchdog and perform health-check-related node operations.
For example:
If an external health-check system monitoring the health of the watchdog
cluster detects a remote node failure, it can use the wd_cli utility to inform
Pgpool-II about that node failure:
$wd_cli --inform DEAD -N 'Remote watchdog name' -p 9001 -m 'node 1 failed'
Similarly when the node becomes reachable again
$wd_cli --inform ALIVE -N 'Remote watchdog name' -p 9001 -m 'node 1 found'
This utility can also be used to get the live information of all nodes.
$wd_cli -f simple_conf/watchdog/pgpool_wd1.conf --info -v --all
Total Watchdog nodes configured for lifecheck: 2
*****************
Node ID: 0
Node Status code 4
Node Status: MASTER
Node Name: localhost:9991 Linux localhost.localdomain
Node Host: localhost
Node WD Port: 9001
Node Pgpool Port: 9991
Node ID: 1
Node Status code 7
Node Status: STANDBY
Node Name: localhost:9992 Linux localhost.localdomain
Node Host: localhost
Node WD Port: 9002
Node Pgpool Port: 9992
wd_cli provides many options for node search criteria.
See wd_cli --help for complete list of options.
The commit also re-arranges the code related to socket communication and
watchdog IPC commands and separates the internal and external watchdog
commands to make the code structure more modular.
The regression tests and documentation updates are not part of this commit
and will be taken care of separately.
Tatsuo Ishii [Wed, 18 Sep 2019 01:51:28 +0000 (10:51 +0900)]
Fix uninitialized variable.
Per Coverity.
Tatsuo Ishii [Tue, 17 Sep 2019 22:39:15 +0000 (07:39 +0900)]
Fix compiler warning.
Tatsuo Ishii [Tue, 17 Sep 2019 22:36:38 +0000 (07:36 +0900)]
Fix compiler warnings.
Tatsuo Ishii [Mon, 16 Sep 2019 00:24:08 +0000 (09:24 +0900)]
Fix occasional query hang while processing DEALLOCATE.
When DEALLOCATE tries to remove a named statement, it inherits the
where_to_send map of the named statement in
where_to_send_deallocate(). However, it forgot to copy the load balance
node id in the query context of the named statement. As a result, the
query was not sent to the backend: if the target node id is neither
query_context->load_balance_node_id nor the primary node id,
pool_virtual_master_db_node_id (it is invoked as MASTER_NODE_ID)
returns the primary node id, and pool_send_and_wait(MASTER_NODE_ID)
ignores the request because VALID_BACKEND returns false in this case
(MASTER_NODE_ID = primary node id is not in the where_to_send map). As
a result, the following check_error() waits for a response from the backend in
vain.
The fix is to let where_to_send_deallocate() copy the load balance node id from
the query context of the previous named statement.
Per bug 546.
Tatsuo Ishii [Sun, 15 Sep 2019 13:39:18 +0000 (22:39 +0900)]
Fix segfault in certain case.
The scenario is something like:
1) a named statement is created.
2) DEALLOCATE removes it.
3) an erroneous query is executed.
In #2, "sent message" for the named statement is removed but
"uncompleted_message" is left. Then after #3, in ReadyForQuery()
uncompleted_message is added and removed. However, storage for the
uncompleted_message has been already freed in #2, and it causes a
segfault.
Fix is, in SimpleQuery() set NULL to uncompleted_message if it's not
PREPARE command so that ReadyForQuery() does not try to remove the
already removed message.
Per bug 546.
Here is a minimum test case.
'P' "_plan0x7f2d465db530" "SELECT 1" 0
'S'
'Y'
'Q' "DEALLOCATE _plan0x7f2d465db530"
'Y'
'Q' "CREATE INDEX users_auth_id_index ON non_existing_table ( auth_id )"
'Y'
'X'
Tatsuo Ishii [Sun, 15 Sep 2019 04:00:21 +0000 (13:00 +0900)]
Attempt to fix 028.watchdog_enable_consensus_with_half_votes.
Still get password does not match error. Let's change hostname arg of
watchdog_info to localhost.
Tatsuo Ishii [Fri, 13 Sep 2019 22:21:44 +0000 (07:21 +0900)]
Fix 028.watchdog_enable_consensus_with_half_votes failing.
The previous commit was not enough...
Tatsuo Ishii [Thu, 12 Sep 2019 12:23:38 +0000 (21:23 +0900)]
Fix 028.watchdog_enable_consensus_with_half_votes test failure.
Fix path to PCPPASSFILE.
Tatsuo Ishii [Thu, 12 Sep 2019 04:40:05 +0000 (13:40 +0900)]
Fix identical code used for different branches per Coverity.
Tatsuo Ishii [Thu, 12 Sep 2019 04:39:41 +0000 (13:39 +0900)]
Fix memory leak per Coverity.
Tatsuo Ishii [Wed, 11 Sep 2019 04:08:16 +0000 (13:08 +0900)]
Fix 028.watchdog_enable_consensus_with_half_votes test script.
pcp_watchdog_node needs to have -h option to point to UNIX socket
directory if it is not the standard (/tmp) place.
Tatsuo Ishii [Tue, 10 Sep 2019 06:54:13 +0000 (15:54 +0900)]
Fix typo in fork_lifecheck_child().
Tatsuo Ishii [Tue, 10 Sep 2019 06:42:10 +0000 (15:42 +0900)]
Fix typo in fork_watchdog_child().
Tatsuo Ishii [Mon, 9 Sep 2019 08:17:58 +0000 (17:17 +0900)]
Add more debug info out.
Tatsuo Ishii [Mon, 9 Sep 2019 02:37:24 +0000 (11:37 +0900)]
Doc: clarify that certificate authentication works between only client and Pgpool-II.
Per complaint from: https://www.pgpool.net/pipermail/pgpool-general-jp/2019-September/001611.html
Tatsuo Ishii [Fri, 6 Sep 2019 08:22:52 +0000 (17:22 +0900)]
Add start message of PCP process.
Tatsuo Ishii [Fri, 6 Sep 2019 07:31:07 +0000 (16:31 +0900)]
Fix memory leak.
Per Coverity.
Tatsuo Ishii [Fri, 6 Sep 2019 07:02:40 +0000 (16:02 +0900)]
Fix failure of statement_level_load_balance extra test.
Now that pgpool issues SELECT version() internally, the count of SELECT
queries needed to be adjusted.
Tatsuo Ishii [Fri, 6 Sep 2019 06:54:39 +0000 (15:54 +0900)]
Fix memory leak.
Per Coverity.
Tatsuo Ishii [Fri, 6 Sep 2019 06:24:09 +0000 (15:24 +0900)]
Fix uninitialized variable.
Probably harmless but bug is bug...
Per Coverity.
Bo Peng [Fri, 6 Sep 2019 01:58:58 +0000 (10:58 +0900)]
Update Makefile to include pgpool_adm--1.2.sql and pgpool_adm--1.1--1.2.sql.
Bo Peng [Fri, 6 Sep 2019 00:55:48 +0000 (09:55 +0900)]
Change regression test file name.
Tatsuo Ishii [Thu, 5 Sep 2019 23:21:28 +0000 (08:21 +0900)]
Doc: mention that auto re-attaching quarantined node is available in v4.1 or later.
Tatsuo Ishii [Thu, 5 Sep 2019 22:45:34 +0000 (07:45 +0900)]
Doc: make "quarantine" to be indexed.
It should have been indexed because the term is important for failover
behavior.
Bo Peng [Thu, 5 Sep 2019 12:00:46 +0000 (21:00 +0900)]
Fix outfuncs.c to support PostgreSQL 12 CTE [NOT] MATERIALIZED.
Tatsuo Ishii [Thu, 5 Sep 2019 11:46:10 +0000 (20:46 +0900)]
Doc: add patch author name to check_temp_table.
Bo Peng [Thu, 5 Sep 2019 07:46:33 +0000 (16:46 +0900)]
Update SPEC file.
Bo Peng [Thu, 5 Sep 2019 07:38:18 +0000 (16:38 +0900)]
Enable master branch AM_MAINTAINER_MODE.
Bo Peng [Thu, 5 Sep 2019 07:33:34 +0000 (16:33 +0900)]
Disable AM_MAINTAINER_MODE.
Bo Peng [Thu, 5 Sep 2019 06:44:46 +0000 (15:44 +0900)]
Doc: update example documentation.
Bo Peng [Thu, 5 Sep 2019 06:28:49 +0000 (15:28 +0900)]
Allow failover.sh.sample, follow_sh.sample, recovery_1st_stage.sample, recovery_2nd_stage.sample,
pgpool_remote_start.sample sample scripts to be included in distributions.
Bo Peng [Thu, 5 Sep 2019 06:26:51 +0000 (15:26 +0900)]
Allow failover.sh, follow_sh, recovery_1st_stage, recovery_2nd_stage,
pgpool_remote_start scripts to be included in distributions.
Tatsuo Ishii [Thu, 5 Sep 2019 04:40:02 +0000 (13:40 +0900)]
Fix regression test 028.watchdog_enable_consensus_with_half_of_the_votes to not fail.
It seems pcp_watchdog_info fails because:
ERROR: connection to socket "/tmp/.s.PGSQL.50001" failed with error "No such file or directory"
Fix could be trying to wait for pcp_watchdog_info available. See if this fixes the issue.
Tatsuo Ishii [Thu, 5 Sep 2019 03:58:19 +0000 (12:58 +0900)]
Doc: add a missing release item in Japanese release 4.1 note.
Also tweak SGML tag in English 4.1 release note.
Bo Peng [Wed, 4 Sep 2019 08:37:06 +0000 (17:37 +0900)]
Add various sample scripts:
- failover.sh
- follow_master.sh
- recovery_1st_stage
- recovery_2nd_stage
- pgpool_remote_start
Bo Peng [Wed, 4 Sep 2019 08:35:20 +0000 (17:35 +0900)]
Doc: Update 4.1 release-note.
Bo Peng [Wed, 4 Sep 2019 08:04:01 +0000 (17:04 +0900)]
Allow to specify absolute path in "pool_passwd".
Patch is provided by Danylo Hlynskyi.
Tatsuo Ishii [Wed, 4 Sep 2019 05:53:07 +0000 (14:53 +0900)]
Doc: add release note entry for enable_consensus_with_half_votes.
Also add description of enable_consensus_with_half_votes to Japanese
watchdog doc.
Tatsuo Ishii [Tue, 3 Sep 2019 22:45:17 +0000 (07:45 +0900)]
Doc: mention that VIP will not be brought up if quorum does not exist.
Tatsuo Ishii [Tue, 3 Sep 2019 04:14:47 +0000 (13:14 +0900)]
Fix 028.watchdog_enable_consensus_with_half_of_the_votes test failed.
Path to pcp_watchdog_info used in the test script was not set.
Also set the path to pcp.conf.
Tatsuo Ishii [Sun, 1 Sep 2019 03:20:33 +0000 (12:20 +0900)]
Add regression test for enable_consensus_with_half_votes.
This is a test for 2, 3 and 4 watchdog nodes, with
enable_consensus_with_half_votes both on and off. In each test, half
of the watchdog nodes are shut down and the test checks whether quorum exists (or
is on the edge for an even number of nodes).
Tatsuo Ishii [Sun, 1 Sep 2019 02:38:35 +0000 (11:38 +0900)]
Fix pgpool_setup to reflect the -p (baseport) to ORIGBASEPORT variable.
Otherwise, the shutdown script generated by pgpool_setup does not use
the proper port number for the netstat command.
Muhammad Usama [Thu, 29 Aug 2019 15:34:13 +0000 (20:34 +0500)]
Add parameter enable_consensus_with_half_votes to configure majority rule calculations
Pgpool-II decides that quorum exists and that failover consensus is reached after
receiving exactly 50% of the votes when the watchdog cluster is configured with
an even number of nodes. With the enable_consensus_with_half_votes parameter,
users can tell Pgpool-II whether distributed consensus in a cluster with an even
number of nodes requires (n/2) or ((n/2) +1) votes to decide the majority.
The patch was drafted by "Tatsuo Ishii <ishii@sraoss.co.jp>" and I have made
a few modifications on top of that.
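The arithmetic behind the parameter can be sketched in a few lines of Python. The function name is hypothetical, and treating clusters with an odd number of nodes as always needing a strict majority is an assumption based on the statement that the parameter concerns clusters with an even number of nodes.

```python
def votes_required(total_nodes, enable_consensus_with_half_votes):
    """Minimum number of votes needed to decide the majority."""
    if total_nodes % 2 == 0 and enable_consensus_with_half_votes:
        # Even cluster with the parameter on: exactly half is enough, n/2.
        return total_nodes // 2
    # Otherwise a strict majority is needed, (n/2) + 1 with integer division.
    return total_nodes // 2 + 1
```

For a 4-node cluster this gives 2 votes with the parameter on and 3 with it off, while a 3-node cluster needs 2 votes either way.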
Tatsuo Ishii [Wed, 28 Aug 2019 05:48:11 +0000 (14:48 +0900)]
Fix pgpool_setup to deal with PostgreSQL 9.1.
"--data-checksums" was unconditionally added to initdb's arguments, but
PostgreSQL 9.1's initdb does not have the option. To solve the issue, the
internal variable $PGVERSION now represents "major version" * 10:
e.g. 120 for PostgreSQL 12.x (including 12beta), 91 for PostgreSQL
9.1.x, so that pgpool_setup can check whether the option can be added to
the initdb options.
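The numeric encoding implied by the two examples (120 for 12.x and 12beta, 91 for 9.1.x) can be sketched as below. pgpool_setup is a shell script, so this Python version is only an illustration with hypothetical function names. The 9.3 threshold is not taken from the commit message; it reflects the PostgreSQL release in which data checksums were introduced.

```python
import re


def pg_version_number(version_string):
    """Encode a PostgreSQL version as an integer, e.g. '12beta4' -> 120, '9.1.24' -> 91."""
    m = re.match(r"(\d+)(?:\.(\d+))?", version_string)
    if m is None:
        raise ValueError("unrecognized version: %r" % version_string)
    first = int(m.group(1))
    second = int(m.group(2) or 0)
    if first >= 10:
        # From PostgreSQL 10 on, the major version is the first number alone.
        return first * 10
    # Before 10, the major version is the first two numbers, e.g. 9.1.
    return first * 10 + second


def supports_data_checksums(version_string):
    # Assumption of this sketch: initdb accepts --data-checksums from 9.3 on.
    return pg_version_number(version_string) >= 93
```

A single integer makes the check a plain numeric comparison, which is easy to express in a shell test as well.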
Tatsuo Ishii [Sun, 25 Aug 2019 01:54:33 +0000 (10:54 +0900)]
Doc: fix indentation in scripts.
Auto indentation by commit 2cb0bd3f8f236aeacfba37cd4d604893561bad52
broke indentation of scripts in <programlisting> tag.
Tatsuo Ishii [Sun, 25 Aug 2019 01:15:37 +0000 (10:15 +0900)]
Doc: fix typo in "What is Pgpool-II?" section.
Author: Alejandro Roman
Discussion: https://www.pgpool.net/pipermail/pgpool-hackers/2019-August/003392.html
Takuma Hoshiai [Mon, 19 Aug 2019 09:58:02 +0000 (18:58 +0900)]
Doc: Add Pgpool-II 4.1 Japanese release note.
Muhammad Usama [Sun, 18 Aug 2019 11:29:39 +0000 (16:29 +0500)]
Fix for password authentication does not work in certain cases
Pgpool-II was only trying to get the password from the pool_passwd file when
backends were configured to use password authentication. When the password
was not present in the pool_passwd file, it failed without trying to get the
password from the user, even when clear text authentication was allowed on
frontend connections.
The fix is to use clear text password authentication with the frontend to get the
user password (if it is allowed by the configuration) when the password for the
connecting user is not found in the pool_passwd file.
The issue was reported by "Tatsuo Ishii <ishii@sraoss.co.jp>".
Tatsuo Ishii [Fri, 16 Aug 2019 07:07:03 +0000 (16:07 +0900)]
Doc: mention relcache_query_target in the performance chapter.
Tatsuo Ishii [Fri, 16 Aug 2019 06:17:43 +0000 (15:17 +0900)]
Update copyright year.
Tatsuo Ishii [Thu, 15 Aug 2019 07:50:17 +0000 (16:50 +0900)]
Fix watchdog_setup to not accept number of Pgpool-II being lower than 1.
Pgpool-II won't start up if other watchdog is not present.
Bo Peng [Fri, 16 Aug 2019 02:51:36 +0000 (11:51 +0900)]
Add "-I" option to "arping_cmd" command default setting.
Bo Peng [Fri, 16 Aug 2019 00:11:40 +0000 (09:11 +0900)]
Doc: add 4.0.6-3.4.25 release-note.
Tatsuo Ishii [Thu, 15 Aug 2019 00:32:17 +0000 (09:32 +0900)]
Fix temp table trace modules.
CLOBBER_FREED_MEMORY revealed bug with usage of list package.
Tatsuo Ishii [Wed, 14 Aug 2019 02:45:54 +0000 (11:45 +0900)]
Doc: fix tag mistake in sr_check_user note.
Tatsuo Ishii [Wed, 14 Aug 2019 00:14:36 +0000 (09:14 +0900)]
Fix memory leak.
Pointed out by Coverity.
Tatsuo Ishii [Tue, 13 Aug 2019 00:42:34 +0000 (09:42 +0900)]
Fix test failure of extended-query-test/disable-load-balance-while-function.data.
The test failed because CREATE function propagation to standby took a while.
To fix this, add "SELECT pg_sleep(5)" to wait for standby synced with primary.
Also expected file is updated to reflect this.
Tatsuo Ishii [Sun, 11 Aug 2019 10:19:02 +0000 (19:19 +0900)]
Fix extra test failure of extended-query-test/disable-load-balance-off.
There were two reasons why the test failed.
1) when disable_load_balance_on_write = off, parse_before_bind should
not have been called at all. So fix this.
2) the test data tests/disable-load-balance-off.data did not set
disable_load_balance_on_write = off.
Tatsuo Ishii [Sun, 11 Aug 2019 09:58:51 +0000 (18:58 +0900)]
Fix extra test failure of extended-query-test/disable-load-balance-default.
It expected the first BIND/EXECUTE was sent to load balance node. This
is not correct since previous parse message was sent to primary node
because of parse_before_bind treatment. Fix is, just adjust the test
script. Also fix comment in tests/disable-load-balance-default.data.
Tatsuo Ishii [Sun, 11 Aug 2019 02:27:58 +0000 (11:27 +0900)]
Fix test failure of extended-query-test/disable-load-balance-always.
It expected the first SELECT to be sent to load balance node but a
preceding write query (DROP/CREATE TABLE) prevented it because it set
the writing_transaction flag. Fix is, instead of issuing DROP/CREATE
TABLE before the SELECT, issue harmless SET command after the SELECT
in extended query mode.
Tatsuo Ishii [Sat, 10 Aug 2019 23:34:33 +0000 (08:34 +0900)]
Fix extra test scripts to not fail.
Now that SELECT version() is always issued, the scripts get confused
because they extract the lines in question by using "grep SELECT". To avoid
the confusion, add "grep -v version" to the command pipeline.
Tatsuo Ishii [Fri, 9 Aug 2019 08:04:28 +0000 (17:04 +0900)]
Fix "unable to bind. cannot get parse message" error.
This was caused by too-eager memory free in parse_before_bind. It
called
pool_remove_sent_message/pool_create_sent_message/pool_add_sent_message
combo to replace the query context in the sent message. Unfortunately
pool_remove_sent_message frees memory such as the statement name, which was
being passed by the caller. As a result, the new sent message created by
pool_create_sent_message pointed to the freed statement name, which may
make a search by statement name fail because the statement name in
the sent message now points to a freed memory area, which might be
overwritten by a later memory allocation. The fix is, instead of calling
pool_remove_sent_message etc., to just replace the query context in the
sent message.
Per bug 531.
Tatsuo Ishii [Fri, 9 Aug 2019 05:50:56 +0000 (14:50 +0900)]
Create PostgreSQL version cache as early as possible.
Once an erroneous query is issued, the query to create the version cache
(SELECT version()) is ignored, which leads to failure in creating the
version cache. The fix is to create the version cache after the query context is
created and before the user query is sent in SimpleQuery() and Parse().
Tatsuo Ishii [Fri, 9 Aug 2019 01:20:53 +0000 (10:20 +0900)]
Doc: update 4.1 release note.
For these commits: 2019-05-27 [33df0d33], 2019-08-08 [3922c12c].
Muhammad Usama [Thu, 8 Aug 2019 13:50:51 +0000 (18:50 +0500)]
Fix for 0000483: online-recovery is blocked after a child process exits ...
The problem is that if a child process exits abnormally during the second stage
of online recovery, the connection counter that keeps track of exiting
processes does not get decremented and Pgpool-II keeps waiting for the exit of
the already exited process. Eventually, the recovery fails after
client_idle_limit_in_recovery expires.
The fix is to set the connection counter to zero when
client_idle_limit_in_recovery is enabled and its value is less than
recovery_timeout, since all clients must have been kicked out by the time
client_idle_limit_in_recovery expires.
A similar fix was already committed as part of bug 431 by Tatsuo Ishii, so this
commit basically imports the same logic into the watchdog function that processes
remote online recovery requests.
Apart from the above-mentioned change, Hoshiai San identified that the watchdog
IPC command timeout for the online recovery start functions executed through
watchdog is set to exactly the same value as recovery_timeout, and it needs to be
increased to make the solution work correctly.
Tatsuo Ishii [Thu, 8 Aug 2019 07:35:22 +0000 (16:35 +0900)]
Doc: run auto indent using emacs.
Here is the emacs script F.Y.I.
;; must be run by emacs
(load "/home/t-ishii/.emacs.d/init.el")
(find-file (nth 0 command-line-args-left));
(indent-region (point-min) (point-max));
(save-buffer)
Bo Peng [Thu, 8 Aug 2019 06:35:36 +0000 (15:35 +0900)]
Doc: Update "Pgpool-II + Watchdog Setup Example" configuration example.
Muhammad Usama [Wed, 7 Aug 2019 15:22:01 +0000 (20:22 +0500)]
Fix for no primary on standby pgpool when primary is quarantined on master
Master watchdog Pgpool sends primary_node_id = -1 in the backend status sync
message if the primary node is quarantined on it. So standby watchdog Pgpool
must not update its primary_node_id if the primary backend node id in sync
message is invalid_node_id (-1) while the same sync message reports the
backend status of the current primary node as "NOT DOWN".
The issue was reported by "Tatsuo Ishii <ishii@sraoss.co.jp>" and fixed by me
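The decision a standby makes when it receives a backend status sync message can be sketched like this. It is a Python illustration with hypothetical function and parameter names; only the value -1 for an invalid node id and the rule itself come from the commit message.

```python
INVALID_NODE_ID = -1  # value the master sends when its primary is quarantined


def next_primary_node_id(current_primary, synced_primary, current_primary_is_down):
    """Return the primary_node_id a standby should hold after a sync message.

    current_primary         -- primary node id currently known to the standby
    synced_primary          -- primary node id carried in the sync message
    current_primary_is_down -- status of the current primary per the same message
    """
    if synced_primary == INVALID_NODE_ID and not current_primary_is_down:
        # The master only quarantined the primary: keep the local value.
        return current_primary
    return synced_primary
```

Keeping the local value in the quarantine case is what prevents the standby from ending up with no primary at all.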
Tatsuo Ishii [Thu, 8 Aug 2019 05:55:53 +0000 (14:55 +0900)]
Doc: mention quorum failover introduced in 3.7
Also fix indentation.
Tatsuo Ishii [Thu, 8 Aug 2019 05:44:11 +0000 (14:44 +0900)]
Doc: fix indentation.
Also remove unnecessary xref label of sect2.
Tatsuo Ishii [Thu, 8 Aug 2019 02:38:02 +0000 (11:38 +0900)]
Make waiting for TIME_WAIT in pgpool_setup optional.
Since commit 3b32bc4e583da700cc8df7c5777e90341655ad3b the shutdownall
script generated by pgpool_setup waits until Pgpool-II sockets in
TIME_WAIT state disappear. However, in most cases this takes a long
time and makes testing uncomfortable for developers.
This commit makes the wait optional: the script waits for the
TIME_WAIT state only when the environment variable "CHECK_TIME_WAIT" is
set to a value other than "false".
Tatsuo Ishii [Thu, 8 Aug 2019 02:02:50 +0000 (11:02 +0900)]
Import some of memory manager debug facilities from PostgreSQL.
Now we can use CLOBBER_FREED_MEMORY, which is useful to detect
accesses to already pfreed memory.
Tatsuo Ishii [Thu, 8 Aug 2019 02:00:35 +0000 (11:00 +0900)]
Enhance extended query test driver.
- Change diff format using context diff.
- Suppress diffs related to message line number changes.
- Fix indentation.
Bo Peng [Thu, 8 Aug 2019 00:48:43 +0000 (09:48 +0900)]
Remove some code that was forgotten to be deleted in a previous commit.
Bo Peng [Thu, 8 Aug 2019 00:45:00 +0000 (09:45 +0900)]
Add new arguments in pgpool_recovery function and failover_command/failback_command/follow_master_command.
It is now possible to use "recovery node port number" in the pgpool_recovery function.
Also, the following options are added to failover_command/failback_command/follow_master_command.
- %N = old primary node hostname
- %S = old primary node port number
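To make the two new options concrete, here is a small Python sketch of how such placeholders could be expanded in a command template. The function name, the script path, and the host and port values in the usage note are hypothetical; the sketch handles only %N and %S and leaves every other sequence untouched, which is a simplification and not a description of Pgpool-II's own expansion code.

```python
def expand_placeholders(template, old_primary_host, old_primary_port):
    """Replace %N and %S in a command template; leave everything else as is."""
    values = {"N": old_primary_host, "S": str(old_primary_port)}
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch == "%" and i + 1 < len(template) and template[i + 1] in values:
            # Known placeholder: substitute and skip both characters.
            out.append(values[template[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, expanding "/path/to/failover.sh %N %S" with host "server1" and port 5432 yields "/path/to/failover.sh server1 5432", so a script can learn where the old primary was running.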
Bo Peng [Thu, 8 Aug 2019 00:43:51 +0000 (09:43 +0900)]
Revert "Add new arguments in pgpool_recovery function and failover_command/failback_command/follow_master_command."
This reverts commit 25a4237c9bc8db33f6710df8e43b285f36751038.
Bo Peng [Thu, 8 Aug 2019 00:26:36 +0000 (09:26 +0900)]
Add new arguments in pgpool_recovery function and failover_command/failback_command/follow_master_command.
It is now possible to use "recovery node port number" in the pgpool_recovery function.
Also, the following options are added to failover_command/failback_command/follow_master_command.
- %N = old primary node hostname
- %S = old primary node port number