
PostgreSQL + Docker: incorrect resource manager data checksum in record at 46F/6A7B6D28


I'm running PostgreSQL 9.3.9, Docker 1.8.2 and Ubuntu 14.04. I have an issue where my hot standby keeps failing with the following error message:

incorrect resource manager data checksum in record at 46F/6A7B6D28

After turning on DEBUG2, this is now the log I see:

2015-10-02 06:56:34.033 UTCDEBUG:  sending write 477/1E9C9990 flush 477/1E9C6700 apply 477/1E9C6700
2015-10-02 06:56:34.078 UTCDEBUG:  sending write 477/1E9C9990 flush 477/1E9C9990 apply 477/1E9C6700
2015-10-02 06:56:34.078 UTCDEBUG:  sendtime 2015-10-02 06:56:34.027356+00 receipttime 2015-10-02 06:56:34.078378+00 replication apply delay 0 ms transfer latency 51 ms
2015-10-02 06:56:34.078 UTCLOG:  incorrect resource manager data checksum in record at 477/1E9C8488
2015-10-02 06:56:34.078 UTCDEBUG:  sending write 477/1E9CE560 flush 477/1E9C9990 apply 477/1E9C8488
2015-10-02 06:56:34.095 UTCDEBUG:  sending write 477/1E9CE560 flush 477/1E9CE560 apply 477/1E9C8488
2015-10-02 06:56:34.095 UTCFATAL:  terminating walreceiver process due to administrator command
2015-10-02 06:56:34.195 UTCDEBUG:  switched WAL source from stream to archive after failure
2015-10-02 06:56:34.195 UTCLOG:  incorrect resource manager data checksum in record at 477/1E9C8488
2015-10-02 06:56:34.195 UTCDEBUG:  switched WAL source from archive to stream after failure
2015-10-02 06:56:34.196 UTCLOG:  incorrect resource manager data checksum in record at 477/1E9C8488
2015-10-02 06:56:39.200 UTCDEBUG:  switched WAL source from stream to archive after failure
2015-10-02 06:56:39.200 UTCDEBUG:  incorrect resource manager data checksum in record at 477/1E9C8488
2015-10-02 06:56:39.200 UTCDEBUG:  switched WAL source from archive to stream after failure

Or this log upon failure:

2015-10-02 00:55:19.838 UTCDEBUG:  sendtime 2015-10-02 00:55:19.836191+00 receipttime 2015-10-02 00:55:19.838961+00 replication apply delay 0 ms transfer latency 2 ms
2015-10-02 00:55:19.839 UTCDEBUG:  sending write 476/E48058A0 flush 476/E4805798 apply 476/E4805798
2015-10-02 00:55:19.878 UTCDEBUG:  attempting to remove WAL segments older than log file 0000000000000472000000FB
2015-10-02 00:55:19.878 UTCDEBUG:  sending write 476/E48058A0 flush 476/E48058A0 apply 476/E4805798
2015-10-02 00:55:19.879 UTCDEBUG:  SlruScanDirectory invoking callback on pg_multixact/offsets/0000
2015-10-02 00:55:19.879 UTCDEBUG:  SlruScanDirectory invoking callback on pg_multixact/offsets/0002
2015-10-02 00:55:19.879 UTCDEBUG:  SlruScanDirectory invoking callback on pg_multixact/offsets/0001
2015-10-02 00:55:19.879 UTCDEBUG:  SlruScanDirectory invoking callback on pg_multixact/members/0000
2015-10-02 00:55:19.879 UTCDEBUG:  SlruScanDirectory invoking callback on pg_multixact/members/0004
2015-10-02 00:55:19.879 UTCDEBUG:  SlruScanDirectory invoking callback on pg_multixact/members/0002
2015-10-02 00:55:19.879 UTCDEBUG:  SlruScanDirectory invoking callback on pg_multixact/members/0001
2015-10-02 00:55:19.879 UTCDEBUG:  SlruScanDirectory invoking callback on pg_multixact/members/0003
2015-10-02 00:55:19.879 UTCDEBUG:  SlruScanDirectory invoking callback on pg_multixact/members/0005
2015-10-02 00:55:19.879 UTCDEBUG:  SlruScanDirectory invoking callback on pg_multixact/offsets/0000
2015-10-02 00:55:19.879 UTCDEBUG:  SlruScanDirectory invoking callback on pg_multixact/offsets/0002
2015-10-02 00:55:19.879 UTCDEBUG:  SlruScanDirectory invoking callback on pg_multixact/offsets/0001
2015-10-02 00:55:19.879 UTCDEBUG:  SlruScanDirectory invoking callback on pg_subtrans/0D6D
2015-10-02 00:55:19.879 UTCDEBUG:  recovery restart point at 476/E410F7E0
2015-10-02 00:55:19.879 UTCDETAIL:  last completed transaction was at log time 2015-10-02 00:55:19.835728+00
2015-10-02 00:55:20.177 UTCDEBUG:  sendtime 2015-10-02 00:55:20.174799+00 receipttime 2015-10-02 00:55:20.177888+00 replication apply delay 0 ms transfer latency 3 ms
2015-10-02 00:55:20.177 UTCDEBUG:  sending write 476/E4806570 flush 476/E48058A0 apply 476/E48058A0
2015-10-02 00:55:20.211 UTCDEBUG:  sending write 476/E4806570 flush 476/E4806570 apply 476/E48058A0
2015-10-02 00:55:20.212 UTCDEBUG:  sendtime 2015-10-02 00:55:20.181054+00 receipttime 2015-10-02 00:55:20.212069+00 replication apply delay 0 ms transfer latency 31 ms
2015-10-02 00:55:20.212 UTCDEBUG:  sending write 476/E4809808 flush 476/E4806570 apply 476/E4806570
2015-10-02 00:55:20.254 UTCDEBUG:  sending write 476/E4809808 flush 476/E4809808 apply 476/E4806570
2015-10-02 00:55:20.254 UTCDEBUG:  sendtime 2015-10-02 00:55:20.182113+00 receipttime 2015-10-02 00:55:20.254256+00 replication apply delay 0 ms transfer latency 72 ms
2015-10-02 00:55:20.254 UTCDEBUG:  sendtime 2015-10-02 00:55:20.195848+00 receipttime 2015-10-02 00:55:20.254304+00 replication apply delay 0 ms transfer latency 58 ms
2015-10-02 00:55:20.254 UTCDEBUG:  sendtime 2015-10-02 00:55:20.226562+00 receipttime 2015-10-02 00:55:20.254326+00 replication apply delay 0 ms transfer latency 27 ms
2015-10-02 00:55:20.254 UTCDEBUG:  sending write 476/E480F1D0 flush 476/E4809808 apply 476/E4809808
2015-10-02 00:55:20.271 UTCDEBUG:  sending write 476/E480F1D0 flush 476/E480F1D0 apply 476/E4809808
2015-10-02 00:55:20.479 UTCDEBUG:  sendtime 2015-10-02 00:55:20.476873+00 receipttime 2015-10-02 00:55:20.479711+00 replication apply delay 0 ms transfer latency 2 ms
2015-10-02 00:55:20.479 UTCDEBUG:  sending write 476/E480F2D8 flush 476/E480F1D0 apply 476/E480F1D0
2015-10-02 00:55:20.534 UTCDEBUG:  sending write 476/E480F2D8 flush 476/E480F2D8 apply 476/E480F1D0
2015-10-02 00:55:20.534 UTCLOG:  record with zero length at 476/E480F1D0
2015-10-02 00:55:20.534 UTCDEBUG:  sendtime 2015-10-02 00:55:20.477442+00 receipttime 2015-10-02 00:55:20.534532+00 replication apply delay 0 ms transfer latency 57 ms
2015-10-02 00:55:20.534 UTCDEBUG:  sendtime 2015-10-02 00:55:20.485564+00 receipttime 2015-10-02 00:55:20.534891+00 replication apply delay 0 ms transfer latency 49 ms
2015-10-02 00:55:20.534 UTCDEBUG:  sending write 476/E4814EF0 flush 476/E480F2D8 apply 476/E480F1D0
2015-10-02 00:55:20.575 UTCDEBUG:  sending write 476/E4814EF0 flush 476/E4814EF0 apply 476/E480F1D0
2015-10-02 00:55:20.575 UTCFATAL:  terminating walreceiver process due to administrator command
2015-10-02 00:55:20.675 UTCDEBUG:  switched WAL source from stream to archive after failure
2015-10-02 00:55:20.675 UTCLOG:  record with zero length at 476/E480F1D0
2015-10-02 00:55:20.675 UTCDEBUG:  switched WAL source from archive to stream after failure
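
For reference, "DEBUG2" above just means raising log_min_messages on the standby, something like the following in postgresql.conf (the log_line_prefix line is only a guess at what produces the timestamp format in these logs):

log_min_messages = debug2   # emits the walreceiver/startup DEBUG lines shown above
log_line_prefix  = '%m'     # millisecond timestamp + time zone prefix (guess)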

The first time I got this message, I read that I should regenerate the hot standby from the master, so I rsync'd everything back over and restarted. The hot standby connects to the master over a TINC VPN connection; it connected successfully and started streaming changes.
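
For what it's worth, the re-seed was a standard rsync base backup; the sketch below is the general shape of what I did rather than my exact commands (paths, hostname and rsync options are illustrative):

# On the master: enter backup mode
psql -U postgres -c "SELECT pg_start_backup('reseed_standby', true);"

# Copy the data directory to the standby, skipping WAL and the pid file
# (standby_host and the data directory path are placeholders)
rsync -av --delete --exclude=pg_xlog --exclude=postmaster.pid \
      /var/lib/postgresql/9.3/main/ standby_host:/var/lib/postgresql/9.3/main/

# On the master: leave backup mode
psql -U postgres -c "SELECT pg_stop_backup();"

# On the standby: recreate an empty pg_xlog directory, put recovery.conf back in place, then start PostgreSQL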

Everything seemed to work fine until around 8 am, which is when our system starts to get busier. From then on, all I see in the logs is the message above, and the last WAL file on the hot standby is 000000010000046F0000006A. If I restart the hot standby it starts catching up with the master, but that spikes the CPU and grinds everything to a halt, and it never seems to catch up anyway.
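
To judge whether it is catching up, I compare what the walreceiver has received with what has been replayed on the standby, roughly like this (illustrative invocation only):

# On the hot standby: receive vs. replay position, plus the timestamp of the last replayed transaction
psql -U postgres -c "SELECT pg_last_xlog_receive_location(), pg_last_xlog_replay_location(), pg_last_xact_replay_timestamp();"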

I have just migrated the database and hot standby to two new servers, reusing the same simple configuration from the previous servers. On the previous servers the databases were not dockerized; on the new ones they are. I have other dockerized PostgreSQL installations whose replication runs fine, but those servers are not under as high a load as this one. I even replicated from the new master to the old, non-dockerized PostgreSQL server that used to be the main server, and the same thing happened there. Replication between the old master and hot standby had been working fine over TINC VPN before the migration, so I don't think Docker or TINC is at fault.

Here is the PostgreSQL configuration for both master and hot standby:

listen_addresses          =  '*'
port                      =  5432
max_connections           =  600
shared_buffers            =  8GB
effective_cache_size      =  16GB
work_mem                  =  10MB
maintenance_work_mem      =  1638GB
max_locks_per_transaction =  256
hot_standby       =  on
archive_mode      =  on
wal_level         =  'hot_standby'
max_wal_senders   =  10
wal_keep_segments =  1000     # 80 GB required on pg_xlog
archive_command   =  'true'

And the additional recovery.conf on the hot standby:

standby_mode = 'on'
primary_conninfo = 'host=MY_IP port=MY_PORT user=MY_USER password=MY_PASSWORD'
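
From the master's side, the walsender state and the standby's confirmed write/flush/replay positions are visible in pg_stat_replication; I check it with something along these lines (again illustrative):

# On the master: one row per connected standby
psql -U postgres -c "SELECT client_addr, state, sent_location, write_location, flush_location, replay_location FROM pg_stat_replication;"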

Any help would be greatly appreciated! I need my hot standby!

