18.5. Write Ahead Log
For additional information on tuning these settings, see Section 29.4.
18.5.1. Settings
wal_level (enum)

wal_level determines how much information is written to the WAL. The default value is minimal, which writes only the information needed to recover from a crash or immediate shutdown. archive adds logging required for WAL archiving; hot_standby further adds information required to run read-only queries on a standby server; and, finally, logical adds information necessary to support logical decoding. Each level includes the information logged at all lower levels. This parameter can only be set at server start.

In
minimal level, WAL-logging of some bulk operations can be safely skipped, which can make those operations much faster (see Section 14.4.7). Operations in which this optimization can be applied include:

- CREATE TABLE AS
- CREATE INDEX
- CLUSTER
- COPY into tables that were created or truncated in the same transaction

But minimal WAL does not contain enough information to reconstruct the data from a base backup and the WAL logs, so
archive or higher must be used to enable WAL archiving (archive_mode) and streaming replication.

In
hot_standby level, the same information is logged as with archive, plus information needed to reconstruct the status of running transactions from the WAL. To enable read-only queries on a standby server, wal_level must be set to hot_standby or higher on the primary, and hot_standby must be enabled on the standby. It is thought that there is little measurable difference in performance between the hot_standby and archive levels, so feedback is welcome if any production impacts are noticeable.

In
logical level, the same information is logged as with hot_standby, plus information needed to allow extracting logical change sets from the WAL. Using a level of logical will increase the WAL volume, particularly if many tables are configured for REPLICA IDENTITY FULL and many UPDATE and DELETE statements are executed.

fsync (boolean)

If this parameter is on, the Postgres Pro server will try to make sure that updates are physically written to disk, by issuing
fsync() system calls or various equivalent methods (see wal_sync_method). This ensures that the database cluster can recover to a consistent state after an operating system or hardware crash.

While turning off
fsync is often a performance benefit, this can result in unrecoverable data corruption in the event of a power failure or system crash. Thus it is only advisable to turn off fsync if you can easily recreate your entire database from external data.

Examples of safe circumstances for turning off
fsync include the initial loading of a new database cluster from a backup file, using a database cluster for processing a batch of data after which the database will be thrown away and recreated, or running a read-only database clone which gets recreated frequently and is not used for failover. High quality hardware alone is not a sufficient justification for turning off fsync.

For reliable recovery when changing
fsync off to on, it is necessary to force all modified buffers in the kernel to durable storage. This can be done while the cluster is shut down or while fsync is on by running initdb --sync-only, running sync, unmounting the file system, or rebooting the server.

In many situations, turning off synchronous_commit for noncritical transactions can provide much of the potential performance benefit of turning off
fsync, without the attendant risks of data corruption.

fsync can only be set in the postgresql.conf file or on the server command line. If you turn this parameter off, also consider turning off full_page_writes.

synchronous_commit (enum)

Specifies whether transaction commit will wait for WAL records to be written to disk before the command returns a “success” indication to the client. Valid values are
on, remote_write, local, and off. The default, and safe, setting is on. When off, there can be a delay between when success is reported to the client and when the transaction is really guaranteed to be safe against a server crash. (The maximum delay is three times wal_writer_delay.) Unlike fsync, setting this parameter to off does not create any risk of database inconsistency: an operating system or database crash might result in some recent allegedly-committed transactions being lost, but the database state will be just the same as if those transactions had been aborted cleanly. So, turning synchronous_commit off can be a useful alternative when performance is more important than exact certainty about the durability of a transaction. For more discussion see Section 29.3.

If synchronous_standby_names is set, this parameter also controls whether or not transaction commits will wait for the transaction's WAL records to be replicated to the standby server. When set to
on, commits will wait until a reply from the current synchronous standby indicates it has received the commit record of the transaction and flushed it to disk. This ensures the transaction will not be lost unless both primary and standby suffer corruption of their database storage. When set to remote_write, commits will wait until a reply from the current synchronous standby indicates it has received the commit record of the transaction and written it out to the standby's operating system, but the data has not necessarily reached stable storage on the standby. This setting is sufficient to ensure data preservation even if the standby instance of Postgres Pro were to crash, but not if the standby suffers an operating-system-level crash.

When synchronous replication is in use, it will normally be sensible either to wait for both local flush to disk and replication of WAL records, or to allow the transaction to commit asynchronously. However, the setting
local is available for transactions that wish to wait for local flush to disk, but not synchronous replication. If synchronous_standby_names is not set, the settings on, remote_write and local all provide the same synchronization level: transaction commits only wait for local flush to disk.

This parameter can be changed at any time; the behavior for any one transaction is determined by the setting in effect when it commits. It is therefore possible, and useful, to have some transactions commit synchronously and others asynchronously. For example, to make a single multistatement transaction commit asynchronously when the default is the opposite, issue
SET LOCAL synchronous_commit TO OFF within the transaction.

wal_sync_method (enum)

Method used for forcing WAL updates out to disk. If
fsync is off then this setting is irrelevant, since WAL file updates will not be forced out at all. Possible values are:

- open_datasync (write WAL files with open() option O_DSYNC)
- fdatasync (call fdatasync() at each commit)
- fsync (call fsync() at each commit)
- fsync_writethrough (call fsync() at each commit, forcing write-through of any disk write cache)
- open_sync (write WAL files with open() option O_SYNC)

The open_* options also use O_DIRECT if available. Not all of these choices are available on all platforms. The default is the first method in the above list that is supported by the platform, except that fdatasync is the default on Linux. The default is not necessarily ideal; it might be necessary to change this setting or other aspects of your system configuration in order to create a crash-safe configuration or achieve optimal performance. These aspects are discussed in Section 29.1. This parameter can only be set in the postgresql.conf file or on the server command line.

full_page_writes (boolean)

When this parameter is on, the Postgres Pro server writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint. This is needed because a page write that is in progress during an operating system crash might be only partially completed, leading to an on-disk page that contains a mix of old and new data. The row-level change data normally stored in WAL will not be enough to completely restore such a page during post-crash recovery. Storing the full page image guarantees that the page can be correctly restored, but at the price of increasing the amount of data that must be written to WAL. (Because WAL replay always starts from a checkpoint, it is sufficient to do this during the first change of each page after a checkpoint. Therefore, one way to reduce the cost of full-page writes is to increase the checkpoint interval parameters.)
Turning this parameter off speeds normal operation, but might lead to either unrecoverable data corruption, or silent data corruption, after a system failure. The risks are similar to turning off fsync, though smaller, and it should be turned off only under the same circumstances recommended for that parameter.

Turning off this parameter does not affect use of WAL archiving for point-in-time recovery (PITR) (see Section 24.3).
This parameter can only be set in the postgresql.conf file or on the server command line. The default is on.

wal_log_hints (boolean)

When this parameter is on, the Postgres Pro server writes the entire content of each disk page to WAL during the first modification of that page after a checkpoint, even for non-critical modifications of so-called hint bits.

If data checksums are enabled, hint bit updates are always WAL-logged and this setting is ignored. You can use this setting to test how much extra WAL-logging would occur if your database had data checksums enabled.
This parameter can only be set at server start. The default value is off.

wal_compression (boolean)

When this parameter is on, the Postgres Pro server compresses a full page image written to WAL when full_page_writes is on or during a base backup. A compressed page image will be decompressed during WAL replay. The default value is off. Only superusers can change this setting.

Turning this parameter on can reduce the WAL volume without increasing the risk of unrecoverable data corruption, but at the cost of some extra CPU spent on the compression during WAL logging and on the decompression during WAL replay.
wal_buffers (integer)

The amount of shared memory used for WAL data that has not yet been written to disk. The default setting of -1 selects a size equal to 1/32nd (about 3%) of shared_buffers, but not less than 64kB nor more than the size of one WAL segment, typically 16MB. This value can be set manually if the automatic choice is too large or too small, but any positive value less than 32kB will be treated as 32kB. This parameter can only be set at server start.

The contents of the WAL buffers are written out to disk at every transaction commit, so extremely large values are unlikely to provide a significant benefit. However, setting this value to at least a few megabytes can improve write performance on a busy server where many clients are committing at once. The auto-tuning selected by the default setting of -1 should give reasonable results in most cases.
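
The auto-tuning rule for the default of -1 can be sketched in a few lines of Python. This is only an illustration of the rule as stated above, not the server's actual code: the function name auto_wal_buffers_kb is hypothetical, sizes are handled in kilobytes for simplicity, and a 16MB WAL segment is assumed.

```python
def auto_wal_buffers_kb(shared_buffers_kb, wal_segment_kb=16 * 1024):
    """Sketch of the wal_buffers = -1 rule: 1/32 of shared_buffers,
    clamped to the range [64kB, one WAL segment]."""
    size_kb = shared_buffers_kb // 32
    return max(64, min(size_kb, wal_segment_kb))


# shared_buffers = 128MB: 1/32 of it is 4MB, within both limits
print(auto_wal_buffers_kb(128 * 1024))
# shared_buffers = 1GB: 1/32 would be 32MB, so the segment-size cap applies
print(auto_wal_buffers_kb(1024 * 1024))
```

In this sketch, very small shared_buffers settings hit the 64kB floor, and anything above 512MB of shared_buffers hits the one-segment ceiling.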
wal_writer_delay (integer)

Specifies the delay between activity rounds for the WAL writer. In each round the writer will flush WAL to disk. It then sleeps for wal_writer_delay milliseconds, and repeats. The default value is 200 milliseconds (200ms). Note that on many systems, the effective resolution of sleep delays is 10 milliseconds; setting wal_writer_delay to a value that is not a multiple of 10 might have the same results as setting it to the next higher multiple of 10. This parameter can only be set in the postgresql.conf file or on the server command line.

commit_delay (integer)

commit_delay adds a time delay, measured in microseconds, before a WAL flush is initiated. This can improve group commit throughput by allowing a larger number of transactions to commit via a single WAL flush, if system load is high enough that additional transactions become ready to commit within the given interval. However, it also increases latency by up to commit_delay microseconds for each WAL flush. Because the delay is just wasted if no other transactions become ready to commit, a delay is only performed if at least commit_siblings other transactions are active when a flush is about to be initiated. Also, no delays are performed if fsync is disabled. The default commit_delay is zero (no delay). Only superusers can change this setting.

In PostgreSQL releases prior to 9.3,
commit_delay behaved differently and was much less effective: it affected only commits, rather than all WAL flushes, and waited for the entire configured delay even if the WAL flush was completed sooner. Beginning in PostgreSQL 9.3, the first process that becomes ready to flush waits for the configured interval, while subsequent processes wait only until the leader completes the flush operation.

commit_siblings (integer)

Minimum number of concurrent open transactions to require before performing the commit_delay delay. A larger value makes it more probable that at least one other transaction will become ready to commit during the delay interval. The default is five transactions.
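
The interaction between commit_delay, commit_siblings and fsync described above can be summarized as a small decision function. The following Python sketch only restates those documented conditions; the function name should_delay_flush and its arguments are hypothetical and do not correspond to anything in the server.

```python
def should_delay_flush(commit_delay_us, commit_siblings,
                       other_active_xacts, fsync_enabled=True):
    """Return True if, per the rules described above, a process about to
    flush WAL would first sleep for commit_delay microseconds."""
    if commit_delay_us <= 0:
        # The default of zero means no delay is ever performed.
        return False
    if not fsync_enabled:
        # No delays are performed if fsync is disabled.
        return False
    # Delay only if at least commit_siblings other transactions are active.
    return other_active_xacts >= commit_siblings


# commit_delay = 1000 microseconds, commit_siblings = 5 (the default)
print(should_delay_flush(1000, 5, other_active_xacts=7))   # enough siblings
print(should_delay_flush(1000, 5, other_active_xacts=4))   # too few siblings
```

Under these assumptions, raising commit_siblings makes the delay rarer but more likely to pay off when it does happen.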
18.5.2. Checkpoints
checkpoint_timeout (integer)

Maximum time between automatic WAL checkpoints, in seconds. The valid range is between 30 seconds and one hour. The default is five minutes (5min). Increasing this parameter can increase the amount of time needed for crash recovery. This parameter can only be set in the postgresql.conf file or on the server command line.

checkpoint_completion_target (floating point)

Specifies the target of checkpoint completion, as a fraction of total time between checkpoints. The default is 0.5. This parameter can only be set in the
postgresql.conf file or on the server command line.

checkpoint_warning (integer)

Write a message to the server log if checkpoints caused by the filling of WAL segment files happen closer together than this many seconds (which suggests that max_wal_size ought to be raised). The default is 30 seconds (30s). Zero disables the warning. No warnings will be generated if checkpoint_timeout is less than checkpoint_warning. This parameter can only be set in the postgresql.conf file or on the server command line.

max_wal_size (integer)

Maximum size to let the WAL grow to between automatic WAL checkpoints. This is a soft limit; WAL size can exceed
max_wal_size under special circumstances, like under heavy load, a failing archive_command, or a high wal_keep_segments setting. The default is 1 GB. Increasing this parameter can increase the amount of time needed for crash recovery. This parameter can only be set in the postgresql.conf file or on the server command line.

min_wal_size (integer)

As long as WAL disk usage stays below this setting, old WAL files are always recycled for future use at a checkpoint, rather than removed. This can be used to ensure that enough WAL space is reserved to handle spikes in WAL usage, for example when running large batch jobs. The default is 80 MB. This parameter can only be set in the postgresql.conf file or on the server command line.
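
To make the checkpoint timing parameters more concrete, the Python sketch below restates two of the rules from this section: the completion target as a fraction of the checkpoint interval, and the conditions for the checkpoint_warning message. Both function names are hypothetical, the first function assumes checkpoints are triggered by checkpoint_timeout rather than by WAL volume, and neither is the server's actual logic.

```python
def checkpoint_spread_seconds(checkpoint_timeout_s=300, completion_target=0.5):
    """Time over which a timeout-driven checkpoint aims to complete."""
    return checkpoint_timeout_s * completion_target


def warns_on_checkpoint(seconds_since_last, caused_by_wal_filling,
                        checkpoint_warning_s=30):
    """True if the checkpoint_warning message would be logged."""
    if checkpoint_warning_s == 0:
        # Zero disables the warning.
        return False
    # Only checkpoints forced by WAL filling up count, and only when they
    # follow the previous checkpoint more closely than checkpoint_warning.
    return caused_by_wal_filling and seconds_since_last < checkpoint_warning_s


print(checkpoint_spread_seconds())            # defaults: 5min timeout, 0.5 target
print(warns_on_checkpoint(10, True))          # WAL-driven, 10s after the last one
print(warns_on_checkpoint(10, False))         # timeout-driven, never warns
```

With the defaults, the sketch spreads a checkpoint over half of the five-minute interval, and a warning is a hint that max_wal_size may be too small for the current write load.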
18.5.3. Archiving
archive_mode (enum)

When archive_mode is enabled, completed WAL segments are sent to archive storage by setting archive_command. In addition to off, which disables archiving, there are two modes: on and always. During normal operation, there is no difference between the two modes, but when set to always the WAL archiver is also enabled during archive recovery or standby mode. In always mode, all files restored from the archive or streamed with streaming replication will be archived (again). See Section 25.2.9 for details.

archive_mode and archive_command are separate variables so that archive_command can be changed without leaving archiving mode. This parameter can only be set at server start. archive_mode cannot be enabled when wal_level is set to minimal.

archive_command (string)

The local shell command to execute to archive a completed WAL file segment. Any
%p in the string is replaced by the path name of the file to archive, and any %f is replaced by only the file name. (The path name is relative to the working directory of the server, i.e., the cluster's data directory.) Use %% to embed an actual % character in the command. It is important for the command to return a zero exit status only if it succeeds. For more information see Section 24.3.1.

This parameter can only be set in the
postgresql.conf file or on the server command line. It is ignored unless archive_mode was enabled at server start. If archive_command is an empty string (the default) while archive_mode is enabled, WAL archiving is temporarily disabled, but the server continues to accumulate WAL segment files in the expectation that a command will soon be provided. Setting archive_command to a command that does nothing but return true, e.g. /bin/true (REM on Windows), effectively disables archiving, but also breaks the chain of WAL files needed for archive recovery, so it should only be used in unusual circumstances.

archive_timeout (integer)

The archive_command is only invoked for completed WAL segments. Hence, if your server generates little WAL traffic (or has slack periods where it does so), there could be a long delay between the completion of a transaction and its safe recording in archive storage. To limit how old unarchived data can be, you can set
archive_timeout to force the server to switch to a new WAL segment file periodically. When this parameter is greater than zero, the server will switch to a new segment file whenever this many seconds have elapsed since the last segment file switch, and there has been any database activity, including a single checkpoint. (Increasing checkpoint_timeout will reduce unnecessary checkpoints on an idle system.) Note that archived files that are closed early due to a forced switch are still the same length as completely full files. Therefore, it is unwise to use a very short archive_timeout, because it will bloat your archive storage. archive_timeout settings of a minute or so are usually reasonable. You should consider using streaming replication, instead of archiving, if you want data to be copied off the master server more quickly than that. This parameter can only be set in the postgresql.conf file or on the server command line.
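
The placeholder substitution described for archive_command (%p, %f and %%) can be illustrated with a short Python sketch. It only mimics the documented substitution rules and is not how the server implements them; the function name expand_archive_command, the archive directory /mnt/archive, and the sample segment path are hypothetical values chosen for the example.

```python
import os
import re


def expand_archive_command(command, wal_path):
    """Expand %p, %f and %% in an archive_command template."""
    replacements = {
        "%p": wal_path,                    # path name of the file to archive
        "%f": os.path.basename(wal_path),  # file name only
        "%%": "%",                         # literal percent sign
    }
    # A single left-to-right pass, so that "%%p" yields "%p" and not a path.
    return re.sub(r"%[pf%]", lambda m: replacements[m.group(0)], command)


# Refuse to overwrite an existing archive file, then copy the segment.
template = "test ! -f /mnt/archive/%f && cp %p /mnt/archive/%f"
print(expand_archive_command(template, "pg_xlog/000000010000000000000001"))
```

The template in this example returns a nonzero exit status if the target file already exists, in line with the requirement that the command report success only when the segment was actually archived.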

