Introduction to systemd’s Restart Policy
I’ve worked with systemd for years, and one of its most useful features is the ability to automatically restart services that fail or exit unexpectedly. This is controlled by the restart policy, which you set with the Restart directive in the service's unit file. However, I’ve seen this go wrong when a broken service is restarted over and over in a short period of time: the crash loop burns CPU, floods the journal, and hammers whatever the service depends on. To keep that in check, systemd provides RestartSec and StartLimitBurst, the latter paired with StartLimitIntervalSec.
Understanding RestartSec and StartLimitBurst
The real trick is understanding how these two directives work together. RestartSec sets how long systemd waits before restarting a service after it exits. It is a fixed delay applied to every automatic restart, not a minimum spacing between restarts. For example, if RestartSec is set to 10 seconds and the service crashes, systemd starts it again about 10 seconds later, every time. The default is 100ms, which is short enough for a broken service to crash-loop very quickly. Don’t bother with very short intervals, though - they can lead to more problems than they solve.
StartLimitBurst is also crucial, as it caps how many times a unit may be started within a time window (defined by StartLimitIntervalSec). The defaults are 5 starts per 10 seconds. If this limit is exceeded, systemd refuses the start, the unit is placed in a failed state, and it will not be restarted automatically. This is where people usually get burned - they set StartLimitBurst too low (or the interval too long), and their service ends up in a failed state when it’s needed most.
To illustrate the difference between these two directives, consider a service that is configured to restart on failure, with RestartSec set to 10 seconds, StartLimitBurst set to 5, and StartLimitIntervalSec set to 1 minute. The limit counts starts, including the first one. If the service keeps failing right after it starts, systemd starts it 5 times in total (the initial start plus 4 restarts), waiting 10 seconds before each restart. The next restart would be the 6th start within the same minute, so systemd refuses it and leaves the unit in a failed state.
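The interaction between the restart delay and the start limit can be sketched as a small shell simulation. This is a simplified model, not systemd's code: it assumes the service fails instantly after every start, so starts are spaced exactly RestartSec apart, and simulate_starts is a hypothetical helper name.

```shell
# Simplified model of systemd's start rate limiting (illustration only).
# Assumes the service fails instantly, so starts are RestartSec apart.
# Usage: simulate_starts RESTART_SEC BURST INTERVAL [MAX_ATTEMPTS]
simulate_starts() (
    restart_sec=$1 burst=$2 interval=$3 max_attempts=${4:-20}
    t=0 window_begin=0 count=0 attempt=1

    while [ "$attempt" -le "$max_attempts" ]; do
        # Open a fresh counting window once the old one has expired.
        if [ $((t - window_begin)) -gt "$interval" ]; then
            window_begin=$t
            count=0
        fi
        # Refuse the start if the window already holds BURST starts.
        if [ "$count" -ge "$burst" ]; then
            echo "t=${t}s start #${attempt} refused (start limit hit)"
            exit 0
        fi
        count=$((count + 1))
        echo "t=${t}s start #${attempt} allowed"
        t=$((t + restart_sec))
        attempt=$((attempt + 1))
    done
    echo "start limit not reached in ${max_attempts} attempts"
)

# RestartSec=10, StartLimitBurst=5, StartLimitIntervalSec=60
simulate_starts 10 5 60
```

Under these assumptions, starts #1 through #5 are allowed at 10-second steps and start #6 is refused. Running simulate_starts 10 5 10 instead (the default 10-second window) never reaches the limit, because a 10-second restart delay cannot fit 5 starts into a 10-second window. That is worth checking before relying on the start limit to stop a crash loop.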
Configuring RestartSec and StartLimitBurst
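In practice, configuring RestartSec and StartLimitBurst is relatively straightforward. You need to change the unit file for the service in question. For example, for the httpd service, a unit you wrote yourself lives in /etc/systemd/system/httpd.service. If the unit is shipped by a package (in /usr/lib/systemd/system or /lib/systemd/system, depending on the distribution), don't edit it in place, since package updates will overwrite your changes. Run systemctl edit httpd instead, which creates a drop-in override under /etc/systemd/system/httpd.service.d/. If you change files by hand, run systemctl daemon-reload afterwards so systemd picks them up.
Here is an example of how you might configure RestartSec and StartLimitBurst for the httpd service. Note that since systemd 230 the two StartLimit settings belong in the [Unit] section, not [Service]:
[Unit]
StartLimitIntervalSec=1min
StartLimitBurst=5

[Service]
Restart=always
RestartSec=10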
I usually start with a RestartSec value of 10 seconds and adjust from there, depending on the specific needs of the service.
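If you prefer to script the change rather than use an interactive editor, a drop-in can be written directly. The following is a minimal sketch: write_restart_dropin is a hypothetical helper name, the unit name httpd.service is assumed from the example above, and the start-limit settings are placed in [Unit], where current systemd versions expect them.

```shell
# Sketch: write the restart settings as a drop-in instead of editing the
# packaged unit file. write_restart_dropin is a hypothetical helper name.
# Usage: write_restart_dropin DROPIN_DIR
write_restart_dropin() {
    dropin_dir=$1
    mkdir -p "$dropin_dir" || return 1
    # Quoted heredoc delimiter: the content is written literally.
    cat > "$dropin_dir/override.conf" <<'EOF'
[Unit]
StartLimitIntervalSec=1min
StartLimitBurst=5

[Service]
Restart=always
RestartSec=10
EOF
}

# On a real host (as root), assuming the unit is named httpd.service:
#   write_restart_dropin /etc/systemd/system/httpd.service.d
#   systemctl daemon-reload
#   systemctl show httpd -p Restart -p RestartUSec -p StartLimitBurst -p StartLimitIntervalUSec
```

The final systemctl show call is a quick way to confirm that systemd actually loaded the values you intended, which catches settings placed in the wrong section.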
Security Considerations
When configuring RestartSec and StartLimitBurst, availability is the security concern to keep in mind. If an attacker (or just a malformed request) can crash the service on demand, a very short RestartSec turns every crash into an immediate restart, and the restart churn itself eats CPU, I/O, and log space. A longer delay, like 10 seconds, combined with a start limit puts a bound on that churn. The trade-off is that once the start limit is hit, the service stays down until someone intervenes, which is its own form of denial of service, so pick the limits deliberately rather than accepting whatever is there.
Troubleshooting
If you’re experiencing issues with a service that is being restarted repeatedly, systemctl and journalctl are your best friends. You can use systemctl status to view the current status of the service, and journalctl -u to view the systemd journal logs for the service.
Here is an example of how you might use them to troubleshoot the httpd service:
systemctl status httpd
journalctl -u httpd
By using these commands, you can gain insight into the restart history of the service and identify any potential issues that may be causing the service to fail or restart repeatedly.
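To get a quick count of how often systemd restarted the service and whether it hit the start limit, you can filter the journal output. This is a rough sketch: summarize_restarts is a hypothetical helper name, and the matched strings are the messages systemd commonly logs, whose exact wording may differ between versions.

```shell
# Sketch: summarize restart activity from journal text read on stdin.
# The patterns below are assumptions about systemd's log wording.
summarize_restarts() {
    awk '
        /Scheduled restart job/              { restarts++ }
        /Start request repeated too quickly/ { limited++ }
        END {
            printf "scheduled restarts: %d\n", restarts
            printf "start limit hits: %d\n", limited
        }
    '
}

# On a real host, assuming the unit is named httpd.service:
#   journalctl -u httpd --since "1 hour ago" --no-pager | summarize_restarts
#   systemctl show httpd -p NRestarts
#   systemctl reset-failed httpd && systemctl start httpd
```

The last commented command matters once the start limit has been hit: systemctl reset-failed clears the failed state and the start counter, so the service can be started again after you have fixed the underlying problem.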
For more information on these options, see the systemd.service(5) and systemd.unit(5) man pages.