You may not realize that I got my start in the technology world in the early 1990s learning Unix, from my first Netcom account to my high school allowing me, A JUNIOR, to have and run a Slackware machine directly connected to the Internet. My first Linux kernel was 1.2.8, and I vividly remember someone on IRC trolling me by saying that the best way to fix some strange problem was to run rm -rf /* as root.
He got me good.
Working through *nix in those days was made easier (and still is today) by Manual Pages, or manpages for short. They are what people expected you to go read when they replied "RTFM" to a question the manpage clearly answered. You type man <command> and get a fully formatted usage guide in plain text. Some are long, some are quite short. By reading the manpage, you should have a good idea of how to use that particular piece of software on your *nix machine.
I recently worked on a contribution to an open source package to address an issue I'm having with a remote server and how it phones home. Specifically, when the internet connection went down, it would sometimes get stuck in a loop until it was rebooted, faithfully emailing me EVERY HOUR that reconnecting had failed because a reconnection was already in progress.
It was stuck.
So I started poking around. Was there a way in a BASH script to put a time limit on the execution of a command, so that it could be killed if it exceeded that limit? Why yes: timeout, a tool that is part of the GNU coreutils package, does exactly this.
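A minimal sketch of that idea, assuming bash and GNU coreutils timeout. The run_with_limit function name is hypothetical, and sleep 5 stands in for the real reconnect command, which isn't named here. GNU timeout exits with status 124 when the command times out, so the wrapper checks for that value.

```bash
#!/usr/bin/env bash
# Sketch only: run_with_limit is a hypothetical helper name, and
# "sleep 5" stands in for a command that might hang.
# Assumes GNU coreutils timeout (exit status 124 on timeout).

run_with_limit() {
  local limit=$1
  shift
  timeout "$limit" "$@"
  local status=$?
  if [ "$status" -eq 124 ]; then
    echo "timed out after $limit: $*" >&2
  fi
  return "$status"
}

run_with_limit 1s sleep 5
echo "exit status: $?"   # prints "exit status: 124"
```

Passing the command as "$@" keeps its arguments intact, so the wrapper works for any command line without extra quoting.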
So what do I do first? Pull up the manpage. This looks like it will work perfectly! But when I tried it, I got an error message. I was attempting to pass the -k flag with a 10 second limit, but no joy. After banging my head on the desk a little and honing my Google-fu, I decided that the manpage does not accurately represent how the tool works. The -k flag takes its own duration argument. When the main timeout expires, the tool first sends SIGTERM, and if the process still has not exited after the duration you passed to -k, it sends SIGKILL. This gives the process a chance to shut down gracefully before it is killed. You can choose a different initial signal with -s.
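To see the SIGTERM-then-SIGKILL escalation in action, this sketch (assuming GNU coreutils timeout) runs two commands with a 1 second limit and a 1 second -k grace period. The first, sleep, dies on SIGTERM, so timeout reports 124. The second is a hypothetical stubborn process, built with sh and trap, that ignores SIGTERM, so timeout has to follow up with SIGKILL, which shows up as status 137 (128 + 9).

```bash
#!/usr/bin/env bash
# Sketch of timeout's escalation, assuming GNU coreutils timeout.

# sleep exits on SIGTERM, so timeout reports 124 ("timed out").
timeout -k 1s 1s sleep 5
echo "sleep: $?"           # prints "sleep: 124"

# This shell ignores SIGTERM (trap "" TERM), and the ignore is
# inherited by its sleep. After the 1s limit, timeout sends SIGTERM,
# which is ignored; after the 1s -k grace period it sends SIGKILL,
# and the status is 137 (128 + 9).
timeout -k 1s 1s sh -c 'trap "" TERM; sleep 5'
echo "TERM ignored: $?"    # prints "TERM ignored: 137"
```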
According to my read of the docs, this should work: timeout -k 10s ping example.com
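But -k only sets the grace period between the first signal and SIGKILL; timeout still needs its main duration, the point at which it sends SIGTERM. So both values have to be there: timeout -k 10s 10s ping example.com
Otherwise, -k consumes the 10s, timeout tries to read the next word as the main duration, and it gives you this error: timeout: invalid time interval 'ping'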
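The error can be reproduced without touching the network. This sketch assumes GNU coreutils timeout and uses sleep in place of ping. Without a main duration, -k swallows 10s as its own argument, the next word is read as the duration, and timeout itself fails with status 125; adding the main duration fixes it.

```bash
#!/usr/bin/env bash
# Sketch: reproduce the "invalid time interval" error with sleep
# in place of ping. Assumes GNU coreutils timeout.

# "10s" is consumed by -k, so "sleep" is parsed as DURATION.
# timeout prints "invalid time interval" and exits 125.
timeout -k 10s sleep 5
echo "without DURATION: $?"   # prints "without DURATION: 125"

# Here "-k 10s" is the SIGKILL grace period and "1s" is the main
# limit after which SIGTERM is sent. sleep exits on SIGTERM: 124.
timeout -k 10s 1s sleep 5
echo "with DURATION: $?"      # prints "with DURATION: 124"
```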
You can obviously replace ping with the actual command you are trying to run.
I'm posting this not only as documentation for myself, so I can refer back if I get stuck here again, but also for others who are trying to use the tool and finding that the docs don't match its behavior.
Edit: Also, yes, I did file a bug report. It was closed because the code does what it is supposed to, although the manpage is a little ambiguous.
Edit 2: As it turns out, after some back and forth on the bug report, it's just my poor understanding of how the tool actually works, combined with a lack of examples and some ambiguous language in the manpage.
I.e., my bad. Not a bug.
After looking at the code again and hearing back from the maintainers, I can see there isn't a logic bug, just confusion on my part about -s, -k, and the order of steps timeout actually performs. In the pull request to the synology-scripts repo, the maintainer suggested a couple of other options, and I think we're in a good place as far as the go-forward plan. In retrospect, clarifying the manpage and usage text is probably the best fix, plus adding examples at the bottom of the manpage. They would look something like this:
EXAMPLES

    timeout 2s sleep 10
        Send SIGTERM to sleep after 2 seconds.

    timeout -s 9 2s sleep 10
        Send SIGKILL to sleep after 2 seconds.

    timeout -k 10s 1d ping -i 60 example.com
        Send SIGTERM to ping after one day, and if the process
        does not end within 10 seconds, send SIGKILL.

    timeout --preserve-status 15m <some_process_that_may_get_stuck>
        Send SIGTERM to some process after 15 minutes, but preserve
        the exit status of that process.
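A quick way to sanity-check examples like these is to run them with short durations and print each exit status. This sketch assumes GNU coreutils timeout and uses sleep as a stand-in command. The statuses in the comments follow the coreutils exit-status rules: 124 for a plain timeout, 137 (128 + 9) when SIGKILL is sent, and with --preserve-status the command's own status, which is 143 (128 + 15) for a process ended by SIGTERM.

```bash
#!/usr/bin/env bash
# Sketch: check the exit statuses behind the proposed EXAMPLES,
# using 1s limits and sleep. Assumes GNU coreutils timeout.

timeout 1s sleep 10
echo "default (SIGTERM): $?"        # 124

timeout -s 9 1s sleep 10
echo "with -s 9 (SIGKILL): $?"      # 137

timeout --preserve-status 1s sleep 10
echo "with --preserve-status: $?"   # 143
```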