By André Guelfi Torres · Published 16 Apr 2019
After writing more and more bash scripts for a client, I've decided to write down my thoughts about it.
This assumes you have some knowledge about bash, as it is not intended as a beginner's tutorial.
Writing for sh rather than bash is useful for Alpine-based Docker images, as many of them do not include bash.
The more general you need your script to be, the more you should prefer sh over bash. A script needs to be more general when you publish it, execute it under multiple environments, use it as the installer for other tools, etc.
Do not assume that the current directory is the place for writing temporary files (or any file, for that matter).
For temporary files, use mktemp, and for temporary directories mktemp -d:
$ man mktemp
MKTEMP(1) BSD General Commands Manual MKTEMP(1)
NAME
mktemp -- make temporary file name (unique)
DESCRIPTION
The mktemp utility takes each of the given file name templates and over-
writes a portion of it to create a file name. This file name is unique
and suitable for use by the application.
(Remember to clean up resources when your script exits, for example with exit traps.)
There are some resources that you need to remove, clean up, or close at the end of your script, both when things go well and when they don't. Think of it as a (Java) try-with-resources or try..catch..finally.
Bash offers trap to perform this task:
trap arg signal
trap command signal
Taken from here
An example:
function finish {
# Your cleanup code here
}
trap finish EXIT
trap finish SIGQUIT
More information, and this example from here
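As a minimal sketch of how mktemp and an EXIT trap can work together (the function and file names are made up for illustration), the work can run in a subshell that creates its own scratch directory and removes it on the way out:

```shell
#!/usr/bin/env bash

# Hypothetical helper: the function body is a subshell, so the EXIT trap
# fires when the function finishes, whether the work succeeded or not.
shout_in_scratch_dir() (
  scratch_dir="$(mktemp -d)"
  trap 'rm -rf -- "$scratch_dir"' EXIT

  # Any intermediate files live inside the scratch directory only.
  printf '%s\n' "$1" | tr '[:lower:]' '[:upper:]' > "$scratch_dir/out.txt"
  cat "$scratch_dir/out.txt"
)

shout_in_scratch_dir "hello"
```

Because the trap is registered right after the directory is created, there is no window in which a failure leaves the directory behind.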
Using env in the shebang is more common with perl than with bash, as most bash installs are placed at /bin/bash.
You can use /usr/bin/env bash or /usr/bin/env sh to find the shell through the PATH instead of hard-coding its location.
Usage:
#!/usr/bin/env bash
#rest of commands
Add these options:
set -euxo pipefail
These can be added anywhere, but I usually add them right after the shebang, at the beginning of the script.
Reference: The set built-in
Another reference: the inspiration for these options comes from here
A brief note on each option:
- set -e: stops the execution if a command fails (this is the default behavior in make).
- set -u: treats unset variables and parameters other than the special parameters '@' or '*' as an error when performing parameter expansion. An error message will be written to the standard error, and a non-interactive shell will exit.
- set -x: debug. Traces the commands on the console.
- set -o pipefail: makes the whole pipeline fail if any of the commands in it fails.
Without pipefail, when a fails in a|b|c, b and c still execute, and the return value of the pipeline is the one of c, the last command.
With pipefail, when a fails in a|b|c, b and c still execute, but the return value of the pipeline is the one of the rightmost command that failed (a, if b and c succeed).
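The effect of pipefail can be checked directly. This is a small sketch (the variable names are mine) that runs the same failing pipeline with and without the option:

```shell
#!/usr/bin/env bash

# Each pipeline runs inside a command substitution (a subshell),
# so the option change does not leak into the rest of the script.
status_without_pipefail="$(set +o pipefail; false | true; echo "$?")"
status_with_pipefail="$(set -o pipefail; false | true; echo "$?")"

# Later stages of a pipeline run even when an earlier stage fails.
later_stage_output="$(set -o pipefail; false | echo "still ran" || true)"

echo "without pipefail: $status_without_pipefail"
echo "with pipefail:    $status_with_pipefail"
echo "later stage:      $later_stage_output"
```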
If you want to use a try...catch pattern, disable -e temporarily:
set +e # 1
ls NON_EXISTING_FILE # 2
set -e # 3
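A slightly fuller sketch of the same pattern, assuming NON_EXISTING_FILE really does not exist in the current directory, also captures the exit status so the "catch" part has something to act on:

```shell
#!/usr/bin/env bash
set -e

set +e                               # "try": failures no longer abort the script
ls NON_EXISTING_FILE 2>/dev/null
status=$?                            # remember the exit status right away
set -e                               # back to fail-fast mode

if [ "$status" -ne 0 ]; then         # "catch"
  outcome="ls failed, carrying on"
else
  outcome="ls succeeded"
fi
echo "$outcome"
```

The status has to be read on the very next line, because any later command overwrites $?.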
I usually make my bash scripts as simple as possible (see Limitations), but even then, they fail often while building them.
For that reason, you can enable the 'debug' option permanently:
# Inside the script
set -x
Or just for one invocation:
# When invoking the script
bash -x myscript.sh
Note: your script receives its parameters the same way as when executing ./myscript.sh:
$ cat myscript.sh
echo $1
$ ./myscript.sh 1
1
$ bash -x myscript.sh 1
+ echo 1
1
A common pattern I use while building scripts is to prepare the command but not execute it yet:
...
# prepare options, decide what to do
echo COMMAND_WITH_SIDE_EFFECTS
When I am sure that this is the desired command, usually after trying it manually on the console, I can remove the echo:
...
# prepare options, decide what to do
COMMAND_WITH_SIDE_EFFECTS
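You can use the previous pattern as a feature of your script: when a dry-run flag is set, prepend echo to your final command:
COMMAND="rm -rf ./.git"
if [ -n "${DRY_RUN:-}" ]; then # true when DRY_RUN is set and non-empty; also safe under set -u
  COMMAND="echo $COMMAND"
fi
$COMMAND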
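Keeping the command in a string, as above, breaks as soon as an argument contains spaces. A variation on the same idea, sketched here with a hypothetical run wrapper (not part of the original pattern), passes the command as separate words instead:

```shell
#!/usr/bin/env bash

# Hypothetical wrapper: prints the command in dry-run mode, executes it otherwise.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    echo "DRY RUN: $*"
  else
    "$@"      # each argument stays a single word, spaces included
  fi
}

DRY_RUN=1
run rm -rf ./.git
```

With DRY_RUN set, nothing is deleted and the script only reports what it would have done.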
When a script grows in size and becomes an application rather than a script, letting the user choose how verbose it is becomes useful.
See curl as an example:
$ curl localhost:8080
curl: (7) Failed to connect to localhost port 8080: Connection refused
$ curl -vvv localhost:8080
* Rebuilt URL to: localhost:8080/
* Trying ::1...
* connect to ::1 port 8080 failed: Connection refused
* Trying fe80::1...
* connect to fe80::1 port 8080 failed: Connection refused
* Trying 127.0.0.1...
* connect to 127.0.0.1 port 8080 failed: Connection refused
* Failed to connect to localhost port 8080: Connection refused
* Closing connection 0
Same with quiet mode, a mode to reduce verbosity.
Same with 'raw' mode, a mode to only print the raw output, maybe for consumption from another script.
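A minimal sketch of verbose and quiet modes follows. The flag letters (-v, -q), the three levels, and the function names are assumptions for illustration, not a convention the shell imposes:

```shell
#!/usr/bin/env bash

VERBOSITY=1   # 0 = quiet, 1 = normal, 2 = verbose (these levels are an assumption)

# log LEVEL MESSAGE...: print MESSAGE to stderr when LEVEL <= VERBOSITY
log() {
  local level="$1"
  shift
  if [ "$level" -le "$VERBOSITY" ]; then
    echo "$*" >&2
  fi
}

# parse_flags ARGS...: -v raises the verbosity, -q silences everything
parse_flags() {
  local opt
  OPTIND=1
  while getopts "vq" opt; do
    case "$opt" in
      v) VERBOSITY=2 ;;
      q) VERBOSITY=0 ;;
      *) return 1 ;;
    esac
  done
}

parse_flags -v
log 1 "connecting"            # shown in normal and verbose mode
log 2 "trying 127.0.0.1..."   # shown in verbose mode only
```

Sending the log lines to stderr keeps stdout free for the raw output that another script may want to consume.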
Imagine a script that prints the first, second, and third received parameter, then all of them:
$ cat myscript.sh
echo "first=$1 second=$2 third=$3; all=$@"
The normal invocation:
$ ./myscript.sh 1 2 3
first=1 second=2 third=3; all=1 2 3
(Everything works as expected.)
Now let's try strings (with spaces):
$ ./myscript.sh hello world
first=hello second=world third=; all=hello world
OK, bash uses spaces to delimit words. Now that we know this, let's be careful.
We want to process some files (with spaces):
$ ls file*
file 1.txt file 2.txt
$ ./myscript.sh $(ls file*)
first=file second=1.txt third=file; all=file 1.txt file 2.txt
A defect appeared: I want "file 1.txt" to be a parameter, not two.
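One way around this defect, sketched with a hypothetical count_args helper, is to let the shell expand a glob instead of parsing the output of ls. The sketch works in a throwaway directory so it does not touch real files:

```shell
#!/usr/bin/env bash

# Hypothetical helper: reports how many arguments it received.
count_args() { echo "$#"; }

work_dir="$(mktemp -d)"
cd "$work_dir"
touch "file 1.txt" "file 2.txt"

# Unquoted command substitution is split on whitespace: four words.
split_count="$(count_args $(ls file*))"
# A glob expands to one word per file name, spaces included: two words.
glob_count="$(count_args file*)"

echo "command substitution: $split_count arguments"
echo "glob:                 $glob_count arguments"

cd - >/dev/null
rm -rf -- "$work_dir"
```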
Let's imagine a script checking whether a file exists:
$ cat file_exists.sh
if [ -e $1 ]; then # -e is for file exists; see `man test`
echo "file $1 exists"
else
echo "file $1 does not exist"
fi
$ ls file*
file 1.txt file 2.txt file_exists.sh
$ ./file_exists.sh "file 1.txt"
./file_exists.sh: line 1: [: file: binary operator expected
file file 1.txt does not exist
Let's add quotes to the test to make it work with spaces:
$ cat file_exists.sh
if [ -e "$1" ]; then # note the quotes
echo "file $1 exists"
else
echo "file $1 does not exist"
fi
$ ./file_exists.sh "file 1.txt"
file file 1.txt exists
In general, be careful with spaces, as they mark the end of the string / parameter. Be proactive with quoting; the Google Shell Style Guide recommends the same.
Also:
'$PATH' is literally $PATH
"$PATH" is the contents of the variable $PATH
If your script is a one-off thing, or will not suffer churn/modification, then feel free to discard this tip. On the other hand, if this script will be part of a critical path (e.g., deploying) or will be modified in the future, try to apply the SOLID principles that we apply for other pieces of software.
Especially the SRP (below)
I like to design my scripts by separating concerns or responsibilities.
One typical example: process many files at once:
$ cat s1.sh
#!/usr/bin/env bash
function find_files {
  # Read NUL-delimited names so that file names with spaces survive intact
  while IFS= read -r -d '' file; do
    files+=( "$file" )
  done < <(find . -maxdepth 1 -type f -iname "file*.txt" -print0)
}
function process_file {
  file="$1"
  echo "Will write to file $file"
}
function main {
  declare -a files # local to main, but visible to the functions that main calls (find_files)
  find_files
  for file in "${files[@]}"; do
    process_file "$file"
  done
}
main
The main benefit is that iterating over the files is something that usually does not fail (just copy-paste it from script to script), while the main work is done in process_file. The two functions change at a different pace, hence two responsibilities. The latter I can test manually (on the REPL) until it works, and then paste it into the script (see 'How I write my scripts').
Its execution:
$ ls file*
file1.txt file2.txt
$ ./s1.sh
Will write to file ./file1.txt
Will write to file ./file2.txt
For more information on return values and functions in bash, see this article
Bash reads a script file every time you invoke it. So if you move process_file into a separate script file, you can change its contents while the long-running main script is working, and the next invocation picks up the change.
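This behavior can be observed with a small experiment. The worker script name and its contents below are invented for the sketch; the point is only that the second invocation sees the edited file:

```shell
#!/usr/bin/env bash

work_dir="$(mktemp -d)"
worker="$work_dir/process_file.sh"   # hypothetical worker script

# First version of the worker.
printf '%s\n' 'echo "v1: $1"' > "$worker"
first="$(bash "$worker" a.txt)"

# Edit the worker while the "main" script (this one) is still running.
printf '%s\n' 'echo "v2: $1"' > "$worker"
second="$(bash "$worker" b.txt)"

echo "$first"
echo "$second"
rm -rf -- "$work_dir"
```

This only applies to a worker invoked as its own script; a function that was sourced into the main script is not re-read.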
Be careful with rm. This is common knowledge, but it can happen to any of us.
Removing files is a sharp-edged tool, like DELETE in SQL. This is why we SELECT the same data set before deleting, and why we ls files before rm-ing them.
Some operating systems now protect rm -rf / with another flag, but the mistake of rm -rf $VARIABLE/* where $VARIABLE is empty is common enough.
To avoid the above mistake, cd into the directory first and delete relative to it:
#!/usr/bin/env bash
set -euxo pipefail
cd "${VARIABLE:?}" # fails if VARIABLE is unset or empty (a bare cd would go to $HOME)
rm -rf ./* # notice the dot (.) before the star
cd - # go back to the previous folder
This will only delete files from the current directory down (./), yet another level of protection.
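The same protection can be packaged as a function. This is a sketch with a hypothetical clean_dir helper; it relies on the ${parameter:?} expansion, which aborts when the value is unset or empty:

```shell
#!/usr/bin/env bash

# Hypothetical helper: empties a directory but refuses to run without a target.
clean_dir() {
  # ${1:?message} aborts when the argument is unset OR empty, so the
  # rm below can never expand to "rm -rf /*".
  dir="${1:?clean_dir needs a non-empty directory}"
  rm -rf -- "${dir:?}"/*
}

# Try it on a throwaway directory.
scratch="$(mktemp -d)"
touch "$scratch/a.txt" "$scratch/b.txt"
clean_dir "$scratch"
leftover="$(ls -A "$scratch")"
rmdir "$scratch"
echo "left over after clean_dir: '${leftover}'"
```

Note that the glob does not match dotfiles, so hidden files in the directory survive.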
Shell scripts can also be analyzed statically (i.e., linted). A tool for that is ShellCheck.
ShellCheck helps you locate possible errors, bugs, stylistic errors and suspicious constructs in your scripts.
The tool is large enough to warrant another article, but the basic usage is straightforward: run the linter with the shell script as input.
An example run:
$ shellcheck sh1.sh
In sh1.sh line 22:
destination=${date}-$(basename $file)
^-- SC2086: Double quote to prevent globbing and word splitting.
In sh1.sh line 25:
git add $file
^-- SC2086: Double quote to prevent globbing and word splitting.
In sh1.sh line 34:
if [[ -z $(which imagemagick) ]]; then
^-- SC2230: which is non-standard. Use builtin 'command -v' instead.
Note: I use the tool with docker (see here, official docker image)
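For illustration, this is roughly what the fixes for the two kinds of warnings above could look like. The file name and the has_command helper are made up; the original sh1.sh is not shown in this article:

```shell
#!/usr/bin/env bash

file="holiday photos/my photo.jpg"   # hypothetical input containing spaces
today="$(date +%F)"

# SC2086: quote the expansion so the name is not split into several words.
destination="${today}-$(basename "$file")"

# SC2230: use the 'command -v' builtin instead of the non-standard 'which'.
has_command() { command -v "$1" >/dev/null 2>&1; }

echo "$destination"
if has_command sh; then
  echo "sh is available"
fi
```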
Usually, I design my scripts around a function, process_file, that receives a single element (i.e., the function passed to map / iterate). This is the hard part.
This is a full example with code to plumb the candidate to the function.
I want to remove all the existing files in a directory that are greater in size than 30 KB. (I know this can be done with find -exec or ls | xargs rm; this is just an example for arbitrary logic.)
First, on the REPL, find all the files:
$ ls -lh file*
-rw-r--r-- 1 user group 0B Jul 13 00:50 file1.txt
-rw-r--r-- 1 user group 0B Jul 13 00:50 file2.txt
-rw-r--r-- 1 user group 531K Jul 13 00:07 file3.txt
Find files greater than the desired size:
$ find . -maxdepth 1 -type f -iname "file*.txt" -size +30k -print0
./file3.txt%
Now I only need to delete the file:
function process_file {
  file="$1"
  echo "rm $file" # 1
}
Note: #1 - Notice the echo command to protect the real execution.
First, I make sure that the plumbing code is all correct before executing commands with side effects (e.g., rm). If you are working with delicate data, you can consider working in a docker container.
Then, remove the "temporary dry-run mode":
function process_file {
  file="$1"
  rm "$file" # quoted, so names with spaces stay a single argument
}
The full script:
$ cat s2.sh
#!/usr/bin/env bash
function find_files {
  # Read NUL-delimited names so that file names with spaces survive intact
  while IFS= read -r -d '' file; do
    files+=( "$file" )
  done < <(find . -maxdepth 1 -type f -iname "file*.txt" -size +30k -print0)
}
function process_file {
  file="$1"
  rm "$file" # quoted, so names with spaces stay a single argument
}
function main {
  declare -a files
  find_files
  for file in "${files[@]}"; do
    process_file "$file"
  done
}
main
This is a full example with a manual invocation to plumb the candidate to the function.
First, on the REPL, find all the files:
$ ls -lh file*
-rw-r--r-- 1 user group 0B Jul 13 00:50 file1.txt
-rw-r--r-- 1 user group 0B Jul 13 00:50 file2.txt
-rw-r--r-- 1 user group 531K Jul 13 00:07 file3.txt
-rw-r--r-- 1 user group 531K Jul 13 00:07 file_SUPER_IMPORTANT_DO_NOT_DELETE.txt
Find files greater than the desired size:
$ find . -maxdepth 1 -type f -iname "file*.txt" -size +30k > candidates.txt
$ cat candidates.txt
./file3.txt
./file_SUPER_IMPORTANT_DO_NOT_DELETE.txt
Then, open vim to review the list, as a way of checking the valid candidates. This is the same process that git rebase --interactive offers: a CLI command driven by what you edit in your editor.
I realize that the file file_SUPER_IMPORTANT_DO_NOT_DELETE.txt should not be deleted, so I remove that line manually.
Now the list only contains the file to delete:
$ cat candidates.txt
./file3.txt
For the next step, I prefer to edit the file manually rather than create a script. Remember, this is a one-off effort, and programs need to be maintained. One-off scripts are to be thrown away, so there is no maintenance effort.
Hint: the vim command :%s/^/rm / will insert the rm command that we need at the beginning of each line. The command :%s/$/;/ will append a semicolon at the end of each line. It's not needed for this example, but it serves as a reminder. This replacement can also be done with sed/awk.
$ cat candidates.txt
rm ./file3.txt;
Now, just execute this file:
bash candidates.txt
And your files are processed. Gone, in this case.
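For reference, the sed alternative mentioned in the hint might look like the sketch below. It rebuilds a one-line candidates.txt in a throwaway directory and only prints the generated commands; it does not execute them:

```shell
#!/usr/bin/env bash

work_dir="$(mktemp -d)"
printf '%s\n' './file3.txt' > "$work_dir/candidates.txt"

# Prefix every line with "rm " and append a semicolon, like the vim commands do.
commands="$(sed -e 's/^/rm /' -e 's/$/;/' "$work_dir/candidates.txt")"
echo "$commands"

rm -rf -- "$work_dir"
```

As with the vim version, file names containing spaces would need quoting before the generated lines are safe to execute.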
Every tool (and metaphor) has its limits. Know when to use a tool and when to change tools.
Small scripts, simple invocations, etc.
One-off tasks are perfect for bash: write code, review effects, throw it away. Don't plan on reusing it. Although you can keep a collection of snippets for iterating, dealing with spaces, etc.
Beyond 50-100 lines of bash (a rough approximation), I already consider it a small program. Maybe start thinking about building a better foundation around it.
With my current knowledge of bash, I feel that some jobs are not appropriate for bash. For example, when dealing with spaces in strings, arrays, complex functions, etc.
For that, I prefer a more powerful language, ideally a scripting one (so I can get a quick feedback cycle). I've been playing with Perl lately (works very well), and with Ruby in the past. I've heard good things about TypeScript and Go as well.
Perl works well for powerful scripts that don't need to be tested.
Ruby works well for programs (no longer scripts) that need to be tested.
For my build scripts, I enjoy hitting <tab> for auto-completion of the goals. Bash does not offer that out of the box (though it can be achieved using programmable completion). Make, on the other hand, offers goal autocompletion out of the box:
.PHONY: build
build:
	./gradlew build
Now, I can type make b<TAB> and it will suggest make build.
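For completeness, programmable completion for a bash build script could be sketched as follows. The script name build.sh and the goal names are assumptions; complete and compgen are bash builtins:

```shell
#!/usr/bin/env bash

# Hypothetical completion function for a build script called build.sh.
_build_sh_goals() {
  local goals="build test clean"   # assumed goal names
  local current="${COMP_WORDS[COMP_CWORD]}"
  # compgen -W filters the word list down to entries starting with $current
  COMPREPLY=( $(compgen -W "$goals" -- "$current") )
}

# Register the function for both ways of typing the command.
complete -F _build_sh_goals build.sh ./build.sh
```

Sourcing this snippet from an interactive shell (for example from ~/.bashrc) would make ./build.sh b<TAB> suggest build.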