The Woodshed

Behind here, no one can hear you scream

Prompting Expletives

Ah yes, that New Blog Smell. And what better way to start off a blog than a literate Bash program?! (Don’t answer that.)

Why program in Bash? For the same reason people climb a mountain: because it’s there.

No, seriously, that’s the only reason to program in, well, lots of languages, but especially one of the venerable shell scripting languages that lurk below the surface of the world’s popular operating systems. It’s easier to avoid these languages than, say, JavaScript, but inevitably at some point in your career, circumstances will arise in which you have to create, adapt, debug, or maintain a shell script.

One such circumstance is when you want to customize some part of the shell itself. At some point this week, I embarked upon the foolish mission to customize my Bash prompt in a non-trivial way. Bash provides two predefined ways of displaying the current working directory in your prompt: the full path (\w), or the immediate directory only (\W). Both of these options bug me, so I finally decided to do something about it.

Bash 4 provides a shell variable, PROMPT_DIRTRIM, that controls automatic abbreviation of paths over a certain depth. However, I happened to be on a Mac, and OS X Mavericks includes Bash 3. (I wouldn’t be surprised if many servers I SSH into also run older versions of Bash.)

So off to the interwebs to find someone who’d already come up with an acceptable solution. I found something pretty close, but it still required adaptation (and a surprising amount of learning). Below you’ll find the code, along with comments on the horrors I had to endure.

Things started innocently enough. My Mac has a rather long name, so I wanted to abbreviate it:

abbr_host=$(echo "$HOSTNAME" | sed 's/\([[:alnum:]]\)[[:alnum:]]*/\1/g; s/\..*$//')

The $() syntax was new to me (see how well I’ve avoided learning Bash?), but I learned that it’s almost-but-not-quite the same as the backquote syntax (which I was familiar with). Random Internet strangers are of the opinion that $() should be preferred to backquotes, so here we are.
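
One place the “not quite the same” part shows up is nesting. Here’s a minimal sketch; the path is a made-up example, not something from my prompt code:

```bash
#!/usr/bin/env bash
# Hypothetical path, used only for illustration.
path=/usr/local/bin/tool

# $( ) nests without any extra escaping.
parent_new=$(basename "$(dirname "$path")")

# Backquotes need the inner pair escaped with backslashes.
parent_old=`basename \`dirname "$path"\``

echo "$parent_new $parent_old"
```

Both forms should produce the same directory name; the first one is just a lot easier to read.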

Since I’ve used Perl and Vim a decent amount, sed’s s/// syntax was familiar enough. Unfortunately the version on my system didn’t recognize the \w regex construct, so I had to use [:alnum:]. I suffered some mild surprise that one set of brackets around :alnum: was insufficient—this appears to be parsed as the character class consisting of any of the letters a, l, n, u, m or the colon.
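
To see the difference between one and two sets of brackets, here’s a small sketch. It uses Bash’s own pattern substitution rather than sed (some sed builds refuse the single-bracket form outright), and the sample string is made up:

```bash
#!/usr/bin/env bash
sample='zebra:man'   # hypothetical sample string

# One set of brackets: a bracket expression listing the characters : a l n u m
single=${sample//[:alnum:]/X}

# Two sets: the POSIX class [:alnum:] used inside a bracket expression
double=${sample//[[:alnum:]]/X}

echo "$single"
echo "$double"
```

With one set of brackets only the colon and the letters a, l, n, u, m get replaced; with two sets, every letter and digit does, and the colon survives.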

Then I decided that always abbreviating the machine name would look funky with short names. So I wanted to make the abbreviation conditional; I decided on “abbreviate only if the name contains a hyphen, underscore, or space.”

A quick search revealed that Bash does support matching strings against regex patterns! The bad news is that doing so is fraught with peril to the uninitiated. I spent way too much time getting this part working:

abbr_host=$(echo "$HOSTNAME" | sed 's/\..*$//')
abbr_pat='[-_ ]'
[[ "$abbr_host" =~ $abbr_pat ]] && abbr_host=$(echo "$abbr_host" | sed 's/\([[:alnum:]]\)[[:alnum:]]*/\1/g')

The abbr_host= parts are the same as above, just split into two lines. The weird bit is the double-bracket conditional.
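
For experimenting, the same logic can be wrapped in a function and fed a few names. This is only a sketch: abbreviate_host is a name I made up for illustration, and the host names are invented.

```bash
#!/usr/bin/env bash
# abbreviate_host is a hypothetical wrapper; the body is the logic shown above.
abbreviate_host() {
  local host pat='[-_ ]'
  # Drop the domain part, if any.
  host=$(echo "$1" | sed 's/\..*$//')
  # Abbreviate only names containing a hyphen, underscore, or space.
  if [[ "$host" =~ $pat ]]; then
    host=$(echo "$host" | sed 's/\([[:alnum:]]\)[[:alnum:]]*/\1/g')
  fi
  echo "$host"
}

abbreviate_host "my-long-macbook.local"   # multi-word name: abbreviated
abbreviate_host "box.example.com"         # short name: left alone
```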

When you are learning shell scripting, you pretty much get the adage “quote all the things!” pounded into you. This is because the Bourne shell and its descendants have some absolutely ridiculous rules regarding “null” values. For example, in (ba)sh, there is a command named [ that is commonly used for conditional statements. Yes, it’s a command, it’s not actually syntax. (In fact, it’s a synonym for another command named test.) This command accepts a variable number of arguments, and it tries to assemble these arguments into something resembling a conditional expression. This is a silly way of doing things on its own, but it gets worse when combined with how the shell performs expansion and substitution.

To illustrate, if you pass the [ command the argument -n and a string, it’s supposed to test whether or not the string is empty (zero-length). Specifically, it gives a false result if the string is empty, and a true result otherwise:

[ -n foo ] # => true
[ -n "" ]  # => false

One might think to use this to test whether or not a variable has a value. (Warning: incorrect code follows)

if [ -n $special_instructions ]; then handle_specially; else handle_normally; fi

Written in that way, handle_specially will always be run. If $special_instructions happened to have a value, it works as expected. If it didn’t, however, the shell removes the variable reference entirely before calling [, so what gets executed is [ -n ]. And if we check the manual, when [ is called with one argument, its behavior is to give a true result if that argument isn’t null (and the string -n isn’t null—it’s a string of length 2). Cool, huh? Yeah.
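
The one-argument behavior is easy to poke at directly. In this sketch, check is a helper name I invented; it just reports whether the command it’s given succeeded:

```bash
#!/usr/bin/env bash
unset special_instructions

# check is a hypothetical helper that reports the exit status of its command.
check() { if "$@"; then echo true; else echo false; fi; }

check [ -n $special_instructions ]   # actually runs: [ -n ]
check [ -z $special_instructions ]   # actually runs: [ -z ]
check [ "" ]                         # one argument, and it is empty
```

Note that with the variable unset, the -n and -z lines both collapse to the one-argument form, so they give the same answer even though they’re supposed to be opposites.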

The solution is to put double (not single, God help you) quotes around the variable reference:

if [ -n "$special_instructions" ]; then handle_specially; else handle_normally; fi

This works how you wanted; if $special_instructions is null/empty, the expansion still produces nothing, but the quotes keep it around as an empty-string argument (which the manual calls an “explicit null argument”). Thus [ still gets two arguments, checks that the first is -n, and then knows to test the second for nullness.
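
One way to watch this happen is to count the arguments a command actually receives. A small sketch, where count_args is a made-up helper:

```bash
#!/usr/bin/env bash
unset special_instructions

# count_args is a hypothetical helper that prints how many arguments it got.
count_args() { echo "$#"; }

unquoted=$(count_args -n $special_instructions)     # variable vanishes
double=$(count_args -n "$special_instructions")     # empty string is kept
single=$(count_args -n '$special_instructions')     # literal text, never expanded

echo "unquoted=$unquoted double=$double single=$single"
```

The single-quoted version also passes two arguments, but the second one is the literal text $special_instructions, which is never empty. Hence the warning about single quotes.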

That was a long aside, but it’s necessary to understand the above to appreciate the irony of the next point. Going back to the code:

[[ "$abbr_host" =~ $abbr_pat ]]

This amazingly does not work right if you quote the pattern variable: since Bash 3.2, quoting the right-hand side of =~ makes it match as a literal string rather than as a regex. Gotta leave it unquoted, despite the extremely numerous cases where leaving variable references unquoted can burn you. If you see him, feel free to ask Stallman why this has to work this way (be warned: he’ll try to distract you by changing the subject to copyright law or something).
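
Here’s a quick sketch of the difference, assuming Bash 3.2 or later with default compatibility settings. The host name and the test string are invented for the example:

```bash
#!/usr/bin/env bash
abbr_host='my-mac'   # hypothetical host name
abbr_pat='[-_ ]'

# Unquoted: the variable's value is treated as a regex.
if [[ "$abbr_host" =~ $abbr_pat ]]; then unquoted=match; else unquoted=no-match; fi

# Quoted: the value is treated as the literal five characters [-_ ]
if [[ "$abbr_host" =~ "$abbr_pat" ]]; then quoted=match; else quoted=no-match; fi

# The quoted form only matches text that contains those literal characters.
if [[ 'x[-_ ]y' =~ "$abbr_pat" ]]; then literal=match; else literal=no-match; fi

echo "unquoted=$unquoted quoted=$quoted literal=$literal"
```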

Wow, I’ve written a lot of words and didn’t even get to the frustrating part of the code. Guess this just became a multi-part article.