“What the heck is this?”
“Oh my god, who should we tell?”
“Let’s run away and pretend we never saw it.”
Uncovering an ecosystem of spaghetti-code bash scripts supporting production systems is equivalent to finding a dead body while you’re walking your dog in a secluded corner of a large park. You know that you’ve found a big problem and while you’re sure that you’re not responsible for it existing, at that time it becomes your express duty to do something about it. However, your first instinct is to run away and erase the horror from your brain.
On the job, I have stumbled across ancient, undocumented, unmaintained, unmonitored scripts performing somewhat important operations. I’ve logged out immediately and looked over my shoulder to make sure that nobody saw me make the discovery. It genuinely evokes the same emotions as coming across a crime scene would. The correct DevOps approach to handling the excavation of these scripts will be the topic of an entirely different post; here I’ll instead focus on the naive programmer’s line of questioning (which is always the same):
1. Who maintains this? OR *LinkedIn search for the purported author; realize he left ten years ago; proceed to question 2*
2. Why don’t we use Python in place of bash scripts?
When I was first starting out, there seemed to be literally nothing written about this dilemma. Why are so many core scripts written in bash? How many would be better off being written in Python (or using some hybrid approach) instead?
The answer to the first question, as heretical as it may be in a workplace embracing DevOps practices, is that it is how things have always been done. Even if the scripts aren’t the nightmares alluded to above, at any company you can find a bundle of them that make you do a double take before eliciting a hearty chuckle. This isn’t to say that Python (or Ruby, or Perl, for that matter) is always the superior choice, but that alternatives are seldom considered as new developers are prone to following the footprints of those who came before them.
There’s the issue of maintenance. When faced with unplanned work pertaining to bash scripts, it’s easier to modify a few lines than approach the philosophical discussion “Why not Python?” Why take a script that gives you problems every few months and rewrite the majority of it in a new language when it will require attaining in-depth knowledge of the operation and its dependencies, lengthy discussion, and potentially hand-to-hand combat?
Below are some of the most popular arguments for each approach.
Ubiquity and Install Size
The general case for shell is that it’s ubiquitous on all Linux distributions. Python usually is too, but not always. Then there’s the issue of Python versioning: a brand new box might have Python 3.6 while an old one might be running 2.6. The “which language version?” question is one you’ll never have to ask if you rely solely on bash scripts. You’ll instead have to ask a different, far more frustrating one: which flavor of shell are we working with, and might some commands behave differently? In addition, both languages are alike in that you’ll have to ensure that all modules/commands are installed. In the edge cases here, Python will be your friend because, once it’s installed, you get a consistent experience everywhere (unless you’re using it to directly run shell commands, in which case it carries the same risks).
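Both questions (which Python, and are the commands there at all?) can be asked up front rather than discovered at 3 a.m. Here is a minimal sketch of such a preflight check. The minimum version and the command list are assumptions made up for illustration, not requirements of any real script.

```python
import shutil
import sys

# Hypothetical requirements for an imaginary script; adjust to taste.
MIN_PYTHON = (3, 6)
REQUIRED_COMMANDS = ["grep", "sed", "awk"]


def python_is_new_enough(minimum=MIN_PYTHON, current=None):
    """Return True if the interpreter version is at least `minimum`."""
    current = sys.version_info[:2] if current is None else current
    return tuple(current) >= tuple(minimum)


def missing_commands(commands=REQUIRED_COMMANDS):
    """Return the commands that cannot be found on PATH."""
    return [cmd for cmd in commands if shutil.which(cmd) is None]


if __name__ == "__main__":
    if not python_is_new_enough():
        print("Python %d.%d or newer is required" % MIN_PYTHON)
    missing = missing_commands()
    if missing:
        print("Missing commands: " + ", ".join(missing))
```

Note that this only confirms a command exists on PATH. It says nothing about which flavor of that command you got, which is the harder half of the problem.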
Size of the Python install isn’t particularly relevant, but I suppose that when dealing with microcontrollers and embedded systems this can be a nontrivial concern.
In this category, there’s no clear winner, but bash will generally be less frustrating to deal with since you will encounter compatibility problems less often.
Maintainability
Here’s where it gets tricky. Maintainability is a measure relative to one’s own experience with a language and program’s purpose. Until it’s not. Once you try to get bash to perform black magic by mimicking language features that it doesn’t really support (namely nested data structures and OOP), you’ve gone too far. Your code is a mess. I don’t care how brilliant it might seem, you should be supplementing with Python.
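For contrast, here is a small sketch of the kind of structured data that takes real contortions to imitate in bash and almost no effort in Python. The `Host` record and the host names are invented for the example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Host:
    # A hypothetical record type; the fields are illustrative only.
    name: str
    role: str
    port: int


def group_by_role(hosts):
    """Map each role to the list of host names that serve it."""
    grouped = defaultdict(list)
    for host in hosts:
        grouped[host.role].append(host.name)
    return dict(grouped)


hosts = [
    Host("web-1", "web", 8080),
    Host("web-2", "web", 8080),
    Host("db-1", "database", 5432),
]
print(group_by_role(hosts))
```

If your script needs something shaped like this, that is usually the sign that bash has stopped being the right tool.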
I sympathize with the authors who long ago first touched these bash scripts that became monstrosities. After years of other developers tacking on features, the benefit of using bash has been obfuscated. My general rule of thumb is that once you hit 100 lines of code, start parsing multiple curl requests, or have complex logical comparisons, it’s time to switch over to Python. More clearly: When it starts to look like a bastardization of Python, use Python.
As a less readable language, Bash is also less maintainable. It’s also an unstructured language, so “getting a gist” of what any program does is difficult. Each line can contort your face in a new way. Good Python code scarcely needs to be documented at all. Once you reach a certain experience level, looking through a new Python program becomes like reading a book; speed-reading bash scripts always feels like translating hieroglyphics before they found the Rosetta Stone.
Ease of Use
Here’s the real contention point. The fact of the matter is that some things are just easier to do with shell scripting. Not only are the commands (for these operations) more compact and concise to execute in bash vs Python equivalents, but most programmers will understand the code more intuitively.
Bash is really good at:
1. File I/O
2. Process wrapping and signal handling
3. Command chaining and gluing things together
4. Operations pertaining to the holy trinity (grep, sed, awk)
And a bunch of other things. Don’t get me wrong, it’s a useful language that has stood the test of time.
If you need to perform the above operations (without nerding out over what awk is truly capable of), a bash script will look cleaner and will probably contain fewer lines of code than an equivalent Python script. There’s an element of beauty to it.
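To make the size difference concrete: counting the lines in a log that mention an error is a single short command in the shell, something like `grep -c ERROR app.log` (the file name is hypothetical). A rough Python equivalent, as a sketch, looks like this:

```python
import re


def count_matching_lines(path, pattern):
    """Count the lines in `path` that contain a match for `pattern`."""
    regex = re.compile(pattern)
    # errors="replace" keeps a stray non-UTF-8 byte from aborting the count.
    with open(path, encoding="utf-8", errors="replace") as handle:
        return sum(1 for line in handle if regex.search(line))
```

It is perfectly readable, but it is several lines where the shell needed one. Also keep in mind that Python’s `re` syntax is not identical to grep’s default regular expressions, so the two only agree out of the box for simple patterns.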
Python is good when you need to do more. It’s a fully-featured language with an ecosystem of non-standard packages that reduce the complexity of many different operations. In addition, Python’s subprocess module can call bash scripts and other programs much as bash does naturally. So, theoretically, you can replace every command in your bash script with “pure” Python, but the code would look silly and everyone would hate you. This drives home the point that you should seek a balance instead of shunning one in favor of the other.
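One way to strike that balance, sketched below, is to leave the pipeline to the shell and do the bookkeeping in Python. The `run_pipeline` and `parse_counts` helpers and the toy pipeline are made up for illustration, and the sketch assumes `bash` is on PATH and Python 3.7 or newer (for `capture_output` and `text`).

```python
import subprocess


def run_pipeline(script, timeout=30):
    """Run a shell snippet with bash and return its stdout as text.

    Raises subprocess.CalledProcessError on a non-zero exit status.
    """
    completed = subprocess.run(
        ["bash", "-c", script],
        check=True,
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return completed.stdout


def parse_counts(uniq_output):
    """Turn `uniq -c` style output into a {word: count} dictionary."""
    counts = {}
    for line in uniq_output.splitlines():
        count, word = line.split()
        counts[word] = int(count)
    return counts


if __name__ == "__main__":
    # The shell does what it is good at: chaining small tools together.
    output = run_pipeline("printf 'b\\na\\nb\\n' | sort | uniq -c")
    # Python takes over once the result needs to become structured data.
    print(parse_counts(output))
```

Passing the command as a list with `bash -c` rather than using `shell=True` keeps the choice of shell explicit, which matters given the “which flavor of shell?” problem.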
One of my personal examples is that when I am working with microservices returning structured data, it is torturous to use curl requests in bash versus Python’s Requests library. It’s so tedious that it’s very tempting to leave certain “fall through” cases unhandled, resulting in unsafe code and a few fun bugs-as-features. Handling all the different possible response codes, exceptions, and structured data in bash? The thought sickens me.
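Here is a sketch of what handling those fall-through cases can look like. To keep it free of third-party dependencies it uses the standard library’s `urllib` rather than Requests, so it is a little clunkier than the Requests version would be. The function names and the status-code policy are assumptions for the example, and the exception handling is not exhaustive.

```python
import json
import urllib.error
import urllib.request


def interpret_response(status, body):
    """Turn a status code and raw body into (ok, payload_or_message)."""
    if status == 404:
        return False, "not found"
    if 500 <= status < 600:
        return False, "server error %d" % status
    if not 200 <= status < 300:
        return False, "unexpected status %d" % status
    try:
        return True, json.loads(body)
    except ValueError:  # covers json.JSONDecodeError and bad encodings
        return False, "response was not valid JSON"


def fetch_json(url, timeout=10):
    """GET `url` and return (ok, payload_or_message) without raising
    for the common failure modes."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return interpret_response(response.status, response.read())
    except urllib.error.HTTPError as exc:
        # urllib raises on 4xx/5xx, but the status code is still available.
        return interpret_response(exc.code, exc.read())
    except (OSError, ValueError) as exc:
        # OSError covers URLError and timeouts; ValueError covers bad URLs.
        return False, "request failed: %s" % exc
```

Every branch here is a case that tends to get silently skipped in a curl-and-grep version of the same logic.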
Performance
Say it aloud with me: “I don’t care.”
I’ve been thinking about it a lot anyway. When it comes down to it, Python is an interpreted language and there is going to be some startup overhead. Further, there are some operations in bash that are implemented in a way that is always going to be faster than when you call them from Python.
I haven’t benchmarked the performance and it doesn’t seem that anyone else has either. Fine-grained performance differences just aren’t particularly important in the realm of most functions that bash scripts are written to perform. My intuition is that sed, grep, and awk will make Python equivalents look silly when comparing runtimes, but there can exist operations that are implemented more efficiently in Python libraries.
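If you do want a rough number for the startup overhead mentioned above, a sketch like the following would give one. It is not a rigorous benchmark: each measurement also includes the cost of spawning the process, the helper name is made up, and it assumes `bash` is on PATH.

```python
import subprocess
import sys
import time


def average_startup_seconds(command, runs=5):
    """Average wall-clock time to start `command` and let it exit."""
    start = time.perf_counter()
    for _ in range(runs):
        subprocess.run(command, check=True, stdout=subprocess.DEVNULL)
    return (time.perf_counter() - start) / runs


if __name__ == "__main__":
    # Both commands do nothing, so what gets timed is mostly startup cost.
    python_time = average_startup_seconds([sys.executable, "-c", "pass"])
    bash_time = average_startup_seconds(["bash", "-c", ":"])
    print("python: %.1f ms per start" % (python_time * 1000))
    print("bash:   %.1f ms per start" % (bash_time * 1000))
```

Whatever the numbers turn out to be on your machine, they only matter if the script is launched often enough for the difference to add up.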
A hybrid approach wins out at the end of the day. My opinion is that the “bash was made for this and Python wasn’t!” functions should be relegated to bash, while everything else should be performed by Python once a program exceeds a certain line count or number of logical operations.