

Pipes and useless cats

I love me some Unix command pipes:

$ cat /some/file.txt | sort | head

Pipelines let you chain multiple commands together to manipulate a stream of data. Pipes are useful not only for filtering data: combined with tools such as cut, awk and sed, they can also perform projections and transformations. The Unix pipe, while simple in concept, is a sophisticated shell construct and one big reason why Unix shells remain a popular tool for programmers, system administrators and data scientists.
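To make the filter/projection/transformation idea concrete, here is a minimal sketch. The sample data (name, department, salary) and the variable names are made up for illustration; only grep, cut, sed and sort are standard tools.

```shell
# Hypothetical sample data: name,department,salary
tmpfile=$(mktemp)
printf '%s\n' 'alice,eng,120' 'bob,ops,90' 'carol,eng,150' 'dave,ops,80' > "$tmpfile"

# Filter:         grep keeps only the rows for the eng department
# Projection:     cut keeps only columns 1 and 3 (name and salary)
# Transformation: sed rewrites the first comma as ": "
result=$(grep ',eng,' "$tmpfile" | cut -d, -f1,3 | sed 's/,/: /' | sort)

printf '%s\n' "$result"
# alice: 120
# carol: 150

rm -f "$tmpfile"
```

Note that grep takes the file name directly, so no cat is needed to start the pipeline.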

So why am I sitting here telling you something that you already know? Fair question - to answer that let’s take another look at that command:

$ cat /some/file.txt | sort | head

While shell pipelines are great, we have a subtle problem here - it’s known as a useless cat. No, I don’t hate cats - the expression harks back to the old Usenet days, when a regular on comp.unix.shell would write a weekly post highlighting a redundant use of the cat command.

So why is the cat in the above command useless? Because sort, like the majority of Unix commands, can take one or more files as arguments. So the command can be rewritten as:

$ sort /some/file.txt | head

Removing cat from the equation means one fewer process to execute and one fewer pipe, which cuts down on the buffering and data copying the kernel has to do to move data between processes - a win-win.
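A quick way to convince yourself that the two forms are interchangeable is to run both against the same input and compare the results. This is only a sketch: the temporary file and its contents are invented for the example.

```shell
# Hypothetical input file with three unsorted lines
tmpfile=$(mktemp)
printf '%s\n' pear apple fig > "$tmpfile"

# Three processes and two pipes: cat, sort, head
with_cat=$(cat "$tmpfile" | sort | head -n 2)

# Two processes and one pipe: sort, head
without_cat=$(sort "$tmpfile" | head -n 2)

# Both pipelines produce the same lines
[ "$with_cat" = "$without_cat" ] && echo "same output"

rm -f "$tmpfile"
```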

In fact, outside of its actual job of concatenating files, cat really doesn’t have many uses - if you need to view the contents of a file you’re better off using vi or less, and otherwise most Unix commands can work directly with files.
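Two cases are worth sketching here. A few commands, tr being a common one, read only standard input and accept no file arguments; for those the shell's input redirection does the job without cat. And when you genuinely need to join several files, cat is the right tool. The file names and contents below are placeholders.

```shell
# Two hypothetical input files
part1=$(mktemp)
part2=$(mktemp)
printf 'hello\n' > "$part1"
printf 'world\n' > "$part2"

# tr takes no file arguments, so redirect the file to its standard input
# instead of writing: cat "$part1" | tr 'a-z' 'A-Z'
upper=$(tr 'a-z' 'A-Z' < "$part1")
printf '%s\n' "$upper"
# HELLO

# Concatenating multiple files is what cat is actually for
joined=$(cat "$part1" "$part2")
printf '%s\n' "$joined"
# hello
# world

rm -f "$part1" "$part2"
```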

So next time you’re about to run a cat command - think about whether or not you need it, or whether you’re just perpetuating the use of the useless cat!

About the author

Hadoop in Practice, Second Edition

Alex Holmes is a senior software engineer with over 15 years of experience developing large-scale distributed Java systems. Since 2008 he has gained expertise in using Hadoop to solve Big Data problems across a number of projects. He is the author of Hadoop in Practice (first edition, with the second edition currently in the early access program), a book published by Manning Publications. He has presented at JavaOne and Jazoon.

If you want to see what Alex is up to you can check out his work on GitHub, or follow him on Twitter or Google+.
