Sun, Sep 30, 2012

To understand the command line...

• Post categories: Omni, FOSS, Technology, My Life, Programming, Helpful

...you must first understand Unix.

I gave another talk at my local LUG this week. The idea was to set the scene so I could then move onto more useful things like git, perl, javascript, etc.

I wanted to begin by getting everybody comfortable and familiar with the command line. The concept I wanted to put across was that using the CLI is like walking into a film 10 minutes before the end. It doesn't make sense because you don't know what led up to it.

Imagine knowing nothing about Lord Of The Rings and then only being shown the ending. You'd see Sam & Frodo slogging their way up a mountain with the single goal of throwing a ring into it, for some reason. After almost killing themselves to get there, Frodo announces he's not going to do it after all. Then he vanishes. Then a little hairless freak runs in, floats around excitedly, Frodo reappears, baldie has the ring. He falls into the lava and the ring melts. For some reason, at the exact same time, a massive black tower with a flaming cat's eye on the top falls down; and the ground collapses, swallowing up the huge army of orcs but very conveniently not harming the small army of humans they were facing.

How much of that makes any kind of sense? None, it's nonsense. You can't grasp the meaning of any of these events without knowing the backdrop. And you can't understand why the CLI is the way it is today for the same reason. It was built by some amazingly clever people, and they didn't try and make it as obtuse and hard-to-understand as possible. Quite the contrary, they did their best to make it as sensible and intuitive as possible.

So in order to understand how the CLI is sensible and intuitive, it helps to understand how it got to where it is today. Hence my talk.

We began with my phone. A Samsung Galaxy S2, running the Ice Cream Sandwich version of Android. Very intuitive, very easy to use. Somebody with no experience can be handed an Android phone and be up & running with it in seconds. No manual to read, nothing to explain, they can just figure it out. And it looks nice, and it has pretty buttons and nice backgrounds... It has a very gentle, shallow learning curve.

This is good, because it means that you can go from being a novice to an experienced user in no time.

This is bad, because it means that the difference between a master and a beginner is almost nothing. There is very little power or sophistication in the Android interface - what you see is what you get, and there's very little you don't see.

You can't do clever and powerful things with it. You can only make phone calls and run apps with it. And all the apps are stand-alones. Except for a few apps that are specifically written to work with specific other apps, they're all self-enclosed.

Conversely, there is the CLI. This isn't pretty and you can't just sit a novice down and let them get on with it. They will get nowhere.

Given a bit of instruction and a few useful commands, they'll at least be able to get a few things done. But they'll still be just rank beginners barely scratching the surface of what can be done.

I've been using the CLI for years, as a hacker and a programmer, compiling the whole of Linux From Scratch from it and even writing my own applications for it. Yet I am still aware that I know far, far less than half of what there is to know. There are people out there who can make me look like a clueless amateur. And even THEY don't know half of what there is to know.

The command line has a steep learning curve, and it just keeps going - few, if any, people can really claim to know just about everything worth knowing. And even when you know all the commands there are and how to use them, you'll still never stop coming up with new and useful ways of tying them together, because CLI apps talk to each other.

Unix was invented in the 60s. It's considered arcane, complicated, inconsistent and unfriendly by many. And yet it, and its derivatives, are absolutely everywhere today - BSD, Linux, OS X, iOS, Android; it's on PCs, iPhones, and servers - whereas many 'friendlier' and allegedly-better alternatives have appeared and died out in the meantime leaving barely a trace. Why is that? And why does it have such weird names and jargon?

Well, for one thing, there were the notes from the initial design discussion.

At the end of the discussion, Canaday picked up the phone, dialed into a Bell Labs dictation service, and read in his notes. "The next day these notes came back," Thompson said, "and all the acronyms were butchered, like 'inode' and 'eyen.'"

Butchered or not, the notes became the basis for UNIX. Each researcher received a copy of the notes, "...and they became the working document for the file system," Thompson said.

So right from the start, there was weirdness, because the design document was full of typos and mistakes.

On the plus side, though, this did free the makers from the constraints of having to find a suitable English word for the completely new ideas they were implementing. This was liberating, in a way - it freed them to call their creations absolutely anything they liked.

[Comic from UserFriendly.org]

So, why did they choose the names they did? Why do we have a CLI populated with

  • awk
  • cat
  • less
  • ed
  • sed
  • grep
  • vi

Well, awk, I grant you, is an unintuitive name. It's simply the initials of its creators. That is not helpful. Fair enough.

'cat' to show the contents of a file.. well, speaking as a cat owner, I can say that a command that rips the insides out of something and shows it to you can appropriately be named 'cat', right enough. But in fact, 'cat' is short for 'concatenate' - which was a perfectly fair description of its original use, legitimately shortened because it's way too long to type out in full.
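
If you've never seen it used that way, a quick sketch (the filenames are just placeholders):

  cat chapter1.txt chapter2.txt > book.txt    # join two files end-to-end - the concatenation the name refers to
  cat book.txt                                # with one argument it simply pours the file onto your terminal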

So why do you use 'less' to show a file's contents more interactively? How does that make sense?

Well, if you just 'cat' a file, then if it has more lines in it than can fit on your screen, you'll lose the top and be unable to read it. So they invented the pager, an application that would show you a screenful of text at a time. And to let you know that output was being paged, how far through you were, and how much was left, the pager had a prompt: At the bottom of the screen, it would display the word "More", and a percentage of how far through the output you were.

So it was logical enough that the pager would itself be named more. You can see the link.

But 'more' was limited. If you realised you wanted to go back to a part of the file that had already scrolled past, you had to re-run the command. An upgrade was needed, and a better pager was created: One that could scroll up and down; and handle useful things like searches. And the name? Well, it was like 'more' but it did more.. and as everyone knows, less is more. So 'less' was born.
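
If you've never tried it, any long output will do as a demonstration:

  ls -lR /usr | less    # page through the output: Space for the next screenful, 'b' to go back, '/' to search, 'q' to quit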

'ed'? Well, you need an editor, you want the shortest possible name for it.. ed is a pretty good choice.

'sed' - once you have the concept of text streams, a need to edit them, and an editor called 'ed', what else would you call a stream editor?
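
A one-line sketch of what that looks like in practice (the file and the words are made up):

  sed 's/colour/color/g' draft.txt    # run the stream through an edit: swap every 'colour' for 'color' and print the result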

grep. Now here's an interesting one.

If you use ed, you'll know that it doesn't show you the file contents unless you ask it to - it was written in the days before interactive displays, when many people used ed with only a printer to output to.

And often you didn't want it to show you all lines - just a subset. So it could handle that. You could say "show me the first 5 lines" or "show me the next 8 lines" or "show me lines 100-110", and that was all fine.

But sometimes you wanted to say "Show me all lines containing x" where 'x' was a bit of text you were interested in. And ed could handle that, too.

To search in ed, you used /

So /foo would look for the next instance of 'foo'

But you didn't want the NEXT instance, you wanted EVERY instance. So you wanted a global search. That was g/foo

So far so logical?

Once you found all those lines, you wanted to see them. So you told ed to print them. So the command was g/foo/p - globally search for 'foo' and print the matching lines.

Superb. But ed was clever - it didn't just do literal string matching, it did proper regular-expression handling. So you could use g/^[0-9]/p to show you all lines beginning with a number. Or g/;$/p to show you all lines ending in a semi-colon.

So the "show me matching lines" functionality, which ed users made massive use of, could be summed up as "globally search for the regular expression and print it" - or, in ed format, g/regular expression/p. Or, to shorten it a bit more, g/re/p

Yup. Grep. That's where it came from - thank you, ed. When they needed an application to find lines that contained a regex within files, they used the familiar ed idiom. Any user at the time would have understood the application name being 'grep' because they'd have known ed.
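
To see the lineage side by side, here's a small made-up session (notes.txt and its contents are invented for the illustration):

  $ printf 'alpha\nfoo bar\nbaz\nfoo again\n' > notes.txt
  $ ed -s notes.txt
  g/foo/p
  foo bar
  foo again
  q
  $ grep foo notes.txt
  foo bar
  foo again

The same g/re/p idiom, first as an ed command, then as a program in its own right.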

And 'vi' - well, contrary to what some would say, it wasn't the sixth attempt at an editor made by somebody who liked Roman numerals. It was ed's successor, ex, that took advantage of the newfangled monitors some people had and offered a visual, interactive mode. The shortest non-ambiguous contraction of 'visual' being, of course, vi.

Seems like early text editors had a surprising amount of influence on the rest of the OS, doesn't it? Does that surprise you? After all, in any modern OS, the text editor is a pretty small deal. Windows and Notepad, Ubuntu and gedit.. meh.

But back then, people *lived* in the text editor. Early Unix users did nothing but hack and write code, so the editor was vitally important to them. In fact, it was as important as the OS itself:

"I allocated a week each to the operating system, the shell, the editor, and the assembler to reproduce itself...", Thompson explained.

Yep, the editor got as much time as the OS, the shell, and the assembler. As somebody who spends half his life in a vim session, I'm bang alongside the idea that the editor is a very big deal.

Something else that had long-lasting influence over software we still use today was the ADM-3A terminal. If you ever wondered why vi uses 'hjkl' as left, down, up, right - this is why.

It's also a good illustration of the distinction between 'easy' and 'efficient' - the modern cursor key layout, with the 'up' above the 'down', is obvious, intuitive, and easy to master. No argument. But it's not the most efficient layout - it's not possible to keep all four fingers on the four navigation keys - you have to settle for three fingers and move them around. And if you're typing and suddenly need to move, you have to move completely away from where you're typing and switch to the cursors.

hjkl answers both issues - the four keys in a line make it possible to keep one finger on each and so move about as quickly as possible, without ever having to move from the text-entry part of the keyboard to any other. It's not as easy, not as obvious, but it's quicker and more efficient when you're used to it. Even on my modern keyboard, I go for hjkl far more than I reach for cursors.

Look closely at the ADM-3A's keyboard and you'll see the arrows on the hjkl keys. You'll also see that the 'home' key doubles up as the ~ key. If you ever wondered why 'cd ~/' was how to get back to your home directory - this is why. On modern hardware it makes no sense at all. On that old machine, 'ls ~/' to list your home directory was completely intuitive and memorable.

Whilst we're at it, do you know why 'ls' shows fewer files than 'ls -a'?

Yes, because files with names beginning with a '.' don't get displayed. Dotfiles are hidden files.
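
For instance, in a hypothetical home directory:

  $ ls
  notes.txt
  $ ls -a
  .  ..  .bashrc  notes.txt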

Why? We set files to be readable, writable, or executable via file permissions. Why do we set their visibility with a leading period, instead of via "chmod"?

Well, according to Rob Pike, because:

Long ago, as the design of the Unix file system was being worked out, the entries . and .. appeared, to make navigation easier. I'm not sure but I believe .. went in during the Version 2 rewrite, when the file system became hierarchical (it had a very different structure early on). When one typed ls, however, these files appeared, so either Ken or Dennis added a simple test to the program. It was in assembler then, but the code in question was equivalent to something like this:
if (name[0] == '.') continue;
This statement was a little shorter than what it should have been, which is
if (strcmp(name, ".") == 0 || strcmp(name, "..") == 0) continue;
but hey, it was easy.

Two things resulted.

First, a bad precedent was set. A lot of other lazy programmers introduced bugs by making the same simplification. Actual files beginning with periods are often skipped when they should be counted.

Second, and much worse, the idea of a "hidden" or "dot" file was created. As a consequence, more lazy programmers started dropping files into everyone's home directory. I don't have all that much stuff installed on the machine I'm using to type this, but my home directory has about a hundred dot files and I don't even know what most of them are or whether they're still needed. Every file name evaluation that goes through my home directory is slowed down by this accumulated sludge.

I'm pretty sure the concept of a hidden file was an unintended consequence. It was certainly a mistake.

It was unplanned functionality added by mistake, because it was quick & easy. It's also a modern-day standard that we're stuck with. That's the law of unintended consequences for you.

So that's a whole lot of the apparent weirdness of Unix covered. But none of it really explains why Unix was invented in 1969 and is still in widespread use today. What's the big secret? What's the source of Unix's flexibility? The origin of the Unix philosophy? The key to unlimited power?

It's this: |

The pipe symbol.

The creators of Unix went home one evening, and came back the next morning to find Thompson had put pipes into everything.

"Thompson saw that file arguments weren't going to fit with this scheme of things and he went in and changed all those programs in the same night. I don't know how...and the next morning we had this orgy of one-liners."

"He put pipes into UNIX, he put this notation into shell, all in one night," McElroy said in wonder.

In Unix, all 'core' applications can take text in, and output it. Just like a real pipe, streams flow in and flow out - but in Unix, streams are text, not water.

So you take your raw data, and you can pass it to an application that could filter it, and to another that could transform it, and then onto another, and another...

This led to the idea of small applications that perform common tasks, which you pass the stream through. And so was born the Unix philosophy: 'Write programs that do one thing and do it well. Write programs to work together. Write programs that handle text streams, because that is a universal interface.'
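
A sketch in that spirit - roughly Doug McIlroy's classic word-counting pipeline, with book.txt standing in for whatever text file you have to hand:

  tr -cs 'A-Za-z' '\n' < book.txt |   # turn every run of non-letters into a newline: one word per line
    tr 'A-Z' 'a-z' |                  # lower-case everything so 'The' and 'the' count as the same word
    sort |                            # bring identical words together
    uniq -c |                         # collapse runs into 'count word' pairs
    sort -rn |                        # biggest counts first
    head -n 10                        # keep the top ten

Six tiny programs, none of which knows anything about the others, chained into a word-frequency counter.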

For example, every web browser, every word processor, and many other applications we all use daily, have a 'find' functionality. You press Ctrl-F and a find dialogue box pops up. Then you can search for a bit of text. Great.

But that find functionality is a tiny and fairly trivial part of the big package. Very little development time is put into it. It's not that efficient, because, frankly, who cares if it takes ten milliseconds instead of two to find a bit of text on a page?

Conversely, grep does absolutely nothing but find matching text in a file. It does so in very clever ways, because many clever people have worked to make it better over the years. Because grep is used by many people in many situations; it's plumbed into many Unix pipes; it's an important and widely-used application.

And the bonus is, if somebody makes grep better, then all the applications and commands that you've built it with get better too. Because the command-line is so modular, and so many applications are built out of the same building-blocks, it's worth making each module as good as it can be, and everybody benefits from this.

So without having to resort to writing a single line of code, you can create all kinds of useful applications. Examples?

To edit the most recently-modified file in the current directory:

  • "ls" - shows you the files
  • "ls -t" - shows you the files in order of modification
  • "ls -t | head -n 1" - shows only the first entry in the file list
  • "vi `ls -t | head -n 1`" - opens the first entry in the vi editor

To open any file that contains the string "fubar" in the current directory or its sub-directories:

  • "grep fubar . -r" - recursively grep for files containing 'fubar'
  • "grep fubar . -rl" - only output the filenames, not the matching lines
  • "vi `grep fubar . -rl`" - open the matching files in vi

And to make that a simpler command to type, in your .bashrc add:
function vig { vi `grep "$1" . -rl`; }
and you can now just run "vig fubar" to get the same effect.
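
In a modern shell you'd probably write the same function with $( ) instead of backticks, which quotes and nests more cleanly (one of the commenters below makes the same point); the effect is identical:

  function vig { vi $(grep "$1" . -rl); }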

So without writing a single line of C or any other coding language, you have created a new application, indistinguishable from any C-coded, binary-compiled application.

And as observed before, if somebody else makes "grep" better, your "vig" command will improve as well!

And that's the power inherent in Unix that leads to it being alive and well and powering phones, computers, and servers all around the world several decades after it was created.

Seems such a simple and obvious idea, doesn't it? But as Terry Pratchett observed, "This man had invented the ball-bearing, such an obvious device that no one had thought of it." - the best ideas are the ones that, after being discovered, everyone believes were totally obvious.

And I hope this post has in some way contributed to your belief that Unix and the CLI are, after all, obvious inventions.


27 comments

Kyrall
Comment from: Kyrall [Visitor]
This should be in the GCSE ICT syllabus!
01/10/12 @ 10:12
Chandan
Comment from: Chandan [Visitor]
Pretty interesting to know the backgrounds of these commands :)
01/10/12 @ 16:51
Dion Moult
Comment from: Dion Moult [Visitor] · http://thinkmoult.com
An excellent post! I didn't know about any of this stuff and it was very interesting to find out.

Time to add it to my list of trivia to talk about when the going is slow.
01/10/12 @ 16:55
Roy
Comment from: Roy [Visitor]
Excellent excellent article. The "Wintendo" generation skipped this class during computer fundamentals. We can't know everything but background always helps.....so does knowing assembler.
01/10/12 @ 17:30
William Payne
Comment from: William Payne [Visitor] · http://about.me/william.payne
The modularity and composeability of unix is great. In fact, it is such a great idea (and forehead-slappingly obvious in retrospect), that I continue to be surprised that other operating systems do not try to emulate it to a greater extent.

Equally astonishing: the steepness of the unix learning curve is so well known and so well understood to be a significant barrier to adoption that I am surprised that no significant attempt has been made to improve consistency and discoverability for command-line applications.

For example, the inclusion of a set of a few dozen worked example command-line scripts (as standard) would help open things up considerably. Communications and tutoring are as much part of good programming practice and systems design as type-checking and unit-tests.
01/10/12 @ 18:31
jonathan
Comment from: jonathan [Visitor] · http://mouseroot.mousetech.org
Awesome post,learned some things about unix today :D
01/10/12 @ 18:44
Paul Ward
Comment from: Paul Ward [Visitor]
A most excellent article. I've been computing for close to 30 years now and loved DOS. I'm no longer very good with the command line, but still use it and love it's power. Your article brings to light the 'why' it's this way. Thanks for that.

I get so tired of hearing how *nix is so arcane because newcomers don't want to invest the time to learn how to use the cli.
01/10/12 @ 19:08
Jason
Comment from: Jason [Visitor]
Aaaaaaaaaaaaaaaaah, why did you have to tell me the ending of Lord of the Rings?

(Good article)
01/10/12 @ 19:31
Mayank
Comment from: Mayank [Visitor]
Thanks for sharing.
01/10/12 @ 19:50
cirrus
Comment from: cirrus [Visitor] · http://cirrusminor.info
Thank you author , tremendous read.
02/10/12 @ 00:30
Ian
Comment from: Ian [Visitor]
Wow! What an interesting article. We live and learn.
02/10/12 @ 00:36
Clayton
Comment from: Clayton [Visitor]
Excellent overview. I'd recommend $(expr) over the `expr` syntax, though. It's more sanely nestable.
02/10/12 @ 02:11
Micky
Comment from: Micky [Visitor] · http://www.pytania.biz
Thank you very much for a very interesting article! I learned a lot.
02/10/12 @ 07:26
Ivan
Comment from: Ivan [Visitor]
Unix is not composable or modular. It was all they could do in 1960s. Things have moved on since then, at least research wise.

That we are still using Unix clones and derivatives is sad. The whole selection of OS-s we have to choose from right now is pretty sad really, what with Windows, Linux, BSDs, Android, iOS - all variations on a theme with a slapped on poorly thought out graphical interfaces rooted in the same desktop metaphor. Nothing is integrated, nothing is transparent. It's just layers of crap. And things get worse when you consider all the web related crap on top of that.

And the Unix command line is horribly designed. Just about any modern programming language can serve as a better command line, from a technical perspective.
02/10/12 @ 09:18
pjmlp
Comment from: pjmlp [Visitor]
Nice article, however I don't agree with the title.

There are a few things modern CLI take from UNIX, but when UNIX was created other systems also enjoyed CLI environments. Actually they were the only way to do computing back then.

So many of us old timers, got to know many other CLI environments besides the UNIX one.
02/10/12 @ 09:34
Rohit
Comment from: Rohit [Visitor] · http://java67.blogspot.tw
Great post. Understanding of UNIX will always help, its there from almost 30 years and good for another 60 years.
02/10/12 @ 10:13
Martin Cohen
Comment from: Martin Cohen [Visitor]
Two relevant references: "In the Beginning was the Command Line" and "The Unix-Hater's Handbook". Both are fun reads, and the second is available on the web.
03/10/12 @ 02:11
Szymon
Comment from: Szymon [Visitor]
Excellent!! Thank You for this post.
03/10/12 @ 12:12
spinn
Comment from: spinn [Visitor]
I am amazed to learn that it never even occurred to me to wonder where the name "grep" came from.
03/10/12 @ 20:47
eMBee
Comment from: eMBee [Visitor]
quoting pjmlp: "Just about any modern programming language can serve as a better command line, from a technical perspective."

when doing one-off complex data manipulation i prefer to open the read-eval-print-loop of a programming language like pike, python or even lisp instead of doing it in the shell. but then, my interest is to sharpen my programming skills and not my shell skills.

but more power also means more complexity.

for simple transformations the shell can be easier.
07/10/12 @ 06:11
Excellent stuff, but these reasons don't say anything about two-letter commands like cd, ln, ls etc
08/10/12 @ 05:23
oneandoneis2
Comment from: oneandoneis2 [Member] · http://geekblog.oneandoneis2.org/
cd = change directory
ln = link
ls = list

There really isn't that much interesting to say about them, tbh..
08/10/12 @ 12:29
Guru
Comment from: Guru [Visitor] · http://www.theunixschool.com
Very nice article, a very good read.
09/10/12 @ 08:12
Rahul
Comment from: Rahul [Visitor] · http://horadecubitus.wordpress.com
"hjkl answers both issues - the four keys in a line make it possible to keep one finger on each"

except that if you're a touch typist, you don't do that. You keep your four fingers on "jkl;", and move your index finger to type "h". In fact the little bump on "j" is to remind your index finger to be there.

(Some older Apple computers had the incredibly annoying quirk of having the bumps on D and K rather than F and J -- I just could not type on those keyboards.)
10/10/12 @ 09:56
Carlo Sciolla
Comment from: Carlo Sciolla [Visitor] · http://skuro.tk
Being a 100% CLI addicted, it was both entertaining and instructive to read some juicy historical details, especially grep naming which I totally ignored.

Thanks for sharing!
17/10/12 @ 13:14
Steve
Comment from: Steve [Visitor]
If you want a good description of this philosophy, check out "Software Tools" by Brian W. Kernighan and P. J. Plauger.

One of the most important things not mentioned is the shell is just another program. If you do not like the syntax, write your own. I use tcsh because I like some of its features. When Unix was introduced to the company I work at in 1981, we set up the support people with menunix. It is a menu based shell to do things like backups and add users. The commands like ls, cat and vi are just programs that any shell or other program can startup as needed.

There are several shells available for Linux/Unix.
17/10/12 @ 18:21
John Paynterlee
Comment from: John Paynterlee [Visitor]
Excellent article.

The command line is truly control of the computer, and letting the computer do repetitive work. Many, if not most, Linux GUI programs also offer command line control, so if there are repetitive processes that follow one another, the Command Line is a very high level recordable "macro" language available to the user. Write once, test step-by-step, bring all the steps together in a file, execute and walk away. Works the same every time. Very comforting.

Everything command line saves time, and increases result accuracy. Of course a GUI saves time when "learning" to use a program, but the CLI frees the user from the work flow as designed by the programmer.

In other words a program could, in theory, have many GUI interfaces, like "skins," yet work with the same underlying program.

Smaller programs, better programs, less memory conflicts at the OS is the result of CLI.

I often regret that GUI programs (that have CLI methods) don't output the CLI commands they use to make a process work.
21/01/13 @ 15:51
 
