Monday, March 3, 2014

Uploading files the unix way

I just did something which I thought was worth sharing. It shows a couple of fundamental techniques in unix systems used in a context where most people would not think of.

The context: my mom sent me a recipe of potato souffle (I hope google translate is doing the right thing here, it seems like it is). But as most of e-mails coming from non-tech people, there was a lot of noise along the actual recipe text. Since I wanted to view it in my cell phone while I cooked, I decided to clean it up a bit.

So did I: copied the text to vim (which got rid of most of the noise immediately), edited a bit and it was ready:

        450 gramas: Batata cozida
        200 gramas: Queijo ricota caseiro – Guloso e Saudável
        4 colheres de sopa: Queijo gorgonzola
        3 colheres de sopa: Farinha de trigo branca
        300 mililitros: Leite desnatado
        2 unidades: Gema de ovo
        3 unidades: Clara de ovo
        1 colher de sopa: Fermento em pó
        1/2 unidade: Cebola ralada
        4 colheres de sopa: Salsa fresca cortada

    Modo de preparo
        Amasse os queijos, reserve
        Amasse as batatas, reserve
        Preaqueça o forno a 200ºC
        Bata as claras em neve com uma pitada de sal, reserve
        Numa vasilha misture as gemas, a farinha, o leite, a cebola, a batata, os queijos, a salsinha
        Junte delicadamente o fermento e as claras em neve
        Coloque o suflê de batata em ramequins e leve ao forno por 30 minutos ou até assar
        Sirva de imediato o suflê de batata com queijos, acompanhado de salada

Now all I needed was a way to get it on my cellphone. I could use dropbox or any of the related services, but that didn't seem exciting enough[1]. So I thought: well, I'll just put it in my webserver and access it in my mobile browser.

Fair enough. But this is a file opened in vim[2]. After a little thinking (which lasted for approximately 423ms), this is what I did. From inside vim, I executed the following command:
:!ssh raspberrypi 'nc -l 31415 > /srv/http/sufle.txt' &
Which launched an ssh client connecting to my webserver[3] running the netcat command. If you are not familiar with netcat, think of it as the traditional cat command which reads and writes to sockets instead of files. Here, I'm telling it to listen (-l) on port 31415 and output anything read to the file /srv/http/sufle.txt, where /srv/http is the document root of the webserver.

Since I added the ampersand to the end of the command, the command is launched in the background and just stays there waiting for input, releasing the shell immediately so I can continue using vim. Now that we're back, another vim command:
:w !cat > /dev/tcp/raspberrypi/31415
Here we are telling vim to write the contents of the buffer (:w). Normally, we would give it the name of a file to write, but the :write command accepts a special syntax, where instead of a file name, we put a ! and the rest of the line is interpreted as a command[4]. This command is executed in a standard unix shell, but with stdin redirected to read the contents of the buffer. As an example, try opening a new empty buffer in vim (c-w n), writing something and executing the command[5]:
:w !cowsay
Continuing the analysis of the command, we are using the :w ! command to write to the program run by the shell, which is
$ cat > /dev/tcp/raspberrypi/31415
Well, you don't need to be a unix wizard to know what this is doing. cat is a command that reads the contents from a list of files (or stdin if it's invoked without arguments, as it is here) and writes them in the same order to its stdout. Since its stdin will read the contents of the vim buffer, we already know the effect of this command: the contents of the buffer will get written to a file.

But what is that funny file starting with /dev/tcp/? That is feature of bash (I don't know about other shells). You can read about using the command[7]:
$ man -P 'less -p /dev/tcp' bash
What we are doing here is telling bash to write the data coming from cat's output to port 31415 on the host raspberrypi. Remember our little ssh friend we left running in the background a while ago? He's not done much since[8], but now there is data coming from the socket it's listening on, so the os wakes him up to do its job.

And so it does, reading the data from the socket and writing it to stdout. Remember what that is? That's the file we passed to the shell, /srv/http/sufle.txt. And so our long journey comes to an end. When nc finishes reading and writing the data, it dies[9] and so does the ssh client process, which was also sitting there, waiting for it's child process do die[10]. The shell that spawned the ssh client has long died: it had nothing more to do because we started the process asynchronously (using the & on the command line), so that's the end of that (process) family tree[11].

The result? The contents of the buffer have now been transfered and written to the file /src/http/sufle.txt on the remote host raspberrypi.


You might be thinking: "What the hell? You are telling me all this is simpler than dropbox?". And I would be lying if I said "yes". But the point here is not that this is simple. In fact, it's the base of one of the most complex systems in computer history: the operating system. And even though I have detailed some parts of the processes, there are a lot, and I really mean *a lot* more things going on behind the scenes here[12].

But if you step back, you will realize that all that was needed were two commands:
ssh raspberrypi 'nc -l 31415 > /srv/http/sufle.txt'
cat > /dev/tcp/raspberrypi/31415
What I'm trying to present here is the incredible complexity and richness that can be achieved by using shell commands and unix concepts like input and output redirections and socket communication. Another goal was to show how a powerful text editor incorporates this concept in its design, taking advantage of of the powerful features of the operating system to extend its capabilities[13].

And I bet I can type these two commands faster than your dropbox client can transfer the file to your cellphone[14]. Now, I you'll excuse me, there is a potato souffle that needs to be cooked.


While doing some scientific tests on the cowsay program (trying to make the cow say the text of this post), I discovered perl has a special mention of my blog on its source code[6]:
:w !cowsay
This shouldn't happen at /usr/share/perl5/core_perl/Text/ line 84, <stdin> line 63.

shell returned 255


1. What sane geek uses dropbox, anyway?
2. Actually just a buffer, since I didn't even write it to a file.
3. A rπ.
4. The space after :w is important here. If we wrote :w!cat ..., it would be interpret as the :write! command, which is not what we want here. See :help :write! on vim.
5. What do you mean "I don't have cowsay installed"?
6. You can check it here (thanks for their awesome service).
7. Don't worry, that will only open a man page passing a special flag to the pager telling it to go directly to the right section.
8. He hasn't been doing anything, actually, thanks to the magic of process scheduling and blocking io syscalls, so no resources are wasted (except for a few bytes on the process and file tables).
9. Ok, it terminates its execution, "dies" sounds so dramatic.
10. Just when you thought it could not get more dramatic.
11. Ok, that was on purpose =)
12. Including, for example, the whole file buffer cache algorithm that makes this whole process absurdly efficient, even though we seem to be copying data around dozens of times. That is the subject of whole book chapters (like this one).
13. My trademark phrase is: "do *that* in gedit".
14. After it has sent your data, maybe without using encryption, across the internet and sent a copy of the recipe to the NSA for analysis.