Sunday, March 12, 2017

~/music

For too long, my musical experiments (if you can call them that) have been scattered all over the internet.  Not anymore!  For those who, for whatever reason, find themselves on my website, there's now a music section.

There, one can find a list of all my performances (yes, even the bad ones), with videos and scores. The scores are as true to the original as they can be, and come in pdf and "source code" (musescore: if you don't know about it, you should) form.

I have some other plans for that section (and others), but this is what I was able to put together this weekend. There may be more updates in the future.

Enjoy.

Saturday, August 20, 2016

College research projects

A long time ago, when I was in college, I got involved in two research projects.

One of them involved integrating a home-(or, I should say, college-)grown spam filter based on artificial intelligence concepts with the traditional postfix mail delivery system.  That was a lot of fun and would later pave the way for my bachelor's thesis, which you can read more about here.

The other one, which is the topic of this post, was a computer graphics research project.  The ultimate goal of this project was to develop an organ-surgery simulation system.  The team used a haptic feedback device to simulate liver surgery.  A computer model of the organ was displayed and could be interacted with in 3-D using the device.

My first task when I joined the project was to develop a small part to be later incorporated into the system: the suture string.  This would interact with the organ model and simulate stitching as in a real-life surgery.

To make a long story short, my involvement with the project ended earlier than it should have (for external reasons, nothing serious) and the code for the simulation sat around on my hard drive for years.  I always wanted to do something with it, because even in the rough state it was in, it was already very cool (in my opinion =).

I recently revived it and finally got it to a stage where I feel comfortable sharing it with the world.  To be completely honest, I'm publishing it because I have wasted too much time playing with it.  It's really addictive.  Anyway, the entire code base is under a copyleft license (GPLv2), so you can take a look, experiment, do whatever you want with it:

https://github.com/bbguimaraes/college/tree/master/sms

Finally, here is a short demo:


Tuesday, January 20, 2015

2014 books

I read a lot of books. Last year I had some money lying around and decided to put it to good use. I ended up buying fourteen books from O'Reilly (simply the best publisher in the technology field) in May and finally finished reading them all (exception below) in the first days of this year. This is a quick and informal review of each of them.

As you'll notice, the scheduling algorithm used was an optimistic, little-endian, depth-first, heuristic search: I always grabbed the book that seemed the smallest from the pile of unread books and read it cover to cover.




tmux: Productive Mouse-Free Development


This is a really thin book. I struggle to even call it a book. It is meant as a quick introduction to using tmux. I had been using it for a while and the book seemed interesting, so I gave it a shot. It certainly didn't let me down.  It delivers on its promise to give you the essentials to integrate this wonderful tool into your current workflow, and it's certainly worth its price (which is as small as the book itself).


The Cathedral & the Bazaar


I don't think this book needs any introduction or even a review at all. If you've ever been in contact with any kind of free software (and you have), read it.


The Developer's Code


This is one of those books about programming that have very little code in them.  I usually don't like them, but this one was certainly worth the read. An amusing look at software development as we all know it, from the perspective of a real developer. I found it interesting that I had already developed many of the techniques presented in the tips, but did so in a very empirical way. It would certainly have been nice to read this book when I was starting my career.


21st Century C


Even though I don't work with it in my current daily job (and haven't worked with it for a long time), c is a language I am very interested in. First of all, we can all argue about whether there is a god, but one thing is certain: if there is one and he created the universe, he wrote it in c. Aside from that, it is the basis of pretty much everything related to computers. It always amuses me to see people dog on c without realizing their favorite language's environment is most certainly written in c.

Anyway, this book is a must-read for anyone working with c who hasn't followed recent developments in the language and the tools around it. From the first steps to automating portable compilation with autotools, this book teaches what you need to learn (and especially the parts you should forget) about this venerable language.


Test Driven Development for Embedded C


This is one of the "weirdness-coefficient" books. I work with TDD, but in a language that is very far from c (python) in an area that is very far from embedded systems (web). I always wondered how, and even whether, it could be applied to these types of environments, so I decided to get this book. To my surprise, it was one of the best texts I have read about TDD and software testing in general. That's not to say the "embedded" part was overlooked. If you work with TDD, and especially on embedded systems, I'd say this is required reading.


Practical Vim


I will take two excerpts from Tim Pope's foreword as my review:
Practical Vim tips teach lessons in thinking like a proficient Vim user. In a sense, they are more like parables than recipes.

It is for this reason that I am excited about the publication of Practical Vim. Because now when Vim novices ask me what's the next step, I know what to tell them. After all, Practical Vim even taught me a few things.

Apache Cookbook


Not much to say here. This is a pretty good reference for anyone working with apache. Well worth the read.


The Modern Web


We all have to work on things we hate sometimes, and to me that is when I have to mess with the front-end of web development. But not liking it doesn't mean I can choose not to work with it.

Ranting aside, this is a great book to get up to speed with current and future areas of modern (front-end) web development. It covers everything you need to know about css3, html5 and new javascript APIs.


Understanding Computation


This book's synopsis made me add it to the shopping cart instantly and I don't regret it. A great overview of the theoretical aspects of computation. If you liked studying Turing machines in college, this book is for you. Also, a great display of one of the most elegant pieces of code I have ever seen (especially in a book).


Designing for Behavior Change


This was a big surprise. I wanted a book on software architecture and this book's title was very inviting. I saw the five-star reviews and ordered it without even reading the introduction. If I had, I would have seen it is nothing even close to what I was expecting. It even explicitly states that there will be little to no code in the book.

This is a book about software design and (human) behavior analysis. It presents a psychological study of the characteristics of software that can aid in (human) behavior change. I would never have bought this book knowingly, but I have to say it was a pretty interesting read, if almost completely unrelated to what I work on.


Metasploit: The Penetration Tester's Guide


Software security is one of my favorite areas. This book was recommended by so many people, I had to buy it. It is a great overview of this penetration testing tool. Recommended.


Hacking: The Art of Exploitation


As I said in the previous review, I really like software security, but there were still many "magical" parts I didn't understand well enough. This book tackled them all and much more. I would go as far as to recommend it to anyone that uses a computer, as it explains in detail how a system actually works beneath all the abstractions we use in software development.


Physics for Game Developers


I have loved math and physics since school. In computers, these are mostly used in simulations, and I never had the chance to work with anything related, except in some classes in college and on a computer graphics research project.  Nevertheless, I play a lot with these things in my free time.

This book is a fun (if you, like me, think physics is fun) walkthrough of the application of physics in computer simulations (and games, obviously). I really enjoyed it and would recommend it to anyone interested in the area.


The Linux Programming Interface


When I said I read all the books, I obviously lied. As Boromir would gladly tell you, one does not simply read The Linux Programming Interface. First of all, I can't just put it in my backpack and take it out on the bus to read.  Someday I'll definitely read it, but for now it serves both as sporadic reference material and as a deeply intimidating table ornament.





So there you go. Now I'll be back to "normal" books for a while, as my book pile has been growing slowly but surely over these seven months. The first one on the list is Jane Austen's Pride and Prejudice. As for technical books, I still plan to finish Structure and Interpretation of Computer Programs, which I was very surprised to find in my college's library. I have been reading it and doing the exercises for almost a year (you can check it on github) and still haven't made it past section 2.3, but it is one of the most interesting programming books I have ever read.

After that, who knows. I may buy some more books. That depends a lot on my country's financial situation, which is nothing but a (downward) roller-coaster at the moment. To list some of the books I'm interested in: Pro Git's new edition, The Art of Memory Forensics (another O'Reilly book) and The Architecture of Open Source Applications.

Thursday, December 11, 2014

Calculating π the unix way

A nice trick I learned a while ago that is worth sharing: calculating pi the unix way (you know, on the command line, with pipes, as god intended).

I would like to give credit to the original source of this command, but I just couldn't find it. It was one of those lists of "shell one-liners" you see on hacker news five times a day, except I didn't know half of them. The most interesting was a semi-cryptic command line with a pretentious comment beside it:

    # calculates pi the unix way

I remember, as if it were yesterday, how puzzled I was by that line. As I said, I didn't know many of the incantations on that list, but this was by far the most magical. The line goes like this:

    $ seq -f '4/%g' 1 2 9999 | paste -sd-+ | bc -l

If you like a challenge (as I do), try to figure it out by yourself. A shell and the man pages are your best friends.


seq

If shell (or python) wasn't your first programming language, you were probably surprised by the way loops are done. It usually goes like this:

    $ for x in 1 2 3 4 5; do echo "$x"; done
    1
    2
    3
    4
    5

If you have a little experience with shell, you probably learned there is a more idiomatic way of doing this using the seq command and some shell voodoo:

    $ for x in $(seq 1 5); do echo "$x"; done

And if you were truly initiated on the dark arts of bash programming, you probably know this is functionally equivalent to this:

    $ for x in {1..5}; do echo "$x"; done

I won't explain how shell command substitution works; suffice it to say seq is a nice utility to generate sequences (get it?) of numbers. From the first lines of the man page:

    $ man seq | grep -A 3 SYNOPSIS
    SYNOPSIS
           seq [OPTION]... LAST
           seq [OPTION]... FIRST LAST
           seq [OPTION]... FIRST INCREMENT LAST

So the main part of the first command on the pipe is no magic: we are generating numbers from 1 to 9999 with a step of 2:

    $ echo $(seq 1 2 9999 | head -5)
    1 3 5 7 9

There is a useful option to this command to control how the value is output:

    $ seq -f '%02g' 1 3 10
    01
    04
    07
    10
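
To make the format string part explicit, here is a rough Python model of `seq -f FORMAT FIRST INCREMENT LAST`. It is a sketch under a couple of assumptions: the increment is positive, and Python's `%` operator stands in for the printf-style formatting seq does (both understand conversions like `%g` and `%02g`). The real seq handles more corner cases, such as negative steps and floating-point rounding.

```python
def seq_f(fmt: str, first: float, increment: float, last: float) -> list[str]:
    """Rough model of `seq -f FMT FIRST INCREMENT LAST` (positive increment only)."""
    out = []
    value = first
    while value <= last:
        # seq formats every value as a floating-point number, hence %g
        out.append(fmt % float(value))
        value += increment
    return out

print("\n".join(seq_f("%02g", 1, 3, 10)))  # same lines as `seq -f '%02g' 1 3 10`
print(seq_f("4/%g", 1, 2, 9))              # the kind of terms used later in this post
```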


Programmers familiar with c will recognize the printf format string. Moving down the pipe... 


paste

There are some commands that do something so simple they seem almost useless:

    $ whatis -l paste
    paste (1)            - merge lines of files
    paste (1p)           - merge corresponding or subsequent lines of files


Nothing really interesting here, right?

    $ paste <(seq 1 3) <(seq 4 6)
    1       4
    2       5
    3       6

    $ seq 1 6 | paste - -
    1       2
    3       4
    5       6


Well, that is interesting. What if we play with the other options?

    $ paste -sd, <(seq 1 3) <(seq 4 6)
    1,2,3
    4,5,6
    $ seq 1 6 | paste -sd,
    1,2,3,4,5,6


This simple command is starting to show complex behavior. Maybe there is something interesting in those old unix books after all... Wait:

    $ seq 1 6 | paste -sd+
    1+2+3+4+5+6


Nice, a mathematical expression. If only we had some way of interpreting it...


bc

There are people who say: the python/ruby interpreter is my calculator. To that I say: screw that!

    $ bc -ql
    1 + 2 + 3
    6
    10 / 12
    .83333333333333333333
    scale = 80
    10 / 12
    .8333333333333333333333333333333333333333333333333333333333333333333\
    3333333333333


Do you see that `\` character? It's almost as if it was meant to be used on a shell...

    $ seq 1 6 | paste -sd+ | bc -l
    21


Interlude: Gregory-Leibniz

There are many ways of calculating π. You can find many of them on its wikipedia page. One of them, named after two mathematicians, James Gregory and Gottfried Leibniz, goes like this (again from wikipedia):

$$1 - \frac{1}{3} + \frac{1}{5} - \frac{1}{7} + \frac{1}{9} - \cdots = \frac{\pi}{4}$$

This is an infinite series with a simple pattern, which I'm sure you can identify (you weren't sleeping in those calculus classes, were you?). Just in case you can't (and because it is a pretty equation), here it is:

$$\sum_{n=0}^{\infty} \frac{(-1)^n}{2n+1} = \frac{\pi}{4}$$

Back in unix-land

So here is the challenge: how can we generate and evaluate the terms of this series? Generating each term, without the sign, can be done easily with seq and a format string:

    $ seq -f '1/%g' 1 2 9
    1/1
    1/3
    1/5
    1/7
    1/9


Remember our useful-where-you-never-imagined friend paste?

    $ seq -f '1/%g' 1 2 9 | paste -sd-+
    1/1-1/3+1/5-1/7+1/9
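
The trick is that `-d` takes a *list* of delimiters, and paste cycles through it: the first join uses `-`, the second `+`, then `-` again, and so on. Here is a small Python model of `paste -s -d LIST` on a single input, to make that cycling visible. It is only a sketch: it ignores the backslash escapes (such as `\n` or `\t`) that paste accepts in the list.

```python
from itertools import cycle

def paste_serial(lines: list[str], delimiters: str) -> str:
    """Model of `paste -s -d DELIMITERS` on one input: join all lines into
    a single line, cycling through the delimiter characters between lines."""
    if not delimiters:
        raise ValueError("need at least one delimiter")
    if not lines:
        return ""
    delims = cycle(delimiters)  # "-+" yields '-', '+', '-', '+', ...
    result = lines[0]
    for line in lines[1:]:
        result += next(delims) + line
    return result

terms = ["1/1", "1/3", "1/5", "1/7", "1/9"]
print(paste_serial(terms, "-+"))                          # 1/1-1/3+1/5-1/7+1/9
print(paste_serial(["1", "2", "3", "4", "5", "6"], "+"))  # 1+2+3+4+5+6
```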


This may take some time to understand, and that's ok. Read those man pages! But once you understand it, the only thing left is to evaluate the expression:

    $ seq -f '1/%g' 1 2 9 | paste -sd-+ | bc -l
    .83492063492063492064


Hmm, not very π-like, is it? Right, this is π/4. Ok, we can rearrange the terms a bit to fit our tools (that is the essential hacker skill). Let's move the denominator on the right side to the numerator on the left.

    $ seq -f '4/%g' 1 2 9 | paste -sd-+ | bc -l
    3.33968253968253968254


That's more like it! As with any infinite series approximation, we can increase the number of terms to increase accuracy:

    $ seq -f '4/%g' 1 2 9999 | paste -sd-+ | bc -l
    3.14139265359179323814


Now just for the heck of it:

    $ seq -f '4/%g' 1 2 999999 | paste -sd-+ | bc -l
    3.14159065358979323855


And there you have it. Enjoy your unix π.
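
If you want to double-check these numbers without bc, here is a short Python sketch of the same partial sum, 4/1 - 4/3 + 4/5 - .... It uses `fractions.Fraction` for an exact result on small inputs and plain floats for the big runs, since exact fractions get slow as the denominators grow. Because the series alternates with shrinking terms, the error after N terms is smaller than the first omitted term, 4/(2N+1), which is why each extra correct digit costs roughly ten times more terms.

```python
from fractions import Fraction
import math

def leibniz_exact(n_terms: int) -> Fraction:
    """Exact value of 4/1 - 4/3 + 4/5 - ... with n_terms terms."""
    total = Fraction(0)
    for k in range(n_terms):
        term = Fraction(4, 2 * k + 1)
        total += term if k % 2 == 0 else -term
    return total

def leibniz_float(n_terms: int) -> float:
    """The same partial sum in floating point (fast enough for large n_terms)."""
    return sum((4.0 if k % 2 == 0 else -4.0) / (2 * k + 1) for k in range(n_terms))

print(float(leibniz_exact(5)))   # five terms, like `seq -f '4/%g' 1 2 9`
for n in (5000, 500000):         # like `seq ... 1 2 9999` and `seq ... 1 2 999999`
    approx = leibniz_float(n)
    bound = 4 / (2 * n + 1)      # alternating series error bound
    print(n, approx, abs(math.pi - approx) < bound)
```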

Wednesday, December 10, 2014

pip --extra-index

This is a tale of debugging, that fine art of digging into the darkest corners of a computer system to solve whatever problem is haunting it. This particular story is about python's package manager, pip.

Where I work, we have a server where we store all python packages used in development and production: a package "cache" or "proxy". The idea is similar to an http proxy: we don't have to hit PyPI for each package query and install.  That saves both time, as local connections are much faster in latency and throughput, and bandwidth, as no packet has to leave our lan.

One day, all of a sudden, our testing and production servers started taking a long time to run a simple package update. And when I say "a long time", I mean more than ten minutes to run a simple `pip install --upgrade` with roughly fifty packages in a requirements.txt file. That is a ridiculously long time. It would be crazy slow even if we were hitting the wan, but on a lan, it is just absurd. So, clearly, something fishy was going on.


Debugging begins

Step one when debugging an issue: figure out what changed. I thought long and hard (harder than longer, I must admit) about it, but couldn't think of anything I or anyone else had changed on these machines recently. So I advanced to step two: getting a small, reproducible test.

In this case, the test can be reduced to a simple command line execution:

    $ pip install -Ur requirements.txt

This is a standard pip requirements file, with the standard options to prefer our internal server over the official PyPI server:

    $ head -2 requirements.txt
    --index-url=http://pypi.example.com/simple/
    --extra-index-url=http://pypi.python.org/simple/


Here and in the next examples, I'll replace the real domains of our servers with fake ones. Anyway, running that one simple command was all that was needed to reproduce the strange behavior. On my personal development machine, it took quite a while:

    $ time pip install -Ur requirements.txt &> /dev/null

    real    0m41.280s
    user    0m3.557s
    sys     0m0.100s


Even more interestingly, this was way less time than we were seeing on the servers:

    $ time pip install -Ur requirements.txt &> /dev/null

    real    13m19.752s
    user    0m1.031s
    sys     0m0.184s



On the server

The three main servers affected by this issue (the ones I spend the most time on) were our buildbot (i.e. continuous integration), staging and production servers. I decided the first test would be on the buildbot server, as it is the same server where the packages are hosted. That way, I could exclude many external factors that could be affecting the traffic.

So I fired up my favorite tool: the amazing strace. If you don't know it, stop everything and go take a look at `strace(1)`. Since I know most people won't, here's a quick introduction: strace execve(2)'s your command, but sets up hooks to display every system call it executes, along with arguments, return values and everything. It is an amazing tool to get an overall idea of what a process is doing. If you are root (or have CAP_SYS_PTRACE), you can even attach to a running process, which is an amazing way to debug a process that starts running wild.

Using it to run the command:

    $ strace -r -o strace.txt pip install -Ur requirements.txt

The arguments here are `-o strace.txt` to redirect output to a file and the super useful `-r` to output the relative time between each system call, which is perfect to identify the exact system calls slowing down the execution of the command.

After the execution was done, looking at the output, I found the culprit. Here is a sample of the log:

    $ grep -m 1 -B 5 close strace.txt
     0.000138 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
     0.000115 fcntl(4, F_GETFL)         = 0x2 (flags O_RDWR)
     0.000092 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
     0.000089 connect(4, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("23.235.39.223")}, 16) = -1 EINPROGRESS (Operation now in progress)
     0.000128 poll([{fd=4, events=POLLOUT}], 1, 15000) = 0 (Timeout)
    15.014733 close(4)


From this point, my analysis had some flaws that delayed the final conclusion.  I will explain my line of thought as it happened; maybe you'll find the mistakes before I get to them. So, as can be seen in the output, a socket is opened to communicate with another server, which is normal behavior for pip, but then closing it takes around fifteen seconds. Hmm, that is really odd.

So I ran the command again and, while it was blocked waiting, I used another useful command to list the open file descriptors of the process:

    $ lsof -p $(pgrep pip) | tail -2
    pip     19928 django    3u  IPv4 8064331      0t0     TCP pypi.example.com:38804->pypi.example.com:http (CLOSE_WAIT)
    pip     19928 django    4u  IPv4 8064359      0t0     TCP pypi.example.com:30470->199.27.76.223:http (SYN_SENT)


Here we have two open sockets. One of them is in the CLOSE_WAIT state. Anyone who's ever done socket programming knows this dreaded state, where the remote end has already closed its side of the connection but the local process hasn't closed its socket yet. A few minutes of tcpdump later, I was convinced that was the problem: something was preventing the connection from ending and each operation was waiting for a timeout to close the socket. That would explain why closing the socket took so long.


The mistakes

At this point, I realised my first mistake. If you take a look at the strace output again, the remote end of the socket is *not* our server. Take a look at the remote address (the `sin_addr` field in the `connect` call): 23.235.39.223 is not the ip address of our server, and the rest of the output showed that the address changed over time.

There should be no other servers involved, since we explicitly told pip to fetch packages from our own server. So what other server could be involved here? I took a guess:

    $ dig pypi.python.org | grep -A 2 'ANSWER SECTION'
    ;; ANSWER SECTION:
    pypi.python.org.        52156   IN      CNAME   python.map.fastly.net.
    python.map.fastly.net.  30      IN      A       23.235.46.223


Damn... Wait!

    $ dig pypi.python.org | grep -A 2 'ANSWER SECTION'
    ;; ANSWER SECTION:
    pypi.python.org.        3575    IN      CNAME   python.map.fastly.net.
    python.map.fastly.net.  7       IN      A       23.235.39.223


Bingo! So it was a connection to one of PyPI's servers. I went back to the strace output and realised my second mistake. If you carefully read the section of strace's man page on the `-r` option, you'll see that the delta shown before each line is not the time that syscall took, but the time since the previous syscall started. So the operation that was getting stuck was not `close`, but the one before it, `poll`.
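
To make that second mistake harder to repeat, here is a small Python sketch that reads `strace -r` output and charges each delta to the *previous* line's system call, which is where the time was actually spent. It assumes the plain `-r` format shown above (a relative timestamp followed by the call) and simply skips lines it doesn't recognize, such as signal notices or `<... resumed>` fragments from multi-threaded traces.

```python
import re

# Matches lines such as " 15.014733 close(4)": a relative timestamp, then a syscall.
LINE_RE = re.compile(r"^\s*(\d+\.\d+)\s+(\w+)\(")

def slowest_syscalls(lines, top=3):
    """With `strace -r`, the delta on a line is the time elapsed since the
    *previous* syscall started, so it is attributed to that previous call."""
    timings = []
    previous = None
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # signals, resumed calls, etc. are skipped in this sketch
        delta, name = float(match.group(1)), match.group(2)
        if previous is not None:
            timings.append((delta, previous))
        previous = name
    timings.sort(reverse=True)
    return [(name, delta) for delta, name in timings[:top]]

sample = """\
 0.000138 socket(PF_INET, SOCK_STREAM, IPPROTO_TCP) = 4
 0.000115 fcntl(4, F_GETFL)         = 0x2 (flags O_RDWR)
 0.000092 fcntl(4, F_SETFL, O_RDWR|O_NONBLOCK) = 0
 0.000089 connect(4, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("23.235.39.223")}, 16) = -1 EINPROGRESS (Operation now in progress)
 0.000128 poll([{fd=4, events=POLLOUT}], 1, 15000) = 0 (Timeout)
15.014733 close(4)
""".splitlines()

print(slowest_syscalls(sample, top=1))  # [('poll', 15.014733)]
```

On a real trace you would pass the file itself, e.g. `slowest_syscalls(open("strace.txt"))`.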

In hindsight, it is obvious. You can see the indication that the call timed out. You can even see that the timeout (15000 milliseconds) is one of the parameters. And so the mystery was solved. For some unknown reason, pip was trying to make a connection to PyPI after checking our server. Since we don't allow that, the operation hung around until the timeout was reached. One final test confirmed our suspicion:

    $ time pip install -Ur requirements.txt &> /dev/null

    real    0m42.981s
    user    0m3.720s
    sys     0m0.070s
    $ sed -i 's/^--extra-index/#&/' requirements.txt
    $ time pip install -Ur requirements.txt &> /dev/null

    real    0m1.049s
    user    0m0.853s
    sys     0m0.057s


Removing the extra index option eliminated the issue (and gave us a ~41x speedup, something you don't see every day).


Conclusion

So, what do we take away from this (unexpectedly long) story? If you are using a local package server, don't use `--extra-index-url`. I have no idea why pip was trying to contact the extra index after finding the package on our server. The only reason I can think of is that it was trying to find a newer version of the package, but even then, most of our requirements are pinned, i.e. they have '==${some_version}' appended.
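
A possible explanation, hedged: pip's documentation on finding packages says there is no ordering among the locations it searches; all of them are checked and the best match for the requirement is selected. If that is what happened here, pinning versions wouldn't help, since the pin is only applied *after* every index has been asked about the project, and each request to the unreachable index waits for pip's `--timeout` (15 seconds by default, matching the 15000 ms `poll` above). Roughly fifty packages times 15 seconds is about 12.5 minutes, in the same ballpark as the 13m19s measured on the server. The toy model below only illustrates that idea; it is not pip's code, and the index names, version lists and `fetch` callables are hypothetical.

```python
def version_key(version: str) -> tuple[int, ...]:
    # Simplified: only handles dotted integer versions like "1.6.10".
    return tuple(int(part) for part in version.split("."))

def find_candidate(project, indexes, pinned=None):
    """Toy model: ask *every* index for the project, then pick the best version.
    Returns (best_version_or_None, names_of_indexes_queried)."""
    queried, versions = [], []
    for name, fetch in indexes:
        queried.append(name)  # every index is contacted, even after a hit
        versions.extend(fetch(project))
    if pinned is not None:
        versions = [v for v in versions if v == pinned]
    best = max(versions, key=version_key) if versions else None
    return best, queried

# Hypothetical indexes: the local mirror and the public one.
local = {"django": ["1.6.8", "1.7.1"]}
public = {"django": ["1.6.8", "1.7.1", "1.7.2"]}
indexes = [
    ("pypi.example.com", lambda p: local.get(p, [])),
    ("pypi.python.org", lambda p: public.get(p, [])),
]

print(find_candidate("django", indexes, pinned="1.6.8"))
# ('1.6.8', ['pypi.example.com', 'pypi.python.org'])
```

Even with a pin, both indexes show up in the list of queried servers; in the real setup, every one of those queries to the public index would have been a 15-second wait.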

Even on my development machine, where pip can reach the remote server, it is worth removing the option. The time it takes to reach the server for each package, even just to receive a "package up-to-date" message, slows down the operation considerably:

    $ time pip install -Ur requirements.txt &> /dev/null

    real    0m46.816s
    user    0m3.853s
    sys     0m0.130s
    $ sed -i 's/^--extra-index/#&/' requirements.txt
    $ time pip install -Ur requirements.txt &> /dev/null

    real    0m1.125s
    user    0m0.947s
    sys     0m0.053s



Coda

Thank you for making it this far. I hope this story was entertaining and hopefully it taught you a thing or two about investigation, debugging and problem solving. Take the time to learn the basic tools: ps, lsof, tcpdump, strace. I assure you they will be really useful when you encounter this type of situation.

Friday, May 2, 2014

Manufacturer's blues

The state of technology sometimes gives me the blues. In 2014, I found myself installing a package called `dosfstools` to format my pendrive. That's because the car's radio wouldn't accept any device with a filesystem other than fat32.

Of course, having a filesystem of another kind makes the device "not a device". After a few seconds on the command line, the problem went away:

    $ sudo pacman -S dosfstools
    $ sudo mkfs.vfat -n BILLY /dev/sdb1

but the sadness remained. At l(e)ast I can listen to some badvoltage while driving.

Oh yes, did I mention it doesn't accept any format except mp3? And how does it signal it? By not showing the files at all, of course.

Monday, March 3, 2014

Uploading files the unix way

I just did something which I thought was worth sharing. It shows a couple of fundamental techniques of unix systems used in a context most people would not think of.

The context: my mom sent me a recipe for potato souffle (I hope google translate is doing the right thing here; it seems like it is). But as with most e-mails coming from non-tech people, there was a lot of noise along with the actual recipe text. Since I wanted to view it on my cell phone while I cooked, I decided to clean it up a bit.

So I did: I copied the text into vim (which got rid of most of the noise immediately), edited it a bit and it was ready:

    Ingredients
        450 grams: Cooked potato
        200 grams: Homemade ricotta cheese – Guloso e Saudável
        4 tablespoons: Gorgonzola cheese
        3 tablespoons: White wheat flour
        300 milliliters: Skim milk
        2 units: Egg yolk
        3 units: Egg white
        1 tablespoon: Baking powder
        1/2 unit: Grated onion
        4 tablespoons: Chopped fresh parsley

    Directions
        Mash the cheeses, set aside
        Mash the potatoes, set aside
        Preheat the oven to 200°C
        Beat the egg whites until stiff with a pinch of salt, set aside
        In a bowl, mix the yolks, the flour, the milk, the onion, the potato, the cheeses and the parsley
        Gently fold in the baking powder and the beaten egg whites
        Pour the potato souffle into ramekins and bake for 30 minutes or until done
        Serve the cheesy potato souffle immediately, with a side salad

Now all I needed was a way to get it onto my cellphone. I could use dropbox or any of the related services, but that didn't seem exciting enough[1]. So I thought: well, I'll just put it on my webserver and access it in my mobile browser.

Fair enough. But this is a file opened in vim[2]. After a little thinking (which lasted for approximately 423ms), this is what I did. From inside vim, I executed the following command:

    :!ssh raspberrypi 'nc -l 31415 > /srv/http/sufle.txt' &

Which launched an ssh client connecting to my webserver[3] running the netcat command. If you are not familiar with netcat, think of it as the traditional cat command which reads and writes to sockets instead of files. Here, I'm telling it to listen (-l) on port 31415 and output anything read to the file /srv/http/sufle.txt, where /srv/http is the document root of the webserver.
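
For the curious, this is roughly what the remote `nc -l 31415 > /srv/http/sufle.txt` does, written as a Python sketch: accept a single TCP connection, copy everything it sends into a file, and stop when the sender closes its end. The function takes an already-listening socket so it can be exercised on any port; it is an illustration, not a replacement for netcat.

```python
import socket

def receive_once(listener: socket.socket, path: str) -> int:
    """Accept one connection on an already-listening socket and write every
    received byte to `path` (roughly `nc -l PORT > PATH`). Returns bytes written."""
    conn, _addr = listener.accept()
    written = 0
    with conn, open(path, "wb") as out:
        while True:
            chunk = conn.recv(4096)
            if not chunk:  # empty read: the sender closed its end (EOF)
                break
            out.write(chunk)
            written += len(chunk)
    return written
```

On the server, something like `receive_once(socket.create_server(("", 31415)), "/srv/http/sufle.txt")` would play the part of the `nc` command (`socket.create_server` needs Python 3.8 or later).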

Since I added the ampersand to the end of the command, the command is launched in the background and just stays there waiting for input, releasing the shell immediately so I can continue using vim. Now that we're back, another vim command:

    :w !cat > /dev/tcp/raspberrypi/31415

Here we are telling vim to write the contents of the buffer (:w). Normally, we would give it the name of a file to write, but the :write command accepts a special syntax, where instead of a file name, we put a ! and the rest of the line is interpreted as a command[4]. This command is executed in a standard unix shell, but with stdin redirected to read the contents of the buffer. As an example, try opening a new empty buffer in vim (c-w n), writing something and executing the command[5]:

    :w !cowsay

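
Conceptually, `:w !cmd` hands the rest of the line to a shell and feeds the buffer to the command's stdin. A Python sketch of that idea with `subprocess.run` (a model of the behavior, not how vim implements it; vim uses its 'shell' option, while this sketch uses `/bin/sh` via `shell=True`):

```python
import subprocess

def write_buffer_to_command(buffer_text: str, command: str) -> str:
    """Run `command` through the shell with `buffer_text` on its stdin,
    like vim's `:w !command`, and return what the command printed."""
    result = subprocess.run(
        command,
        shell=True,          # the rest of the `:w !` line goes to a shell
        input=buffer_text,   # the buffer becomes the command's stdin
        capture_output=True,
        text=True,
    )
    return result.stdout

print(write_buffer_to_command("moo\n", "tr a-z A-Z"))  # MOO
```
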
Continuing the analysis of the command, we are using the :w ! command to write to the program run by the shell, which is

    $ cat > /dev/tcp/raspberrypi/31415

Well, you don't need to be a unix wizard to know what this is doing. cat is a command that reads the contents from a list of files (or stdin if it's invoked without arguments, as it is here) and writes them in the same order to its stdout. Since its stdin is connected to the vim buffer, we already know the effect of this command: the contents of the buffer will be written to a file.

But what is that funny file starting with /dev/tcp/? That is a feature of bash (I don't know about other shells). You can read about it using the command[7]:

    $ man -P 'less -p /dev/tcp' bash

What we are doing here is telling bash to write the data coming from cat's output to port 31415 on the host raspberrypi. Remember our little ssh friend we left running in the background a while ago? He hasn't done much since[8], but now there is data coming from the socket he's listening on, so the os wakes him up to do his job.
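
Python has no `/dev/tcp` magic, but the sending half of the trick, `cat > /dev/tcp/raspberrypi/31415`, is only a few lines: open a TCP connection, write the bytes, then close it so the listener sees end-of-file. A minimal sketch (the 10-second timeout is an arbitrary choice for the sketch, not something bash does):

```python
import socket

def send_over_tcp(host: str, port: int, data: bytes) -> None:
    """Roughly `cat > /dev/tcp/HOST/PORT`: connect, send all bytes, close.
    Closing the socket is what lets the receiving side see end-of-file."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(data)
```

For the recipe, the call would look like `send_over_tcp("raspberrypi", 31415, recipe)`, where `recipe` is a hypothetical `bytes` object holding the buffer contents.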

And so it does, reading the data from the socket and writing it to stdout. Remember what that is? That's the file we passed to the shell, /srv/http/sufle.txt. And so our long journey comes to an end. When nc finishes reading and writing the data, it dies[9], and so does the ssh client process, which was also sitting there, waiting for its remote command to die[10]. The shell that spawned the ssh client has long died: it had nothing more to do because we started the process asynchronously (using the & on the command line), so that's the end of that (process) family tree[11].

The result? The contents of the buffer have now been transferred and written to the file /srv/http/sufle.txt on the remote host raspberrypi.

Conclusion


You might be thinking: "What the hell? You are telling me all this is simpler than dropbox?". And I would be lying if I said "yes". But the point here is not that this is simple. In fact, it's the base of one of the most complex systems in computer history: the operating system. And even though I have detailed some parts of the processes, there are a lot, and I really mean *a lot* more things going on behind the scenes here[12].

But if you step back, you will realize that all that was needed were two commands:

    ssh raspberrypi 'nc -l 31415 > /srv/http/sufle.txt'
    cat > /dev/tcp/raspberrypi/31415

What I'm trying to present here is the incredible complexity and richness that can be achieved by using shell commands and unix concepts like input and output redirection and socket communication. Another goal was to show how a powerful text editor incorporates this concept in its design, taking advantage of the powerful features of the operating system to extend its capabilities[13].

And I bet I can type these two commands faster than your dropbox client can transfer the file to your cellphone[14]. Now, if you'll excuse me, there is a potato souffle that needs to be cooked.

Bonus


While doing some scientific tests on the cowsay program (trying to make the cow say the text of this post), I discovered perl has a special mention of my blog in its source code[6]:

    :w !cowsay
    This shouldn't happen at /usr/share/perl5/core_perl/Text/Wrap.pm line 84, <stdin> line 63.

    shell returned 255

Notes


1. What sane geek uses dropbox, anyway?
2. Actually just a buffer, since I didn't even write it to a file.
3. A rπ.
4. The space after :w is important here. If we wrote :w!cat ..., it would be interpreted as the :write! command, which is not what we want here. See :help :write! in vim.
5. What do you mean "I don't have cowsay installed"?
6. You can check it here (thanks to sources.debian.net for their awesome service).
7. Don't worry, that will only open a man page passing a special flag to the pager telling it to go directly to the right section.
8. He hasn't been doing anything, actually, thanks to the magic of process scheduling and blocking io syscalls, so no resources are wasted (except for a few bytes on the process and file tables).
9. Ok, it terminates its execution, "dies" sounds so dramatic.
10. Just when you thought it could not get more dramatic.
11. Ok, that was on purpose =)
12. Including, for example, the whole file buffer cache algorithm that makes this whole process absurdly efficient, even though we seem to be copying data around dozens of times. That is the subject of whole book chapters (like this one).
13. My trademark phrase is: "do *that* in gedit".
14. After it has sent your data, maybe without using encryption, across the internet and sent a copy of the recipe to the NSA for analysis.

Sunday, July 14, 2013

A post-mortem

Part I - The raspberry
Part II - An image is worth a thousand megabytes
Part III - A post-mortem

A lot has happened since my first rπ post. I've used that little thing to death (quite literally), as an ssh box, firewall, general server, web server, web browser, movie box and a whole lot more. Then the SD card died on me. For me, it was a perfect little linux computer that could stay on all the time, consuming little-to-no power, making absolutely no noise and occupying minimal space. In a few words: really, really awesome. And all that for 35 dollars!

As promised, I'll post here the steps to get from a rπ in a closed box to a little box of awesomeness. I know there are a ton of posts like these, but hey, storage is free (for now) on blogger. If you haven't read the first post, this is a good time to do it. I will first make a few post-mortem (again, literally) observations about my first assumptions.

SD card: let's start with the most important thing. When I first went shopping for the various attachments, I got the cheapest card I could find. The only requirement was that it had at least 4gb of storage. After it died, I decided to spend a few extra bucks and get a decent one. Man, was I surprised. There is simply no comparison between the π before and after. Instead of moaning about the old days, I'll just say this: get a decent, class 10 SD card and you'll have nothing to worry about. And since these things get cheaper and cheaper, get a 16gb one.

Power supply: again, when I started using the π, I just used my cell phone charger, which, by extreme luck, was compatible and provided almost exactly the amount of power needed. Then I started using a wireless adapter sometimes. Then I started using the hdmi output sometimes. Then I tried to use both at the same time. And it died. Not that it does any harm to the board, but any moderate load will simply power off the device. So, if you want to use more than the π itself, get an appropriate power supply. Those on the farnell website worked just fine for me. And just for completeness' sake, mine is 1000mA.

HDMI: I did my first tests on my living room TV. But after a while (as the addiction got worse), I felt the need to use it in my bedroom, where I only have an old computer monitor with VGA input. So I got an HDMI-to-VGA adapter and things worked fine (well, almost, more on that later).

OS: the first distribution I installed was raspbian, mainly because I was used to debian and, by the time I started playing, there weren't many alternatives. Now we have pidora and even riscos. But anyway, raspbian is a really good distribution and I never had a problem with it. But I am restless. Since I was going to reinstall the OS from /dev/zero, I decided to jump off the cliff and install arch. For no special reason, just to try something else. So, the instructions from now on will be arch centered, although many will be usable on other distributions.

Man, I write a lot. Let's stop right here and I'll get to the first installation steps later.

Saturday, May 11, 2013

python mocking

If you are into unit testing, you probably have been introduced to mocking. And if that is the case, you probably already have been bitten by it. Mocking requires some understanding of code execution, importing and name resolution that most people lack when first encountering such situations. In python, mocking is a relatively simple process, if you analyze carefully what needs to be done.

Mocking simply means replacing an object with another. This is usually done to avoid instantiating costly systems or to change the behaviour of a system. To begin with a simple example:
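def some_function_in_your_code(a):
    if a.do_something():
        return 3.14159
    else:
        return 6.28318

def some_other_function_in_your_code():
    # ...
    # Building `a` is the expensive part we want to avoid in tests
    # (create_complicated_object and thousand_parameters are placeholders).
    a = create_complicated_object(*thousand_parameters)
    result = some_function_in_your_code(a)
    # ...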
Here we have a function, some_function_in_your_code, that uses an object. Somewhere else in your code, some_other_function_in_your_code creates that object, which is a complicated process involving hundreds of operations. If you just want to test some_function_in_your_code, you shouldn't need to go through this whole process[1].

To avoid that, we can use mocking. Notice that all we need to test some_function_in_your_code is an argument with a member called "do_something" that can be called with no arguments and returns something that can be converted to bool. There are many ways to do that, but to keep things short, I'll skip right to the library I use most of the time, mock.
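For comparison, one of those many ways needs no library at all: a hand-written stub with just the one method the function calls. A minimal sketch, where FakeA is a hypothetical name and the function is repeated so the snippet runs on its own (with 6.28318, i.e. 2π, in the else branch):

```python
class FakeA:
    """Hand-written stand-in (hypothetical) for the costly object: it only
    provides the single method some_function_in_your_code relies on."""

    def __init__(self, answer):
        self.answer = answer

    def do_something(self):
        return self.answer

def some_function_in_your_code(a):
    if a.do_something():
        return 3.14159
    else:
        return 6.28318

if __name__ == "__main__":
    print(some_function_in_your_code(FakeA(True)))   # 3.14159
    print(some_function_in_your_code(FakeA(False)))  # 6.28318
```

It works, but you end up writing one of these per scenario, which is exactly the chore mock takes off your hands.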

The main component of the mock library is the Mock class, which is basically an empty object with a few useful characteristics (be sure to check the documentation, because they are really useful). Two of them are important for this discussion. First, every time you access an attribute of a Mock object, another Mock object is created and assigned to the attribute being accessed.
>>> import mock
>>> m = mock.Mock()
>>> print(m)
<Mock id='140095453080976'>
>>> print(id(m.some_attribute))
140095453131664
>>> print(m.some_attribute)
<Mock name='mock.some_attribute' id='140095453131664'>
>>> print(id(m.some_attribute))
140095453131664
As you can see, when we accessed some_attribute, a Mock object was created for us, and the same object is returned on every later access (note the identical ids). We could do it by hand with a single line of code, but this makes the code easier, shorter and cleaner.
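The same behaviour can be checked in a few lines. This sketch assumes Python 3, where the mock library ships in the standard library as unittest.mock (with the standalone package, import mock works the same way). It also shows spec, which restricts a Mock to a given list of attribute names (do_something, borrowed from the example above), so a misspelled attribute raises AttributeError instead of quietly producing yet another Mock.

```python
from unittest import mock  # the standalone `mock` package, in the stdlib since Python 3.3

m = mock.Mock()

# The first access to an attribute creates a child Mock...
child = m.some_attribute
print(isinstance(child, mock.Mock))    # True
# ...and every later access returns that very same object.
print(m.some_attribute is child)       # True
# Children get children of their own, to any depth.
print(isinstance(m.a.b.c, mock.Mock))  # True

# With a spec, only the listed names exist, so a typo fails loudly.
strict = mock.Mock(spec=["do_something"])
strict.do_something()
try:
    strict.do_somthing  # misspelled on purpose
except AttributeError:
    print("AttributeError")
```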

The other feature of Mock objects is the return_value attribute. Whatever is stored in this attribute gets returned when the object is called like a function[2].
>>> import mock
>>> m = mock.Mock()
>>> m.return_value = u'the return value'
>>> m()
u'the return value'
Using these two techniques, we can now test our function:
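import mock
# Hypothetical module name: import the function from wherever it lives.
from your_module import some_function_in_your_code

def your_test():
    # A stand-in for the costly object passed to the function.
    a = mock.Mock()
    # Calling a.do_something() will now return True...
    a.do_something.return_value = True
    if some_function_in_your_code(a) != 3.14159:
        raise Exception(u"return wasn't 3.14159.")
    # ...and now False (set it on the method, not on the Mock itself).
    a.do_something.return_value = False
    if some_function_in_your_code(a) != 6.28318:
        raise Exception(u"return wasn't 6.28318.")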
Let's break that down. We first create a Mock object to represent the argument passed to the function. The next line takes care of the two things we need to test the function: the do_something attribute and its return value. Then, all we have to do is call the function, passing the mocked argument, and check the return value. After that, repeat the process, this time with a different return value.
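Written with plain assert statements, the whole thing fits in a script you can run as-is. This is a sketch assuming Python 3 and unittest.mock; the function under test is repeated so the block is self-contained. Besides faking the return value, a Mock records how it was called, which assert_called_once_with and call_count let you check.

```python
from unittest import mock  # or `import mock` with the standalone package

def some_function_in_your_code(a):
    if a.do_something():
        return 3.14159
    else:
        return 6.28318

def your_test():
    a = mock.Mock()

    a.do_something.return_value = True
    assert some_function_in_your_code(a) == 3.14159
    # The mock also recorded the call: exactly once, with no arguments.
    a.do_something.assert_called_once_with()

    a.do_something.return_value = False
    assert some_function_in_your_code(a) == 6.28318
    assert a.do_something.call_count == 2

if __name__ == "__main__":
    your_test()
    print("ok")
```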

That was the easy part


This first section was easy, nothing you couldn't find out with a quick search on the internet. But the real world is not that pretty (at least mine isn't). The trickiest situation is when you have to change the behaviour of something inside a function, but you don't pass that something as an argument. Suppose we have this:
# some_file.py
import random

def f():
    if random.random() < 0.5:
        return 3.14159
    else:
        return 6.28318
How can you test that function if the value tested is conjured from oblivion in the middle of the function? Fear not, you can actually do it. The trick here is to mock the random function from the random module before the first line of f is executed. Let's start with this simple example, just to get the basic idea:
>>> import random
>>> random.random()
0.5212285734499994
>>> random.random()
0.40492920488281725
>>> import mock
>>> random.random = mock.Mock(return_value=0.5)
>>> random.random()
0.5
This seems pretty simple. But here is where everybody gets lost:
>>> from random import random
>>> import mock
>>> random = mock.Mock(return_value=0.5)
>>> import some_file
>>> assert some_file.f() == 6.28318
This may not explode the first time, but you have a 50% chance of getting an AssertionError. To see why, you can put a print(random.random) inside f and see that the mock didn't work. To understand why, we have to dig a little deeper.

Import


What happens when you run import random? You can watch this video to know exactly what. Or you can just continue reading to get a summary. Or both. Anyway:
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', '__doc__': None, '__package__': None}
>>> import random
>>> locals()
{'__builtins__': <module '__builtin__' (built-in)>, '__name__': '__main__', 'random': <module 'random' from '/usr/lib/python2.7/random.pyc'>, '__doc__': None, '__package__': None}
In python, locals is a built-in function that returns the local variables accessible in the current scope (if you don't believe me, run print(__builtins__.locals)). When you execute import random, the interpreter does its magic to find the module and load it, but, more importantly, it creates an entry called "random" in the current namespace referring to the loaded module. The critical part here is "current namespace". Try this:
>>> def f(): print(locals())
>>> import random
>>> f()
{}
Here, importing random didn't affect the namespace of f. The same thing applies to the namespaces of other modules. Our example fails because the namespace of the some_file module is different from the namespace where we run our tests. To change the namespace of some_file, we have to do it explicitly:
>>> import some_file
>>> some_file.random.random = lambda: 0.5
>>> assert some_file.f() == 6.28318
You can run that as many times as you like if you don't trust me, but it will always succeed. It does because we are now changing the correct namespace. You can check it by putting a print(random.random) in f again.
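To see the namespaces at work without creating files, here is a self-contained sketch. It builds in-memory stand-ins for some_file.py with types.ModuleType and exec (the module names and the load helper are made up for the example), reproduces the rebinding pitfall, applies the fix, and also covers a variant in which the module does from random import random. In that variant the name random lives directly in the module's own namespace, so that is the name to replace.

```python
import random
import types

# Hypothetical in-memory versions of some_file.py, so the sketch runs as-is.
PLAIN_IMPORT = """
import random
def f():
    return 3.14159 if random.random() < 0.5 else 6.28318
"""
FROM_IMPORT = """
from random import random
def f():
    return 3.14159 if random() < 0.5 else 6.28318
"""

def load(name, source):
    """Execute `source` in a fresh module namespace and return the module."""
    module = types.ModuleType(name)
    exec(source, module.__dict__)
    return module

def demo():
    some_file = load("some_file", PLAIN_IMPORT)
    original = random.random

    # The pitfall: rebinding a name in *our* namespace leaves some_file alone.
    session = {}
    exec("from random import random\nrandom = lambda: 0.5", session)
    assert session["random"]() == 0.5
    assert some_file.random.random is original  # f still uses the real one

    # The fix: change the object some_file actually looks through.
    some_file.random.random = lambda: 0.5
    try:
        assert all(some_file.f() == 6.28318 for _ in range(100))
    finally:
        random.random = original  # some_file.random *is* the random module

    # With `from random import random`, the name lives in the module itself.
    other = load("some_other_file", FROM_IMPORT)
    saved = other.random
    other.random = lambda: 0.5
    try:
        assert all(other.f() == 6.28318 for _ in range(100))
    finally:
        other.random = saved
    return "ok"

if __name__ == "__main__":
    print(demo())
```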

Being nice


Now you know how to mock, but there is something I must say before you leave. Always, always, ALWAYS restore any mock you do. Seriously. Even if you're sure no one will use the mocked attribute. You don't want to lose an entire day of work just to find out that the problem was an undone mock.

And doing it is so simple: store the original value on a variable and restore it after the operation. I like to do it as soon as the operation is complete, before anything else is executed, but you don't need to, if you're not paranoid. Just to clear any doubt, here is exactly how to do it:
>>> import random
>>> original_random = random.random
>>> random.random = lambda: 0.5
>>> # do something
>>> random.random = original_random
Now you have no excuse. Better yet, you can use another feature of the mock library called patch. But that would be an extension to an already long post. Maybe I'll cover it in the future. Anyway, happy mocking!
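Here is a sketch of both approaches, assuming Python 3's unittest.mock (f is a stand-in for the function from earlier). The manual version is wrapped in try/finally so the original comes back even when the code in between raises. mock.patch.object does the same save, replace and restore dance for you, and the extra return_value keyword configures the mock it puts in place.

```python
import random
from unittest import mock

def f():
    return 3.14159 if random.random() < 0.5 else 6.28318

original = random.random

# Manual version, made exception-safe: finally restores the original
# even if the code in between raises.
random.random = lambda: 0.5
try:
    assert f() == 6.28318
finally:
    random.random = original

# patch.object saves, replaces and restores the attribute for you.
with mock.patch.object(random, "random", return_value=0.5) as fake:
    assert f() == 6.28318
    fake.assert_called_once_with()

assert random.random is original  # restored on leaving the with block
print("restored")
```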

Notes


1: You shouldn't have complicated processes that involve hundreds of operations anyway, but that is another problem.

2: Curious to know what happens when you access return_value without setting it first? No? Well, I'll show you anyway:
>>> import mock
>>> m = mock.Mock()
>>> m.return_value
<Mock name='mock()' id='140095453168080'>
Since we didn't set it, Mock creates another Mock for us on the spot, just like it does for any other attribute. Its name, mock(), tells you it is what calling m will return.
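A few more lines confirm it (again assuming Python 3's unittest.mock): that default Mock is created once and then handed back on every call, whatever the arguments, which is also why chained calls like m().do_something() can be configured.

```python
from unittest import mock

m = mock.Mock()
# Reading return_value before setting it creates a child Mock on demand...
default = m.return_value
print(isinstance(default, mock.Mock))  # True
# ...and calling m hands back that same child every time.
print(m() is default)                  # True
print(m(1, 2, key="x") is default)     # True: arguments don't change it
# So chained calls can be configured too.
m.return_value.do_something.return_value = 42
print(m().do_something())              # 42
```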