IWP9 2008

Continuing with my travel spree, I made a trip to Volos, Greece and back for the 3rd International Workshop on Plan 9. I was to make a short presentation on Glendix, a paper on which was selected for the workshop.

Being a brown single guy in his early twenties, traveling around Europe is not exactly fun. Just saying. I got picked out not once, not twice, but THREE times for “random passport and security checks”. Once in Munich on my way to Volos, once on the streets of Athens, and finally on my way back at the Frankfurt airport. Not that I’m complaining, they were just doing their job; but really, they need to get better at profiling.

Athens is a really nice city, though it reminds me of India: crowded trains, chaotic traffic and sketchy bus stations. I knew most of the Greek symbols, thanks to high school Math courses, but pronouncing them wasn’t easy. Thankfully, the people at the counters in the Airport, Bus and Metro stations knew English. After a 5 hour bus ride, I reached Volos in the wee hours of Thursday. After around 3 hours of sleep and about 30 minutes of slide preparation, I was set for my talk.

Considering I was at a Plan 9 conference, talking about integrating it into Linux, my talk was very well received. Certainly beyond my general expectations: I got some really excellent questions, comments and general observations, and most importantly, a lot of help on the current issues that Glendix faces. All the other talks during the conference were extremely interesting as well; I was particularly fascinated by the concept of “Upperware”, the Inferno port to Nintendo DS, and the Mrph morphological analyzer. Do check out the entire conference proceedings.

It was great to finally meet all the Plan 9 and Bell Labs folks in person, especially Sape Mullender, Charles Forsyth and Bruce Ellis; not to mention the IRC regulars uriel, quintile, sqweek and fgb!

The return trip was a bit more scenic, thanks to it being afternoon. After spending the night in Athens, I was back in Amsterdam the next day. More adventures followed, but that’s for another blog post.

Nothing like a trip to IWP9 to humble you!

P.S. Cool Glenda goodies for sale at Cafepress :-)

Good times at 9fans

The 9fans list has produced TWO fortune-worthy quotes this weekend, a rather rare occurrence :)

If you cannot read this, reply. Otherwise, disregard. -Pietro Gagliardi

Come _on_. I’m not that subtle a “baiter,” or… am I? -Eris Discordia
you’re a master baiter. -Skip Tavakkolian

I must say the quality of traffic on the list has been deteriorating even over the relatively small amount of time I’ve been subscribed to it.

An alternative to shared libraries

There are few people I’ve met who found the experience of dealing with shared libraries pleasant. Personally, I really despise them. The whole idea of “shared code” is great and all, but the implementations in the Unix and Windows worlds are not something I would like to deal with. I would like to go as far as saying, “shared libraries suck”, but that would mean a lot of people are going to flame me – just because they are extremely prevalent and that they’re “working” for a vast majority of cases.

Before I proceed with the post, I would like to share two quotes from people that I greatly respect:

the trouble with shared libraries is that they seem at first quite reasonable, and indeed at a fairly abstract level, it seems irrational to be more opposed to them than any other form of sharing, such as shared text, but the mechanics of linking and sharing (especially on current processors), and of configuration control, have so many hard facts that the simplicity of the original is quite lost. having experienced several variants, i find it now saves time just to adopt the irrational position from the start.

i think i’d rather have (say) mondrian memory protection than either shared libraries or the vm crud they keep adding to chips and systems.

- Charles Forsyth

shared libraries are obviously a good idea until you’ve actually used them. then whether it’s obvious or not that they’re a bad idea is mostly a matter of how close you are to trying to get them to work.

- Rob Pike

I like static linking. But code these days is getting extremely complex and bloated, so people needed an alternative. Instead of focusing on making their code cleaner and leaner, they started thinking about how they can share this huge piece of complex and bloated code across several applications. If you think about it, if your code is small and clean, you wouldn’t feel the need for shared libraries.

Ulrich Drepper’s famous page on why static linking is considered harmful is worth a mention here. In point 1, he says ‘fixes have to be applied to only one place’, but that also means that it takes only one “fix” to introduce bugs that have wide-reaching consequences (imagine a “fix” that introduces two new bugs in a shared libc – this *has* happened!). The other points are probably valid, but his conclusion is certainly not:

The often used argument about statically linked apps being more portable (i.e., can be copied to other systems and simply used since there are no dependencies) is not true since every non trivial program needs dynamic linking at least for one of the reasons mentioned above.

There exists an entirely functional, portable, (and superior) operating system that doesn’t support shared libraries (and for very good reasons).

If you’re not convinced that shared libraries are clunky hacks that shouldn’t be used, that’s Ok. Let’s assume that shared libraries are a great idea and are a boon to computing. That doesn’t mean we shouldn’t look at better ways of achieving the same goals in a cleaner, better manner.

Synthetic File Systems.

As an advocate of the Plan 9 operating system and its underlying principles, my opinion may be a bit biased; but I think synthetic file systems are frickin’ awesome. If I were to design a system where several applications were to share common cryptography code (for example), here’s how I would do it:

Schematic diagram of a synthetic Crypto filesystem

The synthetic file system in the center works something like this: applications wanting some cryptography work to be done read and write to files exposed by the FS. For example, ‘Application 1’ would write the data it needs SHA-1ed to /crypto/hash/sha1/data, and a subsequent read on the same file would return the hash (the read would block until the SHA-1 was actually calculated). The great thing about such filesystems is that they are language independent, since almost any respectable language has file operations in its standard library.

However, in order to take care of things like type safety, we make this very tiny, stub library called ‘libcryptofs’. The job of this library is merely to pass on data from applications to files, while ensuring the compiler catches type safety errors (because this library is statically linked to every application wanting to use CryptoFS) and also performing the task of gracefully handling errors like /crypto being absent.
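To make the idea a little more concrete, here is a minimal sketch of what such a stub could look like, written in Python for brevity. Everything in it is hypothetical: the function name `sha1_via_fs`, the `CryptoFSUnavailable` exception, and the assumption that the server ignores the file offset, so that a read after the write returns the digest. Python can only check types at run time, whereas a statically linked C stub would have the compiler do it.

```python
import os


class CryptoFSUnavailable(Exception):
    """Raised when the CryptoFS hierarchy is not mounted (hypothetical name)."""


def sha1_via_fs(data, root="/crypto"):
    """Hand `data` to the (assumed) CryptoFS and return whatever it sends back."""
    # Run-time stand-in for the type checking the stub library is meant to provide.
    if not isinstance(data, (bytes, bytearray)):
        raise TypeError("data must be bytes")

    path = os.path.join(root, "hash", "sha1", "data")
    # Fail gracefully if /crypto (or this file) is absent.
    if not os.path.exists(path):
        raise CryptoFSUnavailable(path + " not found; is CryptoFS mounted?")

    # Unbuffered, so the write reaches the file server before we read.
    with open(path, "r+b", buffering=0) as f:
        f.write(data)
        f.seek(0)        # assumption: the server does not care about offsets
        return f.read()  # would block until the hash is ready
```

An application would then simply call `sha1_via_fs(b"some data")` and never touch the file paths itself.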

On the other side of the FS, we have ‘cryptofs’, the code that is responsible for actually providing the filesystem. It would statically link with a synthetic filesystem library (not shown in the diagram: lib9p on Plan 9, maybe FUSE on Linux – but would you statically link with it? lib9p is also available on POSIX systems, by the way, thanks to the Plan 9 from User Space project). But this program is also small, because it delegates the actual cryptography operations to ‘libcrypto’, the library that has all the code implementing SHA-1, RSA and so on.

But since ‘libcrypto’ is a small and awesomely written cryptography library, ‘Application 3’ may decide to directly link with it. This is suitable for embedded devices or other memory-starved environments, where you want to avoid the whole FS in the middle because you know there is only going to be one application needing it.

Now, what about versioning? That is, after all, why shared libraries began to suck. With filesystems, it’s trivial to add functionality without breaking applications depending on older versions of your FS. That’s because all the compiler sees is a bunch of fopen/fread/fwrites and is not going to complain if the version of the filesystem changes because it doesn’t know. Alternatively, if you’re thinking of modifying the behavior of your filesystem, consider providing a ‘version’ file in the root of your FS right from the beginning. Applications would then write the version number they expect to be working with in that file as a way of initializing the filesystem – and multiple versions of the filesystem can live in harmony if your system implements per-process namespaces (Plan 9 has them, Linux does too thanks to CLONE_NEWNS) because every application ‘sees’ its own private copy of the CryptoFS file hierarchy.
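The ‘version’ file handshake could be sketched like this. The function name `select_version` is made up, and reading the file back to see which version the server accepted is an extra assumption of mine; the scheme above only requires the write.

```python
import os


def select_version(root, wanted):
    """Write the version we expect to <root>/version and return the reply."""
    path = os.path.join(root, "version")
    with open(path, "r+b", buffering=0) as f:
        f.write(str(wanted).encode("ascii"))
        f.seek(0)
        # Assumption: the server echoes back the version it agreed to speak.
        accepted = f.read().decode("ascii").strip()
    if accepted != str(wanted):
        raise RuntimeError("wanted version %s, got %r" % (wanted, accepted))
    return accepted
```

Since the write happens once at start-up, the rest of the application can keep using plain file operations without knowing which version of the filesystem sits behind them.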

In summary, the answer is to write lean, efficient and small pieces of code (a very difficult task if you’re thinking of using the GNU toolchain!) and use filesystems in place of shared libraries. Plan 9 has been using this approach successfully for years, and I think we should learn something and try to apply the philosophy to other systems we love as well: Firefox extensions, COM models for games, plugin systems; anywhere we make extensive use of shared libraries and dynamic loading. Let me know what you think, and whether you would consider this approach for your next project! I know I certainly will :-)

The joy of combination

As some of you may already know, I’ve been working on the Glendix project for quite some time now. The basic idea is to combine the Linux kernel with utilities from Plan 9, in order to create a developer-oriented operating system distribution. I say it would combine the best of both worlds, but there are those who disagree :)

I’ve been working on the project by splitting the project into two separate modules. The first module was to make Linux understand the Plan 9 a.out binary format – and this was easily done by writing a kernel module, using existing binfmt functionality. The second part was to make Linux understand Plan 9 system calls, so it wouldn’t choke when the binaries are actually executed.

The usual way in which user-space applications communicate with the kernel in almost all modern operating systems is via system calls. What differentiates these operating systems from each other in this aspect, are the number of calls, and the mechanism by which they are invoked.

For instance, Linux applications use the INT instruction to raise software interrupt 0x80 (we’re only dealing with the x86 architecture here). The number stored in the accumulator (EAX) at the time the interrupt was raised is used to tell the kernel which system call is to be invoked. The arguments, if any, to the system call are passed via the other registers (EBX, ECX, EDX…). On the other hand, Plan 9 applications use interrupt number 0x40 (don’t ask why) to invoke a system call. The system call number is put in the accumulator, but the arguments are passed just like to any other regular function – on the (user-space) stack.
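As a rough mental model (in Python, and nothing like the actual kernel code), the dispatch can be pictured as a lookup keyed on the interrupt vector and the value in EAX. The table holds only the system call numbers used by the two test programs in this post; the names `SYSCALL_TABLES` and `describe_syscall` are invented for the illustration.

```python
# Interrupt vector -> {EAX value -> system call name}.
SYSCALL_TABLES = {
    0x80: {1: "exit", 4: "write"},    # Linux
    0x40: {8: "exit", 51: "pwrite"},  # Plan 9
}

# Where the kernel has to look for the arguments in each convention.
ARGUMENT_SOURCE = {0x80: "registers", 0x40: "stack"}


def describe_syscall(vector, eax):
    """Return (system call name, argument location) for a software interrupt."""
    table = SYSCALL_TABLES.get(vector)
    if table is None or eax not in table:
        raise LookupError("unknown system call: int %#x, eax=%d" % (vector, eax))
    return table[eax], ARGUMENT_SOURCE[vector]


print(describe_syscall(0x80, 4))   # ('write', 'registers')
print(describe_syscall(0x40, 51))  # ('pwrite', 'stack')
```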

Writing the code for this part turned out to be a little tricky, since: a) Linux does not give us a clean way to capture software interrupts, and b) the argument passing convention is different. I finally resorted to patching the kernel rather than writing a module. Brute force, but it works!

So, till now, each of the two modules was working as expected when tested individually. I tested the first module by assembling a program using Linux conventions in Plan 9:

DATA string<>+0(SB)/8, $"Linux\n\z\z"
GLOBL string<>+0(SB), $8

TEXT _main+0(SB), 1, $0

/* Arguments for write(2) */
MOVL $1, BX
MOVL $string<>+0(SB), CX
MOVL $7, DX

/* Number for sys_write is 4 */
MOVL $4, AX
INT $0x80

/* Argument for exit(2) */
MOVL $0, BX

/* Number for sys_exit is 1 */
MOVL $1, AX
INT $0x80

After running `8a hello.s; 8l hello.8`, copying the executable to Linux and running it, it worked. The other module, I tested by writing a program for nasm in Linux, but this time using Plan 9 conventions:

section .data
    hello: db 'Hello World!', 10
    hlen: equ $-hello

section .text
    global _start

_start:
    ; 4 arguments for plan 9's pwrite call, last one is vlong (8 bytes)
    push 1
    push hello
    push hlen
    push 0
    push 0

    ; syscall number for pwrite is 51
    mov eax, 51
    int 64

    ; syscall number for exit is 8
    mov eax, 8
    int 64

After running `nasm -f elf hello.asm; ld -o hello hello.o; ./hello`, the output came onto the screen as expected. Now, the moment of truth, the ultimate test, was to combine the two portions of the project and run a Plan 9 executable directly on Linux :)

$ ./convert 8.out
P9: 1eb 4af9 94c 314 1034
P9: Padding 4b19 bytes from 4af9
Done! Output written to linux.out
$ ./linux.out
Segmentation fault
$ dmesg | tail -n 1
linux.out[7762]: segfault at c0000000 eip 00001051 esp bfffffb8 error 5

Damn, what went wrong? The first step was to find out what error 5 meant. The strerror function is supposed to be used for returning meaningful strings corresponding to cryptic error numbers, but all I got as output, a small program later, was ‘Input/output error’. Big help that was :p
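A side note on that number: on x86, the error field in the kernel’s segfault message is the page-fault error code, a bit mask, rather than an errno value, which would explain why strerror was no help. A minimal sketch of decoding its low three bits (the function name is made up for the illustration):

```python
def decode_page_fault(error_code):
    """Decode the low three bits of an x86 page-fault error code."""
    return {
        # bit 0: 0 = page not present, 1 = protection violation
        "cause": "protection violation" if error_code & 1 else "page not present",
        # bit 1: 0 = read access, 1 = write access
        "access": "write" if error_code & 2 else "read",
        # bit 2: 0 = fault happened in kernel mode, 1 = in user mode
        "mode": "user" if error_code & 4 else "kernel",
    }


print(decode_page_fault(5))
# {'cause': 'protection violation', 'access': 'read', 'mode': 'user'}
```

Read that way, error 5 says that user-mode code tried to read a page it is not allowed to touch, which fits a fault at 0xC0000000.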

Closer inspection of eip and esp revealed a bug in the loader I wrote earlier. The instruction at address 0x1051 was a MOVL to a stack offset (4(SP)), which resolved to 0xC0000000. However, the main function also receives arguments (namely argc and argv), so the loader had to accommodate those values and set the stack pointer to a slightly lower value (which is around 0xBFFFF000 in the average case). Voila, the hello world program worked after that small tweak. Ah, the joy of combination :)

We’re still a while away from getting 8c to run though. I’m going to be implementing all the system calls it needs one by one, starting with brk. Updated sources can be found here. See you later!

Adventuring with a.out

The first step in the Glendix project was to write a binary loader for the Plan 9 a.out format. Linux has a clean interface for registering new binary format handlers from a module. Basically, you define a structure of type linux_binfmt and call register_binfmt during initialization of the module. Now all that’s left to do is implement the three functions that you pointed to in your structure: load_binary, load_shlib and core_dump.
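For readers who have not seen this interface, here is a user-space model of the idea in Python. It is emphatically not kernel code: the real register_binfmt takes a struct linux_binfmt, and load_binary receives far more than a byte string. What the model does capture is that the kernel offers the file to each registered handler in turn, and a handler that does not recognize the format returns -ENOEXEC so the next one can try. The handler names are invented; 0x1eb is the Plan 9 386 magic number, stored big-endian in the a.out header.

```python
import errno

ELF_MAGIC = b"\x7fELF"
PLAN9_386_MAGIC = (0x1EB).to_bytes(4, "big")

handlers = []


def register_binfmt(load_binary):
    """Model of register_binfmt: remember one more format handler."""
    handlers.append(load_binary)


def load_elf(image):
    return "elf" if image.startswith(ELF_MAGIC) else -errno.ENOEXEC


def load_plan9(image):
    return "plan9 a.out" if image.startswith(PLAN9_386_MAGIC) else -errno.ENOEXEC


def exec_image(image):
    """Offer the image to every handler until one accepts it."""
    for load_binary in handlers:
        result = load_binary(image)
        if result != -errno.ENOEXEC:
            return result
    return -errno.ENOEXEC


register_binfmt(load_elf)
register_binfmt(load_plan9)
```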

Luckily for me, all Plan 9 executables are statically linked so I can just leave load_shlib as NULL. core_dump is also not that important during the development stages, although the final product must definitely implement it. To get a feel of what I needed to do in load_binary, I decided to take a peek into some of the other binary format handlers. I tried to comprehend the code for ELF with not much luck. I then turned to UTLK (Understanding the Linux Kernel), which helped me understand what was going on. I highly recommend the book for anyone interested in kernel programming.

Anyway, this is when I found out that all ELF executables have sections that are actually page aligned! That means every ELF executable contains a bunch of zeroes after the TEXT section, so that the DATA section starts at the next page address. That’s how the executable is supposed to be laid out in memory, but I had no idea someone would actually think of doing it in the file. I guess they have their reasons: all the binary format loader does is mmap the file. Maybe for ELF2 they could put in zeros for the BSS section in the file too :p

Plan 9 executables, on the other hand, are just normal files with no padding. This gives me a headache because I can no longer use mmap. Recall that all addresses passed to mmap have to be page-aligned. But the DATA section in Plan 9’s a.out will start at a non-page-aligned address most of the time.

One of the first things I tried to do was to mmap the file into a high address, copy portions into the appropriate locations and then free the mapping. That didn’t work so well because:

  • memcpy works only on physical addresses. Logical addresses from the virtual process address space can’t be easily translated to physical ones because Linux delays physical memory allocation for as long as possible. Now we know why all the loaders use mmap, it is fundamental to the “Linux way” of memory management.
  • There is no generic copy_in_user implementation. There are specific ones that use assembly code for PPC, SPARC and even x86_64, but none for x86. The alternative was to use copy_from_user to move data into kernel addresses and then bring them back using copy_to_user. That didn’t work out well either – copy_to_user kept failing for some reason.

I ended up writing a userspace program called ‘pad’ that page-aligns a Plan 9 a.out executable. The loader just mmaps the file, like all other loaders. The solution is suboptimal; if someone knows a clean way of doing all of this in kernel-space, I’ll be grateful for the help. The ultimate goal is to run Plan 9 executables on Linux, unmodified.
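The padding step itself is simple arithmetic. The following Python sketch is not the actual pad program, just an illustration of the idea under two assumptions: the a.out header is the 32-byte, big-endian layout described in Plan 9’s a.out(6) man page (magic, text, data, bss, syms, entry, spsz, pcsz), and the page size is 4096 bytes. The function name `page_align` is invented.

```python
import struct

PAGE_SIZE = 4096
# Eight big-endian 32-bit fields: magic, text, data, bss, syms, entry, spsz, pcsz.
HEADER = struct.Struct(">8I")


def page_align(image, page_size=PAGE_SIZE):
    """Insert zero bytes after TEXT so that DATA starts on a page boundary."""
    magic, text_size, data_size = HEADER.unpack_from(image)[:3]
    data_start = HEADER.size + text_size  # header and text are contiguous
    padding = -data_start % page_size     # bytes needed to reach the next page
    return image[:data_start] + bytes(padding) + image[data_start:]
```

For example, with a text size of 0x4af9 the data would begin at file offset 0x4b19; the sketch inserts 0x4e7 zero bytes so that it begins at 0x5000 instead.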

The code for the loader and the pad program can be found on git here.

Back from Goa

My Goa trip was simply fantabulous. Apart from the fact that Goa is a great place for a vacation, I was accompanied by 7 of my college friends, which made the trip one that I will cherish for a long time to come.

We left Bangalore by bus on the 15th. The journey was pleasant and the view next morning was absolutely stunning:

The bus dropped us off at the Panaji bus terminus, and we took a shuttle from there to Vasco – where Ameya (our host) lived. After a nice lunch and a nap, we took off to the beach closest to base camp – Bogmalo. The beach was quiet and clean with relatively few people around, which made it possible for us to play a game of beach football. We returned home after jumping around in the sea for a while.

We hired a couple of Activas the next day (this seems to be the norm for transportation in Goa) and reached Old Goa in an hour or so. We visited the really old church, which was really impressive – it also contained the remains of St. Francis Xavier. The archaeological museum next door was fun too, very informative about the history of Goa. We proceeded to the capital city of Panaji next, and after booking tickets for a river cruise aboard the Princess de Goa for the night, had lunch at the QuarterDeck.

After lunch, we visited Donapaula, a popular Jetty, which was unfortunately under renovation or something. The view was great though, and we enjoyed a nice little ride on the water scooter. After hanging out in Cafe Coffee Day for a while (These places are *everywhere*, I think they’re trying to become the Indian Starbucks :) ) we reached the Panaji river coast to board the Princess de Goa. These river cruises seem to be a popular attraction, they basically consist of a few dance shows, a dance floor and an amazing view. We got to see the (in?)famous floating casino on the way too.

My Summer of Code mentor, Matt Lawless, happened to be in Goa too, so we scheduled lunch for the next day. We met at the Calangute post office (which was somewhat close to Matt’s home) and proceeded to the Calangute beach after lunch. My friends, meanwhile, reached the Baga beach, which was just next door to the Calangute beach (the two most famous beaches in Goa). We splashed around in the water for a while, joined by Matt, and then a second lunch :)

Matt decided to leave, and we went on to try some of the water sports at the beach. We went for a banana ride, a water scooter trip, but the one that took the cake was the parasailing. Nothing like a gentle ride in the sky to rejuvenate you. Flying over water with the beach behind you and the sunset in front is an experience I can’t put in words :)

The third and last day began with a long ride to south Goa, where we first visited the Benauli beach. This beach was beautiful, the sand was different than the others, and the most fun part was when I was buried by the others:

The rest of the day was spent at GoaKart, which was apparently a national Karting track. Parasailing was great, but karting was really the most exhilarating, especially because we raced and went for 4 rounds :D

Flickr didn’t let me upload more than 100MB of photos at once, so I moved to Picasa Web Albums instead. I wrote a small script, backr.py, that uses James Clarke’s flickr.py to back up all my photos, and uploaded them to Picasa Web, which allows me to create as many albums as I want (unlike Flickr).

So, I guess that’s a few more items off my “list of things to do before I die”!
