Learning about Linkers and Loaders

Every programmer knows what a compiler does, right? Well, at least most of us pretend to know, although few seem to have ever dug into the details – but that’s okay, as long as we know what a compiler is good for and how to use it. How about linkers? This is where most of us, including me, know much less. Usually we think of the linker as a little piece of the compiler tool-chain that gets called by the compiler “under the hood” and silently does its magic, unless it blows up and throws some really annoying, cryptic error message at us. Time to learn a bit more about linkers… even if only to better understand how to make those error messages go away.

Looking around on the Web, everyone seems to ultimately refer to the same book, which is possibly the best on the subject: Linkers and Loaders by John R. Levine, published by Morgan Kaufmann in October 1999. The book’s Web site has a link to unedited draft chapters of the book, which can be read for free. The book discusses several example object and executable code formats: MS-DOS .COM files, DOS .EXE files, Windows COFF and PE formats (.EXE and .DLL files), UNIX a.out and ELF files, and Intel/Microsoft’s OMF. Some of the older formats are no longer heavily used, but they serve as simpler examples than the newer, more complex formats.

Levine’s book is a treasure trove of information. For me, there is just one problem with this book: it contains way more information than I care to read. I will certainly keep its link for reference, and maybe even buy it some day. But in the meantime, I kept looking for a more concise overview. The 2002 Linux Journal article “Linkers and Loaders” by Sandeep Grover got much closer to what I had in mind. It is centered around the ELF object and executable format, which is the native format on Linux, and walks through the steps involved in turning a small C program into an executable using the GNU tool-chain, explaining well what each tool actually does.

Another excellent article, which I found an even more pleasant read, is the “Beginner’s Guide to Linkers” by David Drysdale. In this article, Drysdale starts off by showing a common linker error message, and suggests that if you immediately know off the top of your head how to fix it, you probably will not learn anything new from his article. Well, I certainly had seen that error message before, and through trial and error (or should I say copy-pasting from working examples) I eventually made it go away… but without at all understanding why the fix (placing an extern "C" wrapper around a C include file to be used inside a C++ source file) was effective.
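
To make that extern "C" fix a bit more concrete, here is a minimal sketch of my own (it is not taken from Drysdale's article). The names clib_add and cpp_add, and the header name clib.h mentioned in the comments, are made up for illustration. The C "library" is inlined into the same file only to keep the example self-contained.

```cpp
// Stand-in for a C header (hypothetical name "clib.h"), inlined here.
// The wrapper tells a C++ compiler to use C linkage for these names,
// i.e. to look for the plain symbol "clib_add" instead of a mangled one.
#ifdef __cplusplus
extern "C" {
#endif

int clib_add(int a, int b);

#ifdef __cplusplus
}
#endif

// Stand-in for the C implementation. In a real project this would be
// compiled separately by a C compiler, which never mangles names.
extern "C" int clib_add(int a, int b) { return a + b; }

// An ordinary C++ function for comparison. Its symbol name is mangled
// to encode the parameter types (with GCC or Clang, something like
// _Z7cpp_addii).
int cpp_add(int a, int b) { return clib_add(a, b); }
```

Without the wrapper, the C++ compiler would ask the linker for a mangled version of clib_add, while the C object file only offers the plain name, hence the "undefined symbol" complaint. Compiling the sketch with something like c++ -c demo.cpp and running nm demo.o should show both spellings side by side.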

Using simple examples and some very nice diagrams, Drysdale goes on to explain in great depth what a C, C++, or even FORTRAN compiler produces as object code, and how the linker reworks that object code into an executable or a dynamic library. He starts with a small example C file and discusses what the compiler might produce from it. He then shows how to use the UNIX tool nm to actually examine the object code output from a test compilation. Next, the linker is invoked, and nm is run again to look at what has changed. From these simple examples, Drysdale works his way to more complex topics, until he reaches the intricacies of dynamic libraries and the oddities introduced by using C++, which eventually leads back to the error message mentioned in the beginning.
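
For anyone who wants to try the nm experiment, the following is a small file of my own in the same spirit (it is not Drysdale's example, and all names in it are invented). The symbol letters in the comments are what GNU nm typically prints; other platforms, OS X included, may use different letters for some of them, and an optimizing compiler may drop or inline the file-local symbols.

```cpp
#include <cstdio>

extern "C" {                  // keep the global names unmangled and easy to spot

int global_init = 42;         // initialized global data, typically shown as 'D'
int global_uninit;            // zero-initialized global, typically 'B' (BSS)

int report(const char *msg);  // defined below, typically 'T' (text/code)

}

static int call_count;        // file-local data: lowercase letter, not visible to other files

// File-local function: also a lowercase letter, if it is emitted at all.
static int bump() { return ++call_count; }

int report(const char *msg) {
    std::puts(msg);           // reference to a symbol defined elsewhere: 'U' (undefined)
    return bump() + global_init + global_uninit;
}
```

Compile with c++ -c and run nm on the resulting object file. The undefined entry for puts is the interesting one: it is a note to the linker that somebody else has to provide that symbol.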

I liked Drysdale’s article a lot, but to follow it I needed a little help from a different Web page: I am working on an OS X system, and all the articles mentioned so far were tailored towards either DOS/Windows or Linux systems. OS X, however, not only uses a different format for object and executable files (Mach-O), but also provides different commands to examine the contents of those files. The article “How OS X Executes Applications” by Mohit Muthanna a.k.a. 0xFE was exactly what I needed to reproduce the experiments from the other articles on this platform.

Okay, enough of this blah and back to reading more of those excellent articles. Maybe, once I have understood more details of linking and loading, I will add my own description here… In the meantime, at least now I understand not only the reason for extern "C", but also why I sometimes had to rearrange the ordering of object files to get ld to link them.
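
The ordering issue can be illustrated with a toy model. This is only a sketch of how I understand a traditional single-pass UNIX linker to treat its inputs, not actual ld code, and every name in it is hypothetical. It assumes that plain object files are always linked in, while a member of a static library is pulled in only if it defines a symbol that is still undefined when the linker reaches it. Real archives hold many members and real linkers are more elaborate.

```cpp
#include <set>
#include <string>
#include <vector>

// One input on the (imaginary) linker command line.
struct Input {
    std::string name;
    bool is_archive_member;         // true: only pulled in on demand
    std::set<std::string> defines;  // symbols this input provides
    std::set<std::string> needs;    // symbols this input references
};

// Single left-to-right pass over the inputs.
// Returns the symbols that are still undefined at the end.
std::set<std::string> link_in_order(const std::vector<Input>& inputs) {
    std::set<std::string> defined, undefined;
    for (const Input& in : inputs) {
        if (in.is_archive_member) {
            bool wanted = false;
            for (const std::string& s : in.defines)
                if (undefined.count(s)) { wanted = true; break; }
            if (!wanted) continue;  // skipped, and never looked at again
        }
        for (const std::string& s : in.defines) {
            defined.insert(s);
            undefined.erase(s);
        }
        for (const std::string& s : in.needs)
            if (!defined.count(s)) undefined.insert(s);
    }
    return undefined;
}
```

With an object file main.o that needs helper and a library member that defines it, the order "main.o, library" resolves everything. The order "library, main.o" leaves helper undefined, because nobody had asked for it yet when the library went by.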

This entry was posted in ITE 221.