Rediscovering Project Oberon

Way back in the early 1990s, when I was browsing a campus bookstore for the first time in my life, I remember coming across a remarkable book called “Project Oberon – The Design of an Operating System and Compiler” by Niklaus Wirth and Jürg Gutknecht. Just from a little browsing it became clear to me that this was a true gem among computer textbooks. It described the design and implementation of an entire operating system, including its implementation language, drivers, windowing system, everything!

The only comparable book that I can think of is Andrew S. Tanenbaum’s “Operating Systems: Design and Implementation”, which describes MINIX, another minimalist operating system for educational purposes. The UNIX-like MINIX 1.0 is probably most famous for having served as an inspiration and development platform for Linus Torvalds when he began working on Linux. MINIX also led to a number of later variants, of which the latest, MINIX 3.0, is being marketed as a full-featured operating system.

But back to my story about Project Oberon… Unfortunately, I did not buy that book when I first saw it. Throughout the following two decades my primary interest wandered away from computer science, and even though I occasionally saw an Oberon-related article here or there, I mostly forgot about it. This changed recently, when I stumbled upon a free online edition of that very same book that I had first seen so many years ago. While I do not intend to design my own operating system or even just get deeper involved in operating system design in general, the introduction of this online book nevertheless proved to be a very interesting read.

The original Oberon system was designed to prove that operating systems could be made much simpler than their typical contemporaries, while still offering all the features necessary to power a personal workstation. Compatibility with other systems was not a requirement, so its inventors were free to experiment with novel concepts. In fact, they did not originally even target standard hardware, but their in-house designed Ceres workstations instead. Only later was Oberon ported to run on most major hardware platforms.

Oberon’s original user interface was a tiled window manager. Objects shown in any window, be they text or graphics, can be clicked, edited, and executed as commands. Under the hood, Oberon does not subdivide applications into distinct programs, but rather into different objects belonging to different modules. If I understand things correctly, there is not much protection of these objects from each other, and the tasking model is cooperative rather than preemptive. This requires application code to be well behaved, at least to the point where it cannot easily crash the system.

Reliable operation of the Oberon system requires that only “safe” application code be executed. To ensure this, all programs – pardon, modules – to be executed on the Oberon system should be written in the Oberon programming language. This language, which was developed in parallel with the Oberon system, was specifically designed to be small and “safe”. It achieves the latter through strong, static type checking and such features as array bounds checking and garbage collection.

Developing a system language alongside an operating system seems to needlessly complicate the problem, but it is not all that unusual an approach. After all, the same approach was applied earlier during the development of the UNIX operating system, leading to the extremely successful C programming language. What’s special about Oberon is that the language is required (or at least very strongly encouraged) for any software to be developed for the Oberon system.

The Oberon language is a classic “Wirth language”, with strong similarities to its predecessor Modula-2, which in turn is quite similar to Pascal. There are Oberon compilers for other operating systems, which has made the Oberon language somewhat more successful than the Oberon system.

The original Oberon compiler, described in chapter 12 of Project Oberon, compiled directly to machine code for the NS-32000 processor architecture, which was used in the Ceres workstations. Although the NS-32000 is a CISC processor, it has a relatively regular design, making code generation feasible, though not simple, within the scope of the project.

In the newer book “Compiler Construction”, also available in a free online edition, Wirth specified an even smaller subset of the Oberon language, called Oberon-0, which only offers 32-bit integers and Booleans as native types, but otherwise includes many features of the full Oberon language, including records, arrays, and procedures. The Oberon-0 compiler in Compiler Construction targets a virtual RISC machine, which is much simpler than the NS-32000 processor architecture. The complete VM reference interpreter consists of only two pages of Oberon code. The complete code for the Oberon-0 compiler fits on 19 pages.

I do not think that the Oberon language ever stood a chance to become a mainstream programming language (otherwise its predecessor, Modula-2, would have already done so), nor that it was ever intended for that purpose. After two decades of existence, relatively few people seem to have ever heard of it. However, for those who are interested in learning about compiler construction (including the actual programming, not just the abstract design), there are few languages with such well-documented reference implementations as Oberon. The compiler for the stripped-down variant Oberon-0 in particular is actually quite easy to understand.

While setting up a full Oberon system to play around with does not appear very appealing, I am seriously considering some experiments with the Oberon-0 compiler. Maybe I could port that compiler to a language that I am more used to? Or maybe I could even try to come up with my own language variation? (Just yet another useless toy language… but why not?) Or I could try to target a different architecture? We will see whether I can free up a little time over the holidays to play around with those ideas… If not, I fear that I might completely forget about Oberon again – and that would be a shame.

Posted in ITE 221 | 10 Comments

Learning about Linkers and Loaders

Every programmer knows what a compiler does, right? Well, at least most of us pretend to know, although few seem to have ever dug into the details – but that’s okay, as long as we know what a compiler is good for and how to use it. How about linkers? This is an area most of us, including me, know much less about. Usually we think of the linker as that little piece of the compiler tool-chain that typically gets called by the compiler “under the hood” and silently does its magic – except when it occasionally blows up and throws some really annoying, cryptic error message at us. Time to learn a bit more about linkers… even if only to better understand how to make those error messages go away.

Looking around on the Web, everyone seems to ultimately refer to the same book, which is possibly the best on the subject: Linkers and Loaders by John R. Levine, published by Morgan Kaufmann in October 1999. The book’s Web site has a link to unedited draft chapters of the book, which can be read for free. The book discusses several example object and executable code formats: MS-DOS .COM files, DOS .EXE files, Windows COFF and PE formats (.EXE and .DLL files), UNIX a.out and ELF files, and Intel/Microsoft’s OMF. Some of the older formats may not be heavily used anymore, but they serve as simpler examples than the newer, more complex formats.

Levine’s book is a treasure trove of information. For me, there is just one problem with this book: it contains way more information than I care to read. I will certainly keep its link for reference, and maybe even buy it some day. But in the meantime, I kept looking for a more concise overview. The 2002 Linux Journal article “Linkers and Loaders” by Sandeep Grover got much closer to what I had in mind. It is centered around the ELF object and executable format, which is the native format on Linux, and walks through the steps involved in turning a small C program into an executable using the GNU tool-chain, explaining well what each tool actually does.

Another excellent article, which I found an even more pleasant read, is the “Beginner’s Guide to Linkers” by David Drysdale. In this article, Drysdale starts off by showing a common linker error message, and suggests that if you immediately know off the top of your head how to fix it, you probably will not learn anything new from his article. Well, I certainly had seen that error message before, and through trial and error (or should I say copy-pasting from working examples) I eventually made it go away… but without understanding at all why the fix (placing an extern "C" wrapper around a C include file to be used inside a C++ source file) was effective.
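
For anyone who has not seen this idiom, the fix looks roughly like the following (the file and function names are my own, not Drysdale’s): the extern "C" block tells a C++ compiler to use plain C linkage for the declarations, so that the symbol names it expects match the ones actually present in the C-compiled object file.

/* my_c_lib.h -- hypothetical C header meant to be usable from C++ code.
 * A C compiler never sees the extern "C" lines, because __cplusplus is
 * only defined by C++ compilers; a C++ compiler is told to skip its
 * name mangling for these declarations. */
#ifndef MY_C_LIB_H
#define MY_C_LIB_H

#ifdef __cplusplus
extern "C" {
#endif

int my_c_function(int x);   /* implemented in my_c_lib.c, compiled as C */

#ifdef __cplusplus
}
#endif

#endif /* MY_C_LIB_H */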

Using simple examples and some very nice diagrams, Drysdale goes on to explain in great depth what a C, C++, or even FORTRAN compiler produces as object code, and how the linker reworks that object code into an executable or a dynamic library. He starts with a small example C file and discusses what the compiler might produce from it. He then shows how to use the UNIX tool nm to actually examine the object code output from a test compilation. Next, the linker is invoked, and nm is run again to look at what has changed. From these simple examples, Drysdale works his way up to more complex topics, until he reaches the intricacies of dynamic libraries and the oddities introduced by C++, which eventually leads back to the error message mentioned in the beginning.
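
To give a rough idea of what such an experiment looks like (the file and symbol names here are my own, not the ones from Drysdale’s article), one can compile a tiny C file with gcc -c and run nm on the result:

/* example.c -- a tiny test file for examining object code.
 * Compile with "gcc -c example.c" and run "nm example.o": the defined
 * function should show up with type 'T' (text/code section), the
 * initialized global with 'D' (data section), and a call to an external
 * function such as printf as 'U' (undefined, to be resolved later by
 * the linker). */
#include <stdio.h>

int counter = 42;               /* initialized global -> data section */

int increment(void)             /* defined function   -> text section */
{
    printf("counter = %d\n", counter);   /* printf -> undefined symbol */
    return ++counter;
}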

I liked Drysdale’s article a lot, but to follow it I needed a little help from a different Web page: I am working on an OS X system, and all the articles mentioned so far were tailored towards either DOS/Windows or Linux systems. However, OS X not only uses a different format for object and executable files (Mach-O), but also different commands to examine the contents of those files. The article “How OS X Executes Applications” by Mohit Muthanna a.k.a. 0xFE was exactly what I needed to follow the experiments from the other articles through on this platform.

Okay, enough of this blah and back to reading more of those excellent articles. Maybe, once I have understood more details of linking and loading, I will add my own description here… In the meantime, at least now I understand not only the reason for extern "C", but also why I sometimes had to rearrange the order of object files to get ld to link them.

Posted in ITE 221 | Comments Off on Learning about Linkers and Loaders

Q330 Serial Port Communication

The Quanterra Q330 is a seismic data logger originally manufactured by Quanterra Inc., now a subsidiary of Kinemetrics Inc. The Q330 saw widespread adoption in the early 2000s. While it is not the newest Kinemetrics product anymore, there are still many Q330s in service. The Q330 is the primary data logger used in the EarthScope USArray Transportable Array (TA), and it is one of two primary data loggers available through the IRIS PASSCAL Instrument Center.

Between 2007 and 2009 several Q330s were installed in the High Lava Plains (HLP) seismic experiment. It was during that experiment that I got to know those data loggers. During the same time I was experimenting with in-house software for data conversion and data logger control. While I was mainly focusing on the competing REF TEK RT130 data logger, this activity got me interested in the Q330’s communication protocol as well. So I dug into the Q330 documentation and did some experiments while having brief access to a borrowed Q330. Programming serious control software for the Q330 would have taken more time than I was willing to invest, so eventually all my observations went into a text file, which got buried deep down on my hard drive. Recently, I found those notes again and thought that they might be turned into a useful blog post. Even if you are not at all interested in the Q330, the following overview might still serve as a neat example of how the protocols of different network layers interact in the context of a somewhat unusual communication channel.

The Q330 data logger can be controlled remotely via Ethernet using a custom protocol termed QDP, which is transported via IP/UDP. Alternatively, a computer or a Palm handheld (the old style) may be connected to a Q330 locally through one of two serial ports: one wired and the other wireless via IrDA. Interestingly, these serial communication options use the exact same QDP protocol over IP/UDP as in the case of an Ethernet connection. This is made possible by framing the IP datagrams, which are carrying the UDP and ultimately QDP packets, using the SLIP protocol, so that they can be transmitted over a continuous serial byte stream. Let’s have a more detailed look at all the involved network layers and protocols.

Physical layer: RS232

When connecting to a Q330’s CONSOLE port, one is establishing a standard RS232 serial link, although the serial lines are wired to not-so-standard pins on a 10-pin MIL-spec connector. The Q330 does support hardware handshaking, but in the standard setup this feature is disabled. This allows the use of a minimal cable with just three wires connected: TXD on pin G, RXD on pin H, and GND on pin E. Note that the Q330 uses the exact same MIL-spec connector as the competing REF TEK RT130 data logger; however, the two systems use different, incompatible pin assignments.

The RS232 communication parameters can be set to different values, but the following are standard:

  • Bit rate: 19200 BPS
  • Format: 8N1 (8 bits, no parity, one stop bit — standard values)
  • No hardware handshaking (three-wire connection); but note that hardware handshaking (using CTS, RTS, DSR, and DTR) can be enabled.

Instead of using the CONSOLE port, one can connect wirelessly to the Q330 via IrDA. I have used the IrDA port only once (on a Q330 with a damaged CONSOLE port) and it was a major pain: one needs to hold the Palm handheld within roughly 10 cm of the Q330’s IrDA port and shade the area from the sun to establish a halfway reliable connection. I will not go into the technical details of IrDA here, since I do not know much about it. Just note that at the device driver level, an IrDA connection looks basically the same as a regular RS232 connection, meaning that all the following discussion applies equally to both options.
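
To make the standard settings above a bit more concrete, here is a minimal sketch of how such a port might be configured on a Linux or BSD-style system using termios. This is not code from the Q330 documentation; cfmakeraw() and CRTSCTS are common extensions rather than strict POSIX, and the device path is only an example.

/* Open a serial port and configure it for 19200 baud, 8N1, no flow
 * control -- the standard Q330 console settings described above.
 * Minimal sketch without full error handling. */
#include <fcntl.h>
#include <termios.h>

int open_console(const char *dev)          /* e.g. "/dev/ttyS0" */
{
    int fd = open(dev, O_RDWR | O_NOCTTY);
    if (fd < 0)
        return -1;

    struct termios tio;
    tcgetattr(fd, &tio);
    cfmakeraw(&tio);                       /* raw bytes, no line editing      */
    cfsetispeed(&tio, B19200);
    cfsetospeed(&tio, B19200);
    tio.c_cflag &= ~(PARENB | CSTOPB);     /* no parity, one stop bit         */
    tio.c_cflag &= ~CSIZE;
    tio.c_cflag |= CS8 | CLOCAL | CREAD;   /* 8 data bits, ignore modem lines */
    tio.c_cflag &= ~CRTSCTS;               /* no hardware handshaking         */
    tcsetattr(fd, TCSANOW, &tio);
    return fd;
}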

Data Link layer: SLIP

At the device driver level, a serial transmission via RS232 appears as just a continuous stream of byte-sized characters. How can one send variable-length IP datagrams over such a continuous byte stream? Two popular packet framing protocols have been developed to deal with this issue: Serial Line IP (SLIP) and the Point to Point Protocol (PPP). While PPP is more advanced, for the purpose of directly connecting two nodes via RS232 the older SLIP is fully sufficient, and much simpler to implement. The details of SLIP are well documented in its original RFC 1055, which also includes some very useful example C code.

The idea of SLIP is to basically send all characters within a datagram as-is, but add a special END character at the end of each datagram to provide a very simple mechanism of packet framing. The SLIP END character was arbitrarily chosen to have the octal value \300. But there is one problem: what if a data byte within the packet happens to have the same value as the END character? To handle this, any such character must be replaced by a special two-character escape sequence. Of course, this introduces another special character, ESC, which may need to be replaced with an escape sequence as well. So we end up with two possible escape sequences, labeled ESC ESC_END and ESC ESC_ESC. Here is a list of all the special characters involved:

  • END, octal \300 — frames a packet
  • ESC, octal \333 — used to form the ESC ESC_END and ESC ESC_ESC sequences
  • ESC_END, octal \334 — second part of ESC ESC_END sequence
  • ESC_ESC, octal \335 — second part of ESC ESC_ESC sequence

Note that even though the SLIP special characters END and ESC share their symbolic names with the END and ESC characters in the ASCII character set, they have different octal values!

While not a strict requirement of SLIP, it is common to prepare for every new datagram by first sending an END character. This allows the receiver to flush any garbage from its input buffer before starting to assemble a new datagram. It is not uncommon to encounter several END characters in a row. They may be interpreted as a sequence of zero-sized datagrams, which should be dropped to avoid wasting time trying to parse them. As a further optimization, any datagram of less than 20 bytes length (the minimum length of an IPv4 header) should be dropped early on as well. This will effectively strip out any spurious bytes that may be introduced by connecting or disconnecting a serial cable while the port is open.
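
Putting this together, a SLIP sender along the lines of the example code in RFC 1055 might look roughly like this (just a sketch; putchar_serial() stands for whatever routine writes a single raw byte to the serial port):

/* SLIP special characters (note: octal values, not the ASCII END/ESC). */
#define SLIP_END     0300   /* frames a packet                          */
#define SLIP_ESC     0333   /* introduces an escape sequence            */
#define SLIP_ESC_END 0334   /* ESC ESC_END stands for a data END byte   */
#define SLIP_ESC_ESC 0335   /* ESC ESC_ESC stands for a data ESC byte   */

void putchar_serial(unsigned char c);   /* writes one raw byte to the port (provided elsewhere) */

/* Send one datagram over the serial line, SLIP-framed. */
void slip_send_packet(const unsigned char *p, int len)
{
    putchar_serial(SLIP_END);           /* let the receiver flush any line noise */
    while (len-- > 0) {
        switch (*p) {
        case SLIP_END:                  /* escape data bytes that look like END */
            putchar_serial(SLIP_ESC);
            putchar_serial(SLIP_ESC_END);
            break;
        case SLIP_ESC:                  /* escape data bytes that look like ESC */
            putchar_serial(SLIP_ESC);
            putchar_serial(SLIP_ESC_ESC);
            break;
        default:
            putchar_serial(*p);
        }
        p++;
    }
    putchar_serial(SLIP_END);           /* mark the end of the datagram */
}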

One downside of SLIP is that it does not provide any means of error checking. So aside from dropping datagrams below the minimum IP header size, all further error checking must be performed at the next higher network and transport layers.

Network layer: IP Header

QDP is transmitted via UDP, which in turn is transmitted via IP, so the outermost encapsulation layer inside a SLIP frame is the IP datagram, which always starts with an IP header. For QDP packets, this is followed by a UDP header, which in turn is followed by a QDP header. But let’s just look at the IP header for now. Here it is as a heavily commented C struct derived from netinet/ip.h:

struct ip {
  unsigned int   ip_hl:4;   /* Header length in 32-bit words. Default is 5, indicating 20 header bytes. */
  unsigned int   ip_v:4;    /* IP version. 4 stands for IPv4. */
  uint8_t        ip_tos;    /* Type of Service. Defaults to 0. */
  uint16_t       ip_len;    /* Total length of IP datagram. Includes header and payload. */
  uint16_t       ip_id;     /* ID sequence number. To reassemble fragmented datagrams. */
  uint16_t       ip_off;    /* Fragment offset. To reassemble fragmented datagrams. */
  uint8_t        ip_ttl;    /* Time to live. Number of allowed hops left, to be decreased by routers. */
  uint8_t        ip_p;      /* Transport layer protocol. 17 stands for UDP. */
  uint16_t       ip_sum;    /* IP header checksum. */
  struct in_addr ip_src;    /* Source IP address. */
  struct in_addr ip_dst;    /* Destination IP address. */
};

The Q330 supports only IPv4, and all QDP packets are transported via UDP. Assuming, furthermore, that no datagrams will be fragmented, the following consistency checks may be applied:

ip_valid = ip_hl == 5 && ip_v == 4 && ip_len == slip_len && ip_off == 0 && ip_p == 17
        && ip_checksum(ip, 4 * ip_hl) == 0;

If any of these tests fail, it is probably not worthwhile to examine the datagram any further. (Remember that multi-byte fields such as ip_len arrive in network, i.e. big-endian, byte order.) Note that for our “dummy” network connection via SLIP there is really no need for any addressing scheme.
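
A word on the checksum test: the IPv4 header checksum is not a CRC but the 16-bit ones’ complement Internet checksum of RFC 1071, computed over the header alone. Recomputed over a received header, including the ip_sum field, it must come out as zero. A minimal sketch of the ip_checksum() routine used above:

/* 16-bit ones' complement Internet checksum (RFC 1071) over 'len' bytes.
 * Recomputed over a received IPv4 header, including the ip_sum field,
 * the result must be zero for the header to be valid.
 * Sketch only: assumes 'len' is even, as it always is for an IP header. */
#include <stdint.h>
#include <stddef.h>

uint16_t ip_checksum(const void *buf, size_t len)
{
    const uint16_t *p = buf;
    uint32_t sum = 0;

    while (len > 1) {
        sum += *p++;
        len -= 2;
    }
    while (sum >> 16)                  /* fold carries back into 16 bits */
        sum = (sum & 0xffff) + (sum >> 16);

    return (uint16_t)~sum;             /* ones' complement of the sum */
}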

Transport layer: UDP

The UDP header does not add a lot of information to the IP header. Its main purpose is to provide source and destination port numbers to address specific UDP sockets. The definition of UDP, like that of its sister protocol TCP, is tightly coupled to the definition of IP. This is why many people consider the network and transport layers practically inseparable. Let’s look at the UDP packet header to spot the problem:

struct udphdr {
  uint16_t    uh_sport;   /* Source port. */
  uint16_t    uh_dport;   /* Destination port. */
  uint16_t    uh_ulen;    /* UDP length. Covers the UDP header plus payload. */
  uint16_t    uh_sum;     /* UDP checksum. Calculated over the UDP header and payload plus a pseudo IP header. */
};

The calculation of the UDP checksum uses a “pseudo IP header”, a simplified IP header made up of information present only in the real IP header. This breaks the concept of encapsulation, since correct handling of the UDP packet requires information from the IP header. What this means for a program that analyses data coming from a Q330 is that the IP and UDP packet headers should be checked and handled by the same subroutine, as if they belonged to a shared network layer.
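
To illustrate the point, here is a sketch of how such a combined check might verify the UDP checksum as specified in RFC 768. It reuses the ip_checksum() routine from the previous section and, to keep things short, assumes an even udp_len and ignores the special case of a transmitted checksum of 0 (which means the sender skipped the checksum entirely).

/* Verify a UDP checksum (RFC 768). The checksum covers a 12-byte
 * "pseudo header" built from IP-layer fields, followed by the real UDP
 * header and payload -- which is exactly why UDP handling needs access
 * to the enclosing IP header. 'udp' points to the UDP header as
 * received; 'udp_len' is uh_ulen converted to host byte order. */
#include <string.h>
#include <stdint.h>
#include <netinet/in.h>
#include <netinet/ip.h>
#include <arpa/inet.h>

uint16_t ip_checksum(const void *buf, size_t len);   /* from the sketch above */

int udp_checksum_ok(const struct ip *iph, const unsigned char *udp, uint16_t udp_len)
{
    static unsigned char buf[12 + 65535];

    /* Pseudo IP header: source and destination address, a zero byte,
     * the protocol number, and the UDP length. */
    memcpy(buf + 0, &iph->ip_src, 4);
    memcpy(buf + 4, &iph->ip_dst, 4);
    buf[8] = 0;
    buf[9] = 17;                          /* IPPROTO_UDP */
    uint16_t nlen = htons(udp_len);
    memcpy(buf + 10, &nlen, 2);

    /* Followed by the UDP header and payload exactly as received (the
     * checksum field included); a correct packet then sums to zero,
     * just like the IP header check. */
    memcpy(buf + 12, udp, udp_len);

    return ip_checksum(buf, (size_t)12 + udp_len) == 0;
}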

Application layer: QDP

The payload of the UDP packets finally contains the Q330-specific application layer data, encoded as QDP packets. I will not delve into the details of the QDP protocol here. Instead, I will restrict myself to the one component that is common to all QDP packets: the QDP packet header.

struct qdphdr {
  uint32_t   crc;      /* Packet Checksum. CRC32 including header (without CRC field) and payload. */
  uint8_t    command;  /* QDP command code. Indicates type of payload data. */
  uint8_t    version;  /* QDP version. */
  uint16_t   length;   /* Packet length. */
  uint16_t   seq_nr;   /* Sequence number. Auto-incrementing number to spot missing or duplicate packets. */
  uint16_t   ack_nr;   /* Acknowledge number. To send ACK signals for previously received packets. */
};

What is interesting here is that QDP includes sequence and acknowledgement numbers. This allows sequences of QDP packets to be sent and received in a connection-oriented manner. In a sense, the designers of QDP chose to re-implement a part of TCP on top of the connectionless UDP. I can only guess that their motivation was to allow both connectionless and connection-oriented behavior, depending on whether simple command or status messages are exchanged, or more complex multi-packet data downloads are performed.

To keep this post to a reasonable length, I will stop here with my analysis of the Q330 communication protocol. Digging deeper into QDP would exceed my own knowledge of the matter anyway. I would also risk giving away information from the Q330 documentation that might not be intended for the general public. I hope that going just this far might already help someone working with the Q330 documentation to get a jumpstart on understanding QDP. For everyone else, hopefully this was a neat example of how the different Internet model layers stack on top of each other. Writing it up certainly helped me get a better grip on this matter.

Posted in ITE 221, Q330 | 2 Comments

Clearing a Q330’s Baler association

The Quanterra Q330 seismic data logger does not hold any internal recording medium aside from a small memory buffer. Instead of recording the data itself, it delegates this responsibility to a networked Data Processor (DP). This might be a general purpose computer running data logging software, or a specialized recording device. For stand-alone operation of a Q330, or for local ring-buffering of substantial amounts of data at a telemetered site, a minimal DP must be co-located with the Q330. For this purpose, Quanterra developed the “Data Baler”: a small box holding a low powered DOS based PC, which records any data received from an associated Q330 to its internal hard disk.

Balers that have been pulled from the field should always be “cleaned” before going back out. Proper “cleaning” means not only erasing any previously recorded data, but also clearing the “Baler association”. Balers associate themselves with a specific Q330 when receiving their first chunk of data. Thereafter, they will only accept data from that particular Q330 until this association is cleared. Normally, clearing of the Baler association should be performed right after a data download has been successfully completed. That way, downloaded and cleaned Balers can be labeled as “clean” and kept ready to go back out to the field at any time. However, mistakes do happen, and occasionally a non-cleaned Baler might show up as the only spare at a seismic station.

If a Q330 refuses to dump data to a freshly hooked up Baler, that Baler might still be associated with a different Q330. If this happens, and no other clean Baler can be found, the following procedure may be used to manually clear the Baler association and get the Q330 to record data to it:

  1. Push and hold the ATTN button for about 5 seconds, until the LED turns solid red. This does not seem to work if the Baler is already running! Workaround: shut down the Baler by pulling its power plug; then press the button and, while keeping it pressed, plug the power back in.
  2. Release the ATTN button and wait for slow green blink (idle).
  3. Press ATTN again (only briefly this time) to initiate a Baler shutdown.
  4. Repeat the entire sequence of steps 1-3 one more time.
  5. Now boot up the Baler in regular mode (briefly push ATTN, or send Baler command).

The Q330 should now be dumping data to the Baler. (Confirm by watching the packet buffer usage!) If things still do not work, the problem may be more fundamental than just a wrong Baler association. In that case, the only option might be to try another Baler.

Using a non-clean Baler in this way may create a serious data mess, since there might still be leftover data from a different station on the Baler. Sorting this out after data download is not hard, but potentially time consuming. To warn whoever might be downloading the data months later, a big fat note about this incident should be placed in the station’s field notes.

Posted in Q330 | Comments Off on Clearing a Q330’s Baler association

SVG: An Image Description Language for the Web

In the early days of the Web, most browsers allowed exactly two formats for embedded images: The lossless Graphics Interchange Format (GIF), which was a good choice for logos, diagrams, and primitive animation loops, and the lossy compressed Joint Photographic Experts Group (JPEG) format, which is better suited for photos. Dissatisfaction with technical limitations and patent issues surrounding GIF led to the development of the replacement Portable Network Graphics (PNG) format, which offered more colors and partial transparency at a reasonable lossless compression rate. PNG is a great format, so it took “only” about a decade to become widely supported by all the different browsers. This gives us three standard image formats to choose from (standard in the sense of not requiring a plugin on any of the mainstream browsers), but all of them only support raster graphics.

What about vector graphics? Raster graphics are fine if viewed at a fixed resolution, but they suffer significant quality loss when resized, for example while preparing a high quality printout. Vector graphics are by design immune to resizing artifacts. They may also lead to smaller file sizes and offer additional functionality, such as complex animations and scripting capabilities. A common vector format that exemplifies these possibilities is Flash. But unfortunately, Flash is a proprietary format, relying on proprietary browser plugins and authoring tools, which may not be supported on all platforms. For example, at the time of writing, iOS systems still lack any Flash player, and it appears unlikely that this might change in the foreseeable future.

Does the Web need to rely on some proprietary format for vector graphics? No. There is actually a very well suited open format available: Already back in 2001, at the height of the XML hype, the W3C published the XML based Scalable Vector Graphics (SVG) format. An SVG image is composed of vector shapes (lines, curves, polygons, etc.) expressed as XML elements, which are drawn by the browser independent of the display’s resolution. Upon loading, the SVG elements become part of a Web page’s DOM. This means that these elements and their corresponding vector shapes can be manipulated through JavaScript, or used as sources for JavaScript events, thus allowing the programming of arbitrarily complex user interactions. Simple animations can be described directly in the format, while more complex animations may be programmed through JavaScript. Possibly the best way to learn about SVG is to start experimenting with the guidance of an online tutorial. Two very good tutorial series can be found at carto:net and Learn SVG.

Until recently, there was one major problem with SVG: as with PNG, we “just” had to wait about a decade until all major browsers started providing decent, native SVG support. With Internet Explorer 9 finally jumping onto the bandwagon earlier this year, that wait seems to be over. With client software finally supporting SVG, have all hurdles to its adoption been cleared? Not quite yet: server software still needs better out-of-the-box SVG support. For example, WordPress, the software used to power this blog, seems to do everything to block one from including SVG images inside a post. Not only is it hard to link to an uploaded SVG image without a special plugin, but by default WordPress will block the uploading of SVG files, considering SVG an unsafe file type. It appears that this is not pure oversight, but a rational decision based on the fact that SVG might be a little too powerful to be trusted. Being a very flexible XML-based format, SVG allows the inclusion of JavaScript code. While this enables some of SVG’s unique features, it also opens the door to potential misuse: a site that allows users to post unfiltered SVG files may suffer the same cross-site scripting vulnerabilities as a site that allows unfiltered HTML uploads.

When I started to write this post, I planned to embed an SVG sample. But with the upload restrictions in WordPress, this turned out to be such a hassle that for now I decided to leave this blog “image-less” for a little longer. Nevertheless, SVG is a promising, powerful image description language for the Web. It has been around for quite a while now, but only recently gained decent out-of-the-box support in all of the latest mainstream browsers. Many Web sites already use SVG, but usually only in conjunction with some fallback mechanism, such as displaying a PNG image if SVG is not supported. Unfortunately, there are still some non-browser related stumbling blocks, such as limited support by server-side software, which are holding SVG back. Once these last problems are solved, I believe that SVG might have a very bright future. Some day it might even become as widespread as the raster image formats GIF, JPEG, and PNG are today.

Before finishing this post, I should briefly mention one other promising new Web graphics technology: the new Canvas tag in HTML5 provides an empty raster-based drawing surface, which can be easily manipulated through JavaScript drawing methods. I do not think that the Canvas tag will be a direct competitor to SVG. The two technologies rather complement each other, just as raster and vector based image formats complement each other. Mihai Sucan has written a nice comparison between SVG and Canvas, which reaches this same conclusion. I could go on to write more about the Canvas tag, but that is a topic for another post.

Posted in ITE 221, Web | Comments Off on SVG: An Image Description Language for the Web

Learning about CPU Cache

Practically every programmer, and most other computer users as well, know that CPU cache is that small but incredibly fast memory right inside or next to the CPU, which keeps copies of frequently accessed data so that the CPU does not have to reach out to the slower main memory as often. Cache is good – the more cache, the better! But how does CPU cache actually work? And what are its effects on different ways to program?

For many programmers, the CPU cache is magic: It somehow just works and makes our machines faster. We know that its effectiveness depends on algorithm and data structure choices, but we typically do not really understand the criteria involved. So we simply code along and hope for the best, maybe doing some trial-and-error optimization if our benchmark results look lousy. At least that is how I have dealt with this problem so far. After reading a bit about the workings of CPU cache on Wikipedia, I decided to have a closer look into this and found several good resources on the Web.

For a programmer who likes to do some experimenting, Igor Ostrovsky’s “Gallery of Processor Cache Effects” is possibly the most entertaining entry into this subject. Ostrovsky starts off with a simple experiment: two loops are programmed to iterate over the same large array and perform some trivial manipulation (multiplication by 3) on the array elements. The only difference between the two loops is that in one case all array elements are manipulated, while in the other case only every 16th element is manipulated and the others are simply skipped over. In theory, that second loop should run about 16 times faster, right? Well, on most modern machines (just confirmed this on my MacBook), the two loops take almost exactly the same amount of time. The reason, of course, is that in both cases the same amount of memory is swapped in and out of the CPU cache – and memory access is the clear performance bottleneck for this problem.
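
The two loops in question look essentially like this – my own C reconstruction of the experiment rather than Ostrovsky’s original code, using the array size and step width discussed in his article:

/* Reconstruction of the first experiment: both loops touch the same
 * 64 MB array, but the second one only updates every 16th element.
 * On typical hardware both take about the same time, because each
 * still pulls every 64-byte cache line of the array through the
 * cache -- memory traffic, not arithmetic, is the bottleneck. */
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

#define N (16 * 1024 * 1024)          /* 16M ints = 64 MB */

int main(void)
{
    int *arr = calloc(N, sizeof *arr);
    if (!arr)
        return 1;

    clock_t t0 = clock();
    for (int i = 0; i < N; i++)       /* loop 1: touch every element   */
        arr[i] *= 3;
    clock_t t1 = clock();
    for (int i = 0; i < N; i += 16)   /* loop 2: touch every 16th only */
        arr[i] *= 3;
    clock_t t2 = clock();

    printf("full loop:    %.3f s\n", (double)(t1 - t0) / CLOCKS_PER_SEC);
    printf("strided loop: %.3f s\n", (double)(t2 - t1) / CLOCKS_PER_SEC);
    free(arr);
    return 0;
}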

Next, Ostrovsky modifies his example by varying the step size in the second loop and graphs the results to show a sharp performance drop-off beyond a step size of 16. This relates to an in-memory step size of 64 bytes (he’s looping over 32-bit integers) – exactly the length of a cache line in most x86 family processors. I knew about cache lines before, but never imagined that such a simple experiment could so clearly demonstrate their effect and even be used to accurately determine the cache line length. Further experiments in the article determine the sizes of the L1 and L2 caches, demonstrate the presence of instruction-level parallelism, and explore the concepts of cache associativity and false cache line sharing.

Did this whet your appetite to learn more about CPU cache effects? Well, I found another resource that dives much deeper into this subject. But be warned: this one is heavy stuff. Ulrich Drepper of Red Hat Inc. has written an excellent paper on “What Every Programmer Should Know About Memory”. It covers many memory-related topics, but mostly focuses on CPU cache. At 114 pages, I consider this paper more of a textbook. No, I have not yet read it all… But I will definitely keep that bookmark handy to answer all of my upcoming computer memory related questions. While checking this paper out, also take a look at the other papers and articles on Drepper’s home page – he has written some really useful stuff.

Posted in ITE 221 | Comments Off on Learning about CPU Cache

Is there a Future for Tape Backup?

As part of my job, I am sometimes confronted with backup issues as well as aspects of “data archeology”. For example, a while ago I had to restore some 10-year old data from our tape archive. Luckily, the tapes were stored in an electromagnetically shielded vault and were all still readable. That is, after spending days tracking down our last remaining compatible tape drives and figuring out how to get them to work again. I thereby learned quite a bit about the various tape formats used throughout the past two decades – and was reminded why I used to hate tapes in the first place. This got me wondering: what role do tapes still play today, and what are their prospects for the future?

StorageSearch.com is an interesting Web site to learn all sorts of fun facts about the pros and cons of various storage technologies. The site describes itself as an “enterprise storage focused Web directory since 1998”. Today, most new articles on StorageSearch.com appear to focus solely on Solid State Storage (SSD). But in the past, a significant amount of attention was given to other storage technologies as well, especially hard disks, optical storage, and tapes. A big discussion in the early 2000s was around the pros and cons of then relatively new disk-to-disk backup versus tape. Steve Garner summarized and compared some different views on this subject in his StorageSearch.com article “Disk to Disk Backup versus Tape – War or Truce?” Seven years on, a 2004 article may appear outdated, but in this case I think that many of its major points are still valid. Garner argued that “disk-to-disk backup looks like a new edition of Hierarchical Storage Management (HSM)”, offering fast backup and recovery to a relatively costly medium, which should be complemented with less frequent traditional tape based backup for longer term archival. He did, however, envision that disk-to-disk backup might become more than this, and that it offered the potential of eventually replacing tape.

All the advantages of tapes and removable discs over hard disks seem to boil down to the fact that they are naturally removable media. Although hard disks can be mounted to be easily removable as well, one is still removing the entire disk drive, not just the storage medium. (Of course there were various technologies involving removable magnetic disks, such as floppies and ZIP drives, but they lacked the high areal density, speed, and reliability of hard disks, and are all obsolete nowadays.) The removability of the storage medium allows the medium to be relatively cheap (while the drive may still be expensive), easy to transport, and easy to store. On the other hand, transporting mechanically complex hard disks is considered very risky, and their long-term storage may involve other aging factors (capacitors, bearing lubricant) beyond magnetic decay and leakage. A tape that gets exposed to water, dust, or smoke may still be salvageable; a hard disk, less likely so.

Traditionally, tapes had the advantage of being cheaper than disk arrays when large amounts of data needed to be stored. This was due to the relatively low price of tape cartridges compared to hard disks, which quickly offset the high initial price of the tape drives. However, falling disk prices versus nearly stagnant tape prices have been eroding this advantage.

In another StorageSearch.com article, from 2003, John Woelbern of Sony Electronics’ Tape Storage Solutions asked whether tape backup had a future. He argued that to remain competitive, tapes would need to reach capacities around 1 TB per cartridge by 2006 and around 10 TB per cartridge by 2011. He went on to advertise Sony’s then-new SAIT system, predicting that it would be able to meet and exceed these demands within the given timeframes. Today we are in the year 2011, and SAIT has not surpassed 800 GB per cartridge, while its biggest competitor, LTO-5, has barely managed to reach 1.5 TB per cartridge (all capacities uncompressed). By comparison, hard disks have just reached 3 TB per drive, thereby clearly surpassing tape storage density.

I searched some online retailers and found the following prices: an LTO-5 cartridge costs about $60, an entry-level LTO-5 drive about $2500. A 3 TB hard disk costs about $280 (probably half of that within a few months). That works out to roughly $40 per TB of tape media plus the fixed drive cost, versus roughly $93 per TB of disk, so the break-even point lies near $2500 / ($93 − $40) ≈ 47 TB. Assuming all other hardware and software components cost roughly the same for tape and disk solutions, this indicates that tape technology may still be cheaper than disk technology, but only if storage requirements exceed about 47 TB. Ergo, tape backup might still make sense for huge datacenters, but for many small and medium sized businesses, disk-to-disk backup has already become a significantly cheaper alternative.

I believe that tape backup will not fully disappear any time soon, but will continue to retreat into a niche market: large data centers will still view tape as a cost effective backup medium, especially if long-term archival is crucial. However, smaller businesses will move entirely to cheaper, faster, and easier to maintain disk-to-disk backup. They may continue to use tape backup indirectly, though: their disk-based backups may be backed up to a cloud service, which in turn backs itself up to tape. Outsourcing tape backup to the cloud gives the small guys access to better economies of scale – and lets someone else deal with the tapes.

Posted in ITE 221 | 2 Comments

CPU Design for Tinkerers

One of the biggest steps in the history of computing was the integration of an entire CPU on a single chip. The microprocessor thus born made the microcomputer possible, which would eventually bring affordable personal computing to the masses. This technological advancement came with just one small downside: the new microprocessors were even more of a black box (often literally) than the older multi-component CPU boards. One consequence of this is that today the study of CPU design is usually purely abstract, with few opportunities for hands-on exercises.

In 1994, Bradford J. Rodriguez published the design of an experimental “old-school” CPU board, intended for teaching fundamentals of CPU design. His paper “A Minimal TTL Processor for Architecture Exploration” discusses the basic design of the “Pathetic Instruction Set Computer” (PISC), plus a series of increasingly complex design extensions.

The most basic PISC processor can be built from 22 standard TTL chips plus some external RAM and ROM, which makes its construction a realistic student project. Its design is fully static, meaning that it can be operated at a variety of clock speeds and even single-stepped to investigate its internal operations.

The PISC uses only 16 internal control signals. This makes it feasible to define a 16-bit wide instruction set, which is simply made up of the bit pattern that needs to be fed into the internal control lines to operate the processor. Hence, no instruction decoding is required, which further simplifies the design.

The PISC has a 16-bit wide ALU, which is constructed from four 4-bit wide 74181s operating on different bit groups in parallel. This ALU provides all the usual add, subtract, and logic functions, 32 different operations in total. Selecting a specific ALU operation requires 5 control lines, which are programmed using 5 bits of each instruction word. The inputs and output of the ALU are connected to a register file of eight 16-bit registers. Which registers are selected as inputs and output is determined by three control lines each, adding another 9 bits to the instruction word. Finally, in addition to these internal operations, which only involve the ALU and the register file, the PISC also has some special logic to enable loads and stores from and to memory, which are selected using the remaining two instruction bits. The program counter is implemented as one of the eight registers, which hence becomes a special purpose register. Its direct manipulation by the ALU or by a load instruction can be used to implement unconditional jumps.
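
Adding up those field widths (5 + 3 + 3 + 3 + 2) nicely fills the 16-bit instruction word. Purely as an illustration – the actual field order and bit positions in Rodriguez’ paper may well differ – the layout could be pictured like this:

/* Hypothetical layout of a PISC-style 16-bit instruction word, just to
 * add up the field widths described above. The bits drive the control
 * lines directly -- no instruction decoder. Field order and bit
 * positions here are illustrative only, not taken from the paper. */
#include <stdint.h>

struct pisc_insn {
    uint16_t alu_op   : 5;   /* selects one of the 32 ALU functions (74181 control lines) */
    uint16_t src_a    : 3;   /* first ALU input register (0-7)                            */
    uint16_t src_b    : 3;   /* second ALU input register (0-7)                           */
    uint16_t dest     : 3;   /* destination register; one of the eight doubles as the PC  */
    uint16_t mem_mode : 2;   /* internal operation / memory load / memory store           */
};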

The basic PISC processor includes most features of a regular CPU, but it also presents some critical deficiencies: among others, it lacks conditional branch instructions and a way to load literal values from the program store, and each instruction requires two separate fetch and execute cycles. The second part of Rodriguez’ paper discusses different ways to overcome some of these deficiencies without adding significant complexity to the PISC design. Some of these additions are rather straightforward, and are recommended as exercises for students, while others lead directly into advanced topics of CPU design, such as pipelining, or the RISC versus CISC discussion.

I just stumbled upon Rodriguez’ paper by chance, but read it with great fascination. Having some minimal background in electronic engineering, I found the PISC design an eye-opener that helped me better understand the interactions between the ALU, the registers, and all the rest that acts together as the CPU control unit. I think that actually building a PISC and trying to optimize it might be a fun exercise. Unfortunately, it would also be rather time consuming… and time is something in rather short supply for me these days. But anyhow, it was a very enjoyable and informative read! Admittedly, it’s a bit geeky… but if that’s the way you feel, have a look at it.

Posted in ITE 221 | Comments Off on CPU Design for Tinkerers

On Unicode

Unicode, particularly Unicode in UTF-8 encoding, is the most widely used character encoding today. It has solved one of the most annoying compatibility problems of the 1980s. Everyone is using it. Yet its details are still often misunderstood.

Back in 2003 Joel Spolsky published an excellent article on his site “Joel on Software”, covering “The Absolute Minimum Every Software Developer Absolutely, Positively Must Know About Unicode and Character Sets (No Excuses!)”. This post is an attempt to summarize some of the highlights of that article even further. Unfortunately this means that I will not even cover the absolute minimum here…

Joel starts out his article by giving a historical perspective: he writes about the origins of the ASCII code, and how it led to various ASCII extensions for use with different languages. These extensions, called code pages, define the upper 128 possible characters in a byte that are left undefined by ASCII. Of course, this led to tremendous complications whenever the code page of a document was not known. Unicode was devised as a solution to the code page problem. It defines a single standardized character set not bound by the 256-character limit, which could grow to eventually include all characters of all known languages.

One of the most common misconceptions about Unicode is that it simply defines a 16-bit code reserving 2 bytes per character, and therefore allowing 65,536 different characters instead of 256. Actually, things are a bit more complicated than this: In Unicode, characters do not map directly to some strictly defined bit pattern to be stored in memory. Instead, characters map to something called a code point.

Not surprisingly, Latin letters map to code points that are identical to their ASCII codes. Code points are written in a special notation; for example, the code point of the letter ‘A’ is usually written U+0041. The ‘U’ indicates that this is Unicode, and the four digits are in hexadecimal. The common convention of using four digits is a hint that Unicode was indeed originally envisioned as a 16-bit coding scheme, but today this is no longer a limit.

How are code points represented in memory? Restricting oneself to 16-bit code points, one can simply write out 16 bits per character in sequence. This is called UTF-16 encoding. So “Hello” may become the byte sequence 00 48 00 65 00 6C 00 6C 00 6F. That is, if we interpret 16-bit integers in big-endian byte order! On little-endian machines, the same string might be stored as 48 00 65 00 6C 00 6C 00 6F 00. As a result, even though characters map uniquely to code points and we have restricted ourselves to UTF-16, we still have ended up with two different actual encodings.

As long as the text stays on a single machine, the two possible UTF-16 byte orders cause no problem. But if one transfers Unicode text from one machine to another, the receiver must somehow be informed about the byte order used. For this purpose, the special code point U+FEFF, called the Unicode Byte Order Mark (BOM), was introduced. This code point does not represent any printable character, but is supposed to be placed at the very beginning of each UTF-16 document for byte order identification. Depending on whether the document starts with the bytes FE FF or FF FE, a program can tell whether it was stored in big or little-endian byte order.

The complication with the two possible byte orders dampened the initial enthusiasm for Unicode. Another perceived problem was the presence of plenty of zero bytes, which wasted memory and prevented the zero byte from being used as an end-of-string marker. As a result, years passed with little Unicode adoption. Finally, a new Unicode encoding called UTF-8 was invented. UTF-8 uses only one byte for code points up to 127, and more bytes (up to six) for higher code points. UTF-8 is very compact and familiar for English language users, since English documents in UTF-8 look identical to their ASCII representation. Also, zero bytes are reserved for use as end-of-string markers again, allowing much software written for ASCII to handle Unicode with little or no change.
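
To make the variable-length scheme a bit more concrete, here is a small sketch of how a single code point can be turned into UTF-8 bytes (restricted to the one- to four-byte forms, which cover all code points defined today):

/* Encode one Unicode code point as UTF-8. Returns the number of bytes
 * written to 'out' (1 to 4), or 0 for code points beyond U+10FFFF.
 * ASCII code points come out as single, unchanged bytes -- which is why
 * plain English text looks identical in ASCII and UTF-8. */
#include <stdint.h>

int utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp <= 0x7F) {                       /* U+0000 .. U+007F: 1 byte  */
        out[0] = (unsigned char)cp;
        return 1;
    } else if (cp <= 0x7FF) {               /* U+0080 .. U+07FF: 2 bytes */
        out[0] = 0xC0 | (cp >> 6);
        out[1] = 0x80 | (cp & 0x3F);
        return 2;
    } else if (cp <= 0xFFFF) {              /* U+0800 .. U+FFFF: 3 bytes */
        out[0] = 0xE0 | (cp >> 12);
        out[1] = 0x80 | ((cp >> 6) & 0x3F);
        out[2] = 0x80 | (cp & 0x3F);
        return 3;
    } else if (cp <= 0x10FFFF) {            /* up to U+10FFFF: 4 bytes   */
        out[0] = 0xF0 | (cp >> 18);
        out[1] = 0x80 | ((cp >> 12) & 0x3F);
        out[2] = 0x80 | ((cp >> 6) & 0x3F);
        out[3] = 0x80 | (cp & 0x3F);
        return 4;
    }
    return 0;
}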

Most file formats nowadays use UTF-8. The only disadvantage of UTF-8 is that characters have variable length, making random access within strings complicated. Therefore, many programs still use UTF-16 internally, even though they import and export data in UTF-8. Besides UTF-16 and UTF-8, other encoding conventions, such as UTF-7 and UTF-32, have been defined. However, these less common encodings are rarely seen outside of a few niche applications.

After covering the fundamentals of Unicode in much more detail than given here, Joel goes on in his article to explain how to apply this knowledge to HTML, specifically the “http-equiv” meta-tag. I am not going to repeat any of this, since I could not put it in better words anyway. If you do not consider yourself a Unicode guru, and have not yet read Joel’s article, do it now! After all, as Joel states in his title: “It’s The Absolute Minimum… (No Excuses!)”

Posted in ITE 221 | Comments Off on On Unicode

Truly Old-School Computing

My very first computing experience was tinkering around with a friend’s Sinclair ZX81. That was around 1985, I believe. In a moment of nostalgia, I recently tried out a ZX81 emulator found on the Web: it made me re-live some sweet childhood memories while trying to remember those old BASIC commands. However, it did not take long before the severe limitations of that primitive computer became apparent. The ZX81 was already considered obsolete back when I first got to know it. The mid-1980s were the era of home computers like the Commodore 64, which allowed playing arcade style games, something unthinkable on the ZX81. All this got me thinking: if the ZX81 was that primitive, how primitive were its predecessors, such as the minicomputers of the 1970s?

I remembered an older colleague telling me stories from when he learned to program assembler on a DEC PDP-11. Of course I had heard about that legendary machine before, but only now realized how little I actually knew about it. Time to consult Google and Wikipedia… One thing that struck me while browsing was that in most pictures the PDP-11 appeared to be just a tall cabinet with a bunch of switches on its front. I knew that there was usually some sort of terminal attached to it, but from the pictures it appeared as if that was just an optional component. Could this machine be programmed through those switches alone? How would that actually have been done?

Luckily, DePauw University has made available a series of YouTube videos, which address exactly these questions: Programming the PDP-11

The humorous background storyline of these videos might not be to everyone’s taste, but I found it quite amusing: a student frustrated by debugging one of her programs is invited by her professor to try out the department’s new “holodeck” to time travel into the 1970s and experience how hard programming used to be back then. On their trip they encounter a hacker who is bragging about his newest toy, a PDP-11/10 with an incredible memory of 32 kilobytes. He agrees to give the time travelling student a quick programming lesson, which starts with entering a machine language program by flipping the switches on the front of the machine. In later videos the lesson progresses through more advanced programming techniques using paper tapes, an editor, and a two-pass assembler.

I greatly enjoyed this set of instructional videos, and recommend them to anyone interested in computing history. To me they finally answered my questions about the mysterious switches, and once again reminded me of how far we have come in just three decades. Also, that old ZX81 does not seem all that primitive anymore.

Posted in ITE 221 | Comments Off on Truly Old-School Computing