The XD2031 architecture explained

Posted by André on 2014-08-02.

Recently I wrote about refactoring a 20-year-old code base, namely the file system server code for my old Commodore computers (see here: Refactoring-20-year-old-code... ). Here is a description of the internal structure of that code. I present it in the form of an "architecture overview", a form that I think is missing from many a software project.

It sits in the middle between the outline of a project (which really is only boxes with names) and a component diagram, which is even more detailed, especially with object-oriented software. In my (not so humble) opinion the architecture overview gives the real frame of how the software should look, i.e. which modules there are and how the modules and their components interact at a high level. The architecture overview, however, is often skipped because projects are rushed to start, or have just grown into what they are over time, or are the brainchild of a single person who has it all in his or her head. This software used to be, in part, the latter case, so I find it fitting to write an architecture overview. Even for such a project it is good to have a write-up of the basic principles and structure of the software - if only to revisit it from time to time to see how it evolves. I'm not doing the overview in the full-blown, formalized form I use at work, but in a form that should be "right" for such a free-time project.

The system I'm going to describe is the "XD2031" firmware and server code, which emulates a disk drive for Commodore computers. You can find it at https://github.com/fachat/XD2031 . The system uses an AVR-based device (of which multiple types are supported) for the communication with the Commodore, as this requires fiddling around with hardware signal lines. It also has hard timing requirements (which are met using assembly language interrupt routines) that cannot be guaranteed on the average PC with its multitasking operating system. The AVR-based device then talks to the PC via a serial line connection (usually over USB).

Please note that it is not a 6502-based emulation. The Commodore disk drives were computers in their own right, with one (or, in older drives, even two) 6502 processors of their own. There are 6502 emulators for the AVR out there, and while it would probably be possible to emulate a real Commodore drive using a 6502 emulation, this is not the approach taken here. Instead the AVR receives the Commodore disk commands from the IEEE488 or IEC busses just like the Commodore drives do, but handles them natively in AVR (C-based) code.

I.e. the Commodore computer (e.g. a C64 via the serial IEC bus or a PET via the IEEE488 bus) sends commands like "OPEN file 'ABC' on drive 0 for channel 0 for reading" or "COPY over file 'ABC' on drive 0 to 'EFG' on drive 1" to the drive, or sends or receives file data on the opened channels. Note that Commodore drives know "drives", because the original Commodore disk drives actually were dual drives, with two drives for a single unit controller (similar to the PC's later A: and B: drives). The idea of the software discussed here is that it interprets these commands, and provides access to files on the PC file system, to files in disk images, or even to files on the internet. The following diagram shows the runtime architecture.

Both main parts of the architecture, the firmware and the server, are built in a modular way, so they can easily be extended. In the following I will discuss the architecture of each of the parts.

Firmware Architecture

The main principle of the firmware architecture is separation of concerns. Also implementation hiding is applied where useful, e.g. by separating out the provider implementation from the actual API, so you can replace one provider with another.

In the diagram orange boxes are already implemented, blue ones with dashed lines are possible, if not already planned. The arrows show the calling direction, so in the main loop the main module calls ieee, and that calls bus, and so on.

The layering separates out the different functionalities, from left to right:

Hardware interface: the IEEE488 and the IEC interfaces handle the hardware lines. These are separated out into device-specific directories, so you can have different layouts on different devices. But this code only does the low level handling - the actual IEEE488 and IEC protocol handling is done by the general IEEE and IEC modules, so there is no duplicate code in the device subdirectories. The "sock488" interface is a special case as it is only for the test driver.

The IEEE488 and IEC interfaces call the bus layer. This receives the transferred bytes from the IEEE/IEC layer, and provides the bytes to be sent to the Commodore computer. It handles the actual IEEE488 protocol with LISTEN, TALK and other commands. It collects the file names and commands, and executes them on the selected provider. File data is sent via the channel layer to the listening provider, or received from there.

Providers are selected by the bus by using either the drive number, which has a provider assigned to it, or a temporary provider named in place of the drive number, like the "tcp" provider in "tcp:localhost/telnet". I.e. you can assign a provider to a drive number - which can be 0-9 here - with the ASSIGN disk command. Providers can be either internal (like the SD card) or external. The external provider is used if no internal one is found, and defaults to the serial line packet communication to the PC server. The providers themselves basically "speak" the packet communication format as seen on the serial line to the server. I.e. they get read or write requests, as well as open, close, or command packets, and have to answer them appropriately.
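To make the selection rule a bit more concrete, here is a minimal sketch of how a name could be split into either a drive number or a temporary provider name. This is not the actual XD2031 parser: the type name_prefix_t, the function split_prefix() and the 8-byte limit for provider names are made up for illustration.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of splitting a name like "tcp:localhost/telnet". */
typedef struct {
    int drive;          /* 0-9 if a drive number was given, else -1 */
    char provider[8];   /* provider name such as "tcp", else "" */
    const char *rest;   /* text after the ':' (or the whole name) */
} name_prefix_t;

/* Split "<drive>:<name>" or "<provider>:<name>"; names without a
 * recognisable prefix are returned unchanged in .rest. */
name_prefix_t split_prefix(const char *name)
{
    name_prefix_t r = { -1, "", name };
    const char *colon = strchr(name, ':');

    if (colon == NULL) {
        return r;                       /* no prefix at all */
    }
    size_t len = (size_t)(colon - name);
    if (len == 1 && isdigit((unsigned char)name[0])) {
        r.drive = name[0] - '0';        /* single digit: drive number */
        r.rest = colon + 1;
    } else if (len > 0 && len < sizeof r.provider) {
        memcpy(r.provider, name, len);  /* anything else: provider name */
        r.provider[len] = '\0';
        r.rest = colon + 1;
    }
    return r;
}
```

With this sketch, split_prefix("0:ABC") yields drive 0 and the name "ABC", while split_prefix("tcp:localhost/telnet") yields the provider "tcp" and the address "localhost/telnet". The bus would then look up the provider assigned to the drive number, or the temporary provider by name.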

The direct and relfile modules are a bit special in this respect. Both are actually implemented as providers of their own. The direct provider is automatically used when a direct file "#" is opened. A direct file on a Commodore disk is used to directly access blocks on the disk, not files. Opening a direct file reserves a buffer to handle the requests. Read and write requests on that channel then go into the respective buffer. When a buffer command is then sent to the direct module (from the cmd module, this arrow is not shown), the buffer is, for example, read from or written to disk. As the buffer commands contain the drive number, its provider is determined and the request is sent there. So you could, for example, copy a block from a D64 image on the server to a D82 image on the SD card (once disk image functionality on SD cards is implemented...) by reading the block into the buffer from one drive and writing it from the buffer to another drive.

The relfile module again is special. A REL file in Commodore terms is a record-oriented file, and the only one where you can actually "seek" on a real Commodore drive. The relfile provider is used only when the actual file provider answers a file_open with the special return code saying that a REL file has been opened. This is necessary, as the Commodore drives could open REL files automatically without the code actually telling the drive to open a REL file (what a bummer!). The drive derives the file type from the metadata on disk. As we don't have this data - because the disk image might reside on the server, or REL files are supported without a disk image, like in an R00 file - we have to rely on the provider to tell us when a REL file has been detected. Then the relfile provider is inserted as a proxy provider in front of the actual provider. The relfile code also reserves a buffer to handle the relative file records, and then only reads and writes full records from or to the actual provider.
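The "full records only" idea of the proxy can be sketched as follows. This is an illustration under assumptions, not the XD2031 relfile code: relfile_t, rel_seek(), rel_put(), rel_flush() and the in-memory mem_write_at() stand-in for the real provider are all hypothetical, and padding unused record bytes with zeros is a simplification.

```c
#include <stddef.h>
#include <string.h>

#define REL_MAX_RECLEN 254   /* largest record length on Commodore drives */

/* Hypothetical "write at offset" call of the underlying provider. */
typedef int (*write_at_fn)(void *ctx, long offset,
                           const unsigned char *data, size_t len);

typedef struct {
    unsigned char record[REL_MAX_RECLEN]; /* buffer for one full record */
    size_t reclen;                        /* record length of this file */
    size_t fill;                          /* bytes received so far */
    long recno;                           /* zero-based record number */
    write_at_fn write_at;                 /* the proxied provider */
    void *ctx;
} relfile_t;

/* Position on a record and start with an empty (zeroed) buffer. */
void rel_seek(relfile_t *r, long recno)
{
    r->recno = recno;
    r->fill = 0;
    memset(r->record, 0, r->reclen);
}

/* Bytes from the bus only ever land in the record buffer. */
int rel_put(relfile_t *r, unsigned char byte)
{
    if (r->fill >= r->reclen) {
        return -1;                        /* record overflow */
    }
    r->record[r->fill++] = byte;
    return 0;
}

/* The underlying provider only ever sees one complete record. */
int rel_flush(relfile_t *r)
{
    return r->write_at(r->ctx, r->recno * (long)r->reclen,
                       r->record, r->reclen);
}

/* Stand-in provider: a plain memory area that is assumed large enough. */
int mem_write_at(void *ctx, long offset,
                 const unsigned char *data, size_t len)
{
    memcpy((unsigned char *)ctx + offset, data, len);
    return 0;
}
```

The point of the design is that the actual provider does not need to know anything about partial records: it just gets complete, fixed-size writes at offsets that are multiples of the record length.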

The firmware execution is pseudo-asynchronous. The main() loop just calls the different busses (ieee, iec and sock488) as available. These modules check their I/O lines, and forward commands and file operations, which in the end may send packets to the serial line. The loop, however, is not done at this level. When a packet is sent, a callback is registered and execution usually continues without waiting for the reply. So for single-direction files (like read-only or write-only OPENs) two buffers are used alternately, one being filled by the bus, the other being sent and waiting for a reply. Only some of the commands or open calls require immediate feedback and wait.
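The alternating use of two buffers for a write-only channel could look roughly like the following sketch. The names (buf_t, channel_t, channel_put(), channel_reply()) and the buffer size are assumptions for illustration, and fake_send() merely stands in for the real serial line code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BUF_LEN 64               /* assumed packet payload size */

typedef struct {
    uint8_t data[BUF_LEN];
    size_t len;
    bool in_flight;              /* sent, reply callback not yet seen */
} buf_t;

typedef struct {
    buf_t buf[2];
    int fill;                    /* index of the buffer the bus fills */
} channel_t;

/* Called by the bus for every byte received from the Commodore.
 * Returns false if both buffers are busy and the bus has to hold off. */
bool channel_put(channel_t *c, uint8_t byte, void (*send_fn)(buf_t *))
{
    buf_t *b = &c->buf[c->fill];

    if (b->in_flight) {
        return false;            /* other buffer is still waiting too */
    }
    b->data[b->len++] = byte;
    if (b->len == BUF_LEN) {
        b->in_flight = true;
        send_fn(b);              /* start sending, do not wait here */
        c->fill ^= 1;            /* continue with the other buffer */
    }
    return true;
}

/* Callback run when the reply for a sent buffer arrives. */
void channel_reply(buf_t *b)
{
    b->len = 0;
    b->in_flight = false;
}

/* Stand-in for the serial line: only counts the packets sent. */
int sent_count;

void fake_send(buf_t *b)
{
    (void)b;
    sent_count++;
}
```

While one buffer is on its way to the server, the bus keeps filling the other one. Only when both are in flight does the bus side have to stall until channel_reply() frees a buffer again.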

Server Architecture

The server architecture is, in a way, simpler than the firmware architecture. What it basically does is distribute the commands it receives from the devices to the providers and file handlers. To achieve this, endpoints are registered as drives, i.e. a provider type (like filesystem) with appropriate parameters (like path to the directory to serve). Once a file is opened on a drive, it is represented by a file structure that is registered for the channel given in the open call.

Again a main principle is separation of concerns, and implementation hiding. So once an endpoint or file is identified by the main dispatcher, it simply calls the relevant function on that structure, without knowing what implementation is actually behind it.

In the initialisation phase the serial and socket modules handle the initialisation of their respective devices. Later the devices are only used via their file descriptor. fscmd is the main command dispatcher. It receives wire packets and handles them. I.e. it identifies endpoints (for commands) or files (for data transfer), and calls the appropriate functions on them. The channel module is in fact just a registry for open files by channel number.
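The combination of a channel registry and implementation hiding can be sketched with a table of function pointers. This is a simplified illustration, not the actual server code: file_t, channel_set(), channel_find(), dispatch_read() and the memfile_t example implementation are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

#define MAX_CHANNELS 16          /* assumed size of the registry */

typedef struct file file_t;

/* What the dispatcher sees: functions to call, nothing else. */
struct file {
    int (*read)(file_t *f, char *buf, int len);
    void *impl;                  /* private data of the implementation */
};

static file_t *channels[MAX_CHANNELS];

/* Register an open file for a channel number. */
int channel_set(int chan, file_t *f)
{
    if (chan < 0 || chan >= MAX_CHANNELS) {
        return -1;
    }
    channels[chan] = f;
    return 0;
}

/* Look up the open file for a channel number, or NULL. */
file_t *channel_find(int chan)
{
    if (chan < 0 || chan >= MAX_CHANNELS) {
        return NULL;
    }
    return channels[chan];
}

/* The dispatcher does not know which implementation is behind f. */
int dispatch_read(int chan, char *buf, int len)
{
    file_t *f = channel_find(chan);
    return (f == NULL) ? -1 : f->read(f, buf, len);
}

/* One example implementation: a "file" backed by a constant string. */
typedef struct {
    const char *data;
    size_t pos;
} memfile_t;

int mem_read(file_t *f, char *buf, int len)
{
    memfile_t *m = f->impl;
    int n = 0;

    while (n < len && m->data[m->pos] != '\0') {
        buf[n++] = m->data[m->pos++];
    }
    return n;                    /* number of bytes actually read */
}
```

A file system file, a file inside a disk image or a TCP connection would each bring their own read function and private data, while dispatch_read() stays the same.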

The resolver is an important part. It resolves paths and file names into files. For this it recursively goes through the directory levels given in the path, looking for matching handlers and providers. At this stage, each level is checked to see whether it is an x00 file (P00, R00, etc.) that has the real file name in it, or whether it is a disk image that can be used as a directory. Here the providers and handlers are stacked, so that you can, for example, use a P00 file in a D64 disk image within a P00 file within a D82 disk image in a file system directory!

The provider is a registry for the providers and also handles the charset conversion for the providers (so that disk image providers get their Commodore file names without a single back-and-forth character set conversion...). The handler module is a registry for the file type handlers.

The fs, di, tcp and curl modules are the current set of providers. The disk image provider (di) is special, in that it wraps a disk image file from any of the other providers into an endpoint - so it can handle disk images on the file system, or within x00 files for example. The x00 module is the only handler currently available (it is for the Commodore-specific Pxx, Rxx, ... files that have DOS-compatible short names and carry the real name inside the file).
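To illustrate what "the real name is within the file" means, here is a small sketch that extracts the name from an x00 header. It assumes the layout commonly documented for these files (the 8-byte signature "C64File" including its terminating zero byte, a 16-byte zero-padded name at offset 8, and the REL record length at offset 25, for a 26-byte header); please verify this against a format description before relying on it. The function x00_real_name() is a made-up name and not the XD2031 handler.

```c
#include <stddef.h>
#include <string.h>

#define X00_HEADER_LEN 26        /* assumed header size, see text above */

/* Copy the real (Commodore) file name out of an x00 header.
 * Returns the REL record length byte (0 for non-REL files),
 * or -1 if the data does not look like an x00 file. */
int x00_real_name(const unsigned char *hdr, size_t len, char out[17])
{
    /* The literal "C64File" is 8 bytes long including its NUL. */
    if (len < X00_HEADER_LEN || memcmp(hdr, "C64File", 8) != 0) {
        return -1;
    }
    memcpy(out, hdr + 8, 16);    /* name is still in the Commodore charset */
    out[16] = '\0';
    return hdr[25];
}
```

A handler built on this would present the file under the name found in the header and hand out the file contents starting after the header, which is what allows it to be stacked on top of any provider.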

Future Outlook

The refactoring on the server side that brought the recursive resolver has opened up enormous opportunities. It is now possible to load and use a disk image file directly from an internet provider - if a caching handler is written to allow seeks on such files. Also a zip-file handler is now easily possible, or one that handles my Linux files with their ",P" extension to denote the file type (my own P00 replacement back then).

Other extensions could be the emulation of some of the more common memory-read or memory-write commands that allowed manipulating the Commodore disk drive's internal memory. This was used, for example, to change the unit number.

Some of the code is already shared between the server and the firmware, like the character conversion. In the future this could be extended to the name parser (so you can use the server console to enter commands), or even more important, to the disk image handling code, so that it can be used in both the firmware and the server.

For a further list of current issues and planned enhancements see the GitHub tracker at https://github.com/fachat/XD2031/issues .