This is the first of two articles on adding native encryption to the Unix ‘dump’ application. We begin with an overview of the Fast File System (FFS) and the format of DUMP tapes/files.
Fast File System
Most non-journaled Linux filesystems, including ext2, are ultimately derived from the Unix Fast File System (FFS). Some journaled filesystems, e.g., ext3, are also ultimately derived from the FFS. The FFS layout strongly drives the format of DUMP tapes/files. The FFS dates to the late 70s, a time when memory was extremely constrained and a large hard disk might be in the single megabyte range.
The most important thing about the FFS is that it separates the disk into inodes and data blocks. The disk space required for inodes is allocated during formatting and cannot be modified. In the past, files were typically fairly small and it was possible to run out of inodes before you filled a disk. Today files are so large that many people recommend reducing the amount of space allocated to inodes, since it can represent a substantial loss of usable disk space.
An inode contains all of the standard metadata about a file (owner, permissions, etc.) and a list of the indexes of the data blocks that contain the contents of the file. The inode does not contain the name of the file.
The structure of the data block list is highly optimized for small files. If more than 14 (iirc) data blocks are required then the final two slots contain the index of data blocks that contain the first- and second-tier indirect tables. (I’m not 100% certain that all filesystems store the indirect tables in the data blocks – some or all may store them in the inode space.)
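To make the direct/indirect split concrete, here is a minimal sketch of how a block number within a file maps onto the inode's slots. The slot count (12) and the number of pointers per indirect block (256) are illustrative assumptions, not values taken from any particular filesystem, and `locate_block` is a hypothetical helper.

```python
def locate_block(n, direct_slots=12, pointers_per_block=256):
    """Describe how the n-th (0-based) data block of a file is reached.

    direct_slots and pointers_per_block are assumed values for illustration.
    """
    if n < 0:
        raise ValueError("block number must be non-negative")
    if n < direct_slots:
        # Small files: the inode points straight at the data block.
        return ("direct", n)
    n -= direct_slots
    if n < pointers_per_block:
        # First-tier table: one extra lookup.
        return ("single-indirect", n)
    n -= pointers_per_block
    if n < pointers_per_block ** 2:
        # Second-tier table: a table of tables, two extra lookups.
        return ("double-indirect", n // pointers_per_block, n % pointers_per_block)
    raise ValueError("beyond the double-indirect range covered by this sketch")


if __name__ == "__main__":
    for block in (0, 11, 12, 267, 268):
        print(block, locate_block(block))
```

Small files never leave the `direct` branch, which is why the layout is cheap for them.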
The FFS does not require that all data blocks be allocated; it allows there to be ‘holes’ in a file. These are called sparse files. This might seem useless but it can greatly simplify some tasks. In fact the SQLite3 library used by, e.g., Firefox, will create sparse files.
In these files a read operation on a hole simply returns a buffer containing null values. A write operation will trigger allocation of the necessary space. This is important to keep in mind since a sparse file can be far larger than the size of the media. The backup format needs to keep this in mind or else you may have a file that can’t be backed up or restored because of its size.
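A quick way to see this behavior is to create a sparse file and compare its apparent size with the space actually allocated. This is a minimal sketch that assumes a POSIX system and a filesystem that supports holes; the helper names are made up for illustration.

```python
import os
import tempfile


def make_sparse_file(path, size):
    """Create a file whose apparent size is `size` while writing a single byte."""
    with open(path, "wb") as f:
        f.seek(size - 1)   # jump past the end without writing anything
        f.write(b"\x01")   # only this last byte is actually written


def sizes(path):
    """Return (apparent size, bytes allocated on disk) for `path`."""
    st = os.stat(path)
    # st_blocks is reported in 512-byte units on POSIX systems.
    return st.st_size, st.st_blocks * 512


def read_at(path, offset, length):
    """Read `length` bytes starting at `offset`."""
    with open(path, "rb") as f:
        f.seek(offset)
        return f.read(length)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "sparse.bin")
        make_sparse_file(path, 16 * 1024 * 1024)
        apparent, allocated = sizes(path)
        print("apparent size:", apparent)
        print("allocated    :", allocated)
        print("hole reads as zeros:", read_at(path, 0, 16) == b"\x00" * 16)
```

On a filesystem with sparse-file support the allocated figure should be far smaller than the apparent size; on one without it the two will be roughly equal.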
The data blocks contain directories and files. Directories are standard files that contain a simple list of filenames and inode numbers. The first two entries in a directory are always ‘.’ (the directory itself) and ‘..’ (the parent directory). An inode can be referenced by more than one directory entry; each reference is called a ‘hard link’. (An inode may instead contain a reference to another directory entry, i.e., a path. In this case the reference is called a ‘soft link’.)
The original FFS stored directory entries as a simple unordered list, and performance could be seriously degraded if a directory held more than a hundred entries. More recent designs added optimizations such as keeping the directory entries sorted or even creating a tree structure.
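The difference between the two kinds of link is easy to observe from user space. The sketch below assumes a POSIX system where the temporary directory allows both hard links and symbolic links; `link_demo` and the file names are hypothetical.

```python
import os
import tempfile


def link_demo(directory):
    """Create a file, a hard link and a soft link, and report what they share."""
    original = os.path.join(directory, "original.txt")
    hard = os.path.join(directory, "hard.txt")
    soft = os.path.join(directory, "soft.txt")

    with open(original, "w") as f:
        f.write("hello\n")

    os.link(original, hard)      # second directory entry for the same inode
    os.symlink(original, soft)   # separate inode that stores a path

    return {
        "same_inode": os.stat(original).st_ino == os.stat(hard).st_ino,
        "link_count": os.stat(original).st_nlink,
        # lstat() inspects the link itself instead of following it.
        "symlink_is_separate_inode": os.lstat(soft).st_ino != os.stat(original).st_ino,
        "symlink_target": os.readlink(soft),
    }


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        for key, value in link_demo(d).items():
            print(key, "=", value)
```

A backup tool has to notice the shared inode number, otherwise it stores the contents of a hard-linked file once per name.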
Files are… files. Arbitrary data.
More recent designs also store extended attributes in data blocks. These are arbitrary attributes associated with an inode, e.g., SELinux labels.
Limitations of TAR and ZIP formats
There are two major limitations of the TAR and ZIP formats with FFS files. The first limitation is that most implementations of TAR do not intelligently handle sparse files. This means that sparse files are blown up to their full size in the archives and subsequently extracted at their full size. A real-world example of this is creating a 20 GB sparse file, mounting it via a loopback device and then installing a Linux distribution on the virtual device. The virtual system will see itself on a 20 GB disk but the actual disk space required may be only 3-4 GB.
The second limitation is that TAR does not support extended attributes. I think ZIP does have support for extended attributes via a standard extension but many implementations do not implement it.
The DUMP format is a streaming format based on a low-level understanding of the FFS format. It has four segments:
- CLRI (start of volume marker, bitmap)
- BITS (bitmap)
- list of INODE (inode information) and ADDR (file data)
- END (end of volume marker)
(In addition there’s a TAPE segment discussed below.)
The CLRI and BITS bitmaps contain information used by incremental backups. I will not discuss them.
The next segment contains a sequence of INODE and ADDR segments. The INODE segment contains the file attributes and a bitmap indicating any ‘holes’ in the file. This means that an INODE segment may require more than a single block. If the inode is associated with data the INODE segment is followed by one or more ADDR segments that contain the contents of the file, extended attributes, etc. Like TAR the data blocks must contain the entire disk block and are neither compressed nor encrypted. (ZIP allows the file contents to be compressed and/or encrypted although the record header and footer must not be.) There can be more than one ADDR segment per inode, e.g., if there are extended attributes.
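The idea of pairing a hole map with only the allocated blocks can be sketched as follows. This is not the actual on-tape layout: the block size, the list of booleans standing in for the bitmap, and the function names are all assumptions made for illustration.

```python
import tempfile

BLOCK = 1024  # hypothetical block size for this sketch


def pack(blocks):
    """blocks is a list where None marks a hole and bytes marks real data.

    Returns (present_map, payload): one flag per block plus only the
    data blocks, concatenated in order.
    """
    present = [b is not None for b in blocks]
    payload = b"".join(b for b in blocks if b is not None)
    return present, payload


def restore(present, payload, out):
    """Write the data blocks at their original offsets, seeking over holes."""
    pos = 0
    for i, has_data in enumerate(present):
        if has_data:
            out.seek(i * BLOCK)
            out.write(payload[pos:pos + BLOCK])
            pos += BLOCK
    # Give the file its full length even when it ends in a hole.
    out.truncate(len(present) * BLOCK)


def roundtrip(blocks):
    """Pack and restore through a temporary file; return what was restored."""
    present, payload = pack(blocks)
    with tempfile.TemporaryFile() as out:
        restore(present, payload, out)
        out.seek(0)
        return present, len(payload), out.read()
```

Because the holes are never written, the archive holds only the allocated blocks and the restored file can stay sparse on a filesystem that supports it.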
The DUMP utility has an optimization that all directory entries are written to the tape before the other regular (and special) files. This can be used, together with an external index file, to seek directly to any desired file.
The END record indicates the end of the volume. For historic reasons involving the limitations of tape drives there will typically be multiple END records. There is no ‘end of archive’ marker.
Magnetic tapes were the only backup media used in the early days of Unix. (Earlier backup media included punch cards and paper tape!) The earliest media were reel-to-reel tapes, with self-contained cartridges introduced later.
Tape media has a number of physical limitations: it can be stretched or broken, different drives may be calibrated slightly differently, etc. Sequential access, e.g., for backups, is straightforward but in actual use it’s often necessary to seek to a specific location on the tape, read it, do some processing, and then overwrite the tape. There are always positioning errors. Finally, even if you’re careful, a spot defect in the media during manufacturing may cause single-bit errors.
To address all these problems tapes are written in small segments: a bit of empty space, a header, a payload of ca. 10 blocks, then some more empty space. This is a practical tradeoff between performance (you want larger tape segments) and reliability (you can recover data starting at the next tape segment). This is handled by a mixture of software and hardware.
The DUMP format discussed above is modified so that a stream is tokenized in the same manner. That is, the data stream is broken into a number of tape segments consisting of a TAPE segment header followed by a ‘blocksize’ number of data blocks. Historic archives may have a small blocksize but new files will routinely have a blocksize of 32k blocks or even higher.
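The wrapping step can be sketched in a few lines. The header here (a magic value, a sequence number and a block count) is a made-up stand-in for the real TAPE header, and the 1 KiB block size is an assumption; only the overall shape, a header followed by a fixed number of blocks, follows the description above.

```python
import struct

BLOCK_SIZE = 1024                  # assumed block size for this sketch
HEADER = struct.Struct(">4sII")    # hypothetical header: magic, sequence, block count


def to_segments(stream, blocks_per_segment):
    """Split a byte stream into segments of header + up to N whole blocks."""
    # Pad with zeros so the stream is a whole number of blocks.
    remainder = len(stream) % BLOCK_SIZE
    if remainder:
        stream += b"\x00" * (BLOCK_SIZE - remainder)

    segment_bytes = blocks_per_segment * BLOCK_SIZE
    segments = []
    for seq, start in enumerate(range(0, len(stream), segment_bytes)):
        payload = stream[start:start + segment_bytes]
        header = HEADER.pack(b"TAPE", seq, len(payload) // BLOCK_SIZE)
        segments.append(header + payload)
    return segments


def from_segments(segments):
    """Strip the headers and concatenate the payloads again."""
    out = bytearray()
    for segment in segments:
        magic, _seq, nblocks = HEADER.unpack_from(segment)
        if magic != b"TAPE":
            raise ValueError("not a segment produced by to_segments()")
        out += segment[HEADER.size:HEADER.size + nblocks * BLOCK_SIZE]
    return bytes(out)
```

Neither function looks inside the payload, so the code that produces the INODE and ADDR records does not need to know that segmentation happens at all.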
The TAPE segment contains the dump date, hostname, devname (e.g., “/dev/sda3”), filesys (e.g., “/home”), dump level (0-9), human readable label, and misc. other values.
It cannot be overemphasized that breaking the archive into tape segments is entirely blind to the contents of that archive. It is nothing like the per-file headers and footers in other archive formats. In practice everyone writes their software to use the format specified above and then wraps it in a shell that handles nothing but the tape segments.
Tape compression was originally supported at the hardware level – the tape drive would write an uncompressed header followed by the compressed data. This is blind compression and makes no effort to understand what the data actually contains. This is very different from ZIP compression where each file is compressed separately and in total.
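The contrast between the two approaches can be illustrated with zlib standing in for whatever algorithm a tape drive uses (an assumption; hardware compression schemes differ). The function names are hypothetical.

```python
import zlib


def compress_per_file(files):
    """ZIP-style: each file is compressed on its own, independently of the others."""
    return {name: zlib.compress(data) for name, data in files.items()}


def compress_stream(chunks):
    """Blind compression: one compressor sees an opaque sequence of chunks.

    It neither knows nor cares where one file ends and the next begins.
    """
    compressor = zlib.compressobj()
    out = [compressor.compress(chunk) for chunk in chunks]
    out.append(compressor.flush())
    return b"".join(out)


if __name__ == "__main__":
    files = {"a.txt": b"hello " * 100, "b.txt": b"world " * 100}
    per_file = compress_per_file(files)
    stream = compress_stream(files.values())
    print({name: len(blob) for name, blob in per_file.items()})
    print("stream:", len(stream))
```

With per-file compression any single file can be recovered by itself. With the blind stream you must decompress from the start of the compressed region to reach anything inside it.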