Directory records

Primary records

Primary directory records define the beginning of a directory entry, and may be followed by secondary records that add more information to the same entry.

Boot Record

On-disk data structure:

  • 32-bits = Entry type (always 1)
  • 32-bits = Magic ("CLFS")
  • 32-bits = Bytes per cluster
  • 64-bits = Total Number of clusters
  • 64-bits = First cluster of root directory
  • 36 bytes = Volume name

Description:

The first sector of the disk must contain a boot record. Since the boot record is small (a single 64-byte record), the rest of the first sector can also be used for the root directory. The boot record was defined as a directory record so that it can be included within the root directory (or any other directory, for that matter). If the First cluster of root directory field is zero, the root directory begins in the first sector, together with the boot record. If it is not zero, the boot record is not part of the root directory, which has to be read from the specified cluster.

The volume name field holds 36 bytes, which is 36 characters for a name made only of ASCII characters; UTF-8 may use more than one byte per character, so fewer characters may fit. Should a longer name be given to a volume, the boot record may be followed by Name records containing the additional characters (see Secondary Records for a description of the Name record).
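
As a rough sketch, the following C code decodes a boot record from a 64-byte buffer. It assumes that integers are stored little-endian and that the magic is the four ASCII bytes 'C', 'L', 'F', 'S' in that order; the text specifies neither. The names `clfs_boot`, `clfs_parse_boot`, and `clfs_root_is_boot_sector` are hypothetical. Fields are read at explicit byte offsets because a naive C struct would insert padding before the first 64-bit field, which sits at offset 12.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory form of the 64-byte boot record.
   Assumption: multi-byte integers are little-endian on disk. */
struct clfs_boot {
    uint32_t bytes_per_cluster;
    uint64_t total_clusters;
    uint64_t root_first_cluster;
    char     volume_name[37];   /* 36 on-disk bytes + terminator added in memory */
};

static uint32_t rd32(const uint8_t *p) {
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

static uint64_t rd64(const uint8_t *p) {
    return (uint64_t)rd32(p) | (uint64_t)rd32(p + 4) << 32;
}

/* Returns 0 on success, -1 if the record is not a boot record.
   Offsets: type 0, magic 4, bytes/cluster 8, total 12, root 20, name 28. */
int clfs_parse_boot(const uint8_t rec[64], struct clfs_boot *out) {
    if (rd32(rec) != 1)                   /* Entry type must be 1 */
        return -1;
    if (memcmp(rec + 4, "CLFS", 4) != 0)  /* assumed byte order of the magic */
        return -1;
    out->bytes_per_cluster  = rd32(rec + 8);
    out->total_clusters     = rd64(rec + 12);
    out->root_first_cluster = rd64(rec + 20);
    memcpy(out->volume_name, rec + 28, 36);
    out->volume_name[36] = '\0';          /* name may fill all 36 bytes */
    return 0;
}

/* Zero means the root directory starts in this same sector. */
int clfs_root_is_boot_sector(const struct clfs_boot *b) {
    return b->root_first_cluster == 0;
}
```

Only the first 36 bytes of the volume name are copied here; a longer name would continue in the Name records that follow the boot record.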

File Record 

On-disk data structure:

  • 32-bits = Entry type (always 2)
  • 32-bits = File Attributes
  • 64-bits = File Size
  • 64-bits = First Cluster
  • 32-bits = Number of contiguous clusters
  • 32-bits = Creation Date and Time
  • 32-bits = Modified Date and Time
  • 32-bits = Extended Create/Modified time (LoWord=Modified, HiWord=Created)
  • 32-bits = Name CRC32
  • 32-bits = Case Folded CRC32
  • 16 bytes = File Name in UTF-8 encoding
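
To make the byte layout concrete, here is a hypothetical C declaration of the File Record. Unlike the boot record, every field of the File Record falls on a naturally aligned offset, so common compilers add no padding; the `_Static_assert` lines check that. Mapping it directly onto disk bytes assumes a little-endian host and little-endian on-disk integers, which the text does not state. The attribute constants follow the bit assignments given in the field descriptions, and the extended-time helpers follow the HiWord = Created, LoWord = Modified split; all of these names are illustrative, not part of the specification.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C view of the 64-byte File Record. */
struct clfs_file_record {
    uint32_t entry_type;       /* always 2 */
    uint32_t attributes;
    int64_t  file_size;        /* signed; negative values reserved */
    int64_t  first_cluster;
    int32_t  num_clusters;     /* clusters in the first extent */
    int32_t  created;          /* seconds since 1970-01-01, as time() */
    int32_t  modified;
    uint32_t ext_time;         /* HiWord = created, LoWord = modified */
    uint32_t name_crc32;
    uint32_t name_fold_crc32;
    char     name[16];         /* UTF-8; null-terminated unless exactly 16 bytes */
};

_Static_assert(sizeof(struct clfs_file_record) == 64, "record must be 64 bytes");
_Static_assert(offsetof(struct clfs_file_record, name) == 48, "name at offset 48");

/* Attribute bits (bit 0 = least significant bit). */
enum {
    CLFS_ATTR_DIRECTORY        = 1u << 0,
    CLFS_ATTR_HIDDEN           = 1u << 1,
    CLFS_ATTR_IMMUTABLE        = 1u << 2,
    CLFS_ATTR_CASE_INSENSITIVE = 1u << 3,
    CLFS_ATTR_DIR_CASE_INSENS  = 1u << 4
};

/* Packs the two 16-bit sub-second values into the extended-time word. */
static inline uint32_t clfs_pack_ext_time(uint16_t created_frac, uint16_t modified_frac) {
    return (uint32_t)created_frac << 16 | modified_frac;
}

static inline uint16_t clfs_ext_created(uint32_t w)  { return (uint16_t)(w >> 16); }
static inline uint16_t clfs_ext_modified(uint32_t w) { return (uint16_t)(w & 0xFFFFu); }
```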

The File record contains information regarding a file or subdirectory. The meaning of each field is as follows:

  • File Attributes is a 32-bit field (bit 0 is the least significant bit) defined as follows:
    • Bit 0: Directory - Set when the file in question is a directory.
    • Bit 1: Hidden - Set when the file should not be shown in normal directory listings.
    • Bit 2: Immutable - When this bit is set, any attempt to modify the file, including renaming or deleting it, should fail.
    • Bit 3: Case-Insensitive - When set, the name of this file is not case-sensitive.
    • Bit 4: Dir-Case-Insensitive - When set, files created in this directory are case-insensitive by default.
    • Bits 5-31: Reserved for future use.
  • File Size is a signed 64-bit number indicating the size in bytes. Files having a size of zero bytes do not have any clusters associated with them, and therefore the Number of Clusters value should also be zero. If the Directory attribute is set, the File Size is meaningless and should be set to zero (unless this is the special entry '.', as explained below). Negative values are reserved for future use and should not be used. The maximum file size is (2^63)-1 bytes.
  • First Cluster and Number of Contiguous Clusters specify the first fragment of the file (also called an extent). If the file is not fragmented, this pair of values completely defines the physical location of the file on the disk. When the file is fragmented, this pair describes the first fragment, and the additional fragments should be stored in Fragment secondary records. An empty file has Number of Clusters equal to zero. In total, the fragments must contain enough clusters to hold 'File Size' bytes. Cluster numbers are always represented as signed 64-bit numbers; negative values are reserved for future use. The Number of Clusters is a signed 32-bit number.
  • Creation Date and Time and Modified Date and Time are self-explanatory. Both are stored as a signed 32-bit count of seconds elapsed since January 1, 1970, 00:00:00 UTC, not counting leap seconds. This is exactly the definition of time returned by the POSIX time() function. The Extended Create/Modified time field adds 16 bits per timestamp to store the fraction of a second where a high-resolution clock is available; implementations without one can set the extended time to zero. Of its two 16-bit halves, the high word belongs to the Creation time and the low word to the Modified time.
  • Name CRC32 is the CRC32 of the file name, computed over the entire name excluding the null terminator. The name must be encoded in UTF-8, and its individual bytes are fed to the CRC32 algorithm in sequence.
  • Case Folded CRC32 is the CRC32 of the file name after applying Unicode case folding to it, again computed over the UTF-8 bytes and excluding the null terminator.
  • The last 16 bytes in the record hold the null-terminated file name, encoded in UTF-8. If a name requires more than 16 bytes, additional Name secondary records should follow until the null terminator is reached or there are no more Name records. If a name exactly fills a record, leaving no room for the null terminator, no additional Name record is needed, so the implementation must handle a string that is not null-terminated but ends exactly at the end of a record. If any space remains in the record after the null terminator, it must be zero-padded.
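
The text does not name a particular CRC32 variant. Assuming the widely used IEEE 802.3 CRC-32 (the one used by zlib and PNG: reflected polynomial 0xEDB88320, initial value and final XOR 0xFFFFFFFF), a bitwise C sketch of the Name CRC32 looks like this. `clfs_crc32` and `clfs_name_crc32` are hypothetical names. The Case Folded CRC32 would apply the same function to the name after Unicode case folding, which requires full Unicode case-folding tables and is not shown.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Bitwise CRC-32 with the IEEE 802.3 / zlib parameters (assumed variant). */
uint32_t clfs_crc32(const unsigned char *data, size_t len) {
    uint32_t crc = 0xFFFFFFFFu;
    for (size_t i = 0; i < len; i++) {
        crc ^= data[i];
        for (int bit = 0; bit < 8; bit++)
            /* Shift right; XOR in the polynomial when the dropped bit was 1. */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
    return crc ^ 0xFFFFFFFFu;
}

/* Name CRC32: the UTF-8 bytes of the whole name, terminator excluded. */
uint32_t clfs_name_crc32(const char *utf8_name) {
    return clfs_crc32((const unsigned char *)utf8_name, strlen(utf8_name));
}
```

With these parameters, the CRC of the ASCII string "123456789" is the standard check value 0xCBF43926, which is a quick way to confirm an implementation uses the same variant.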

A File Record may be followed by various secondary records. The most common are the Name and Fragment records, described under Secondary Records.

The use of the File Record is slightly different when the file is a directory. By design, directories must be self-contained, so each directory must contain, in its first cluster, a File Record named '.' (a single period, which has historically represented the directory itself). The '.' File Record contains all the information about the directory. However, there must also be a File Record in the parent directory that links to this directory. That record is essentially a copy of the same information, except:

  • The Name field is set to the name the user gave the directory (rather than '.'), and there might be additional secondary Name records as needed.
  • The File Size must be set to zero, while the '.' entry holds the real directory size in bytes.
  • The Number of Clusters value is meaningless and should be ignored by the implementation. No Fragment secondary records are needed, since all that information belongs only in the '.' entry. The First Cluster, however, must correctly point to the first cluster of the directory.
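
A minimal consistency check for these rules might look like the following. `clfs_dir_fields` is a hypothetical subset of the File Record holding only the fields the check needs, and `clfs_parent_link_ok` is an illustrative name. The names and their CRCs are not compared, since they legitimately differ between the two records.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of the File Record used by this check. */
struct clfs_dir_fields {
    uint32_t attributes;
    int64_t  file_size;
    int64_t  first_cluster;
    int32_t  num_clusters;
};

#define CLFS_ATTR_DIRECTORY 0x1u

/* 'link' is the entry in the parent directory; 'dot' is the
   directory's own '.' entry. */
bool clfs_parent_link_ok(const struct clfs_dir_fields *link,
                         const struct clfs_dir_fields *dot) {
    if (!(link->attributes & CLFS_ATTR_DIRECTORY) ||
        !(dot->attributes & CLFS_ATTR_DIRECTORY))
        return false;
    if (link->file_size != 0)     /* the parent's copy always has size 0 */
        return false;
    /* link->num_clusters is meaningless and deliberately not compared. */
    return link->first_cluster == dot->first_cluster;
}
```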

Secondary Records

Secondary records hold additional data about a primary record. They are used when the primary record has insufficient space to store all the information, for example for long names or to describe the clusters occupied by heavily fragmented files. Secondary records cannot exist by themselves, only as part of a primary record, and they are located immediately after the primary record they belong to. The implementation shall assume that after a primary record, all secondary records belong to that entry until another primary record is found.

On-disk data structure:

  • 32-bits = Record Type (always > 255)
  • 60 bytes = Payload data, depending on the Record Type

Secondary records are identified by the 32-bit Record Type value. Any value > 255 indicates a secondary record, just as any value < 256 indicates a primary record.

The presence of any secondary record at the beginning of a directory (before the first primary record) is invalid; the implementation should ignore such records but at the same time should preserve them. This allows a traditional boot sector with code in its first few bytes, which might be present if an OS is booting from a CleanFS drive on an x86 platform (there, the BIOS typically loads the boot sector and jumps to its first byte, which is itself a JMP instruction to the rest of the code). However, the code in the boot sector must be crafted so that it doesn't create any primary records: the Record Type value at the start of every 64-byte record slot must be > 255. This is normally the case, but it must be verified. This constraint matters only if the specific platform implementation wants to place the root directory in the boot sector. For this specification, it suffices to say that the implementation must allow secondary records at the beginning of a directory and must ignore them but not remove them.
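
The grouping rule from the start of this section, together with the handling of leading secondary records, can be sketched in C as follows. The sketch assumes the 32-bit type field is little-endian at offset 0 of each 64-byte record, and it walks a fixed number of records because the text here does not say how the end of a directory is marked. `clfs_group_entries` and its output arrays are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

#define CLFS_RECORD_SIZE 64

/* Reads the 32-bit type of record i (little-endian byte order assumed). */
static uint32_t clfs_record_type(const uint8_t *recs, size_t i) {
    const uint8_t *p = recs + i * CLFS_RECORD_SIZE;
    return (uint32_t)p[0] | (uint32_t)p[1] << 8 |
           (uint32_t)p[2] << 16 | (uint32_t)p[3] << 24;
}

/* Fills primary[] with the index of each primary record and nsec[] with
   the number of secondary records that follow it (at most 'max' entries).
   *leading receives the number of secondary records before the first
   primary; they are skipped here and must be left untouched on disk.
   Returns the number of entries found. */
size_t clfs_group_entries(const uint8_t *recs, size_t count,
                          size_t *primary, size_t *nsec, size_t max,
                          size_t *leading)
{
    size_t i = 0, n = 0;
    while (i < count && clfs_record_type(recs, i) > 255)
        i++;
    *leading = i;
    while (i < count && n < max) {
        size_t p = i++;                 /* a primary record starts an entry */
        while (i < count && clfs_record_type(recs, i) > 255)
            i++;                        /* its secondaries follow directly */
        primary[n] = p;
        nsec[n] = i - p - 1;
        n++;
    }
    return n;
}
```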

Secondary records might be used in sequence, so implementations must be careful not to alter the order of secondary records of the same type. If secondary records must be reorganized, at least the relative order of same-type records must be preserved.

Empty Record

On-disk data structure:

  • 32-bits = Record Type (always = 256)
  • 60 bytes = Unused payload.

The Empty record serves no purpose and must be ignored by the implementation. Empty records may occur, for example, when a directory entry is modified and the new information requires fewer records than before. Rather than moving all entries to "shrink" the directory, the surplus records are marked as Empty by changing their type. Empty records are available for the OS to reuse.
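
A sketch of that shrink step, assuming the caller has already rewritten the first `new_n` secondary records of the entry and that the surplus records are the trailing ones. Only the type field is changed, as described above. The little-endian type encoding and the function name `clfs_shrink_entry` are assumptions.

```c
#include <stddef.h>
#include <stdint.h>

#define CLFS_RECORD_SIZE 64
#define CLFS_TYPE_EMPTY  256u

/* Writes a 32-bit record type at offset 0 (little-endian assumed). */
static void clfs_set_type(uint8_t *rec, uint32_t type) {
    rec[0] = (uint8_t)type;
    rec[1] = (uint8_t)(type >> 8);
    rec[2] = (uint8_t)(type >> 16);
    rec[3] = (uint8_t)(type >> 24);
}

/* The entry's secondary records start at index 'first' and number 'old_n';
   the rewritten entry needs only 'new_n' of them. The surplus records are
   retyped as Empty instead of shifting the rest of the directory.
   Payload bytes are left as they are. */
void clfs_shrink_entry(uint8_t *recs, size_t first, size_t old_n, size_t new_n) {
    for (size_t i = new_n; i < old_n; i++)
        clfs_set_type(recs + (first + i) * CLFS_RECORD_SIZE, CLFS_TYPE_EMPTY);
}
```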

Name Records 

On-disk data structure:

  • 32-bits = Record Type (always = 257)
  • 60 bytes = Name string in UTF-8 encoding.

Name records continue the name of a file. There must be enough Name records to hold the entire string, and they must be stored in the directory in the correct sequence to produce the complete name. For example, a name that requires 100 bytes, including the null terminator, needs a total of 2 secondary Name records, and the directory entry of the file will be stored as follows:

  • File Record: The first 16 bytes of the string
  • First Name Record: The next 60 bytes (total 76 bytes so far)
  • Second Name Record: The last 24 bytes (including the null terminator), followed by 36 bytes of zero padding.

After the null terminator, all remaining bytes must be filled with zeros. For security reasons, this prevents leftover memory contents from being stored in a directory entry.

Storing a string that fits exactly within the records, leaving no space for the null terminator, is allowed, and no additional record is needed. The implementation must handle the case where a string is terminated by the absence of further Name records rather than by a null terminator.
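
The arithmetic and reassembly rules above can be sketched in C. `clfs_name_records_needed` takes the name length without its terminator: because a name that exactly fills its last chunk needs no terminator, the count is ceil((len - 16) / 60) for names longer than 16 bytes, which gives 2 for the 100-byte example (99 bytes plus the terminator). `clfs_join_name` follows the File Record's 16 bytes with the Name payloads, assumed here to be passed as one contiguous buffer of 60-byte chunks. Both function names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define CLFS_NAME_IN_FILE   16   /* name bytes in the File Record */
#define CLFS_NAME_IN_RECORD 60   /* name bytes per Name record */

/* Name records needed for a name of 'len' bytes (terminator excluded). */
size_t clfs_name_records_needed(size_t len) {
    if (len <= CLFS_NAME_IN_FILE)
        return 0;
    return (len - CLFS_NAME_IN_FILE + CLFS_NAME_IN_RECORD - 1) / CLFS_NAME_IN_RECORD;
}

/* Joins the name, stopping at the first null byte or, if there is none,
   at the end of the last Name record. Writes at most out_size - 1 bytes
   plus a terminator and returns the full name length in bytes. */
size_t clfs_join_name(const char *file_name, const char *payloads,
                      size_t n_records, char *out, size_t out_size)
{
    size_t len = 0;
    for (size_t r = 0; r <= n_records; r++) {
        /* Chunk 0 is the File Record field; chunk r > 0 is Name record r - 1. */
        const char *chunk = (r == 0) ? file_name
                                     : payloads + (r - 1) * CLFS_NAME_IN_RECORD;
        size_t size = (r == 0) ? CLFS_NAME_IN_FILE : CLFS_NAME_IN_RECORD;
        const char *nul = memchr(chunk, '\0', size);
        size_t take = nul ? (size_t)(nul - chunk) : size;
        for (size_t i = 0; i < take; i++, len++)
            if (len + 1 < out_size)
                out[len] = chunk[i];
        if (nul)
            break;   /* terminator found: the name is complete */
    }
    if (out_size > 0)
        out[len < out_size ? len : out_size - 1] = '\0';
    return len;
}
```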

Fragment Record

On-disk data structure:

  • 32-bits = Record Type (always = 258)
  • 60 bytes = 5 times the following structure:
    • 64-bits = First cluster in fragment.
    • 32-bits = Number of clusters in fragment.

The fragment record can store up to 5 fragments of a file. A fragment is defined as a contiguous group of clusters. The implementation must always try to minimize the number of fragments by combining adjacent fragments into one where feasible.

The Number of Clusters in each fragment is a signed 32-bit number, so a single fragment can describe at most 2^31-1 clusters. If a file contains a contiguous run of more than 2^31-1 clusters, that run must be split into two or more fragments, even though the clusters are contiguous.

Unused fragment slots in a Fragment record must be filled with zeros.

A file can contain as many fragment records as needed to represent the entire chain of clusters.
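
Building the fragment list can be sketched as follows: adjacent runs are merged, runs longer than 2^31-1 clusters are split, the first fragment goes in the File Record, and the remainder is packed five per Fragment record. `clfs_extent` (an in-memory run with a 64-bit length), `clfs_build_fragments`, and the helper functions are hypothetical, and the input runs are assumed to be sorted by first cluster. `clfs_clusters_for_size` gives the minimum total number of clusters the fragments must cover, following the rule that all fragments together must hold 'File Size' bytes.

```c
#include <stddef.h>
#include <stdint.h>

#define CLFS_MAX_FRAG_CLUSTERS INT32_MAX   /* 2^31 - 1 */
#define CLFS_FRAGS_PER_RECORD  5

struct clfs_extent   { int64_t first; int64_t count; };  /* in-memory run */
struct clfs_fragment { int64_t first; int32_t count; };  /* one fragment; not the packed 12-byte disk layout */

/* Converts sorted runs into fragments. Returns the number of fragments
   written, or (size_t)-1 if 'out' has fewer than the needed 'cap' slots. */
size_t clfs_build_fragments(const struct clfs_extent *runs, size_t n,
                            struct clfs_fragment *out, size_t cap)
{
    size_t nf = 0, i = 0;
    while (i < n) {
        int64_t first = runs[i].first, count = runs[i].count;
        /* Merge following runs that start exactly where this one ends. */
        while (i + 1 < n && runs[i + 1].first == first + count) {
            count += runs[i + 1].count;
            i++;
        }
        i++;
        /* Split the merged run into pieces of at most 2^31 - 1 clusters. */
        while (count > 0) {
            int64_t take = count > CLFS_MAX_FRAG_CLUSTERS ? CLFS_MAX_FRAG_CLUSTERS : count;
            if (nf == cap)
                return (size_t)-1;
            out[nf].first = first;
            out[nf].count = (int32_t)take;
            nf++;
            first += take;
            count -= take;
        }
    }
    return nf;
}

/* The first fragment lives in the File Record; the rest go five per
   Fragment record, with unused slots zero-filled. */
size_t clfs_fragment_records_needed(size_t n_fragments) {
    if (n_fragments <= 1)
        return 0;
    return (n_fragments - 1 + CLFS_FRAGS_PER_RECORD - 1) / CLFS_FRAGS_PER_RECORD;
}

/* ceil(file_size / bytes_per_cluster), written to avoid overflow. */
int64_t clfs_clusters_for_size(int64_t file_size, uint32_t bytes_per_cluster) {
    return file_size / bytes_per_cluster + (file_size % bytes_per_cluster != 0);
}
```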