Details: Published: 10 December 2012

Procedures

Mounting a volume

To mount a volume, the implementation must follow the sequence:

Read the first sector.
Scan the sector for a Boot Record
When found, Get the Cluster Size, Total Number of Clusters and the first cluster of the root directory.
Read the root directory as described in the section below.
If write access is required, open the free space bitmap (see section below) if it exists.
Check the 'Dirty Bit' status in the bitmap (Dirty Bit is the bit that corresponds to Cluster 0, which is always used).
If the Dirty Bit is set, then the volume wasn't unmounted properly and the free space bitmap needs to be rebuilt.

Opening a directory

Since directories are self-contained, the implementation must make no assumptions about a directory that is being opened. Only the First Cluster number is known from the File Record in the parent directory, or from the Boot Record (when the directory being open is the root dir).

The correct procedure to open a directory is:

Read first cluster.
Start reading records until a primary File Record is Found.
Verify that the name is exactly equal to '.' (a period followed by the null terminator).
Get First fragment's number of clusters.
Read secondary records, get additional fragments from Fragment Records.

Once all fragments are known, the directory is ready to be used. Please note that while this specification requires that the '.' File Record be in the first cluster of the directory, its secondary records might span multiple clusters. The implementation has to be able to read the next cluster from the newly discovered fragments as they are read and interpreted, rather than read the entire entry (primary + secondary records).

In general, the implementation must ensure that the directory is readable, which is easily achievable by adding the '.' entry at the beginning of the directory.

An example of an unreadable directory would be a directory with two fragments of 1 cluster each, in which:

First Cluster: The last record contains the '.' File Record, which has a 1-cluster fragment, so there's no information yet to read the second cluster.
Second Cluster: Contains the secondary record with the second fragment information.

While reading the directory above, there is no way for the implementation to know where the second cluster is, since the Fragment Record is not in the first cluster. It is the implementation responsibility to avoid this case. Normally, it suffices to store the '.' entry as the first entry in the directory.

File Names

File names are arbitrary UTF-8 strings terminated with a null character, and as such, the implementation must verify when creating a file or renaming it, that:

The string is a valid UTF-8 stream.
The string does not contain the forward slash character '/' which is the separator used in paths.
The string does not contain Unicode control characters, as follows:

0000 - 001F COMMON ASCII CONTROL CHARS (INCLUDING NEWLINE)
007F ANOTHER ASCII CONTROL CHAR
0080 - 009F ISO 6429 CONTROL CODES (C1 CONTROL CODES)
2028 LINE SEPARATOR
2029 PARAGRAPH SEPARATOR
FEFF BYTE ORDER MARKER
FFF0 - FFFF VARIOUS SPECIAL CHARACTERS, INCLUDING REVERSE BYTE ORDER MARKER
FDD0 - FDEF UNICODE NON-CHARACTERS
nnFEFF SPECIAL CHARACTER IN ALL PLANES
nnFFFF SPECIAL CHARACTER IN ALL PLANES

There is no hard limit on the Name length, and implementations should avoid setting a hard limit. However, in embedded implementations this may not be possible and a hard limit might be defined by the implementation, but the implementation should be capable of handling the case where a name longer than the hard limit was stored in the file system by other implementation that doesn't have such limits. Handling in the previous sentence is intended as: not crashing, and not modifying the existing on-disk data by truncating the names.

The implementation can optionally provide a replacement for the forward slash with the Unicode character "Division Slash" U+2215 which looks similar and is allowed, but such replacement and possibly an escaping mechanism to determine whether a slash is part of the name or a separator is implementation defined and not part of this specification.

The particular implementation may optionally define additional special characters, for example an implementation looking to simulate exactly FAT32:

Optionally, for Windows compatibility the backwards slash '\' can be defined as a separator too.
Optionally, for Windows compatibility, the following can be defined as invalid characters ':' '*' '<' '>' '"' '?' '|'.

An implementation with additional special characters should be designed to handle the presence of these characters within a name that was created on a different device by a "standard" implementation. It is the responsibility of the implementation to properly handle the presence of these characters.

Case Sensitivity - Comparing names

The CleanFS file system is both case sensitive and insensitive. An attribute of the file determines whether the name is case-sensitive or not. The attribute in question is the Case-Insensitive bit, which when set indicates that the name comparison must be done after the name is case-folded into a lowercase string.

When creating a new file, the file system uses the case sensitivity as defined in the directory where the file will be created. Another attribute bit is reserved for that purpose, and is meaningful only when the Directory bit it set. The bit is the Dir-Case-Insensitive attribute, and when set on a directory, causes all files created within that directory to be case-insensitive. In other words, any file created within the directory will initially have the Case-Insensitive bit identical to the Dir-Case-Insensitive bit of its directory.

When creating a file, the following logic must be adhered to:

The new file Case-Insensitive bit is taken from the directory Dir-Case-Insensitive bit
For each existing file in the directory, get the Case-Insensitive bit
If either the new file or the existing one have the bit set, then the name comparison must be case-folded first (a case-insensitive comparison). Otherwise (if both bits are clear), perform a case-sensitive comparison.
If the name comparison is a match, then the file already exists and a new one cannot be created.

Case sensitivity on any file can be changed at any time, by changing the Case-Insensitive attribute. The implementation must guarantee that when attempting to set the bit, a full directory scan must be completed similar to the one described above for creating a file. This is in order to verify that the name is not in conflict with another existing file. Clearing the bit can be done without checking for conflicts.

For case-insensitive comparison, both strings must be properly case-folded per Unicode standards.

It is always recommended to keep the Case-Insensitive bit clear by default, since case-insensitive comparisons are more resource intensive, and only activate the bit in directories where compatibility with case-insensitive systems is required (for example in a shared folder that might be accessed by Windows clients).

The actual name comparison can be done also with the help of the optional CRC32 values stored in the directory to speed up the process.

The full process to compare a name using CRC32 must be:

If the name to compare is case-insensitive, compute the case-folded CRC32, otherwise compute the CRC32 of the unmodified name.
For the on-disk file to compare, get the Case-Insensitive bit and determine if the comparison will be case-sensitive or not. Use the CRC32 stored in the directory to compare with the computed CRC32.
If the CRC32 is a match, or if the on-disk CRC32 is zero, do a full string comparison (case-sensitive or insensitive same as the CRC32 comparison). If the strings match, then a match was found, otherwise continue with the next directory entry.
If the CRC32 does not match, move to the next directory entry and repeat the comparison.

This approach allows faster name comparison with large directories, since a string comparison is only performed when there's a CRC32 match. The implementation should NOT rely solely on the CRC32 to determine a match, since there might be more than one string that matches the same CRC32.

The special case of CRC32 stored in the volume as zero is for implementations that do not support CRC32. A zero value should be interpreted as "CRC32 not supported" and therefore the implementation must fall back to full string comparison.

Free disk space management

The implementation is free to use any strategy to allocate clusters to new files. Such strategy should try to minimize fragmentation of the files, and protect recently erased files from being overwritten where feasible.

To account for free disk space, a cluster bitmap must be stored on the root directory of the disk. The bitmap is stored as a file, with a minimum size enough to contain 1 bit for each cluster in the system. The first bit (lsb) of the first byte corresponds to the first cluster in the file system. A bit is set when the cluster is used, clear when free.

The bitmap file in the root directory must be named ".{~~Free Space~~}" (without the quotes), and its attributes shall be: Hidden and Immutable, to prevent other processes from accessing the file.

Unlike a FAT table, the free space bitmap is not required to mount the file system, as it is only used to allocate clusters. If the .{~~Free Space~~} file is not present, the implementation must either mount the file system as read-only, or try to create the free space bitmap from scratch.

The first bit of the bitmap, which corresponds to Cluster 0 has a special meaning. Since Cluster 0 is always used for the boot record, that bit is used as the 'Dirty Bit'. This bit must normally be clear. When the system makes the first write to the file system, the implementation must set this bit, and the bit must only be cleared when the volume is unmounted.

If the volume is not unmounted properly, then this bit will be found set on next mount, indicating that the free space bitmap might have inconsistent contents, and the entire disk must be scanned to rebuild the bitmap.

Deleting a file

When a file is deleted, its entry in the directory is marked as a Deleted primary record, by changing its record type. All other information is maintained unaltered until the implementation decides to reuse the entry. All its clusters must be marked as free in the free space bitmap.

Creating a file

When a file is initially created, the following procedure must be adhered to:

Unless overriden by the user, get the Dir-Case-Insensitive attribute from the directory.
Scan the directory looking for a name match, following the guidelines on the Case Sensitivity section
If a match is found, return an error, otherwise allocate a new directory entry.
Write the primary record and any required Name records to the directory.

Directory entry allocation strategy

The strategy to allocate directory entries is implementation-defined. A standard strategy is suggested here but is not mandatory.

For fast allocation, the following rules can be used:

All new entries are allocated at the end of the directory.
Deleting an entry marks it as deleted.
When modifying a file's entry, if there's enough space or the size didn't change, store it in-place.
When modifying a file's entry, if there's not enough room, mark all entries as free and add a new entry at the end of the directory.

This simple strategy is fast, but can cause directories to grow quickly and have sigificant amounts of wasted space. Directory entries that don't change often will "fall" to the beginning of the directory, and the ones that change will be pulled towards the end, making the beginning of the directory more "stable".

To optimize a directory, its entries might need a "packing" procedure, as follows. The packing procedue should be performed regularly by the file system at appropriate moments, for example when closing a directory or while unmounting the volume.

A suggested strategy is simply to read the directory and rewrite it skipping the empty or deleted entries. Another simple strategy is to create a new file with contiguous clusters (single fragment or extent), and write all entries skipping the empy ones. Then rename it as the original directory and delete the old directory. This strategy is slower and requires more disk space, but has the advantage of also defragmenting the directory.

Clean FS Specification - Part IV