Notes on NTFS


NTFS is the file system used by Windows NT and later versions of Windows. It offers several special features that also affect TreeSize. The following paragraphs describe some of these features and their impact on this software.

Access Control Lists

Access to files and folders can be restricted. An owner can grant or deny other users or groups certain rights such as reading, writing, executing or deleting. This way, even administrators can be denied access to files and folders. If an administrator tries to open a folder in Windows Explorer for which the owner has denied read access to all other users, an "Access Denied" error message is displayed. However, TreeSize is able to scan such folders if you are logged in as an administrator or as a user that has the right to perform backups (this right can be changed under "Control Panel > Administrative Tools > Local Security Policy" and with the user editor of Windows).

File Based Compression

NTFS supports compression on an individual file basis. Files that are compressed on an NTFS volume can be read and written without first being decompressed by another program. Decompression happens automatically and transparently during the reading of the file. The file is compressed again when it is saved.

The space occupied by a compressed file is usually much smaller than its normal size. As a consequence, for folders that are partially or completely compressed, the allocated space reported by TreeSize may be smaller than the size reported for this folder. TreeSize is able to show the compression ratio in an extra column on the "Details" tab. Additionally it can show compressed files and folders in a different color. These features can be turned on or off in the Options dialog.
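The relationship between size and allocated space can be sketched with a small calculation. This is only an illustration: the function name is hypothetical, and the definition used here (the fraction of space saved) is an assumption, not necessarily the formula TreeSize uses for its compression ratio column.

```python
def compression_ratio(size, allocated):
    """Return the fraction of space saved by compression.

    Assumed definition (not taken from TreeSize): 1 - allocated / size.
    A result of 0.75 means the file occupies 75 % less disk space
    than its logical size.
    """
    if size <= 0:
        # Empty files cannot be compressed; avoid a division by zero.
        return 0.0
    return 1.0 - allocated / size


# A 1000-byte file that occupies 250 bytes on disk.
print(compression_ratio(1000, 250))
```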

TreeSize is able to compress and decompress entire file system branches using the context menu.

In Windows 10, Microsoft introduced new transparent compression features in NTFS that are designed to compact the files of the operating system, mainly DLL and EXE files. In contrast to the older file-based compression, these files are not flagged as compressed in their file attributes.

Sparse Files

Files which are large but only partially used are called sparse files. Because the operating system does not allocate disk space for the unused parts of a sparse file, such a file occupies less disk space than its actual size. TreeSize treats sparse files like compressed files and also calculates the compression ratio for them.
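As a rough illustration of how the on-disk size of a compressed or sparse file can be queried, the following sketch calls the Windows API function GetCompressedFileSizeW through ctypes. The helper name allocated_size is hypothetical, and this is not a description of how TreeSize obtains its values. On non-Windows systems the sketch falls back to the block count reported by os.stat, purely so that it can be run anywhere.

```python
import os
import sys


def allocated_size(path):
    """Return the number of bytes the OS reports as stored on disk for path."""
    if sys.platform == "win32":
        import ctypes
        from ctypes import wintypes

        kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
        get_size = kernel32.GetCompressedFileSizeW
        get_size.argtypes = [wintypes.LPCWSTR, ctypes.POINTER(wintypes.DWORD)]
        get_size.restype = wintypes.DWORD

        high = wintypes.DWORD(0)
        ctypes.set_last_error(0)
        low = get_size(path, ctypes.byref(high))
        # 0xFFFFFFFF is only an error if the last-error code is set as well.
        if low == 0xFFFFFFFF and ctypes.get_last_error() != 0:
            raise ctypes.WinError(ctypes.get_last_error())
        return (high.value << 32) | low

    # Fallback for other platforms: st_blocks counts 512-byte units.
    return os.stat(path).st_blocks * 512
```

Comparing the result with os.path.getsize(path) for the same file shows how far the allocated space and the logical size differ for a compressed or sparse file.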

Reparse Points: Volume Mount Points and Symbolic Links

A volume mount point is an existing path where you "mount" another volume. Users and applications can then refer to the mounted volume by that path, and there is no need to assign a drive letter to this volume. This allows you to unify multiple file systems into one logical file system. Symbolic links and the closely related NTFS junction points work similarly: If, for example, you have an empty folder "C:\Documents\Images", you can create a symbolic link to "E:\Pics" for it. Applications will then see the content of "E:\Pics" in "C:\Documents\Images". Unlike an NTFS junction point, a symbolic link can also point to a file or to a remote SMB network path.

If the option "Follow Mount Points and Symbolic Links" is turned on, TreeSize will include the contents of these folders when scanning. Since this content is not physically stored on the drive you are scanning, this may produce results for the allocated space that are larger than the total size of the drive.
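The effect of such an option can be sketched with a simple directory walk. The function name scanned_size and its follow_links parameter are hypothetical and only mirror the idea of the option; the sketch sums logical file sizes and says nothing about how TreeSize itself scans.

```python
import os


def scanned_size(root, follow_links=False):
    """Sum the logical sizes of all files below root.

    With follow_links=True, linked folders and files are counted as if
    they were stored below root, so the total can exceed what is
    physically stored on the scanned drive.
    Caution: a link that points to one of its own parent folders would
    make os.walk loop endlessly when links are followed.
    """
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root, followlinks=follow_links):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not follow_links and os.path.islink(path):
                # Ignore the target of a file link unless links are followed.
                continue
            try:
                total += os.stat(path).st_size
            except OSError:
                # Skip files that cannot be accessed.
                pass
    return total
```

Calling the function twice on the same folder, once with follow_links=False and once with follow_links=True, shows how much of the reported total comes from linked content.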

Alternate Data Streams (ADS)

In NTFS, a file consists of different data streams. One stream holds the security information (access rights and similar data), another one holds the "real data" you expect to be in a file. There may also be alternate data streams, which hold data the same way the standard data stream does. These alternate data streams are hidden. This means that a file can have 1 byte in its main data stream and several hundred MB in one or more alternate data streams. The dir command, file managers and Windows Explorer will show 1 byte as the size of this file, but it actually allocates much more space on your hard drive.


TreeSize can detect alternate data streams and add their sizes to the allocated file size.
Please note: ADS may store information in the same cluster as the main data stream, so if a file has one or more ADS, this file does not necessarily allocate more disk space.

In the TreeSize Options dialog, you can choose to detect alternate data streams in order to get more accurate allocated space values for directory branches. This option is deactivated by default, because querying the ADS takes some time and increases the overall time needed for a scan. You can search for files containing alternate data streams using the Custom File Search of TreeSize.

Hardlinks

In a Windows environment, a hardlink is a reference, or pointer, to physical data on an NTFS storage volume. All named files are hardlinks. The name associated with a file is simply a label that refers the operating system to the actual data. On NTFS volumes, more than one name can be associated with the same data. Although the data can be reached under different names, any change affects the same actual data, regardless of which name is used to access the file later. Hardlinks can only refer to data that exists on the same file system. The data is accessible as long as at least one link that points to it exists. When the last link is removed, the space is considered free. Please note that all hardlinks pointing to the same file also share the same Security Descriptor (access permissions).

To create a hardlink, the user must have write permissions for file attributes on the respective folder branch and on the share, if the drive is not a local drive.

If more than one hardlink points to a file's data, the space is allocated only once, no matter how many hardlinks exist. In the Options dialog, you can tell TreeSize to detect hardlinks in order to get more accurate allocated space values for directory branches. This option is deactivated by default, because querying the hardlinks takes some time and increases the overall time needed for a scan.
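Counting hardlinked data only once can be sketched as follows. The sketch identifies a file's data by the volume and file identifiers that os.lstat reports (st_dev and st_ino). The function name branch_size and the detect_hardlinks parameter are hypothetical, and logical file sizes stand in for allocated space. How TreeSize detects hardlinks internally is not covered here.

```python
import os


def branch_size(root, detect_hardlinks=True):
    """Sum the file sizes below root, counting hardlinked data only once."""
    seen = set()  # (volume id, file id) pairs that were already counted
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if detect_hardlinks and st.st_nlink > 1:
                key = (st.st_dev, st.st_ino)
                if key in seen:
                    # Another name for data that was already counted.
                    continue
                seen.add(key)
            total += st.st_size
    return total
```

The extra os.lstat bookkeeping per file hints at why such detection makes a scan slower than a plain size sum.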

Automatic Data Deduplication

Windows Server 2012 and later offer a data deduplication feature: It splits files with partially identical content into so-called "chunks", which are moved into the subfolder "System Volume Information\Dedup\ChunkStore\" located on the corresponding NTFS partition. After Windows has applied the deduplication, the original data is replaced by a pointer to the corresponding chunk in the ChunkStore directory. Once they have been deduplicated, two identical files will only require half of the disk space they occupied before. Since the original files now only contain a small pointer instead of the data, Windows will report a much smaller allocated disk space than before (for two identical files the occupied disk space would be indicated as "0 Byte"). To make TreeSize show the original file and folder sizes, simply switch the view mode from "Allocated Space" to "Size". The "Allocated Space" shown in TreeSize is the disk space you would obtain by deleting the corresponding file.

Offline Files

Windows Server and some third-party tools and appliances offer a feature called "offline files": Files that have not been used for a long time are automatically moved to cheaper and slower storage, and a small stub file remains at the original location. Usually TreeSize reports the allocated space of such a stub file correctly, which is often only the size of one file system cluster.

There is, however, one situation in which the allocated space for stub files may not be reported correctly. If TreeSize runs into Access Denied errors, it uses Windows API functions intended for backup software in order to scan those parts of the file system as well and to provide values for their size and allocated space. We have seen some appliances which, in this case, reported the full file size as the allocated space of the stub files, most likely because this would be the size occupied in a backup. To avoid this, ensure that the user who runs the scans has full read access to the scanned file system.