
Duplicate File Search


This page lists files that appear to exist more than once on your disk. Such redundant files needlessly increase the allocated space on your disk.

You can choose between the following strategies for identifying duplicate files:

Compare Size, Name and Date

Choose this to identify duplicate files by identical name, size and last change date.
This is much faster than comparing checksums, but it is also less accurate, because files with matching name, size and date can still differ in content.
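
As an illustration of this strategy, the following Python sketch groups files by name, size and last change date. It is not TreeSize's implementation; the function name `group_by_name_size_date`, the case-insensitive name comparison and the rounding of the timestamp to whole seconds are assumptions made for this example.

```python
import os
from collections import defaultdict


def group_by_name_size_date(root):
    """Group files under root by (file name, size, last change date)."""
    groups = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # file vanished or is not accessible: skip it
            # Assumption: names are compared case-insensitively and the
            # timestamp is truncated to whole seconds.
            key = (name.lower(), info.st_size, int(info.st_mtime))
            groups[key].append(path)
    # Only keys shared by two or more files are duplicate candidates.
    return [paths for paths in groups.values() if len(paths) > 1]
```

No file content is read here, which is why this kind of comparison is fast and also why it can report files as duplicates whose contents differ.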

Compare using MD5 Checksum

When using MD5 checksums, a so-called hash value is calculated from the contents of each file. Files with the same content will have the same hash value; files with different content will almost certainly have different values.
Empty files are ignored, since there is no content to compare.
This is more accurate than comparing files by their name, size and date, but it is also much slower.
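
The checksum comparison can be sketched in Python as follows. This is an illustrative example, not the code TreeSize uses; the function names `file_digest` and `find_duplicates_by_checksum` are hypothetical, and grouping by size before hashing is an optimization assumed for this sketch.

```python
import hashlib
import os
from collections import defaultdict


def file_digest(path, algorithm="md5", chunk_size=1024 * 1024):
    """Return the hex digest of a file, read in chunks to limit memory use."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_duplicates_by_checksum(root, algorithm="md5"):
    """Return lists of paths whose contents produce the same hash value."""
    by_size = defaultdict(list)
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                size = os.path.getsize(path)
            except OSError:
                continue  # file vanished or is not accessible: skip it
            if size > 0:  # empty files are ignored: no content to compare
                by_size[size].append(path)

    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue  # a file with a unique size cannot have a duplicate
        for path in paths:
            by_hash[(size, file_digest(path, algorithm))].append(path)
    return [paths for paths in by_hash.values() if len(paths) > 1]
```

Every candidate file has to be read completely to compute its hash value, which explains why this strategy is much slower than a comparison of name, size and date.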

Compare using SHA256 Checksum

This works like the MD5 checksum comparison, but uses the SHA256 algorithm instead of MD5.
While it is very unlikely that the MD5 hash algorithm produces the same hash value for different files, the SHA256 algorithm further reduces the statistical risk of such hash collisions.
However, the SHA256 algorithm is significantly slower than MD5.
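
To give a feeling for the statistical risk mentioned above, the following Python sketch estimates the chance of an accidental collision with the well-known birthday approximation. The function name `collision_probability` and the example figure of one million files are assumptions chosen for illustration; the estimate treats hash values as uniformly random and says nothing about deliberately constructed collisions.

```python
def collision_probability(file_count, hash_bits):
    """Approximate the chance that any two of file_count distinct files
    share a hash value, using the birthday bound n*(n-1)/2 / 2**bits."""
    pairs = file_count * (file_count - 1) / 2
    return pairs / 2 ** hash_bits


# Example with a hypothetical disk holding one million distinct files.
md5_risk = collision_probability(10**6, 128)     # MD5: 128-bit hash value
sha256_risk = collision_probability(10**6, 256)  # SHA256: 256-bit hash value
print(f"MD5:    {md5_risk:.1e}")
print(f"SHA256: {sha256_risk:.1e}")
```

Both values are tiny, but the SHA256 estimate is smaller by many orders of magnitude because the hash value is twice as long.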

Compare by Name only

Choose this to find all files with identical file names. This is not really a strategy for identifying true duplicates, but it can be helpful when you are searching for certain redundancies or undesired copies (e.g. documents that were copied and then modified locally).

Some additional options let you adapt the duplicate file search to your needs:

Minimum Size in KB

Defines the minimum size a file must have to be included in the duplicate search.
Note: Using a minimum size will reduce the number of files to compare. This will increase the speed of the search, especially when comparing by checksums.

Use Exclude Patterns

If checked, the exclude patterns defined on the "Search Options" tab will be used. Activating this option also reduces the number of files to compare and thus speeds up the search.

Ignore NTFS Hard Links

If checked, NTFS hard links will be ignored. This is important as hard links will otherwise show up as duplicates.
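
Hard links are several names for the same physical file, so any comparison strategy would report them as duplicates. A minimal Python sketch of how such aliases could be filtered out is shown below. It is an illustration only, not TreeSize's implementation; the function name `drop_hard_link_aliases` is hypothetical, and the sketch relies on `os.stat` reporting a file identity (`st_dev` and `st_ino`; on Windows, Python documents `st_ino` as the file index).

```python
import os


def drop_hard_link_aliases(paths):
    """Keep only the first path for each physical file in paths."""
    seen = set()
    unique = []
    for path in paths:
        info = os.stat(path)
        # Two names refer to the same physical file if volume and file
        # identity both match.
        identity = (info.st_dev, info.st_ino)
        if identity in seen:
            continue  # another hard link to a file already listed
        seen.add(identity)
        unique.append(path)
    return unique
```

Applied to a group of duplicate candidates, this leaves one entry per physical file, so that files which already share their disk space are not reported again.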

As in all searches, you can use the Move Checked button to move all files in the result list whose check box (to the left of the file name) is checked to a location in the file system that you specify.

 
Deduplicate:

This button is located directly beneath the "Move Checked" button and is only visible if the "Duplicate Files" tab is selected.
Use this button to replace all but one of the checked duplicate files in a group with NTFS hard links to the one remaining file. This reduces the disk space allocated by your duplicate files (see: NTFS Hard links). If you check only one file of a group, this file is replaced by a link to the "newest" unchecked file of this group, that is, the unchecked file with the latest last change date.

In the configuration dialog that appears, you can choose a log file in which the performed replacements are recorded. You can also decide how to handle files on another hard disk, which cannot be replaced by NTFS hard links: you can either ignore those files or replace them with symbolic links.
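
The replacement of a duplicate by a link, including the two options for files on another disk, could look roughly like the following Python sketch. It is not TreeSize's implementation: the function name `replace_with_link`, the parameter names, the temporary file suffix and the log line format are all hypothetical, and the sketch assumes that a hard link attempt across volumes fails with `errno.EXDEV`.

```python
import errno
import os


def replace_with_link(keep, duplicate, cross_volume="ignore", log=None):
    """Replace duplicate by a link to keep and return the action taken.

    cross_volume: "ignore" leaves the duplicate untouched if it is on
    another volume, "symlink" replaces it by a symbolic link instead.
    log: optional writable text file that receives one line per replacement.
    """
    temp = duplicate + ".dedup-tmp"  # hypothetical temporary name
    try:
        os.link(keep, temp)  # hard links only work within one volume
        action = "hardlink"
    except OSError as exc:
        if exc.errno != errno.EXDEV:
            raise  # unrelated error, e.g. missing permissions
        if cross_volume != "symlink":
            return "ignored"
        os.symlink(os.path.abspath(keep), temp)
        action = "symlink"
    # Swap the link into place only after it was created successfully,
    # so the duplicate is never lost if link creation fails.
    os.replace(temp, duplicate)
    if log is not None:
        log.write(f"{action}: {duplicate} -> {keep}\n")
    return action
```

Creating the link under a temporary name first is a design choice of this sketch: the original duplicate is only overwritten once the link exists.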

The context menu of the duplicate files list offers a feature named "Replace Duplicates by Hard Links". This function works the same way as "Deduplicate", except that it handles the selected files instead of the checked files.

Important: Please note that TreeSize Professional does not offer a function to "undo" a deduplication!
All hard links pointing to the same file share the same security descriptor (access permissions). Deduplication applies the union of the permissions to the one remaining physical file. For this reason, it is difficult to undo a deduplication manually.