Duplicate files

Context tab: Duplicate Files

[Screenshot: Duplicate Files context tab in the ribbon]

 

Searches for duplicate files on the selected drives or shares.
In this context, duplicate files are files that appear to exist more than once. Such redundant files unnecessarily increase the allocated space on your disks.

A detailed step-by-step example of how to use the duplicate search can be found here.

 

Check all

Checks all items in the active result list.

Uncheck all

Unchecks all items in the active result list.

Check if

Checks all items in the currently visible result list whose full path matches certain patterns.
You can configure the patterns in a new window.

Please note: The check state of other items will not be changed by this function.

Uncheck if

Unchecks all items in the currently visible result list whose full path matches certain patterns.
You can configure the patterns in a new window.

Please note: The check state of other items will not be changed by this function.

Check all but newest

Check-marks all files of the selected duplicates, except the newest file of each group.

The drop-down menu provides access to additional options that let you select which duplicate files should be checked. The drop-down also includes the option "Ensure one unchecked file per group", which is described below.
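
For illustration, here is a minimal Python sketch of the "Check all but newest" selection logic described above; the function name and the grouping structure are assumptions for this example, not TreeSize code.

# Illustrative sketch only: check-mark all but the newest file in each
# duplicate group, similar in spirit to "Check all but newest".
import os

def check_all_but_newest(duplicate_groups):
    """duplicate_groups: list of lists of paths considered duplicates of each other.
    Returns the set of paths that would receive a check mark."""
    checked = set()
    for group in duplicate_groups:
        # Sort by last change date; the newest file comes last and stays unchecked.
        ordered = sorted(group, key=os.path.getmtime)
        checked.update(ordered[:-1])
    return checked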

Remove invalid elements

Due to changes on the file system, such as files being deleted manually in Windows Explorer, previously found duplicate search results may have become invalid. This function checks all currently shown elements and removes those that can no longer be found on the file system.

Deduplicate

This function will replace duplicate files with NTFS hardlinks.

Please find a detailed description of this function below.

Delete items

Deletes all checked search results.

See "Move checked files".

Move items

Moves all checked search results to a destination of your choice.

See "Move checked files".

Search Filters:

Defines which criteria should be used to identify files as duplicates. Here is a list of the available strategies:

Compare Size, Name and Date

Select this option to identify duplicate files by looking for equal names, sizes and last change dates.

This is much faster than using checksums to identify duplicates, but it is also less accurate.
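
As an illustration of this strategy (not TreeSize's actual implementation; the function name is made up), a minimal Python sketch that groups files by name, size and last change date could look like this:

# Illustrative sketch only: treat files as duplicate candidates if they share
# name, size and last change date (fast, but less accurate than checksums).
import os
from collections import defaultdict

def group_by_name_size_date(paths):
    groups = defaultdict(list)
    for path in paths:
        st = os.stat(path)
        key = (os.path.basename(path).lower(), st.st_size, int(st.st_mtime))
        groups[key].append(path)
    # Only groups with more than one member are reported as duplicate candidates.
    return [files for files in groups.values() if len(files) > 1]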

Compare using MD5 Checksum

When using MD5 checksums, a so-called hash value is calculated based on the contents of each file. Files with the same content will have the same hash value, while files with different content will almost certainly have different values.
Empty files are ignored, since there is no content to compare.

This is more accurate than comparing files by their name, size and date, but it is also much slower.
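
A minimal Python sketch of content-based grouping with MD5 (illustrative only; the chunk size and function names are assumptions, not TreeSize code) could look like this:

# Illustrative sketch only: group files by the MD5 hash of their contents.
# Empty files are skipped because there is no content to compare.
import hashlib
import os
from collections import defaultdict

def md5_of(path, chunk_size=1024 * 1024):
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def group_by_md5(paths):
    groups = defaultdict(list)
    for path in paths:
        if os.path.getsize(path) == 0:
            continue  # empty files are ignored
        groups[md5_of(path)].append(path)
    return [files for files in groups.values() if len(files) > 1]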

Compare using SHA256 Checksum

This search option works like the MD5 checksum mechanism, but uses the SHA256 algorithm instead of MD5.

While it is very unlikely that the MD5 hash algorithm produces the same hash value for different files, the SHA256 algorithm further reduces the statistical risk of such hash collisions. However, the SHA256 algorithm is significantly slower than MD5.
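
In a sketch like the MD5 example above, switching to SHA256 only means swapping the hash function; the grouping logic stays the same.

# Illustrative sketch only: same grouping approach as the MD5 example,
# but using the SHA256 algorithm.
import hashlib

def sha256_of(path, chunk_size=1024 * 1024):
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()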

Compare by Name only

Select this option to find all files with equal file names.

This is not really a strategy to identify true duplicate files, but this comparison type may be helpful when you are searching for certain redundancies or undesired copies (e.g. documents that have been copied and modified locally).

There are some additional options to customize the duplicate file search:

Minimum Size in KB

Defines the minimum size for files that are included in the duplicate search.

Please note: Using a minimum size will reduce the number of files to compare. This will increase the speed of the search, especially when comparing by checksums.
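
For illustration, a pre-filter based on the minimum size (a hypothetical helper, not TreeSize code) could look like this:

# Illustrative sketch only: drop files below the minimum size before comparing,
# so small files never reach the expensive checksum stage.
import os

def apply_minimum_size(paths, min_size_kb):
    min_bytes = min_size_kb * 1024
    return [p for p in paths if os.path.getsize(p) >= min_bytes]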

Ignore NTFS hardlinks

If checked, hardlinks will not be regarded as duplicate files. You can find this setting in the options dialog under "Duplicate File Search" > "Filter".

Please note: NTFS hardlinks do not allocate any additional space, so you will not free up disk space by deleting them. Also, TreeSize uses hardlinks for the Deduplicate function described below.
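
To illustrate why hardlinks can be excluded: two paths that are hardlinks to the same data share one file record on the volume. A Python sketch of such a check (assuming CPython 3.5+ on Windows, where os.stat reports the NTFS file index in st_ino; not TreeSize's implementation) could look like this:

# Illustrative sketch only: two paths that refer to the same NTFS file record
# share (device, file index); deleting one of them frees no space.
import os

def are_hardlinked(path_a, path_b):
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)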

Duplicate filter

This option allows you to restrict the duplicate search to a specific preselection of files. Depending on whether you define an exclude or include filter, all files that match the filter patterns will either be excluded from the duplicate search or the search will be restricted to them.

By using this option, you can prevent files from certain directories (e.g. your local system directories) from being listed as duplicates. Additionally, this option reduces the number of files to compare and thus leads to a performance increase.
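
A minimal Python sketch of such an exclude filter (the pattern syntax and helper name are assumptions for illustration, not TreeSize code) could look like this:

# Illustrative sketch only: remove files matching exclude patterns
# (e.g. system directories) before the duplicate comparison starts.
import fnmatch

def apply_exclude_filter(paths, exclude_patterns):
    # Example call: apply_exclude_filter(paths, ["c:\\windows\\*"])
    return [p for p in paths
            if not any(fnmatch.fnmatch(p.lower(), pat.lower())
                       for pat in exclude_patterns)]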

Deduplicate:

Use this button to replace all but one of the checked duplicate files with NTFS hardlinks to the remaining file. This reduces the disk space blocked by your duplicate files (see: NTFS hardlinks). However, it will not influence the disk space shown in the "Properties" dialog of Windows Explorer (see: Deduplication FAQ). If you check only one file of a group, this file is replaced by a link to the newest (latest last change date) unchecked file of this group.

Please note: You cannot use hardlinks to replace files located on different hard drives.

In the configuration window, you can select a log file in which the performed replacements will be logged. You can also define how TreeSize handles files located on different hard disks: you can either replace files located on the same hard disk with hardlinks separately, or select a reference drive and replace all files located on other hard disks with symbolic links. Please note that if the permission to create symbolic links cannot be granted, a Windows shortcut (.LNK file) will be created as a fallback instead.
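
For illustration only, here is a minimal Python sketch of the replacement step described above. The drive check is simplified, the temporary file name is hypothetical, the .LNK fallback is omitted, and this is not TreeSize's implementation.

# Illustrative sketch only: replace one duplicate with an NTFS hardlink to a
# kept reference file, or with a symbolic link if the files are on different drives.
import os

def replace_with_link(keep_path, duplicate_path):
    # Simplified same-drive check via the drive letter (ignores mount points).
    same_drive = (os.path.splitdrive(os.path.abspath(keep_path))[0].lower()
                  == os.path.splitdrive(os.path.abspath(duplicate_path))[0].lower())
    if same_drive:
        tmp = duplicate_path + ".dedupe_tmp"  # hypothetical temporary name
        os.link(keep_path, tmp)               # create the hardlink first ...
        os.replace(tmp, duplicate_path)       # ... then swap it in for the duplicate
    else:
        os.remove(duplicate_path)
        # Creating symbolic links requires the corresponding privilege on Windows.
        os.symlink(keep_path, duplicate_path)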

The context menu of the duplicate files list offers a feature named "Replace duplicates by hardlinks". This function works just like the "Deduplicate" function, but will handle all selected files instead of checked files.

To use deduplication with hardlinks, you need the following NTFS permissions in all affected folders: read, write, create files, and delete files.

Important: Please note that TreeSize does not offer the functionality to "undo" a deduplication!
All hardlinks pointing to the same file share the same security descriptor (access permissions). Deduplication will apply a unified set of permissions to the one remaining physical file. Undoing a deduplication manually is very difficult.

[Screenshot: Deduplicate configuration dialog]

Ensure one unchecked file per group:

Activate this option if you want to ensure that one file per duplicate group remains unchecked. This can be useful when using a custom selection mechanism, such as "Check if", to ensure that at least one of the duplicate files will not be included in a move or delete operation.
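
For illustration, the effect of this option can be sketched in Python as follows (hypothetical helper, not TreeSize code):

# Illustrative sketch only: after a custom selection such as "Check if",
# make sure at least one file per duplicate group stays unchecked.
import os

def ensure_one_unchecked(duplicate_groups, checked):
    """checked: set of paths that currently carry a check mark."""
    for group in duplicate_groups:
        if all(path in checked for path in group):
            # Keep the newest file of the group unchecked.
            newest = max(group, key=os.path.getmtime)
            checked.discard(newest)
    return checked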