
How To Find Duplicate Files

Find and delete all file duplicates from your storages.

Blog Author: Joachim Marder (Joey), CEO
Published on 03.08.2021

Working together in the same file system or SharePoint server is challenging. Especially when several people are involved within a project, the work documents must be well organized to avoid any mess or even chaos. That's why it's important to use the right tools to support your project work professionally.

With TreeSize, you can remove duplicates with ease:

  • Look for duplicates across all your hard drives, cloud storage and network shares
  • Remove duplicates, move or archive them to another storage
  • Deduplicate redundant files without data loss

Duplicate files waste hard disk space

This is what happened to one of our customers: A construction company organizes its many projects into subdirectories where it stores the respective planning and work files. In the process, each project receives many PDF files with universally applicable norms and standards.

As a result, the construction company constantly accumulates true duplicates of these source documents. Over time, hundreds of gigabytes of storage space are wasted on file duplicates alone. Searching for these files manually is uneconomical, and in practice impossible, due to the sheer volume of files.

With TreeSize it is possible to effectively search for duplicate files on local file systems and servers and deduplicate them.

Deduplication is possible on NTFS-formatted storage media. In this case, the file duplicates are replaced by NTFS hard links, so that each duplicate no longer consumes its own storage space but points to the same data on the hard disk.

Thus, TreeSize prevents the loss of important files and frees up storage space at the same time.
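To illustrate the idea of a hard link, here is a minimal Python sketch. It is not how TreeSize works internally; the function name `replace_with_hard_link` and the temporary file suffix are assumptions made for this example. It only uses standard library calls and requires both files to be on the same volume.

```python
import filecmp
import os


def replace_with_hard_link(original: str, duplicate: str) -> None:
    """Replace `duplicate` with a hard link to `original`.

    Illustrative helper (hypothetical name). Both paths must be on the
    same volume, and the file system must support hard links.
    """
    # Nothing to do if both names already point to the same data.
    if os.path.samefile(original, duplicate):
        return

    # Safety check: compare the contents byte by byte before touching anything.
    if not filecmp.cmp(original, duplicate, shallow=False):
        raise ValueError("files differ, refusing to deduplicate")

    # Create the link under a temporary name first, so the duplicate is
    # only replaced once the link exists.
    temp_name = duplicate + ".dedup-tmp"
    os.link(original, temp_name)
    os.replace(temp_name, duplicate)
```

After the call, both paths still exist and open the same content, but the data is stored only once. Keep in mind that hard-linked files share their content: editing one of them changes what you see under the other name as well.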

Our customer now runs a scan of their file systems at the end of every month, which also looks for duplicate files. A simple click is all it takes to deduplicate them with TreeSize.

But how does it work in practice?

 

Find duplicate files with TreeSize and remove duplicates safely

The TreeSize file search is a very powerful tool, with which you can professionally support the organization of your project work. To do so, open TreeSize and select the entry "Duplicate files" under "Open TreeSize file search". Here you can initially select the drives that are to be included in the search for file duplicates.

Do you also want to select a server as scan target, for example, to search for duplicate files on your SharePoint? To do so, click on the plus icon, select "SharePoint" and enter your login credentials.

Besides SharePoint, TreeSize can also scan Amazon S3 cloud storage, Linux and Unix servers via SSH, and WebDAV servers.

 

Before starting the duplicate search, you can also set filter rules to exclude files that are smaller than a certain size from the results list, for example. This will increase the clarity of your results.

Once everything is set, let TreeSize search for duplicates. TreeSize offers different comparison methods: checksums such as MD5 or SHA256, or simply name, date and file size.
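To make the checksum approach more tangible, here is a minimal Python sketch of a duplicate search. It is an illustration under our own assumptions, not TreeSize's implementation: the names `find_duplicates` and `min_size` are made up for this example. Files are first grouped by size, and only files that share a size are hashed with SHA-256.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def find_duplicates(root: str, min_size: int = 1) -> list[list[str]]:
    """Return groups of files below `root` that have identical content.

    `min_size` (in bytes) plays the role of a simple filter rule:
    smaller files are left out of the results.
    """
    # Step 1: group by file size, which is cheap to determine.
    by_size = defaultdict(list)
    for folder, _subfolders, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            size = os.path.getsize(path)
            if size >= min_size:
                by_size[size].append(path)

    # Step 2: hash only files that share their size with another file.
    by_hash = defaultdict(list)
    for size, paths in by_size.items():
        if len(paths) < 2:
            continue
        for path in paths:
            by_hash[(size, sha256_of_file(path))].append(path)

    return [sorted(paths) for paths in by_hash.values() if len(paths) > 1]
```

Comparing sizes first avoids reading files that cannot have a duplicate anyway. A comparison by name, date and size alone would be faster still, but it can report files as duplicates whose contents differ.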

TreeSize lists the results clearly, so you can see exactly which files have duplicates at which locations. For example, sort the results list by size to identify the biggest space wasters.
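The arithmetic behind "biggest space wasters" is simple: in a group of identical files, every copy except one is redundant. The following Python sketch shows this calculation; the function name `rank_by_wasted_space` and the input format are assumptions for this example, not part of TreeSize.

```python
def rank_by_wasted_space(groups: list[tuple[int, list[str]]]) -> list[tuple[int, list[str]]]:
    """Sort duplicate groups by the space they waste, largest first.

    Each group is a pair (file size in bytes, paths of the identical files).
    One copy has to stay, so a group wastes size * (number of copies - 1).
    """
    ranked = [(size * (len(paths) - 1), paths) for size, paths in groups]
    ranked.sort(key=lambda entry: entry[0], reverse=True)
    return ranked


# Made-up example data: three groups of identical files.
example = [
    (100, ["a.pdf", "b.pdf", "c.pdf"]),  # wastes 200 bytes
    (500, ["x.pdf", "y.pdf"]),           # wastes 500 bytes
    (10, ["p.txt", "q.txt"]),            # wastes 10 bytes
]
for wasted, paths in rank_by_wasted_space(example):
    print(wasted, paths)
```

In this example the group with two 500-byte files comes first, even though another group contains more copies.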

 

Finally, you can mark either all or individual result entries for deduplication. Now you can simply delete the file duplicates - or cleverly deduplicate them.

In the "Deduplication" window you can once again get an overview of the storage space you can potentially reclaim with TreeSize.

 

Now just click on "Run". Done! You now have more storage space available again.

Please note: Deduplication with NTFS hard links only works on NTFS-formatted storage media, such as your internal Windows hard drives or external hard drives and USB sticks that are NTFS-formatted. If file duplicates are in other storage locations, such as SharePoint, you'll need to delete them as usual instead.

 

Next, make the search for duplicate files with TreeSize a fixed part of your project management process.

For the big chunks: Search for duplicate folder structures with TreeSize

Not every use case is covered by a simple file search. Another customer told us some time ago that their use case is even more specific: a state archive faces the problem that not only individual files, but entire folder structures are present on its servers multiple times. In total, the TreeSize file search identifies more than 10,000 file duplicates there.

It would take far too much time to check them all individually at the file level. Luckily, TreeSize can perform a duplicate search at the folder level, too.
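One way to think about duplicate folders is to give every folder a fingerprint that is built from the names and checksums of everything inside it. The Python sketch below illustrates this idea. It is our own simplified example, not a description of TreeSize's algorithm, and the function name `duplicate_folders` is hypothetical.

```python
import hashlib
import os
from collections import defaultdict


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Return the SHA-256 checksum of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def duplicate_folders(root: str) -> list[list[str]]:
    """Return groups of folders below `root` with identical names and contents."""
    fingerprints = {}  # folder path -> (fingerprint, number of files inside)

    # Walk bottom-up, so every subfolder is fingerprinted before its parent.
    for folder, subfolders, names in os.walk(root, topdown=False):
        digest = hashlib.sha256()
        file_count = 0
        for name in sorted(names):
            path = os.path.join(folder, name)
            if os.path.islink(path):
                continue  # symbolic links are ignored in this sketch
            digest.update(b"file\0" + name.encode("utf-8", "surrogateescape") + b"\0")
            digest.update(sha256_of_file(path).encode("ascii"))
            file_count += 1
        for name in sorted(subfolders):
            child = os.path.join(folder, name)
            if child not in fingerprints:
                continue  # not visited, e.g. a link to a folder
            child_print, child_files = fingerprints[child]
            digest.update(b"dir\0" + name.encode("utf-8", "surrogateescape") + b"\0")
            digest.update(child_print.encode("ascii"))
            file_count += child_files
        fingerprints[folder] = (digest.hexdigest(), file_count)

    # Folders with the same fingerprint are duplicates; empty ones are skipped.
    groups = defaultdict(list)
    for folder, (fingerprint, file_count) in fingerprints.items():
        if file_count > 0:
            groups[fingerprint].append(folder)
    return [sorted(paths) for paths in groups.values() if len(paths) > 1]
```

Note that this sketch also reports the subfolders of duplicate folders as separate groups. A more polished version would show only the topmost duplicate folders.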

TreeSize is a must-have for neatly organizing projects and a lifesaver for overfilled hard drives. TreeSize Duplicate Search can detect duplicate files in a quick and easy way, both on servers and on local drives. This allows you to resolve file duplicates without having to worry about losing important content. 

 
