DEV Community

0x3d Site

🧠 I Built a Python Script That Finds Duplicate Files So I Can Stop Wasting Storage



Let me paint you a picture:

You're doing a little digital spring cleaning, and you realize you've got triplets of the same file:

  • invoice_final.pdf
  • invoice_final_v2.pdf
  • invoice_final_2_REAL_FINAL.pdf

Or worse… you downloaded the same meme 17 times in 2 years.
You don't want to be a hoarder, but your files say otherwise.

So here’s what I did:
I built a simple, powerful Python script that finds duplicate files, even if they have different names, so I can decide which copies to remove.

And yes, I now sleep better at night.


🚨 The Pain: You Don’t Know What You Have Anymore

You back up your stuff. You organize things (well, once).
But then: cloud syncs, Slack downloads, renamed copies, and panicked backups all pile up.

Suddenly:

  • Your storage is full
  • You can't find things
  • You're afraid to delete anything in case it's "the important version"

✅ The Cure: Check Files by Content, Not Name

Forget filenames.
The secret is to check each file’s hash — its digital fingerprint.

If two files have the same hash, their contents are identical for all practical purposes (an accidental collision is astronomically unlikely).
Even if one is called resume.pdf and the other is copy-of-resume-2022-old.pdf.
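
To see the fingerprint idea before touching real files, here's a tiny sketch that uses in-memory bytes as stand-ins for file contents. The sample strings and the digest helper are made up for illustration; only hashlib.md5 comes from the standard library.

```python
import hashlib

# Hypothetical "files", represented as in-memory bytes for illustration.
resume = b"Jane Doe - Python developer"
renamed_copy = b"Jane Doe - Python developer"   # same content, different "name"
edited = b"Jane Doe - Python developer."        # one extra character

def digest(data: bytes) -> str:
    """Return the hex MD5 digest of a bytes object."""
    return hashlib.md5(data).hexdigest()

same = digest(resume) == digest(renamed_copy)   # identical content
different = digest(resume) != digest(edited)    # any change alters the hash
print(same, different)
```

The variable name never enters the calculation, which is exactly why renamed copies can't hide.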

Let’s do it.


🧪 Step-by-Step: The Python Script That Sniffs Out Duplicates

Step 1: Calculate a file's hash

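import hashlib

def file_hash(filepath, chunk_size=8192):
    """Return the MD5 hex digest of a file, read in chunks to keep memory use low."""
    hasher = hashlib.md5()
    with open(filepath, 'rb') as f:
        # The walrus operator (:=) needs Python 3.8+; read() returns b'' at EOF, ending the loop.
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()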

This reads a file in chunks and builds its hash, so even huge files never need to fit in memory. MD5 works fine here. We're not guarding nuclear secrets, just spotting accidental copies.
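
If MD5 makes you uneasy, swapping the algorithm is a small change. The sketch below is a hypothetical variant: the name file_hash_with and its algorithm parameter are assumptions for illustration, not part of the script above. It hands the algorithm name to hashlib.new and defaults to SHA-256, which may be somewhat slower than MD5 depending on your hardware.

```python
import hashlib

def file_hash_with(filepath, algorithm="sha256", chunk_size=8192):
    """Hash a file in chunks using any algorithm name hashlib.new accepts."""
    hasher = hashlib.new(algorithm)  # raises ValueError for unknown names
    with open(filepath, "rb") as f:
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()
```

Calling file_hash_with(path, "md5") should give the same digest as the original function, so you can switch without changing the rest of the script.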


Step 2: Scan a directory for duplicates

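import os

def find_duplicates(folder):
    """Walk folder and return (duplicate, first_seen) path pairs with matching hashes."""
    hashes = {}       # hash -> first path seen with that content
    duplicates = []   # (later path, first path) for every repeated hash

    for root, _, files in os.walk(folder):
        for file in files:
            path = os.path.join(root, file)
            try:
                filehash = file_hash(path)
                if filehash in hashes:
                    duplicates.append((path, hashes[filehash]))
                else:
                    hashes[filehash] = path
            except OSError as e:
                # Unreadable files (permissions, broken links) are reported and skipped
                print(f"Skipped {path}: {e}")

    return duplicates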
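
One common refinement, sketched here as an assumption rather than something the original script does: files with different sizes can never be identical, so you can group by size first and hash only the groups with more than one member. The names find_duplicates_by_size_first and _md5_of are hypothetical, and skipping symlinks is a design choice of this sketch.

```python
import hashlib
import os
from collections import defaultdict

def _md5_of(path, chunk_size=8192):
    """Chunked MD5, same idea as file_hash in Step 1."""
    hasher = hashlib.md5()
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            hasher.update(chunk)
    return hasher.hexdigest()

def find_duplicates_by_size_first(folder):
    """Return (duplicate, first_seen) pairs, hashing only files that share a size."""
    by_size = defaultdict(list)
    for root, _, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            try:
                if os.path.islink(path):
                    continue  # skip symlinks so a link is not reported as a copy
                by_size[os.path.getsize(path)].append(path)
            except OSError as e:
                print(f"Skipped {path}: {e}")

    duplicates = []
    for paths in by_size.values():
        if len(paths) < 2:
            continue  # a unique size cannot have a duplicate
        seen = {}  # hash -> first path, within this size group
        for path in paths:
            try:
                digest = _md5_of(path)
            except OSError as e:
                print(f"Skipped {path}: {e}")
                continue
            if digest in seen:
                duplicates.append((path, seen[digest]))
            else:
                seen[digest] = path
    return duplicates
```

On a folder full of large, mostly unique files, this avoids reading most of them at all, since a size lookup only touches file metadata.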

Step 3: Use it and print results

if __name__ == "__main__":
    folder_to_scan = os.path.expanduser("~/Documents")
    dupes = find_duplicates(folder_to_scan)

    if dupes:
        print("Found duplicates:")
        for dup, original in dupes:
            print(f"{dup} == {original}")
    else:
        print("No duplicates found!")
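
If you'd rather see how much space the copies cost than read a long list of pairs, a small helper can total it up. This is a minimal sketch: summarize is a hypothetical name, and it assumes the (duplicate, original) pairs that find_duplicates returns, where every copy of the same content points at the same first-seen path.

```python
import os
from collections import defaultdict

def summarize(dupes):
    """Group (duplicate, original) pairs by original and total the reclaimable bytes."""
    groups = defaultdict(list)
    for dup, original in dupes:
        groups[original].append(dup)

    reclaimable = 0
    for copies in groups.values():
        for path in copies:
            try:
                reclaimable += os.path.getsize(path)
            except OSError:
                pass  # file vanished or is unreadable; leave it out of the total
    return dict(groups), reclaimable
```

The second return value is the number of bytes you'd get back if you kept only the first-seen copy of each file.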

💥 What You Just Got Back

This script:

  • Works on any folder
  • Finds real duplicate files
  • Ignores names, cares about content
  • Shows you which files are taking up double space

You can even tweak it to:

  • Auto-delete duplicates
  • Log everything to a file
  • Sort duplicates into a ~/Duplicates folder
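
Here's a cautious sketch of the "sort into a folder" tweak, which is easier to undo than deleting. The function name quarantine_duplicates, the dry_run flag, and the _1, _2 suffix scheme for name clashes are all assumptions for illustration. It defaults to a dry run that only reports what would move.

```python
import os
import shutil

def quarantine_duplicates(dupes, target_dir, dry_run=True):
    """Move each duplicate into target_dir; with dry_run=True only return the plan."""
    planned = []
    used = set()  # destinations already claimed during this run
    for dup, _original in dupes:
        base = os.path.basename(dup)
        stem, ext = os.path.splitext(base)
        dest = os.path.join(target_dir, base)
        counter = 1
        # Avoid overwriting: add a numeric suffix when the name is taken
        while dest in used or os.path.exists(dest):
            dest = os.path.join(target_dir, f"{stem}_{counter}{ext}")
            counter += 1
        used.add(dest)
        planned.append((dup, dest))
        if not dry_run:
            os.makedirs(target_dir, exist_ok=True)
            shutil.move(dup, dest)
    return planned
```

A typical run would call it once with the default dry_run=True, read the plan, and only then call it again with dry_run=False. Pick a target folder outside the one you scan, or the next scan will find the quarantined copies again.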

I found tons of smart ways to extend this script on python.0x3d.site.

Worth bookmarking — I keep it open like a daily command center.


💡 Final Thought: Clean File Life = Clear Mind

You don’t need a rocket launcher app.
You need small scripts that fix annoying things. That actually give you back time and brain space.

Start small. Keep it weird.
Python can do a lot more than fetch weather data — it can declutter your life one hash at a time.


Want a version that runs automatically every week? I’ve got a version that does just that — happy to share it.

And if you're the kind of dev who likes collecting quirky little problem-solvers, python.0x3d.site is where I usually find ideas, tools, and inspiration that don’t make me snore.

Stay tidy, my friend. 🧹🐍


