What is the best duplicate file finder, that preserves my source of truth? - eviltoast

Having been so meticulous about taking backups, I’ve perhaps not been as careful about where I stored them, so I now have loads of duplicate files in various places. I’ve tried various tools (fdupes, czkawka, etc.), but none seems to do what I want… I need a tool that I can tell which folder (and subfolders) is the source of truth, that will look for duplicates of those files anywhere else, and that gives me the option to move or delete them. Seems simple enough, but I have found nothing that allows me to do that… Does anyone know of anything?

  • Sergiow13@alien.topB
    10 months ago

    czkawka can easily do this OP!

    For example, I added 3 folders and marked the first folder as the reference folder (the checkmark next to it). It will now look for files from this folder in the other folders and delete all identical files found in the non-reference folders (it will of course first list all of them and ask you to confirm before deleting).

  • kslqdkql@alien.topB
    10 months ago

    AllDup is my preferred de-duplicator. It has options to protect folders, which seems like what you want, but unfortunately it is Windows only.

  • some_guy@lemmy.sdf.org
    10 months ago

    I use fclones. Not sure if this works the same as fdupes (likely does). Not sure if it will help you. It’s just a thing I use. Hope it helps.

  • root_switch@alien.topB
    10 months ago

    Only YOU can tell which is the source of truth, but czkawka can easily do what you need. What issues did you have with it?

    • parkercp@alien.topOPB
      10 months ago

      I’ll have to reinstall it to remind myself what it was. If I recall correctly, it was not easy to work out what I needed to do, as I simply wanted to say: scan everything for duplicates of the files in the directory hierarchy (e.g. multimedia/photos/) that I have deemed to be the source of truth…

  • nemec@alien.topB
    10 months ago

    If you’re 100% sure that the dupes are only between your source of truth and “everything else”, you can run fdupes, then pipe the output through grep -v /path/to/source/of/truth/root. All the file paths that remain are duplicate files outside your source of truth, which can be deleted.
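A minimal sketch of that filtering step, for the case where you are not 100% sure. It assumes fdupes' default output format (one path per line, groups separated by a blank line) and paths without newlines or backslashes. The function name outside_truth is made up for illustration. Unlike a plain grep -v, it only lists a file when the same group also has a copy inside the source of truth, so duplicates that exist only among the backups are left alone.

```shell
#!/bin/sh
# Hypothetical helper: reads fdupes-style output on stdin and prints the paths
# that live outside the source-of-truth root, but only for duplicate groups
# that also contain at least one copy inside that root.
outside_truth() {
    truth_root=${1%/}   # drop one trailing slash so the prefix match is consistent
    awk -v root="$truth_root" '
        function flush_group(   i) {
            # Emit the collected paths only if the group had a copy under root.
            if (has_truth)
                for (i = 1; i <= n; i++) print others[i]
            n = 0; has_truth = 0
        }
        /^$/                     { flush_group(); next }   # blank line ends a group
        index($0, root "/") == 1 { has_truth = 1; next }   # copy inside the source of truth
                                 { others[++n] = $0 }      # copy somewhere else
        END                      { flush_group() }
    '
}

# Example usage (placeholder paths; review the list before deleting anything):
# fdupes -r /path/to/source/of/truth/root /path/to/everything/else \
#     | outside_truth /path/to/source/of/truth/root > /tmp/dupes-to-review.txt
```

Writing the result to a file first, rather than piping it straight into rm, leaves room to check the list by eye.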

  • speculatrix@alien.topB
    10 months ago

    Write a simple script which iterates over the files and generates a hash list, with the hash in the first column.

    find . -type f -exec md5sum {} ; >> /tmp/foo

    Repeat for the backup files.

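    Then make a third file by concatenating the two, sort that file, and run “uniq -d” on it, comparing only the hash column (with GNU uniq that is “uniq -w 32 -d”, since the paths on each line differ). The output will tell you the duplicated files.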

    You can take the output of uniq and de-duplicate.
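The steps above could be sketched roughly as follows. This is a variation rather than the exact sort/uniq pipeline: it assumes GNU md5sum, paths without newlines or backslashes, and uses awk to match on the hash column so that only the backup-side paths are printed. The function names hash_tree and dupes_of_truth are made up for illustration.

```shell
#!/bin/sh
# Build a "hash  path" list for every regular file under a directory.
hash_tree() {
    find "$1" -type f -exec md5sum {} +
}

# Print the paths from the second list whose hash also appears in the first.
# md5sum lines are: 32 hex characters, 2 separator characters, then the path.
dupes_of_truth() {
    truth_list=$1
    other_list=$2
    awk '
        { hash = substr($0, 1, 32) }
        NR == FNR    { seen[hash] = 1; next }    # first file: remember truth hashes
        hash in seen { print substr($0, 35) }    # second file: path of a duplicate
    ' "$truth_list" "$other_list"
}

# Example usage (placeholder paths; review the list before deleting anything):
# hash_tree /path/to/source/of/truth > /tmp/truth.md5
# hash_tree /path/to/backups         > /tmp/backups.md5
# dupes_of_truth /tmp/truth.md5 /tmp/backups.md5 > /tmp/dupes-to-review.txt
```

Matching hashes only mean the files are very likely identical. A byte-for-byte check with cmp before deleting is a cheap extra safeguard.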

    • jerwong@alien.topB
      10 months ago

      I think you need a \ in front of the ;

      i.e.: find . -type f -exec md5sum {} \; >> /tmp/foo

    • tech2but1@alien.topB
      10 months ago

      Well, it won’t. You either tell it to assume that, say, the oldest copy is always the source, or, if there are identical files, you get asked to choose.

  • CrappyTan69@alien.topB
    10 months ago

    It only runs on Windows, but I’ve been using Double Killer for years. Simple and does the trick.

    • parkercp@alien.topOPB
      10 months ago

      Thanks @CrappyTan69 - I ideally need this to run on my NAS and, if possible, be open source/free. It looks like the Double Killer version I’d need costs £15/$20, so maybe an option as a last resort…

      • Lorric71@alien.topB
        10 months ago

        Can’t you edit the OP and add the requirements? You haven’t even told us what NAS you have.