How do you feel about storing binary files with Git? - eviltoast

I think it’s generally agreed upon that large files that change often do not belong in a Git repository, while small files that never change are fine. But there’s still a lot of middle ground where the answer is not so clear to me.

So what’s your stance on this? Where do you draw the line?

  • FizzyOrange@programming.dev
    4 months ago

    The main downside is Git downloads all history by default, and so any large files will bloat the download for people cloning your repo forever. It isn’t about binary vs text. It’s just the size that matters.
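To see how much weight old files add to a clone, one option is to list the largest blobs anywhere in history. The following is a minimal sketch, not a definitive tool: it assumes `git` is on the PATH, and the function names `parse_batch_check` and `largest_blobs_in_history` are made up for illustration. It relies on `git rev-list --objects --all` piped into `git cat-file --batch-check`, and reports uncompressed object sizes, so the on-disk cost after packing will usually be lower.

```python
import subprocess


def parse_batch_check(output):
    """Parse '<type> <sha> <size> <path>' lines and return blobs as (size, path), largest first."""
    blobs = []
    for line in output.splitlines():
        parts = line.split(" ", 3)
        # Skip commits, trees, tags and malformed lines; only blobs are file contents.
        if len(parts) < 3 or parts[0] != "blob":
            continue
        size = int(parts[2])
        path = parts[3] if len(parts) > 3 else ""
        blobs.append((size, path))
    return sorted(blobs, reverse=True)


def largest_blobs_in_history(repo_dir=".", top=10):
    """Return the `top` largest blobs reachable from any ref (hypothetical helper)."""
    # List every object reachable from all refs, with its path where known.
    objects = subprocess.run(
        ["git", "rev-list", "--objects", "--all"],
        cwd=repo_dir, capture_output=True, text=True, check=True,
    ).stdout
    # Ask Git for the type and uncompressed size of each listed object.
    sizes = subprocess.run(
        ["git", "cat-file",
         "--batch-check=%(objecttype) %(objectname) %(objectsize) %(rest)"],
        cwd=repo_dir, input=objects, capture_output=True, text=True, check=True,
    ).stdout
    return parse_batch_check(sizes)[:top]
```

Calling `largest_blobs_in_history(".")` inside a repository returns `(size, path)` pairs, including files that were deleted long ago but still live in history.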

  • count_dongulus@lemmy.world
    4 months ago

    If it’s a build artifact, put it in a registry. If it’s resource type files, Git LFS can be used if it’s not an absolute ton.

    • EarMaster@lemmy.world
      4 months ago

      This. If the file can be generated from the repository, it should not be put inside it, but if you need it to build the project, it should (unless it is an easy-to-install external dependency, which should be declared in a Readme file).

  • borf@lemmynsfw.com
    4 months ago

    hiss The point of source control is keeping a history of the code that produces binary artifacts. For convenience’ sake or because of some wild project-specific constraints people can do all kinds of things but IMO binaries in source control are yucky yucky bad bad bad.

    • Limonene@lemmy.world
      4 months ago

      I agree that the repository should only contain source files, not the output of the build process. However, in some cases (like icon images) the source files may be binary. I think small binaries that are required to build and/or run the software, and that are not an output of any build process, do make sense to put in the repository.

        • Windex007@lemmy.world
          4 months ago

          That’s not really responding to their point. Are you saying that there are no project resources that aren’t (or couldn’t be) encoded as text representation in any conceivable project that is stored in a git repo?

          I agree with your general point. I’m just curious if you have some unique project pattern techniques that allow you to draw such a hard line so confidently.

  • deathgrindfreak@lemmy.world
    4 months ago

    Fyi, there’s a fun project designed for handling the syncing of large files that uses git under the hood called git-annex. Fun fact, it’s written in Haskell as well.

  • terrehbyte@ani.social
    4 months ago

    I don’t like it, but if they’re part of the project files, then they belong in version control. I do worry about the challenges of combining the difficult-to-merge nature of binaries with the distributed workflows that Git encourages. While data doesn’t get lost, the inability to merge them may mean that someone needs to spend extra time re-performing their changes if they “lose” the push/merge race.

    Game engines have been doing a better job of transitioning away from large monolithic binaries by either serializing them in somewhat mergeable text files or at least splitting them into large numbers of smaller binaries to reduce file contention.

    Git LFS does offer the ability to offload them from the repository, which reduces download and checkout times, as well as the ability to lock files (which does introduce centralization…). But it doesn’t seem to be as ubiquitous, and it can be more expensive to use, depending on the team’s options for Git repo providers.

    Note: I assume you mean binaries as in “non-text files”, not build artifacts, which definitely don’t belong in version control at all.

  • Ephera@lemmy.ml
    4 months ago

    I’ll go to quite a bit of effort to avoid them. Arguably too much effort, but I often find that the path that avoids them is also useful in other ways.

    For example, for a personal project, I automated rendering a PNG fallback icon from an SVG, so now I can have as many different resolutions as I want and don’t need to manually update them, if I want to tweak the icon.

    I’d also like to publish a screenshot of the project. The simple solution is to check a PNG into the repo and link it in the README.md. But what would be a lot nicer is to set up a project webpage, which isn’t even that much effort with Codeberg Pages, though I would have less motivation to do it otherwise.

  • Lodra@programming.dev
    4 months ago

    Never do this.

    Git is all about tracking changes over time which is meaningless with binary files. They are bloat for your repo, slowing down operations. Depending on the repo, they are likely to change from CI with every commit. That last one means that every commit turns into 2 commits btw. They can ruin diffs. I could go on for a long time here.

    There are basically 0 upsides. Use an artifact repository instead!

    • FizzyOrange@programming.dev
      4 months ago

      “Git is all about tracking changes over time which is meaningless with binary files.”

      Utter codswallop. You can see the changes to a PNG over time. Lots of different UIs will even show you diffs for images.

      Git can track changes to binary files perfectly well. It might not be great at dealing with conflicts in them but that’s another matter.

      The only issue is that binary files tend to be large, and often don’t compress very well with Git’s delta compression. It’s large files that are the issue, not binary files. If you have a 20 kB binary file it’s going to be absolutely fine in Git. Likewise a 10 GB CSV file is not going to be such a good idea.
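If size is the criterion rather than binary vs text, a check before committing only needs to look at byte counts. Here is a minimal sketch of that idea, assuming a 1 MB limit picked purely for illustration; the function name `oversized_files` and the threshold are hypothetical, not a Git feature or an agreed-upon rule.

```python
import os


def oversized_files(paths, limit_bytes=1_000_000):
    """Return (path, size) for every existing file larger than limit_bytes.

    The file type is deliberately ignored: a small binary passes,
    a huge text file does not. The default limit is an assumption.
    """
    flagged = []
    for path in paths:
        if os.path.isfile(path):
            size = os.path.getsize(path)
            if size > limit_bytes:
                flagged.append((path, size))
    return flagged
```

With this rule, a 20 kB icon goes through untouched, while a multi-megabyte CSV is flagged even though it is plain text. The list of paths could come from whatever is staged for commit.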

  • tiredofsametab@kbin.run
    4 months ago

    I think the only binaries I have are tiny samples used by a couple of tests in that repo. I generally try to avoid them altogether.

  • magic_lobster_party@kbin.run
    4 months ago

    I think assets like app icons are ok. They rarely change, and are often quite small. It’s convenient to have those kinds of things bundled together with the code.