I recently moved my files to a new ZFS pool and used that chance to properly configure my datasets.

This led me to discover ZFS deduplication.

As most of my storage is used by my Jellyfin library (~7-8 TB), which is mostly uncompressed Blu-ray rips, I thought I might be able to save some storage by using deduplication in addition to compression.

Has anyone here used that for similar files before? What was your experience with it?

I am not too worried about performance. The dataset in question rarely changes, basically only when I add more media every couple of months. I also overshot my CPU target when originally configuring my server, so there is a lot of headroom there. I have 32 GB of RAM, which is not really fully utilized either (and I would not mind upgrading to 64 GB too much).

My main concern is that I am unsure whether it is useful. I suspect that, just because of the amount of data and the similarity in type, there would statistically be a lot of block-level duplication, but I could not find any real-world data or experiences on that.
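
One way to get a rough number before enabling anything is to hash a sample of the library in fixed-size blocks and count how many blocks repeat. The sketch below is only an approximation: the function name `count_blocks` and the 128 KiB default block size are assumptions for illustration, and ZFS itself deduplicates whole records after compression, so its real ratio can differ.

```python
import hashlib


def count_blocks(paths, block_size=128 * 1024):
    """Return (total_blocks, unique_blocks) for the given files.

    Each file is split into fixed-size blocks (the last block of a file
    may be shorter) and each block is identified by its SHA-256 digest.
    """
    seen = set()   # digests of every distinct block encountered so far
    total = 0
    for path in paths:
        with open(path, "rb") as fh:
            while True:
                block = fh.read(block_size)
                if not block:
                    break
                seen.add(hashlib.sha256(block).digest())
                total += 1
    return total, len(seen)
```

If `unique_blocks` comes out almost equal to `total_blocks` on a representative sample, dedup is unlikely to save much. The set of digests lives in RAM, so this is meant for a sample of files rather than the whole library. ZFS also has its own simulation, `zdb -S <pool>`, which reports an estimated dedup ratio for an existing pool.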

  • nottelling@lemmy.world
    6 days ago

    ZFS dedup is memory constrained, and the memory use scales with the number of block hashes it has to track in the dedup table.
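
To put a number on that scaling, here is a back-of-the-envelope sketch. It assumes roughly 320 bytes of RAM per dedup-table entry, which is a commonly quoted rule of thumb and not a measured value for any particular pool, and it assumes the worst case where every block is unique. The function name `ddt_ram_bytes` is made up for this example.

```python
TIB = 2**40
GIB = 2**30


def ddt_ram_bytes(data_bytes, recordsize=128 * 1024, bytes_per_entry=320):
    """Worst-case dedup table size: one entry per block, all blocks unique.

    bytes_per_entry=320 is an assumed rule-of-thumb figure.
    """
    blocks = -(-data_bytes // recordsize)  # ceiling division
    return blocks * bytes_per_entry


# 8 TiB of data at two recordsize settings
for rs in (128 * 1024, 1024 * 1024):
    est = ddt_ram_bytes(8 * TIB, recordsize=rs)
    print(f"recordsize={rs // 1024}K: ~{est / GIB:.1f} GiB")
# prints:
# recordsize=128K: ~20.0 GiB
# recordsize=1024K: ~2.5 GiB
```

Under these assumptions, a library of about 8 TiB at the default 128K recordsize would want most of a 32 GB machine just for the table, while a 1M recordsize cuts that by a factor of eight but also makes identical blocks less likely.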

    If performance isn’t a concern, you’re better off compressing your media. You’ll get similar storage efficiency with less crash consistency risk.

    • IsoKiero@sopuli.xyz
      6 days ago

      ZFS in general is pretty memory hungry. I set up my Proxmox server with ZFS pools a while ago and now I kind of regret it. ZFS in itself is very nice and has a ton of useful features, but I just don’t have the hardware or the usage pattern to benefit from it that much on my server. I’d rather have that thing running on LVM and/or software RAID to have more usable memory for my VMs. That’s one of the projects I’ve been planning for the server: replacing the ZFS pools with something that suits my usage patterns better. But that’s a whole other story and requires some spare money and some spare time, neither of which I really have at hand right now.

      • emptiestplace@lemmy.ml
        4 days ago

        Just adjust it if you actually need the RAM and it isn’t relinquishing quickly enough.

        Put options zfs zfs_arc_max=17179869184 in /etc/modprobe.d/zfs.conf, run update-initramfs -u, and reboot. This will limit ZFS ARC to 16GiB.

        Run arc_summary to see what it’s using now.
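
The value is just the limit in bytes. A small sketch for generating the line for a different limit (the helper name `arc_max_line` is made up for this example):

```python
def arc_max_line(gib):
    """Build the modprobe option line for an ARC limit given in GiB."""
    return f"options zfs zfs_arc_max={gib * 2**30}"


print(arc_max_line(16))  # options zfs zfs_arc_max=17179869184
print(arc_max_line(8))   # options zfs zfs_arc_max=8589934592
```

On Linux the same byte value can usually also be written to /sys/module/zfs/parameters/zfs_arc_max to change the limit without a reboot, though the modprobe.d entry is what makes it persistent.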

        As for using a simple fs on LVM, do you not care about data integrity?

        • IsoKiero@sopuli.xyz
          4 days ago

          > this will limit ZFS ARC to 16GiB.

          But if I have 32GB to start with, that’s still quite a lot and, as mentioned, my current usage pattern doesn’t really benefit from zfs over any other common filesystem.

          > As for using a simple fs on LVM, do you not care about data integrity?

          Where did you get that from? LVM has options to create RAID volumes and, again as mentioned, I can mix and match those with software RAID however I like. Also, on a single host, no matter how sophisticated the filesystems and RAID setups are, none of that really keeps data safe. That’s what backups are for, and it’s a whole other discussion.