
Does ZFS deduplicate across datasets or only within a single dataset? In other words, if I have two nearly identical volumes, will they be deduplicated?

Braiam
  • I would say it only affects the dataset. +1 I am curious to see what someone more experienced with zfs will say. – Rui F Ribeiro Jul 28 '17 at 06:48
    According to Aaron Toponce: Data in the dataset is deduplicated. The data is matched against all the data in the pool, which includes data outside of that dataset. However, data in other datasets is not deduplicated. In other words, deduplication is handled per dataset, but the data that it's being deduplicated against can be any block on the pool, in or out of the dataset itself. – Thomas Jul 28 '17 at 10:07

1 Answer


ZFS deduplication is pool-wide. Identical blocks in all datasets that have dedup=on will be shared.
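To make "pool-wide" concrete, here is a toy model of the idea in the answer: a single dedup table per pool, keyed by block checksum, that is consulted by every dataset with dedup=on. This is a simplified sketch, not ZFS code. The class names, the tiny BLOCK_SIZE, and the use of `hashlib.sha256` are illustrative assumptions; real ZFS works on recordsize/volblocksize-sized blocks and keeps more information per table entry. One detail the model does reflect: blocks written while dedup is off are never entered in the table, so they are not shared.

```python
import hashlib
from dataclasses import dataclass, field

BLOCK_SIZE = 4  # hypothetical toy block size in bytes; real ZFS uses recordsize / volblocksize


@dataclass
class Pool:
    # One dedup table for the whole pool: block checksum -> reference count.
    ddt: dict = field(default_factory=dict)
    allocated_blocks: int = 0  # physical blocks actually stored in the pool


@dataclass
class Dataset:
    name: str
    pool: Pool
    dedup: bool  # models the per-dataset dedup=on/off property

    def write(self, data: bytes) -> None:
        for i in range(0, len(data), BLOCK_SIZE):
            block = data[i:i + BLOCK_SIZE]
            if not self.dedup:
                # dedup=off: always store a new block and never record it in the DDT.
                self.pool.allocated_blocks += 1
                continue
            key = hashlib.sha256(block).hexdigest()
            if key in self.pool.ddt:
                # Same checksum already in the pool-wide table: share it, bump refcount.
                self.pool.ddt[key] += 1
            else:
                self.pool.ddt[key] = 1
                self.pool.allocated_blocks += 1


pool = Pool()
a = Dataset("tank/a", pool, dedup=True)
b = Dataset("tank/b", pool, dedup=True)
c = Dataset("tank/c", pool, dedup=False)

a.write(b"AAAABBBBCCCC")  # 3 new blocks
b.write(b"AAAABBBBDDDD")  # AAAA and BBBB shared with tank/a; only DDDD is new
c.write(b"AAAABBBB")      # dedup=off: stored again even though identical
print(pool.allocated_blocks)  # 6
```

Without deduplication the three writes would need 8 blocks; the two dedup=on datasets share two blocks with each other even though they are different datasets, while the dedup=off dataset shares nothing.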

  • To make sure I understand: I can turn on deduplication per dataset, and all of those datasets are checked for duplicates. So if I have 2 datasets in 2 different spots of the same zpool, they will both be deduplicated against the pool's deduplication set rather than only against their own files, correct?

    What if I have a single dataset with multiple identical files?

    – Sawtaytoes Dec 17 '22 at 07:56
  • Hey Matthew, is it possible to refine your answer? At https://linuxhint.com/zfs-deduplication/ it looks like you can enable deduplication on the filesystem as well. Could you address this in your answer? Thank you. – TelamonAegisthus May 15 '23 at 02:23
  • @TelamonAegisthus His answer is pretty clear to me; the deduplication table will reference blocks in the entire pool for any dataset which has the dedup property set to on. If this is still confusing, I recommend reviewing basic zfs/zpool concepts. – Josh Enders Dec 17 '23 at 20:26
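Because matching happens on whole blocks, the questions about multiple identical files in one dataset and about "nearly identical" volumes come down to how many blocks are byte-for-byte identical at the same alignment. The sketch below is a toy illustration under stated assumptions: RECORD_SIZE is a hypothetical stand-in for recordsize/volblocksize, and it assumes all data is written with the same block size, compression, and checksum settings, so identical content yields identical block checksums.

```python
import hashlib

RECORD_SIZE = 4  # hypothetical toy block size; ZFS uses recordsize / volblocksize


def block_hashes(data: bytes, size: int = RECORD_SIZE) -> list[str]:
    """Split data into fixed-size, aligned blocks and checksum each one."""
    return [hashlib.sha256(data[i:i + size]).hexdigest()
            for i in range(0, len(data), size)]


def unique_blocks(*files: bytes) -> int:
    """Physical blocks needed if every identical block is stored only once."""
    return len({h for f in files for h in block_hashes(f)})


original = b"AAAABBBBCCCCDDDD"
copy = original                 # identical file in the same dataset
edited = b"AAAAXBBBCCCCDDDD"    # one byte changed in place
shifted = b"ZAAAABBBBCCCCDDDD"  # one byte inserted at the start

print(unique_blocks(original, copy))     # 4: the copy costs no extra blocks
print(unique_blocks(original, edited))   # 5: only the changed block is new
print(unique_blocks(original, shifted))  # 9: every block boundary moved, nothing matches
```

In this model an exact copy is fully shared and an in-place edit costs one block, but an insertion that shifts later data off its block boundaries defeats deduplication for everything after it, because blocks are compared at fixed offsets rather than by content-defined chunks.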