Does ZFS deduplicate across datasets or only inside a single dataset?

Question

Does ZFS deduplicate across datasets or only inside a single dataset? In other words, if I have two nearly identical volumes, will they be deduplicated against each other?

I would say it only affects the dataset. +1 I am curious to see what someone more experienced with zfs will say. — Rui F Ribeiro, Jul 28 '17 at 06:48
According to Aaron Toponce: Data in the dataset is deduplicated. The data is matched against all the data in the pool, which includes data outside of that dataset. However, data in other datasets is not deduplicated. In other words, deduplication is handled per dataset, but the data that it's being deduplicated against can be any block on the pool, in or out of the dataset itself. — Thomas, Jul 28 '17 at 10:07

score 16 · Accepted Answer · answered Jul 28 '17 at 19:01

ZFS deduplication is pool-wide. Identical blocks in all datasets that have dedup=on will be shared.
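As a rough illustration of that sharing rule, here is a toy Python model. It is not ZFS code: the `ToyPool` class, the dataset names, and the use of SHA-256 as the block key are all assumptions made for the sketch. It only shows one table shared by the whole pool, consulted solely for writes from datasets with dedup enabled.

```python
import hashlib


class ToyPool:
    """Toy model of a pool with one shared dedup table (hypothetical, not ZFS internals)."""

    def __init__(self):
        self.ddt = {}       # block checksum -> reference count, shared by the whole pool
        self.allocated = 0  # number of blocks physically stored

    def write(self, block: bytes, dedup: bool) -> None:
        if not dedup:
            # Dataset has dedup=off: always store a new copy, never consult the table.
            self.allocated += 1
            return
        key = hashlib.sha256(block).hexdigest()
        if key in self.ddt:
            # Identical block already stored by some dedup-enabled dataset: share it.
            self.ddt[key] += 1
        else:
            self.ddt[key] = 1
            self.allocated += 1


pool = ToyPool()
block = b"x" * 4096

# Hypothetical datasets in the same pool, mapped to their dedup setting.
datasets = {"tank/a": True, "tank/b": True, "tank/c": False}
for name, dedup in datasets.items():
    pool.write(block, dedup)

print(pool.allocated)  # tank/a and tank/b share one copy; tank/c stores its own
```

In this model the same block written to `tank/a` and `tank/b` is stored once, while the copy written to `tank/c` is stored separately because that dataset has dedup turned off. Writing the same block twice within a single dedup-enabled dataset would hit the same table entry in the same way.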

Matthew Ahrens

To understand better: I can turn on deduplication per dataset, and all datasets with it turned on are checked when deduplicating. So if I have 2 datasets in 2 different spots of the same zpool, they will both be deduplicated according to the pool's deduplication set rather than only their own files, correct?
What if I have a single dataset with multiple identical files?
– Sawtaytoes Dec 17 '22 at 07:56
Hey Matthew, is it possible to refine your answer? I see at https://linuxhint.com/zfs-deduplication/ , that it looks like you can put de-duplication on the filesystem as well. Please may I have you address this in your answer? Thank you. – TelamonAegisthus May 15 '23 at 02:23
@TelamonAegisthus His answer is pretty clear to me; the deduplication table will reference blocks in the entire pool for any dataset which has the dedup property set to on. If this is still confusing, I recommend reviewing basic zfs/zpool concepts. – Josh Enders Dec 17 '23 at 20:26
