
Consider a ZFS pool with deduplication enabled. Now consider multiple directory trees that contain identical content [D1 thru D10]. The first [D1] takes up a TB of "real" disk space. The others [D2 thru D10] have some bookkeeping overhead, but are basically re-using the same blocks...

If one of those trees was deleted, the free space on the volume should not change (appreciably). It would only change once ALL of the trees had been deleted...

So far so good.

But if there was another tree that had its own data [Q1]... this takes up a TB of data...

Now if I do an "ls" I see 11 directories... D1 thru D10, and Q1....

And we are ready for the base question: How can I tell that deleting Q1 will free real space, but deleting any single D? directory will NOT? How can I quantify the recovered disk space for any given (hypothetical) operation?
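
A sketch of the kind of command-line probing that might get at this, assuming a pool named tank and a dataset tank/demo (both hypothetical names; the commands themselves are standard ZFS tooling). zpool list reports the pool-wide dedup ratio, and zdb -DD prints the dedup table (DDT) as a histogram bucketed by reference count; blocks in the refcount-1 bucket are the only ones whose deletion returns real space.

    # Hypothetical names: pool "tank", dataset "tank/demo".
    zfs create -o dedup=on tank/demo

    # Two identical trees plus one unique tree, mirroring the scenario.
    cp -a /src/data /tank/demo/d1
    cp -a /src/data /tank/demo/d2
    cp -a /other/data /tank/demo/q1

    # The DEDUP column shows the pool-wide dedup ratio; ALLOC/FREE are
    # the real block-level numbers (unlike df, which counts logical bytes).
    zpool list tank

    # DDT histogram: each row buckets dedup-table entries by refcount.
    # The refcount-1 bucket is data referenced by exactly one tree (Q1
    # here); everything in higher buckets survives deleting one copy.
    zdb -DD tank

The caveat is that this is pool-wide accounting: the DDT has no notion of directories, so it quantifies how much uniquely-referenced data exists overall, not which rm would release it.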

  • Welcome to Unix & Linux. Please take the tour to learn how Unix & Linux works. Read how to ask to improve the quality of your question. Then take a look at the help center to see if some on-topic questions have already been asked. – freezed Mar 21 '23 at 21:43
  • Thanks for the welcome, took the tour. Checked that this topic has not been directly addressed before (at least that I can find). If you have something constructive so I can determine the actual differential space impact of specific directory trees [i.e. how many of the content blocks have a use-count of exactly 1] from a Linux command line, that would be great. – David V. Corbin Mar 22 '23 at 12:54
  • I cannot figure out your bracket notation, could you include some command line returns, please? – freezed Mar 23 '23 at 00:00
  • 03/23/2023 01:08 PM d1
    03/23/2023 01:08 PM d2
    03/23/2023 01:08 PM d3
    03/23/2023 01:08 PM d4
    03/23/2023 01:08 PM d5
    03/23/2023 01:08 PM d6
    03/23/2023 01:08 PM d7
    03/23/2023 01:08 PM d8
    03/23/2023 01:08 PM d9
    03/23/2023 01:09 PM q1
    – David V. Corbin Mar 23 '23 at 17:11
  • The contents of the D1 thru D9 directories are identical. Therefore deleting any one of these sub-directories will not free up significant space; it will just reduce the block reference counts from 9 to 8... However, Q1 has a completely different 10GB of data, with NO overlap with the data in the D* directories, so deleting it will free up 10GB...

    Now, how can I tell this from a command line?

    – David V. Corbin Mar 23 '23 at 17:11
  • I don't have time to actually try this out, but what is the behavior of du on those directories; does it pretend that each takes up the whole (say) 10GB of storage, or does it list only the first as taking up space? – dhag Mar 23 '23 at 17:35
  • @dhag - Assuming each was 10GB, it would show that for each, with 100GB as the total. Even worse, if the drive was 120GB (for example), then df would only show 20GB free... even though the actual usage was barely 10% of the drive. This means that attempting to upload a few more copies (of the identical duplicate content) fails with "insufficient space" (even though there would be minuscule space actually consumed and 80% of the blocks are indeed free)... – David V. Corbin Mar 24 '23 at 12:14
  • So far the only way I have been able to find to address the latter is to manually keep lying and adjusting the drive size... so in the first case I would bump it to a 200GB size so it thought there was 100GB free (10x10GB used) (see the quota sketch after these comments) – David V. Corbin Mar 24 '23 at 12:14
  • Trying things out, du does what you say (deduplicated blocks are counted multiple times), but df shows completely unexpected results; i.e., the volume's size increases as I create more deduplicated files, and decreases as I delete them. All this to say that I probably don't know what the proper tool would be; there has to be something that lists a file's blocks, and could thus help find which are used by multiple files (see the block-listing sketch after these comments). Sorry! – dhag Mar 25 '23 at 14:11
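
On dhag's last point (something that lists a file's blocks): a minimal sketch, again assuming the hypothetical tank/demo dataset and placeholder file names. On ZFS the inode number from ls -i is the dataset object number, and zdb can dump that object's block pointers; the DVA fields are on-disk block addresses, so two files whose dumps show the same DVAs are sharing deduplicated blocks.

    # Object (inode) numbers of the files to compare.
    ls -i /tank/demo/d1/file.bin /tank/demo/d2/file.bin

    # Dump a single object's block pointers at maximum verbosity.
    # Compare the DVA entries between the two dumps: identical DVAs
    # mean the underlying blocks are shared via dedup.
    zdb -ddddd tank/demo <object-number>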
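And on the "adjusting the drive size" workaround mentioned above: if the ceiling df is reporting comes from a dataset quota (an assumption; the actual setup isn't specified in the comments), raising that quota raises the apparent capacity without touching real allocation:

    # df derives a quota-limited dataset's size from the quota, which
    # counts logical (pre-dedup) bytes; raising it restores headroom.
    zfs get quota tank/demo
    zfs set quota=200G tank/demo
    df -h /tank/demo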

0 Answers