I am currently working on a student project (machine learning) in which we get access to a company's resources. They store their data on Windows servers, but we use Linux machines to access the data. It does not seem to be possible to set up quotas: the data is stored on a Windows server, and my advisers do not have access to the machine it is stored on. The problem is that every once in a while a student accidentally uses an ENORMOUS amount of disk space, which in turn wastes an enormous amount of space in backups. For example, I trained a model for 3 days and created snapshots of the model on a regular basis, which resulted in 100 GB of disk usage. This is a problem.
Is it possible to prevent something like this?
I was thinking about a cron job that runs every 30 minutes or so for every logged-in user. The job checks the disk usage in the user's home folder (e.g. with `du -s .`) and kills all of the user's jobs if they use too much disk space. My adviser had concerns that this would cost a significant amount of CPU time.
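To make the idea concrete, here is roughly what I had in mind (a rough, untested sketch; the 100 GB limit, the log file path, and the use of `pkill` are just placeholders I picked for illustration):

```bash
#!/bin/bash
# Sketch of the per-user disk check (untested). Assumes home directories
# are the ones reported by getent and that 100 GB is the limit.

LIMIT_KB=$((100 * 1024 * 1024))   # 100 GB expressed in KiB (du -sk reports KiB)

# Check every user that currently has at least one login session.
for user in $(who | awk '{print $1}' | sort -u); do
    home=$(getent passwd "$user" | cut -d: -f6)
    [ -d "$home" ] || continue

    usage_kb=$(du -sk "$home" 2>/dev/null | cut -f1)

    if [ "$usage_kb" -gt "$LIMIT_KB" ]; then
        echo "$(date): $user uses ${usage_kb} KiB, terminating their processes" >> /var/log/disk-check.log
        # Send SIGTERM first, so training jobs get a chance to shut down cleanly.
        pkill -TERM -u "$user"
    fi
done
```

The script would then be registered as a root cron entry, e.g. `*/30 * * * * /usr/local/sbin/disk-check.sh`.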
I've just tried it, and the first execution of `du -s .` takes significantly longer than subsequent executions. Why is that the case?
Would my proposed solution work, or are there better solutions for the environment I described? (We have root access to the machines we use, but not to the machine where our home folders are stored.)