Monsoon - Disk quota exceeded

Dear CMS User Desk

I am running a suite (u-ct417) that has successfully run before, but it is now failing with disk quota exceeded:
lgray@xcslc0:~/cylc-run/u-ct471/log/job/20090101T0000Z/atmos_main_r001i1p00000/NN> ls
job job-activity.log job.err job.out job.status
lgray@xcslc0:~/cylc-run/u-ct471/log/job/20090101T0000Z/atmos_main_r001i1p00000/NN> more job.err
[FAIL] [Errno 122] Disk quota exceeded
2023-01-18T02:15:35Z CRITICAL - failed/EXIT

However, it does not look as though my /projects disk quota has been exceeded - my understanding is that we have an 8 TB quota and only 2.7 TB is being used:
lgray@xcslc0:/projects> du -sh step
2.7T step

I’ve submitted this several times, but it repeatedly fails (there is nothing else being run under the step project).

Any ideas what the problem might be?
thank you
Lesley

Hi Lesley,

Sorry, I can’t see quotas for other projects. Can you run the following for me please and post the output on the ticket:

  • quota.py -u lustre_home

  • quota.py -g step lustre_multi

Regards,
Ros.

Dear Ros, Dave
I checked to make sure all the previous fixes are in the u-ct417 suite - it was copied from u-ct078, which had previously run successfully (and I had done an fcm commit on u-ct078, so all the fixes were copied across).

But I will ask the Monsoon team to remove me from acsis, just in case this is still the problem.

Here is the output from Ros’s disk quota queries:

lgray@xcslc0:/projects/step> quota.py -u lustre_home
Disk quotas for user lgray (uid 30359):
Filesystem          TB   Quota       % |     Files     Quota      %
-------------- ------- ------- ------- | --------- --------- ------
/.lustre_home     1.02    1.02  100.10 |    238965         0   0.00

lgray@xcslc0:/projects/step> quota.py -g step lustre_multi
Disk quotas for group step (gid 40151):
Filesystem          TB   Quota       % |     Files     Quota      %
-------------- ------- ------- ------- | --------- --------- ------
/.lustre_multi    2.58    8.00   32.23 |    193803         0   0.00

Lesley

Hi Lesley,

So it looks like you’ve maxed out your /home quota. Try deleting some files there and resubmitting.

I don’t know if you’re aware, but although you have a 1 TB quota, /home directories are only backed up if they are 10 GB or less.
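
If it isn’t obvious where the space has gone, something along these lines (purely illustrative, run from your home directory) should list the biggest top-level directories first:

  du -sh ~/* ~/.[!.]* 2>/dev/null | sort -rh | head -20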

Cheers,
Ros.

Thank you so much - by mistake I’d filled my home directory with files that should have been moved across to the /projects/step directories.

I’ve resubmitted the suite - I’m pretty sure it will run OK now!
thanks again
Lesley

Hi Ros

Can I pick up on this disk quota issue again please.

I have had another suite fall over because of disk quota problems.

If I do:
lgray@xcslc0:~> quota.py -u lustre_home
Disk quotas for user lgray (uid 30359):
Filesystem          TB   Quota       % |     Files     Quota      %
-------------- ------- ------- ------- | --------- --------- ------
/.lustre_home     1.02    1.02  100.07 |    235936         0   0.00

it does indeed look as though I’ve gone over my allocation.

But if I go into my home directory and do
du -sh .
1.4G .
it doesn’t look anything like 1 TB.

I tried deleting some data from my home directory and then repeated your quota.py check, but it still says I have 1.02 TB, so deleting data from my home directory is having no effect.

I don’t understand where the 1.02 TB of data is, so I’m not sure what I can delete to avoid this quota issue. What do you suggest?

thanks
Lesley

(In case it’s relevant, the group space is well under quota:
Disk quotas for group step (gid 40151):
Filesystem          TB   Quota       % |     Files     Quota      %
-------------- ------- ------- ------- | --------- --------- ------
/.lustre_multi    2.53    8.00   31.62 |    193466         0   0.00
)

Hi Lesley,

I have to confess, I’m confused. I think the information that quota.py runs off gets updated once a day, but if that 1.4 GB figure was from before you started deleting files then I don’t understand it either.

Can you try running:
xcs-c$ lfs quota -u lgray /home
and let me know what the output is please.

I’ll talk to the Met Office and see what they have to say.

Cheers,
Ros.

Hi Ros
lgray@xcslc0:~> lfs quota -u lgray /home
Disk quotas for user lgray (uid 30359):
Filesystem       kbytes   quota       limit  grace   files  quota  limit  grace
/home       1096039644*       0  1094713344      -  201258      0      0      -

Last night I deleted several branches (4) that I no longer use (I had 1.6 GB and managed to get it down to 1.4 GB) and tried re-running my suite, but it still fell over.

I do have lots of suites in my roses directory - perhaps I should prune those too.

thanks
Lesley

Hi Ros
Thanks for your help on this - I’ve pasted Roger’s reply below, in case it’s of help to others.

Can you tell me the best way to prune some of the old suites that I no longer need?

I’d like to remove them from the list of suites I get when I use ‘rosie go’ if possible. I can remove them from my roses directory, but is there anything else I should do to make sure it’s all clean and consistent?
thanks a lot
Lesley

Hi Ros, Lesley,
I think this is because the /home quota is shared with /working (and similarly /projects with /scratch). This is a consequence of the underlying storage architecture of the machine and frequently causes confusion.
So Lesley should reduce her usage of both /home (ideally to below 10 GB, because anything over that means the backup is suspended) and /working in order to have sufficient disk space.
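
As a rough check (the /working path below is just my assumption about the usual layout, so adjust it to whatever yours actually is), comparing the two areas with

  du -sh $HOME /working/lgray

should show where the bulk of the usage sits. Note that lfs quota reports in kB, so 1096039644 kB / 1024^3 ≈ 1.02 TB, which matches what quota.py shows even though du in /home alone only finds 1.4 GB; the remainder is presumably sitting under /working.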

quota.py runs off cached information generated every morning I believe.
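
(If you want an up-to-date figure rather than the cached one, the lfs quota command Ros suggested above, i.e. lfs quota -u lgray /home, queries Lustre directly.)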

There is also the “file-listing” utility to investigate the age and distribution of files across /projects, /home, and /working. This works off information that may be up to three weeks old (that’s how long it takes to gently index all the data on XCS without impacting user access), but it is useful for finding old files that haven’t been accessed in a while. You need to log in with the “-X” option enabled, as the following commands will open a Firefox session at a page showing your information…

file-listing .file_listing/home.xz
file-listing .file_listing/working.xz
file-listing /projects/.file_listing/step.xz
etc…

This is described in the Monsoon user guide on the Met Office Science Repository Service (in a couple of places), but it could be clearer. I’ll discuss updating it with Mahammed.

Let me know if you continue to have problems.
Roger