Thank you for running squeue. However, the job in question, u-cq915, was not listed in your output. Running the command myself, I get:
48731862 long-seri u-cq915. tmarthew PD 0:00 1 (ReqNodeNotAvail, Reserved for maintenance)
Is “ReqNodeNotAvail” the problem? Does this mean the job will sit there until after tomorrow’s maintenance?
I ran squeue -p long-serial, and I see:
49227560 long-seri u-ct751. tmarthew PD 0:00 1 (ReqNodeNotAvail, Reserved for maintenance)
49228185 long-seri u-cv906. tmarthew PD 0:00 1 (ReqNodeNotAvail, Reserved for maintenance)
49228084 long-seri u-cv905. tmarthew PD 0:00 1 (ReqNodeNotAvail, Reserved for maintenance)
49229358 long-seri u-cs296. tmarthew PD 0:00 1 (ReqNodeNotAvail, Reserved for maintenance)
There is scheduled JASMIN maintenance tomorrow.
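In case it helps when checking this again, here is a minimal Python sketch that tallies the pending reasons from pasted squeue output. It assumes the default squeue column order (JOBID, PARTITION, NAME, USER, ST, TIME, NODES, NODELIST(REASON)), which is what the lines above appear to follow. The function names are hypothetical and only for illustration.

```python
import re
from collections import Counter

# Sample lines copied from the squeue output quoted above.
SQUEUE_LINES = [
    "48731862 long-seri u-cq915. tmarthew PD 0:00 1 (ReqNodeNotAvail, Reserved for maintenance)",
    "49227560 long-seri u-ct751. tmarthew PD 0:00 1 (ReqNodeNotAvail, Reserved for maintenance)",
    "49228185 long-seri u-cv906. tmarthew PD 0:00 1 (ReqNodeNotAvail, Reserved for maintenance)",
    "49228084 long-seri u-cv905. tmarthew PD 0:00 1 (ReqNodeNotAvail, Reserved for maintenance)",
    "49229358 long-seri u-cs296. tmarthew PD 0:00 1 (ReqNodeNotAvail, Reserved for maintenance)",
]

# Assumption: default squeue layout, where the last column holds either the
# node list (running jobs) or the pending reason in parentheses.
LINE_RE = re.compile(
    r"^\s*(?P<jobid>\d+)\s+(?P<partition>\S+)\s+(?P<name>\S+)\s+(?P<user>\S+)\s+"
    r"(?P<state>\S+)\s+(?P<time>\S+)\s+(?P<nodes>\d+)\s+(?P<last>.+?)\s*$"
)


def parse_squeue_line(line):
    """Return a dict of fields for one squeue line, or None if it does not match."""
    match = LINE_RE.match(line)
    if match is None:
        return None
    job = match.groupdict()
    last = job.pop("last")
    if last.startswith("(") and last.endswith(")"):
        # Pending job: the last column is the reason, e.g. "(Priority)".
        job["reason"] = last[1:-1]
        job["nodelist"] = None
    else:
        # Running job: the last column is the list of allocated nodes.
        job["reason"] = None
        job["nodelist"] = last
    return job


def pending_reasons(lines):
    """Count how many pending (PD) jobs share each reason string."""
    counts = Counter()
    for line in lines:
        job = parse_squeue_line(line)
        if job is not None and job["state"] == "PD":
            counts[job["reason"]] += 1
    return counts


if __name__ == "__main__":
    for reason, count in pending_reasons(SQUEUE_LINES).items():
        print(f"{count} pending job(s): {reason}")
```

If every pending job in the partition reports the same maintenance reason, as here, that would suggest the hold is partition-wide rather than specific to u-cq915.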
On Mon, Apr 24, 2023 at 11:27 AM Toby Marthews wrote:
Apologies for emailing you rather than JASMIN IT Support, but my previous problem with JASMIN queues seems to be arising again.
I have a suite u-cq915. I compiled and submitted it on 18th April and it has been sitting there ‘submitted’ since then (see screenshot attached from today).
The CPU loads on JASMIN don’t seem to be extreme (see below), so I’m wondering why I seem to be stuck in a queue like this. My walltime on this job is the maximum of 1 week (168 hrs), and when it reaches that limit tomorrow the job will simply fail without ever having actually run.
Before it does that tomorrow, could I ask whether there are any diagnostics you can run on the queue to help me understand why this is happening? It doesn’t happen to all my jobs (many just go straight to ‘running’) but every once in a while I get a job which falls into this ‘submitted’ waiting area and then eventually fails (because the walltime applies to the whole job, this ‘submitted waiting time’ is included). Obviously, I lose a week of waiting every time this happens (!).
Have I perhaps used up my ‘credits’ on JASMIN by running too many jobs over the last few months?
Anything you can tell me to shed further light on this situation would be much appreciated.
** JASMIN shared host status at 10:31:47 on 2023-04-24 **
Average load on each VM over the last hour:
Host               Users  Free memory  CPU
sci1.jasmin.ac.uk  17     25.2G        31.0%
sci2.jasmin.ac.uk  19     9.0G         2.0%
sci3.jasmin.ac.uk  40     1017.8G      9.0%
sci4.jasmin.ac.uk  16     30.0G        29.0%
sci5.jasmin.ac.uk  19     26.6G        2.0%
sci6.jasmin.ac.uk  39     736.6G       24.0%
sci8.jasmin.ac.uk  31     270.7G       73.0%
Dr Toby Marthews