JULES Rose suite unable to run on JASMIN LOTUS2

Hi there,

I am working on JULES, using Rose suite u-dg275 on JASMIN. This has previously run fine, but is now returning the error:

“Please verify that both the operating system and the processor support Intel(R) X87, CMOV, MMX, FXSAVE, SSE, SSE2, SSE3, SSSE3, SSE4_1, SSE4_2, POPCNT, F16C and AVX instructions.”

I note that this error has previously been attributed to the fire switches (see the thread “Error Running a Suite on JASMIN on par-single”). I am working on the fire module; however, I haven’t changed any settings since the suite began to fail.

Is this due to the LOTUS / LOTUS2 switchover? The JASMIN documentation on recent migrations and updates currently just says “more details to follow” (JASMIN Help Site: “Migration to Rocky Linux 9 2024”).

Please could you let me know what I might need to do to fix this?

All best

Ol Perkins

Hi Ol

I assume that you are submitting the job from the cylc server, in which case I don’t think the problem is LOTUS2. My understanding (based on the Centre for Environmental Data Analysis post “JASMIN updates: new cluster and maintenance day”) is that only sci-ph-03 is currently connected to LOTUS2. Your problem might be due to changes made to the old LOTUS cluster on the recent maintenance day. I’ll be running one of my own JULES suites later today, and I’ll see whether I encounter errors similar to yours. I’ll let you know when I know more.

David

Hi David - thanks, that’s much appreciated.

Ol

Hi Ol

My suite ran fine, although that was on par-multi rather than par-single.

I believe that the error you are seeing happens when you try to run Intel-compiled code on an AMD node, but that shouldn’t be happening because suite.rc for u-dg275 has --constraint="intel" in it.
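One way to confirm this diagnosis from the job itself is to log which CPU vendor each task actually lands on. The sketch below is a minimal, hypothetical helper (the function names `cpu_vendor` and `check_intel_node` are illustrative and not part of JULES, Rose or Cylc); it assumes an x86 Linux node, where /proc/cpuinfo reports `GenuineIntel` for Intel and `AuthenticAMD` for AMD processors.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the CPU vendor string of the current node.
# On x86 Linux, /proc/cpuinfo contains lines such as
#   "vendor_id : GenuineIntel"  (Intel) or "vendor_id : AuthenticAMD" (AMD).
# An optional argument lets you point it at a saved copy of cpuinfo instead.
cpu_vendor() {
  local cpuinfo="${1:-/proc/cpuinfo}"
  # Split on ": " and print the value of the first vendor_id line only
  # (the line is repeated once per logical CPU).
  awk -F': *' '/^vendor_id/ { print $2; exit }' "$cpuinfo"
}

# Return non-zero (and warn on stderr) if this is not an Intel node.
check_intel_node() {
  local vendor
  vendor="$(cpu_vendor "$@")"
  if [ "$vendor" = "GenuineIntel" ]; then
    echo "OK: running on an Intel node (${HOSTNAME:-unknown host})"
  else
    echo "WARNING: CPU vendor is '${vendor:-unknown}', not GenuineIntel;" \
         "Intel-compiled code may refuse to start" >&2
    return 1
  fi
}
```

In principle, calling `check_intel_node` at the start of a task's script would record the vendor in the job log before the Intel-compiled executable is launched.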

Can you point me towards the log files from your run that failed? I’ll have a look at them and see whether I can work out what is going wrong.

David

Hi David,

Thanks so much for this. I am still getting the same error, so it must be something in the suite / code? It fails at RECON rather than fcm-make, which seems strange for an error like this.

The log files are at /home/users/tightasa/cylc-run/u-dg275. Let me know if you have issues accessing that and I can zip them over.

As an aside, I have been trying to get hold of a suite to run JULES vn7+ on JASMIN at N96. I am currently working on and prototyping with JULES vn5.4 (the version from the last fire model intercomparison project), but need to port my branch changes to vn7. If you could point me to a good suite to use as a basis, that would be really helpful.

All best

Ol


Hi Ol

I can’t get into your home directory. Could you give me read access to it? The command is:

chmod g+rx ~

Best wishes
David

Hi David,

Apologies, I have set the access as you asked.

OP

Hi Ol

I think I see the problem. JULES has been compiled for Intel, but RECON submits it to the test partition, which no longer has any Intel nodes (see the output from sinfo -o '%16P %35N %f'). The quick solution is to edit suite.rc and, under [[RECON]], change

--partition = test

to

--partition = <some partition that still has Intel nodes>
--constraint = intel

I’m not an expert on JASMIN partitions, but since RECON also specifies --ntasks = 1, I think short-serial should work as the partition.
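To find partitions that still advertise Intel nodes without reading the full sinfo table by eye, the output can be filtered on the feature name. This is a sketch under assumptions: it takes the feature to be called `intel` (matching the suite’s `--constraint = intel`), uses `sinfo -h -o '%P %f'` (no header; partition name and node features only) rather than the wider format above, and the function name `partitions_with_feature` is hypothetical. It is worth checking the raw sinfo output first, since partition and feature names on JASMIN may differ from these.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "partition features" pairs on stdin (the format
# produced by `sinfo -h -o '%P %f'`) and print each partition that has at
# least one node group advertising the requested feature (default: intel).
partitions_with_feature() {
  local feature="${1:-intel}"
  awk -v f="$feature" '
    {
      part = $1
      sub(/\*$/, "", part)        # sinfo marks the default partition with a trailing *
      n = split($2, feats, ",")   # node features are comma-separated
      for (i = 1; i <= n; i++)
        if (feats[i] == f) { print part; break }
    }' | sort -u                  # a partition can appear on several lines
}

# Only query Slurm when sinfo is actually available (e.g. on a sci server).
if command -v sinfo >/dev/null 2>&1; then
  sinfo -h -o '%P %f' | partitions_with_feature intel
fi
```

Any partition it prints could then replace the placeholder in the --partition directive, alongside --constraint = intel.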

In the longer term we are all going to have to stop compiling code for Intel processors because there won’t be any on LOTUS2. I’m still working out how to do this myself.

Best wishes
David

Thanks so much, David, that is much appreciated. The suite appears to be running now.

It would be great to stay in touch on the Intel issue; it affects several people I know who work on the model, and nobody has yet been confident about fixing it. My understanding from the update page was that all the old LOTUS nodes would be inaccessible as early as February?

I wondered if it was worth emailing the JULES list about this?