Trying out JULES Fluxnet u-al752

Hi Patrick, I’m working from home and it seems to only work when I connect to Uni of Bristol VPN (I’m currently at Uni of Exeter but still have my username from my old job at Bristol). Do you know why that’s the case?

I can run u-al752 now - thank you so much for all your help!

A bit of a cheeky question, I tried running u-au253 this morning and my fcm_make failed with this error:
/apps/slurm/spool/slurmd/job7044469/slurm_script: line 64: /usr/local/bin/prg_ifort-12.0: No such file or directory

Do you know how to fix this?

Thanks,

Ayesha

Hi Ayesha:
I am glad you have a VPN from Bristol to use, so that ssh works with Xwindows. Are you using ssh -AY with login2? That might work in more places than ssh -AX with login1. You might talk to the University of Exeter people and/or the JASMIN people if you continue to have ssh problems.

If you want to use login1 (it is better to use login1 if possible), did you do the reverse_dns_check step described on the "Login problems" page of the JASMIN help docs?

About your error with u-au253, you can do:

  1. cd ~/roses/u-au253
  2. grep -r ifort *

This will tell you which file is trying to open the ifort file. You’ll see that the path to the ifort file doesn’t exist on JASMIN. Did you make the necessary changes to the suite to make it work on JASMIN? (i.e., have you ported the suite to JASMIN?)

When I type ifort at the cylc1 command line, then ifort is there. So maybe you need to change those ifort lines in the suite?
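
The two checks above can be sketched as a pair of small shell helpers. This is only an illustration: the function names find_compiler_refs and compiler_on_path are made up here and are not part of Rose or cylc, and the suite location ~/roses/u-au253 is taken from the steps above.

```shell
# Hypothetical helpers; the names are illustrative, not part of Rose or cylc.

# List file, line number and text of every mention of a compiler in a suite.
# $1: suite directory, $2: compiler name (e.g. ifort)
find_compiler_refs() {
    grep -rn -- "$2" "$1"
}

# Report where a compiler is found on the current PATH, if at all.
# $1: compiler name
compiler_on_path() {
    command -v "$1" || echo "$1 is not on the PATH of this host"
}
```

For example, find_compiler_refs ~/roses/u-au253 ifort lists the suite files that hard-code an ifort path, and compiler_on_path ifort, run on cylc1, shows which ifort that host would actually use.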

Please create a new ticket on the CMS Helpdesk if you need or want further help with this other suite.
Patrick

Hi,

When connected to the Bristol VPN, I used login1 and managed to cache my password and use the Rose GUI.

When I do reverse_dns_check I get this:

External IP address: 144.173.23.121
Reverse DNS lookup failed

Thank you for the help with u-au253. I’ve started a new ticket on the helpdesk to ask for more advice.

Thanks,

Ayesha

Hi Ayesha:
I am glad you can use login1 with the Bristol VPN. I am a little surprised that it works even though your Reverse DNS lookup failed.
If you need to get your Reverse DNS working, then you might speak with the Bristol IT and/or inquire with the JASMIN support email.
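
For reference, a reverse DNS lookup asks the resolver for the PTR record of an IP address. The sketch below shows the idea only; it is not necessarily what reverse_dns_check does internally, the function names are made up for illustration, and reverse_lookup assumes the dig utility is installed on the machine you run it from.

```shell
# Hypothetical helpers; not part of the JASMIN tooling.

# Build the PTR record name that a reverse lookup queries for an IPv4 address.
# $1: dotted-quad IPv4 address
reverse_ptr_name() {
    echo "$1" | awk -F. '{ print $4 "." $3 "." $2 "." $1 ".in-addr.arpa" }'
}

# Ask the resolver for the hostname; prints nothing if no PTR record exists.
# Assumes dig is available.
reverse_lookup() {
    dig +short -x "$1"
}
```

An empty result from reverse_lookup for your external IP address would be consistent with the "Reverse DNS lookup failed" message.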
Patrick

Hi Patrick,

Sorry, me again! I tried fixing this by myself but not having too much luck.
When I run u-al752 it fails at the make_plots stage (it says submit-failed).

On job.err it says:
ERROR: file not found: /home/users/ash221/cylc-run/u-al752/log/job/1/make_plots/01/job.err

This is what it says in job-activity.log in /home/users/ash221/cylc-run/u-al752/log/job/1/make_plots/01:
[jobs-submit cmd] cylc jobs-submit -- /home/users/ash221/cylc-run/u-al752/log/job 1/make_plots/01
[jobs-submit ret_code] 1
[jobs-submit out] 2022-07-19T13:09:08+01:00|1/make_plots/01|1|None
2022-07-19T13:09:08+01:00 [STDERR] sbatch: error: Batch job submission failed: Requested time limit is invalid (missing or exceeds some limit)
[(('event-mail', 'submission failed'), 1) ret_code] 0
~

Can you please help? I’m not getting any output files.

Thanks, Ayesha

Hi Ayesha:
If you give us read permission to your home directory and subdirectories, I could take a look at your setup and log files. You can do this with:
chmod -R g+rX /home/users/ash221/

If you have anything private or confidential, you might want to change back the read access on those items.
Patrick

Hi, I just gave you permission. Thanks, Ayesha

Hi Ayesha:
The job.err file is not there because the make_plots app hasn’t run yet.

I looked from the command line with
vi /home/users/ash221/cylc-run/u-al752/log/job/1/make_plots/01/job-activity.log
And I do see the error that you report:
sbatch: error: Batch job submission failed:
Requested time limit is invalid (missing or exceeds some limit)

(Often it is easier to look at the log files from the command line instead of in the cylc GUI.)

So the problem is that your job file:
/home/users/ash221/cylc-run/u-al752/log/job/1/make_plots/01/job
says:
#SBATCH --partition=test
#SBATCH --time=08:00:00

The test queue only allows runs of up to 4 hours.

You can change this by editing the suite's site file, for example with:
vi ~/roses/u-al752/site/suite.rc.CEDA_JASMIN
(or with some other editor), so that the section reads:
[[PLOTTING_CEDA_JASMIN]]
[[[directives]]]
--partition = test
--time = 04:00:00

It is quite possible that the plotting can’t get done in 4 hours. If you have permission to use the short-serial queue/partition, then you can run for 8 hours in there instead of in the test queue/partition. You can even run up to 48 hours in the short-serial partition.
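
A minimal sketch of how the requested time can be compared with a partition limit before resubmitting. The function names are made up for illustration, and the sketch only handles the plain HH:MM:SS form used in this job file.

```shell
# Hypothetical helpers; the names are illustrative.

# Convert an HH:MM:SS string to seconds.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Print the --time value requested in a job script.
# $1: path to the job file
requested_time() {
    sed -n 's/^#SBATCH --time=//p' "$1"
}

# Succeed if the requested time fits within the limit.
# $1: requested HH:MM:SS, $2: limit HH:MM:SS
fits_limit() {
    [ "$(hms_to_seconds "$1")" -le "$(hms_to_seconds "$2")" ]
}
```

For example, fits_limit "$(requested_time /home/users/ash221/cylc-run/u-al752/log/job/1/make_plots/01/job)" 04:00:00 fails for the 08:00:00 request shown above. On the cluster itself, sinfo -p test -o "%P %l" prints the time limit that Slurm enforces for the test partition.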

After you change it, then:

  1. cd ~/roses/u-al752
  2. do a rose suite-run --reload
  3. if a GUI doesn’t pop up, then do a rose sgc
  4. and then in the GUI, use a right mouse-click to retrigger the make_plots app.
  5. the 2nd-iteration version of the job file /home/users/ash221/cylc-run/u-al752/log/job/1/make_plots/02/job should then have the new wallclock time limit in it, and it should be able to start running when the queue lets it run.
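
The check in the last step can also be done from the command line. A minimal sketch, assuming the two-digit submit directories (01, 02, ...) seen in the paths above; the function names are made up for illustration.

```shell
# Hypothetical helpers; the names are illustrative.

# Print the highest-numbered submit directory (01, 02, ...) for a task.
# $1: task log directory, e.g. ~/cylc-run/u-al752/log/job/1/make_plots
latest_submit_dir() {
    ls -d "$1"/[0-9][0-9] | sort | tail -n 1
}

# Show the Slurm directives of the most recent job script for that task.
show_latest_directives() {
    grep '^#SBATCH' "$(latest_submit_dir "$1")/job"
}
```

Running show_latest_directives ~/cylc-run/u-al752/log/job/1/make_plots after the retrigger should then list the new --time value.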

Patrick

Hi Patrick,

Thanks for your help! I’ve done the change and now ~/roses/u-al752/site/suite.rc.CEDA_JASMIN looks like:
[[PLOTTING_CEDA_JASMIN]]
inherit = None, JASMIN_LOTUS
env-script = """
eval $(rose task-env)
export PATH=/apps/jasmin/metomi/bin:$PATH
module load jaspy/2.7
module list 2>&1
env | grep LD_LIBRARY_PATH
"""
(#) [[[remote]]]
(#) host = sci3
[[[directives]]]
--partition = test
--time = 04:00:00
--ntasks = 1

So …/make_plots/02/job has this at the start:
#SBATCH --partition=test
#SBATCH --time=04:00:00
#SBATCH --ntasks=1

But now when I retrigger the make_plots app it fails. The error log just says:
2022-07-20T10:28:50+01:00 CRITICAL - failed/EXIT

Do you know why it’s doing this?

Thanks,
Ayesha

Hi Ayesha
I don’t know exactly what is wrong right now. But you do need to change your paths in your rose-suite.conf file.

Right now, you have:

OUTPUT_FOLDER='work/scratch-nopw/ayeshahussain/fluxnet/u-al752/jules_output'
PLOT_FOLDER='work/scratch-nopw/ayeshahussain/fluxnet/u-al752/plots'

These are missing the initial slash / before work.

Furthermore, you don’t have any output data existing from your JULES runs in the OUTPUT_FOLDER path ‘/work/scratch-nopw/ayeshahussain/fluxnet/u-al752/jules_output’.

If you want to use data from your prior runs, you might want to change it to:
OUTPUT_FOLDER='/work/scratch-nopw/ayeshahussain/fluxnet/run11a/jules_output'
and maybe you want to change the PLOT_FOLDER to:
PLOT_FOLDER='/work/scratch-nopw/ayeshahussain/fluxnet/run11a/plots'
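
Both problems (a missing leading slash, and a folder that does not exist) can be caught before rerunning. A minimal sketch; the function name check_folder is made up for illustration.

```shell
# Hypothetical helper; the name is illustrative.

# Report whether a configured folder is an absolute path that exists.
# $1: variable name (used in the message), $2: its value
check_folder() {
    case "$2" in
        /*) ;;
        *) echo "$1 is not absolute (missing leading /): $2"; return 1 ;;
    esac
    if [ -d "$2" ]; then
        echo "$1 ok: $2"
    else
        echo "$1 does not exist: $2"; return 1
    fi
}
```

For example, check_folder OUTPUT_FOLDER 'work/scratch-nopw/ayeshahussain/fluxnet/u-al752/jules_output' reports the missing leading slash.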

After you make the changes, then:

  1. cd ~/roses/u-al752
  2. rose suite-run --reload
  3. retrigger the make_plots app from the menu found with a right mouse click in the GUI.
  4. when it starts running, this job script file should have the correct paths for OUTPUT_FOLDER and PLOT_FOLDER:
    /home/users/ash221/cylc-run/u-al752/log/job/1/make_plots/08/job

Patrick

Hi Ayesha:
Is it working better now?
Patrick


Hi Ayesha
I noticed an issue with make_plots for the u-al752 suite on JASMIN. Since your ticket has been closed, I just reopened the ticket, to respond properly.
The make_plots app fails for some weird reason unless this line is deleted in the [[PLOTTING_CEDA_JASMIN]] section of site/suite.rc.CEDA_JASMIN :
env | grep LD_LIBRARY_PATH
Does that help you?
Patrick
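
One possible explanation, which is an assumption and not confirmed anywhere in this thread: grep exits with status 1 when it finds no match, so if LD_LIBRARY_PATH is not set at that point and the job script stops on any failing command, that line alone would abort the task. The sketch below only demonstrates the grep behaviour; the function names are made up for illustration.

```shell
# Hypothetical helpers; the names are illustrative.

# Exit status of `env | grep LD_LIBRARY_PATH` when the variable is not set.
# env -i starts the inner shell with only PATH defined.
status_when_unset() {
    env -i PATH="$PATH" sh -c 'env | grep LD_LIBRARY_PATH >/dev/null; echo $?'
}

# A tolerant variant that always succeeds, for scripts that stop on
# any non-zero exit status.
print_ld_library_path() {
    env | grep LD_LIBRARY_PATH || true
}
```

If that is the cause, replacing the line with the tolerant variant (env | grep LD_LIBRARY_PATH || true) would be an alternative to deleting it.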