Migration to MASS failed due to undefined ROSE_TASK_CYCLE_TIME

Hi CMS

I am running the suite u-cg162 on the ECMWF machine. After post-processing, the outputs should be moved to MASS. The transfer to MASS failed because of an undefined environment variable:

[FAIL] [UNDEFINED ENVIRONMENT VARIABLE] ROSE_TASK_CYCLE_TIME

u-cg162 is a copy of u-bv806 with minor changes in the UM physical settings. u-bu806 had no trouble executing the file transfer with moose a few months ago.

I found that ticket #3055 reported a similar error on Archer, which was caused by an update of the Rose version. I am not sure how to check that the Rose versions I use are consistent.

Many thanks
Benoit

Hi Benoit,

The issue in that ticket was a mismatch in Rose version between the machine where you submit the suite (puma) and the HPC (Archer), so you may have a similar problem. I can’t seem to log in to ECMWF right now. Can you check the Rose versions on ecgate and the HPC by running the following on both machines:

rose -V
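
Once you have the output from each machine, the comparison can be sketched as below. This is only an illustration: it assumes the output has the form "Rose <version> (<install path>)", the helper names rose_version_field and same_rose_version are hypothetical, and the two strings are placeholders to be replaced with what each machine actually prints.

```shell
# Hypothetical helper: pull the version field out of `rose -V` output,
# assuming the form "Rose <version> (<install path>)".
rose_version_field() {
    printf '%s\n' "$1" | awk '{ print $2 }'
}

# Hypothetical helper: succeed only if both outputs carry the same version.
same_rose_version() {
    [ "$(rose_version_field "$1")" = "$(rose_version_field "$2")" ]
}

# Placeholder values: paste the real output from each machine here.
submit_host_output='Rose 2019.01.3 (/path/on/submit/host)'
hpc_output='Rose 2019.01.3 (/path/on/hpc)'

if same_rose_version "$submit_host_output" "$hpc_output"; then
    echo "Rose versions match"
else
    echo "Rose versions differ"
fi
```

Only the version field is compared, so different install paths on the two machines do not count as a mismatch.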

Annette

Hi Annette

rose -V gives the same version of Rose on both sides (ecgate / HPC):
Rose 2019.01.3 (/perm/ms/gb/frmi/rose-2019.01.3)

This is the same version that appears at the start of both jobs (the one which worked a few months ago and the one which fails now).

Thanks
Benoit

Hi Benoit,

I have had a look on ECMWF and I’m not quite sure what is going on. Can you try adding a couple of debugging lines to that job script and re-running it directly?

Go to the directory ~ukbv/cylc-run/u-cg162/log/job/20160801T0000Z/moose_only/04

And edit the job file to add some lines just before the rose task-run line, as follows:

# SCRIPT:
env
rose -V
rose task-run ...

Then just submit the job script: qsub job. It should overwrite the previous job.out and job.err when it’s done.
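
If you would rather not edit the file by hand, the same change could be sketched with awk as below. This is a rough illustration, not something tested on ECMWF: add_debug_lines is a hypothetical helper, it writes the modified script to standard output rather than changing the file, and the temporary file only stands in for the real job script.

```shell
# Hypothetical helper: print a job script with the debugging commands
# inserted just before the first line that starts with "rose task-run".
add_debug_lines() {
    awk '
        !inserted && /^[ \t]*rose task-run/ {
            print "env"
            print "rose -V"
            inserted = 1
        }
        { print }
    ' "$1"
}

# Demonstration on a stand-in file (not the real job script).
demo_job=$(mktemp)
printf '%s\n' '# SCRIPT:' 'rose task-run ...' > "$demo_job"
add_debug_lines "$demo_job"
rm -f "$demo_job"
```

For the real job you would redirect the output to a new file, check it, and then submit that file in place of the original.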

Annette

Hi Annette

I think the job is failing before reaching # SCRIPT, as nothing changed when I added the extra commands after # SCRIPT. So I also added “env” and “rose -V” after # ENV-SCRIPT.

The environment variables listed include:
ROSE_VERSION=2019.01.3
However, ROSE_TASK_CYCLE_TIME is not listed.

rose -V gives
Rose 2019.01.3 (/perm/ms/gb/frmi/rose-2019.01.3)
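
Rather than scanning the full env output by eye, a check for specific variables could be sketched as below. The helper name check_vars is hypothetical, and the sketch assumes printenv is available on the node, as it is on most Linux systems.

```shell
# Hypothetical helper: report whether each named variable is present in
# the environment. Returns non-zero if any of them is missing.
check_vars() {
    missing=0
    for name in "$@"; do
        # printenv exits non-zero when the variable is not in the environment.
        if printenv "$name" >/dev/null 2>&1; then
            echo "set: ${name}"
        else
            echo "UNSET: ${name}"
            missing=1
        fi
    done
    return "$missing"
}

check_vars ROSE_VERSION ROSE_TASK_CYCLE_TIME || echo "at least one variable is missing"
```

Placed at the same point in the job script as the env call, this would print one line per variable.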

Thanks
Benoit

Hi Benoit,

After a lot of investigating, the problem seems to be due to a recent upgrade to the default python2 module. The new version works OK on the normal nodes, but not on the moose nodes, which have a different architecture.

The simplest solution would just be to revert to the older module. Edit your .user_profile to specify:

module load python/2.7.15-01

Annette

Hi Annette

Thank you very much for your time on that issue!
I’ll make that change.

All the best
Benoit
