Migration to MASS failed due to undefined ROSE_TASK_CYCLE_TIME

vanniere · 15 July 2021 11:13

Hi CMS

I am running the suite u-cg162 on the ECMWF machine. After post-processing, the outputs should be moved to MASS, but the transfer failed because of an undefined environment variable:

[FAIL] [UNDEFINED ENVIRONMENT VARIABLE] ROSE_TASK_CYCLE_TIME

u-cg162 is a copy of u-bv806 with minor changes in the UM physical settings. u-bu806 had no trouble executing the file transfer with moose a few months ago.

I found that ticket #3055 reported a similar error on Archer, which was caused by an update of the Rose version. I am not sure how to check that the Rose version I use is consistent.

Many thanks
Benoit

AnnetteOsprey · 15 July 2021 12:43

Hi Benoit,

The issue in that ticket was a mismatch in Rose version between the machine where you submit the suite (puma) and the HPC (Archer), so it might be a similar issue for you. I can’t seem to log in to ECMWF right now. Can you try checking the Rose versions on ecgate and the HPC by running the following on both machines:

rose -V

Annette

vanniere · 15 July 2021 13:09

Hi Annette

rose -V gives the same version of Rose on both sides (ecgate / HPC):
Rose 2019.01.3 (/perm/ms/gb/frmi/rose-2019.01.3)

This is the same version that appears at the start of both jobs (the one which worked a few months ago and the one which fails now).

Thanks
Benoit

AnnetteOsprey · 19 July 2021 10:12

Hi Benoit,

I have had a look on ECMWF and I’m not quite sure what is going on. Can you try adding a couple of debugging lines to that job script and re-running it directly?

Go to the directory ~ukbv/cylc-run/u-cg162/log/job/20160801T0000Z/moose_only/04

And edit the job file to add some lines just before the rose task-run line, as follows:

# SCRIPT:
env
rose -V
rose task-run ...

Then just submit the job script: qsub job. It should overwrite the previous job.out and job.err when it’s done.
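
If the full env output is hard to read, a narrower check could be added alongside it. This is only a sketch: check_var is a hypothetical helper name, not part of Rose or Cylc, and it assumes printenv is available on the node where the job runs.

```shell
# Hypothetical helper (not part of Rose/Cylc): report whether a named
# variable is present in the environment of the running job.
check_var() {
    # printenv exits non-zero when the variable is not in the environment
    if printenv "$1" >/dev/null; then
        echo "$1 is set to: $(printenv "$1")"
    else
        echo "$1 is NOT set"
    fi
}

# The two variables of interest in this thread
check_var ROSE_VERSION
check_var ROSE_TASK_CYCLE_TIME
```

The result will then appear in job.out as a single line per variable.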

Annette

vanniere · 19 July 2021 12:47

Hi Annette

I think the job is failing before reaching #SCRIPT as I didn’t get any change when I added the extra commands after #SCRIPT. So I added “env” and “rose -V” after # ENV-SCRIPT too.

The environment variables listed include:
ROSE_VERSION=2019.01.3
However, ROSE_TASK_CYCLE_TIME is not listed.

rose -V gives
Rose 2019.01.3 (/perm/ms/gb/frmi/rose-2019.01.3)

Thanks
Benoit

AnnetteOsprey · 22 July 2021 15:06

Hi Benoit,

After a lot of investigating, the problem seems to be due to a recent upgrade to the default python2 module. The new version works OK on the normal nodes, but not on the moose nodes, which have a different architecture.

The simplest solution would just be to revert to the older module. Edit your .user_profile to specify:

module load python/2.7.15-01

Annette

vanniere · 22 July 2021 15:15

Hi Annette

Thank you very much for your time on that issue!
I’ll make that change.

All the best
Benoit

system · 24 July 2021 15:15

This topic was automatically closed 2 days after the last reply. New replies are no longer allowed.
