[MITgcm-devel] sx8 testing

Jean-Michel Campin jmc at ocean.mit.edu
Tue May 5 12:48:33 EDT 2009


Hi Martin,

The first failure (for adjustment.cs-32x32x1.nlfs) is perfectly
normal: I tweaked the MPI test so that it compiles and runs with a
different number of tiles, 44 = 48 - 4 blank tiles, so that the
blank tiles get tested. It should only work for the 1st test (input)
and fail for the 2nd (input.nlfs) when checking tile connections
(in the new S/R exch2_check_depths).

I've just realised that I did not document this very well in doc/tag_index
(there were more comments in the CVS check-in message), and I will fix this.

I have no idea about
aim.5l_cs, global_ocean.cs32x15.icedyn and global_ocean.cs32x15.thsice.

Unrelated:
All the _RL/_RS and macro changes that I made last week seem to have
benefited your "rays" (sunos_sun4u_g77) test:
solid-body.cs-32x32x1, which was failing before, is now passing
(this is one of the 2 where RS=real*4).

Thanks,
Jean-Michel

On Tue, May 05, 2009 at 08:59:52AM +0200, Martin Losch wrote:
> Hi Jean-Michel,
>
> Finally, I had a quick look at the sx8 tests. I managed to
> get them going again (I had to remove the automatic restart test, which
> was somehow stalling, and I had to modify my hack for making the script
> wait for the "qsubbed" job to finish), but there are still a few
> failures that I do not understand. Most of them may be related to my
> testing hacks, but some are not. Here are some error messages and
> comments from yesterday's test; maybe you have an idea what's going on
> with the first one.
> Cheers,
> Martin
>
>> sx8::verification> cat adjustment.cs-32x32x1/tr_run.nlfs/STDERR.*
>> (PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile #     4 (bi,bj=   4,   1 ):
>> (PID.TID 0000.0001) *** ERROR *** E.Edge has    8 unconnected points with non-zero depth.
>> (PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile #     6 (bi,bj=   6,   1 ):
>> (PID.TID 0000.0001) *** ERROR *** E.Edge has    8 unconnected points with non-zero depth.
>> (PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile #     9 (bi,bj=   9,   1 ):
>> (PID.TID 0000.0001) *** ERROR *** N.Edge has   16 unconnected points with non-zero depth.
>> (PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile #    10 (bi,bj=  10,   1 ):
>> (PID.TID 0000.0001) *** ERROR *** N.Edge has   16 unconnected points with non-zero depth.
>> (PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile #    15 (bi,bj=  11,   1 ):
>> (PID.TID 0000.0001) *** ERROR *** S.Edge has   16 unconnected points with non-zero depth.
>> (PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile #    16 (bi,bj=  12,   1 ):
>> (PID.TID 0000.0001) *** ERROR *** S.Edge has   16 unconnected points with non-zero depth.
>> (PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile #    25 (bi,bj=  21,   1 ):
>> (PID.TID 0000.0001) *** ERROR *** S.Edge has    7 unconnected points with non-zero depth.
>> (PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile #    26 (bi,bj=  22,   1 ):
>> (PID.TID 0000.0001) *** ERROR *** S.Edge has    7 unconnected points with non-zero depth.
>> (PID.TID 0000.0001) *** ERROR *** S/R EXCH2_CHECK_DEPTHS: Fatal Error
>> (PID.TID 0000.0001) *** ERROR *** occurs    1 time(s) among all Threads and Procs
>> (PID.TID 0001.0001) *** ERROR *** occurs    1 time(s) among all Threads and Procs
>>
>>
>> aim.5l_cs:
>> sx8::run> cat /home/sx8/mlosch/out_sxf90
>> MPI process (universe 0, rank 0) terminated by signal(9); Kill
>> sx8-2: mpid: MPI process terminated by signal(9)
>> MPI process (universe 0, rank 1) terminated by signal(9); Kill
>> sx8-2: mpid: MPI process terminated by signal(9)
>>
>>
>> fizhi-cs-32x32x40:
>> unclear, but was never OK
>>
>> global_ocean.cs32x15.icedyn, global_ocean.cs32x15.thsice:
>> the runs complete with output.txt; no idea what went wrong, something
>> in my testing scheme
>>
>> lab_sea:
>> cat STDERR.000*
>> (PID.TID 0001.0001) *** ERROR *** NetCDF ERROR:
>> (PID.TID 0001.0001) *** ERROR *** MNC ERROR: opening 'mnc_test_0001/phiHydLow.0000000000.t004.nc'
>> Most likely this has to do with memory issues: for some reason the SX8
>> allocates a lot of memory just for opening a NetCDF file. If you have
>> too many NetCDF files, then you need a lot of memory, so the 32GB I ask
>> for are not enough.
>>
>>
> On Apr 15, 2009, at 8:57 AM, Martin Losch wrote:
>
>> Hi Jean-Michel,
>>
>> There have been a few problems with the SX8: a file system crashed
>> and the remaining systems were temporarily rearranged; in particular,
>> a scratch file system that I use for the tests was probably not
>> available all the time.
>>
>> What happened last weekend is not clear to me, but I can compile by
>> hand the experiments that failed during testreport. It may have to do
>> with my scripting (which I haven't changed, but maybe the machine
>> has). I'll rerun testreport by hand and then we'll see what
>> happens.
>>
>> Martin
>> On Apr 14, 2009, at 8:05 PM, Jean-Michel Campin wrote:
>>
>>> Hi Martin,
>>>
>>> Welcome back !
>>> I don't know what happened to the sx8 testing (it looks like we missed
>>> one in late March and another at the beginning of April), and the
>>> latest one has a sequence of failures in the middle of the list.
>>> I made some changes in genmake2 and I hope they are not causing
>>> these problems.
>>>
>>> And just a comment regarding the pkg/seaice stuff: I was feeling
>>> a little embarrassed when I insisted on keeping the old code
>>> available with a CPP flag, but now I don't regret it too much.
>>>
>>> Cheers,
>>> Jean-Michel
>>> _______________________________________________
>>> MITgcm-devel mailing list
>>> MITgcm-devel at mitgcm.org
>>> http://mitgcm.org/mailman/listinfo/mitgcm-devel
>>
>
