[MITgcm-devel] sx8 testing
Jean-Michel Campin
jmc at ocean.mit.edu
Tue May 5 12:48:33 EDT 2009
Hi Martin,
The first failure (for adjustment.cs-32x32x1.nlfs) is perfectly
normal: I tweaked the MPI test so that it compiles & runs with a
different number of tiles, 44 = 48 - 4 blank tiles,
so that the blank tiles get tested. It should only work for
the 1st test (input) and fail for the 2nd (input.nlfs)
when checking tile connections (in the new S/R exch2_check_depths).
I've just realised that I did not document this very well in doc/tag_index
(there were more comments in the cvs check-in msg) and will change this.
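For a quick overview of which tiles are affected, the EXCH2_CHECK_DEPTHS
messages can be summarised with a small script. This is only a rough sketch,
not part of testreport: the function name parse_exch2_errors and the regular
expression are assumptions based on the message layout as it appears in the
STDERR excerpt quoted further down, and it keeps only the first edge reported
after each tile header.

```python
import re

# Assumed layout (taken from the quoted STDERR excerpt): a header
# "tile # N (bi,bj= I, J ):" followed by "<X>.Edge has K unconnected points".
# Whitespace and line wraps between the pieces are tolerated.
PATTERN = re.compile(
    r"tile #\s*(\d+)\s*\(bi,bj=\s*(\d+),\s*(\d+)\s*\):"
    r".*?([NSEW])\.Edge has\s+(\d+)\s+unconnected",
    re.DOTALL,
)


def parse_exch2_errors(text):
    """Return (tile, bi, bj, edge, n_points) for each tile header in text.

    Only the first edge message following each header is captured.
    """
    return [
        (int(tile), int(bi), int(bj), edge, int(n))
        for tile, bi, bj, edge, n in PATTERN.findall(text)
    ]


# Two reports copied from the quoted output, without the mail quote markers.
SAMPLE = """\
(PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile # 4
(bi,bj= 4, 1 ):
(PID.TID 0000.0001) *** ERROR *** E.Edge has 8 unconnected points
with non-zero depth.
(PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile # 15
(bi,bj= 11, 1 ):
(PID.TID 0000.0001) *** ERROR *** S.Edge has 16 unconnected points
with non-zero depth.
"""

for record in parse_exch2_errors(SAMPLE):
    print(record)
```

To use it on a real run, the concatenated contents of the STDERR.* files
would be passed in as the text argument.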
I have no idea concerning
aim.5l_cs, global_ocean.cs32x15.icedyn & global_ocean.cs32x15.thsice
Unrelated:
All the _RL/_RS/ & Macros changes that I made last week seem to
benefit your "rays" (sunos_sun4u_g77) test:
solid-body.cs-32x32x1 is now passing (this is one of the 2 where
RS=real*4) but was failing before.
Thanks,
Jean-Michel
On Tue, May 05, 2009 at 08:59:52AM +0200, Martin Losch wrote:
> Hi Jean-Michel,
>
> finally I managed to have a quick look at the sx8 tests. I managed to
> get them going again (I had to remove the automatic restart test that
> was somehow stalling and I had to modify my hack for making the script
> wait for the "qsubbed" job to finish), but now there are still a few
> failures, which I do not understand. Most of them may be related to my
> hacks for testing but some are not. Here are some error messages and
> comments from yesterday's test; maybe you have an idea what's going on
> with the first one.
> Cheers,
> Martin
>
>> sx8::verification> cat adjustment.cs-32x32x1/tr_run.nlfs/STDERR.*
>> (PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile # 4
>> (bi,bj= 4, 1 ):
>> (PID.TID 0000.0001) *** ERROR *** E.Edge has 8 unconnected points
>> with non-zero depth.
>> (PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile # 6
>> (bi,bj= 6, 1 ):
>> (PID.TID 0000.0001) *** ERROR *** E.Edge has 8 unconnected points
>> with non-zero depth.
>> (PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile # 9
>> (bi,bj= 9, 1 ):
>> (PID.TID 0000.0001) *** ERROR *** N.Edge has 16 unconnected points
>> with non-zero depth.
>> (PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile # 10
>> (bi,bj= 10, 1 ):
>> (PID.TID 0000.0001) *** ERROR *** N.Edge has 16 unconnected points
>> with non-zero depth.
>> (PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile # 15
>> (bi,bj= 11, 1 ):
>> (PID.TID 0000.0001) *** ERROR *** S.Edge has 16 unconnected points
>> with non-zero depth.
>> (PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile # 16
>> (bi,bj= 12, 1 ):
>> (PID.TID 0000.0001) *** ERROR *** S.Edge has 16 unconnected points
>> with non-zero depth.
>> (PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile # 25
>> (bi,bj= 21, 1 ):
>> (PID.TID 0000.0001) *** ERROR *** S.Edge has 7 unconnected points
>> with non-zero depth.
>> (PID.TID 0000.0001) *** ERROR *** EXCH2_CHECK_DEPTHS: tile # 26
>> (bi,bj= 22, 1 ):
>> (PID.TID 0000.0001) *** ERROR *** S.Edge has 7 unconnected points
>> with non-zero depth.
>> (PID.TID 0000.0001) *** ERROR *** S/R EXCH2_CHECK_DEPTHS: Fatal Error
>> (PID.TID 0000.0001) *** ERROR *** occurs 1 time(s) among all
>> Threads and Procs
>> (PID.TID 0001.0001) *** ERROR *** occurs 1 time(s) among all
>> Threads and Procs
>>
>>
>> aim.5l_cs:
>> sx8::run> cat /home/sx8/mlosch/out_sxf90
>> MPI process (universe 0, rank 0) terminated by signal(9); Kill
>> sx8-2: mpid: MPI process terminated by signal(9)
>> MPI process (universe 0, rank 1) terminated by signal(9); Kill
>> sx8-2: mpid: MPI process terminated by signal(9)
>>
>>
>> fizhi-cs-32x32x40:
>> unclear, but was never OK
>>
>> global_ocean.cs32x15.icedyn, global_ocean.cs32x15.thsice:
>> complete with output.txt, no idea what went wrong, something in my
>> testing scheme
>>
>> lab_sea:
>> cat STDERR.000*
>> (PID.TID 0001.0001) *** ERROR *** NetCDF ERROR:
>> (PID.TID 0001.0001) *** ERROR *** MNC ERROR: opening 'mnc_test_0001/
>> phiHydLow.0000000000.t004.nc'
>> most likely that has to do with memory issues (for some reason the SX8
>> allocates a lot of memory,
>> just for opening a netcdf file. If you have too many netcdf files,
>> then you need a lot of memory,
>> so that the 32GB I ask for are not enough).
>>
>>
> On Apr 15, 2009, at 8:57 AM, Martin Losch wrote:
>
>> Hi Jean-Michel,
>>
>> there have been a few problems with the SX8: a file system had crashed
>> and there was some temporary rearrangement of the remaining systems, in
>> particular a scratch system that I use for the tests was probably not
>> available all the time.
>>
>> What happened last weekend is not clear to me, but I can compile by hand
>> the experiments that failed during testreport. It may have to do
>> with my scripting (which I haven't changed, but maybe the machine
>> changed). I'll rerun testreport by hand and then we'll see what
>> happens.
>>
>> Martin
>> On Apr 14, 2009, at 8:05 PM, Jean-Michel Campin wrote:
>>
>>> Hi Martin,
>>>
>>> Welcome back !
>>> I don't know what happened to the sx8 testing (looks like we missed
>>> one in late March and another at the beginning of April) and the latest
>>> has a sequence of failures in the middle of the list.
>>> I made some changes in genmake2 and I hope it's not causing
>>> those problems.
>>>
>>> And just a comment regarding the pkg/seaice stuff: I was feeling
>>> a little embarrassed when I insisted on keeping the old code
>>> available with a CPP flag, but now I don't regret it too much.
>>>
>>> Cheers,
>>> Jean-Michel
>>> _______________________________________________
>>> MITgcm-devel mailing list
>>> MITgcm-devel at mitgcm.org
>>> http://mitgcm.org/mailman/listinfo/mitgcm-devel