[MITgcm-support] building with MPI on a dual-core mac

Klymak Jody jklymak at uvic.ca
Wed Jul 22 22:52:14 EDT 2009


Thanks a lot Constantinos,

that is very clear!

(PID.TID 0000.0001)   Seconds in section "ALL                     [THE_MODEL_MAIN]":
(PID.TID 0000.0001)           User time:   1318.5500146746635
(PID.TID 0000.0001)         System time:   522.38001954555511
(PID.TID 0000.0001)     Wall clock time:   1977.2751381397247

That is for 2 machines over gigabit Ethernet.  For me, the 10% imbalance
is probably acceptable given the cost of Infiniband cards.
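
As a rough check, the three timings above can be plugged into the u+s versus w comparison described in the quoted reply below. This is only a sketch: the helper name idle_fraction is made up for illustration, and the numbers are copied by hand from the log lines above.

```python
def idle_fraction(user_s, system_s, wall_s):
    """Fraction of wall-clock time not covered by user + system CPU time."""
    return (wall_s - (user_s + system_s)) / wall_s


# Values copied from the THE_MODEL_MAIN timing lines above.
user_s = 1318.5500146746635
system_s = 522.38001954555511
wall_s = 1977.2751381397247

cpu_s = user_s + system_s
print(f"u+s = {cpu_s:.2f} s, w = {wall_s:.2f} s")
print(f"idle fraction          = {idle_fraction(user_s, system_s, wall_s):.1%}")
print(f"system share of u+s    = {system_s / cpu_s:.1%}")
```

The gap between u+s and w comes out somewhat under 10% of the wall-clock time, and more than a quarter of the CPU time is system time, which fits with communication going over Ethernet.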

Cheers,  Jody

On 22-Jul-09, at 7:33 PM, Constantinos Evangelinos wrote:

> On Tuesday 21 July 2009 5:22:00 pm Klymak Jody wrote:
>
>> While I suppose I can guess, what is the technical difference between
>> "user", "system" and "wallclock"?  I suppose a large difference
>> between "wallclock" and "system" means lots of MPI overhead?
>
> user time "u" is processor time spent on behalf of a process in user
> (non-privileged) code.
> system time "s" is processor time spent on behalf of a process in the
> operating system kernel (or equivalent, depending on the operating  
> system)
> wallclock time "w" is self explanatory.
>
> Essentially all of your computational code should be user time. Part of
> your I/O will count as user and part as system time (how big each part
> is depends on the O/S). Communication time is treated similarly to I/O
> (communicating over Ethernet has a significant system component - and
> corresponding overhead of switching to kernel mode - communicating over
> a high speed interconnect like Myrinet, Infiniband etc. should be mainly
> user time).  Moreover time spent waiting for data (with the process not
> relinquishing the cpu busy spinning away) counts as user or system time
> (e.g. time spent waiting for data from main memory or time spent waiting
> for data from disk). If however the O/S switches control of the cpu away
> from an idling process, that time only counts as wallclock time.
>
> Given the above, u+s <= w (to within precision - u and s usually are to
> 0.01s while w is to 1us.) and ideally u+s=w (that would indicate a
> process that spends no time idling waiting for data). A large
> discrepancy between u+s and w indicates either load imbalance or
> significant network or disk I/O issues.
>
> Constantinos
> -- 
> Dr. Constantinos Evangelinos
> Department of Earth, Atmospheric and Planetary Sciences
> Massachusetts Institute of Technology
>
> _______________________________________________
> MITgcm-support mailing list
> MITgcm-support at mitgcm.org
> http://mitgcm.org/mailman/listinfo/mitgcm-support
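
The distinction between user, system and wallclock time described in the reply can be seen on a small scale outside the model. The following is a minimal sketch using Python's standard os.times() and time.perf_counter(); it is not how MITgcm collects its timers, and the helper names measure, busy and idle are made up for illustration. A compute loop should show up mostly as user time, while a sleep (the process giving up the CPU) should show up only as wallclock time.

```python
import os
import time


def measure(fn):
    """Return (user, system, wall) seconds consumed while running fn()."""
    t0 = os.times()
    w0 = time.perf_counter()
    fn()
    w1 = time.perf_counter()
    t1 = os.times()
    return t1.user - t0.user, t1.system - t0.system, w1 - w0


def busy():
    # Pure computation: expected to be charged almost entirely as user time.
    total = 0
    for i in range(2_000_000):
        total += i * i
    return total


def idle():
    # Sleeping relinquishes the CPU, so this adds to wallclock time only.
    time.sleep(0.5)


for name, fn in (("busy", busy), ("idle", idle)):
    u, s, w = measure(fn)
    print(f"{name}: u={u:.2f} s={s:.2f} w={w:.2f} (u+s)/w={(u + s) / w:.2f}")
```

The exact figures depend on the machine and the O/S timer resolution, but the ratio (u+s)/w should be close to 1 for the busy case and close to 0 for the idle case.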



