Craig Hunter speaks about his NASA dual G5 benchmarks
Eug Wanker
Posting Junkie
Join Date: Jun 2003
Location: Dangling something in the water… of the Arabian Sea
Jul 10, 2003, 09:40 PM
 
See here. He also includes a summary of his numbers (in MFLOPS):

dual G4-1GHz Xserve (single CPU only): 105
dual G4-1GHz Xserve (both CPUs): 207
dual G4-1.25GHz PowerMac (single CPU only): 129
dual G4-1.25GHz PowerMac (both CPUs): 256
dual G5-2GHz PowerMac (single CPU only): 254
dual G5-2GHz PowerMac (both CPUs): 498
single P4 2GHz: 192
single P4 2.66GHz: 255
single P4 3.2GHz (extrapolated): 307
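
A quick way to read the single-CPU vs. both-CPU figures above is to work out how much the second processor actually buys. The snippet below is only a sketch using the numbers as posted; the dictionary and function names are made up for illustration.

```python
# Scores copied from the summary above: (single CPU, both CPUs).
results = {
    "Xserve G4 1GHz": (105, 207),
    "PowerMac G4 1.25GHz": (129, 256),
    "PowerMac G5 2GHz": (254, 498),
}


def dual_cpu_speedup(single, dual):
    """Return how many times faster the two-CPU run is than the one-CPU run."""
    return dual / single


speedups = {name: dual_cpu_speedup(s, d) for name, (s, d) in results.items()}

for name, speedup in speedups.items():
    # A value of 2.00 would mean perfect scaling across both CPUs.
    print(f"{name}: {speedup:.2f}x with both CPUs")
```

On these figures all three machines land just under 2x, so the benchmark scales almost perfectly across two CPUs.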

Compilers used:

G4/OSX: Absoft, NAG
I like them both. NAG catches bugs that compilers on many other platforms have missed. Performance varies from app to app, but generally, Absoft and NAG are close. We only have these two FORTRAN compilers on OS X, but I think G5 will bring more companies into this platform.

P4/Linux: Portland Group, Lahey, Intel (ifc)
P4/Windows: Compaq Visual Fortran
In general, these are all good compilers, but performance varies from app to app. In some cases, I have seen bogus output come out of executables compiled with more than one of these compilers, so you really have to examine output carefully. I think this is true on any platform really, but it's bit me more often on the P4. I'd rather not say which ones were the worst in public!

Craig
     
CubeBoy
Junior Member
Join Date: Dec 2002
Jul 10, 2003, 10:00 PM
 
Interesting. I don't particularly like the NAGware compiler, or at least not their F95 compiler, considering that the C compiler is what generates the final code.

Anyway, I've run across a few more scalar Jet3D benchmarks, including a Pentium 4 using IFC. Results and estimates below.

Pentium 4 2 GHz/RAMBUS
RH Linux 7.3 / IFC
1 CPU-SCALAR(?)
842 MFLOPS

Pentium 4 3.2 GHz (estimate, assuming performance scales linearly with clock speed)
RH Linux 7.3 / IFC
1 CPU-SCALAR(?)
1347 MFLOPS

PowerMac G4 1GHz
OS X 10.1.5 / Absoft f77
1 CPU-SCALAR
236 MFLOPS

PowerMac G5 2 GHz (Extrapolating from Craig's Numbers)
OS X 10.2.7 / Absoft f77
1 CPU-SCALAR
566 MFLOPS
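
The clock-speed estimate above is plain proportional scaling. As a rough sketch (the function name is invented here, and linear scaling with clock speed is an assumption, not a measurement):

```python
def scale_by_clock(score, from_ghz, to_ghz):
    """Scale a measured score by the ratio of clock speeds (linear assumption)."""
    return score * to_ghz / from_ghz


# IFC result for the 2 GHz P4, scaled up to 3.2 GHz
p4_ifc_3_2 = scale_by_clock(842, 2.0, 3.2)

# The same assumption applied to the 2.66 GHz P4 score from the first post
p4_first_post_3_2 = scale_by_clock(255, 2.66, 3.2)

print(f"P4 3.2 GHz from 842 at 2 GHz: {p4_ifc_3_2:.0f}")
print(f"P4 3.2 GHz from 255 at 2.66 GHz: {p4_first_post_3_2:.0f}")
```

Real chips rarely scale perfectly with clock speed once memory bandwidth becomes the limit, so treat both results as upper-end guesses.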

http://asda.bio.bnl.gov/asda/bb/arch...0208/7049.html
( Last edited by CubeBoy; Jul 11, 2003 at 09:01 AM. )
     
Eug Wanker  (op)
Posting Junkie
Join Date: Jun 2003
Location: Dangling something in the water… of the Arabian Sea
Jul 10, 2003, 10:02 PM
 
cube, linky no worky.

Anyways, here are some more of his comments:

The latest version of Portland is very fast, but ifc is faster in some cases (25-50%). I think ifc uses more aggressive optimization, and this can cause problems with some codes (including Jet3D). I'm still working on an ifc-compiled Jet3D.

The G5 was a machine in the developer lab at WWDC, and the benchmarks were run by a friend of mine at the conference. So, I have yet to lay hands on a G5. Looks like it would make a nice workstation, however.
     
CubeBoy
Junior Member
Join Date: Dec 2002
Jul 10, 2003, 10:06 PM
 
There we go, the link should work now.
     
Eug Wanker  (op)
Posting Junkie
Join Date: Jun 2003
Location: Dangling something in the water… of the Arabian Sea
Jul 10, 2003, 11:14 PM
 
There we go, the link should work now.
PowerMac G4 1GHz-DP OS X 10.1.5 / Absoft f77 1CPU-SCALAR - 236 MFLOPS

PowerMac G4 1GHz-DP OS X 10.1.5 / Absoft f77 2CPU-SCALAR - 425 MFLOPS

Pentium 4 2GHz/RAMBUS RH Linux 7.3 / ifc - 842 MFLOPS

Cray SV1ex - 500MHz CPU UNICOS 10.0.1.1 / f90 - 888 MFLOPS

PowerMac G4 1GHz-DP OS X 10.1.5 / Absoft f77 1CPU-ALTIVEC - 1160 MFLOPS

PowerMac G4 1GHz-DP OS X 10.1.5 / Absoft f77 2CPU-ALTIVEC - 1843 MFLOPS

PowerMac G5 2 GHz (Extrapolating from Craig's Numbers) OS X 10.2.7 / Absoft f77 1 CPU-SCALAR - 571 MFLOPS

I don't think the extrapolation works when compared to the above numbers (which were done by someone else), since a Power Mac G5 2 GHz should be more than double the scalar performance of a G4 1 GHz. In fact, Craig Hunter's numbers (see first post) have a single 2 GHz G5 performing as well as two 1.25 GHz G4s.


More from the Ars thread:

Q: VERY interesting--do you have data with Hyperthreading turned on and off?

A: No, I don't. Both ifc and pgf90 automatically parallelize threads, but there isn't anything internal to Jet3D that can be run in parallel threads. On top of that, I don't have a P4 with HT anyway.

I'd need a code more amenable to internal parallelism and a HT capable P4 to test it out.
( Last edited by Eug Wanker; Jul 10, 2003 at 11:24 PM. )
     
CubeBoy
Junior Member
Join Date: Dec 2002
Jul 11, 2003, 09:00 AM
 
Originally posted by Eug Wanker:
PowerMac G4 1GHz-DP OS X 10.1.5 / Absoft f77 2CPU-ALTIVEC - 1843 MFLOPS
I don't think the extrapolation works when compared to the above numbers (which were done by someone else), since a Power Mac G5 2 GHz should be more than double the scalar performance of a G4 1 GHz. In fact, Craig Hunter's numbers (see first post) have a single 2 GHz G5 performing as well as two 1.25 GHz G4s.

Check Craig's page again, they were all using single CPU only for the first graph and the Xserve had 1 GHz G4s.

Craig's Numbers:
PowerMac G5 2 GHz (single CPU): 254 MFLOPS (scalar)
Xserve G4 1 GHz (single CPU): 105 MFLOPS (scalar)

The 2 GHz G5 is a whopping 2.4 times the speed of the 1 GHz G4.

Serge's Numbers:
PowerMac G4 1 GHz (single CPU): 236 MFLOPS (scalar)
PowerMac G5 2 GHz (single CPU, estimated): 236 * 2.4 = 566 MFLOPS (scalar)
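
For anyone checking the arithmetic, here is the ratio estimate spelled out. It is only a sketch of the reasoning above, with variable names chosen for illustration; it also shows why the estimate comes out as 566 in one place and 571 in another, depending on whether the ratio is rounded first.

```python
craig_g5 = 254   # Craig: G5 2 GHz, single CPU, scalar
craig_g4 = 105   # Craig: G4 1 GHz, single CPU, scalar
serge_g4 = 236   # Serge: G4 1 GHz, single CPU, scalar

# How much faster the G5 is than the G4 in Craig's runs
ratio = craig_g5 / craig_g4

# Applying that ratio to Serge's G4 result, with and without rounding it first
estimate_rounded_ratio = serge_g4 * round(ratio, 1)
estimate_full_ratio = serge_g4 * ratio

print(f"G5/G4 ratio: {ratio:.3f}")
print(f"Estimate with ratio rounded to 2.4: {estimate_rounded_ratio:.0f} MFLOPS")
print(f"Estimate with unrounded ratio: {estimate_full_ratio:.0f} MFLOPS")
```

Either way, this assumes the G5's advantage carries over unchanged to a different version of the code, built and run by someone else.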
     
Eug Wanker  (op)
Posting Junkie
Join Date: Jun 2003
Location: Dangling something in the water… of the Arabian Sea
Jul 11, 2003, 09:17 AM
 
Originally posted by CubeBoy:
Check Craig's page again, they were all using single CPU only for the first graph and the Xserve had 1 GHz G4s.

Craig's Numbers:
PowerMac G5 2 GHz (single CPU): 254 MFLOPS (scalar)
Xserve G4 1 GHz (single CPU): 105 MFLOPS (scalar)

The 2 GHz G5 is a whopping 2.4 times the speed of the 1 GHz G4.

Serge's Numbers:
PowerMac G4 1 GHz (single CPU): 236 MFLOPS (scalar)
PowerMac G5 2 GHz (single CPU, estimated): 236 * 2.4 = 566 MFLOPS (scalar)

Actually, I was talking about the numbers in the top post vs. your numbers, which are not Craig's. However, that doesn't change your latest post. I'm a little confused about your scalar numbers, though, since the IFC numbers are presumably heavily SSE2-optimized.

So would you not want to compare the P4 numbers against Altivec'd G4 numbers?

P4 3.2 (extrapolated from Intel Fortran compiler numbers in your post) - 1347 MFLOPS

G5 2.0 (extrapolated single CPU Altivec'd numbers in your link) - 2806 MFLOPS

I'm not sure who was doing the IFC code though. Craig has said he's still working on IFC code because there were some problems with it:

In many cases, ifc works right off the bat with no extra work. That's the way it should be for any compiler, in my opinion, especially for codes that are known to run across many platforms with little or no porting. For Jet3D and the larger CFD codes I use, it has been difficult to get a working executable with ifc. I've been beating on it for over a year now, as time allows.

Generally, the code will bomb after a few iterations, or it will run suspiciously fast and produce ????? in the output. It reminds me of issues I have run into with other compilers, where higher levels of optimization can break things. Only in this case, problems occur even with no optimization. And it's very difficult to troubleshoot by normal techniques.
     
CubeBoy
Junior Member
Join Date: Dec 2002
Jul 11, 2003, 09:30 AM
 
Originally posted by Eug Wanker:
Actually, I was talking about the numbers in the top post vs. your numbers, which are not Craig's. However, that doesn't change your latest post. I'm a little confused about your scalar numbers, though, since the IFC numbers are presumably heavily SSE2-optimized.

So would you not want to compare the P4 numbers against Altivec'd G4 numbers?

P4 3.2 (extrapolated from Intel Fortran compiler numbers in your post) - 1347 MFLOPS

G5 2.0 (extrapolated single CPU Altivec'd numbers in your link) - 2806 MFLOPS

I'm not sure who was doing the IFC code though. Craig has said he's still working on IFC code because there were some problems with it:

In many cases, ifc works right off the bat with no extra work. That's the way it should be for any compiler, in my opinion, especially for codes that are known to run across many platforms with little or no porting. For Jet3D and the larger CFD codes I use, it has been difficult to get a working executable with ifc. I've been beating on it for over a year now, as time allows.

Generally, the code will bomb after a few iterations, or it will run suspiciously fast and produce ????? in the output. It reminds me of issues I have run into with other compilers, where higher levels of optimization can break things. Only in this case, problems occur even with no optimization. And it's very difficult to troubleshoot by normal techniques.

SSE2 (and higher optimization levels) seem to decrease performance on this particular benchmark, which is why Craig turned them off in his benchmark. You can also be pretty sure that Serge disabled the SSE/SSE2 compiler flags for his comparison, since 1) the Pentium 4 would probably perform worse with them, and 2) if they had been on, that would be listed the way Altivec is for the G4.
     
Eug Wanker  (op)
Posting Junkie
Join Date: Jun 2003
Location: Dangling something in the water… of the Arabian Sea
Jul 11, 2003, 09:38 PM
 
By "no optimization" I meant -O0 (oh zero), as in no compiler optimization, and no SIMD. Compiler optimization usually unrolls loops, inlines functions, etc... When optimization gets too aggressive, it can actually bunge up the computation. That's not the problem now, however, since I get bunged results even with no optimization at all with ifc.
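
For readers wondering how an optimizer can change numerical output at all, one generic mechanism is that floating-point addition is not associative, so regrouping a sum can change the answer. The Python sketch below only illustrates that general effect with made-up values; it is not a reproduction of the ifc problem, which by the account above shows up even at -O0.

```python
import math

# Made-up values chosen so that rounding error is easy to see
values = [1e16, 1.0, -1e16, 1.0]

# Left-to-right sum, in the order the source code would state it
in_order = 0.0
for v in values:
    in_order += v

# The same four terms regrouped, as a reordering transformation might do
regrouped = (values[0] + values[2]) + (values[1] + values[3])

# math.fsum tracks the lost low-order bits and returns the correctly rounded sum
exact = math.fsum(values)

print(in_order, regrouped, exact)
```

The two groupings disagree because the first 1.0 is lost when it is added to 1e16. In a long CFD iteration, differences like this can grow, which is one reason output has to be checked carefully after changing compilers or flags.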

Those benchmarks are unpublished results from a different version of Jet3D that I didn't finalize (it is in between the scalar and vector codes in terms of performance). The ifc results look extremely good (too good in fact, and that's likely related to the problem), but unfortunately, the output is no good. I'm still working hard to get an ifc compiled version of Jet3D that works properly.
     
Graymalkin
Mac Elite
Join Date: May 2001
Location: ~/
Jul 12, 2003, 03:06 AM
 
As I recall, at higher optimization levels Intel's compilers translate scalar math into SSE instructions. The SSE instructions are much faster and more efficient than x87 instructions, especially on the P4 line of processors. Using high optimization levels on Intel's compilers can cause some really weird things to happen. It takes quite a bit of work to write code that runs efficiently under Intel's compilers, because of some of their optimizations.
     
   