
G5 vs. G4 vs. P4 in Fluid Dynamics (NASA Benchmarks)
scottiB
Professional Poster
Join Date: Jan 2000
Location: Near Antietam Creek
Jul 2, 2003, 01:41 PM
 
http://members.cox.net/craig.hunter/g5/

Read, analyze, comment...



Figure 1: Single CPU Jet3D Scalar Benchmarks - MFLOPS



Figure 2: Single CPU Jet3D Scalar Benchmarks - MFLOPS/MHz



Figure 3: Single CPU Jet3D Vector Benchmarks - MFLOPS



Figure 4: Single CPU Jet3D Vector Benchmarks - MFLOPS/MHz
( Last edited by scottiB; Jul 2, 2003 at 02:05 PM. )
I am stupidest when I try to be funny.
     
Spliffdaddy
Posting Junkie
Join Date: Oct 2001
Location: South of the Mason-Dixon line
Jul 2, 2003, 02:24 PM
 
comments:

graph 1 shows a Pentium 4 at 2.66GHz beating the fastest G5.

Pentium 4s at 3.06GHz have been available for many months. 3.2GHz has been announced, if not released, by now.


graph 2 is an effort to show the G5 ahead of last year's P4 (in something) by graphically illustrating the 'megahertz myth'. It fails to reflect the fact that the Athlon can beat a P4 in this example, as well. MFLOPS/MHz doesn't mean very much if you don't have enough MHz to start with.

graph 3 illustrates the fact that the vector engine (AltiVec SIMD unit) of the G5 scaled in performance on par with MHz increases. In other words, the G5 offers no performance enhancement to the AltiVec found in the G4.

graph 4 is a redundant attempt to show the same information gleaned from the previous graph. The vector processing units are similar. The one operating at twice the clockspeed is getting twice the amount of work done.
     
scottiB  (op)
Professional Poster
Join Date: Jan 2000
Location: Near Antietam Creek
Jul 2, 2003, 02:43 PM
 
Yep. I posted this because it was the first third-party comparison published (I found the link from MacSurfer).

In the report, the author addressed the faster Pentiums available, and commented that this test was not compiled for the G5 (he used one compiled for the G4). He suggested that one compiled for the G5 may increase the score by 20%.

The test only required 1MB to run, so any system bandwidth was not taxed (if I'm reading it correctly). This is not a good test if one's looking to extrapolate for global illumination renderings.

If the G4 had been able to scale in clock speed, it would have been on par with the G5 in these tests.

The MFLOPS/MHz graphs show, in my layman's analysis, CPU (and AltiVec) efficiency in crunching the data.

SSE stuff was disabled for the P4 because it was slower.
I am stupidest when I try to be funny.
     
Catfish_Man
Mac Elite
Join Date: Aug 2001
Jul 2, 2003, 02:46 PM
 
Actually, the G5 does very well. Note that the test is single processor. Multithread that thing and the G5 would whoop the P4's arse, even the 3.2GHz one (assuming SPEC-like scaling for the P4). Using the Absoft Fortran compiler isn't helping any either, from what I've read (I don't know anything about that 'Portland' compiler used on the Intel chips, it may suck as well). Overall, roughly what I'd expect for a single-threaded Fortran benchmark. It'd be nice to see the vector benchmarks for the P4.

<edit> it says optimization using -O2 and SSE/SSE2 on the P4 hurts its performance, so I guess that's why we don't see vector results. And, yeah, the dual processor benchmark shows the G5 beating down </edit>

<edit2> beaten to it</edit2>
( Last edited by Catfish_Man; Jul 2, 2003 at 03:03 PM. )
     
Eug Wanker
Posting Junkie
Join Date: Jun 2003
Location: Dangling something in the water… of the Arabian Sea
Jul 2, 2003, 03:19 PM
 
Sounds about right. I had originally predicted that a G5 1.8 should perform on par with about a 2.53 GHz P4, or perhaps a theoretical 2.3 GHz P4 with HyperThreading (assuming HyperThreading helped it in that particular benchmark).

Of course, for bandwidth intensive applications (like in Apple's bakeoff), the G5 Power Mac would do VERY well.
     
DBvader
Mac Enthusiast
Join Date: Feb 2003
Jul 2, 2003, 05:37 PM
 
I didn't even know that the processors >3 GHz had HyperThreading.
"Take a little dope...and walk out in the air"
     
Eug Wanker
Posting Junkie
Join Date: Jun 2003
Location: Dangling something in the water… of the Arabian Sea
Jul 2, 2003, 05:49 PM
 
Originally posted by DBvader:
I didn't even know that the processors >3 GHz had HyperThreading.
Yeah they do, but my guess is that it would do worse in this particular benchmark if you turned HT on.
     
OreoCookie
Moderator
Join Date: May 2001
Location: Hilbert space
Jul 2, 2003, 06:05 PM
 
The Absoft compiler sucks. Not just in terms of performance, but also in terms of compatibility. (I do not restrict myself to the PPC Absoft compiler.)
I don't suffer from insanity, I enjoy every minute of it.
     
dli537
Fresh-Faced Recruit
Join Date: Mar 2001
Location: Baltimore, md, USA
Jul 2, 2003, 06:17 PM
 
The benchmark is reasonable. But don't forget that the PowerPC G5 has two FPUs, and you need to recompile the code to take advantage of the second FPU. The G4-optimized code is only using one G5 FPU, which is already better than the G4's FPU. If the code is recompiled for the G5, the G5 will blow any P4 out of the water.

Read the following document for detail.
http://developer.apple.com/technotes/tn/tn2087.html
     
hmurchison2001
Senior User
Join Date: Jan 2001
Location: Seattle
Jul 2, 2003, 06:44 PM
 
This is just an evaluation of the G5 for NASA. The author does not seem to have any bias towards one platform over the other.

This was run using unmodified code, and that potentially makes a huge difference. As dli537 has said, you haven't seen the full FPU performance until you recompile apps to effectively utilize both FPUs.

Hannibal from Ars Technica was correct about the AltiVec. In the 970 it seems like it was grafted on. I expect that future members of the 9xx family will have more functional AltiVec units.

The recent results are very encouraging. Luxology backs up their benchmarks, and this report and other small reports are definitely confirming that we have a good performer in the G5. While it may not be the fastest overall... Mac users won't have to fret about paying more money and watching their machine get its rear kicked. Those days are over.
     
cowerd
Senior User
Join Date: Jan 2001
Jul 2, 2003, 11:47 PM
 
Originally posted by Catfish_Man:
I don't know anything about that 'Portland' compiler used on the Intel chips, it may suck as well
Portland writes compilers. They are working on a 64-bit compiler for the Opteron that is showing 300% gains on some SPEC benchmarks, and an average of 150% gains, over the 32-bit compiler version. This may indicate that the Portland Group has some compiler expertise.
yo frat boy. where's my tax cut.
     
Catfish_Man
Mac Elite
Join Date: Aug 2001
Jul 3, 2003, 12:45 PM
 
Originally posted by dli537:
The benchmark is reasonable. But don't forget that the PowerPC G5 has two FPUs, and you need to recompile the code to take advantage of the second FPU. The G4-optimized code is only using one G5 FPU, which is already better than the G4's FPU. If the code is recompiled for the G5, the G5 will blow any P4 out of the water.

Read the following document for detail.
http://developer.apple.com/technotes/tn/tn2087.html
Wrong. It won't be optimally scheduled, and it won't be using fancy tricks like loop unrolling (which would be useless for 1 FPU), but no software can keep the G5 from using its second FPU. Each of the G5's FPUs is essentially identical to the G4's except for longer latencies, higher frequencies, and the presence of a hardware sqrt instruction.

In response to the post about the Portland compiler: !!! That makes the G5 benchmark even more impressive. I'd be interested in seeing what would happen with IBM's compiler.
     
CubeBoy
Junior Member
Join Date: Dec 2002
Jul 3, 2003, 03:46 PM
 
Originally posted by cowerd:
Portland writes compilers. They are working on a 64-bit compiler for the Opteron that is showing 300% gains on some SPEC benchmarks, and an average of 150% gains, over the 32-bit compiler version. This may indicate that the Portland Group has some compiler expertise.
From what I've seen, PGI 5.0 offers on average a 20% performance increase over G77 3.3 on a 1.8 GHz Opteron system, using select SPECfp benchmarks done by Portland. We'll have to wait to see how well it fares against Intel's and Compaq's lineup. I don't know how good their compiler for the Pentium 4 is, though.

http://www.pgroup.com/images/pgf90vg77.jpg
     
Catfish_Man
Mac Elite
Join Date: Aug 2001
Jul 3, 2003, 04:54 PM
 
Originally posted by CubeBoy:
From what I've seen, PGI 5.0 offers on average a 20% performance increase over G77 3.3 on a 1.8 GHz Opteron system, using select SPECfp benchmarks done by Portland. We'll have to wait to see how well it fares against Intel's and Compaq's lineup. I don't know how good their compiler for the Pentium 4 is, though.

http://www.pgroup.com/images/pgf90vg77.jpg
Well, the G5 is using F90 (although I don't know how much difference there is in performance, if any), but it's also PowerPC, which tends to have less advanced compilers, and it's not being optimized for the G5. I'd say that this shows that with good compilers, a single 2GHz G5 could match or beat a 3.2GHz P4 (assuming the guess of 20% improvement for the 3.2GHz machine that is stated in the paper is correct) at this task. Quite impressive.
     
CubeBoy
Junior Member
Join Date: Dec 2002
Jul 3, 2003, 08:21 PM
 
Originally posted by Catfish_Man:
Well, the G5 is using F90 (although I don't know how much difference there is in performance, if any), but it's also PowerPC, which tends to have less advanced compilers, and it's not being optimized for the G5. I'd say that this shows that with good compilers, a single 2GHz G5 could match or beat a 3.2GHz P4 (assuming the guess of 20% improvement for the 3.2GHz machine that is stated in the paper is correct) at this task. Quite impressive.
Impressive indeed. I'm actually surprised that the Pentium 4 fared so well without SSE2 on this particular benchmark, considering how weak its FPU is compared to the Athlon and PPC970/G5.

I really don't know what to make of it. SSE2 was essentially the only optimization that substantially increased P4 performance in fp-intensive code (integer code is another story); without it, I'd expect even the fastest Pentium 4 to be no match for an Athlon or PPC970 with their superscalar FPUs. Right now, I believe it's mostly due to how the Pentium 4 handles excess bandwidth. You see, any extra bandwidth that the P4/motherboard channel provides is used for prefetching data. Since this particular benchmark has such a small memory footprint, it won't require a great deal of bandwidth from main memory and thus would allow a great deal of the data to be prefetched. So, bearing this in mind, I think it's reasonable to assume that the benchmark, which can't normally fit in the cache, now (due to extensive prefetching) runs completely in the on-die full-speed cache seen on the Pentium 4.

Regarding the 3.2 GHz P4, the 20% increase only assumes a linear increase in performance due to higher clock rates, and compared to the 2.66 GHz P4, the 3.2 GHz model is a lot more than just a speed bump, as I explained in my post on Macrumors.
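
For reference, the "20%" figure discussed above is what you get if performance is assumed to scale exactly with clock rate. A minimal Python sketch of that arithmetic follows; the function name is made up for illustration, the clock speeds are the ones mentioned in this thread, and linear scaling is only the assumption being debated here, not a measured result.

```python
def linear_clock_gain(base_ghz, target_ghz):
    """Fractional speedup IF performance scaled exactly with clock rate.

    This is the simplifying assumption behind the 20% estimate; real
    gains can differ (bus speed, cache, HyperThreading and so on).
    """
    return target_ghz / base_ghz - 1.0


# Clock speeds quoted in the thread: the tested 2.66 GHz P4,
# plus the 3.06 GHz and 3.2 GHz parts mentioned by posters.
gain_306 = linear_clock_gain(2.66, 3.06)
gain_32 = linear_clock_gain(2.66, 3.2)

print(f"2.66 -> 3.06 GHz: {gain_306:.1%} under linear scaling")
print(f"2.66 -> 3.2 GHz:  {gain_32:.1%} under linear scaling")
```

Under that assumption the 3.2 GHz part comes out roughly 20% ahead of the 2.66 GHz part, which matches the estimate quoted from the paper.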
( Last edited by CubeBoy; Jul 4, 2003 at 08:38 AM. )
     
Eug
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Jul 4, 2003, 01:26 AM
 
I'm surprised this has received so little attention. Maybe it's the title.

Anyways, this thread has gotten a lot more posts. Much of it is flaming though unfortunately.

P.S. I do find this graph very interesting:



Remember this?

( Last edited by Eug; Jul 4, 2003 at 01:39 AM. )
     
scottiB  (op)
Professional Poster
Join Date: Jan 2000
Location: Near Antietam Creek
Jul 4, 2003, 07:46 PM
 
Yeah, it was a weak title. I was between tying up some loose ends before the American holiday, and couldn't come up with something better.

Honestly, I thought the "G5 vs. G4 vs. P4" would actually be the clarion call...

Next time: "Holy effin' sh!+! The G5 kix azz!"
I am stupidest when I try to be funny.
     
Eug Wanker
Posting Junkie
Join Date: Jun 2003
Location: Dangling something in the water… of the Arabian Sea
Jul 4, 2003, 10:14 PM
 
The author speaks (back in January):

Date: Mon, 13 Jan 2003 23:29:38 -0500
From: Craig Hunter
Subject: G4 vs. P4 performance

I have been following the discussion of Rob Galbraith's benchmarks with much interest, as I have spent a good deal of time testing, optimizing, and benchmarking software for the G4 (OS X) and P4 (Linux).

The first thing to realize is that there are numerous benchmarks that show the P4 is faster, and there are numerous benchmarks that show the G4 is faster. What matters? Well, probably the benchmarks that apply to the kind of work you do. For people doing photo processing with the software Rob tested, his results are extremely relevant. But, someone working with a program optimized for AltiVec and dual processors might have a completely opposite experience.

Just to give an example of a benchmark that goes the other way, see this chart.

(You're welcome to mirror this benchmark image, since my web site may not handle a lot of traffic). These real-world results come from the Jet3D computational fluid dynamics noise prediction software, which I developed for my doctoral thesis and currently use in my work at NASA. Jet3D is written in a combination of FORTRAN 77, FORTRAN 90, and C, and is optimized for AltiVec and dual processors on G4 hardware. When compiled on Linux using Intel's ifc compiler tools, Jet3D also becomes optimized for the P4 (using the various SIMD extensions available on the P4).

As you can see, the G4 does quite well here. A dual processor 1.25GHz G4 system is more than 3.5X faster than a single processor 2GHz P4 system. Though it's not shown on the chart, a single 1.25GHz G4 processor benchmarks at about 1589 MFLOPS, 1.9X faster than the P4. If you look at MFLOPS per MHz for a single processor, the G4 comes in at 1.27 MFLOPS/MHz, while the P4 comes in at 0.42 MFLOPS/MHz. If you want a good example of the MHz myth, look at the Cray, which comes in at 1.78 MFLOPS/MHz with only a 500MHz processor, beating both the G4 and P4.

Without AltiVec, the Jet3D benchmark would be about 794 MFLOPS on the dual-1.25GHz G4, which erases the performance lead over the P4. And then, using only a single processor, the 1.25GHz G4 benchmarks at about 418 MFLOPS, which is about half as fast as the P4. And all of a sudden, the G4 doesn't look very compelling. For the Jet3D benchmark, AltiVec and dual processors are key (AltiVec more so than dual procs). This is true for most benchmarks I have looked at; thus numerically intensive applications that can't use AltiVec and/or dual processors are likely to suffer on the G4.

In the case of Jet3D, it was easy to optimize for AltiVec. I was able to hand-vectorize about 10 lines of code within the guts of the FORTRAN algorithm and convert the computations to C for easy access to AltiVec hardware instructions. It had a huge effect for not a lot of work. For other more complicated cases, it may be possible to use the VAST compiler tools to automatically vectorize and tie in with AltiVec (VAST has parallel tools also). But in some cases, vectorization is not possible or feasible. In those instances, you're stuck with the processor's scalar performance, and the P4 generally has better scalar performance than the G4 in my experience.

One final note: these are my personal views, and do not represent the views of NASA Langley Research Center, NASA, or the United States Government, nor do they constitute an endorsement by NASA Langley Research Center, NASA, or the United States Government.
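
The arithmetic in the email can be checked with a short sketch. The Python below is illustrative only: the input figures are the ones quoted in the email, while the P4 and Cray MFLOPS totals are derived here from the quoted MFLOPS/MHz values and are not stated in the email itself. All function and variable names are made up for this example.

```python
# Figures quoted in the January 2003 email (Jet3D benchmark).
G4_CLOCK_MHZ = 1250
P4_CLOCK_MHZ = 2000
CRAY_CLOCK_MHZ = 500

G4_SINGLE_ALTIVEC_MFLOPS = 1589  # single 1.25GHz G4, with AltiVec
G4_SINGLE_SCALAR_MFLOPS = 418    # single 1.25GHz G4, without AltiVec
G4_DUAL_SCALAR_MFLOPS = 794      # dual 1.25GHz G4, without AltiVec
P4_MFLOPS_PER_MHZ = 0.42         # single 2GHz P4
CRAY_MFLOPS_PER_MHZ = 1.78       # 500MHz Cray


def mflops_per_mhz(mflops, clock_mhz):
    """Per-clock efficiency: throughput divided by clock rate."""
    return mflops / clock_mhz


def total_mflops(per_mhz, clock_mhz):
    """Throughput recovered from per-clock efficiency and clock rate."""
    return per_mhz * clock_mhz


# Derived values (NOT stated in the email; computed from its figures).
p4_mflops = total_mflops(P4_MFLOPS_PER_MHZ, P4_CLOCK_MHZ)
cray_mflops = total_mflops(CRAY_MFLOPS_PER_MHZ, CRAY_CLOCK_MHZ)

g4_efficiency = mflops_per_mhz(G4_SINGLE_ALTIVEC_MFLOPS, G4_CLOCK_MHZ)
g4_vs_p4 = G4_SINGLE_ALTIVEC_MFLOPS / p4_mflops
scalar_g4_vs_p4 = G4_SINGLE_SCALAR_MFLOPS / p4_mflops
altivec_speedup = G4_SINGLE_ALTIVEC_MFLOPS / G4_SINGLE_SCALAR_MFLOPS
dual_scaling = G4_DUAL_SCALAR_MFLOPS / G4_SINGLE_SCALAR_MFLOPS

print(f"P4 (derived):           {p4_mflops:.0f} MFLOPS")
print(f"Cray (derived):         {cray_mflops:.0f} MFLOPS")
print(f"G4 + AltiVec:           {g4_efficiency:.2f} MFLOPS/MHz")
print(f"G4 + AltiVec vs P4:     {g4_vs_p4:.1f}X")
print(f"G4 scalar vs P4:        {scalar_g4_vs_p4:.2f}X")
print(f"AltiVec speedup (1 CPU): {altivec_speedup:.1f}X")
print(f"Dual vs single (scalar): {dual_scaling:.2f}X")
```

The derived P4 figure comes out at about 840 MFLOPS, which is consistent with the email's "1.9X faster" claim for the AltiVec G4 and its "about half as fast" claim for the scalar G4. The dual-versus-single ratio also shows why a lower MFLOPS/MHz score can still win on total throughput when the clock is high enough.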
     
moki
Ambrosia - el Presidente
Join Date: Sep 2000
Location: Rochester, NY
Jul 5, 2003, 12:53 AM
 
1) All of these tests were single-CPU tests. If they had a multi-threaded benchmark, or the benchmark was multi-processor aware, you could expect to see significant performance gains from the G5 specs

2) They are using an old Jet3D binary, one that was not compiled using gcc 3.3, and thus it doesn't have the 970-specific optimizations and scheduling in it. They do note this, but it is worth repeating here, because recompiled apps will benefit quite a bit in some cases from gcc 3.3 for the G5

Still, the scores look pretty good -- I am eager to see how the benchmarks turn out when they are recompiled under gcc 3.3, on shipping G5s (and running Panther, too).
Andrew Welch / el Presidente / Ambrosia Software, Inc.
     
Eug Wanker
Posting Junkie
Join Date: Jun 2003
Location: Dangling something in the water… of the Arabian Sea
Jul 5, 2003, 06:21 AM
 
Originally posted by moki:
1) All of these tests were single-CPU tests. If they had a multi-threaded benchmark, or the benchmark was multi-processor aware, you could expect to see significant performance gains from the G5 specs
The test is MP aware. He gets almost double the performance with two CPUs and states that in the text. He just didn't put it in the graph.
     
krove
Mac Elite
Join Date: Jul 2000
Location: Washington, DC
Jul 6, 2003, 01:15 PM
 
I'd like to know how he got his hands on a dual 2 GHz G5 so quickly!

How did it come to this? Goodbye PowerPC. | sensory output
     
Eug
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Jul 15, 2003, 09:00 AM
 
Originally posted by krove:
I'd like to know how he got his hands on a dual 2 GHz G5 so quickly!
He didn't. He gave the benchmark to a friend of his, who went to WWDC and ran it on a test machine there.

In other words, he just took his old unoptimized 32-bit code and ran it stock on the G5 - no modifications or recompilation.

By the way, here is an AMD Clawhammer Athlon 64 3400+ bench. It performs in Sandra about on par with the Pentium 4 2.66 GHz, but bests it in memory bandwidth and (if I read the table correctly) plain FPU power, the latter by a large margin. It's curious that the situation is similar to the G5 2.0 GHz.

There is also this review of the Clawhammer 2800+.
     
   
 