Welcome to the MacNN Forums.

Does the G5 have any advantages over Intel's core duo?
Junior Member
Join Date: Jan 2006
Status: Offline
Feb 14, 2006, 11:14 PM
 
Wasn't sure where to ask this question but I figured this place would be a good one.

Does the G5 have any advantages over the Core Duo? Or is the Core Duo superior in every way?

My friend also asks more specifically in things dealing with lots of number crunching and computations, which processor is better suited?
     
Posting Junkie
Join Date: Oct 2005
Location: Houston, TX
Status: Offline
Feb 14, 2006, 11:43 PM
 
One CPU is rarely superior to another in every way.

The VMX unit on the G5 makes it very fast for some encryption routines, such as RC5. At the same clock rate, a single G5 is about 30% faster than a Core Duo (it is a multithreaded benchmark, so both cores are used).
G5 has a faster FSB... some folks on these boards (notably Lateralus) claim this has a nontrivial impact on performance, but offer no proof.
The VMX unit also helps with benchmarks like BLAST when you have long word lengths.
The G5 is also a screamer for Linpack, a 1970s-era matrix math benchmark used for ranking supercomputers (due to be replaced this year, if I recall correctly).
It's hard to get firm prices for the G5s, but I believe they may cost less based on the figures I've heard thrown around.

Which processor is better suited depends on the nature of your application, how well it can be optimized to use the SIMD units, and how much it has actually been optimized (compilers used, hand-coded asm, etc.).
     
Clinically Insane
Join Date: Oct 2000
Location: Los Angeles
Status: Offline
Feb 15, 2006, 02:03 AM
 
Very nice reply, mduell.

"The natural progress of things is for liberty to yield and government to gain ground." TJ
     
Mac Enthusiast
Join Date: Jan 2006
Status: Offline
Feb 15, 2006, 02:17 AM
 
Is it asking too much to have a processor that'll let me burn multiple DVDs, play a graphics-intensive first-person shooter, process the video feed from my digital camera, print a 700-page document, scan photos, download multiple Linux ISOs off of p2p servers/networks, and do something like Seti@home (or whatever it's called now)..........ALL in one sitting......at the same time.......and fast, too?
     
Moderator
Join Date: May 2001
Location: Hilbert space
Status: Offline
Feb 15, 2006, 02:36 AM
 
Performance-wise, the G5 is competitive, and its newest evolution, which was presented at ISSCC (the PPC970GX), should have less appetite for power. Its architecture has some advantages that come into play with SMP setups (point-to-point interconnects vs. front-side bus topology), but this argument is less important for single-CPU computers (such as mobile computers and the iMac).

Since the G5 is much smaller than most current CPUs, it is also cheaper to manufacture (some people calculated a price of roughly $100, if I remember correctly, in the case of the iMac), but Apple has to do the logic board design, including parts of the North and South bridge (to use ancient terms here). Now Apple can use a plethora of chipsets for free.

To be honest, I think SJ put it rather well that performance-wise, there was little incentive to move from PowerPCs to Intel cpus (especially true since any emulated app WILL run slower), but Intel cpus (at least the ones available now) consume far less power.
I don't suffer from insanity, I enjoy every minute of it.
     
Clinically Insane
Join Date: Oct 2000
Location: Los Angeles
Status: Offline
Feb 15, 2006, 03:29 AM
 
Originally Posted by OreoCookie
To be honest, I think SJ put it rather well that performance-wise, there was little incentive to move from PowerPCs to Intel cpus (especially true since any emulated app WILL run slower), but Intel cpus (at least the ones available now) consume far less power.
Do you happen to have a quotation from Jobs stating as much? I'd really love to know he admitted that. I assume you're referring to the WWDC keynote, but I don't feel like looking back at it. Intel has the edge on low watt, high performance laptop chips, but that's about it. The Pentium D and Xeon aren't low energy by any stretch of the imagination.
     
Professional Poster
Join Date: Mar 2002
Location: Boston
Status: Offline
Feb 15, 2006, 05:53 AM
 
I think the advantage of the G5 is less technical and more high-level. That is, at the moment the G5 can run all OS X software and the Mactel cannot. I think that as time goes by that advantage will evaporate and the Mactel will surpass the G5 in both performance and compatibility.
     
Moderator
Join Date: May 2001
Location: Hilbert space
Status: Offline
Feb 15, 2006, 07:08 AM
 
Originally Posted by Big Mac
Do you happen to have a quotation from Jobs stating as much? I'd really love to know he admitted that. I assume you're referring to the WWDC keynote, but I don't feel like looking back at it. Intel has the edge on low watt, high performance laptop chips, but that's about it. The Pentium D and Xeon aren't low energy by any stretch of the imagination.
Well, I don't have an exact quote (and I'm too lazy to look it up), but he said all this when the switch to Intel was announced and he brought up all this `performance-per-Watt' issue. If you look at the Power5+ and the proposed Power6, it's clear that IBM has some solid cpus up in its pipeline/its shelves that future desktop derivatives could be based on.
     
Mac Elite
Join Date: Apr 2000
Location: Minneapolis, MN USA
Status: Offline
Feb 15, 2006, 09:02 AM
 
Thinking:

The whole performance per watt desire was based on the fact that laptops
are more and more becoming the item that people want versus large desktop
computers. The former is more the desire of the consumer while the latter
is the item pros and other prosumers purchase.

The power consumption of the DC Intel chips (competitive processing power
but not consuming as much in terms of electrical power) would theoretically
lead to longer battery life for laptop computers and such but have enough
power to run applications at a better than average level.

Also, a DC iMac is a compelling solution for consumers I'd think.
But it isn't as powerful as my DP G5 yet based on my informal tests.

I've posted my facts about how many simultaneous tracks of 32 note
polyphony in software synths in Garageband did (26 before the machine
seized up at 165% processing power) on a CoreDuo 2.0. But most
folks aren't doing 26 tracks of 32 voice polyphony - I'm guessing you
could do 50-60 tracks with that application real world use.

And the ability to do multiple simultaneous things is more a function of
the operating system than the hardware it uses - a DC would work better
than a single CPU in my opinion.

Re: G5 (DP 2.5, 2.5 GB ram)
I have routinely worked on large audio projects (80+ tracks) (with effects
and software instruments in addition to recorded material) while also
simultaneously running two instances of Seti at Home, while burning a
DVD at the same time using my G5 DP 2.5 while also downloading
podcasts with iTunes and also running a bittorrent client.

It doesn't cause the machine to seize up and it all runs at seemingly
full speed to my perceptions. I believe a large part of this is that there
is a second processor there - I've said it before and I'll say it again - I
will never again willingly buy a single CPU computer - that second core
or processor does wonders with system throughput.

The one thing you will notice about a G5 is that the DPs generate a
noticeable amount of heat (read: warm air coming out the back of
the machine). The DC machines should generate less heat if that's
a concern, also, less in terms of electricity consumption.

I note that the G5 dual consumes more electricity than any other
computer I own (G4 400 sawtooth, Athlon 2400 box, Pentium M
laptop, etc.) - I used a power consumption meter called the Power
Angel to determine this.

The performance cannot be ignored however - I have yet to make
it seize up using audio software and I've done this on my old machine
time and time again. The speed in Seti at Home is something to
experience and the overall heft I get running things that used to
be slow is great.

I have yet to go "I want to do this particular project but I don't have
enough processing horsepower to do it". No limits here yet.

They'll have to produce one hell of a machine on the Intel side
to replace it. I want to check that baby out once it's released.
     
Senior User
Join Date: Apr 2002
Location: Stockholm Sweden
Status: Offline
Feb 15, 2006, 09:10 AM
 
The G5 has a good FPU but a weaker SPU. The bus speed of the G5 is very good, but the latency in the northbridge is higher than in Intel's and much worse than AMD's IMC solution. In some cases with a lot of data streaming the G5 will shine; in other work, where the data is largely kept in the larger L2 cache of the Intel CPU, the Intel will shine. If your friend is aiming for a long-term investment he should go with the Intel. If he needs massive power today, the quad-core G5 is very, very good.
     
Forum Regular
Join Date: Nov 2005
Status: Offline
Feb 15, 2006, 11:46 AM
 
Originally Posted by macdummy
Wasn't sure where to ask this question but I figured this place would be a good one.

Does the G5 have any advantages over the Core Duo? Or is the Core Duo superior in every way?

My friend also asks more specifically in things dealing with lots of number crunching and computations, which processor is better suited?
It depends on what G5 we're talking about. Let's compare the 970MP (the dual core), just to take the number of cores difference out of the equation.

1) The Core Duo at 2.0 or 2.16 GHz will be substantially faster in integer code than even one of the 2.5 GHz 970MPs in the Quad. Per GHz, the Core Duo is about 40% faster in SPECint than the G5. This number is actually a bit optimistic for the G5, because most real code uses GCC, which optimizes a lot better for x86 than for PowerPC. Based on my own benchmarks, I'd put the difference at closer to 50% in favor of the Core Duo.

2) The 970MP will be substantially faster in vector code. SSE is a lot better in its version 3 incarnation, but it's no AltiVec. SSE has no FMA (fused multiply-add) instruction, nor do any existing implementations of it actually process the full 128 bits each clock cycle. Theoretically, the G5's AltiVec unit can complete 8 32-bit operations per clock cycle, while the Core Duo's SSE unit can only complete 2 per clock cycle. In practice, the difference won't be nearly this large, but for certain tasks where AltiVec shines (signal processing, which is what it was designed to do), the advantage can be very large.

3) The 970MP will usually be a bit faster in floating-point code, but it depends. The G5 has a very high 6-cycle latency for FPU instructions. The Core Duo has a 4-cycle latency. This means that dependent operations on the G5 must be separated by six clock cycles, versus four on the Core Duo. If the compiler can find independent operations to do during that gap, this isn't a problem. If it can't, the CPU ends up wasting some cycles. In practice, the G5's FPU latency just makes the compiler's job harder. GCC 3.x optimized rather poorly for the G5's FPU, but 4.x is significantly better.

4) The 970MP has very high memory latencies relative to other modern processors. It's about 30% higher than a Core Duo's, and almost triple that of an Opteron's. This actually kills the G5 in a lot of benchmarks that it would otherwise win because of its strong FPU. For example, the G5 runs SciMark small data set benchmark neck and neck with an Opteron. However, when you move to the large data set benchmark, which hits memory a lot harder, its performance plummets relative to the Opteron's.

So basically the answer is: it depends. Your friend says he's interested in "number crunching", but that's not specific enough. What kind of number crunching? Does he do compiling? The Core Duo would be a lot faster for that. Does he do 3D rendering? The G5 could be faster at that, but it depends on how well the code is optimized. Generic UNIX programs like Blender and POVray will likely run slower on the G5. Optimized stuff like Cinebench will definitely run faster. Does he do signal processing? The G5 will scream on that sort of code.
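For anyone who wants to see points 2 and 3 as numbers, here is a rough back-of-the-envelope sketch in Python. It only uses the figures quoted above (8 vs. 2 vector operations per cycle, 6 vs. 4 cycles of FPU latency); the clock speeds of 2.5 GHz and 2.0 GHz, the chain length of 100, and the function names are assumptions chosen for illustration, and real code will land well inside these theoretical bounds.

```python
# Back-of-the-envelope model of points 2 and 3 (illustrative only).

def peak_vector_gops(ops_per_cycle: int, clock_ghz: float) -> float:
    """Theoretical peak: 32-bit operations per cycle times the clock in GHz."""
    return ops_per_cycle * clock_ghz


def dependent_chain_cycles(n_ops: int, latency_cycles: int) -> int:
    """Cycles for a chain in which every FP op needs the previous result."""
    return n_ops * latency_cycles


# Figures quoted in the post: 8 ops/cycle (AltiVec), 2 ops/cycle (Core Duo SSE).
# The clock speeds are assumed: one 2.5 GHz 970MP core, one 2.0 GHz Core Duo core.
g5_peak = peak_vector_gops(8, 2.5)
duo_peak = peak_vector_gops(2, 2.0)

# Worst case for latency: 100 fully dependent FP operations, nothing to overlap.
g5_chain_ns = dependent_chain_cycles(100, 6) / 2.5   # cycles / GHz = ns
duo_chain_ns = dependent_chain_cycles(100, 4) / 2.0

print(f"peak vector rate per core: G5 {g5_peak:.1f} vs Core Duo {duo_peak:.1f} Gops/s")
print(f"100 dependent FP ops: G5 {g5_chain_ns:.0f} ns vs Core Duo {duo_chain_ns:.0f} ns")
```

The two results pull in opposite directions, which is the whole point: the G5 looks far ahead on peak vector throughput, yet a fully dependent FP chain finishes sooner on the Core Duo despite its lower clock.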
(Last edited by rhashem; Feb 15, 2006 at 11:55 AM. )
     
Forum Regular
Join Date: Nov 2005
Status: Offline
Feb 15, 2006, 11:52 AM
 
Originally Posted by OreoCookie
Since the G5 is much smaller than most current cpus,
That's not really a fair comparison, though. The 970FX is small (~66mm^2) because it doesn't have a lot on the chip. One core, only 512KB of cache, no integrated memory controller, etc. The 970MP, with dual cores and 2MB of total cache, has a die size of 170mm^2, which is pretty in-line with other processors in its class. The Opteron has a die size of 199mm^2, but it has an integrated memory controller and multiple Hypertransport links, which the 970MP doesn't. Yonah has a die size of 90mm^2, but that's on a 65nm process, versus the 90nm process the other two are built on. One can estimate that at 90nm, Yonah's die size would be right in line with the 970MP's.
     
Moderator
Join Date: May 2001
Location: Hilbert space
Status: Offline
Feb 15, 2006, 12:45 PM
 
Obviously the FX is small because of its small cache, but the CPUs which are about as old as the FX (the P4, for instance) are far bigger than this with comparable performance. So the single-core variants of the PPC970 are small CPUs. And this is going to make the G5 much cheaper, since you can fit more CPUs on a wafer, the likelihood of getting working chips is probably higher as well, etc.

You are right, though, that AMD's offerings are not that much larger (especially since they use the same structure size, 90 nm) and that this is due to the cache. However, Intel's P4-based offerings were much bigger than this, especially the variants with humongous caches. (There are Xeons which take up more than 270 mm^2; the largest one I've heard of needs 435 mm^2!)

The relatively small size of Yonah can be attributed to the die shrink. If it were manufactured in 90 nm, you'd have to multiply its die size with a factor of roughly 1.9.
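The factor of roughly 1.9 follows from ideal area scaling, where die area goes with the square of the feature size. A minimal sketch of that arithmetic, using the die sizes quoted earlier in the thread (Yonah at 90 mm^2 on 65 nm, the 970MP at 170 mm^2 on 90 nm); the function name is made up for illustration, and real shrinks never scale perfectly, so treat the result as an estimate:

```python
def scaled_die_area(area_mm2: float, from_nm: float, to_nm: float) -> float:
    """Ideal scaling: area changes with the square of the feature size."""
    return area_mm2 * (to_nm / from_nm) ** 2


# Scale factor for going from a 65 nm process back to a 90 nm process.
factor = (90 / 65) ** 2

# Yonah's quoted 90 mm^2 die, hypothetically built at 90 nm instead of 65 nm.
yonah_at_90nm = scaled_die_area(90, from_nm=65, to_nm=90)

print(f"scale factor: {factor:.2f}")
print(f"Yonah at 90 nm: about {yonah_at_90nm:.0f} mm^2 (970MP: 170 mm^2)")
```

Under that assumption Yonah would come out within a few mm^2 of the 970MP, which supports the point that the two dual-core dies are in the same class.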
     
Posting Junkie
Join Date: Oct 2005
Location: Houston, TX
Status: Offline
Feb 15, 2006, 12:48 PM
 
Originally Posted by ©öñFü$íóÑ
Is it asking too much to have a processor that'll let me burn multiple DVDs, play a graphics-intensive first-person shooter, process the video feed from my digital camera, print a 700-page document, scan photos, download multiple Linux ISOs off of p2p servers/networks, and do something like Seti@home (or whatever it's called now)..........ALL in one sitting......at the same time.......and fast, too?
Sure, many CPUs can do that (although you'd also need a pretty quick disk)... if you can't, it's probably due to your OS. As for doing it fast, depends what you call "fast."
Also, running SETI would be rather pointless, since your CPU would probably be maxed out already.

Originally Posted by OreoCookie
Performance-wise, the G5 is competitive, and its newest evolution, which was presented at ISSCC (the PPC970GX), should have less appetite for power. Its architecture has some advantages that come into play with SMP setups (point-to-point interconnects vs. front-side bus topology), but this argument is less important for single-CPU computers (such as mobile computers and the iMac).

Since the G5 is much smaller than most current CPUs, it is also cheaper to manufacture (some people calculated a price of roughly $100, if I remember correctly, in the case of the iMac), but Apple has to do the logic board design, including parts of the North and South bridge (to use ancient terms here). Now Apple can use a plethora of chipsets for free.
Two points:
1) AMD and Intel (and probably IBM) are both moving toward four core processors in the next year or two, so a shared interconnect vs dedicated interconnects will only matter for 8-way and larger boxes.
2) The chipsets may be less expensive (development costs are spread over a much larger number of chips), but they still cost Apple something. List price is $65 for 945PM (with ICH7) in quantities of 1000.

Originally Posted by OreoCookie
Well, I don't have an exact quote (and I'm too lazy to look it up), but he said all this when the switch to Intel was announced and he brought up all this `performance-per-Watt' issue. If you look at the Power5+ and the proposed Power6, it's clear that IBM has some solid cpus up in its pipeline/its shelves that future desktop derivatives could be based on.
But is IBM interested in producing low-cost, low-power derivatives? I got the impression they no-bid Apple's request for a mobile G5.

Originally Posted by rhashem
4) The 970MP has very high memory latencies relative to other modern processors. It's about 30% higher than a Core Duo's, and almost triple that of an Opteron's.
Link for Core Duo memory latency tests?
Last time I saw the numbers, the P4s came in around 150ns, K8s around 130ns, and the G5s around 300ns; I'd expect CoreDuo to be close to the P4, not up at ~230ns.
(Last edited by mduell; Feb 15, 2006 at 01:05 PM. )
     
Moderator
Join Date: May 2001
Location: Hilbert space
Status: Offline
Feb 15, 2006, 01:19 PM
 
Originally Posted by mduell
Two points:
1) AMD and Intel (and probably IBM) are both moving toward four core processors in the next year or two, so a shared interconnect vs dedicated interconnects will only matter for 8-way and larger boxes.
It also matters for smaller systems; it's the reason why Intel needs those huge caches to keep up with AMD. However, as I said earlier, it won't matter for single-CPU systems (which is what we are discussing now). So I think this is a largely theoretical argument.
Originally Posted by mduell
2) The chipsets may be less expensive (development costs are spread over a much larger number of chips), but they still cost Apple something. List price is $65 for 945PM (with ICH7) in quantities of 1000.
True. I don't know how much Apple has spent for mobo design and all this, but I guess at least the same as what they are paying to Intel. I don't have any figures on this, though.
Originally Posted by mduell
But is IBM interested in producing low-cost, low-power derivatives? I got the impression they no-bid Apple's request for a mobile G5.
Depends on your definition of `low power'
I guess for IBM low power means `low enough to use it in blades'
     
Posting Junkie
Join Date: Oct 2005
Location: Houston, TX
Status: Offline
Feb 15, 2006, 03:27 PM
 
Originally Posted by OreoCookie
Depends on your definition of `low power'
I guess for IBM low power means `low enough to use it in blades'
You don't need low power chips for blades, because you can use noisy cooling. In fact, IBM has some of the highest power chips on the market in blades: 2x 2.5 GHz PPC970MP (those are what, 80+ W each?) and 2x 3.8 GHz Xeon (110 W each).
(Last edited by mduell; Feb 15, 2006 at 03:35 PM. )
     
Dedicated MacNNer
Join Date: Apr 2003
Status: Offline
Feb 15, 2006, 04:31 PM
 
I'm the last to chime in for Intel, but the two chips aren't really directly comparable. I know the OP asked the question, but memory latencies and all that are largely irrelevant to most of the people that will buy a Mac that currently features the Core Duo.

The G5 is a desktop/workstation chip by breed, for better or worse. The pedigree of the Core Duo is low power consumption with competitive performance for its class.

As for heavy computation... as many have said, it depends on what your friend wants to do. The iMac is a great general purpose computer, regardless of whether it has a G5 or Core Duo chip. For my money, if I wanted a computer running flat out most of the time, I'd choose a Power Mac every time.
     
Forum Regular
Join Date: Nov 2005
Status: Offline
Feb 15, 2006, 06:48 PM
 
Link for Core Duo memory latency tests?
Last time I saw the numbers, the P4s came in around 150ns, K8s around 130ns, and the G5s around 300ns; I'd expect CoreDuo to be close to the P4, not up at ~230ns.
http://techreport.com/reviews/2006q1...4/index.x?pg=3

The 130ns figure for the 970MP is from my own nbench'ing of my machine.
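To make the disagreement concrete, here is a small sketch that applies the two ratios claimed earlier (G5 latency about 30% higher than the Core Duo's and almost triple the Opteron's) to both G5 figures mentioned in this thread, 300 ns and 130 ns. The function name is hypothetical, and the outputs are only what the ratios imply, not measurements of anything.

```python
def implied_latencies(g5_ns: float) -> tuple:
    """Latencies implied by the ratios claimed in the thread, in nanoseconds."""
    core_duo_ns = g5_ns / 1.30   # G5 is "about 30% higher" than the Core Duo
    opteron_ns = g5_ns / 3.0     # G5 is "almost triple", so this is a lower bound
    return core_duo_ns, opteron_ns


for g5_ns in (300.0, 130.0):
    core_duo_ns, opteron_ns = implied_latencies(g5_ns)
    print(f"G5 at {g5_ns:.0f} ns -> Core Duo ~{core_duo_ns:.0f} ns, "
          f"Opteron just over {opteron_ns:.0f} ns")
```

With the 300 ns starting point the Core Duo lands near the ~230 ns that looked suspicious; with the 130 ns nbench figure it lands near 100 ns. The ratios are the same either way, so the argument is really about which G5 baseline is right.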
     
Posting Junkie
Join Date: Oct 2005
Location: Houston, TX
Status: Offline
Feb 15, 2006, 06:59 PM
 
Originally Posted by rhashem
http://techreport.com/reviews/2006q1...4/index.x?pg=3

The 130ns figure for the 970MP is from my own nbench'ing of my machine.
That's for the Pentium M/915 chipset/DDR2-533, not Core Duo/945 chipset/DDR2-667.
Wow, glad to see the memory latency has improved with the DDR2 PowerMacs.
     
Moderator
Join Date: May 2001
Location: Hilbert space
Status: Offline
Feb 16, 2006, 02:01 AM
 
Originally Posted by mduell
You don't need low power chips for blades, because you can use noisy cooling. In fact, IBM has some of the highest power chips on the market in blades: 2x 2.5 GHz PPC970MP (those are what, 80+ W each?) and 2x 3.8 GHz Xeon (110 W each).
Maybe you've missed the smiley, but I was joking.

IBM always claims their chips are low power and they have low power variants if anyone wants them. But I haven't seen anybody use them. Of course, if you define low power as less than Xeon, Opterons are low power chips, too
     
Forum Regular
Join Date: Nov 2005
Status: Offline
Feb 16, 2006, 05:02 PM
 
Originally Posted by mduell
Wow, glad to see the memory latency has improved with the DDR2 PowerMacs.
Eh, not really. I don't know where the 300ns figure came from, but nbench has never reported latency that high for the PMs. The previous nbench results were right around 130ns too.
     
Posting Junkie
Join Date: Oct 2005
Location: Houston, TX
Status: Offline
Feb 16, 2006, 07:29 PM
 
Originally Posted by rhashem
Eh, not really. I don't know where the 300ns figure came from, but nbench has never reported latency that high for the PMs. The previous nbench results were right around 130ns too.
Anandtech's comparison review of the dual 2.7 PowerMac.
     
Forum Regular
Join Date: Jul 2005
Status: Offline
Feb 17, 2006, 10:01 PM
 
Originally Posted by ©öñFü$íóÑ
Is it asking too much to have a processor that'll let me burn multiple DVDs, play a graphics-intensive first-person shooter, process the video feed from my digital camera, print a 700-page document, scan photos, download multiple Linux ISOs off of p2p servers/networks, and do something like Seti@home (or whatever it's called now)..........ALL in one sitting......at the same time.......and fast, too?

Well, you made me curious, so I just ran the following tasks. It's not as much as what you listed, but it still turned out well.

Did this:

-Burning a backup of a DVD I had purchased from my E: drive which I had 'ripped earlier'
-Ripping that same DVD again to my D: drive just for the sake of adding load to the system
-running Doom3 in a window at 800x600 on high detail
-playing MP3's
-Having my outlook and MSN messenger open (not that they do anything but hog memory)
-did the photoshop test with the horse

The results: Doom3 cycled between 27 fps and 60 fps, the audio never hiccupped, the DVD burn never ran out of buffer, and the horse test finished in 42.5 seconds.

I'm doing the test on a Celeron 500 MHz.... LOL, OK, no it's not. Dual 3 GHz Xeon with 2 GB RAM, 3x 36 GB SCSI drives, and an X800XT video card.

http://powerthings.com/pics/allatonce.jpg
http://powerthings.com/pics/allatonce2.jpg

Sorry for the high res images. Bumped up the res to get stuff on screen.
     
Dedicated MacNNer
Join Date: Jun 2002
Status: Offline
Feb 18, 2006, 07:23 PM
 
All you guys are trying to outdo one another with your tech knowledge!!

You forgot the biggest difference! 64-bit.

In general, the major advantages the G5 has are:
  1. Two FPU's per processor
  2. Super fast bus speed
  3. 64-bit
  4. Solid, proven, reliable architecture
  5. RISC instead of poo CISC
www.marcushesse.com

UNC-Charlotte Apple Campus Rep.
     
Forum Regular
Join Date: Jul 2005
Status: Offline
Feb 18, 2006, 07:46 PM
 
Originally Posted by BurpetheadX
All you guys are trying to outdo one another with your tech knowledge!!

You forgot the biggest difference! 64-bit.

In general, the major advantages the G5 has are:
  1. Two FPU's per processor
  2. Super fast bus speed
  3. 64-bit
  4. Solid, proven, reliable architecture
  5. RISC instead of poo CISC
Well I'm not sure but at least this is what i think:

-Doesn't matter how many FPUs it's got. The end result should be the performance of the apps
-Doesn't matter how fast the bus is. The end result should be the performance of the apps
-Core Duo is actually rumored to be 64-bit, just not advertised as such
http://www.gizmodo.com/gadgets/intel...ers-153822.php

-I've never seen an unreliable processor in my life
-RISC vs. CISC... who cares. The end result is the performance of the apps.
     
Dedicated MacNNer
Join Date: Jun 2002
Status: Offline
Feb 18, 2006, 07:50 PM
 
Originally Posted by svtcontour
Well I'm not sure but at least this is what i think:

-Doesn't matter how many FPUs it's got. The end result should be the performance of the apps
-Doesn't matter how fast the bus is. The end result should be the performance of the apps
-Core Duo is actually rumored to be 64-bit, just not advertised as such
http://www.gizmodo.com/gadgets/intel...ers-153822.php

-I've never seen an unreliable processor in my life
-RISC vs. CISC... who cares. The end result is the performance of the apps.

Application performance is only one aspect of computing. G5s excel in farms, whether they be for rendering, medical, compiling, or any type of distributed computing (like SETI). This comes down entirely to processor / FPU performance. The Virginia Tech cluster demonstrates this.
     
Mac Elite
Join Date: Aug 2001
Status: Offline
Feb 18, 2006, 07:51 PM
 
Originally Posted by BurpetheadX
All you guys are trying to outdo one another with your tech knowledge!!

You forgot the biggest difference! 64-bit.

In general, the major advantages the G5 has are:
  1. Two FPU's per processor
  2. Super fast bus speed
  3. 64-bit
  4. Solid, proven, reliable architecture
  5. RISC instead of poo CISC
If by "outdo one another with our tech knowledge" you mean "have an interesting technical discussion about the relative merits of two processors" then yes, you're right. Also, "poo CISC"? What is this, kindergarten?

<edit>
Originally Posted by BurpetheadX
whether they be for rendering, medical, compiling, or any type of distributed computing (like SETI). This completely comes down to processor / FPU performance
Compiling is most certainly not a floating point task. Lots and lots and lots of integer test&branch. In tests done on #macdev recently, an iMac came close to a quad Power Mac while running gcc. You're correct, though, that the design choices in the G5 do lend themselves very well to streaming floating point tasks.
</edit>
     
Forum Regular
Join Date: Nov 2005
Status: Offline
Feb 19, 2006, 02:23 PM
 
Originally Posted by BurpetheadX
  1. Two FPU's per processor
  2. Super fast bus speed
  3. 64-bit
  4. Solid, proven, reliable architecture
  5. RISC instead of poo CISC
Well, life isn't quite so simple.

1) The G5 has two high-latency FPUs per processor with no internal bypassing. For all intents and purposes, that means it appears to the compiler's scheduler as a 12-FPU machine running at 1/6th the clock speed (e.g., 333 MHz for a 2 GHz G5). The Core Duo has one FPU per processor, but with a latency of only four clocks. That means it appears to the compiler's scheduler as a 4-FPU machine running at 1/4th the clock speed (500 MHz for a 2 GHz Core Duo). Now, most code has the parallelism to take advantage of the G5's FPU pipelines, so the G5 will still win most FPU benchmarks. However, some code won't.

2) The G5's bus is stronger in some ways than the Core Duo's, but weaker in others. It's higher latency than the Core Duo's bus, which impacts a lot of code pretty badly. In the early days of the P4, people found that RDRAM wasn't usually a performance win even though it was higher-bandwidth than regular DDR SDRAM. That was because RDRAM was higher latency, which negated its bandwidth advantage. Moreover, the G5's bus is dual-unidirectional. That means a 2.0 GHz G5 has 8 GB/sec of total bus bandwidth, but can only use half of it in a given direction. So if your code does only reads and no writes, the G5's bus will be limited to 4 GB/sec. The Core Duo's bus has less overall bandwidth, but it's more flexible. It's got 5.33 GB/sec in either direction. So if your code is doing all reads, it can still use the full speed of the bus.

3) 64-bit usually makes the G5 slower, unless your app needs > 4GB of RAM. That's because 64-bit uses larger pointers, which increase the memory usage of the app and decrease the effective cache size of the processor.

4) The Core Duo is based on the P6 architecture, which is a lot more mature than the POWER4 architecture.

5) RISC is more of a liability these days than an advantage. It results in big code (PowerPC code is more than 50% larger than AMD64 code), which decreases the effective size of the instruction cache. It makes tricks like micro-op fusion difficult, because each RISC operation only does one thing at a time. RISC was a big advantage back when transistor count was the bottleneck, but these days, power usage and wire delay are the bottleneck. These days, spending a few million transistors to do x86 decoding is nothing compared to being able to take advantage of the more complex nature of x86 instructions in order to reduce the number of operations that have to be tracked internally within the processor.
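Points 1 and 2 are easier to follow with the arithmetic written out. The sketch below is a simplified model in Python, not a simulation: it treats each FPU as "latency" independent slots running at clock/latency, and it treats the G5 bus as two one-way links of 4 GB/sec each versus a single shared 5.33 GB/sec link for the Core Duo. All function names are made up for illustration, and the 2 GHz clocks are assumed for both chips.

```python
def scheduler_view(fpus: int, latency_cycles: int, clock_ghz: float) -> tuple:
    """Point 1: units x latency 'virtual FPUs', each running at clock / latency."""
    virtual_units = fpus * latency_cycles
    virtual_clock_mhz = clock_ghz * 1000 / latency_cycles
    return virtual_units, virtual_clock_mhz


def g5_bus_gbs(read_fraction: float, per_direction: float = 4.0) -> float:
    """Point 2: dual-unidirectional bus, reads and writes capped separately."""
    # The busier direction saturates first and limits total traffic.
    busiest = max(read_fraction, 1.0 - read_fraction)
    return per_direction / busiest


def core_duo_bus_gbs(read_fraction: float, shared: float = 5.33) -> float:
    """Point 2: one shared bus, usable in either direction regardless of mix."""
    return shared


g5 = scheduler_view(fpus=2, latency_cycles=6, clock_ghz=2.0)
duo = scheduler_view(fpus=1, latency_cycles=4, clock_ghz=2.0)
print(f"G5: {g5[0]} virtual FPUs at {g5[1]:.0f} MHz")
print(f"Core Duo: {duo[0]} virtual FPUs at {duo[1]:.0f} MHz")

for reads in (1.0, 0.5):
    print(f"{reads:.0%} reads: G5 {g5_bus_gbs(reads):.2f} GB/s, "
          f"Core Duo {core_duo_bus_gbs(reads):.2f} GB/s")
```

In this model the G5 only wins on bus bandwidth when traffic is reasonably balanced between reads and writes, and it only wins on FPU throughput when the code offers enough independent operations to fill all twelve slots.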
     
Mac Elite
Join Date: Aug 2001
Status: Offline
Feb 19, 2006, 03:01 PM
 
Originally Posted by rhashem
4) The Core Duo is based on the P6 architecture, which is a lot more mature than the POWER4 architecture.
Well... that's a bit like saying the G4 is based on the 603e. Certainly there are inherited traits, but rather a lot of stuff was redesigned.

Originally Posted by rhashem
5) RISC is more of a liability these days than an advantage. It results in big code (PowerPC code is more than 50% larger than AMD64 code), which decreases the effective size of the instruction cache. It makes tricks like micro-op fusion difficult, because each RISC operation only does one thing at a time. RISC was a big advantage back when transistor count was the bottleneck, but these days, power usage and wire delay are the bottleneck. These days, spending a few million transistors to do x86 decoding is nothing compared to being able to take advantage of the more complex nature of x86 instructions in order to reduce the number of operations that have to be tracked internally within the processor.
Certainly CISC instruction sets are less of a disadvantage than they were when x86 decode logic was a significant cost in transistors, and certainly one can reduce the disadvantage by clever tricks like pre-decoding and trace caches, but I'd hardly say that RISC is a liability. There are numerous flaws offsetting the code density advantage of variably sized instructions: from the limited architected register count, to the mind-bending tricks required to execute micro-ops out of order in parallel while pretending to execute x86 ops in order and serially, to the added power usage and latency of the decode pipeline stages. x86 also has flaws beyond the general CISC ones, such as the bizarre x87 FPU (thankfully Apple doesn't have to deal with that) and the segmented memory addressing scheme.
     
Forum Regular
Join Date: Nov 2005
Status: Offline
Feb 19, 2006, 05:05 PM
 
Originally Posted by Catfish_Man
Well... that's a bit like saying the G4 is based on the 603e. Certainly there are inherited traits, but rather a lot of stuff was redesigned.
The G4e's execution core is very substantially different from the 603e's. It's a 3+1 issue design versus the 603e's 2-issue design, and it's got more execution units, a much bigger reorder buffer, a better branch predictor, a different bus, etc. The Core Duo has a lot of redesigned parts, like the branch predictor, the bus, and a better decoder, but it's got the same basic execution core and a very similar 3-issue frontend. It's not at all a stretch to say that the Core Duo is based on the mature P6 architecture.

Certainly CISC instruction sets are less of a disadvantage than they were when x86 decode logic was a significant cost in transistors, and certainly one can reduce the disadvantage by clever tricks like pre-decoding and trace caches, but I'd hardly say that RISC is a liability.
It's not clear that x86 is really a disadvantage these days at all. Certainly, the performance figures suggest that x86 isn't a bottleneck. The overhead of x86 in terms of transistor count is minimal these days. Indeed, the POWER4 design shows that spending a few clock cycles and a couple of decode stages to convert instructions to a native ISA can be beneficial even for a RISC architecture, because it decouples the processor design from the ISA and gives designers additional flexibility.

from the limited architected register count,
The limited register count is largely offset by very compact and flexible ways to express loads and stores. The explicit load/store model really bloats certain memory-intensive programs (e.g., most object-oriented programs). Of course, amd64 eliminates the register-count issue --- beyond its 16 architectural registers, the performance improvement becomes quite marginal.

to the mind-bending tricks required to execute micro-ops out of order in parallel
It's really not that mind-bending. Most x86 operations actually decode into a single micro-op. AMD's x86 guy has been quoted as saying something along the lines of "x86 is quirky, but it really doesn't make the processor harder to implement". And of course, the meatier semantics of CISC instructions make things like micro-ops fusion easier to implement. Both the Opteron and the Core Duo use the fact that x86 instructions can have memory operands to reduce the number of explicit load/stores that must be tracked within the processor. The Merom architecture will take this idea even further. The size of the CPU increases quadratically with the number of in-flight operations, so if a CISC-y ISA can result in 10% fewer instructions being in-flight, that can result in a very significant reduction in core size and wire delays. The fact that 3-issue designs like the Opteron and the Core Duo can beat 3+1 and 4+1 designs like the G4 and G5 in integer code is a testament to the utility of ops-fusion in real-world code.
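To make the arithmetic behind that 10% figure concrete, here is a minimal sketch. It assumes the cost of the tracking structures scales exactly with the square of the number of in-flight operations, which is a simplification of the claim above, and the function name is just for illustration.

```python
# Sketch of the quadratic-scaling argument. Assumption: the cost (area,
# wiring) of the structures that track in-flight operations grows as n**2.
# Real designs will not follow this exactly.

def relative_cost(in_flight_fraction: float) -> float:
    """Cost relative to the baseline when tracking this fraction of the ops."""
    return in_flight_fraction ** 2


if __name__ == "__main__":
    # 10% fewer in-flight operations means tracking 0.9x as many.
    saving = 1.0 - relative_cost(0.9)
    print(f"relative cost: {relative_cost(0.9):.2f}, saving: {saving:.0%}")
```

Under that assumption, 10% fewer in-flight operations works out to roughly a 19% reduction in the cost of the tracking logic, since 0.9 squared is 0.81.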

to the added power usage and latency of the decode pipeline stages
As I pointed out earlier, spending extra decode stages simplifying the instruction stream can be a nice win even in RISC designs like the G5. The power-usage argument is a bit misleading. Yeah, the decoders are bigger and take more power, but if you use ops-fusion, you can get away with a slightly smaller reorder buffer, which takes less power. Which route is better really depends on the specifics of the situation.

x86 also has flaws beyond just general CISC ones such as the bizarre x87 fpu (thankfully Apple doesn't have to deal with that) and the segmented memory addressing scheme.
I think it's important to note that Apple's use of x86 is only temporary. It will, in due time, switch to the amd64 ISA. The amd64 ISA relieves most of the glaring weaknesses in the x86 design, such as the size of the register file and the x87 FPU, while keeping the strengths of compact code size and higher-level instructions.

Overall, I think I have to take back the "liability" part. "Liability" is probably too strong a word. However, I think it's important to realize that x86 versus PowerPC is really a matter of engineering tradeoffs in ISA design, not "poo versus ice cream". I'd argue that given the current stage of technology, x86 makes a better set of tradeoffs than does conventional RISC.
     
Mac Elite
Join Date: Aug 2001
Status: Offline
Feb 19, 2006, 07:11 PM
 
Originally Posted by rhashem
The G4e's execution core is very substantially different from the 603e's. It's a 3+1 issue design versus the 603e's 2-issue design, and it's got more execution units, a much bigger reorder buffer, a better branch predictor, a different bus, etc. The Core Duo has a lot of redesigned parts, like the branch predictor, the bus, and a better decoder, but it's got the same basic execution core and a very similar 3-issue frontend. It's not at all a stretch to say that the Core Duo is based on the mature P6 architecture.
I see your point, but I think you may underestimate the changes to the Core Duo core from P6. Consider that both ops fusion and SSE2/3 have been added, for example. The 603e/G4 comparison was exaggerated though, I admit.

Originally Posted by rhashem
It's not clear that x86 is really a disadvantage these days at all. Certainly, the performance figures suggest that x86 isn't a bottleneck.
Possibly due to Intel and AMD's resources and superb engineers (IBM, at least, is known to rely heavily on automated design tools. I've heard this is a reason for the 2 cycle latency on the Power4/5/PPC970 integer unit), but I agree it shows that it's possible to implement very high performance processors with x86.

Originally Posted by rhashem
Overall, I think I have to take back the "liability" part. "Liability" is probably too strong a word. However, I think it's important to realize that x86 versus PowerPC is really a matter of engineering tradeoffs in ISA design, not "poo versus ice cream". I'd argue that given the current stage of technology, x86 makes a better set of tradeoffs than does conventional RISC.
Ya, I'm slightly on the other side of the line, but I can definitely see what you're saying.

It raises an interesting question of what a ground-up modern CISC ISA would look like. Most of the recent ones I've heard about (IA64, Crusoe) are VLIW designs, but I tend to not notice stuff outside the PC/server market.
     
Fresh-Faced Recruit
Join Date: Feb 2006
Status: Offline
Feb 21, 2006, 04:20 PM
 
For those doing iterative numerical computations, where the differences between successive solutions to an equation become smaller, denormalized numbers---nonzero values smaller in magnitude than the smallest normalized floating-point number---can arise. It turns out that x86-based processors of all stripes, including P4s, Opterons, and Xeons, get burned *badly* by them, e.g., slow-downs by a factor of 100 or more. Denormals must be processed by specialized microcode that isn't pipelined. I literally would watch output from each iteration come at a rapid clip until the algorithm got near convergence, at which point it ground to a halt, choking on denormals. This problem comes up in filters for audio processing as well. A work-around is to truncate denormal numbers to zero, but this violates the IEEE754 standard for floats and is considered risky (think Ariane 5). Other work-arounds depend on the problem you're solving, e.g., ensuring that denormals are impossible, but I'll be damned if I'll waste my time just because somebody wanted to save some cash. By contrast, the G5 is a champ and handles denormals quickly due to specially designed extra circuitry in the Power4 design. There's some stuff on the web about the Power4 / G5 advantage.
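For anyone who hasn't run into denormals before, here is a minimal sketch of what they are and what the truncate-to-zero work-around does. It assumes IEEE 754 doubles (which Python floats are on common platforms); the helper names are made up for illustration, and it does not try to reproduce the slowdown itself, which depends on the CPU and on compiled code.

```python
import sys

# Smallest positive *normalized* double; anything nonzero below it in
# magnitude is a denormal (subnormal) value.
SMALLEST_NORMAL = sys.float_info.min


def is_denormal(x: float) -> bool:
    """True for nonzero values smaller in magnitude than the smallest normal."""
    return x != 0.0 and abs(x) < SMALLEST_NORMAL


def flush_to_zero(x: float) -> float:
    """Software stand-in for a flush-to-zero mode (not IEEE 754 behaviour)."""
    return 0.0 if is_denormal(x) else x


def steps_until_denormal(x: float, factor: float = 0.5) -> int:
    """Count multiplications by `factor` until x first becomes denormal."""
    steps = 0
    while x != 0.0 and not is_denormal(x):
        x *= factor
        steps += 1
    return steps


if __name__ == "__main__":
    # A quantity that keeps shrinking (like a residual near convergence)
    # eventually drops into the denormal range.
    print(steps_until_denormal(1.0))
    print(is_denormal(5e-324), flush_to_zero(5e-324))
```

The point of the sketch is only that a steadily shrinking value lands in the denormal range on its own, and that flushing replaces a tiny nonzero result with exactly zero, which is where the accuracy concern comes from.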

Frankly, if you are serious about *general* numerical computation in open areas of research where you want reliable results, the G5 is much stronger in my view, precisely because of the denormal issue (anecdotally, I also noticed a 50% speed-up for the G5 on some partial differential equation codes that didn't involve denormals). Historically, truncate-to-zero was common practice until the very thoughtful designers of IEEE754 showed the inaccuracies it introduces, and IEEE754 standardized denormals. However, no personal computer architecture before the G5 dealt with this in hardware. It's quite sad no PC will have denormal support for a while after the G5 is gone, since there is little consumer incentive for it, forcing serious computational scientists back to Power5 at higher prices.

Basically G5 provides workstation-level numerics to a personal computer. The newer G5's also support ECC memory so you don't have to reboot to flush out the weekly bit errors that occur in non-ECC memory.

That said, for smash-and-grab computations where you know already what you're doing, x86 machines can be faster.
     
Posting Junkie
Join Date: Oct 2005
Location: Houston, TX
Status: Offline
Feb 21, 2006, 09:04 PM
 
Originally Posted by BurpetheadX
Super fast bus speed
Show me where it matters.

Originally Posted by BurpetheadX
Application performance is only one aspect of computing. G5s excel in farms - whether they be for rendering, medical, compiling, or any type of distributed computing (like SETI). This completely comes down to processor / FPU performance. The Virginia Tech cluster demonstrates this.
Server farms are just another application, and they're not all FPU bound.
VT's cluster shows nothing. If the G5 is so hot, what is its share of the top 500 supercomputers?

3.4%. But judging desktop CPUs by how they are used in supercomputers seems rather silly. Why isn't Apple making a PPC440 Mac? Because that wouldn't make any sense.
     
Dedicated MacNNer
Join Date: Apr 2003
Status: Offline
Feb 22, 2006, 01:11 PM
 
Originally Posted by thejam
It's quite sad no PC will have denormal support for a while after the G5 is gone, since there is little consumer incentive for it, forcing serious computational scientists back to Power5 at higher prices.

Basically G5 provides workstation-level numerics to a personal computer. The newer G5's also support ECC memory so you don't have to reboot to flush out the weekly bit errors that occur in non-ECC memory.

That said, for smash-and-grab computations where you know already what you're doing, x86 machines can be faster.
I agree 100% - the G5 has some great features. I'm involved with a bunch of scientists and have performed code profiling and benchmarking to advise on which gives them best bang for buck. Besides the fact that they like the nice, friendly interface of OS X, the applications they spend ~75% of the cpu hours running have been optimized for Altivec and give speed-ups of between 10x and 30x compared with the (few) Opterons they have. Curiously, the same apps have never been optimized for any variety of SSE, but are very widely used in their niche community on white-box Intel/AMD Linux boxes.

One app that I wrote myself does a large number of geometric calculations, and the hardware square root on the G5 means that it flies....
     
   
All contents of these forums © 1995-2011 MacNN. All rights reserved.