Confused About Tiger's 64-Bit Implementation
Dedicated MacNNer
Join Date: Jun 2002
Status:
Offline
|
|
"Tiger provides 64-bit memory addressing for applications that need it most, while continuing to support 32-bit applications. In fact, many 32-bit applications will run more efficiently under Tiger, thanks to over-all system improvements and fine-tuning."
Will Tiger only have 64-Bit memory addressing? With no 64-bit processor addressing? Or is there a difference? I would think that 64-bit processor addressing is where real speed would come from...
|
|
|
| |
|
|
|
 |
|
 |
|
Addicted to MacNN
Join Date: Aug 2004
Location: FFM
Status:
Offline
|
|
No real speed comes from 64 bit. It just means that an application can address more memory than a 32 bit application. If an application does not need more than 4 GB of RAM (which most don't) it won't benefit from going 64 bit.
I don't know what you mean by "64 bit processor addressing".
|
|
|
| |
|
|
|
 |
|
 |
|
Dedicated MacNNer
Join Date: Jun 2002
Status:
Offline
|
|
Originally posted by TETENAL:
No real speed comes from 64 bit. It just means that an application can address more memory than a 32 bit application. If an application does not need more than 4 GB of RAM (which most don't) it won't benefit from going 64 bit.
I don't know what you mean by "64 bit processor addressing".
I always thought that a 64-bit processor could process 64-bit chunks of data, as opposed to the usual 32-bit. Meaning that a 64-bit processor could theoretically be twice as fast as a 32-bit processor.
So 64-bit just means that it can address more memory? Guh?
|
|
|
| |
|
|
|
 |
|
 |
|
Professional Poster
Join Date: Nov 2000
Location: Tasmania, Australia
Status:
Offline
|
|
Originally posted by BurpetheadX:
I always thought that a 64-bit processor could process 64-bit chunks of data, as opposed to the usual 32-bit. Meaning that a 64-bit processor could theoretically be twice as fast as a 32-bit processor.
So 64-bit just means that it can address more memory? Guh?
Processing 64-bit words isn't faster than processing 32-bit words in most cases. The exception is mathematical/scientific applications, which otherwise have to jump through hoops to emulate words larger than the processor provides. For most of us it will make no difference.
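To make the "jump through hoops" part concrete, here is a minimal sketch of adding two 64-bit numbers when only 32-bit words are available. The function name and the (high word, low word) layout are made up for illustration; real compilers and math libraries do the equivalent with add-with-carry instructions.

```python
MASK32 = 0xFFFFFFFF  # keeps a value inside one 32-bit word


def add64_via_32(a_hi: int, a_lo: int, b_hi: int, b_lo: int) -> tuple[int, int]:
    """Add two 64-bit values, each given as (high word, low word) 32-bit halves."""
    lo = a_lo + b_lo
    carry = lo >> 32                     # 1 if the low words overflowed, else 0
    lo &= MASK32                         # keep only the low 32 bits
    hi = (a_hi + b_hi + carry) & MASK32  # high words plus the carry, wrapped to 32 bits
    return hi, lo


# 0x00000001_FFFFFFFF + 0x00000002_00000001
hi, lo = add64_via_32(0x1, 0xFFFFFFFF, 0x2, 0x00000001)
print(f"{hi:08X}_{lo:08X}")
```

A 32-bit chip needs two dependent adds plus carry handling for this, while a 64-bit chip does it in a single instruction. That only matters if the code actually works with numbers that large.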
|
|
|
| |
|
|
|
 |
|
 |
|
Professional Poster
Join Date: Apr 2001
Location: Long Beach, CA
Status:
Offline
|
|
Originally posted by BurpetheadX:
I always thought that a 64-bit processor could process 64-bit chunks of data, as opposed to the usual 32-bit. Meaning that a 64-bit processor could theoretically be twice as fast as a 32-bit processor.
So 64-bit just means that it can address more memory? Guh?
The misconceptions come from the jumps from 8 bit to 16 bit and from 16 bit to 32 bit. With 8 bits, you can represent positive integers from 0 to 255. Clearly, this isn't enough. Programmers had to come up with ways to represent significantly larger numbers using multiple 8 bit registers. So, going up to 16 bits (about 64 thousand values) made this easier. However, 64 thousand still isn't big enough for the vast majority of integer calculations. Therefore, the jump to 32 bits meant that we could represent integers up to about 4 billion. This is plenty for the vast majority of integer math, so going to 64 bits doesn't help the code that already works just fine in 32 bits. That is why 64 bits does not bring the same speed boost that 32 bits did.
That's it in a nutshell.
The market had to go to 64 bits because we finally hit the 4GB memory limit of 32 bit processors. The market didn't change because they could finally fit more bits in a register.
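For anyone who wants to check the numbers, here is a small sketch. The helper names are invented for this example; it prints the largest unsigned value for each register width and the size of a flat address space with 32-bit and 64-bit addresses.

```python
def max_unsigned(bits: int) -> int:
    """Largest value an unsigned integer of the given width can hold."""
    return (1 << bits) - 1


def addressable_bytes(address_bits: int) -> int:
    """Bytes reachable with a flat, byte-addressed space of this width."""
    return 1 << address_bits


for bits in (8, 16, 32, 64):
    print(f"{bits:2d}-bit register: 0 to {max_unsigned(bits):,}")

print("32-bit addresses:", addressable_bytes(32) // 2**30, "GiB")
print("64-bit addresses:", addressable_bytes(64) // 2**60, "EiB")
```

The last two lines are the 4GB ceiling mentioned above and the theoretical limit of 64-bit addressing. Actual hardware supports far less physical memory than that.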
|

ACSA 10.4/10.3, ACTC 10.3, ACHDS 10.3
|
| |
|
|
|
 |
|
 |
|
Mac Elite
Join Date: Oct 1999
Location: San Jose, Ca
Status:
Offline
|
|
To expand on what Detrius said, the "bitness" of a processor is a confusing thing. It generally refers to the address space the processor works with, but some 32 bit processors have 64bit floating point units (the parts that deal with decimal numbers), and some 32bit chips (like the G4) actually have a 36 bit address space (there are some wonky instructions in there... rarely used... but there nonetheless).
And the data pathways in the G4 are mostly 128bit, with some 256bit paths for the Altivec units (which are 128bit units).
Things get even weirder if you start to go back to the 286... it had a 16bit memory space, but a 32bit integer unit, but was marketed as a 32bit chip...
In any event, the G5 is a fully 64bit chip (not counting the pathways... or the Altivec-like units), with a 64bit address space, 64bit floating point, 64bit registers, 64bit integer, etc... 64bitness will not make it faster in 90% of consumer apps, but will make real differences in scientific and high end database work. There are plenty of other improvements, both in the G5 and in 10.4, besides the bitness to more than make up for that though... it is just that many people will grasp at the number without bothering to understand what it means...
|
|
|
| |
|
|
|
 |
|
 |
|
Senior User
Join Date: Dec 2002
Location: Portland, OR
Status:
Offline
|
|
detrius and larkost,
Thanks. Was wondering about that myself 
|
|
iMac - C2D, 2.8Ghz, 4GB, 320GB
MacBook - C2D, 2.4Ghz Uni, 4GB, 500GB
|
| |
|
|
|
 |
|
 |
|
Dedicated MacNNer
Join Date: Dec 2001
Location: Promised Land
Status:
Offline
|
|
All things being equal, a 64b CPU will actually be a little slower than an equiv 32b CPU. 64b is twice the size of 32b, which means that twice the amount of data has to be passed to/from the CPU when running a 64b app. Each ptr to a memory location in said app will be 64b, so every time a memory address is resolved, you have to send double the data. This also increases the footprint in the L2 instruction and TLB caches, so not as many addresses can be cached (which means locality of reference is hampered).
The reason the G5 (and AMD) systems are faster than their 32b cousins, is because they have faster CPU clocks, faster system bus, and faster board controllers. This overshadows the performance lost when dealing with 64bit ptrs.
As noted already, the big gain with 64b is being able to address 16EB of memory.
BTW, this also applies to 64b apps, they will generally be slower than 32b counterparts (excepting scientific apps that deal with very large numbers or very large data sets).
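A rough sketch of the pointer footprint argument, under stated assumptions: a hypothetical linked-list node holding a 4-byte payload plus one "next" pointer, padded to pointer alignment the way a typical C compiler would, and a 128-byte cache line picked purely for illustration.

```python
def node_size(pointer_bytes: int, payload_bytes: int = 4) -> int:
    """Size of a hypothetical list node: payload plus one pointer,
    rounded up to a multiple of the pointer size (typical C padding)."""
    raw = payload_bytes + pointer_bytes
    return (raw + pointer_bytes - 1) // pointer_bytes * pointer_bytes


def nodes_per_line(pointer_bytes: int, line_bytes: int = 128) -> int:
    """How many such nodes fit in one cache line of the assumed size."""
    return line_bytes // node_size(pointer_bytes)


for label, ptr in (("32-bit", 4), ("64-bit", 8)):
    print(f"{label} pointers: node is {node_size(ptr)} bytes, "
          f"{nodes_per_line(ptr)} nodes per cache line")
```

Same data, but the 64-bit build fits half as many nodes in each cache line. Pointer-heavy code feels this most; code that mostly streams through arrays of numbers barely notices.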
|
|
G5 2.5 DP/2GB RAM/NVidia 6800 Ultra
PowerBook Al 1Ghz/768MB RAM
6gb Blue iPod Mini
|
| |
|
|
|
 |
|
 |
|
Professional Poster
Join Date: Nov 2000
Location: Tasmania, Australia
Status:
Offline
|
|
Originally posted by someone_else:
All things being equal, a 64b CPU will actually be a little slower than an equiv 32b CPU. 64b is twice the size of 32b, which means that twice the amount of data has to be passed to/from the CPU when running a 64b app. Each ptr to a memory location in said app will be 64b, so every time a memory address is resolved, you have to send double the data. This also increases the footprint in the L2 instruction and TLB caches, so not as many addresses can be cached (which means locality of reference is hampered).
Is data passed between RAM and CPU serially or in parallel? I've always been a bit confused by this.
|
|
|
| |
|
|
|
 |
|
 |
|
Addicted to MacNN
Join Date: May 2001
Location: Cupertino, CA
Status:
Offline
|
|
Originally posted by someone_else:
All things being equal, a 64b CPU will actually be a little slower than an equiv 32b CPU. 64b is twice the size of 32b, which means that twice the amount of data has to be passed to/from the CPU when running a 64b app. Each ptr to a memory location in said app will be 64b, so every time a memory address is resolved, you have to send double the data. This also increases the footprint in the L2 instruction and TLB caches, so not as many addresses can be cached (which means locality of reference is hampered).
Well if we're going to be picky, all things being equal I'd imagine it's the system that's slower, not the CPU.
|
|
|
| |
|
|
|
 |
|
 |
|
Mac Elite
Join Date: Oct 1999
Location: San Jose, Ca
Status:
Offline
|
|
someone_else: You are correct... as long as the data pipes remain the same... which they never have. The PowerMac G5 interleaves memory access, so that you can get 64bits at a time (no word yet on the iMac G5), and all of the pipes on the chip are more than beefy enough to saturate the different processing units, so they are going to be kept full (from that aspect at least).
Brass: CPUs mostly get their data over parallel connections, but there is a lot of serial technology to try and solve the timing issues inherent in parallel busses. A lot of the newer motherboard busses (PCI-X, HyperTransport) are similar hybrids that use something like parallel busses of serial connections... In other words, things are getting messy again...
|
|
|
| |
|
|
|
 |
|
 |
|
Dedicated MacNNer
Join Date: Dec 2001
Location: Promised Land
Status:
Offline
|
|
Originally posted by larkost:
someone_else: You are correct... as long as the data pipes remain the same... which they never have. The PowerMac G5 interleaves memory access, so that you can get 64bits at a time (no word yet on the iMac G5), and all of the pipes on the chip are more than beefy enough to saturate the different processing units, so they are going to be kept full (from that aspect at least).
Right, but with a 32b app, you could fetch two addresses at once, so with 64b ptrs, your throughput is halved (theoretically of course).
Brass: CPUs mostly get their data over parallel connections, but there is a lot of serial technology to try and solve the timing issues inherent in parallel busses. A lot of the newer motherboard busses (PCI-X, HyperTransport) are similar hybrids that use something like parallel busses of serial connections... In other words, things are getting messy again...
Ugh. Maybe not "nasty", but confusing. HyperTransport is a point-to-point bus, which I would take to mean a serial packet type protocol (don't know the specifics though). But the pipes into the processor are parallel. Like you said it's becoming a mess.
Originally posted by itali195:
Well if we're going to be picky, all things being equal I'd imagine it's the system that's slower, not the CPU.
You missed my point. 64 bits is not automatically twice as fast as 32 bits. If you took a G4 system, and replaced it with a (theoretically) identical G4 that used 64b words instead of 32b -- it would be slower than the 32b version (assuming you then ran 64b apps). Fortunately, the system around the CPU (and the CPU internals themselves) advance to overcome the speed hit.
(Last edited by someone_else; Sep 2, 2004 at 03:43 PM.)
|
|
G5 2.5 DP/2GB RAM/NVidia 6800 Ultra
PowerBook Al 1Ghz/768MB RAM
6gb Blue iPod Mini
|
| |
|
|
|
 |
|
 |
|
Mac Elite
Join Date: May 2001
Location: ~/
Status:
Offline
|
|
Not only is the OP confused but so are some of the replies in this thread. The 64-bit implementation of Tiger doesn't have anything to do with "64-bit processor addressing" which isn't even a real thing. What Tiger has is system libraries that are compiled in both 32-bit and 64-bit editions. In Panther it is possible for an application written for the G5 to address more than 4GB of memory.
Unfortunately if you're using a system framework and call a class such as NSImage you're limited to a 4GB image because the system framework only supports 32-bit memory addressing. In Tiger it will be possible to use an NSImage that is up to 4TB. Not that you could practically do that, but as far as memory limitations are concerned you could. Right now the G5 is limited by the 32-bit nature of the system frameworks, so any über-memory applications need to use their own libraries, all compiled for the G5.
The only time the G5's 64-bit wide integer registers make processing faster is when the code is processing 64-bit integers. The standard C integer (int) is 32 bits wide, meaning it has 32 ones and zeroes with which to represent a value. An unsigned integer can represent up to about 4 billion. A 64-bit integer (long long in C, or plain long in a 64-bit build) has 64 ones and zeroes to represent a value. An unsigned 64-bit integer can represent a number larger than I want to type. Most of the time only normal 32-bit integers are needed for math, so that is what is used. Ergo a G5 is going to run normal code no faster being a 64-bit chip than it would as a 32-bit chip.
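If you want to see what your own machine's C compiler conventions look like, here is a quick sketch using Python's ctypes module, which reports the sizes of the platform's native C types. The dictionary and helper names are just for this example, and the output depends on the platform and on whether the build is 32-bit or 64-bit.

```python
import ctypes

# Native C types as seen by this Python build's platform ABI.
C_TYPES = {
    "int": ctypes.c_int,
    "long": ctypes.c_long,
    "long long": ctypes.c_longlong,
    "void *": ctypes.c_void_p,
}


def width_bits(ctype) -> int:
    """Width of a C type in bits on the current platform."""
    return ctypes.sizeof(ctype) * 8


def unsigned_max(bits: int) -> int:
    """Largest unsigned value representable in the given number of bits."""
    return (1 << bits) - 1


for name, ctype in C_TYPES.items():
    bits = width_bits(ctype)
    print(f"{name:9s} {bits:2d} bits, unsigned max {unsigned_max(bits):,}")
```

On a 32-bit build, long and void * typically come out at 32 bits, while on a 64-bit Unix-style build they are 64. In both cases int stays at 32, which is why ordinary integer code gains nothing from the wider registers.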
What makes the G5 exceptionally quick is the number of pipelines it has. Since I'm a nice guy I threw together these diagrams to help people out:
[Figure: G5 pipeline diagram]
[Figure: G4 pipeline diagram]
As you can see the G5 has an extra floating point (FP) pipeline above and beyond what the G4 has available. Floating point math is where the G4 really stumbled performance wise. If a 1.25GHz G4 had two FP pipelines instead of one it would be able to run FP intensive code about twice as fast as a normal G4. FP math is used extensively in 3D rendering, physics calculations in games/simulations, high dynamic range image processing, and even high fidelity audio processing. A G5 clocked similarly to a G4 would perform FP math about twice as fast simply by virtue of the extra FP pipeline.
The G5 also doesn't get slowed down much by using larger memory pointers (64 bit wide pointers) since its frontside bus is so fast. The G5's load/store unit is also well optimized to group as many operations together as possible to mitigate any throughput problems because of the larger memory pointers. The interleaved system memory also helps out a great deal when it comes to maximizing the available bandwidth. Despite what some people will try to tell you the G5 is a well designed processor and extremely powerful despite a relatively low clock speed.
Panther's kernel allows the G5s to use as much memory as they can handle, but the system frameworks are the capacity bottleneck right now. Tiger will let developers use all of the standard system frameworks but support 4GB+ of memory usage. This will be a big boon to the scientific and engineering crowd that are giving the PowerMacs serious consideration to replace their Sun and SGI workstations. It will also help out the media crowd that would love to load a PowerMac down with 16GB of RAM to use with their editing suites.
|
|
|
| |
|
|
|
 |
|
 |
|
Addicted to MacNN
Join Date: May 2001
Location: Cupertino, CA
Status:
Offline
|
|
edit: nm, Graymalkin said it already
|
|
|
| |
|
|
|
 |
|
 |
|
Mac Elite
Join Date: Jan 2000
Location: Seattle, WA, King
Status:
Offline
|
|
A few other notes: the G5's system bus scales with the CPU frequency. So with a 2x multiplier, the bus is 1GHz for the dual 2GHz G5s. For the dual 2.5s, the bus is 1.25GHz. The new iMacs use a 3x multiplier, so the bus speed is relatively slower than it is on the PowerMacs.
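The arithmetic, as a quick sketch: the "multiplier" here is the ratio of CPU clock to bus clock, so the bus speed is the CPU clock divided by it. The function name is made up, and the 1800 MHz iMac clock is an assumed figure for illustration only.

```python
def bus_mhz(cpu_mhz: int, ratio: int) -> float:
    """Front-side bus clock for a given CPU clock and CPU-to-bus ratio."""
    return cpu_mhz / ratio


machines = [
    ("dual 2GHz PowerMac", 2000, 2),
    ("dual 2.5GHz PowerMac", 2500, 2),
    ("iMac (assumed 1800 MHz CPU)", 1800, 3),  # hypothetical clock for illustration
]

for name, cpu, ratio in machines:
    print(f"{name}: {cpu} MHz / {ratio} = {bus_mhz(cpu, ratio):.0f} MHz bus")
```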
Also, you'll notice that while the G4 had essentially three integer pipelines, the G5 has only two.
Now for my own question: why are 64-bit apps in Tiger restricted to the command line?
|
|
|
| |
|
|
|
 |
|
 |
|
Professional Poster
Join Date: Apr 2001
Location: Long Beach, CA
Status:
Offline
|
|
Originally posted by bmedina:
Now for my own question: why are 64-bit apps in Tiger restricted to the command line?
If this is true, it means that the BSD layer has been updated, but the Carbon/Cocoa frameworks have not yet been updated.
|

ACSA 10.4/10.3, ACTC 10.3, ACHDS 10.3
|
| |
|
|
|
 |
|
 |
|
Professional Poster
Join Date: Apr 2001
Location: Long Beach, CA
Status:
Offline
|
|
Originally posted by larkost:
someone_else: You are correct... as long as the data pipes remain the same... which they never have. The PowerMac G5 interleaves memory access, so that you can get 64bits at a time (no word yet on the iMac G5), and all of the pipes on the chip are more than beefy enough to saturate the different processing units, so they are going to be kept full (from that aspect at least).
From what I have seen, the iMac G5 does NOT interleave memory; at least it isn't required. You can tell this by the fact that you don't have to buy memory in pairs at the Apple store when you BTO. I suspect this is related to the FSB being slower.
|

ACSA 10.4/10.3, ACTC 10.3, ACHDS 10.3
|
| |