The upcoming Gulftown/Westmere Mac Pro (Page 4)
mduell
Posting Junkie
Join Date: Oct 2005
Location: Houston, TX
Jun 30, 2010, 12:48 PM
 
Originally Posted by pixelmason View Post
how would one avoid using all 4 [or 6,8,12,24] cores? it's called grand central dispatch, new feature in OS X.6. there are plenty of uses for more cores and faster graphics... there will always be a need for more. particularly when you isolate the creative field. this should never be a question. more!
GCD makes the programming marginally easier, but it doesn't let you max out an arbitrary number of cores. Some algorithms just don't scale that well.
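Amdahl's law puts a ceiling on this: if a fraction p of the work can run in parallel, n cores can never give more than 1/((1-p) + p/n) speedup. A back-of-the-envelope sketch in C (the 90% figure is just an illustrative assumption, not a measurement of any real app):

[code]
#include <stdio.h>

/* Amdahl's law: p = parallel fraction of the work, n = core count. */
static double speedup(double p, int n) {
    return 1.0 / ((1.0 - p) + p / n);
}

int main(void) {
    /* Even with 90% of the code parallel, 24 cores only buy about 7x. */
    int cores[] = { 2, 4, 6, 12, 24 };
    for (int i = 0; i < 5; i++)
        printf("p=0.90, n=%2d -> %.2fx\n", cores[i], speedup(0.90, cores[i]));
    return 0;
}
[/code]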

Originally Posted by pixelmason View Post
Why would I:

buy an iMac27 i7 that has already maxed out the potential of the hardware, meaning an upgrade consists of ebaying the old machine and buying a new one.

buy the base model MP8core2.26, which is way overpriced [even on an absolute scale compared with other macpros]. as the price of the now ever-so-pricey x5580 drops to the sub-1k range, this would yield an opportunity to increase the computing power by nearly 33%.
I guess you haven't been following Intel's chip pricing/upgrade scheme.
     
SierraDragon
Mac Elite
Join Date: Mar 2004
Location: Truckee, CA
Jun 30, 2010, 01:21 PM
 
Originally Posted by P View Post
I think ifixit.com sells a kit where you put an SSD in the optical slot.
Perfect idea; that makes an iMac into an entirely different beast. An iMac with an SSD and a 2 TB HD is a great solution for folks who want good-value 4-core power and can tolerate glossy (and many can, in a properly configured desktop workspace). Now if they would just make a matte display available...

As long as I had to open up the iMac anyway (not fun), I would consider moving the stock HD to an external case and putting a fast third-party internal HD in along with the SSD. The added heat from a fast third-party drive in the compromised iMac form factor might be an issue, though.

I do think we probably should not draw comparative performance rankings between generally non-upgradable boxes like iMacs and stock versions of machines specifically intended to be upgraded, like MPs.
     
pixelmason
Registered User
Join Date: Jun 2010
Jul 1, 2010, 03:48 AM
 
Originally Posted by P View Post
This is the type of post I probably shouldn't even answer, but:

1) External does not equal internal. External drives are connected to the outside of the computer using USB, Firewire or eSATA. Both the iMac and the MP have USB and Firewire; neither has eSATA. Internally is a different story; there you need an MP.

The usage patterns are quite different. If you want to archive stuff, or just add storage capacity, or transfer files between computers, you use external drives. If you need scratch disks and access latency is critical, you use internal drives or possibly eSATA. If you specify external, that means the first use case, where an iMac is just fine.

2) The Intel CPU you linked to is an i7-980X. That CPU is not used in any shipping MP, nor is it rumored to ever be included. If you meant the hexacore Xeons, they are called Xeon 5600 and 3600, here. Note that the language is a little less over the top. The actual CPUs in the MP today, the ones you can buy, are the Xeon 3500 and Xeon 5500. Those are a lot less impressive compared to the i5-750 and i7-860 in the iMac.

3) Grand Central Dispatch is a tool for programmers to write multithreaded code more easily. It is not a magic tool that makes all old code multithreaded. We'll see how much it is used in the future.

Again: how useful those extra cores are - or might be, if/when Apple gets around to shipping them - depends on the workload. In much the same manner, the other advantages of the MP might be of more or less use to you. The point is that for most of Apple's history, the top Powermac/Mac Pro was faster than the consumer Macs at everything. That is no longer the case. If you go totally eighties in your benchmarking and run a single thread on all shipping non-BTO Macs, the fastest machine will be the Core i5 iMac. That is, as far as I know, unique in Apple's history. If you permit BTO models and include up to 4 cores, the Core i7 iMac will be essentially tied with the top quad MP.
glad you did though, and I really agree with most of this. i think the thing in number one is that for very little you can install eSATA in a MP and it is much faster than fw800. really not a reason to buy one for many people though.
on number two. i know nothing ships with these processors. it is the subject of the thread though... as for three, well i know little about GCD as i do not code apple apps. my impression though was that it 'manages' the threading of apps. i often have apps running simultaneously. how does running multiple apps tie into core usage? does the os 'hand out' threads on a per-app basis? if so, couldn't more cores become more useful?

it is a sad/depressing story you tell though. i mean, all these cores and nothing knows how to take advantage of them, not even the os? i don't expect to see a mac pro with a 9GHz processor anytime soon, but one with 24 threads @ 2.4GHz maybe. What use are the extra cores if nothing uses them?
     
pixelmason
Registered User
Join Date: Jun 2010
Jul 1, 2010, 04:00 AM
 
Originally Posted by mduell View Post
GCD makes the programming marginally easier, but it doesn't let you max out an arbitrary number of cores. Some algorithms just don't scale that well.



I guess you haven't been following Intel's chip pricing/upgrade scheme.
re: my previous post about GCD. but this discussion about cores isn't aimed at this program or that program, but all of them. not some legacy software that isn't being updated, but the latest version that came out yesterday. and running at the same time as 4 or 5 other apps. big monster apps that hoard ram and processing power. this is the scenario.

i follow intel's roadmap and pricing quite closely, yes. the lga1366 socket is quite scalable right now, and two of them even more so. damned if those qpi's aren't pricey in pairs though. the only true deficit of the i7 family...
     
P
Moderator
Join Date: Apr 2000
Location: Gothenburg, Sweden
Jul 1, 2010, 07:00 AM
 
Originally Posted by pixelmason View Post
as for three, well i know little about GCD as i do not code apple apps. my impression though was that it 'manages' the threading of apps. i often have apps running simultaneously. how does running multiple apps tie into core usage? does the os 'hand out' threads on a per-app basis? if so, couldn't more cores become more useful?
Somewhat simplified: Each process gets a reserved memory area and one execution thread to operate on data in that memory or communicate with the OS to get more data somehow. That thread can then spawn more threads - which operate in the same memory area - or more processes - which operate in their own memory area. The operating system kernel is responsible for assigning a thread to a core according to its priorities, and for making sure that the memory area the process gets is backed by either real or virtual memory.

The way a modern operating system works, the process usually doesn't know or care how many cores there are, when the threads are actually executing, whether the memory area that it has been promised actually exists, etc. It relies on the OS to supply these things as needed.

Any Mac OS X session has lots of threads running at any time, but that doesn't mean they're executing more than a tiny sliver of the time. The threads that really need a significant amount of processing power are very few. This is because the traditional way of programming is to have one big thread that executes things in one order, and APIs and methods have developed to support that. Most programs have one "main" thread that everything executes on, and then push certain tasks onto other threads. The tricky bit for the programmer is that the order in which things are executed is now completely unknown. That makes threading hard.

What GCD is, essentially, is a set of programming tools that makes this job of pushing tasks onto other threads a little easier. If it is used, it helps; if it isn't used, it does nothing.
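To make that concrete, here is a minimal sketch of the "push a task onto another thread" pattern in plain C with blocks and libdispatch (build with clang on Mac OS X; the loop inside the block is just a stand-in for real work):

[code]
/* cc gcd_async.c  (blocks and libdispatch are part of Mac OS X 10.6) */
#include <dispatch/dispatch.h>
#include <stdio.h>

int main(void) {
    dispatch_group_t group = dispatch_group_create();
    dispatch_queue_t q =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    /* The "pushed off" task: GCD decides which worker thread and
       which core actually run it. */
    dispatch_group_async(group, q, ^{
        long sum = 0;
        for (long i = 0; i < 10000000; i++) sum += i;
        printf("background task done, sum = %ld\n", sum);
    });

    /* The main thread is not blocked while the task runs... */
    printf("main thread keeps going\n");

    /* ...and only waits when it actually needs the result. */
    dispatch_group_wait(group, DISPATCH_TIME_FOREVER);
    dispatch_release(group);
    return 0;
}
[/code]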

Originally Posted by pixelmason View Post
it's is a sad/depressing story you tell though. i mean all these cores and nothing knows how to take advantage of them, not even the os? i don't expect to see a mac pro with a 9GHz processor anytime soon, but one with 24 threads @ 2.4GHz maybe. What use are the extra cores if nothing uses them?
Three reasons:

1) Some tasks can easily be threaded: database access (each query is one thread), 3D rendering (each pixel can be rendered separately), compression (working on different subsets of data), etc. A minimal sketch of that kind of per-pixel parallelism follows at the end of this post.

2) There is no other way to substantially increase performance, and Intel, AMD, IBM etc. like to keep selling CPUs. Intel in particular tried REALLY hard to keep boosting single-threaded performance. That died with the Prescott flameout in 2004.

3) There is nothing else to spend the transistors on. It is economical to produce CPUs with an area between 150 and 300 mm2. Less than 150, and you can add performance by integrating more, and the competition will do that and outrun you. More than 300, and defects will decrease yield too much. The dual-core Core i3/i5 CPUs without the memory controller are really too small at 81 mm2, but that's OK because they're the first CPUs on the 32nm process, and early runs of a process have more defects than usual.

All of these things mean that future CPUs will have at least 2 cores and integrate as much as possible from the motherboard onto the chip for lower latency. Anything above the lowest tier of bargain basement will have 4 cores. Servers in particular will get ever-increasing core counts - Beckton is already at 8, AMD has one at 12. The only way to increase performance is to make use of those cores. Hopefully Adobe et al have that as their priority.
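And the sketch promised under item 1: an embarrassingly parallel per-pixel job handed to GCD with dispatch_apply, which spreads the iterations across however many cores the machine has. The code never asks how many cores there are, and the pixel math is just a placeholder:

[code]
#include <dispatch/dispatch.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

enum { WIDTH = 1920, HEIGHT = 1080 };

int main(void) {
    uint8_t *pixels = malloc((size_t)WIDTH * HEIGHT);
    dispatch_queue_t q =
        dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0);

    /* Each row is independent of every other row, so GCD is free to
       run as many of them concurrently as there are cores. */
    dispatch_apply(HEIGHT, q, ^(size_t row) {
        for (size_t x = 0; x < WIDTH; x++) {
            /* stand-in for "render this pixel" */
            pixels[row * WIDTH + x] = (uint8_t)((row + x) & 0xFF);
        }
    });

    printf("done, pixel(0,0) = %d\n", pixels[0]);
    free(pixels);
    return 0;
}
[/code]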
( Last edited by P; Jul 1, 2010 at 07:07 AM. )
The new Mac Pro has up to 30 MB of cache inside the processor itself. That's more than the HD in my first Mac. Somehow I'm still running out of space.
     
SierraDragon
Mac Elite
Join Date: Mar 2004
Location: Truckee, CA
Jul 1, 2010, 11:10 AM
 
Thanks P for that analysis.

I would love to read a similar analysis of how the GPU is integrated into the process.

-Allen
     
pixelmason
Registered User
Join Date: Jun 2010
Jul 1, 2010, 01:46 PM
 
yes, this is exactly the education i need right now. i am an architecture design student and have been accepted into professional school. we are now allowed/required to use computers for our design work [after the first two years of analogue studios]. all this info will help me maximize my ever-diminishing dollars. it is an exciting time for processors though. i remember when two single-core xeons were a big deal [@ 1.33 GHz]. Thanks P. and everyone else!
     
P
Moderator
Join Date: Apr 2000
Location: Gothenburg, Sweden
Jul 1, 2010, 04:53 PM
 
Originally Posted by SierraDragon View Post
I would love to read a similar analysis of how the GPU is integrated into the process.
I know less about how the GPU works, because I find it less interesting. 2D graphics I know decently well: each application draws to memory - 4 bytes (with alpha channel) per pixel. This is then copied to the windowserver, which creates a composite of what all the windows contain. Quartz Extreme means that the GPU accelerates this compositing stage. The windows can contain 2D or 3D graphics. 3D graphics is accelerated using OpenGL - more about this below. 2D graphics is drawn using either Quickdraw or QuartzGL. Both of these can be accelerated by the graphics card hardware - basically, an operation like "Draw rectangle" with a color, starting point and size is sent directly to the GPU instead of having the CPU interpret what it would mean and telling the GPU to change certain pixels.
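As a toy illustration of that compositing step (this is the idea only, not Apple's code): each window is a buffer of 4-byte pixels, and the compositor blends a window "over" whatever is already on screen using the window's alpha channel:

[code]
#include <stdint.h>
#include <stdio.h>

/* Blend one source channel over one destination channel. */
static uint8_t over(uint8_t src, uint8_t dst, uint8_t alpha) {
    return (uint8_t)((src * alpha + dst * (255 - alpha)) / 255);
}

int main(void) {
    /* One pixel of a half-transparent red window over a white desktop. */
    uint8_t win[4] = { 255, 0, 0, 128 };      /* R, G, B, A */
    uint8_t scr[4] = { 255, 255, 255, 255 };

    for (int c = 0; c < 3; c++)
        scr[c] = over(win[c], scr[c], win[3]);

    printf("composited pixel: R=%d G=%d B=%d\n", scr[0], scr[1], scr[2]);
    return 0;
}
[/code]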

3D is where I'm missing pieces. In its simplest form, the so-called fixed-function pipeline, the graphics hardware goes through one pixel at a time and evaluates whether each and every object crosses that pixel (after being placed and rotated in the space). For each shape that crosses that pixel, the Z-value is evaluated, so only the object closest to the observer is drawn. For the closest object, a texture is loaded to determine what color the object has in that exact position. Once all objects have been evaluated, the process starts over with the next pixel. This operation is perfectly parallelizable, because you can evaluate any number of pixels at the same time.
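A toy software version of that loop (nothing like real GPU hardware, just the algorithm): for every pixel, test every object, and keep the color of whichever object is nearest the viewer. Since no pixel depends on any other, every pixel could be processed in parallel:

[code]
#include <stdio.h>
#include <float.h>

typedef struct { int x0, y0, x1, y1; float z; char color; } Rect;

enum { W = 8, H = 4 };

int main(void) {
    Rect objects[] = {
        { 0, 0, 6, 3, 5.0f, '#' },   /* far rectangle */
        { 3, 1, 8, 4, 2.0f, '*' },   /* near rectangle, overlapping it */
    };
    int nobj = 2;

    for (int y = 0; y < H; y++) {
        for (int x = 0; x < W; x++) {
            float zbuf = FLT_MAX;    /* nothing drawn at this pixel yet */
            char pixel = '.';
            for (int i = 0; i < nobj; i++) {
                Rect *o = &objects[i];
                int covers = x >= o->x0 && x < o->x1 &&
                             y >= o->y0 && y < o->y1;
                if (covers && o->z < zbuf) {   /* closer than what we have */
                    zbuf = o->z;
                    pixel = o->color;          /* "texture lookup" stand-in */
                }
            }
            putchar(pixel);
        }
        putchar('\n');
    }
    return 0;
}
[/code]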

Now comes the tricky part. We don't really use fixed-function pipelines anymore - we use so-called shader programs that generate the pixels on the fly in the GPU. These programs take the models and the textures as input and then work on them - light them up, distort them, etc. From the Geforce 8000/Radeon 2000 series on, the shaders are universal, so one processor does pixel shading, vertex shading and geometry shading. Each graphics card has a number of these, and knowing the number of shaders and how fast they run is a good way to estimate the power of a graphics card. Each shader is like a small, fairly limited, processor core.

For instance, the Radeon 4850 in the iMac Core i5 has 800 shaders. The Radeon 4670 in the lower-end models has 320 shaders, so it is less than half as powerful. You can't really compare ATi shaders to nVidia shaders, though - as a rule of thumb, you can multiply nVidia shaders by 5 to get a number comparable to ATi, but after that you have to consider clockspeed as well. nVidia tends to use fewer shaders but run them faster.

Another important factor to consider is memory bandwidth. Those textures I mentioned earlier have to be stored somewhere, and then brought to the shader hardware as fast as possible. That takes a lot of memory bandwidth. That bandwidth is determined by three things: the width of the memory channel, its clockspeed, and how many bits can be transferred per clock. That last is determined by the type of memory, but is usually baked into the clockspeed number. DDR3-1066 RAM, for instance, doesn't really run at 1066 MHz. It runs at a slower speed (which is also different in different parts of the chip, but anywhere from 133 to 533 MHz), but it manages 1066 million transfers per pin per second, and so it is called "1066" RAM. Integrated graphics chips usually share this memory bus with the CPU. That can be very limiting, partially because the CPU memory bus is very narrow compared to top graphics cards, but fast DDR3 RAM helps some.
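The arithmetic is simple enough to sketch. The two configurations below are illustrative guesses, not the specs of any particular Mac or graphics card:

[code]
#include <stdio.h>

/* bytes/s = (bus width in bits / 8) * transfers per second */
static double bandwidth_gbs(int bus_bits, double mega_transfers) {
    return (bus_bits / 8.0) * mega_transfers * 1e6 / 1e9;
}

int main(void) {
    /* e.g. DDR3-1066 on a 128-bit (dual-channel) CPU memory bus */
    printf("128-bit @ 1066 MT/s: %.1f GB/s\n", bandwidth_gbs(128, 1066));
    /* e.g. GDDR5 on a 256-bit graphics card bus at 4000 MT/s */
    printf("256-bit @ 4000 MT/s: %.1f GB/s\n", bandwidth_gbs(256, 4000));
    return 0;
}
[/code]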

The most interesting bit here is those shader processors. 800 cores is a lot, even if they're limited - shouldn't you be able to use them for something? You can, and that's where OpenCL comes in. OpenCL is a way to execute regular programs on those shader processors. GPUs aren't really set up for that yet - for instance, only a tenth of those Radeon shader processors can do double-precision math, which is the most common kind - but they're getting there. AMD is planning to use this by making their CPUs focus on integer math with lots of cores and then sending the floating-point math to the GPU - which will also be on the CPU die soon enough.
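For the curious, a minimal OpenCL host program looks something like the sketch below (OpenCL 1.0 C API as shipped in Snow Leopard; the kernel just scales an array, and error checking is omitted to keep it short):

[code]
/* Build on Mac OS X: cc opencl_demo.c -framework OpenCL */
#include <stdio.h>
#ifdef __APPLE__
#include <OpenCL/opencl.h>
#else
#include <CL/cl.h>
#endif

/* The kernel source: one shader-style work item per array element. */
static const char *src =
    "__kernel void scale(__global float *v, const float k) {\n"
    "    size_t i = get_global_id(0);\n"
    "    v[i] = v[i] * k;\n"
    "}\n";

int main(void) {
    enum { N = 1024 };
    float data[N];
    for (int i = 0; i < N; i++) data[i] = (float)i;

    cl_platform_id plat;
    cl_device_id dev;
    clGetPlatformIDs(1, &plat, NULL);
    /* CL_DEVICE_TYPE_GPU would insist on the graphics card. */
    clGetDeviceIDs(plat, CL_DEVICE_TYPE_DEFAULT, 1, &dev, NULL);

    cl_int err;
    cl_context ctx = clCreateContext(NULL, 1, &dev, NULL, NULL, &err);
    cl_command_queue q = clCreateCommandQueue(ctx, dev, 0, &err);

    cl_program prog = clCreateProgramWithSource(ctx, 1, &src, NULL, &err);
    clBuildProgram(prog, 1, &dev, NULL, NULL, NULL);
    cl_kernel kern = clCreateKernel(prog, "scale", &err);

    cl_mem buf = clCreateBuffer(ctx, CL_MEM_READ_WRITE | CL_MEM_COPY_HOST_PTR,
                                sizeof(data), data, &err);
    float factor = 2.0f;
    clSetKernelArg(kern, 0, sizeof(cl_mem), &buf);
    clSetKernelArg(kern, 1, sizeof(float), &factor);

    /* One work item per element, spread across the shader processors. */
    size_t global = N;
    clEnqueueNDRangeKernel(q, kern, 1, NULL, &global, NULL, 0, NULL, NULL);
    clEnqueueReadBuffer(q, buf, CL_TRUE, 0, sizeof(data), data, 0, NULL, NULL);

    printf("data[10] = %f\n", data[10]); /* expect 20.0 */

    clReleaseMemObject(buf);
    clReleaseKernel(kern);
    clReleaseProgram(prog);
    clReleaseCommandQueue(q);
    clReleaseContext(ctx);
    return 0;
}
[/code]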
     
reader50
Administrator
Join Date: Jun 2000
Location: California
Jul 1, 2010, 05:50 PM
 
Originally Posted by P View Post
2D graphics I know decently well: each application draws to memory - 4 bytes (with alpha channel) per pixel. This is then copied to the windowserver, which creates a composite of what all the windows contain. Quartz Extreme means that the GPU accelerates this compositing stage. The windows can contain 2D or 3D graphics. 3D graphics is accelerated using OpenGL - more about this below. 2D graphics is drawn using either Quickdraw or QuartzGL. Both of these can be accelerated by the graphics card hardware - basically, an operation like "Draw rectangle" with a color, starting point and size is sent directly to the GPU instead of having the CPU interpret what it would mean and telling the GPU to change certain pixels.
This is slightly off. Each application gets one (or more) window layers, where they draw the window (or dialog) they want to. Quartz assembles these layers into a completed display, placing the layers in correct order and adding shadowing.

Quartz Extreme moves the layer compositing to the graphics card. QE copies each window layer to buffer space on the graphics card, tells the GPU what their 3D orientation is (depth, order, etc) and specifies the shadowing. That way the graphics card does all the assembly.

Quartz 2D Extreme tried to go further, to have the GPU execute the drawing commands as described above - not just assemble (composite) the layers, but construct their contents. The project was abandoned because GPUs varied: Apple engineers could not make different cards produce pixel-identical results, and the feature needed to give identical output on all supported cards, identical to composited QE output.
     
P
Moderator
Join Date: Apr 2000
Location: Gothenburg, Sweden
Jul 1, 2010, 06:15 PM
 
Originally Posted by reader50 View Post
This is slightly off. Each application gets one (or more) window layers, where they draw the window (or dialog) they want to. Quartz assembles these layers into a completed display, placing the layers in correct order and adding shadowing.
Yes, but it's a copy. This is an important point - each window content (which may be larger than the window - think scroll bars) is buffered twice. This is why you can scroll windows and move them around and have the drawing update correctly even if the app is not responding.

Originally Posted by reader50 View Post
Quartz Extreme moves the layer compositing to the graphics card. QE copies each window layer to buffer space on the graphics card, tells the GPU what their 3D orientation is (depth, order, etc) and specifies the shadowing. That way the graphics card does all the assembly.
To the graphics card, the window layer is just another texture.

Originally Posted by reader50 View Post
Quartz 2D Extreme tried to go further, to have the GPU execute the drawing commands as described above - not just assemble (composite) the layers, but construct their contents. The project was abandoned because GPUs varied: Apple engineers could not make different cards produce pixel-identical results, and the feature needed to give identical output on all supported cards, identical to composited QE output.
Quartz 2D Extreme was renamed Quartz GL, and exists in Leopard. The trick is that the acceleration is off by default, and must be enabled on a per-window basis. One of the features of Apple libraries like Core Animation is that they work well with Quartz GL.
     
 