ARM Cortex A7: Heterogeneous multiprocessing next in line for Apple?
Eug
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Oct 25, 2011, 01:45 PM
 
I remember the good ol' days when we talked about CPU architectures for new Power Macs / Mac Pros and laptops. Now it seems the big interest is ARM for our iPhones.

I came across this article and was wondering if this might be part of the next architecture for Apple, for the Apple A7.

AnandTech - ARM's Cortex A7: Bringing Cheaper Dual-Core & More Power Efficient High-End Devices

How do you keep increasing performance in a power constrained environment like a smartphone without decreasing battery life? You can design more efficient microarchitectures, but at some point you’ll run out of steam there. You can transition to newer, more power efficient process technologies but even then progress is very difficult to come by. In the past you could rely on either one of these options to deliver lower power consumption, but these days you have to rely on both - and even then it’s potentially not enough. Heterogeneous multiprocessing is another option available - put a bunch of high performance cores alongside some low performance but low power cores and switch between them as necessary.

I've always wondered about this... albeit for laptops, but had assumed it would be too expensive for the mainstream. Is the cost of doing this with Cortex A7 potentially low enough to make it viable for Apple?

i.e. Will the A7 have an A7 in it?
     
Professional Poster
Join Date: Mar 2004
Location: UK
Oct 25, 2011, 02:23 PM
 
Why is it called an A7? Aren't ARM already on the A8 or even 9?
MacBook 2.0GHz CD; MacBook Pro 15" 2.4GHz Late '08; PowerMac G4 MDD Dual 1GHz; 3x Xserve G4 1GHz; Mac Mini 2GHz; Big pile of broken and working bits;
     
Clinically Insane
Join Date: Oct 2000
Location: Los Angeles
Oct 25, 2011, 02:25 PM
 
Yeah that's curious, given that the Cortex A8 is at the heart of the Apple A4.

"The natural progress of things is for liberty to yield and government to gain ground." TJ
     
Eug  (op)
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Oct 25, 2011, 02:50 PM
 
Originally Posted by Waragainstsleep View Post
Why is it called an A7? Aren't ARM already on the A8 or even 9?
Yes, A9 is here (and in Apple's A5), and A7 will actually be paired with A15.

Wired - Cortex A7 Is Tailor-Made for Android Superphones



I suspect the A7 name reflects the design: even though it will be faster at launch than the A8 is now, in some ways it may be considered a cut-down version of the A8.

AnandTech - ARM's Cortex A7: Bringing Cheaper Dual-Core & More Power Efficient High-End Devices

The Cortex A7 features an 8-stage integer pipeline and is capable of dual-issue. Unlike the Cortex A8 however, the A7 cannot dual-issue floating point or NEON instructions. There are other instructions that turn the A7 into a single-issue machine as well. The integer execution cluster is quite similar to the Cortex A8, although the FPU is fully pipelined and more compact than its older brother.

Limiting issue width for more complex instructions helps keep die size in check, which was a definite goal for the core. ARM claims a single Cortex A7 core will measure only 0.5 mm² on a 28nm process. On an equivalent process node ARM expects customers will be able to implement an A7 in 1/3 - 1/2 the die area of a Cortex A8. As a reference, an A9 core uses about the same (if not a little less) die area as an A8 while an A15 is a bit bigger than both.


Also, on a per clock basis, they say that A7 will achieve 1.9 DMIPS/MHz, whereas A8 does 2.0 DMIPS/MHz. However, the clock speed of A7 is significantly higher.

With these characteristics, Apple could design their Apple A7 (or A6?) with a dual-core A15 and a single or dual-core A7 without that much of a die-size penalty, and get an iPhone/iPad with much faster speeds but longer battery life too.
(Last edited by Eug; Oct 25, 2011 at 03:08 PM. )
     
Mac Elite
Join Date: Mar 2004
Location: Truckee, CA
Oct 25, 2011, 06:41 PM
 
I don't know what it is yet, but I want one...

     
Addicted to MacNN
Join Date: Jul 2004
Location: Toronto
Oct 25, 2011, 08:57 PM
 
I can't see Apple moving any of their laptops away from Intel.
     
Eug  (op)
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Oct 25, 2011, 09:52 PM
 
I think you misunderstand. I had wondered about heterogeneous multiprocessing with x86 (not ARM) for laptops.

However, heterogeneous multiprocessing with ARM for the iPhone and iPad seems like an interesting idea.
     
Addicted to MacNN
Join Date: Jul 2004
Location: Toronto
Oct 25, 2011, 11:59 PM
 
I don't know why, but I thought your posted link was this one I read the other day: Jon Stokes on ARM's A15 and the MBA. I thought it was kinda out there for Stokes, so it stuck with me I guess.
     
P
Moderator
Join Date: Apr 2000
Location: Gothenburg, Sweden
Oct 26, 2011, 03:13 AM
 
The heterogeneous processing thing is about having a low-power core that can keep a device such as a phone in a very low power mode, yet still not quite in standby. Intel has been talking about the same thing as the main feature of Haswell, the successor to Ivy Bridge due in early 2013. Don't know how they're doing it, but one obvious way would be to include an Atom core in the place of the A7 core above. With their process node advantage, Intel doesn't have to be stingy with the transistors.

The A7 is called that because beyond everything else, it will be tiny - way smaller than A8 yet almost as powerful.
The low-end Mac Pro is the most overpriced Mac since the IIvx
     
Moderator
Join Date: May 2001
Location: Hilbert space
Oct 26, 2011, 03:28 AM
 
Originally Posted by P View Post
Intel has been talking about the same thing as the main feature of Haswell, the successor to Ivy Bridge due in early 2013. Don't know how they're doing it, but one obvious way would be to include an Atom core in the place of the A7 core above.
Oh, I didn't know that.
Originally Posted by P View Post
The A7 is called that because beyond everything else, it will be tiny - way smaller than A8 yet almost as powerful.
Overall, SoCs based on the A7 will be more powerful than those based on the A8 since you can run them at higher clock speeds.
I don't suffer from insanity, I enjoy every minute of it.
     
P
Moderator
Join Date: Apr 2000
Location: Gothenburg, Sweden
Oct 26, 2011, 06:02 AM
 
They announced it at IDF, but it got drowned out by Ivy Bridge and the buzzwords that Intel was throwing out (like "Smart Connect Technology"). Haswell seems to be too far down the pipe to have marketing buzzwords yet, so it was more of a mention, but here's a fairly neutral link about what happens.

So let's speculate a bit: For all of this to work, Intel must have one core active, if possibly at a very low clockspeed. Can Intel keep one modern OoOE core running for 10 days? I doubt it. It must be something else that just sits there. It may not be a true core, just a feature that handles the bare necessities, but why bother designing one? The original Atom is 26 mm² at 45nm, which means that it would be about 6.5 mm² at 22nm, and that's including a private 512KB L2 cache. According to this, Intel wastes 8 mm² in every SB quadcore - they clearly aren't concerned with keeping the chips tiny. Why not just include it? Hey, give it access to the L3 and the modern memory controller connection it has, and it won't even be that terrible performance-wise.
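For anyone who wants to check the shrink arithmetic, here is a minimal sketch. It assumes ideal scaling, where area goes with the square of the feature-size ratio, which real designs rarely achieve; the function name is made up for illustration, and only the 26 mm² at 45nm figure comes from the post.

```python
def scaled_area_mm2(area_mm2: float, from_nm: float, to_nm: float) -> float:
    """Ideal shrink: area scales with the square of the feature-size ratio."""
    return area_mm2 * (to_nm / from_nm) ** 2


atom_45nm = 26.0  # mm², original Atom die area at 45nm as quoted in the post

# Ideal geometric scaling from 45nm straight to 22nm.
ideal_22nm = scaled_area_mm2(atom_45nm, 45, 22)

# Rule of thumb: 45nm -> 32nm -> 22nm is two full nodes, each halving area.
two_halvings = atom_45nm / 4

print(f"ideal scaling: {ideal_22nm:.1f} mm2")
print(f"rule of thumb: {two_halvings:.1f} mm2")
```

Both estimates land a little above 6 mm², so the ballpark holds under either assumption.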

Overall, SoCs based on the A7 will be more powerful than those based on the A8 since you can run them at higher clock speeds.
I thought about mentioning that, but honestly, all recent speed-demon projects have turned out to be incredibly power hungry when you start ramping. I think they will run it at the same 1GHz+ range as the A8.
The low-end Mac Pro is the most overpriced Mac since the IIvx
     
Eug  (op)
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Oct 26, 2011, 07:03 AM
 
Atom does not support SSE4 or VT-x.

1) Does that matter?
2) What about Haswell?
     
P
Moderator
Join Date: Apr 2000
Location: Gothenburg, Sweden
Oct 26, 2011, 08:20 AM
 
Originally Posted by Eug View Post
Atom does not support SSE4 or VT-x.

1) Does that matter?
You could just make those features something that you need to wake a full core to use, but at least SSE4 would be easy enough to add.
Originally Posted by Eug View Post
2) What about Haswell?
I don't understand the question.
The low-end Mac Pro is the most overpriced Mac since the IIvx
     
Eug  (op)
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Oct 26, 2011, 09:02 AM
 
Does Haswell support the full desktop processor instruction set that Atom doesn't?

One of the benefits of Cortex A7 is that it supports the entire instruction set that A15 supports. Any code that will run on A15 will run on A7, albeit with a big performance penalty. In fact, the claim is that one could simply swap a hybrid A7/A15 CPU into a machine whose software was written for A15, and the OS wouldn't know the difference. Obviously it would be ideal to tailor the OS to take advantage of heterogeneous MP, but if you didn't it'd still work.

That's not necessarily the case with current Atom + desktop Intel processor.
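To make the "switch between them as necessary" idea concrete, here is a toy sketch of a switching policy with hysteresis. This is not ARM's actual mechanism: the class name, the thresholds and the load trace are all hypothetical. It only illustrates why a shared instruction set matters, since the same workload can move between the little and big core without the software caring.

```python
from dataclasses import dataclass


@dataclass
class ClusterSwitcher:
    # Hypothetical thresholds on the active core's utilisation (0.0 to 1.0).
    up_threshold: float = 0.85    # little core this busy -> move to big core
    down_threshold: float = 0.30  # big core this idle -> move back to little
    active: str = "little"

    def update(self, load: float) -> str:
        """Pick the core for the next interval from the current load."""
        if self.active == "little" and load > self.up_threshold:
            self.active = "big"
        elif self.active == "big" and load < self.down_threshold:
            self.active = "little"
        return self.active


switcher = ClusterSwitcher()
# Made-up utilisation samples: idle, light use, a burst, then settling down.
trace = [switcher.update(load) for load in (0.10, 0.50, 0.90, 0.60, 0.20)]
print(trace)
```

The gap between the two thresholds keeps the policy from bouncing between cores when the load hovers around a single cutoff.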
     
P
Moderator
Join Date: Apr 2000
Location: Gothenburg, Sweden
Oct 26, 2011, 10:34 AM
 
OK. We know very little about Haswell, but I can't believe that Intel would drop such features. It's supposed to be the mainline successor to Ivy Bridge. I think Intel will just add SSE4 and the rest to the Atom if they decide to do this. It can be slow and inefficient; that doesn't matter as long as it runs.
The low-end Mac Pro is the most overpriced Mac since the IIvx
     
Posting Junkie
Join Date: Oct 2005
Location: Houston, TX
Oct 26, 2011, 07:33 PM
 
NVIDIA is doing something similar with Kal-El: it can switch between the 4 main A9 cores and an extra A9 companion core. The A9 even covers everything up to and including light gaming.
     
Eug  (op)
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Oct 26, 2011, 08:57 PM
 
A15 + A7 does seem like a more elegant solution though, as compared to A9 + slower A9.
     
Mac Elite
Join Date: Mar 2004
Location: Truckee, CA
Oct 26, 2011, 10:09 PM
 
More elegant perhaps but also seemingly more complex.
     
Eug  (op)
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Oct 27, 2011, 12:18 AM
 
Originally Posted by SierraDragon View Post
More elegant perhaps but also seemingly more complex.
Why do you say that?
     
Moderator
Join Date: May 2001
Location: Hilbert space
Oct 27, 2011, 04:09 AM
 
Originally Posted by SierraDragon View Post
More elegant perhaps but also seemingly more complex.
It's not more complex, it's actually less complex since an A7 core takes up a lot less space than an A9 core.
I don't suffer from insanity, I enjoy every minute of it.
     
Posting Junkie
Join Date: Oct 2005
Location: Houston, TX
Oct 27, 2011, 03:13 PM
 
Originally Posted by Eug View Post
A15 + A7 does seem like a more elegant solution though, as compared to A9 + slower A9.
It's a different tradeoff. The companion A9 can do a lot more than the A7 can do before it has to switch on the quad A9.

Note we're also talking about different performance levels: A7 + dual A15 vs A9 + quad A9. The ratio of computing performance between one A7 and two A15 may be about the same as between one A9 and four A9.
     
Eug  (op)
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Oct 27, 2011, 09:47 PM
 
Originally Posted by mduell View Post
It's a different tradeoff. The companion A9 can do a lot more than the A7 can do before it has to switch on the quad A9.
I'm not so sure that's true. The companion A9 in Kal-El maxes out at 500 MHz. At 2.5 DMIPS/MHz that's about 1250 DMIPS.

A7 will max out at 1.5 GHz. At 1.9 DMIPS/MHz that's 2850 DMIPS. Thus, it's not surprising that ARM hopes to sell single-core A7 chips for lower end smartphones. Such a smartphone could be faster than the iPhone 4.

A15 will max out at about 2.5 GHz. At 3.5 DMIPS/MHz that's 8750 DMIPS.

So even if the A7 in such an A7/A15 config were running at half its maximum clock speed, it could still hit about 1425 DMIPS, vs. the 1250 DMIPS maximum of the 500 MHz companion A9. Yes, A7 has some limitations that will affect speed, but such hybrid A7/A15 chips may have two A7s, as opposed to just one companion A9 in Kal-El.

Yeah, we're talking tech from late 2011 / early 2012 (Kal-El) vs. tech from 2013 (A7/A15), but nonetheless the latter may be significantly faster even in low power mode.
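The peak figures above are just DMIPS/MHz multiplied by clock speed, so they are easy to recompute. A minimal sketch, using only the per-clock ratings and maximum clocks quoted in this post (the function and variable names are made up for illustration):

```python
def peak_dmips(dmips_per_mhz: float, clock_mhz: float) -> float:
    """Peak Dhrystone throughput of one core: per-clock rating times clock."""
    return dmips_per_mhz * clock_mhz


# (DMIPS/MHz, max clock in MHz) as quoted in the post.
cores = {
    "Kal-El companion A9": (2.5, 500),
    "Cortex A7": (1.9, 1500),
    "Cortex A15": (3.5, 2500),
}

peaks = {name: peak_dmips(*spec) for name, spec in cores.items()}

# An A7 held to half of its maximum clock, for comparison with the companion A9.
a7_half_clock = peak_dmips(1.9, 1500 / 2)

for name, value in peaks.items():
    print(f"{name}: {value:.0f} DMIPS")
print(f"Cortex A7 at half clock: {a7_half_clock:.0f} DMIPS")
```

These are single-core peak numbers only; they say nothing about how often the A7 drops to single issue on real code.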
(Last edited by Eug; Oct 27, 2011 at 09:55 PM. )
     
P
Moderator
Join Date: Apr 2000
Location: Gothenburg, Sweden
Oct 29, 2011, 03:49 AM
 
ARM makes core designs, which are then given to everyone and their brother to make chips out of at one of several foundries - all of which have had serious problems shrinking beyond 40 nm. Predicted clockspeeds at 28nm come from the same place as sales predictions, and if we have a doctor with a flashlight in the audience, I could show you where.

Time will tell, I suppose, but a single core A7 is likely to be on the slow side compared to a threadshrunk A9 or something. An in-order design that becomes single issue as soon as you start sending something other than plain integer code does not seem like a good match. It does have one advantage over the A8, though - it has the cache coherency code to be run in a dual-core setup. 2 A7 cores will be tiny, and they can fire up the second core when the going gets tough.
The low-end Mac Pro is the most overpriced Mac since the IIvx
     
Eug  (op)
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Oct 29, 2011, 12:02 PM
 
Originally Posted by P View Post
ARM makes core designs, which are then given to everyone and their brother to make chips out of at one of several foundries - all of which have had serious problems shrinking beyond 40 nm.
TSMC's 28nm Technology Now in Volume Production

Monday Taiwan Semiconductor Manufacturing Company (TSMC), the world’s largest dedicated semiconductor foundry, said that its 28-nm process is now in volume production, and that production wafers have already been shipped to customers.

According to the company, the new process includes 28-nm High Performance (28HP), 28-nm High Performance Low Power (28HPL) and 28-nm Low Power (28LP) which are in volume production now. It also includes 28-nm High Performance Mobile Computing (28HPM) which will be ready for production by the end of this year.

Monday TSMC said that the number of customer 28-nm production tape outs has more than doubled as compared with that of 40-nm, residing at more than 80 tape-outs so far. "The TSMC 28-nm process has surpassed the previous generation’s production ramps and product yield at the same point in time due to closer and earlier collaboration with customers," the company stated in a press release.
     
P
Moderator
Join Date: Apr 2000
Location: Gothenburg, Sweden
Oct 30, 2011, 09:52 AM
 
Yes, I know. Note the quote from AMD - the first product on it will be the Radeon 7000 series (and isn't that numbering great, by the way - there already was a Radeon 7000 series from before the reset). Doesn't change the fact that everyone that is not Intel is having problems.
  • The Radeon 6000 series was supposed to be made on 32nm TSMC before that node was so delayed that it was moved back to 40nm. The entire 32nm TSMC node was eventually cancelled.
  • The 32nm GF yields on Llano are so low that they're actually supply-constrained for that lukewarm chip
  • The 32nm Power7+ is still MIA, and IBM keeps going on 45nm
  • Bulldozer is far below frequency targets

All signs point to serious problems getting below 40nm. Hopefully TSMC learned something, and GF did at least launch on 32nm eventually, but don't sell that GHz skin before the bear has been shot.
The low-end Mac Pro is the most overpriced Mac since the IIvx
     
Eug  (op)
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Oct 30, 2011, 11:02 AM
 
Everyone always has problems. And then they get sorted out. It looks like it's been several months' delay, but there's enough time to get products out for 2012-2013.
     
P
Moderator
Join Date: Apr 2000
Location: Gothenburg, Sweden
Oct 30, 2011, 05:56 PM
 
The problems are bigger this time than ever before. There is exactly one processor project that has launched at 32nm at intended performance without some sort of delay or reduction in scope, and that is Sandy Bridge. The 32nm Arrandale/Clarkdale was a tiny CPU die (81 mm²) with the IMC/graphics on a separate 45nm die - not the original plan - and idle power worse than 45nm Penryn. Llano and Bulldozer were significantly delayed, and at least BD is way below frequency targets. The 32nm Lynnfield/Clarksfield successors were cancelled, and so was the Radeon 6x00 die shrink. I think nVidia also had plans to shrink for the 5x0 generation. Gulftown gained two measly clockspeed bins over Bloomfield at 4 cores (one bin at 6 cores), and the L3 cache was actually way slower. There is a reason that the A5 is a huge 45nm SOC.
The low-end Mac Pro is the most overpriced Mac since the IIvx
     
Eug  (op)
Clinically Insane
Join Date: Dec 2000
Location: Caught in a web of deceit.
Oct 30, 2011, 10:16 PM
 
I would assume the A5 taped out in 2010.
     
   