I've been having an exchange of ideas with Apple's Darwin Dev team on this... I'm just trying to figure out if it's truly worthwhile. I've been told that it's just not a good use of engineering resources, but I feel it is. So... without further ado, my argument:
Imagine a system that could map out bad memory addresses and avoid using them, making your system stable without the intermittent, random crashes that bad memory addresses can cause. RAM problems are difficult to troubleshoot and diagnose because they seem to happen at random times with no rhyme or reason, leading Macintosh computers to seem "poorly constructed" or "unstable". There is a way to fix these errors. At present, the solution is available, but only for Linux!
Quote from the link:
My objective is to patch the Linux kernel in such a way that it can handle defective RAM modules. With defective RAM, I mean RAM which has some bits wrong at some (known) addresses. Normally, such RAM is considered useless and thrown away; the larger RAMs get, the higher the chances of failing addresses. With ever growing RAM sizes, it would therefore be pleasant to have an alternative to discarding of defective RAM chips.
How to do it?
The technology behind this idea relies on the memory allocation approach inside the Linux kernel, as well as the memory swapping mechanisms. The kernel distinguishes kernel allocated memory from user allocated memory, by never swapping kernel memory out of RAM. Furthermore, it is possible (as needed for some hardware boards) to allocate fixed physical addresses to a kernel process. I want to exploit this to allocate precisely the defective parts of RAM before they are made available to anyone else. By allocating them for the kernel, and never freeing them, the RAM at that part of memory is effectively disabled. Furthermore, this need not be done a memory page at a time, since the kernel allocates blocks of 4, 8, 16, 32, ... bytes each to itself, making it possible to enclose a defective address quite closely. A memory module with one bit wrong would perhaps miss 4 bytes out of 128 MB. Would it not be a waste to throw away such a RAM module?
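To put rough numbers on the quote above, here is a minimal sketch in Python (not kernel code, and not how the Linux patch is actually written). It only works out which aligned blocks would have to be reserved to cover a list of known-bad addresses, and how much of the module that costs. The function names and the example address are made up for illustration.

```python
# Sketch only: models the bookkeeping, not real kernel allocation.
# All names and the sample address below are hypothetical.

TOTAL_RAM = 128 * 1024 * 1024  # the 128 MB module from the quote


def reserved_blocks(bad_addresses, block_size):
    """Return the sorted start addresses of the aligned blocks
    (each block_size bytes) that contain at least one bad address."""
    return sorted({addr - (addr % block_size) for addr in bad_addresses})


def bytes_lost(bad_addresses, block_size):
    """Total bytes taken out of service when every block containing
    a bad address is reserved and never freed."""
    return len(reserved_blocks(bad_addresses, block_size)) * block_size


if __name__ == "__main__":
    bad = [0x01A2B3C5]  # one hypothetical failing address
    for size in (4, 4096):  # a 4-byte block vs. a whole 4 KB page
        lost = bytes_lost(bad, size)
        print(f"block size {size}: lose {lost} bytes "
              f"({lost / TOTAL_RAM:.2e} of the module)")
```

Even if the reservation had to be done a whole 4 KB page at a time rather than 4 bytes at a time, one bad address would still cost only a tiny fraction of a 128 MB module.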
As well, another argument I put forth is that it will make life easier for the support personnel on the other end if the user can run a hardware test (perhaps off of Apple's hardware support CD) that creates a text file of bad memory addresses, which a kext would load on startup if it detects the text file's presence. To a power user this may seem silly, since you could just purchase a new memory module, but I put it forth as a solution to many of tech support's ongoing frustrations: a regular end user will not know how to troubleshoot RAM, let alone know which module is causing the trouble, but by running a simple test off a CD they will be able to get their systems running 100% again. (Trust me, this is preferable to "OK, what I want you to do is remove all the RAM sticks but one... Yes ma'am, you'll have to open the iMac again... Yes, you'll need to stick your fingers in there... No, no, no, don't worry... Yes, remove them till you have one in there and then use your computer for a few days and tell me if you have trouble... Yes, yes, I understand that, if you still have trouble, odds are it's not the RAM... Thank you ma'am. Thanks for calling *click*")
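For what it's worth, the text file itself could be dead simple. Here is a rough sketch in Python of reading such a list, assuming a format I'm making up on the spot: one hexadecimal address per line, with blank lines and `#` comments ignored. Nothing here is an existing Apple or Darwin format, and the function name is hypothetical.

```python
# Sketch only: the file format (one hex address per line, '#' comments)
# is an assumption for illustration, not an existing Apple format.

def parse_bad_addresses(text):
    """Parse a bad-address list and return the unique addresses, sorted."""
    addresses = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue  # skip blank and comment-only lines
        addresses.add(int(line, 16))  # accepts "1f00" as well as "0x1f00"
    return sorted(addresses)


if __name__ == "__main__":
    sample = """\
# written by a (hypothetical) memory test
0x01A2B3C4
0x01A2B3C4   # duplicates collapse
07FF0010
"""
    for addr in parse_bad_addresses(sample):
        print(hex(addr))
```

Whatever loads the list at startup would then only need to reserve the memory covering each address before anything else gets a chance to use it.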
Anyways, yeah, I'm pretty tired so I'll just repost my argument here:
I think another issue here that would be beneficial to Apple is customer appreciation. As it is, I brought this up because the B&W G3 we have here has one bad memory address (I'm guessing about this, could be more): it would kernel panic whenever it hit the "faulty" address.
The thing that is odd about this is the RAM that went faulty is the OEM RAM that came with the machine, so as a pro-user I've purchased more RAM that the system has been able to run off of reliably --sort of. Mac OS X boots and works fine till it pages down to a point where it hits that bad address (usually when top reports 14-30MB free remaining of 768MB).
Now this bug has been bothering us for some 10-12 months, but because the crashes are intermittent we've never been able to pin it down, with the kernel logs pointing to various places. Finally, running one massive application that used all the RAM gave us a much more accurate kernel log that pointed specifically to the RAM (and the bad address).
Now, this has apparently been implemented successfully in the Linux kernel (though I don't know the specs of the system, I doubt they have ECC RAM or other fancy error-checking schemes), with a mapping mechanism that works and does not hinder performance in any way (the link provided earlier has benchmarks as well). It would be a boon for Mac users to get something like this in the mach_kernel, and a benefit to end users who don't know why their computers crash at supposedly random times, except that they are using "Macintoshes."
I definitely see the goods of this far outweighing the bads (which currently seem to be that you could buy replacement RAM if *you know for sure* that it is the RAM). Besides, not all of us live in the USA where RAM is dirt cheap. Some people live in Canada where things cost 1.5x more, or Australia where things cost 2x more ;-)
As has been almost-said, the average (therefore by definition, a member of the majority) customer will not even be aware of the concept of bad ram, let alone how to diagnose it. Nor will they be willing to spend any money replacing it. If they do do anything about the problem, they'll send it back to Apple and demand it "be fixed", whatever that entails.
And given the nature of memory problems (they don't normally just 'crop up' overnight), there would be an open-ended time period for replacement or refund, seeing as the modules would be considered damaged when sold. I know if I had any Mac with faulty memory (that came with it), I'd send it back to Apple for repair, no matter how old. I wouldn't even think about replacing it out of my own pocket, especially given - as has been kindly noted - that prices in Australia (among other places) are often twice that of the US*.
But I daresay if I lost a few pages I wouldn't even notice, let alone care.
It seems the subject system has already been implemented in Linux. Now, aside from possible license issues - which I'm certain can be gotten around one way or another - there seems little barrier to adopting this code. I make no presumptions that it wouldn't take someone a week or so to do, but that seems like a small cost to me.
* = having said that, a 512MB PC133 DIMM costs about $60AU here, which is less than $30US. My brief web browsing indicates that's much cheaper than in the US (at present). Go figure.
*NAMES HAVE BEEN REMOVED*