Welcome to the MacNN Forums.


View Poll Results: Is software fault tolerant RAM a good idea?
Poll Options:
Yes 4 votes (66.67%)
No 2 votes (33.33%)
What the heck are you talking about? 0 votes (0%)
Voters: 6. You may not vote on this poll
Is this a good idea? (Making bad RAM useful)
Metzen
Mac Enthusiast
Join Date: Sep 2001
Location: Edmonton, Alberta
Mar 10, 2003, 07:02 AM
 
I've been having an exchange of ideas with Apple's Darwin Dev team on this... I'm just trying to figure out if it's truly worthwhile. I've been told that it's just not a good use of engineering resources, but I feel it is. So... Without further ado, my argument:

Imagine a system that could map out bad memory addresses so that the OS avoids using them, keeping your machine stable without the intermittent, random crashes that bad memory addresses can cause. RAM problems are difficult to troubleshoot and diagnose because they seem to happen at random times with no apparent cause, which makes Macintosh computers seem "poorly constructed" or "unstable". There is a way to fix these errors. At present, the solution is available, but only for Linux!

http://rick.vanrein.org/linux/badram/index.html

Quote from the link:

quote:
My objective is to patch the Linux kernel in such a way that it can handle defective RAM modules. With defective RAM, I mean RAM which has some bits wrong at some (known) addresses. Normally, such RAM is considered useless and thrown away; the larger RAMs get, the higher the chances of failing addresses. With ever growing RAM sizes, it would therefore be pleasant to have an alternative to discarding of defective RAM chips.

How to do it?

The technology behind this idea relies on the memory allocation approach inside the Linux kernel, as well as the memory swapping mechanisms. The kernel distinguishes kernel allocated memory from user allocated memory, by never swapping kernel memory out of RAM. Furthermore, it is possible (as needed for some hardware boards) to allocate fixed physical addresses to a kernel process. I want to exploit this to allocate precisely the defective parts of RAM before they are made available to anyone else. By allocating them for the kernel, and never freeing them, the RAM at that part of memory is effectively disabled. Furthermore, this need not be done a memory page at a time, since the kernel allocates blocks of 4, 8, 16, 32, ... bytes each to itself, making it possible to enclose a defective address quite closely. A memory module with one bit wrong would perhaps miss 4 bytes out of 128 MB. Would it not be a waste to throw away such a RAM module?
As well, another argument I put forward is that it would ease the load on support personnel if they could run a hardware test (perhaps off of Apple's hardware support CD) that creates a text file of bad memory addresses, which a kext would load on startup if it detects the text file's presence. To a power user this may seem silly, since you could just purchase a new memory module. I put it forth as a solution to many of tech support's ongoing frustrations: a regular end user will not know how to troubleshoot RAM, let alone know which module is causing the trouble, but by running a simple test off a CD they would be able to get their systems running 100% again. (Trust me, this is preferable to "OK, what I want you to do is remove all the RAM sticks but one... Yes ma'am, you'll have to open the iMac again... Yes, you'll need to stick your fingers in there... No, no, no, don't worry... Yes, remove them till you have one in there and then use your computer for a few days and tell me if you have trouble... Yes, yes, I understand that, if you still have trouble, odds are it's not the RAM... Thank you ma'am. Thanks for calling *click*")
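
To make the test-then-load idea above a bit more concrete, here is a rough sketch in Python. It is only an illustration, not how BadRAM or any Darwin kernel extension actually works: the one-address-per-line report format, the 4 KiB page size, and every function name are assumptions made up for the example. It also reserves whole pages, which is coarser than the few-byte granularity the quote describes.

```python
PAGE_SIZE = 4096  # assumed page size in bytes


def parse_bad_addresses(text):
    """Parse a hypothetical test report: one hex address per line, '#' starts a comment."""
    addresses = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if line:
            addresses.append(int(line, 16))
    return addresses


def pages_to_reserve(addresses, page_size=PAGE_SIZE):
    """Return the sorted page frame numbers that contain at least one bad address."""
    return sorted({addr // page_size for addr in addresses})


def usable_pages(total_bytes, reserved, page_size=PAGE_SIZE):
    """Yield every page frame number the allocator may still hand out."""
    reserved = set(reserved)
    for pfn in range(total_bytes // page_size):
        if pfn not in reserved:
            yield pfn


# Made-up report contents, standing in for the text file written by the test CD.
report = """
# output of a hypothetical memory test
0x04a3f010
0x04a3f014   # same page as the line above
0x07ffe000
"""

bad = parse_bad_addresses(report)
reserved = pages_to_reserve(bad)
total = 128 * 1024 * 1024  # a 128 MB module
lost = len(reserved) * PAGE_SIZE
print(f"reserved page frames: {reserved}")
print(f"memory lost: {lost} bytes of {total} ({100 * lost / total:.4f}%)")
```

Even at page granularity, three bad addresses here cost only two pages out of the whole module, which is the heart of the argument for not throwing the module away.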

Anyways, yeah, I'm pretty tired so I'll just repost my argument here:

I think another benefit to Apple here would be customer appreciation. As it is, I brought this up because the B&W G3 we have here has one bad memory address (I'm guessing about this; there could be more): it would kernel panic whenever it hit the "faulty" address.

The odd thing is that the RAM that went faulty is the OEM RAM that came with the machine. As a pro user I've purchased more RAM, which the system has been able to run off of reliably, sort of. Mac OS X boots and works fine till it pages down to a point where it hits that bad address (usually when top reports 14-30 MB free of 768 MB).

Now this bug has been bothering us for some 10-12 months, but because the crashes are intermittent we've never been able to pin it down, with even the kernel logs pointing to various places. Finally, running one massive application that used all the RAM gave us a much more accurate kernel log that pointed specifically to the RAM (and the bad address).

Now, this has been implemented in the Linux kernel successfully, it seems (though I don't know the specs of the system, I doubt they have ECC RAM or other fancy error checking schemes), with a mapping mechanism that is successful and does not hinder performance in any way (the link provided earlier provides benchmarks as well). Getting something like this into the mach_kernel would be a boon for Mac users, and a benefit to end users who don't know why their computers crash at supposedly random times, except that they are using "Macintoshes."

I definitely see the goods of this far outweighing the bads (which currently seem to be that you could buy replacement RAM if *you know for sure* that it is RAM). Besides, not all of us live in the USA where RAM is dirt cheap. Some people live in Canada where things are 1.5x more or Australia where things cost 2x more ;-)

-----------

As has been almost-said, the average (therefore by definition, a member of the majority) customer will not even be aware of the concept of bad ram, let alone how to diagnose it. Nor will they be willing to spend any money replacing it. If they do do anything about the problem, they'll send it back to Apple and demand it "be fixed", whatever that entails.

And given the nature of memory problems (they don't normally just 'crop up' overnight), there would be an open-ended time period for replacement or refund, seeing as they would be considered damaged when sold. I know if I had any Mac with faulty memory (that came with it), I'd send it back to Apple for repair, no matter how old. I wouldn't even think about replacing it out of my own pocket, especially given - as has been kindly noted - that prices in Australia (among other places) are often twice that of the US*.

But I daresay if I lost a few pages I wouldn't even notice, let alone care.

It seems the subject system has already been implemented in Linux. Now, aside from possible license issues - which I'm certain can be gotten around one way or another - there seems little barrier to adopting this code. I make no presumptions that it wouldn't take someone a week or so to do, but that seems like a small cost to me.

* = having said that, a 512 MB PC133 DIMM costs about $60AU here, which is less than $30US. My brief web browsing indicates that's much cheaper than in the US (at present). Go figure.

*NAMES HAVE BEEN REMOVED*
Any intelligent fool can make things bigger, more complex, and more violent. It takes a touch of genius -- and a lot of courage -- to move in the opposite direction.
E. F. Schumacher
     
Gul Banana
Mac Elite
Join Date: May 2002
Mar 10, 2003, 07:42 AM
 
It sounds to me like it would be worth the effort, especially since there would be no impact on performance.
[vash:~] banana% killall killall
Terminated
     
Detrius
Professional Poster
Join Date: Apr 2001
Location: Asheville, NC
Mar 10, 2003, 12:29 PM
 
The difficulty with this idea is that it may be rather hard to figure out exactly which portion of the RAM is flaky. It took you a year to figure out where the problem is. A simple program on the Apple Support CD is NOT going to be able to find these intermittent problems, because sometimes the RAM works and sometimes it does not.
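
A toy simulation shows why a single quick pass can miss this kind of fault. Everything here is an assumption for illustration: real flaky RAM does not fail on a neat schedule (it may depend on temperature, access pattern, or timing), and the `FlakyMemory` class, the failure interval, and the test patterns are invented.

```python
class FlakyMemory:
    """Simulated RAM in which one cell returns a flipped bit on every Nth read of it."""

    def __init__(self, size, bad_addr, fail_every):
        self.cells = [0] * size
        self.bad_addr = bad_addr
        self.fail_every = fail_every
        self.reads_of_bad = 0

    def write(self, addr, value):
        self.cells[addr] = value & 0xFF

    def read(self, addr):
        value = self.cells[addr]
        if addr == self.bad_addr:
            self.reads_of_bad += 1
            if self.reads_of_bad % self.fail_every == 0:
                value ^= 0x01  # intermittent single-bit error
        return value


def run_pass(mem, size, pattern):
    """Write a pattern to every cell, read it back, return the addresses that mismatched."""
    for addr in range(size):
        mem.write(addr, pattern)
    return {addr for addr in range(size) if mem.read(addr) != pattern}


def soak_test(mem, size, passes):
    """Repeat the pattern test and accumulate every address that ever failed."""
    suspects = set()
    first_failure = None
    for n in range(1, passes + 1):
        pattern = 0x55 if n % 2 else 0xAA  # alternate two bit patterns
        failed = run_pass(mem, size, pattern)
        if failed and first_failure is None:
            first_failure = n
        suspects |= failed
    return suspects, first_failure


SIZE = 1024
single, _ = soak_test(FlakyMemory(SIZE, bad_addr=700, fail_every=25), SIZE, passes=1)
suspects, first_failure = soak_test(
    FlakyMemory(SIZE, bad_addr=700, fail_every=25), SIZE, passes=100
)
print("after 1 pass:", sorted(single))
print("after 100 passes:", sorted(suspects), "first seen on pass", first_failure)
```

In this simulation the one-pass run reports nothing, and only a long run that accumulates failures across passes flags the cell. Whether a real test could ever run long enough to be trustworthy is exactly the open question.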

Another difficulty is that if the RAM is changed, you lose this record. It wouldn't be possible to keep track of which RAM module has the bad memory addresses. It's different from keeping up with bad blocks on a hard drive. The hard drive stores what you need. The memory module does not.
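
A small sketch of how the record goes stale, under some loud assumptions: modules are mapped back to back in slot order with no interleaving, and the module names, sizes, and faulty offset are made up. Real memory controllers may lay things out quite differently.

```python
MB = 1024 * 1024


def physical_address(slots, module_id, offset):
    """Physical address of a cell, assuming modules are mapped back to back in slot order."""
    base = 0
    for current_id, size in slots:
        if current_id == module_id:
            return base + offset
        base += size
    return None  # module is no longer installed


# The faulty cell lives at a fixed offset inside a hypothetical "module-A".
FAULTY_MODULE, FAULTY_OFFSET = "module-A", 0x00A3F010

before = [("module-A", 128 * MB), ("module-B", 256 * MB)]
after_swap = [("module-B", 256 * MB), ("module-A", 128 * MB)]

# The saved record only knows the physical address seen at test time.
saved_record = [physical_address(before, FAULTY_MODULE, FAULTY_OFFSET)]

actual_now = physical_address(after_swap, FAULTY_MODULE, FAULTY_OFFSET)
print("saved record:    ", [hex(a) for a in saved_record])
print("fault is now at: ", hex(actual_now))
print("record still valid:", actual_now in saved_record)
```

After the swap the saved list fences off a perfectly good address while the faulty cell sits unprotected somewhere else, so the test would have to be rerun after any change to the installed RAM.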

I do, however, believe that if this could be 100% automated, it would be a good thing to add to a system to increase stability over time.
ACSA 10.4/10.3, ACTC 10.3, ACHDS 10.3
     
   
All times are GMT -4. The time now is 08:00 PM.
All contents of these forums © 1995-2017 MacNN. All rights reserved.