shared memory/Unrecoverable error when running multiple projects
Dedicated MacNNer
Join Date: Sep 2004
Location: London
May 29, 2006, 04:23 AM
 
If you are running multiple projects on Mac OS X and you get this message:

2006-05-22 23:30:28 [Einstein@Home] Can't create shared memory: system shmat
2006-05-22 23:30:28 [Einstein@Home] Unrecoverable error for result r1_0286.0__1381_S4R2a_0 (Couldn't start or resume: -146)
2006-05-22 23:30:28 [Einstein@Home] Deferring scheduler requests for 1 minutes and 0 seconds
2006-05-22 23:30:28 [---] Rescheduling CPU: start failed

Exit BOINC, then change your project preferences to 'Leave applications in memory while preempted = No' (was 'Yes'). If you continue to get the message, then reboot your computer.

You can read an examination of this problem HERE

K.
     
Dedicated MacNNer
Join Date: Nov 2004
Location: Virginia
May 29, 2006, 11:03 AM
 
The link you posted requires login to an account - would you give us a summary of the problem here, please? I normally run with leave-in-memory set to yes, from when I was running climate prediction.

beadman
     
Dedicated MacNNer
Join Date: Sep 2004
Location: London
May 29, 2006, 01:41 PM
 
OK - Pirates is a closed ship at the moment I guess, so here is a summary. This is happening on my Quad.

I discovered that when running several projects with the project preferences set to 'Leave applications in memory while preempted = Yes', these error messages started to occur for all the projects. One or two projects on their own seem to run OK; I ran up to five. One project fails, then other projects fail in turn.

On this occasion it started when I attached to Pirates as my third project. Not all WUs failed at the same time; it was progressive, and some succeeded and some did not. Pirates were sending very short WUs, less than a minute, so I detached from them thinking this might be the problem, but the problem continued.


2006-05-22 04:46:08 [Pirates@Home] Can't create shared memory: system shmget
2006-05-22 04:46:08 [Pirates@Home] Unrecoverable error for result wu_1148262635_451_0 (Couldn't start or resume: -144)
2006-05-22 04:46:08 [Pirates@Home] Can't create shared memory: system shmget
2006-05-22 04:46:08 [Pirates@Home] Unrecoverable error for result wu_1148262635_452_0 (Couldn't start or resume: -144)
2006-05-22 04:46:08 [Pirates@Home] Can't create shared memory: system shmget
2006-05-22 04:46:08 [Pirates@Home] Unrecoverable error for result wu_1148262635_453_0 (Couldn't start or resume: -144)
2006-05-22 04:46:08 [Pirates@Home] Can't create shared memory: system shmget
2006-05-22 04:46:08 [Pirates@Home] Unrecoverable error for result wu_1148262635_454_0 (Couldn't start or resume: -144)
2006-05-22 04:46:08 [Pirates@Home] Unexpected state 7 for task wu_1148262635_451_0
2006-05-22 04:46:08 [Pirates@Home] Unexpected state 7 for task wu_1148262635_452_0
2006-05-22 04:46:08 [Pirates@Home] Unexpected state 7 for task wu_1148262635_453_0
2006-05-22 04:46:08 [Pirates@Home] Unexpected state 7 for task wu_1148262635_454_0
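The two logs in this thread fail in different calls (shmget here, shmat in the first post). If you save your BOINC messages to a text file, a quick tally like the one below shows which call is failing on your machine. This is only a sketch: the function name is made up for illustration, and it assumes the message lines look like the ones quoted above.

```sh
# Tally "Can't create shared memory" messages by the system call named
# at the end of each line (e.g. shmget or shmat). Reads the log on stdin.
tally_shm_errors() {
    # "Can.t" matches the apostrophe without needing to quote it.
    awk '/Can.t create shared memory/ { n[$NF]++ }
         END { for (c in n) print c, n[c] }' | sort
}

tally_shm_errors <<'EOF'
2006-05-22 04:46:08 [Pirates@Home] Can't create shared memory: system shmget
2006-05-22 04:46:08 [Pirates@Home] Can't create shared memory: system shmget
2006-05-22 23:30:28 [Einstein@Home] Can't create shared memory: system shmat
EOF
# shmat 1
# shmget 2
```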

The Captain of Pirates has taken an interest and, working with him, we determined that I had plenty of RAM and virtual memory. It then occurred to me to check my project preference setting for memory, so I switched the project preferences (for all projects) to 'Leave applications in memory while preempted = No', closed BOINC and rebooted the machine. No more error messages. After changing the preference back to 'Yes', the error messages slowly resumed.

There was a little side trip investigating 'slots', and John McLeod has kindly explained to me how slots work. I copy it here for the sake of completeness and general edification.

Slots are created and used as needed.

If you are attached to say 30 projects, but only have at most work from 5
of them on hand at any time, and you never have more than 1 result running
from each project, then you will have 5 slots.

On the other hand, if you have a project with mixed deadlines, it may
start a result with a distant deadline, and then pre-empt that one to
start one with a near deadline. So even on a single CPU system, it is
possible to have 2 slots for a single project.

It is also possible to have an extra slot if it cannot be cleaned up for
some reason (BURP was infamous for this problem).

In your case, what it means:

You have never had more than 16 results running or pre-empted on your
system at the same time. You currently have 15. Rosetta has used 2 more
slots than there are CPUs. Either they could not be cleaned up, or it has
had to do some pre-emption to meet deadlines. None of the others really
needs much explaining.

jm7
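
If you want to see how many slots your own machine has created, something like this should do it. It is a rough sketch that assumes each slot is a numbered subdirectory of a "slots" folder inside the BOINC data directory; the function name is made up, and the Mac path in the comment is an assumption, so adjust it to wherever your installation keeps its data.

```sh
# Count the slot directories under a BOINC data directory.
# Usage (path is an assumption, adjust for your install):
#   count_slots "/Library/Application Support/BOINC Data"
count_slots() {
    data_dir=$1
    count=0
    for slot in "$data_dir"/slots/*/; do
        # If the glob matches nothing it stays literal, so only
        # count entries that really are directories.
        if [ -d "$slot" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

The number printed is the count of slots that exist now, which by jm7's explanation is the most results you have had running or pre-empted at once, plus any slots that could not be cleaned up.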
The Captain, a knowledgeable fellow it has to be said, suggested that:

Each task uses a small bit of shared memory, and in Unix there is a kernel parameter which limits the total amount of shared memory which may be used. The Activity Monitor on Mac is very useful, but it does not show this limit (as best I can tell). To see what the limit is on your machine you need to open a Unix command shell (the Terminal app on a Mac) and give the command

sysctl -A | grep shmmax

On both Tiger and Panther machines I found a limit of only about 4MB. On two Linux boxes (Fedora Core 3 and 4) I found it was more like 32MB.

So even though you have lots of main memory available, I can see how you might hit the limit on shared memory, if it's only 4MB on a Mac and you have 16 tasks running or suspended but still in memory.
At the end of the output produced by running this command in Terminal, I had:

kern.sysv.shmmax: 4194304

This is the shared memory limit in bytes; 4194304 bytes is 4 MB.
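
If you would rather not do the division in your head, here is a small sketch that turns a line in the format shown above into megabytes. The function name is made up for illustration, and it assumes the "name: value" output format quoted in this post.

```sh
# Convert a "kern.sysv.shmmax: N" line (N in bytes) to whole megabytes.
shmmax_to_mb() {
    bytes=${1##*: }                  # keep what follows the last ": "
    echo "$((bytes / 1024 / 1024))"
}

shmmax_to_mb "kern.sysv.shmmax: 4194304"   # prints 4
```

On your own Mac you could feed it the real line with shmmax_to_mb "$(sysctl -A 2>/dev/null | grep shmmax)", provided only one line matches.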

I looked around the web for references to shared memory on Mac OS and found two of interest.

I noted this

"The disadvantage of shared memory is that it is very fragile. When a data structure in a shared memory region becomes corrupt, all processes that refer to the data structure are affected."

This page also makes the same point.

"Shared memory is fragile. If one program corrupts a section of shared memory, any programs that also use that memory share the corrupted data."

The Captain thinks that there is a way to increase the size of the shared memory and is investigating; due to a bug, a workaround is required.

That's it. I will let you know of any further developments.

If anyone has the time and wants to play, it would be interesting to know whether they can reproduce this problem on their machine, or whether it is just me.

K.
     
Dedicated MacNNer
Join Date: Sep 2004
Location: London
May 29, 2006, 09:15 PM
 
PRESS GANG NOTICE

Pirates@home is currently open to anyone wanting to create an account with them. A great place for a life on the open sea and to buckle your swash.

K.
     
Administrator
Join Date: Jun 2000
Location: California
May 30, 2006, 12:10 AM
 
I thought Pirates@home was a beta test of Einstein@home. Didn't Pirates shut down when Einstein was ready?
     
Dedicated MacNNer
Join Date: Sep 2004
Location: London
May 30, 2006, 04:43 AM
 
Originally Posted by reader50
I thought Pirates@home was a beta test of Einstein@home. Didn't Pirates shut down when Einstein was ready?
Our current goal is to test and possibly modify the BOINC forum code for use by an unrelated (to BOINC, anyway) project called Interactions in Understanding the Universe (I2U2). We welcome anybody who wants to help out with this project, either by running workunits (when there are any) or trying out the forum and server software. In any case, it is important that you always keep in mind that we are not actually doing any production computing for that or any other scientific project. (At least not yet.)
The url for Pirates is http://pirates.spy-hill.net/

Tip: Git yer Pirate name a'sorted afore ye sign on me buck's.

K.
     
Dedicated MacNNer
Join Date: Sep 2004
Location: London
Jun 3, 2006, 03:11 PM
 
I am very happy to report that my shared memory problem has now been fixed.

With the help of Eric Myers/Wormholio at Pirates@home I was able to raise the amount of shared memory on Mac OS X from about 4 MB to about 16 MB.

This has made a big difference and I have now run six (6) projects for over 24 hours without any problem. In addition my Mac seems to be running much quieter, less laboured.

For those interested in doing the same, Eric has put up a page explaining how to do it HERE. You have to know a bit of basic stuff about your Mac and how to use Terminal.
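
Eric's page is not reproduced in this thread, so the following is not his method, only a sketch of the arithmetic behind a 16 MB limit. It assumes kern.sysv.shmmax is given in bytes and that the companion setting kern.sysv.shmall is counted in 4096-byte pages; the function name is made up for illustration. How and where to apply the values depends on your OS X version, so follow the instruction page for that part.

```sh
# Print the shared memory values for a target limit given in megabytes.
# Assumptions: shmmax is in bytes, shmall is in 4096-byte pages.
shm_settings() {
    target_mb=$1
    page_size=4096
    shmmax=$((target_mb * 1024 * 1024))
    shmall=$((shmmax / page_size))
    echo "kern.sysv.shmmax=$shmmax"
    echo "kern.sysv.shmall=$shmall"
}

shm_settings 16
# kern.sysv.shmmax=16777216
# kern.sysv.shmall=4096
```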

If, like me, you grade your knowledge of the inner Mac at 1/100 then I strongly suggest you join Pirates and read my thread describing how I accomplished this amazing feat: 'Message boards : Pirate Applications : Error msg when running on Mac' (no direct linking; you will have to join). With Eric's guidance I was able to do it, so I am sure you can too. Read the whole thread first, read the instruction page, then read it again, and you should be able to do it. All you need to know is there.

Should you choose to accept this mission...... Yes

K.
     
   