Details of Mars rover DOS glitch (high geek factor)
Spliff
Mac Elite
Join Date: Feb 2001
Location: Canaduh
Aug 24, 2004, 01:16 AM
 
Those of you who are programmers will probably appreciate this more than the rest of us.

NASA: DOS Glitch Nearly Killed Mars Rover

August 23, 2004
By Mark Hachman

STANFORD, CALIF. -- A software glitch that paralyzed the Mars "Spirit" rover earlier this year was caused by an unanticipated characteristic of a DOS file system, a NASA scientist said Monday.

The flaw, since fixed, was only discovered after days of agonizingly slow tests complicated by the limited "windows" of communication allowed by the rotation of Mars, said Robert Denise, a member of the Flight Software Development Team at NASA's Jet Propulsion Laboratory.

On Jan. 21, the Spirit rover stopped communicating with the teams on Earth, beginning a cycle where the rover would reboot itself, over and over. After days of tests, the team finally discovered on Jan. 26 that the issue was tied to what was originally reported as corruption inside the rover's onboard flash memory.

In a presentation at the Hot Chips conference here, Denise said that the real issue was an embedded DOS file system whose directory structure kept growing and growing. When the rover's embedded operating system then told the flash memory to mirror the data structure in RAM, the unexpectedly large file caused a fatal error and an almost continuous reboot cycle, he said.

Aside from the flash memory error, the recent voyages of Spirit and Opportunity have gone far better than expected. The mission was originally funded to last 90 sols, the equivalent of 90 Mars days, and come to an end last April. (One sol equals 24.65 hours.) Since both rovers have managed to stay "alive" far longer than anticipated, Denise said, the current funding will run out on Sept. 13, the beginning of the "solar conjunction," when Mars disappears behind the Sun and out of radio range. The lifespan of both rovers is really not known, he said.

On Sol 18, the mood among the JPL ground team was nothing short of "euphoric," Denise said. "Life was good," he said. "And then we missed a comms pass," a window in which the JPL team and the rover were supposed to exchange information.

The team didn't worry, at least initially. It rechecked that its instruments were calibrated and awaited the next pass a few hours later. Over the next few days, however, nothing went right, Denise said. The team determined the rover was functional; it could emit a status "beep", proving it was online. Other passes, however, generated just pseudorandom noise, indicating that the rover was online and functioning but that no data was passing through the antenna. The rover, meanwhile, was rebooting hundreds of times a day.

The problem, Denise said, was in the file system the rover used. In DOS, a directory structure is actually stored as a file. As that directory tree grows, the directory file grows, as well. The Achilles' heel, Denise said, was that deleting files from the directory tree does not reduce the size of the directory file. Instead, deleted files are represented within the directory by special characters, which tell the OS that the files can be replaced with new data.
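To make the directory behavior concrete, here is a minimal Python sketch of a directory file that marks deleted entries instead of removing them. It is only an illustration, not the rover's code: the `Directory` class and file names are made up, and the 32-byte entry size is borrowed from the usual FAT convention, which the article does not confirm for this file system.

```python
ENTRY_SIZE = 32     # bytes per directory entry (FAT convention; an assumption here)
DELETED = object()  # stands in for the "special character" that marks a freed slot


class Directory:
    """Toy directory file: a list of slots that can grow but never shrinks."""

    def __init__(self):
        self.slots = []

    def create(self, name):
        # Reuse the first freed slot if there is one; otherwise grow the file.
        for i, slot in enumerate(self.slots):
            if slot is DELETED:
                self.slots[i] = name
                return
        self.slots.append(name)

    def delete(self, name):
        # Deleting only marks the slot; the directory file keeps its size.
        self.slots[self.slots.index(name)] = DELETED

    def live_files(self):
        return sum(1 for slot in self.slots if slot is not DELETED)

    def size_bytes(self):
        return len(self.slots) * ENTRY_SIZE


d = Directory()
for i in range(1000):
    d.create(f"img{i:04d}.dat")
for i in range(1000):
    d.delete(f"img{i:04d}.dat")

print(d.live_files(), d.size_bytes())  # 0 live files, 32000 bytes
```

Freed slots do get reused by later files, but the directory stays as large as its busiest moment, which is the growth pattern the article describes.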

By itself, the cancerous file might not have been an issue. Combined with a "feature" of a third-party piece of software used by the onboard Wind River embedded OS, however, the glitch proved nearly fatal.

According to Denise, the Spirit rover contains 256 Mbytes of flash memory, a nonvolatile memory that can be written and rewritten thousands of times. The rover also contains 128 Mbytes of DRAM, 96 Mbytes of which are used for data, such as buffering image files in preparation for transmitting them to Earth. The other 32 Mbytes are used for code storage. An additional 11 Mbytes of EEPROM memory are used for additional program code storage.

The undisclosed software vendor required that data stored in flash memory be mirrored in RAM. Since the rover's flash memory was twice the size of the system RAM, a crash was almost inevitable, Denise said.

Moving an actuator, for example, generates a large number of tiny data files. After the rover rebooted, the OS's heap memory would be a hair's breadth away from a crash, as the system RAM would be nearly full, Denise said. Adding another data file would generate a memory allocation command to a nonexistent memory address, prompting a fatal error.
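A rough Python sketch of why this turns into a reboot loop: the directory lives in flash and survives a reboot, while the RAM mirror has to be rebuilt from scratch on every boot. Every number and name here (`HEAP_BYTES`, `BYTES_PER_ENTRY`, `boot`, `run`) is invented for illustration and is not taken from the rover's software.

```python
HEAP_BYTES = 1_000_000  # hypothetical heap size, not the rover's real figure
BYTES_PER_ENTRY = 32    # hypothetical RAM cost of mirroring one directory slot


class FatalError(Exception):
    """Raised when an allocation cannot be satisfied."""


class Heap:
    def __init__(self, capacity):
        self.capacity = capacity
        self.used = 0

    def alloc(self, nbytes):
        if self.used + nbytes > self.capacity:
            raise FatalError("out of heap memory")
        self.used += nbytes


def boot(directory_slots, new_files=0):
    """Mirror every directory slot in RAM, then create `new_files` more files.
    Returns the number of heap bytes left over."""
    heap = Heap(HEAP_BYTES)  # RAM starts empty after a reboot; flash does not
    for _ in range(directory_slots + new_files):
        heap.alloc(BYTES_PER_ENTRY)
    return heap.capacity - heap.used


def run(directory_slots, new_files, max_boots=5):
    """Count consecutive failed boots. The slot count is stored in flash,
    so rebooting never makes it smaller."""
    reboots = 0
    for _ in range(max_boots):
        try:
            boot(directory_slots, new_files)
            return reboots
        except FatalError:
            reboots += 1
    return reboots


print(run(31_000, 100))  # fits in the heap: 0 reboots
print(run(31_250, 1))    # one file too many: fails on all 5 attempts
```

With 31,250 slots the mirror fills the toy heap exactly, so the system boots with zero bytes to spare and the very next file is the one that tips it over, every single time.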

Dynamic allocation of memory is considered a no-no in embedded systems, precisely because of the possibility of a system crash, attendees said. Denise acknowledged that JPL's tests only allowed for the addition of a small number of data files, and that the exception slipped by. "We made an exception and got bit by it," he admitted.

The team finally got the rover up and running by essentially using the system RAM as simulated flash, discovered the error, and disabled the dynamic allocation feature, Denise said. The flash memory was erased, and the JPL engineers installed a utility that monitors the file system and treats the memory heap as a consumable resource.
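The article gives no details about that monitoring utility, so the following Python is purely a guess at the general idea: check the heap and the directory against a budget before they run out. The function name, parameters, and thresholds are all hypothetical.

```python
def check_resources(heap_free, heap_total, dir_slots, live_files,
                    min_free_fraction=0.25, max_dead_fraction=0.5):
    """Return a list of warnings; an empty list means both budgets look healthy.

    The thresholds are arbitrary illustration values, not JPL's.
    """
    warnings = []
    if heap_free / heap_total < min_free_fraction:
        warnings.append("heap low")
    dead = dir_slots - live_files  # slots still held by deleted files
    if dir_slots and dead / dir_slots > max_dead_fraction:
        warnings.append("directory mostly dead entries")
    return warnings


print(check_resources(500, 1000, 100, 90))  # healthy: []
print(check_resources(100, 1000, 100, 10))  # both warnings fire
```

The point of treating the heap as a consumable is that someone looks at these numbers routinely, the same way power and data volume get budgeted, rather than finding out at the moment an allocation fails.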

Denise's keynote address to the Hot Chips audience lasted about an hour, with twenty minutes or so dedicated to the flash-memory issue. At the end, he summed up the issue for the small percentage of the audience who weren't engineers: "The Spirit was willing, but the flash was weak."
( Last edited by Spliff; Aug 24, 2004 at 01:57 AM. )
     
Disgruntled Head of C-3PO
Professional Poster
Join Date: Jul 2001
Location: In bits and pieces on Cloud City
Aug 24, 2004, 01:37 AM
 
Originally posted by Spliff:
Those of you who are programmers will probably appreciate this more than the rest of us.

NASA: DOS Glitch Nearly Killed Mars Rover
I think it killed that website also
"Curse my metal body, I wasn't fast enough!"
     
Spliff  (op)
Mac Elite
Join Date: Feb 2001
Location: Canaduh
Aug 24, 2004, 01:57 AM
 
Originally posted by Disgruntled Head of C-3PO:
I think it killed that website also
Just in case it disappears again, I've posted it here.
     
Eriamjh
Addicted to MacNN
Join Date: Oct 2001
Location: BFE
Aug 24, 2004, 07:22 AM
 
It doesn't say it was microsoft DOS.

I'm a bird. I am the 1% (of pets).
     
voodoo
Posting Junkie
Join Date: Mar 2001
Location: Salamanca, España
Aug 24, 2004, 08:58 AM
 
Originally posted by Eriamjh:
It doesn't say it was microsoft DOS.
NASA doesn't use MS DOS for mission critical things. DOS in this case is just the OS. Home made NASA thing.
I could take Sean Connery in a fight... I could definitely take him.
     