Abstract:
Spacecraft processors and memory are subjected to high radiation doses and therefore employ radiation-hardened components. However, these components are orders of magnitude more expensive than typical desktop components, and they lag years behind in terms of speed and size. We have integrated algorithm-based fault tolerance (ABFT) methods into onboard data analysis algorithms to detect radiation-induced errors, which ultimately may permit the use of spacecraft memory that need not be fully hardened, reducing cost and increasing capability at the same time. We have also developed a lightweight software radiation simulator, BITFLIPS, that permits evaluation of error detection strategies in a controlled fashion, including the specification of the radiation rate and selective exposure of individual data structures. Using BITFLIPS, we evaluated our error detection methods when using a support vector machine to analyze data collected by the Mars Odyssey spacecraft. We found ABFT error detection for matrix multiplication is very successful, while error detection for Gaussian kernel computation still has room for improvement.