Journal Article
ACM Transactions on Architecture and Code Optimization, vol. 17, iss. 3, pp. 1-27, 2020
Authors
Arnab Das, Sriram Krishnamoorthy, Ian Briggs, Ganesh Gopalakrishnan, Ramakrishna Tipireddy
Abstract
We present FPDetect, a low-overhead approach for detecting logical errors and soft errors affecting stencil computations without generating false positives. We develop an offline analysis that tightly estimates the number of floating-point bits preserved across stencil applications. This estimate rigorously bounds the values expected in the data space of the computation. Violations of this bound can be attributed with certainty to errors. FPDetect helps synthesize error detectors customized for user-specified levels of accuracy and coverage. FPDetect also enables overhead reduction techniques based on deploying these detectors coarsely in space and time. Experimental evaluations demonstrate the practicality of our approach.