Methods and apparatus using commutative error detection values for fault isolation in multiple node computers

United States Patent

June 3, 2008
Lawrence Livermore National Laboratory
Methods and apparatus perform fault isolation in multiple node computing systems. They use commutative error detection values (for example, checksums) to identify and isolate faulty nodes. When a node injects information associated with a reproducible portion of a computer program into a network, a commutative error detection value is calculated. At intervals, node fault detection apparatus associated with the multiple node computer system retrieves the commutative error detection values associated with the node and stores them in memory. When the multiple node computer system executes the computer program again, new commutative error detection values are created and stored in memory. The node fault detection apparatus identifies faulty nodes by comparing the commutative error detection values that a particular node generated for reproducible portions of the application program during different runs. A difference in values indicates a possibly faulty node.
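The abstract gives checksums as one example of a commutative error detection value but does not say which function is used. The following Python sketch assumes a 32-bit additive checksum: each packet contributes the sum of its little-endian 32-bit words, and the running total is kept modulo 2^32. Modular addition is commutative and associative, so a node's accumulated value does not depend on the order in which it injected its packets. The function names (`packet_sum`, `commutative_checksum`, `record_run`, `find_suspect_nodes`) and the representation of packets as `bytes` are hypothetical illustrations. They are not the patented hardware.

```python
from typing import Dict, Iterable, List

# Assumption: a 32-bit additive checksum; the patent's actual function may differ.
MASK32 = 0xFFFFFFFF


def packet_sum(packet: bytes) -> int:
    """Sum a packet as little-endian 32-bit words, zero-padding the tail."""
    padded = packet + b"\x00" * (-len(packet) % 4)
    total = 0
    for i in range(0, len(padded), 4):
        total += int.from_bytes(padded[i:i + 4], "little")
    return total & MASK32


def commutative_checksum(packets: Iterable[bytes]) -> int:
    """Order-independent value: addition mod 2**32 is commutative and associative."""
    total = 0
    for p in packets:
        total = (total + packet_sum(p)) & MASK32
    return total


def record_run(injected: Dict[int, List[bytes]]) -> Dict[int, int]:
    """Store one checksum per node for one run of the reproducible portion."""
    return {node: commutative_checksum(pkts) for node, pkts in injected.items()}


def find_suspect_nodes(run_a: Dict[int, int], run_b: Dict[int, int]) -> List[int]:
    """Nodes whose stored checksums differ between runs are possibly faulty."""
    common = run_a.keys() & run_b.keys()
    return sorted(n for n in common if run_a[n] != run_b[n])


# Usage: node 0 injects the same packets in a different order on run 2,
# while node 1 injects a packet with one corrupted byte.
run1 = record_run({0: [b"abcd", b"efgh"], 1: [b"1234"]})
run2 = record_run({0: [b"efgh", b"abcd"], 1: [b"1235"]})
print(find_suspect_nodes(run1, run2))  # [1]
```

Commutativity matters here because a reproducible portion may inject the same set of packets on every run while the injection order varies. An order-sensitive checksum would then flag healthy nodes. A plain additive sum is a deliberately simple choice. It cannot detect errors that cancel each other out, or words swapped within a packet, so a stronger commutative function may be preferable in practice.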
Inventors: Almasi; Gheorghe (Ardsley, NY), Blumrich; Matthias Augustin (Ridgefield, CT), Chen; Dong (Croton-On-Hudson, NY), Coteus; Paul (Yorktown, NY), Gara; Alan (Mount Kisco, NY), Giampapa; Mark E. (Irvington, NY), Heidelberger; Philip (Cortlandt Manor, NY), Hoenicke; Dirk I. (Ossining, NY), Singh; Sarabjeet (Mississauga, CA), Steinmacher-Burow; Burkhard D. (Wernau, DE), Takken; Todd (Brewster, NY), Vranas; Pavlos (Bedford Hills, NY)
Assignee: International Business Machines Corporation (Armonk, NY)
Appl. No.: 11/106,069
Filed: April 14, 2005
STATEMENT OF GOVERNMENT RIGHTS

The invention was made in part under Contract No. W-7405-ENG-48, Subcontract No. B517552, U.S. Department of Energy. Accordingly, the Government has certain rights in this invention.