Local rollback for fault-tolerance in parallel computing systems

United States Patent

January 24, 2012
A control logic device performs a local rollback in a parallel super computing system. The super computing system includes at least one cache memory device. The control logic device determines a local rollback interval. The control logic device runs at least one instruction in the local rollback interval. The control logic device evaluates whether an unrecoverable condition occurs while running the at least one instruction during the local rollback interval. The control logic device checks whether an error occurs during the local rollback. The control logic device restarts the local rollback interval if the error occurs and the unrecoverable condition does not occur during the local rollback interval.
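The control flow described in the abstract can be illustrated with a small software analogy. This is a minimal sketch and not the patented hardware mechanism: it assumes that the state touched during an interval can be held in a speculative copy (standing in for uncommitted data in a cache), that each instruction reports whether it raised an unrecoverable condition, and that an error check runs at the end of the interval. The names `run_rollback_interval`, `error_detected`, and `max_attempts`, and the outcome strings, are hypothetical and chosen only for this illustration.

```python
import copy
from typing import Callable, Dict, List, Tuple

# Hypothetical instruction type: mutates the speculative state and returns
# True if it raised an unrecoverable condition (for example, a side effect
# that cannot be undone locally).
Instruction = Callable[[Dict[str, int]], bool]


def run_rollback_interval(
    state: Dict[str, int],
    instructions: List[Instruction],
    error_detected: Callable[[], bool],
    max_attempts: int = 3,
) -> Tuple[str, int]:
    """Run one local rollback interval, restarting it on a recoverable error.

    Returns (outcome, attempts), where outcome is "committed" if the interval
    finished without error, or "escalate" if local rollback was not possible
    or the retry budget (an assumption of this sketch) ran out.
    """
    for attempt in range(1, max_attempts + 1):
        # Work on a copy so the interval's writes can be discarded.
        speculative = copy.deepcopy(state)
        unrecoverable = False

        # Run every instruction in the interval, tracking whether any of
        # them raised an unrecoverable condition.
        for instruction in instructions:
            if instruction(speculative):
                unrecoverable = True

        # Check whether an error occurred during the interval.
        if not error_detected():
            # No error: make the speculative writes visible.
            state.clear()
            state.update(speculative)
            return "committed", attempt

        if unrecoverable:
            # Error plus unrecoverable condition: a local restart is not
            # safe, so hand off to some higher-level recovery.
            return "escalate", attempt

        # Error without an unrecoverable condition: drop the speculative
        # copy and restart the interval from the unchanged state.

    return "escalate", max_attempts
```

In this sketch, an interval whose first attempt hits an error but raises no unrecoverable condition is simply re-run from the original state, which mirrors the final sentence of the abstract. The retry limit is an addition of the sketch; the abstract does not state one.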
Blumrich; Matthias A. (Yorktown Heights, NY), Chen; Dong (Yorktown Heights, NY), Gara; Alan (Yorktown Heights, NY), Giampapa; Mark E. (Yorktown Heights, NY), Heidelberger; Philip (Yorktown Heights, NY), Ohmacht; Martin (Yorktown Heights, NY), Steinmacher-Burow; Burkhard (Boeblingen, DE), Sugavanam; Krishnan (Yorktown Heights, NY)
International Business Machines Corporation (Armonk, NY)
12/696,780
January 29, 2010
GOVERNMENT CONTRACT This invention was made with Government support under Contract No. B554331 awarded by the Department of Energy. The Government has certain rights in this invention.