
Fault tolerance in a supercomputer through dynamic repartitioning

United States Patent

February 27, 2007
Lawrence Livermore National Laboratory
A multiprocessor parallel computer is made tolerant to hardware failures by providing extra groups of redundant standby processors and by designing the system so that these extra groups can be swapped with any group that experiences a hardware failure. The swap can be performed under software control, so the computer as a whole can sustain a hardware failure and, once the standby processors are swapped in, still appear to software as a pristine, fully functioning system.
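The swapping idea can be illustrated with a toy logical-to-physical mapping table. This is a minimal sketch under stated assumptions, not the patented mechanism: the class `PartitionMap`, its methods, and the group names are all hypothetical, and real hardware would repartition through its interconnect rather than through a Python dictionary. It shows only the core property described above: after a failed group is replaced by a standby group, the set of logical groups that software sees is unchanged.

```python
class PartitionMap:
    """Hypothetical model: logical processor groups (what software sees)
    backed by physical groups (actual hardware), plus standby spares."""

    def __init__(self, active_groups, spare_groups):
        # Logical group i starts out backed by physical group active_groups[i].
        self.logical_to_physical = dict(enumerate(active_groups))
        # Standby physical groups, used in the order given.
        self.spares = list(spare_groups)
        # Physical groups that have been swapped out after a failure.
        self.failed = set()

    def report_failure(self, physical_group):
        """Replace a failed physical group with the next standby group.

        Returns (logical_id, replacement). Raises KeyError if the group is
        not currently active and RuntimeError if no spares remain; in both
        cases the mapping is left unchanged.
        """
        logical_id = None
        for logical, physical in self.logical_to_physical.items():
            if physical == physical_group:
                logical_id = logical
                break
        if logical_id is None:
            raise KeyError(f"{physical_group!r} is not an active group")
        if not self.spares:
            raise RuntimeError("no standby groups left")
        replacement = self.spares.pop(0)
        # Only the physical backing changes; the logical id stays the same.
        self.logical_to_physical[logical_id] = replacement
        self.failed.add(physical_group)
        return logical_id, replacement

    def logical_view(self):
        """The logical groups visible to application software."""
        return sorted(self.logical_to_physical)
```

For example, `PartitionMap(["rack0", "rack1", "rack2"], ["spare0"])` reports the logical view `[0, 1, 2]` both before and after `report_failure("rack1")`, which remaps logical group 1 onto `"spare0"`. Keeping software addressing on the logical side of the table is what lets the swap stay invisible to applications.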
Inventors: Chen; Dong (Croton On Hudson, NY), Coteus; Paul W. (Yorktown Heights, NY), Gara; Alan G. (Mount Kisco, NY), Takken; Todd E. (Mount Kisco, NY)
Assignee: International Business Machines Corporation (Armonk, NY)
Application No.: 10/469,002
Filed: February 25, 2002
This invention was made with Government support under subcontract number B517552 under prime contract number W-7405-ENG-48 awarded by the Department of Energy. The Government has certain rights in this invention.