
Link failure detection in a parallel computer

United States Patent

7,831,866
November 9, 2010
Methods, apparatus, and products are disclosed for link failure detection in a parallel computer including compute nodes connected in a rectangular mesh network, each pair of adjacent compute nodes in the rectangular mesh network connected together using a pair of links, that includes: assigning each compute node to either a first group or a second group such that adjacent compute nodes in the rectangular mesh network are assigned to different groups; sending, by each of the compute nodes assigned to the first group, a first test message to each adjacent compute node assigned to the second group; determining, by each of the compute nodes assigned to the second group, whether the first test message was received from each adjacent compute node assigned to the first group; and notifying a user, by each of the compute nodes assigned to the second group, whether the first test message was received.
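The grouping and test-message steps described in the abstract can be illustrated with a small single-process simulation. This is only a sketch under stated assumptions, not the patented implementation: it assumes groups are assigned by the parity of each node's coordinate sum (one simple way to guarantee that adjacent mesh nodes land in different groups), and it models each link as a directed (sender, receiver) pair. The names `group_of`, `neighbors`, `detect_failed_links`, and `broken_links` are hypothetical and do not come from the patent.

```python
from itertools import product


def group_of(coord):
    """Return 0 (first group) or 1 (second group).

    Assumption: parity of the coordinate sum. Adjacent mesh nodes differ by
    one in exactly one coordinate, so they always get different groups.
    """
    return sum(coord) % 2


def neighbors(coord, dims):
    """Yield the nodes adjacent to coord in a rectangular mesh (no wraparound)."""
    for axis, size in enumerate(dims):
        for step in (-1, 1):
            value = coord[axis] + step
            if 0 <= value < size:
                yield coord[:axis] + (value,) + coord[axis + 1:]


def detect_failed_links(dims, broken_links):
    """Simulate one round of first-group to second-group test messages.

    broken_links is a hypothetical set of directed (sender, receiver) pairs
    whose messages are dropped. Returns a dict mapping every second-group
    node to the sorted list of neighbors it heard nothing from.
    """
    nodes = list(product(*(range(n) for n in dims)))

    # Send phase: each first-group node sends to every adjacent node.
    # In a mesh, all of those neighbors belong to the second group.
    inbox = {node: set() for node in nodes if group_of(node) == 1}
    for sender in nodes:
        if group_of(sender) != 0:
            continue
        for receiver in neighbors(sender, dims):
            if (sender, receiver) not in broken_links:
                inbox[receiver].add(sender)

    # Check phase: each second-group node compares what it received
    # against the neighbors it expected to hear from.
    report = {}
    for receiver, received in inbox.items():
        report[receiver] = sorted(
            n for n in neighbors(receiver, dims) if n not in received
        )
    return report


if __name__ == "__main__":
    # 3x3 mesh with one failed link from (0, 0) to (0, 1).
    result = detect_failed_links((3, 3), {((0, 0), (0, 1))})
    for node, missing in sorted(result.items()):
        status = "all test messages received" if not missing else f"missing from {missing}"
        print(node, status)
```

The sketch covers only the direction named in the abstract (first group sending, second group checking). Since each pair of adjacent nodes is connected by a pair of links, the other link of each pair would need a separate round with the roles of the two groups swapped.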
Archer; Charles J. (Rochester, MN), Blocksome; Michael A. (Rochester, MN), Megerian; Mark G. (Rochester, MN), Smith; Brian E. (Rochester, MN)
International Business Machines Corporation (Armonk, NY)
11/832,940
20090037773
August 2, 2007
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with Government support under Contract No. B554331 awarded by the Department of Energy. The Government has certain rights in this invention.