Identifying failure in a tree network of a parallel computer

United States Patent

7,783,933
August 24, 2010
Methods, parallel computers, and products are provided for identifying failure in a tree network of a parallel computer. The parallel computer includes one or more processing sets, each including an I/O node and a plurality of compute nodes. For each processing set, embodiments include selecting a set of test compute nodes, the test compute nodes being a subset of the compute nodes of the processing set; measuring the performance of the I/O node of the processing set; measuring the performance of the selected set of test compute nodes; calculating a current test value in dependence upon the measured performance of the I/O node of the processing set, the measured performance of the set of test compute nodes, and a predetermined value for I/O node performance; and comparing the current test value with a predetermined tree performance threshold. If the current test value is below the predetermined tree performance threshold, embodiments include selecting another set of test compute nodes. If the current test value is not below the predetermined tree performance threshold, embodiments include selecting one or more potential problem nodes from the test compute nodes and individually testing the potential problem nodes and the links to them.
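The abstract leaves three details open: how performance is measured, the exact formula for the current test value, and the rule for choosing potential problem nodes. The Python sketch below illustrates the control flow only, under stated assumptions. Every name in it (`ProcessingSet`, `current_test_value`, `find_tree_failures`, the callback parameters, `max_rounds`) is hypothetical. The test-value formula (nominal I/O performance lost between the I/O node and the test compute nodes) and the suspect rule (test nodes slower than the subset average) are illustrative guesses, not the patented method. Measurements are injected as callbacks so the logic can run against simulated numbers.

```python
import random
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass
class ProcessingSet:
    """Hypothetical model of a processing set: one I/O node plus its compute nodes."""
    io_node: str
    compute_nodes: List[str]


def current_test_value(io_measured: float, compute_measured: float, io_expected: float) -> float:
    """Hypothetical formula: the share of the predetermined (nominal) I/O node
    performance lost between the I/O node and the test compute nodes.
    Larger values indicate a worse-performing tree."""
    return (io_measured - compute_measured) / io_expected


def find_tree_failures(
    pset: ProcessingSet,
    measure_io: Callable[[str], float],
    measure_nodes: Callable[[Sequence[str]], Dict[str, float]],
    test_node_and_link: Callable[[str], bool],
    io_expected: float,
    threshold: float,
    subset_size: int,
    max_rounds: int,
    rng: random.Random,
) -> List[str]:
    """Return compute nodes whose node or link is confirmed faulty (empty if none found)."""
    for _ in range(max_rounds):  # max_rounds is an assumed bound on reselection
        # Select a set of test compute nodes: a subset of the pset's compute nodes.
        tests = rng.sample(pset.compute_nodes, subset_size)
        io_perf = measure_io(pset.io_node)
        per_node = measure_nodes(tests)
        compute_perf = sum(per_node.values()) / len(per_node)

        value = current_test_value(io_perf, compute_perf, io_expected)
        if value < threshold:
            continue  # Below threshold: the tree looks healthy, so select another subset.

        # Not below threshold: choose potential problem nodes. Hypothetical rule:
        # test nodes that performed worse than the subset's average.
        suspects = [n for n in tests if per_node[n] < compute_perf]
        # Test each potential problem node (and its link) individually.
        return [n for n in suspects if test_node_and_link(n)]
    return []


# Usage with simulated measurements: node "c5" (or its link) is degraded.
nodes = [f"c{i}" for i in range(8)]
pset = ProcessingSet("io0", nodes)
perf = {n: 95.0 for n in nodes}
perf["c5"] = 40.0
failures = find_tree_failures(
    pset,
    measure_io=lambda io: 100.0,
    measure_nodes=lambda ns: {n: perf[n] for n in ns},
    test_node_and_link=lambda n: n == "c5",
    io_expected=100.0,
    threshold=0.1,
    subset_size=4,
    max_rounds=50,
    rng=random.Random(0),
)
print(failures)
```

With these simulated numbers, a subset that excludes `c5` scores (100 - 95) / 100 = 0.05, below the threshold, so another subset is drawn. A subset that includes `c5` averages 81.25 and scores 0.1875, which triggers the individual tests. Injecting the measurements as callbacks keeps the search logic separate from however a real system would time I/O and tree traffic.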
Inventors: Archer; Charles J. (Rochester, MN), Pinnow; Kurt W. (Rochester, MN), Wallenfelt; Brian P. (Eden Prairie, MN)
Assignee: International Business Machines Corporation (Armonk, NY)
Appl. No.: 11/531,787
Pub. No.: 20080072101
Filed: September 14, 2006
GOVERNMENT RIGHTS IN INVENTION

This invention was made with Government support under Contract No. B519700 awarded by the Department of Energy. The Government has certain rights in this invention.