Method and apparatus for analyzing error conditions in a massively parallel computer system by identifying anomalous nodes within a communicator set

United States Patent

April 19, 2011
An analytical mechanism for a massively parallel computer system automatically analyzes data retrieved from the system, and identifies nodes which exhibit anomalous behavior in comparison to their immediate neighbors. Preferably, anomalous behavior is determined by comparing call-return stack tracebacks for each node, grouping like nodes together, and identifying neighboring nodes which do not themselves belong to the group. A node, not itself in the group, having a large number of neighbors in the group, is a likely locality of error. The analyzer preferably presents this information to the user by sorting the neighbors according to number of adjoining members of the group.
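The analysis described in the abstract can be illustrated with a minimal Python sketch. It is not the patented implementation; it only shows the general idea of grouping nodes by identical traceback, counting for each outside node how many of its neighbors fall inside the group, and sorting by that count. The function names `rank_anomalous_neighbors` and `grid_neighbors`, the dictionary-based inputs, and the 2D grid topology are all assumptions made for illustration; the abstract does not specify data structures or the network topology.

```python
from collections import defaultdict


def grid_neighbors(width, height):
    """Hypothetical helper: adjacency map for a simple 2D grid (no wraparound).

    The real machine's topology is not given in the abstract; a grid is
    used here only to have something concrete to analyze.
    """
    adj = {}
    for y in range(height):
        for x in range(width):
            node = y * width + x
            adj[node] = [
                ny * width + nx
                for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
                if 0 <= nx < width and 0 <= ny < height
            ]
    return adj


def rank_anomalous_neighbors(tracebacks, neighbors):
    """Group nodes by traceback and rank the outsiders bordering each group.

    tracebacks: dict mapping node id -> sequence of stack frame names
    neighbors:  dict mapping node id -> iterable of adjacent node ids
    Returns a dict mapping each distinct traceback (as a tuple) to a list of
    (outside_node, adjoining_member_count) pairs, highest count first.
    """
    # Step 1: group nodes whose call-return stack tracebacks are identical.
    groups = defaultdict(set)
    for node, stack in tracebacks.items():
        groups[tuple(stack)].add(node)

    report = {}
    for stack, members in groups.items():
        # Step 2: for each neighbor outside the group, count how many
        # group members adjoin it.
        counts = defaultdict(int)
        for member in members:
            for nb in neighbors.get(member, ()):
                if nb not in members:
                    counts[nb] += 1
        # Step 3: sort by adjoining-member count (descending), then node id.
        report[stack] = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return report
```

For example, on a 3x3 grid where every node except the center reports the same traceback, the center node is the only outsider bordering the large group and adjoins four of its members, so it appears at the top of that group's list as the likely locality of error. The frame names used to label tracebacks in such a test are arbitrary.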
Gooding; Thomas Michael (Rochester, MN)
International Business Machines Corporation (Armonk, NY)
11/425,773
June 22, 2006
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with Government support under Contract No. B591700 awarded by the Department of Energy. The Government has certain rights in this invention.