Individual components of a parallel system perform system maintenance operations while they are idle, waiting for synchronization with other components. When all applicable components reach the synchronization point, each component suspends further system maintenance until it is again idle at another synchronization point. Preferably, each component is a node of a multi-node system, the node having at least one processor and a nodal memory. The system maintenance operation is preferably an interruptible and resumable diagnostic, such as a memory check. Although the amount of time allotted to system maintenance varies by component, over many synchronization points the cumulative idle time in each node is sufficient to complete the maintenance operation.
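By way of illustration only, the idle-time maintenance described above might be sketched as the following deterministic simulation. It is a minimal sketch under stated assumptions, not the claimed implementation: the names ResumableMemoryCheck, idle_at_sync_point and simulate are hypothetical, the memory check simply compares each word against an expected value, and idle time is modeled as a count of steps (the gap between a node's arrival and the arrival of the last node) rather than measured on real hardware.

```python
from dataclasses import dataclass, field


@dataclass
class ResumableMemoryCheck:
    """Hypothetical interruptible diagnostic: verifies one word per step."""
    memory: list
    expected: int = 0
    position: int = 0  # resume point, preserved across suspensions
    errors: list = field(default_factory=list)

    @property
    def done(self) -> bool:
        return self.position >= len(self.memory)

    def step(self) -> None:
        """Check a single word, then advance the resume point."""
        if self.done:
            return
        if self.memory[self.position] != self.expected:
            self.errors.append(self.position)
        self.position += 1


def idle_at_sync_point(check, idle_steps):
    """Run maintenance only during the idle interval, then suspend.

    idle_steps is an assumed model of how long this node waits for its
    peers. Returns the number of maintenance steps actually performed.
    """
    used = 0
    while used < idle_steps and not check.done:
        check.step()
        used += 1
    return used


def simulate(arrival_times_per_sync, memory_words=16):
    """Each inner list holds every node's arrival time at one sync point."""
    node_count = len(arrival_times_per_sync[0])
    checks = [ResumableMemoryCheck([0] * memory_words) for _ in range(node_count)]
    for arrivals in arrival_times_per_sync:
        # Synchronization completes when the last node arrives.
        release = max(arrivals)
        for check, arrived in zip(checks, arrivals):
            idle_at_sync_point(check, release - arrived)
    return checks
```

In this sketch, a node that arrives last at a given synchronization point performs no maintenance there, yet it can still finish its check over later synchronization points at which it arrives early, because the resume point is retained between suspensions.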
STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT
 This invention was made with Government support under Contract No. B519700 awarded by the Department of Energy. The Government has certain rights in this invention.