

United States Patent Application

Lawrence Berkeley National Laboratory, Technology Transfer and Intellectual Property Management Department
Systems, apparatuses, and methods for achieving balanced execution in a multi-node cluster through runtime detection of performance variation are described. During a training phase, performance counters and the amount of time spent waiting for synchronization are monitored for a plurality of tasks on each node of the multi-node cluster. These values are used to generate a model that correlates the values of the performance counters with the amount of time spent waiting for synchronization. Once the model is built, the values of the performance counters are monitored for a period of time at the start of each task, and these values are input into the model. The model generates a prediction of whether a given node is on the critical path. If the given node is predicted to be on the critical path, the power allocation of the given node is increased.
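The flow described in the abstract can be illustrated with a minimal sketch. It is not the method claimed in the application; it only mirrors the two phases (training, then per-task prediction) under several assumptions that do not come from the source: a single performance counter per node, a one-variable least-squares fit as the model, a rule that a low predicted wait time means the node is on the critical path, and a fixed power boost. The names `fit_model`, `rebalance`, `wait_threshold`, and `boost_watts` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinearModel:
    """Assumed model form: wait_seconds ~ slope * counter_value + intercept."""
    slope: float
    intercept: float

    def predict(self, counter_value: float) -> float:
        return self.slope * counter_value + self.intercept


def fit_model(samples):
    """Training phase: least-squares fit of synchronization wait time
    against one counter value.

    samples: list of (counter_value, wait_seconds) pairs gathered over
    many tasks and nodes (hypothetical data layout).
    """
    n = len(samples)
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in samples)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    slope = sxy / sxx
    return LinearModel(slope, mean_y - slope * mean_x)


def rebalance(model, early_counters, power_watts,
              wait_threshold=0.5, boost_watts=10.0):
    """Prediction phase: use counters sampled at the start of a task.

    Assumption: a node predicted to wait less than wait_threshold seconds
    is the one the others wait for, so it is treated as being on the
    critical path and receives boost_watts of extra power.
    """
    new_power = dict(power_watts)
    for node, counter_value in early_counters.items():
        if model.predict(counter_value) < wait_threshold:
            new_power[node] += boost_watts
    return new_power


# Hypothetical training data and one rebalancing step.
model = fit_model([(1.0, 0.2), (2.0, 1.2), (3.0, 2.2)])
allocation = rebalance(model, {"n0": 1.1, "n1": 2.5},
                       {"n0": 100.0, "n1": 100.0})
```

In this example only node `n0` has a predicted wait time below the threshold, so only its allocation is raised. A real system would use many counters, a richer model, and a cluster-wide power budget, none of which are specified in the abstract.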
Kocoloski, Brian J. (Pittsburgh, PA), Piga, Leonardo (Austin, TX), Huang, Wei (Frisco, TX), Paul, Indrani (Round Rock, TX)
15/192,764
June 24, 2016
[0001] The invention described herein was made with government support under contract number DE-AC02-05CH11231 awarded by the United States Department of Energy. The United States Government has certain rights in the invention.