A stream prefetch engine performs data retrieval in a parallel computing system. The engine receives a load request from at least one processor. The engine evaluates whether a first memory address requested in the load request is present and valid in a table. The engine checks whether there exists valid data corresponding to the first memory address in an array if the first memory address is present and valid in the table. The engine increments a prefetching depth of a first stream that the first memory address belongs to and fetching a cache line associated with the first memory address from the at least one cache memory device if there is not yet valid data corresponding to the first memory address in the array. The engine determines whether prefetching of additional data is needed for the first stream within its prefetching depth. The engine prefetches the additional data if the prefetching is needed.