Minimizing completion time for loop tiling with computation and communication overlapping