Several scheduling schemes exist for parallelizing loops without dependences on shared and distributed memory systems. Efficiently parallelizing loops with dependences, however, is a more complicated task. It becomes even more difficult when the loops are executed on a distributed memory cluster, where communication and synchronization can become a bottleneck. The main problem is processor idle time, which occurs during the initial and final stages of the execution. In this paper we propose a new scheduling scheme that minimizes processor idle time and thereby improves load balancing and performance. The new scheme is applied to two-dimensional iteration spaces with dependences. It follows a tiled wavefront pattern in which the tile size gradually decreases in all dimensions. We have tested the proposed scheme on a dedicated, homogeneous cluster of workstations and verified that it significantly improves execution times compared with scheduling based on traditional tiling.
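
To make the tiled wavefront pattern concrete, the following Python sketch partitions a two-dimensional iteration space into tiles whose sizes shrink along both dimensions and groups the tiles into wavefronts. It is only an illustration under stated assumptions, not the scheme evaluated in this paper: the halving rule, the function names `decreasing_sizes` and `wavefronts`, and the assumed dependence pattern (each iteration depends on its upper and left neighbours) are all hypothetical choices made for the example.

```python
from itertools import accumulate


def decreasing_sizes(extent, first, minimum=1):
    """Split `extent` iterations into chunk sizes that shrink step by step.

    Assumption: each chunk is half the previous one (never below `minimum`);
    the actual decrease rule of the proposed scheme may differ.
    """
    sizes = []
    remaining = extent
    size = first
    while remaining > 0:
        # Clamp to the minimum size, but never exceed what is left.
        size = min(remaining, max(minimum, size))
        sizes.append(size)
        remaining -= size
        size //= 2
    return sizes


def wavefronts(rows, cols, first_tile):
    """Group tiles (row0, col0, height, width) by anti-diagonal index.

    Assumption: iteration (i, j) depends on (i-1, j) and (i, j-1), so tiles
    on the same anti-diagonal are independent and may run in parallel.
    """
    row_sizes = decreasing_sizes(rows, first_tile)
    col_sizes = decreasing_sizes(cols, first_tile)
    # Starting offset of each chunk is the sum of the chunks before it.
    row_starts = [0] + list(accumulate(row_sizes))[:-1]
    col_starts = [0] + list(accumulate(col_sizes))[:-1]

    waves = [[] for _ in range(len(row_sizes) + len(col_sizes) - 1)]
    for ti, (r0, h) in enumerate(zip(row_starts, row_sizes)):
        for tj, (c0, w) in enumerate(zip(col_starts, col_sizes)):
            waves[ti + tj].append((r0, c0, h, w))
    return waves


if __name__ == "__main__":
    for step, tiles in enumerate(wavefronts(16, 16, 8)):
        print(step, tiles)
```

Each wavefront must finish before the next one starts, while the tiles inside one wavefront can be assigned to different processors. In this sketch the tiles become smaller toward the end of the iteration space, so the final wavefronts carry less work per tile.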