Collective I/O

In many parallel applications, each process may need to access several noncontiguous portions of a file, yet the requests of different processes are often interleaved and together may span large contiguous portions of the file. I/O performance can therefore be improved significantly by merging the requests of different processes and servicing the merged request. This optimization is broadly referred to as collective I/O and can be performed at the disk level (disk-directed I/O), at the server level (server-directed I/O), or at the client level (two-phase I/O).
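The merging idea can be illustrated with a minimal sketch. It assumes each request is an (offset, length) pair in bytes; the function name merge_requests and the access pattern are hypothetical and are not taken from any particular I/O library.

```python
def merge_requests(per_process_requests):
    """Coalesce (offset, length) requests from all processes into contiguous extents."""
    # Flatten the per-process lists and sort every request by file offset.
    all_requests = sorted(r for reqs in per_process_requests for r in reqs)
    merged = []
    for offset, length in all_requests:
        if merged and offset <= merged[-1][0] + merged[-1][1]:
            # Adjacent to or overlapping the previous extent: extend it.
            last_offset, last_length = merged[-1]
            new_end = max(last_offset + last_length, offset + length)
            merged[-1] = (last_offset, new_end - last_offset)
        else:
            # A gap in the file: start a new extent.
            merged.append((offset, length))
    return merged


# Hypothetical pattern: 4 processes, each requesting every 4th block of 10 bytes.
requests = [[(10 * (4 * i + rank), 10) for i in range(3)] for rank in range(4)]
merged = merge_requests(requests)
print(sum(len(r) for r in requests), "separate requests merge into", merged)
```

In this pattern no single process issues a contiguous request, but the twelve small requests taken together cover one contiguous extent of the file.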

Two-phase I/O is also used in the ROMIO implementation. If the entire I/O access pattern of all processes is known, the data can be accessed efficiently by splitting the access into two phases. In the first phase, processes access the file as if the data were distributed in memory in a way that lets each process make a single, large, contiguous access. In the second phase, processes redistribute the data among themselves to obtain the desired distribution.

The advantage of this method is that making all file accesses large and contiguous reduces the I/O time significantly. The added cost of interprocess communication for redistribution is small compared with the savings in I/O time.
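The two phases of a collective read can be sketched as a single-process simulation. This is only an illustration under stated assumptions: the file is a bytes object, the desired distribution is block-cyclic, the file length divides evenly among processes, and the per-process chunk is a multiple of the block size. The names direct_read and two_phase_read are hypothetical, and real implementations such as ROMIO use message passing rather than shared lists.

```python
def direct_read(file_data, nprocs, block):
    """Each process reads its own blocks one by one (one request per block)."""
    nblocks = len(file_data) // block
    result, io_requests = [], 0
    for rank in range(nprocs):
        mine = bytearray()
        for b in range(rank, nblocks, nprocs):
            mine += file_data[b * block:(b + 1) * block]
            io_requests += 1
        result.append(bytes(mine))
    return result, io_requests


def two_phase_read(file_data, nprocs, block):
    """Simulate a two-phase collective read of a block-cyclic distribution."""
    chunk = len(file_data) // nprocs
    # Phase 1 (I/O): each process reads one large contiguous chunk.
    staged, io_requests = [], 0
    for rank in range(nprocs):
        staged.append(file_data[rank * chunk:(rank + 1) * chunk])
        io_requests += 1
    # Phase 2 (communication): forward each block to the process that wants it.
    result = [bytearray() for _ in range(nprocs)]
    for src in range(nprocs):
        for local in range(0, chunk, block):
            file_offset = src * chunk + local
            owner = (file_offset // block) % nprocs
            result[owner] += staged[src][local:local + block]
    return [bytes(r) for r in result], io_requests


file_data = bytes(range(32))
direct, direct_requests = direct_read(file_data, nprocs=4, block=2)
collective, collective_requests = two_phase_read(file_data, nprocs=4, block=2)
print(direct_requests, "requests without collective I/O,",
      collective_requests, "with two-phase I/O")
```

Both functions deliver the same data to every process; the difference is that the two-phase version issues one file request per process instead of one per block, at the price of the redistribution step.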

The algorithm for collective writes is similar to that for collective reads, except that the first phase of the two-phase operation is communication and the second phase is I/O.
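The reversed order of phases for writes can be sketched in the same simulated style. The assumptions match those of a block-cyclic distribution with equal-sized per-process buffers whose contiguous file chunk is a multiple of the block size; two_phase_write is a hypothetical name, and the lists stand in for interprocess messages.

```python
def two_phase_write(local_data, block):
    """Simulate a two-phase collective write of block-cyclic data."""
    nprocs = len(local_data)
    total = sum(len(d) for d in local_data)
    chunk = total // nprocs
    # Phase 1 (communication): send each block to the process responsible
    # for the contiguous file region that contains it.
    staged = [bytearray(chunk) for _ in range(nprocs)]
    for rank, data in enumerate(local_data):
        for i in range(0, len(data), block):
            file_offset = ((i // block) * nprocs + rank) * block
            dest = file_offset // chunk
            start = file_offset - dest * chunk
            staged[dest][start:start + block] = data[i:i + block]
    # Phase 2 (I/O): each process writes one large contiguous chunk.
    file_data = bytearray(total)
    io_requests = 0
    for rank in range(nprocs):
        file_data[rank * chunk:(rank + 1) * chunk] = staged[rank]
        io_requests += 1
    return bytes(file_data), io_requests


# Hypothetical input: 4 processes each hold every 4th 2-byte block of a 32-byte file.
local_data = [
    bytes(b for k in range(4) for b in (8 * k + 2 * rank, 8 * k + 2 * rank + 1))
    for rank in range(4)
]
written, write_requests = two_phase_write(local_data, block=2)
print(write_requests, "contiguous writes produced", len(written), "bytes")
```

Writing the blocks directly would take sixteen small writes in this example; after the communication phase, four contiguous writes are enough.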

Created by Katarzyna Zając