Collective Message Passing Primitives

These are primitives for communicating among all the processes of a communicator. Unlike point-to-point communication, sender and receivers alike invoke the identical MPI function call, with the whole communicator using either the blocking form or the immediate form of that call.

Basic Collective Message Passing APIs

1. Broadcast

int MPI_Bcast (void* buffer, int count, MPI_Datatype type,
               int sourceRank, MPI_Comm comm);

int MPI_Ibcast(void* buffer, int count, MPI_Datatype type,
               int sourceRank, MPI_Comm comm, MPI_Request* request);

Sender: In code for rank==sourceRank:

dtype buf[] = { ...data to be broadcast... };
MPI_Bcast(buf, count, type, sourceRank, comm);

Receiver: In code for rank≠sourceRank:

dtype buf[count];  // data will be deposited here
MPI_Bcast(buf, count, type, sourceRank, comm);
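
As a concrete illustration, the sketch below is a complete program in which rank 0 broadcasts four ints to every process; the buffer size of four and the choice of rank 0 as sourceRank are assumptions made only for this example.

#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int sourceRank = 0;       // assumed root for the illustration
    int buf[4];

    if (rank == sourceRank) {
        // Only the source fills the buffer before the call.
        for (int i = 0; i < 4; ++i) buf[i] = 10 * (i + 1);
    }

    // Every rank makes the identical call; on return, all ranks hold the same data.
    MPI_Bcast(buf, 4, MPI_INT, sourceRank, MPI_COMM_WORLD);

    std::printf("rank %d: %d %d %d %d\n", rank, buf[0], buf[1], buf[2], buf[3]);

    MPI_Finalize();
    return 0;
}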

2. Scatter

int MPI_Scatter (const void* sendBuffer, int sendCount, MPI_Datatype sendType,
                       void* recvBuffer, int recvCount, MPI_Datatype recvType,
                       int sourceRank, MPI_Comm comm);

int MPI_Iscatter(const void* sendBuffer, int sendCount, MPI_Datatype sendType,
                       void* recvBuffer, int recvCount, MPI_Datatype recvType,
                       int sourceRank, MPI_Comm comm, MPI_Request* request);

Let totalSize = N * rc, where N is the size of the communicator, and rc is the count to be received by each process.

Sender: In code for rank==sourceRank:

dtype buf[] = { ...data to be scattered... }; // length of "buf" is totalSize
MPI_Scatter(buf, rc, type, xxx, nnn, type, sourceRank, comm);

where: "xxx, nnn" is either "MPI_IN_PLACE, 0" (the source's own chunk stays where it already sits in "buf") or "anotherBuf, rc" (the source's chunk is copied into "anotherBuf").

Receiver: In code for rank≠sourceRank:

dtype buf[rc];  // data will be deposited here
MPI_Scatter(nullptr, rc, type, buf, rc, type, sourceRank, comm);
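
Putting the two sides together, the sketch below assumes rc = 4 ints per process and rank 0 as sourceRank; the source uses the MPI_IN_PLACE form, so its own chunk stays where it already sits in the send buffer.

#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int sourceRank = 0;               // assumed root for the illustration
    const int rc = 4;                       // count received by each process

    if (rank == sourceRank) {
        std::vector<int> buf(rc * size);    // totalSize = N * rc
        for (int i = 0; i < rc * size; ++i) buf[i] = i;
        // MPI_IN_PLACE: the source keeps its own chunk (elements 0..rc-1) in buf;
        // the receive count and type are ignored in this form.
        MPI_Scatter(buf.data(), rc, MPI_INT,
                    MPI_IN_PLACE, 0, MPI_INT, sourceRank, MPI_COMM_WORLD);
        std::printf("rank %d keeps %d..%d\n", rank, buf[0], buf[rc - 1]);
    } else {
        int buf[rc];                        // data is deposited here
        MPI_Scatter(nullptr, 0, MPI_INT,    // send arguments are ignored off-source
                    buf, rc, MPI_INT, sourceRank, MPI_COMM_WORLD);
        std::printf("rank %d got %d..%d\n", rank, buf[0], buf[rc - 1]);
    }

    MPI_Finalize();
    return 0;
}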

3. Gather

The function prototype for MPI_Gather is identical to that for MPI_Scatter, except that the rank parameter now names the destination ("destRank") rather than the source ("sourceRank").

int MPI_Gather (const void* sendBuffer, int sendCount, MPI_Datatype sendType,
                      void* recvBuffer, int recvCount, MPI_Datatype recvType,
                      int destRank, MPI_Comm comm);

int MPI_Igather(const void* sendBuffer, int sendCount, MPI_Datatype sendType,
                      void* recvBuffer, int recvCount, MPI_Datatype recvType,
                      int destRank, MPI_Comm comm, MPI_Request* request);

Receiver: In code for rank==destRank:

dtype buf[totalSize];
MPI_Gather(xxx, nnn, type, buf, rc, type, destRank, comm);

where: "xxx, nnn" is either "MPI_IN_PLACE, 0" (destRank's own contribution is assumed to already occupy its slot of "buf") or "localBuf, rc" (a separate buffer holding destRank's own contribution).

Sender: In code for rank≠destRank:

dtype buf[rc] = { ...data to be gathered back by destRank process... };
MPI_Gather(buf, rc, type, nullptr, rc, type, destRank, comm);
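
The sketch below shows the corresponding gather, again assuming rc = 4 and rank 0 (now as destRank): every process contributes rc ints, which arrive at destRank in rank order.

#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int destRank = 0;                 // assumed destination for the illustration
    const int rc = 4;

    int localBuf[rc];
    for (int i = 0; i < rc; ++i) localBuf[i] = rank * 100 + i;   // this rank's contribution

    if (rank == destRank) {
        std::vector<int> buf(rc * size);    // totalSize = N * rc
        MPI_Gather(localBuf, rc, MPI_INT,
                   buf.data(), rc, MPI_INT, destRank, MPI_COMM_WORLD);
        // buf now holds rank 0's chunk, then rank 1's chunk, and so on.
        std::printf("destRank gathered %zu values\n", buf.size());
    } else {
        MPI_Gather(localBuf, rc, MPI_INT,
                   nullptr, rc, MPI_INT, destRank, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}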

Variations on Scatter and Gather

MPI_Scatterv, MPI_Iscatterv, MPI_Gatherv, and MPI_Igatherv allow assigning variable amounts of work to processes; for example, when the size of work is not evenly divisible by the number of processes over which the work is distributed.
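
As a sketch of the uneven case, the program below distributes totalSize = 10 elements over the communicator, giving the first totalSize mod N ranks one extra element each; the counts and displacements shown are just one possible layout, chosen for illustration.

#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int sourceRank = 0;
    const int totalSize = 10;                   // not necessarily divisible by size

    // Per-rank counts and starting offsets into the source's buffer.
    std::vector<int> counts(size), displs(size);
    for (int r = 0, offset = 0; r < size; ++r) {
        counts[r] = totalSize / size + (r < totalSize % size ? 1 : 0);
        displs[r] = offset;
        offset += counts[r];
    }

    std::vector<int> sendBuf;
    if (rank == sourceRank) {
        sendBuf.resize(totalSize);
        for (int i = 0; i < totalSize; ++i) sendBuf[i] = i;
    }

    std::vector<int> recvBuf(counts[rank]);     // each rank's share
    MPI_Scatterv(sendBuf.data(), counts.data(), displs.data(), MPI_INT,
                 recvBuf.data(), counts[rank], MPI_INT, sourceRank, MPI_COMM_WORLD);

    std::printf("rank %d received %d elements\n", rank, counts[rank]);

    MPI_Finalize();
    return 0;
}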

4. Reduce

int MPI_Reduce (const void* sendBuffer, void* recvBuffer, int count, MPI_Datatype type,
                MPI_Op op, int destRank, MPI_Comm comm);

int MPI_Ireduce(const void* sendBuffer, void* recvBuffer, int count, MPI_Datatype type,
                MPI_Op op, int destRank, MPI_Comm comm, MPI_Request* request);

Receiver: In code for rank==destRank:

dtype buf[count]; // the reduced data is deposited here
// "localBuf" holds this process's own contribution; it must be a buffer distinct from "buf".
// To reduce in place at destRank, pass MPI_IN_PLACE as the send-buffer argument instead.
MPI_Reduce(localBuf, buf, count, type, theOp, destRank, comm);

Sender: In code for rank≠destRank:

dtype localBuf[count] = { ...the data... };
MPI_Reduce(localBuf, nullptr, count, type, theOp, destRank, comm);
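
The sketch below assumes count = 4, MPI_SUM as the operation, and rank 0 as destRank: every rank contributes a local array and destRank receives the element-wise sum.

#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    const int destRank = 0;
    const int count = 4;

    int localBuf[count];
    for (int i = 0; i < count; ++i) localBuf[i] = rank + i;   // this rank's contribution

    if (rank == destRank) {
        int buf[count];                                       // the reduction result lands here
        MPI_Reduce(localBuf, buf, count, MPI_INT, MPI_SUM, destRank, MPI_COMM_WORLD);
        std::printf("sum of element 0 over all ranks: %d\n", buf[0]);
    } else {
        MPI_Reduce(localBuf, nullptr, count, MPI_INT, MPI_SUM, destRank, MPI_COMM_WORLD);
    }

    MPI_Finalize();
    return 0;
}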

Advanced Collective Message Passing APIs

1. All-to-all Scattering

int MPI_Alltoall (const void* sendBuffer, int sendCount, MPI_Datatype sendType,
                        void* recvBuffer, int recvCount, MPI_Datatype recvType, MPI_Comm comm);

(Also MPI_Alltoallv and immediate versions of both.)
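
A sketch assuming one int per destination: each rank r places the value intended for rank j at position j of its send buffer, and after the call position i of its receive buffer holds the value that rank i addressed to r.

#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    std::vector<int> sendBuf(size), recvBuf(size);
    for (int j = 0; j < size; ++j) sendBuf[j] = rank * 100 + j;   // destined for rank j

    // The counts are per source/destination pair, not totals.
    MPI_Alltoall(sendBuf.data(), 1, MPI_INT,
                 recvBuf.data(), 1, MPI_INT, MPI_COMM_WORLD);

    std::printf("rank %d received %d from rank 0\n", rank, recvBuf[0]);

    MPI_Finalize();
    return 0;
}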

2. All-to-all Gathering

int MPI_Allgather(const void* sendBuffer, int sendCount, MPI_Datatype sendType,
                        void* recvBuffer, int recvCount, MPI_Datatype recvType, MPI_Comm comm);

(Also MPI_Allgatherv and immediate versions of both.)
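
A sketch in which each rank contributes a single int; after the call every rank, not just a designated root, holds the full rank-ordered array.

#include <mpi.h>
#include <cstdio>
#include <vector>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int myValue = rank * rank;                 // this rank's contribution
    std::vector<int> allValues(size);          // every rank receives the whole array

    MPI_Allgather(&myValue, 1, MPI_INT,
                  allValues.data(), 1, MPI_INT, MPI_COMM_WORLD);

    std::printf("rank %d sees allValues[%d] = %d\n", rank, size - 1, allValues[size - 1]);

    MPI_Finalize();
    return 0;
}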

3. All-to-all Reduction

int MPI_Allreduce(const void* sendBuffer, void* recvBuffer, int count, MPI_Datatype type,
                  MPI_Op op, MPI_Comm comm);

(Also MPI_Iallreduce.)
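
A sketch of a common use: every rank contributes a local value and every rank receives the global maximum, for example as part of a convergence test.

#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);

    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    double localError = 1.0 / (rank + 1);      // this rank's local residual (illustrative)
    double globalError;                        // the same result appears on every rank

    MPI_Allreduce(&localError, &globalError, 1, MPI_DOUBLE, MPI_MAX, MPI_COMM_WORLD);

    std::printf("rank %d: global max error = %f\n", rank, globalError);

    MPI_Finalize();
    return 0;
}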

Global Synchronization