4  Chapter 3: Distributed Memory Programming with MPI

In this chapter, we focus on using the OpenMPI library with C to write SPMD (single program, multiple data) parallel programs that run on distributed memory systems, using multiple processes as opposed to multithreading.

4.1 OpenMPI Cheatsheet

4.1.1 Import

#include <mpi.h>

4.1.2 Compilation

mpicc -g -Wall -o path/to/executable path/to/source/code

4.1.3 Execution

mpiexec -n <number of processes> path/to/executable

4.1.4 Initialization

int MPI_Init(
    int* argc_p,
    char*** argv_p
);

Takes pointers to the arguments given to the main function (argc and argv, the values passed on the command line when executing). It must be called before any other MPI function. Since the main function in these notes usually takes no input, you will most likely call it in the following manner

MPI_Init(NULL, NULL);

4.1.5 Finalization

int MPI_Finalize(void);

Usually called

MPI_Finalize();

4.1.6 MPI_COMM_WORLD

The most commonly used communicator in MPI is MPI_COMM_WORLD (all caps). It contains all the processes that were started when the program was launched.

4.1.7 Number of Processes in Communicator

int MPI_Comm_size(
    MPI_Comm comm,
    int* comm_sz_p
);

You give it an MPI communicator and an integer pointer. It places the number of processes in the passed pointer. Usually used in the following manner

int comm_sz;
MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);

The variable comm_sz now stores the number of processes in MPI_COMM_WORLD.

4.1.8 Current Process Rank

The rank acts like the id of the process within its communicator. It is an integer between 0 and \(p-1\) (inclusive), where \(p\) is the number of processes in the communicator.

int MPI_Comm_rank(
    MPI_Comm comm,
    int* my_rank_p
);

Its interface is very similar to MPI_Comm_size. It is usually used in the following manner

int my_rank;
MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

4.1.9 Datatypes

MPI Datatype          C Datatype
MPI_CHAR              char
MPI_SHORT             signed short int
MPI_INT               signed int
MPI_LONG              signed long int
MPI_LONG_LONG         signed long long int
MPI_UNSIGNED_CHAR     unsigned char
MPI_UNSIGNED_SHORT    unsigned short int
MPI_UNSIGNED          unsigned int
MPI_UNSIGNED_LONG     unsigned long int
MPI_FLOAT             float
MPI_DOUBLE            double
MPI_LONG_DOUBLE       long double
MPI_BYTE              (no corresponding C type)
MPI_PACKED            (no corresponding C type)

4.1.10 Sending Data

int MPI_Send(
    void* msg_buffer_pointer,
    int msg_size,
    MPI_Datatype msg_type,
    int dest_rank,
    int tag,
    MPI_Comm communicator
);

MPI_Send is used to send a message from one process to another. It takes 6 parameters:

  1. A pointer to the buffer containing the message to be sent
  2. The number of elements of type msg_type to be sent. For example, if you are sending a string s, the message type will be MPI_CHAR and the value of msg_size will be strlen(s) + 1, where the +1 accounts for the \0 character at the end.
  3. The type of the elements being sent
  4. The rank of the destination process
  5. The tag of the message, a non-negative integer the receiver can use to tell messages apart
  6. The communicator; usually MPI_COMM_WORLD

Example of using it:

MPI_Send(greeting, strlen(greeting)+1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);

4.1.11 Receiving Data

int MPI_Recv(
    void* msg_buffer_pointer,
    int buffer_size,
    MPI_Datatype buffer_type,
    int source_rank,
    int tag,
    MPI_Comm communicator,
    MPI_Status* status_pointer
);

It takes 7 parameters. The first 6 mirror those of MPI_Send, with two differences: the second parameter is the capacity of the receive buffer, which can be bigger (but not smaller) than the received message, and the fourth parameter is the rank of the process the message comes from rather than the rank it goes to. The last parameter has its own section. If you are not interested in receiving the status, just pass MPI_STATUS_IGNORE to it.

Example of using it:

const int MAX_STRING = 100;
char greeting[MAX_STRING];
MPI_Recv(greeting, MAX_STRING, MPI_CHAR, 5, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

4.1.12 The Status of MPI_Recv

The last parameter of MPI_Recv is a pointer to a struct of type MPI_Status that carries information about the received message. At a minimum, it includes

  • status.MPI_SOURCE: the rank of the message sender
  • status.MPI_TAG: the tag of the received message
  • status.MPI_ERROR: the error code associated with the message

You can also use it to find out how much data was received by passing it to the MPI_Get_count function, which stores the number of received elements of the given type in count.

int count;
MPI_Get_count(&status, recv_type, &count);

4.1.13 Wildcards

You could pass MPI_ANY_SOURCE to the source parameter of MPI_Recv to receive a message from any process.

You could pass MPI_ANY_TAG to the tag parameter of MPI_Recv to receive a message with any tag.

There is no wildcard for the communicator.

4.1.14 General Structure of an MPI program

#include <mpi.h>

int main(){

    MPI_Init(NULL, NULL);

    int comm_sz;
    int my_rank;

    MPI_Comm_size(MPI_COMM_WORLD, &comm_sz);
    MPI_Comm_rank(MPI_COMM_WORLD, &my_rank);

    /*
    The actual algorithm code
    Uses MPI_Send and MPI_Recv
    */

    MPI_Finalize();
    return 0;
}

4.1.15 MPI Collective Communication

  • All the processes in the communicator must call the same collective function
  • The arguments passed from the different processes must be compatible
  • The output_data_p parameter, which appears in most collective communication functions, is only used on the destination process. The other processes must still pass an argument for it, even if it is just NULL.
  • Collective communications don’t use tags. They are matched based on the communicator and the order of the calls.

4.1.16 MPI_Reduce

This is a collective communication operation.

It performs a reduction: it takes data from all processes and combines them with an operation (such as a sum) into a single value. The result is stored in the output_data_p parameter of the process whose rank is root.

int MPI_Reduce(
    void* input_data_p,
    void* output_data_p,
    int count,
    MPI_Datatype datatype,
    MPI_Op op,
    int root,
    MPI_Comm communicator
);

Sample usage:

int local_int = my_rank;
int global_sum;
MPI_Reduce(&local_int, &global_sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
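As a quick check of the sample above, assume it runs on \(p\) processes and each process contributes its own rank. The value that ends up in global_sum on process 0 would then be

\[ \sum_{r=0}^{p-1} r = \frac{p(p-1)}{2} \]

so with 4 processes, global_sum on process 0 is \(0+1+2+3 = 6\). On the other processes, MPI_Reduce does not write to global_sum.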

4.1.17 MPI_Op

Available operations in MPI are

Operation Description
MPI_MAX Maximum value
MPI_MIN Minimum value
MPI_SUM Sum of values
MPI_PROD Product of values
MPI_LAND Logical AND
MPI_BAND Bitwise AND
MPI_LOR Logical OR
MPI_BOR Bitwise OR
MPI_LXOR Logical XOR
MPI_BXOR Bitwise XOR
MPI_MAXLOC Maximum value and location
MPI_MINLOC Minimum value and location

4.1.18 MPI_Allreduce

Same as MPI_Reduce, but the result is stored on all processes. It takes the same parameters except for root.

int MPI_Allreduce(
    void* input_data_p,
    void* output_data_p,
    int count,
    MPI_Datatype datatype,
    MPI_Op op,
    MPI_Comm communicator
);

4.1.19 Broadcast

A collective communication operation that sends data from one process to all other processes in the communicator.

The images below show how a broadcast operation (global sum) works using the tree-like structure and the butterfly structure.

Broadcast using tree-like structure

Broadcast using butterfly structure
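To make the tree-like pattern concrete, here is a small simulation in plain C (no MPI calls) of one possible tree-structured broadcast from rank 0. The pairing rule used here, where in each round every rank that already holds the data sends it to the rank stride positions higher, is an assumption chosen for illustration; an MPI implementation is free to organize its broadcast differently.

```c
#include <stdbool.h>
#include <stdio.h>

/* Simulates a tree-structured broadcast from rank 0 among p processes.
   received[r] becomes true once rank r holds the data.
   Returns the number of communication rounds that were needed. */
int simulate_tree_broadcast(int p, bool received[]) {
    for (int r = 0; r < p; r++) {
        received[r] = (r == 0);  /* only the root starts with the data */
    }

    int rounds = 0;
    for (int stride = 1; stride < p; stride *= 2) {
        /* Ranks 0 .. stride-1 already hold the data at this point. */
        for (int sender = 0; sender < stride; sender++) {
            int receiver = sender + stride;
            if (receiver < p) {
                printf("round %d: %d -> %d\n", rounds + 1, sender, receiver);
                received[receiver] = true;
            }
        }
        rounds++;
    }
    return rounds;
}
```

With this rule the number of processes holding the data doubles every round, so 8 processes are covered in 3 rounds instead of the 7 sends the root would need to perform one after another.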

4.1.20 MPI_Bcast

Sends data from one process to all other processes in the communicator.

int MPI_Bcast(
    void* data_p,
    int count,
    MPI_Datatype datatype,
    int source_proc,
    MPI_Comm communicator
);

4.1.21 MPI_Scatter

Reads an entire vector from the source process and sends a portion of it to each process in the communicator.

int MPI_Scatter(
    void* send_data_p,
    int send_count,
    MPI_Datatype send_datatype,
    void* recv_data_p,
    int recv_count,
    MPI_Datatype recv_datatype,
    int src_proc,
    MPI_Comm comm
);

send_count is the number of data elements going to each process, not the total amount of data being sent.

4.1.22 MPI_Gather

Reads a portion of a vector from each process in the communicator and sends it to the destination process, where the portions are concatenated in rank order.

int MPI_Gather(
    void* send_data_p,
    int send_count,
    MPI_Datatype send_datatype,
    void* recv_data_p,
    int recv_count,
    MPI_Datatype recv_datatype,
    int dest_proc,
    MPI_Comm comm
);

recv_count is the number of data elements coming from each process, not the total amount of data being received.

4.1.23 MPI_Allgather

Same as MPI_Gather, but the result is stored on all processes in the communicator. Once again, recv_count is the number of data elements coming from each process, not the total, and there is no dest_proc parameter.

int MPI_Allgather(
    void* send_data_p,
    int send_count,
    MPI_Datatype send_datatype,
    void* recv_data_p,
    int recv_count,
    MPI_Datatype recv_datatype,
    MPI_Comm comm
);

4.1.24 MPI_Barrier

Ensures that no process will return from calling it until every process in the communicator has started calling it.

int MPI_Barrier(MPI_Comm comm);

4.1.25 MPI_Ssend

A synchronous alternative to MPI_Send that takes the same parameters. It is guaranteed to block until the matching receive starts.

4.1.26 MPI_Sendrecv

Carries out a blocking send and a receive in a single call.

The destination and source processes can be the same or different. The send and receive buffers, however, must not overlap; to use a single buffer for both, MPI provides MPI_Sendrecv_replace.

int MPI_Sendrecv(
    void* send_data_p,
    int send_count,
    MPI_Datatype send_datatype,
    int dest_proc,
    int send_tag,
    void* recv_data_p,
    int recv_count,
    MPI_Datatype recv_datatype,
    int source_proc,
    int recv_tag,
    MPI_Comm comm,
    MPI_Status* status_p
);

4.1.27 Tips

  • All MPI function names start with MPI_, and only the first character after the prefix is capitalized (e.g. MPI_Comm_size)
  • Constants are all caps (e.g. MPI_COMM_WORLD)

Important

These notes skip MPI derived datatypes and performance evaluation.