Due: April 6, 2020 (emailed by 11:59:59 p.m.)
Consider the following two files that contain various types of data for 500 U.S. cities:
The two files contain the same basic data in two different forms. The CSV file will probably be easier for you to parse, but the JSON file contains additional descriptive information that tells you what the columns of the spreadsheet mean. You can simply search through the JSON file for "description". Skip the first instance, which describes the data file as a whole; the second and subsequent instances of "description" identify the data stored in columns A, B, C, …, respectively.
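The mapping from spreadsheet column letters to "description" entries can be sketched in C as follows. This is a minimal, illustrative sketch, not a required interface: the helper names column_index and find_column_description are hypothetical, and the search is a plain text scan (as suggested above), not a JSON parser, so it assumes the quoted key "description" appears only where the data file and its columns are described.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Convert a spreadsheet column label ("A", "D", "AS", "CO", ...) to a
 * zero-based index: A -> 0, Z -> 25, AA -> 26, AS -> 44, CO -> 92.
 * Returns -1 if the label is empty or contains a non-letter. */
int column_index(const char *label)
{
    int value = 0;
    if (label == NULL || *label == '\0')
        return -1;
    for (const char *p = label; *p != '\0'; ++p) {
        if (!isalpha((unsigned char)*p))
            return -1;
        /* Spreadsheet labels count in base 26 with digits A = 1 ... Z = 26. */
        value = value * 26 + (toupper((unsigned char)*p) - 'A' + 1);
    }
    return value - 1;
}

/* Return a pointer to the "description" key for zero-based column `col`
 * in the JSON text, or NULL if there are not enough occurrences.
 * Occurrence 1 describes the whole file, so column A is occurrence 2. */
const char *find_column_description(const char *json, int col)
{
    static const char key[] = "\"description\"";
    const char *p = json;
    assert(json != NULL);
    if (col < 0)
        return NULL;
    for (int found = 0; found < col + 2; ++found) {
        if (found > 0)
            p += sizeof key - 1;  /* step past the previous match */
        p = strstr(p, key);
        if (p == NULL)
            return NULL;
    }
    return p;
}
```

For example, column_index("CO") gives 92, so find_column_description(json, 92) points at the 94th "description" key, the one that should name the data in column CO.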
| Example mpirun command | Answer reported by the rank 0 process for this example command |
|---|---|
| mpirun -np 20 proj2 sr max D | New York, NY, Population2010 = 8175133.00 |
| mpirun -np 20 proj2 sr min D | Burlington, VT, Population2010 = 42417.00 |
| mpirun -np 20 proj2 sr avg CO | Average OBESITY_CrudePrev = 23.40 |
| mpirun -np 20 proj2 sr number AS gt 55 | Number of cities with COLON_SCREEN_CrudePrev gt 55 = 430 |
| mpirun -np 20 proj2 sr number CO lt 20 | Number of cities with OBESITY_CrudePrev lt 20 = 109 |
The string following "max", "min", "avg", and "number" refers to a column in the original spreadsheet (e.g., columns D, CO, and AS in the table above). No relational operators other than "lt" and "gt" ("less than" and "greater than", respectively) are needed.
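The per-process part of a "number" query reduces to a strict comparison count. A minimal sketch, assuming each process already holds its share of one numeric column in an array of doubles and that the threshold has been converted from the command line (for example with strtod). The function name count_matching is hypothetical. Each rank's partial count could then be summed on rank 0, e.g. with MPI_Reduce using MPI_SUM on MPI_LONG.

```c
#include <assert.h>
#include <string.h>

/* Count the values v for which "v op threshold" holds, where op is
 * "gt" (greater than) or "lt" (less than). Returns -1 for any other op. */
long count_matching(const double *values, int n, const char *op,
                    double threshold)
{
    int greater;
    long count = 0;
    assert(values != NULL || n == 0);
    if (strcmp(op, "gt") == 0)
        greater = 1;
    else if (strcmp(op, "lt") == 0)
        greater = 0;
    else
        return -1;
    for (int i = 0; i < n; ++i) {
        /* Strict comparisons, matching "greater than" / "less than". */
        if (greater ? values[i] > threshold : values[i] < threshold)
            ++count;
    }
    return count;
}
```

Rejecting unknown operators with -1 gives the caller a simple way to report a bad command line instead of silently counting nothing.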
Note that a city is reported only for "max" and "min".
For the example commands here in #4, divide the work evenly among the number of processes specified by the "-np" option. There are 500 cities in the data file; hence, if the program is launched with an "-np" value that does not evenly divide 500, immediately exit with an error message (reported only by rank 0).
| Example mpirun command | Answer reported by the rank 0 process for this example command |
|---|---|
| mpirun -np 4 proj2 bg max D E I CO | max Population2010 = 8175133.00; New York, NY |
| | max ACCESS2_CrudePrev = 51.50; Pharr, TX |
| | max ARTHRITIS_CrudePrev = 36.80; Charleston, WV |
| | max OBESITY_CrudePrev = 38.80; Dayton, OH |
| mpirun -np 4 proj2 bg min D E I CO | min Population2010 = 42417.00; Burlington, VT |
| | min ACCESS2_CrudePrev = 4.20; Newton, MA |
| | min ARTHRITIS_CrudePrev = 9.40; College Station, TX |
| | min OBESITY_CrudePrev = 12.20; Milpitas, CA |
| mpirun -np 4 proj2 bg avg D E I CO | avg Population2010 = 206041.62 |
| | avg ACCESS2_CrudePrev = 19.41 |
| | avg ARTHRITIS_CrudePrev = 22.43 |
| | avg OBESITY_CrudePrev = 23.40 |
Again, note that a city is reported only for "max" and "min".
For the example commands here in #5, each process must do the work associated with one of the requested columns. Therefore, the number of processes specified by "-np" must equal the number of columns to be examined. In the examples shown, we want to examine four columns (D, E, I, and CO), so we specify "-np 4". Immediately terminate the program with an error message if the number of processes does not equal the number of columns to scan.
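The process-count check for #5 and the choice of which column each process scans can be sketched as below. This assumes the argument layout of the example commands (argv[1] is "bg", argv[2] is the operation, and argv[3] onward are column labels) and that the check runs after MPI_Init and MPI_Comm_size(MPI_COMM_WORLD, &nprocs), so every rank reaches the same verdict and exits together while only rank 0 prints. The helper names are hypothetical.

```c
#include <assert.h>
#include <string.h>

/* Number of columns named on a "proj2 bg <op> <col> <col> ..." command
 * line. Returns -1 if the line is not a well-formed bg query. */
int bg_column_count(int argc, char **argv)
{
    assert(argv != NULL);
    if (argc < 4 || strcmp(argv[1], "bg") != 0)
        return -1;
    return argc - 3;
}

/* Nonzero when there is exactly one column per process. */
int bg_args_match(int argc, char **argv, int nprocs)
{
    int ncols = bg_column_count(argc, argv);
    return ncols > 0 && ncols == nprocs;
}

/* Column label that rank `rank` should scan (valid only when
 * bg_args_match() is true and 0 <= rank < nprocs). */
const char *column_for_rank(char **argv, int rank)
{
    return argv[3 + rank];
}
```

Because every process receives the full data set in this query, rank r can simply scan the column named by column_for_rank(argv, r); with "-np 4 proj2 bg max D E I CO", rank 3 would scan column CO.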
Do you know why I specified "scatter" for the queries in #4, but "broadcast" for the queries in #5? Obviously I want you to get experience with both, but there is more to it than that. (I could ask a similar question of why "reduce" in #4 and "gather" in #5, but that should be a bit more obvious.)
Remove any object files (i.e., *.o) and your linked executable program. Then create and send a tar file of the project2 directory to me at jrmiller@ku.edu.