Algorithms for Large-Scale Graph Processing

Instructors: Mohsen Ghaffari and Jara Uitto
Credit: 2CP
Time and Place: Wednesdays 15 - 16 at CAB 15.2

Course Description: This is a theory seminar, where we present and discuss recent algorithmic developments for processing large-scale graphs. In particular, we focus on Massively Parallel Computation (MPC) algorithms. MPC is a clean and general theoretical framework that captures the essential aspects of computational problems in large-scale processing settings such as MapReduce, Hadoop, Spark, Dryad, etc. This model was introduced in 2010 and it is receiving increasingly more attention from the algorithmic community. During this seminar, we will present and discuss some of the fundamental papers in this growing research area. As a side remark, we note that the topics discussed in the seminar can also serve as a good starting point for master or bachelor thesis projects.

Prerequisites: The seminar has no formal prerequisite and in particular, it does not assume familiarity with parallel computation. However, we expect that all the students are comfortable with basics of algorithms design and analysis, as well as probability theory. For instance, having passed the course Algorithms, Probability, and Computing (APC) is highly recommended, though not required formally. If you're not sure whether you're ready for this class or not, please consult the instructor.

Course Catalogue Entry: see here for other information.


Your presentation:
Each student will present one of the papers in the list. The length of the presentation is approximately 40 minutes, plus 5 minutes for questions and discussions. Also, see these guides for how to give a good technical presentation: an article by Ian Parberry, an article by Jones, Hughes, and Launchbury, and some slides by Puschel.

Questions about your paper:
Prior to your presentation, you are expected to design a few questions about your paper. These questions are intended to be about high level concepts in the paper. By understanding the answers to these questions, one should be able to understand the problem that is studied and particularly, the main challenge in the problem investigated in that paper.

For each of your peers' presentation, you are expected to take a quick glance at the paper they are presenting, and answer the few questions that they designed. The main purpose of these questions is to ensure that you have had a look at the paper and you know its context and basics. This will help you in following the presentation. We will send you the questions a few days before the presentation (tentatively via a doodle link) and you should submit your (short) answers before the presentation.

List of Selected Papers

Select at least three papers from the following list and indicate your choices in the following doodle: Papers. Indicate a preference ordering in the comment section. You will be assigned one of these papers.
  1. Karloff et al. A model of computation for MapReduce. SODA'10
  2. Goodrich et al., Sorting, Searching, and Simulation in the MapReduce Framework. ISAAC'11
  3. Lattanzi et al. Filtering: a method for solving graph problems in MapReduce. SPAA'11
  4. Kumar et al. Fast Greedy Algorithms in MapReduce and Streaming. SPAA'13
  5. Andoni et al. Parallel algorithms for geometric graph problems. STOC'14
  6. Hegeman et al. Lessons from the Congested Clique Applied to MapReduce. SIROCCO'14
  7. Ahn and Guha. Access to Data and Number of Iterations: Dual Primal Algorithms for Maximum Matching under Resource Constraints. SPAA'15
  8. Roughgarden et al. Shuffles and Circuits: (On Lower Bounds for Modern Parallel Computation) SPAA'16
  9. Im et al. Efficient massively parallel methods for dynamic programming: STOC'17
  10. Assadi and Khanna. Randomized Composable Coresets for Matching and Vertex Cover. SPAA'17
  11. Czumaj et al. Round compression for parallel matching algorithms. STOC'18
  12. Ghaffari et al. Improved Massively Parallel Computation Algorithms for MIS, Matching, and Vertex Cover. PODC'18
  13. Harvey et al. Greedy and Local Ratio Algorithms in the MapReduce Model. SPAA'18
  14. Andoni et al. Parallel Graph Connectivity in Log Diameter Rounds. FOCS'18
  15. Ghaffari et al. Sparsifying Distributed Algorithms with Ramifications in Massively Parallel Computation and Centralized Local Computation. SODA'19
  16. Assadi et al. Sublinear Algorithms for (Delta + 1) Vertex Coloring, SODA'19
  17. Bateni et al. Massively Parallel Dynamic Programming on Trees. ArXiv'18
  18. Brandt et al.. Matching and MIS for Uniformly Sparse Graphs in the Low-Memory MPC Model. ArXiv'18
  19. Behnezhad et al. Massively Parallel Symmetry Breaking on Sparse Graphs: MIS and Maximal Matching. ArXiv'18
  20. Behnezhad et al. Semi-MapReduce Meets Congested Clique. ArXiv'18

Other Relevant Papers