Algorithms for Large-Scale Graph Processing
- Instructors: Mohsen Ghaffari and Jara Uitto
- Credit: 2CP
- Time and Place: Wednesdays 15 - 16 at CAB 15.2
- Course Description: This is a theory seminar, where we present and discuss recent algorithmic developments for
processing large-scale graphs. In particular, we focus on Massively Parallel Computation (MPC)
algorithms. MPC is a clean and general theoretical framework that captures the essential aspects
of computational problems in large-scale processing settings such as MapReduce, Hadoop,
Spark, Dryad, etc. This model was introduced in 2010 and it is receiving increasingly more attention from the algorithmic community. During this seminar, we will present and discuss some of the fundamental papers in this growing research area.
As a side remark, we note that the topics discussed in the seminar can also serve as a good starting point for master or bachelor thesis projects.
- Prerequisites: The seminar has no formal prerequisite and in particular, it does not assume familiarity with parallel computation. However, we expect
that all the students are comfortable with basics of algorithms design and analysis, as well as probability theory. For instance, having passed the course Algorithms, Probability, and Computing (APC) is highly recommended, though not required formally.
If you're not sure whether you're ready for this class or not, please consult the instructor.
- Course Catalogue Entry: see here for other information.
Tasks
Your presentation:
Each student will present one of the papers in the list. The length of the presentation is approximately 40 minutes, plus 5 minutes for questions and discussions.
Also, see these guides for how to give a good technical presentation: an article by Ian Parberry,
an article by Jones, Hughes, and Launchbury, and
some slides by Puschel.
Questions about your paper:
Prior to your presentation, you are expected to design a few questions about your paper.
These questions are intended to be about high level concepts in the paper.
By understanding the answers to these questions, one should be able to understand the problem that is studied and particularly, the main challenge in the problem investigated in that paper.
Participation:
For each of your peers' presentation, you are expected to take a quick glance at the paper they are presenting, and answer the few questions that they designed. The main purpose of these questions is to ensure that you have had a look at the paper
and you know its context and basics. This will help you in following the presentation. We will send you the questions a few days before the presentation (tentatively via a doodle link) and you should submit your (short) answers before the presentation.
List of Selected Papers
Select at least three papers from the following list and indicate your choices in the following doodle: Papers.
Indicate a preference ordering in the comment section.
You will be assigned one of these papers.
-
Karloff et al. A model of computation for MapReduce. SODA'10
-
Goodrich et al., Sorting, Searching, and Simulation in the MapReduce Framework. ISAAC'11
-
Lattanzi et al. Filtering: a method for solving graph problems in MapReduce. SPAA'11
-
Kumar et al. Fast Greedy Algorithms in MapReduce and Streaming. SPAA'13
-
Andoni et al. Parallel algorithms for geometric graph problems. STOC'14
-
Hegeman et al. Lessons from the Congested Clique Applied to MapReduce. SIROCCO'14
-
Ahn and Guha. Access to Data and Number of Iterations: Dual Primal Algorithms for Maximum Matching under Resource Constraints. SPAA'15
-
Roughgarden et al. Shuffles and Circuits: (On Lower Bounds for Modern Parallel Computation)
SPAA'16
-
Im et al. Efficient massively parallel methods for dynamic programming:
STOC'17
-
Assadi and Khanna. Randomized Composable Coresets for Matching and Vertex Cover.
SPAA'17
-
Czumaj et al. Round compression for parallel matching algorithms.
STOC'18
-
Ghaffari et al. Improved Massively Parallel Computation Algorithms for MIS, Matching, and Vertex Cover.
PODC'18
-
Harvey et al. Greedy and Local Ratio Algorithms in the MapReduce Model.
SPAA'18
-
Andoni et al. Parallel Graph Connectivity in Log Diameter Rounds.
FOCS'18
-
Ghaffari et al. Sparsifying Distributed Algorithms with Ramifications in Massively Parallel Computation and Centralized Local Computation.
SODA'19
-
Assadi et al. Sublinear Algorithms for (Delta + 1) Vertex Coloring,
SODA'19
-
Bateni et al. Massively Parallel Dynamic Programming on Trees.
ArXiv'18
-
Brandt et al.. Matching and MIS for Uniformly Sparse Graphs in the Low-Memory MPC Model.
ArXiv'18
-
Behnezhad et al. Massively Parallel Symmetry Breaking on Sparse Graphs: MIS and Maximal Matching.
ArXiv'18
-
Behnezhad et al. Semi-MapReduce Meets Congested Clique.
ArXiv'18
Other Relevant Papers
-
Beame et al.
Communication steps for parallel query processing.
PODS'13
-
Beame et al.
Skew in parallel query processing.
PODS'14
-
Kiveris et al.
Connected Components in MapReduce and Beyond.
SOCC'14
-
Assadi et al. Coresets Meet EDCS: Algorithms for Matching and Vertex Cover on Massive Graphs. ArXiv'17
-
Brandt et al.
Breaking the Linear-Memory Barrier in MPC: Fast MIS on Trees with n^epsilon Memory per Machine.
ArXiv'18
-
Assadi et al.
Massively Parallel Algorithms for Finding Well-Connected Components in Sparse Graphs.
ArXiv'18
-
Burkhardt. Graph connectivity in log-diameter steps using label propagation. ArXiv'18
Schedule
- 19.09.2018 Introduction by Mohsen
- 26.09.2018 Sebastian Brandt: Connected Components in MapReduce and Beyond by Kiveris et al. (SOCC'14).
Please answer these two questions, before 3 pm of September 26.
- 03.10.2018 Manuela Fischer: Filtering: a method for solving graph problems in MapReduce by Lattanzi et al. (SPAA'11).
Please answer these two questions, before 3 pm of October 3.
- 10.10.2018 Jara Uitto: Sorting, Searching, and Simulation in the MapReduce Framework by Goodrich et al. (ISAAC'11).
Please answer these two questions, before 3 pm of October 10.
- 17.10.2018 Andrea Romano: A model of computation for MapReduce by Karloff et al. (SODA'10).
Please answer these two questions, before 3 pm of October 17.
- 24.10.2018 Philipp Hardegger: Fast Greedy Algorithms in MapReduce and Streaming by Kumar et al. (SPAA'13).
Please answer these two questions, before 3 pm of October 24.
- 31.10.2018 Guest Lecture by Silvio Lattanzi (Google Research): Distributed models, MapReduce, and large scale algorithms
- 07.11.2018 Robin Jadoul: Greedy and Local Ratio Algorithms in the MapReduce Model by Harvey et al. (SPAA'18).
Please answer these two questions, before 3 pm of October November 7.
- 14.11.2018 Jean Kaufmann: Parallel Graph Connectivity in Log Diameter Rounds by Andoni et al. (FOCS'18).
Please answer these two questions, before 3 pm of October November 14.
- 21.11.2018 Jingqiu Ding: Improved Massively Parallel Computation Algorithms for MIS, Matching, and Vertex Cover by Ghaffari et al. (PODC'18).
Please answer these two questions, before 3 pm of October November 21.
- 28.11.2018 Ghinea Diana-Elena: Sublinear Algorithms for (Delta + 1) Vertex Coloring by Assadi et al. (SODA'19).
Please answer these two questions, before 3 pm of October November 28.
- 28.11.2018 Simona Holh: Access to Data and Number of Iterations: Dual Primal Algorithms for Maximum Matching under Resource Constraints by Ahn and Guha (SPAA'15).
Please answer these two questions, before 3 pm of October November 28
- 05.12.2018 Davin Choo: Parallel algorithms for geometric graph problems by Andoni et al (STOC'14).
Please answer these two questions, before 3 pm of December 4.
- 12.12.2018 Miklos Horvath: Sparsifying Distributed Algorithms with Ramifications in Massively Parallel Computation and Centralized Local Computation by Ghaffari and Uitto (SODA'19).
Please answer these two questions, before 3 pm of December 12.
- 19.12.2018 Julian Portmann: Shuffles and Circuits: (On Lower Bounds for Modern Parallel Computation) by Roughgarden et al. (SPAA'16).
Please answer these two questions, before 3 pm of December 19.