I joined the Systems Group in January 2017 and since then I have been working with Prof. Timothy Roscoe on Strymon, a system for predictive datacenter analytics. In September 2017 I was awarded the ETH Zurich Postdoctoral Fellowship for my research project "Automatic Scaling of Distributed Streaming Computations Using Graph Analytics on Real-Time Monitoring Data". I am broadly interested in distributed stream processing and large-scale graph analytics.
Before coming to ETH, I did my PhD at KTH, Stockholm, and UCL, Belgium, where I was admitted to a double doctoral program as an EMJD-DC fellow. My thesis, "Performance Optimization Techniques and Tools for Distributed Graph Processing", received the IBM Innovation Award 2017. During my PhD I also spent time at DIMA TU Berlin, Telefonica Research Barcelona, and data Artisans.
I am a committer and PMC member of Apache Flink, an open-source framework and distributed execution engine for stream processing. I have written a book about the system together with Fabian Hueske.
Check it out!
We describe fundamental concepts of parallel stream processing and discuss how streaming analytics differs from traditional batch data analysis. The book targets software engineers, data engineers, and system administrators who want to learn the basics of Flink's DataStream API, including the structure and components of a common Flink streaming application.
Are you a student looking for BSc/MSc thesis projects? Contact me for a list of topics on stream processing and/or graph analytics.
I am broadly interested in three dimensions of big data analytics and stream processing:
Systems: how to design and implement scalable data processing systems whose capabilities stretch beyond those of traditional data management platforms? My recent work in this area includes understanding the performance of streaming dataflows and enabling accurate automatic scaling of streaming jobs.
Algorithms: how to represent, partition, summarize, and analyze possibly unbounded data of various formats and originating from diverse, distributed sources? My recent work in this area includes a distributed graph summarization technique and a survey of streaming graph partitioning methods in the context of data-parallel continuous processing.
Programming models: how to achieve end-to-end, efficient big data processing while providing expressive, high-level programming models, accessible to data scientists and non-expert users? My recent work in this area includes a survey of high-level programming abstractions for distributed graph processing.
For a full list of publications, visit my Google Scholar page.
V. Kalavri, J. Liagouris, M. Hoffmann, D. Dimitrova, M. Forshaw, T. Roscoe, Three steps is all you need: fast, accurate, automatic scaling decisions for distributed streaming dataflows, in 13th USENIX Symposium on Operating Systems Design and Implementation (OSDI) 2018. [pdf] [slides]
Z. Abbas, V. Kalavri, P. Carbone, V. Vlassov, Streaming Graph Partitioning: An Experimental Study, in Proc. VLDB Endow. 11, 11 (2018). [pdf]
M. Hoffmann, A. Lattuada, J. Liagouris, V. Kalavri, D. Dimitrova, S. Wicki, Z. Chothia, T. Roscoe, SnailTrail: Generalizing Critical Paths for Online Analysis of Distributed Dataflows, in 15th USENIX Symposium on Networked Systems Design and Implementation (NSDI '18). [pdf] [slides]
V. Kalavri, V. Vlassov and S. Haridi, High-Level Programming Abstractions for Distributed Graph Processing, in IEEE Transactions on Knowledge and Data Engineering, vol. 30, no. 2, pp. 305-324, 1 Feb. 2018. [pdf]
I have been involved in teaching activities since the beginning of my Master's studies. During the past 7 years, I have prepared and taught courses at various academic institutions, industrial training programs, conferences, and summer schools. I have given lectures to audiences with diverse educational and cultural backgrounds and expertise levels, ranging from first-year Bachelor students in Sweden to senior software engineers in China.
Spring 2019: Data Stream Processing and Analytics
3rd International Summer School on Data Science, Croatia, 2018.
Big Data Analytics Summer School, Stockholm, 2017 & 2018.
Invited tutorial at the 31st British International Conference on Databases (BICOD), London, 2017.
Apache Flink tutorials at BOSS workshop (VLDB), 2016 & 2017.
2nd Int’l ScaDS Summer School on Big Data, Germany, 2016.
EIT Summer School on Cloud and Big Data, Sweden, 2016.
2017, IBM Innovation Award.
In recognition of an outstanding PhD thesis that presents an original contribution to informatics or its applications.
2017, ETH Zurich Postdoctoral Fellowship.
Research project: Automatic scaling of distributed streaming computations using graph analytics on real-time monitoring data.
2012, Erasmus Mundus Doctoral Fellowship.
Hosts: KTH Royal Institute of Technology and Université catholique de Louvain.
For more presentations, visit my Slideshare profile.
CCGrid 2019 (Applications and Data Science track co-Chair)
OPODIS 2018 (PC member)
ICAC 2018 (Sub-reviewer)
DBPL 2019 (co-located with PLDI 2019)
GRADES-NDA 2019 (co-located with SIGMOD 2019)
DBTest 2018 (co-located with SIGMOD 2018)
GRADES-NDA 2018 (co-located with SIGMOD 2018)
GABB 2018 (co-located with IPDPS 2018)
GABB 2017 (co-located with IPDPS 2017)
DEEM 2017 (co-located with SIGMOD 2017)
IEEE Transactions on Knowledge and Data Engineering
Flink Forward San Francisco 2017
Berlin Buzzwords 2017
Flink Forward Berlin 2016
Berlin Buzzwords 2016