My research focuses on different aspects of disaggregating compute and storage in the context of large-scale data warehousing. My interests include distributed systems, stream processing, and databases.
I also contribute to several open source projects at the Apache Software Foundation.
Apache Gora: It is a framework that provides an in-memory data model and persistence for big data, abstracting different NoSQL data models into a simple key-value one.
Apache Samza: It is a stream processing platform that uses Apache Kafka as its main transport layer.
Apache Nutch: It is a distributed web crawler that leverages Hadoop for spawning tasks. It mainly performs wide crawls, but it is being enhanced to better support vertical crawls.
Apache Giraph: It is an iterative graph processing system built for high scalability.