Blog

Sharing our knowledge

GraphX- Pregel internals and tips for distributed graph processing

Apache Spark is becoming the most popular data analysis tool recently. And hence understanding its internals for better performant code writing is very important for a data scientist. GraphX is a specific Spark API for Graph processing on top of Spark. It has many improvements for graph specific processing. Along with some basic popular graph algorithms like PageRank and Connected Components, it also provides a Pregel API for developing any vertex-centric algorithm. Understanding the internals of the Pregel function and other GraphX APIs are important to write well-performing algorithms. Keeping that in mind, in this blog, we will elaborate the internals of Pregel as implemented in GraphX.