The goal of this project is to study and implement TRIÈST, a streaming graph processing algorithm which aims to count the local and global triangles in fully dynamic streams. For the purpose of the assignment, we would focus on implementing both the basic and improved version of the algorithm.
In order to accomplish the task, the following two steps will be performed:
- First, implement the reservoir sampling or the Flajolet-Martin technique used in the graph algorithm presented in the paper.
- Second, develop the streaming graph algorithm that uses reservoir sampling.
This project has been implemented using Python and PySpark, to be able to have a parallelized version of the algorithm when running in a distributed cluster. Additionally, the dependencies NumPy, random, and urllib.request (to download the dataset from remote storage), were used for the implementation.
The selected graph dataset CA-AstroPh, was retrieved from Stanford Network Analysis Platform (SNAP) collection, it contains a collaboration network of papers published on arXiv in the AstroPhysics category.
The results together with an extensive hypertuning using Tensorboard are detailed in the project report.
- Serghei Socolovschi serghei@kth.se
- Angel Igareta alih2@kth.se