Analytics over large graphs is an important and challenging part of the big-data problem: consider for example, the recent observation that any two users of a very popular social networking site are separated by at most 4.75 friends on average.
X-Stream is a system being built at LABOS that enables analytics on big graphs using small clusters of machines. X-Stream is based on the philosophy that sequential access to data works best for all storage mediums: main memory, SSD and magnetic disk. It is therefore built around a storage-centric and streaming philosophy: rearrange graph algorithms to stream data from storage. X-Stream is a large and complex system, currently in the region of 15K lines of code and growing fast. Its foundations range across IO complexity theory, operating system storage stacks, storage device and processor architecture, and finally graph algorithms. We also expect to touch areas such as programming languages in the near future as we explore layering domain specific languages on top of X-Stream.
At its limit, X-Stream is currently capable of analytics on graphs with upwards of 64 billion edges on a single machine using only two attached 3TB magnetic disks, a capacity we expect to grow significantly. This already puts X-Stream in some select company as only large and expensive clusters or supercomputers have hitherto been able to handle graphs that big.
X-Stream was supported by a Hasler foundation grant for “Cache conscious graph processing”.
Chaos is the successor of X-Stream. Unlike X-Stream which is limited to a single machine, Chaos scales out to multiple machines. Chaos treats the aggregate storage of all machines as a single flat disk and uses work stealing to balance the load across nodes in the cluster.
Chaos is currently capable of working on graphs with over 1 trillion edges on a single rack of commodity machines. a milestone that could only be reached before using HPC or large clusters consisting of hundreds or thousands of machines.
- Amitabha Roy, Laurent Bindschaedler, Jasmina Malicevic, and Willy Zwaenepoel: Chaos: Scale-out Graph Processing from Secondary Storage. SOSP 2015.
- Jasmina Malicevic, Amitabha Roy, and Willy Zwaenepoel: Scale-up Graph Processing in the Cloud: Challenges and Solutions. CloudDP 2014.
- Amitabha Roy, Ivo Mihailovic, and Willy Zwaenepoel: X-Stream: Edge-centric Graph Processing using Streaming Partitions. SOSP 2013.
If you are working to reproduce any of the results in our papers, particularly comparative benchmarking with Graphchi or other graph processing systems, please make sure to read this first.