Graph analytics often target large-scale web and social networks, which are typically power-law graphs. Graph algorithms can often be recast as generalized SpMV operations, making SpMV important to optimize for graph analytics. However, executing SpMV on large-scale power-law graphs results in highly irregular memory access patterns with poor cache utilization. Worse, we find that existing locality and vectorization optimizations are largely ineffective on modern out-of-order processors. To improve performance for power-law graphs on modern OOO processors, we propose Locality-Aware Vectorization (LAV). LAV is a new approach that leverages the input’s power-law nature to extract locality and enable effective vectorization for SpMV-like memory access patterns.
We evaluate LAV with several graphs on an Intel Skylake-SP processor, and find that LAV is faster than CSR (and prior approaches) by an average of 1.5x. LAV reduces the number of DRAM accesses by 35% on average with only 3.3% memory overhead.