Lawrence Livermore National Laboratory, United States of America
Subgraph search in a massive background graph, i.e., pattern matching in graphs, is a challenging problem, particularly in an interactive usage scenario where fast response time is important. Our approach, PruneJuice, is based on two intuitions: rather than directly searching for individual matches, it is cheaper to eliminate vertices and edges of the background graph that do not participate in any match. Second, to perform this pruning process, it is possible to decompose a search pattern into a set of constraints, and uses each of these constraints to eliminate non-matching vertices and edges. This paper explores the feasibility of indexing the background graph to accelerate the checking of non-local constraints (e.g., cycles and paths), the most expensive phase of pruning.
In particular, we demonstrate that indexing labeled triangle (i.e., a 3-Cycle) participation is a valuable acceleration technique. Additionally, we observe that labeled triangle counts at edges can be employed to detect 3-Paths that have terminal points with non-unique labels, at relatively little extra cost. Such paths and triangles are basic sub-constraints that can be used to rapidly prune non-matching vertices and edges. As proof of concept, we show that, using triangle information alone, indexing further accelerates expensive queries in PruneJuice by up to 500x on billion-edge graphs.