Event
Ph.D. Dissertation Defense: Alexandria Barghi
Tuesday, March 11, 2025
4:00 p.m.
AVW 2460
Maria Hoo
301 405 3681
mch@umd.edu
ANNOUNCEMENT: Ph.D. Dissertation Defense
Name: Alexandria Barghi
Committee:
Prof. Manoj Franklin, Chair
Prof. Donald Yeung
Prof. Shruvya BhattacharyyaProf. Cunxi Yu
Prof. Ramani Duraiswami, Dean's Representative
Date/Time: Tuesday, March 11, 2025 at 4:00 PM
Location: AVW 2460
Title: Beyond Property Graphs: Development and Applications of a Hybrid Graph Analytics System
Over the past two decades, graph analytics has grown from a fairly niche discipline into a ubiquitous field touching nearly every aspect of modern computing. This growth began with the internet, which led to the standardization of OWL and RDF, and was accelerated by social media and large-scale e-commerce, which produced a slew of content that could easily be represented as a graph. Along the way, graph databases emerged as an alternative to relational databases, and graph processing systems were conceived to accelerate graph algorithms and process graph queries. Today, graphs are used to solve a huge variety of problems across domains, from recommendation systems that serve targeted advertisements, to bank and insurance claim fraud detection, to customer churn prediction, to molecular and genomic analysis.
We now stand at a new frontier of graph analytics, at the intersection of traditional graph analytics, the rapidly growing technology of graph neural networks (GNNs), and large language models. These technologies are the key to unlocking the power of generative artificial intelligence, delivering more accurate, complete, up-to-date, and hallucination-free models through the process of retrieval augment generation (RAG). However, the sheer size and complexity of enterprise-scale graphs makes deploying a graph based RAG difficult. Much of the existing technology in these areas was not designed to work together, or to work at the scale needed by a large enterprise.
In this dissertation, I will discuss the state of graph analytics, how it can be applied to RAG, and the challenges currently preventing universal adoption of graph-based RAG. I will then introduce my work, the BitGraph framework, which provides the keys to unlocking GPU-accelerated graph-based RAG.
In the first component of my dissertation, I will start by describing the core construction of the BitGraph framework and its subcomponents, which include the Gremlin++ query language and Maelstrom, a lightweight backend for accelerating vector operations on both the CPU and GPU. I will show how the BitGraph framework is constructed in layers, with Maelstrom as the bottom layer, Gremlin++ as the middle layer, and BitGraph as the top layer, and how this construction makes the framework extremely versatile and enables acceleration of up to 35x over an equivalent naive CPU implementation.
In the second component of my dissertation, I will analyze and discuss handling and processing graph queries at scale, focusing on the theory behind query acceleration and how it applies to the paradigms and data structures used in the BitGraph framework. I will then show how specific types of query optimizations, called traversal strategies, work together to perform end-to-end just-in-time optimization to deliver 100% speedup on a simple query, and 40% speedup on a more advanced query.
In the third and final component of my dissertation, I will show how to use BitGraph and its query language, Gremlin++, to solve a large-scale RAG problem. I will also discuss how I further extended the BitGraph framework to include support for vertex embeddings, which were critical in developing what I call prize-aware graph traversal, a type of graph traversal that starts from a set of initial vertices and greedily selects additional vertices in the neighborhood through embedding comparison.