Scalable Distributed Inverted List Indexes in Disaggregated Memory

Research output: Contribution to conference, Paper, peer-review

Abstract

Memory disaggregation separates compute (CPU) and main memory resources into disjoint physical units
to enable elastic and independent scaling. Connected via high-speed RDMA-enabled networks, compute
nodes can directly access remote memory. This setting often requires complex protocols with many network
roundtrips as memory nodes have near-zero compute power.
In this paper, we design a scalable distributed inverted list index for disaggregated memory architectures.
An inverted list index maps each term to the list of documents that contain it. Current solutions partition the
index either horizontally or vertically, which imposes severe limitations in the disaggregated memory setting due to
data and access skew, high network latency, or out-of-memory errors. Our method partitions lists into fixed-size
blocks and spreads them across the memory nodes to balance skewed accesses. Block-based list processing
keeps the memory footprint of compute nodes low and masks latency by interleaving remote accesses with
expensive list operations. In addition, we propose efficient updates with optimistic concurrency control and
read-write conflict detection. Our experiments confirm the efficiency and scalability of our method.
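
The block-based idea described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the block size, the number of memory nodes, the round-robin placement rule, and the function names partition_list, iter_postings, and intersect_blockwise are all assumptions made for illustration, and memory nodes are simulated with plain Python lists rather than RDMA reads. The sketch shows how cutting a sorted posting list into fixed-size blocks spreads one long list over several nodes, and how an intersection can consume the lists one block at a time so that a compute node never holds a full list.

```python
from typing import Iterator, List, Tuple

BLOCK_SIZE = 4        # hypothetical number of postings per block
NUM_MEMORY_NODES = 3  # hypothetical number of memory nodes

Block = Tuple[int, List[int]]  # (memory node id, postings stored in the block)


def partition_list(term: str, postings: List[int],
                   block_size: int = BLOCK_SIZE,
                   num_nodes: int = NUM_MEMORY_NODES) -> List[Block]:
    """Cut a sorted posting list into fixed-size blocks and place them round-robin.

    The starting node is derived from the term (an assumed rule) so that
    different lists do not all begin on the same memory node.
    """
    start = sum(term.encode("utf-8")) % num_nodes
    placement = []
    for offset in range(0, len(postings), block_size):
        block_no = offset // block_size
        node = (start + block_no) % num_nodes
        placement.append((node, postings[offset:offset + block_size]))
    return placement


def iter_postings(blocks: List[Block]) -> Iterator[int]:
    """Yield postings block by block; each block stands in for one remote read."""
    for _node, block in blocks:
        yield from block


def intersect_blockwise(blocks_a: List[Block], blocks_b: List[Block]) -> List[int]:
    """Merge-intersect two sorted lists while touching one block of each at a time."""
    it_a, it_b = iter_postings(blocks_a), iter_postings(blocks_b)
    a, b = next(it_a, None), next(it_b, None)
    result = []
    while a is not None and b is not None:
        if a == b:
            result.append(a)
            a, b = next(it_a, None), next(it_b, None)
        elif a < b:
            a = next(it_a, None)
        else:
            b = next(it_b, None)
    return result


if __name__ == "__main__":
    list_a = partition_list("rdma", [1, 3, 5, 7, 9, 11])
    list_b = partition_list("index", [3, 4, 5, 11, 12])
    print(list_a)
    print(intersect_blockwise(list_a, list_b))
```

The sketch leaves out the latency masking, optimistic concurrency control, and conflict detection mentioned in the abstract; in a real system the next block would presumably be fetched while the current one is being processed.
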
Original language: English
Number of pages: 27
Publication status: Published - 30 May 2024
Event: ACM SIGMOD International Conference on Management of Data - Santiago de Chile, Chile
Duration: 9 Jun 2024 to 14 Jun 2024

Conference

Conference: ACM SIGMOD International Conference on Management of Data
Abbreviated title: SIGMOD
Country/Territory: Chile
City: Santiago de Chile
Period: 9/06/24 to 14/06/24

Keywords

  • Distributed Database Management Systems
  • Disaggregated Memory
  • Inverted Index
  • RDMA

Fields of Science and Technology Classification 2012

  • 102 Computer Sciences
