New data coding methods for large data centers

news story image

Modern-day data centers store large volumes of information in distributed form, placing parts of the same data file on different servers in the system. Unfortunately, servers fail on a regular basis, either transiently or permanently. When they do, recovering the data stored on them depends on methods of data encoding. Developing these methods is critical to designing large-scale storage systems.

Professor Alexander Barg (ECE/ISR) is the principal investigator for a three-year, $500K National Science Foundation grant, “From Storage Codes to Recoverable Systems.” In this project, Barg will construct methods of data encoding that account for low-cost data recovery based on the nature of connections between the servers such as spatial proximity or the availability of communication links.

The project represents a shift from the broadly studied problems of data reconstruction that discount the varying cost of moving data between servers based on the topology of the system and opens a possibility of engaging new mathematical methods for designing efficient methods of data encoding and reconstruction.

Barg will advance high-density storage systems based on recently discovered applications of computer science and applied mathematics tools to the code design. He will also establish new statistical properties of methods of data encoding as well as the limits on the volume of data that can be stored in the system while maintaining the recovery functionality.

In large-scale distributed storage, different parts of the codeword comprising a chunk of data are stored on different servers, and efficient recovery hinges on the ability to reconstruct unavailable data from servers that are in close proximity of the failed storage node. In technical terms, the system is described by a graph where coordinates of the codeword are stored on different vertices, and the value of each vertex is a function of the values of its neighbors in the graph.

Barg will construct efficient encoding methods based on recently established connections between codes for storage, index codes, and low-density quantum codes. He will construct codes of the highest possible rate and develop iterative procedures for recovering multiple nodes for various classes of graphs, relying on their algebraic properties and on methods from percolation theory on finite graphs. Barg will also address coding systems that support data recovery on infinite graphs, such as the graph of integers and the two-dimensional integer lattice, using methods from constrained systems, symbolic dynamics, and entropy theory to establish the maximum possible density of data stored in large-scale systems.

Published July 2, 2021