Counting with Prediction: Rank and Select Queries with Adjusted Anchoring

Külekci M. O.

Data Compression Conference (DCC), Utah, United States of America, 22-25 March 2022, pp. 409-418

  • Publication Type: Conference Paper / Full Text
  • Doi Number: 10.1109/dcc52660.2022.00049
  • City: Utah
  • Country: United States Of America
  • Page Numbers: pp.409-418
  • Istanbul Technical University Affiliated: Yes


Rank and select queries are the fundamental building blocks of compressed data structures. On a given bit string of length n, counting the number of set bits up to a certain position is the rank query, and finding the position of the k-th set bit is the select query. We present a new data structure, and the procedures on it, to support rank/select operations. The proposed scheme introduces ((log m)/d + (log n)/(s·d)) overhead bits per bit of the n-bit input bit string, where d is the inner-block size in bits, s is the number of inner blocks in a super-block, and m is a properly chosen constant modulus value. Compared to the previous two-level hierarchical data structures, which incur ((log(s·d))/d + (log n)/(s·d)) overhead bits per bit, the new approach reduces space consumption significantly with a proper selection of the parameters. With the new data structure, rank queries are usually (≈90% of the time) executed in O(t(d)) time, where t(d) is the time required to compute a rank within an inner block of d bits, which is assumed to be constant via the wide-register instructions of modern processors. Rarely, more than one block must be inspected; empirically, this averages around two blocks. We provide probabilistic analyses of how to choose the appropriate parameters and present several trade-offs to guarantee constant-time rank. We also investigate using the same data structure to support select queries. Experimental evaluation revealed that the proposed data structure consumes nearly 30%-50% less space than its alternatives while introducing less than 5% overhead, and its speed is better than or very competitive with current state-of-the-art implementations for both rank and select.
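For orientation, the classical two-level hierarchical baseline that the abstract compares against can be sketched as follows. This is not the proposed modular-anchoring scheme; it is only an illustration of the conventional super-block/inner-block rank index, with parameter names d and s taken from the abstract and all other details (function names, block sizes) chosen here for the example.

```python
# Illustrative two-level rank index over a '0'/'1' string (baseline scheme,
# not the paper's adjusted-anchoring structure). Parameters d and s follow
# the abstract; the concrete values below are arbitrary example choices.

D = 64  # d: inner-block size in bits
S = 8   # s: number of inner blocks per super-block

def build_rank_index(bits):
    """Precompute absolute counts per super-block and relative counts
    per inner block, the source of the (log(s*d))/d + (log n)/(s*d)
    per-bit overhead mentioned in the abstract."""
    super_counts = []  # rank at the start of each super-block (log n bits each)
    inner_counts = []  # rank relative to the super-block start (log(s*d) bits each)
    total = 0
    running = 0
    for i in range(0, len(bits), D):
        if (i // D) % S == 0:       # a new super-block begins here
            super_counts.append(total)
            running = 0
        inner_counts.append(running)
        ones = bits[i:i + D].count('1')
        running += ones
        total += ones
    return super_counts, inner_counts

def rank1(bits, pos, super_counts, inner_counts):
    """Number of set bits in bits[0:pos]: one super-block lookup, one
    inner-block lookup, and one in-block popcount (a single wide-register
    instruction on real hardware)."""
    block = pos // D
    r = super_counts[block // S] + inner_counts[block]
    return r + bits[block * D:pos].count('1')
```

A select query can then be answered on the same index by binary-searching the super-block counts for the target k, then scanning inner blocks, which is one of the directions the paper explores with its unified structure.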