

pixie is a succinct data structures library.
Features
- BitVector
- Data structure with 3.61% overhead supporting rank and select for 1 bits.
- Supports:
rank(i): number of set bits (1s) up to position i.
select(k): position of the k-th set bit.
- Similar operations
rank0/select0 for 0.
- Implementation mainly follows [1] with SIMD optimizations similar to [2]
- Optimized via AVX-512/AVX-2, for large binary sequences performance is I/O bounded.
- RmMTree
- Implementation of a range min-max tree, it supports
rank, select and excess-related operations allowing for a fast navigation in DFUDS/BP trees.
Requirements
Build Instructions
git clone https://github.com/Malkovsky/pixie.git
cd pixie
cmake --preset release
cmake --build --preset release
Manual alternative:
mkdir -p build/release
cmake -B build/release -DCMAKE_BUILD_TYPE=Release
cmake --build build/release -j
Tests are enabled by default (PIXIE_TESTS=ON). Benchmarks are opt-in; enable with -DPIXIE_BENCHMARKS=ON or configure with the benchmarks-all preset. Use benchmarks-third-party for comparison backends such as sdsl-lite, and benchmarks-diagnostic for performance diagnostics (Release with debug info + performance counters support).
Running Tests
After building with presets, binaries are located in build/release.
BitVector
./build/release/unittests
RmM Tree
Coverage
Configure a coverage build with GCC (benchmarks disabled):
cmake --preset coverage
cmake --build --preset coverage
Run tests and generate the gcov text report:
./scripts/coverage_report.sh
Running Benchmarks
Before running benchmarks, configure with presets:
cmake --preset benchmarks-all
cmake --build --preset release
For a RelWithDebInfo diagnostic build, use:
cmake --preset benchmarks-diagnostic
cmake --build --preset release
BitVector
Benchmarks are random 50/50 0-1 bitvectors up to $2^{34}$ bits.
./build/release/benchmarks
Write JSON and plot size-scaled benchmark curves with a log-scaled x-axis:
./build/release/benchmarks --benchmark_out=bitvector_bench.json --benchmark_out_format=json
python3 scripts/plot_size_benchmarks.py bitvector_bench.json -o graphs/bitvector_size.png --size-key n
Excess Positions
./build/release/excess_positions_benchmarks --benchmark_out=excess_positions.json --benchmark_out_format=json
python3 scripts/excess_benchmark_table.py excess_positions.json -o src/docs/excess_positions_benchmark_results.md
Generated benchmark documentation can be written to src/docs/benchmark_results.md; the documentation pipeline does not run benchmarks.
RmM Tree
./build/release/bench_rmm
For focused runs, bench_rmm accepts --ops with a comma-separated operation list. The benchmark harness only builds the query pools needed by the selected operations, so subset runs avoid much of the setup cost:
./build/release/bench_rmm --ops=rank1,select1 --benchmark_out=rmm_rank_select.json --benchmark_out_format=json
By default, RmM benchmarks step through sizes by powers of two. Use --per_octave=<n> for finer sampling between adjacent powers of two, or --explicit_sizes=<csv> for an exact size list.
Google Benchmark filters are also used to limit RmM setup when --ops is not provided:
./build/release/bench_rmm --benchmark_filter='^rank1$' --benchmark_out=rmm_rank1.json --benchmark_out_format=json
For comparison with range min-max tree implementation from sdsl-lite, use the third-party benchmark preset. This defines SDSL_SUPPORT and builds bench_rmm_sdsl:
cmake --preset benchmarks-third-party
cmake --build --preset benchmarks-third-party
sudo cpupower frequency-set --governor performance
./build/release-third-party/bench_rmm_sdsl --benchmark_out=rmm_bench_sdsl.json
For visualization, write the JSON output to a file using --benchmark_out=<file> (e.g. ./build/release/bench_rmm --benchmark_out=rmm_bench.json) and plot it with scripts/plot_rmm.py (add --sdsl-json rmm_bench_sdsl.json for per-operation sdsl-lite comparison plots). For size-scaled tree plots, use:
python3 scripts/plot_size_benchmarks.py rmm_bench.json -o graphs/rmm_size.png --size-key N
Example Usage
#include <pixie/bitvector.h>
#include <vector>
#include <iostream>
using namespace pixie;
int main() {
std::vector<uint64_t> bits = {0b101101};
std::cout << "bv: " << bv.to_string() << "\n";
std::cout << "rank(4): " << bv.rank(4) << "\n";
std::cout << "select(2): " << bv.select(2) << "\n";
}
Non-interleaved, non-owning bit vector with rank and select.
Definition bitvector.h:63
#include <pixie/rmm_tree.h>
#include <cstdint>
#include <iostream>
#include <string>
#include <vector>
using namespace pixie;
int main() {
std::string bits = "11101001011000";
std::vector<std::uint64_t> words((bits.size() + 63) / 64);
for (std::size_t i = 0; i < bits.size(); ++i) {
if (bits[i] == '1') {
words[i / 64] |= std::uint64_t{1} << (i % 64);
}
}
std::cout << "close(1): " << t.close(1) << "\n";
std::cout << "open(3): " << t.open(3) << "\n";
std::cout << "enclose(1): " << t.enclose(1) << "\n";
}
Range min–max tree over a bitvector (LSB-first) tailored for balanced-parentheses (BP).
Definition rmm_tree.h:38
References
- [1] Laws et al., SPIDER: Improved Succinct Rank and Select Performance SPIDER
- [2] Kurpicz, Engineering compact data structures for rank and select queries on bit vectors pasta-toolbox/bit_vector