especially chapters 3 (in-memory containers) and 4 (on-disk containers).
What it is
*bcolz* provides **columnar and compressed** data containers that can
live either on-disk or in-memory. The compression is carried out
transparently by Blosc, an ultra fast meta-compressor that is optimized
for binary data. Compression is active by default.
Column storage allows for efficiently querying tables with a large
number of columns. It also allows for cheap addition and removal of
columns. Lastly, high-performance iterators (like ``iter()``,
``where()``) for querying the objects are provided.
bcolz can use different backends internally (currently numexpr,
Python/NumPy or dask) to accelerate many vector and query operations
(numexpr and dask are optional; pure NumPy works too). Moreover, since
the carray/ctable containers can be disk-based, they can be used to
seamlessly perform computations on data that does not fit in memory.
While NumPy is used as the standard way to feed and retrieve data from
bcolz internal containers, bcolz also comes with support for
high-performance import/export facilities to/from `HDF5/PyTables tables
<http://www.pytables.org>`_ and `pandas dataframes
Have a look at how bcolz and the Blosc compressor make better use of
memory without significant overhead, at least for some real
bcolz has minimal dependencies (NumPy is the only strict requisite),
comes with an exhaustive test suite, and it is meant to be used in
production. Example users of bcolz are Visualfabriq
(http://www.visualfabriq.com/), Quantopian (https://www.quantopian.com/)