We base our execution engine on DuckDB, CWI’s latest analytic database project.

Set up local repo

The first step is to fork the DuckDB GitHub repository through the GitHub web interface.

Next, we need to set up our local clone of the fork so that it can follow updates from the main repository:

git clone git@github.com:informagi/duckdb.git
cd duckdb
git remote add upstream git@github.com:cwida/duckdb.git

Installation

Preliminaries

Install clang as the C/C++ compiler:

sudo dnf install clang

Ensure clang is used by default:

export CC=/usr/bin/clang
export CXX=/usr/bin/clang++
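Before running cmake, it can help to confirm that both variables point at something executable. The following is a minimal sketch; the helper name check_compiler is hypothetical and not part of DuckDB or clang.

```shell
# Hypothetical helper: report whether a compiler setting resolves to an executable.
check_compiler() {
    # $1: variable name (used in the message), $2: path or command name to check
    if command -v "$2" >/dev/null 2>&1; then
        echo "$1 ok: $2"
    else
        echo "$1 missing: $2"
        return 1
    fi
}

# Fall back to the generic compiler names when CC/CXX are unset.
check_compiler CC "${CC:-cc}" || echo "install clang or fix CC" >&2
check_compiler CXX "${CXX:-c++}" || echo "install clang or fix CXX" >&2
```

The exports only affect the current shell session, so the check is worth repeating in any new terminal before configuring the build.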

Install sqlite-devel:

sudo dnf install sqlite-devel

Install the Address Sanitizer library (needed when building with debug flags):

sudo dnf install libasan

Build from source

cd duckdb
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Debug ..
make

The cmake process has many options. Right now, I build the Python, Parquet and JSON modules by adding the following flags (DUCKDB_PREFIX is the installation directory, set as described in the next section):

cmake .. -DBUILD_JSON_EXTENSION=1 -DBUILD_PARQUET_EXTENSION=1 -DBUILD_PYTHON=1 \
           -DCMAKE_INSTALL_PREFIX=${DUCKDB_PREFIX} -DUSER_SPACE=1

Note: it is unclear whether -DEXTENSION_STATIC_BUILD=1 is needed as well. It does not seem necessary for using the CLI, but it may be required when using extensions from Python.

Build from source into non-standard directory

Setup:

DUCKDB_PREFIX=/export/data/ir/local
PYTHONPATH=${PYTHONPATH:+${PYTHONPATH}:}$(pip3 show six | \
  grep "Location:" | cut -d " " -f2 | \
  sed -e "s|/usr|${DUCKDB_PREFIX}|")

export DUCKDB_PREFIX PYTHONPATH
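The PYTHONPATH line combines two idioms: the `${VAR:+...}` expansion, which adds the ":" separator only when PYTHONPATH is already non-empty, and a sed substitution that rewrites the system prefix /usr to the custom prefix. The sketch below isolates both steps. The helper names and the python3.9 path are assumptions for illustration; the actual location depends on what pip3 reports on your system.

```shell
# Hypothetical helper: map a system site-packages path onto a custom prefix,
# mirroring the sed expression used in the setup.
relocate_site_packages() {
    # $1: path as reported by pip ("Location: ..."), $2: install prefix
    printf '%s\n' "$1" | sed -e "s|/usr|$2|"
}

# Hypothetical helper: append a directory to a colon-separated path list,
# adding the ":" separator only when the list is non-empty.
append_path() {
    # $1: existing list (may be empty), $2: directory to append
    printf '%s\n' "${1:+$1:}$2"
}

# Assumed example location; not taken from a real pip3 run.
site=$(relocate_site_packages /usr/lib/python3.9/site-packages /export/data/ir/local)

append_path "" "$site"                 # empty list: no leading ":"
append_path "/some/other/dir" "$site"  # non-empty list: joined with ":"
```

Avoiding a leading ":" matters because an empty entry in PYTHONPATH is easy to introduce by accident and hard to spot later.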

Compile and install the DuckDB library:

cd duckdb
mkdir build
cd build
cmake .. -DBUILD_JSON_EXTENSION=1 -DBUILD_PARQUET_EXTENSION=1 -DBUILD_PYTHON=1 \
           -DCMAKE_INSTALL_PREFIX=${DUCKDB_PREFIX} -DUSER_SPACE=1
make all install

In older versions, installing the Python package took more manual steps; this no longer seems necessary. The steps are kept here in case we need them later:

cd ../tools/pythonpkg
export DUCKDB_VERSION=$(python setup.py --version)
sed -e "s/0.0.0.unknown/${DUCKDB_VERSION}.local/g" -i.orig setup.py
cd ../..

mkdir -p $DUCKDB_PREFIX/src/duckdb-pythonpkg
cp -R tools/pythonpkg $DUCKDB_PREFIX/src/duckdb-pythonpkg
pip3 install --prefix $DUCKDB_PREFIX -e $DUCKDB_PREFIX/src/duckdb-pythonpkg

TODO: uninstall?
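One possible approach to uninstalling, offered as an untested sketch rather than a verified procedure for DuckDB: after an install, CMake writes an install_manifest.txt file in the build directory that lists every file it installed, one path per line. The hypothetical helper below removes the files listed there. It covers only what CMake itself installed, so a Python package installed separately with pip3 would need to be removed by other means.

```shell
# Hypothetical helper: delete every file listed in a CMake install manifest.
remove_installed_files() {
    # $1: path to install_manifest.txt (one absolute file path per line)
    # The "|| [ -n ... ]" clause handles a last line without a trailing newline.
    while IFS= read -r file || [ -n "$file" ]; do
        if [ -n "$file" ]; then
            rm -f -- "$file"
        fi
    done < "$1"
}

# Usage, from the build directory after "make all install":
# remove_installed_files install_manifest.txt
```

The helper leaves directories and the manifest itself in place, so it is safe to inspect the manifest afterwards to see what was removed.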

Contributing your changes

Syncing if you did not make any changes:

git fetch upstream
git pull upstream master

Syncing while merging your own changes:

git fetch upstream
git checkout master
git merge upstream/master

See also the GitHub docs on syncing a fork. To actually contribute, push your changes to your fork and submit a pull request across forks through the GitHub interface.

Acknowledgements

Funding from the NWO SQIREL-GRAPHS project, CWI, and Radboud’s iCIS institute.