DuckDB Informagi Fork
We base our execution engine on DuckDB, the latest analytic database project from CWI.
Set up the local repository
The first step was to fork the DuckDB GitHub repository through the GitHub web interface.
Next, we clone our fork and set it up to follow updates from the main (upstream) repository:
git clone git@github.com:informagi/duckdb.git
cd duckdb
git remote add upstream git@github.com:cwida/duckdb.git
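To make this step safe to re-run (for example from a setup script), here is a minimal sketch of a helper that adds the upstream remote only when it is missing. The function name ensure_upstream is hypothetical and not part of DuckDB or Git; it relies only on `git remote get-url` and `git remote add`.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of DuckDB or Git): add the "upstream" remote
# only if it is not configured yet, so the setup can be re-run safely.
# Must be called from inside the cloned repository.
ensure_upstream() {
  local url="$1"
  if git remote get-url upstream >/dev/null 2>&1; then
    # Remote already exists: report it instead of failing on a duplicate add.
    echo "upstream already set to $(git remote get-url upstream)"
  else
    git remote add upstream "$url"
  fi
}
```

Run it from inside the clone, e.g. `ensure_upstream git@github.com:cwida/duckdb.git`, and check the result with `git remote -v`.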
Installation
Preliminaries
Install clang as the C/C++ compiler:
sudo dnf install clang
Ensure clang is used by default:
export CC=/usr/bin/clang
export CXX=/usr/bin/clang++
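A minimal sketch (hypothetical function check_compilers) that verifies CC and CXX point at executable compilers before configuring. Keep in mind that CMake reads CC and CXX only when a build directory is configured for the first time and then caches the result; if build/ already exists from an earlier run with another compiler, remove it first.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fail early if CC/CXX are unset or not executable,
# otherwise print the first line of each compiler's version banner.
check_compilers() {
  for c in "${CC:-}" "${CXX:-}"; do
    if [ -z "$c" ] || [ ! -x "$c" ]; then
      echo "compiler not found or not executable: '${c}'" >&2
      return 1
    fi
  done
  "$CC" --version | head -n 1
  "$CXX" --version | head -n 1
}
```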
Install sqlite-devel and libasan, the Address Sanitizer library (needed for the debug build flags):
sudo dnf install sqlite-devel
sudo dnf install libasan
Build from source
cd duckdb
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=Debug ..
make
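The plain make call above compiles with a single job, which can take a long time for a C++ code base of DuckDB's size. A minimal sketch of the same Debug build using all cores, assuming CMake 3.13 or newer (for -S/-B and `cmake --build --parallel`); the function names build_jobs and duckdb_debug_build are hypothetical.

```shell
#!/usr/bin/env bash
# Number of parallel compile jobs: all online CPUs, or 1 if unknown.
build_jobs() {
  nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1
}

# Hypothetical wrapper: configure a Debug tree in <src>/build
# (src defaults to the current directory) and build it with all cores.
duckdb_debug_build() {
  local src="${1:-.}"
  cmake -S "$src" -B "$src/build" -DCMAKE_BUILD_TYPE=Debug &&
    cmake --build "$src/build" --parallel "$(build_jobs)"
}
```

From the top of the checkout, `duckdb_debug_build` replaces the mkdir/cmake/make sequence above.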
The cmake process has many options; currently, I build the Python, Parquet and JSON modules by adding the following flags:
cmake .. -DBUILD_JSON_EXTENSION=1 -DBUILD_PARQUET_EXTENSION=1 -DBUILD_PYTHON=1 \
-DCMAKE_INSTALL_PREFIX=${DUCKDB_PREFIX} -DUSER_SPACE=1
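The same flag set is repeated in the non-standard directory build below. A minimal sketch of a hypothetical wrapper, duckdb_configure, that keeps the flags in one place and passes any extra flags through; it assumes it is called from the build directory and that DUCKDB_PREFIX has been set as described in the setup step below.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the cmake call above; run from duckdb/build.
# Extra arguments ("$@") are appended, e.g. -DCMAKE_BUILD_TYPE=Debug.
duckdb_configure() {
  cmake .. \
    -DBUILD_JSON_EXTENSION=1 \
    -DBUILD_PARQUET_EXTENSION=1 \
    -DBUILD_PYTHON=1 \
    "-DCMAKE_INSTALL_PREFIX=${DUCKDB_PREFIX:?set DUCKDB_PREFIX first}" \
    -DUSER_SPACE=1 \
    "$@"
}
```

For example, `duckdb_configure -DCMAKE_BUILD_TYPE=Debug` runs the cmake command above plus the extra flag.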
Note: is -DEXTENSION_STATIC_BUILD=1 needed too? It does not seem to be needed for the CLI, but it may be necessary when using extensions from Python. Unclear.
Build from source into a non-standard directory
Setup:
DUCKDB_PREFIX=/export/data/ir/local
PYTHONPATH=${PYTHONPATH:+${PYTHONPATH}:}$(pip3 show six | \
grep "Location:" | cut -d " " -f2 | \
sed -e "s|/usr|${DUCKDB_PREFIX}|")
export DUCKDB_PREFIX PYTHONPATH
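The PYTHONPATH line takes the site-packages directory reported by `pip3 show six` (something like /usr/lib/python3.11/site-packages), replaces the /usr prefix with DUCKDB_PREFIX, and appends the result to any existing PYTHONPATH; the ${PYTHONPATH:+${PYTHONPATH}:} expansion inserts the ':' separator only when PYTHONPATH is already non-empty. A sketch of the two steps as separate functions (hypothetical names), with the sed anchored so that only a leading /usr is rewritten:

```shell
#!/usr/bin/env bash
# Map a system site-packages path onto the private install prefix.
prefix_site_packages() {
  local location="$1" prefix="$2"
  printf '%s\n' "$location" | sed -e "s|^/usr|${prefix}|"
}

# Append a directory to a colon-separated list; ${list:+...} adds the
# ':' separator only when the list is non-empty.
append_path() {
  local list="$1" dir="$2"
  printf '%s\n' "${list:+${list}:}${dir}"
}
```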
Compile the DuckDB library:
cd duckdb
mkdir build
cd build
cmake .. -DBUILD_JSON_EXTENSION=1 -DBUILD_PARQUET_EXTENSION=1 -DBUILD_PYTHON=1 \
-DCMAKE_INSTALL_PREFIX=${DUCKDB_PREFIX} -DUSER_SPACE=1
make all install
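A quick sanity check after `make all install`, sketched as a hypothetical function: it imports the duckdb Python module, prints where it was loaded from (to confirm that the copy under DUCKDB_PREFIX is picked up rather than a system-wide one) and runs a trivial query. It assumes PYTHONPATH has been exported as in the setup step.

```shell
#!/usr/bin/env bash
# Hypothetical check: import duckdb from the current PYTHONPATH and run a query.
check_duckdb_python() {
  python3 - <<'EOF'
import duckdb
print("loaded from:", duckdb.__file__)
print("version:", duckdb.__version__)
print("SELECT 42 ->", duckdb.connect().execute("SELECT 42").fetchall())
EOF
}
```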
Older versions required more manual steps to install the Python package; these no longer seem necessary, but we keep them here in case we need them later:
cd ../tools/pythonpkg
export DUCKDB_VERSION=$(python setup.py --version)
sed -e "s/0.0.0.unknown/${DUCKDB_VERSION}.local/g" -i.orig setup.py
cd ../..
mkdir -p $DUCKDB_PREFIX/src/duckdb-pythonpkg
cp -R tools/pythonpkg $DUCKDB_PREFIX/src/duckdb-pythonpkg
pip3 install --prefix $DUCKDB_PREFIX -e $DUCKDB_PREFIX/src/duckdb-pythonpkg
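The sed call above stamps setup.py with a local version: occurrences of the placeholder 0.0.0.unknown become <version>.local, and -i.orig keeps the untouched file as setup.py.orig. The same step as a hypothetical function, which makes it easy to try on a scratch copy first:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper around the sed step: replace the 0.0.0.unknown
# placeholder with "<version>.local" in place, keeping a .orig backup.
stamp_local_version() {
  local setup_py="$1" version="$2"
  sed -e "s/0.0.0.unknown/${version}.local/g" -i.orig "$setup_py"
}
```

In the steps above this corresponds to `stamp_local_version setup.py "$DUCKDB_VERSION"`.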
TODO: uninstall?
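One possible answer, untested for this setup: when CMake runs `make install`, it records every installed file in install_manifest.txt in the build directory, so the files under DUCKDB_PREFIX can be removed by replaying that list. A minimal sketch (hypothetical function name); it deletes files only and leaves empty directories in place. The editable Python package should be removable separately with `pip3 uninstall duckdb`, provided PYTHONPATH still points into the prefix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: delete every file listed in CMake's install manifest.
# The "|| [ -n "$file" ]" also handles a last line without a trailing newline.
uninstall_from_manifest() {
  local manifest="${1:-install_manifest.txt}"
  [ -f "$manifest" ] || { echo "no manifest: $manifest" >&2; return 1; }
  while IFS= read -r file || [ -n "$file" ]; do
    if [ -n "$file" ]; then
      rm -f -- "$file"
    fi
  done < "$manifest"
}
```

Run it from the build directory after `make all install`, i.e. `uninstall_from_manifest install_manifest.txt`.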
Contributing your changes
Syncing if you have not made any changes:
git fetch upstream
git pull upstream master
Syncing while merging your own changes:
git fetch upstream
git checkout master
git merge upstream/master
See also the GitHub docs on syncing a fork. To contribute, push your changes to your fork and submit a pull request across forks through the GitHub interface.
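The merge-based sync as a minimal sketch (hypothetical function sync_fork), with one extra safeguard: it refuses to run while there are uncommitted changes, so a merge never starts on top of unfinished work. It assumes the upstream remote configured in the setup step and defaults to the master branch used above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: fetch upstream and merge upstream/<branch> into <branch>.
sync_fork() {
  local branch="${1:-master}"
  # Refuse to merge on top of uncommitted (staged or unstaged) changes.
  if ! git diff --quiet || ! git diff --cached --quiet; then
    echo "uncommitted changes; commit or stash them first" >&2
    return 1
  fi
  git fetch upstream &&
    git checkout "$branch" &&
    git merge "upstream/$branch"
}
```

Run `sync_fork` from inside the clone, then `git push origin master` to update the fork on GitHub.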
Acknowledgements
Funding from the NWO SQIREL-GRAPHS project, CWI, and Radboud’s iCIS institute.