Installation
This page describes how to install datatable on various systems.
Prerequisites
Python 3.5+ is required, although we recommend Python 3.6 or newer for best results. You can check your Python version via
$ python --version
In addition, we recommend using pip version 20.0+, especially if you’re planning to install datatable from source, or if you are on a Unix machine.
There are no other prerequisites. Datatable does not depend on any other Python module [1], nor on any non-standard system library.
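The version thresholds above can also be checked from Python. The following is a minimal sketch, not part of datatable: the function names are hypothetical, and `pip_is_recent` assumes it receives the text printed by `pip --version` (for example `pip 20.0.2 from ... (python 3.8)`).

```python
import re
import sys

MIN_PYTHON = (3, 5)          # required, per the text above
RECOMMENDED_PYTHON = (3, 6)  # recommended, per the text above
RECOMMENDED_PIP = (20, 0)    # recommended, per the text above


def python_status(version=None):
    """Classify a version tuple as 'unsupported', 'supported' or 'recommended'."""
    major_minor = tuple((version or sys.version_info)[:2])
    if major_minor < MIN_PYTHON:
        return "unsupported"
    if major_minor < RECOMMENDED_PYTHON:
        return "supported"
    return "recommended"


def pip_is_recent(pip_version_output):
    """Return True if the output of `pip --version` reports 20.0 or newer."""
    match = re.match(r"pip (\d+)\.(\d+)", pip_version_output)
    if match is None:
        raise ValueError("unrecognized pip version string: %r" % pip_version_output)
    return (int(match.group(1)), int(match.group(2))) >= RECOMMENDED_PIP


# Report on the interpreter that is running this script.
print("Python:", python_status())
```

Calling `python_status()` without arguments inspects the running interpreter; passing a tuple such as `(3, 5, 2)` lets you check a different version.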
Basic installation
On most platforms datatable can be installed directly from PyPI using pip:
$ pip install datatable
The following platforms are supported:
macOS
Datatable has been tested to work on macOS 10.12.5 (Sierra), macOS 10.13.6 (High Sierra), and macOS 10.15.2 (Catalina).
Linux x86_64 / ppc64le
We produce binary wheels that are tagged as manylinux2010 (for x86_64 architecture) and manylinux2014 (for ppc64le). Consequently, they will work with your Linux distribution if it is compatible with one of these tags. Please refer to PEP 571 (manylinux2010) and PEP 599 (manylinux2014) for details.
Windows
Windows wheels are available for Windows 10 or later.
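As a rough illustration of the platform list above, the sketch below checks whether the current machine is one of the named platforms. It is a simplification and not part of datatable: the function name is hypothetical, OS versions (such as Windows 10 or specific macOS releases) are not checked, and because the text names CPU architectures only for Linux, macOS and Windows are matched by operating system name alone.

```python
import platform

# Linux architectures named in the list above.
SUPPORTED_LINUX_MACHINES = {"x86_64", "ppc64le"}


def wheel_expected(system=None, machine=None):
    """Return True if the list above names this OS (and, for Linux, this CPU)."""
    system = system or platform.system()
    machine = machine or platform.machine()
    if system == "Linux":
        return machine in SUPPORTED_LINUX_MACHINES
    # No CPU architecture is given for macOS or Windows in the text, so only
    # the operating system name is checked (an assumption of this sketch).
    return system in {"Darwin", "Windows"}


print("Binary wheel expected for this platform:", wheel_expected())
```

A result of `False` does not necessarily mean datatable cannot be installed; it only suggests that a build from source may be needed.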
Install latest dev version
If you wish to test the latest version of datatable before it has been officially released, then you can use one of the binary wheels that we build as part of our Continuous Integration process.
If you are on Windows, then pre-built wheels are available on AppVeyor. Click on a green master build of your choice, then navigate to the “Artifacts” tab, copy the wheel URL that corresponds to your Python version, and finally install it by passing that URL to pip install.
For macOS and Linux, development wheels can be found at our S3 repository.
Scroll to the bottom of the page to find the latest links, and then download
or copy the URL of a wheel that corresponds to your Python version and
platform. This wheel can be installed with pip as usual, by passing the downloaded file or its URL to pip install.
Alternatively, you can instruct pip to go to that repository directly and find the latest version automatically:
Build from source
In order to build and install the latest development version of datatable directly from GitHub, run the following command:
$ pip install git+https://github.com/h2oai/datatable
Since datatable is written mostly in C++, your computer must be set up for compiling C++ code. The build script will attempt to find the compiler automatically, searching for GCC, Clang, or MSVC on Windows. If it fails, or if you want to use some other compiler, then set the environment variable CXX before building the code.
Datatable uses the C++14 language standard, which means you must use a compiler that fully implements this standard. The following compiler versions are known to work:
Clang 5+;
GCC 6+;
MSVC 19.14+.
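The compiler selection and version requirements above can be illustrated with a short sketch. This is not datatable's build script: the function names and the list of executable names searched on PATH are assumptions made for illustration. Only the CXX override and the minimum versions come from the text. `version_ok` expects a bare dotted version such as `7.5.0`.

```python
import os
import re
import shutil

# Minimum versions listed above, keyed by compiler family.
MIN_VERSIONS = {"clang": (5,), "gcc": (6,), "msvc": (19, 14)}

# Executable names to try when CXX is not set. These names are an assumption
# of this sketch, not taken from datatable's build script.
CANDIDATES = ("g++", "clang++", "cl")


def pick_compiler(environ=None):
    """Return CXX if it is set, else the first candidate found on PATH, else None."""
    environ = os.environ if environ is None else environ
    cxx = environ.get("CXX")
    if cxx:
        return cxx
    for name in CANDIDATES:
        found = shutil.which(name)
        if found:
            return found
    return None


def version_ok(family, version_string):
    """Compare a dotted version string with the minimum for its compiler family."""
    parts = tuple(int(p) for p in re.findall(r"\d+", version_string))
    return parts >= MIN_VERSIONS[family]


print("Compiler that would be used:", pick_compiler())
```

For example, `version_ok("gcc", "4.9.2")` returns `False`, which indicates that the compiler predates the versions known to work.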
Install datatable in editable mode
If you want to tweak certain features of datatable, or even add your own functionality, you are welcome to do so. This section describes how to install datatable for development.
First, you need to fork the repository and then clone it locally:
$ git clone https://github.com/your_user_name/datatable
$ cd datatable
Build _datatable core library. The two most common options are:
$ # build a "production mode" datatable
$ make build
$ # build datatable in "debug" mode, without optimizations and with
$ # internal asserts enabled
$ make debug
Note that you would need to have a C++ compiler in order to compile and link the code. Please refer to the previous section for compiler requirements.
On macOS you may also need to install Xcode Command Line Tools.
On Linux if you see an error that 'Python.h' file not found, then it means you need to install a “development” version of Python, i.e. the one that has python header files included.
After the previous step succeeds, you will have a _datatable.*.so file in the src/datatable/lib folder. Now, in order to make datatable usable from Python, run
$ echo "`pwd`/src" >> ${VIRTUAL_ENV}/lib/python*/site-packages/easy-install.pth
(This assumes that you are using a virtualenv-based python. If not, then you’ll need to adjust the path to your python’s site-packages directory).
Install additional libraries that are needed to test datatable:
$ pip install -r requirements_tests.txt
$ pip install -r requirements_extra.txt
$ pip install -r requirements_docs.txt
Check that everything works correctly by running the test suite:
$ make test
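If you are not using a virtualenv, the path-registration step above (appending the src directory to easy-install.pth) can also be done from Python. The helper below is a hypothetical sketch, not part of datatable. It assumes you pass in the site-packages directory yourself, for instance the value of `sysconfig.get_paths()["purelib"]`.

```python
import os


def register_src_dir(site_packages, src_dir, pth_name="easy-install.pth"):
    """Append src_dir to a .pth file in site_packages unless already listed."""
    src_dir = os.path.abspath(src_dir)
    pth_path = os.path.join(site_packages, pth_name)
    content = ""
    if os.path.exists(pth_path):
        with open(pth_path) as fh:
            content = fh.read()
    if src_dir in [line.strip() for line in content.splitlines()]:
        return pth_path  # already registered, nothing to do
    with open(pth_path, "a") as fh:
        # Make sure the new entry starts on its own line.
        if content and not content.endswith("\n"):
            fh.write("\n")
        fh.write(src_dir + "\n")
    return pth_path


# Example call, run from the root of the cloned repository:
#   import sysconfig
#   register_src_dir(sysconfig.get_paths()["purelib"], "src")
```

Unlike the shell one-liner, this version is safe to run more than once because it skips the write when the directory is already listed.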
Once these steps are completed, the subsequent development process is much simpler. After any change to C++ files, re-run make build (or make debug) and then restart python for the changes to take effect.
Datatable only recompiles those files that were modified since the last build, which means that the compile step usually takes only a few seconds. Also note that you can switch between the “build” and “debug” versions of the library without performing make clean.
Troubleshooting
Despite our best effort to keep the installation process hassle-free, sometimes
problems may still arise. Here we list some of the more frequent ones, where we
know how to resolve them. If none of these help you, please ask a question on
StackOverflow (tagging with [py-datatable]), or file an issue on GitHub.
ImportError: cannot import name '_datatable'
This means the internal core library _datatable.*.so is either missing entirely, or is in a wrong location, or has the wrong name. The first step is therefore to find where that file actually is. Use the system find tool, limiting the search to your python directory.
If the file is missing entirely, then it was either deleted, or the installation used a broken wheel file. In either case, the only solution is to rebuild or reinstall the library completely.
If the file is present but not within the site-packages/datatable/lib/ directory, then moving it there should solve the issue.
If the file is present and is in the correct directory, then there must be a name conflict. In python run:
>>> import sysconfig
>>> sysconfig.get_config_var("SOABI")
'cpython-36m-ppc64le-linux-gnu'
The reported suffix should match the suffix of the _datatable.*.so file. If it doesn’t, then renaming the file will fix the problem.
Python.h: no such file or directory when compiling from source
Your Python distribution was shipped without the Python.h header file. This has been observed on certain Linux machines. You would need to install a Python package with a -dev suffix, for example python3.6-dev.
fatal error: 'sys/mman.h' file not found on macOS
In order to compile from source on mac computers, you need to have Xcode Command Line Tools installed. Run
$ xcode-select --install
ImportError: This package should not be accessible
The most likely cause of this error is a misconfigured PYTHONPATH environment variable. Unset that variable and try again.
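For the first item in this list (ImportError: cannot import name '_datatable'), the search and the three checks can be scripted. This is a hypothetical sketch, not a tool shipped with datatable. It assumes a regular installation where the library is expected under site-packages/datatable/lib/ and is named `_datatable.<SOABI>.so`, as described above. Editable installs keep the file under src/datatable/lib instead and would be reported as a wrong location.

```python
import glob
import os
import sysconfig


def find_core_libraries(python_dir):
    """Return every _datatable.*.so file found anywhere under python_dir."""
    pattern = os.path.join(glob.escape(python_dir), "**", "_datatable.*.so")
    return sorted(glob.glob(pattern, recursive=True))


def diagnose(path, soabi=None):
    """Classify one library path as 'ok', 'wrong location' or 'wrong name'."""
    soabi = soabi or sysconfig.get_config_var("SOABI")
    folder, name = os.path.split(path)
    expected_folder = os.path.join("site-packages", "datatable", "lib")
    if not os.path.normpath(folder).endswith(expected_folder):
        return "wrong location"
    if name != "_datatable.%s.so" % soabi:
        return "wrong name"
    return "ok"


# Example call, searching the active Python installation:
#   import sys
#   for lib in find_core_libraries(sys.prefix):
#       print(lib, "->", diagnose(lib))
```

An empty result from `find_core_libraries` corresponds to the "missing entirely" case, where the library has to be rebuilt or reinstalled.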
Footnotes
[1] Since version v0.11.0