Contributing documentation

The documentation for datatable project is written entirely in the ReStructured Text (RST) format and rendered using the Sphinx engine. These technologies are standard for Python.

The source RST files for the documentation live in the docs/ directory of the project. With every pull request they are sent to host, where they are automatically compiled into HTML and made available to the public. You can find the link to this new build in the list of “checks” at the bottom of every PR.

The basic workflow for developing documentation is the following:

  1. Set up a local datatable repository;

  2. Go into the docs/ directory and create new or change existing documentation files;

  3. Run in a terminal

    $ make html
  4. If there were no errors, the documentation can be viewed locally by opening the file docs/_build/html/index.html in a browser.

The make html command needs to be re-run after every change you make. Occasionally you may also need to make clean if something doesn’t seem to work properly.

Basic formatting

At the most basic level, RST document is a plain text, where paragraphs are separated with empty lines:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur.

The line breaks within each paragraph are ignored; on the rendered page the lines will be as wide as is necessary to fill the page. With that in mind, we ask to avoid lines that are longer than 80 characters in width, if possible. This makes it much easier to work with code on small screen devices.

Page headings are a simple line of underlined text:

Heading Level 1 =============== Heading Level 2 --------------- Heading Level 3 ~~~~~~~~~~~~~~~

Each document must have exactly one level-1 heading; otherwise the page would not render properly.

Basic bold text, italic text and literal text is written as follows. (Note that literals use double backticks, which is a frequent cause of formatting errors.):

**bold text** *italic text* ``literal text``

Bulleted and ordered lists are done similarly to Markdown:

- list item 1; - list item 2; - a longer list item, that might need to be be carried over to the next line. 1. ordered list item 1 2. ordered list item 2 This is the next paragraph of list item 2.

The content of each list item can be arbitrarily complex, as long as it is properly indented.

Code blocks

There are two main ways to format a block of code. The simplest way is to finish a paragraph with a double-colon :: and then start the next paragraph (code) indented with 4 spaces:

Here is a code example:: >>> print("Hello, world!", flush=True)

In this case the code will be highlighted assuming it is a python sample. If the code corresponds to some other language, you’ll need to use an explicit code-block directive:

.. code-block:: shell $ pip install datatable

This directive allows you to explicitly select the language of your code snippet, which will affect how it is highlighted. The code inside code-block must be indented, and there has to be an empty line between the .. code-block:: declaration and the actual code.

When writing python code examples, the best practice is to use python console format, i.e. prepend all input lines with >>> (or ... for continuation lines), and keep all output lines without a prefix. When documenting an error, remove all traceback information and leave only the error message:

>>> import datatable as dt >>> DT = dt.Frame(A=[5], B=[17], D=['zelo']) >>> DT | A B D | int32 int32 str32 -- + ----- ----- ----- 0 | 5 17 zelo [1 row x 3 columns] >>> DT.hello() AttributeError: 'datatable.Frame' object has no attribute 'hello'

This code snippet will be rendered as follows:

import datatable as dt DT = dt.Frame(A=[5], B=[17], D=['zelo']) DT
AttributeError: 'datatable.Frame' object has no attribute 'hello'

Advanced directives

All rst documents are arranged into a tree. All non-leaf nodes of this tree must include a .. toctree:: directive, which may also be declared hidden:

.. toctree:: :hidden: child_doc_1 Explicit name <child_doc_2>

The .. image:: directive can be used to insert an image, which may also be a link:

.. image:: <image URL> :target: <target URL if the image is a link>

In order to note that some functionality was added or changed in a specific version, use:

.. x-version-added:: 0.10.0 .. x-version-deprecated:: 1.0.0 .. x-version-changed:: 0.11.0 Here's what changed: blah-blah-blah

The .. seealso:: directive adds a Wikipedia-style “see also:” entry at the beginning of a section. The argument of this directive should contain a link to the content that you want the user to see. This directive is best to include immeditately after a heading:

.. seealso:: :ref:`columnsets`

Directive .. x-comparison-table:: allows to create a two-column table specifically designed for comparing two entities across multiple comparison points. It is primarily used to create the “compare datatable with another library” manual pages. The content of this directive is comprised of multiple “sections” separated with ====, and each section has 2 or 3 parts (separated with ----): an optional common header, then the content of the first column, and then the second:

.. x-comparison-table:: :header1: datatable :header2: another-library Section 1 header ---- Column 1 ---- Column 2 ==== Section 2 header ---- Column 1 ---- Column 2

Changelog support

RST is language that supports extensions. One of the custom extensions that we use supports maintaining a changelog. First, the .. changelog:: directive which is used in releases/vN.N.N.rst files declares that each of those files describes a particular release of datatable. The format is as follows:

.. changelog:: :version: <version number> :released: <release date> :wheels: URL1 URL2 etc. changelog content... .. contributors:: N @username <full name> -- N @username <full name>

The effect of this declaration is the following:

  • The title of the page is automatically inserted, together with an anchor that can be used to refer to this page;

  • A Wikipedia-style infobox is added on the right side of the page. This infobox contains the release date, links to the previous/next release, and the links to all wheels that where released at that version. The wheels are grouped by the python version / operating system. An sdist link may also be included as one of the “wheels”.

  • Within the .. changelog:: directive, a special form of list items is supported:

    -[new] New feature that was added -[enh] Improvement of an existing feature or function -[fix] Bug fix -[api] API change

    In addition, if any such item ends with the text of the form [#333], then this will be automatically converted into a link to a github issue/PR with that number.

  • The .. contributors:: directive can only be used inside a changelog, and it should list the contributors who participated in creation of this particular release. The list of contributors is prepared using the script ci/

Documenting API

When it comes to documenting specific functions/classes/methods of the datatable module, we use another extension: .. xfunction:: (or .. xclass::, .. xmethod::, etc). This is because this part of the documentation is declared within the C++ code, so that it can be available from within a regular python session.

Inside the documentation tree, each function/method/etc that has to be documented is declared as follows:

.. xfunction:: datatable.rbind :src: src/core/frame/ py_rbind :doc: src/core/frame/ doc_py_rbind :tests: tests/munging/

Here we declare the function dt.rbind(), whose source code is located in file src/core/frame/ in function py_rbind(). The docstring of this function is located in the same file in a variable static const char* doc_py_rbind. The content of the latter variable will be pre-processed and then rendered as RST. The :doc: parameter is optional, if omitted the directive will attempt to find the docstring automatically.

The optional :tests: parameter should point to a file where the tests for this function are located. This will be included as a link in the rendered output.

In order to document a getter/setter property of a class, use the following:

.. xdata:: datatable.Frame.key :src: src/core/frame/ Frame::get_key Frame::set_key :doc: src/core/frame/ doc_key :tests: tests/ :settable: new_key :deletable:

The :src: parameter can now accept two function names: the getter and the setter. In addition, the :settable: parameter will have the name of the setter value as it will be displayed in the docs. Lastly, :deletable: marks this class property as deletable.

The docstring of the function/method/etc is preprocessed before it is rendered into the RST document. This processing includes the following steps:

  • The “Parameters” section is parsed and the definitions of all function parameters are extracted.

  • The contents of the “Examples” section are parsed as if it was a literal block, converting from python-console format into the format jupyter-style code blocks. In addition, if the output of any command contains a datatable Frame, it will also be converted into a Jupyter-style table.

  • All other sections are displayed as-is.

Here’s an example of a docstring:

static const char* doc_rbind = R"(rbind(self, *frames, force=False, bynames=True) -- Append rows of `frames` to the current frame. This method modifies the current frame in-place. If you do not want the current frame modified, then use the :func:`dt.rbind()` function. Parameters ---------- frames: Frame | List[Frame] One or more frames to append. force: bool If True, then the frames are allowed to have mismatching set of columns. Any gaps in the data will be filled with NAs. bynames: bool If True (default), the columns in frames are matched by their names, otherwise by their order. Examples -------- >>> DT = dt.Frame(A=[1, 2, 3], B=[4, 7, 0]) >>> frame1 = dt.Frame(A=[-1], B=[None]) >>> DT.rbind(frame1) >>> DT | A B -- + -- -- 0 | 1 4 1 | 2 7 2 | 3 0 3 | -1 NA -- [4 rows x 2 columns] )";