Contributing documentation

The documentation for the datatable project is written entirely in the reStructuredText (RST) format and rendered using the Sphinx engine. These technologies are standard for Python.

The basic workflow for developing documentation, after setting up a local datatable repository, is to go into the docs/ directory and run

$ make html

After that, if there were no errors, the documentation can be viewed locally by opening the file docs/_build/html/index.html in a browser.

The make html command needs to be re-run after every change you make. Occasionally you may also need to run make clean if something doesn’t seem to work properly.
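If you rebuild often, the two commands can be wrapped in a small helper. The following is only a sketch: the function names index_page and rebuild_docs are hypothetical and not part of the datatable repository, and the code assumes that make is available on your PATH.

```python
import subprocess
from pathlib import Path


def index_page(repo_root):
    """Return the location of the rendered front page (see above)."""
    return Path(repo_root) / "docs" / "_build" / "html" / "index.html"


def rebuild_docs(repo_root, clean=False):
    """Run `make html` in docs/, optionally preceded by `make clean`.

    Hypothetical helper: raises CalledProcessError if the build fails.
    """
    docs_dir = Path(repo_root) / "docs"
    if clean:
        subprocess.run(["make", "clean"], cwd=docs_dir, check=True)
    subprocess.run(["make", "html"], cwd=docs_dir, check=True)
    return index_page(repo_root)
```

Calling rebuild_docs(".") from the repository root returns the path of index.html, which can then be opened in a browser.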

Basic formatting

At the most basic level, an RST document is plain text, where paragraphs are separated with empty lines:

Lorem ipsum dolor sit amet, consectetur adipiscing elit, sed do eiusmod
tempor incididunt ut labore et dolore magna aliqua.

Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut
aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in
voluptate velit esse cillum dolore eu fugiat nulla pariatur.

The line breaks within each paragraph are ignored; on the rendered page the lines will be as wide as necessary to fill the page. With that in mind, we ask that you avoid lines longer than 80 characters, if possible. This makes it much easier to work with the source on small-screen devices.
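The 80-character guideline is easy to check mechanically. Here is a minimal sketch; the function name long_lines is hypothetical and no such check is claimed to exist in the repository.

```python
def long_lines(text, limit=80):
    """Return (line_number, length) pairs for lines longer than `limit`."""
    return [
        (number, len(line))
        for number, line in enumerate(text.splitlines(), start=1)
        if len(line) > limit
    ]


# Three lines: a short one, one of 81 characters, one of exactly 80.
sample = "short line\n" + "x" * 81 + "\n" + "y" * 80
print(long_lines(sample))  # only the 81-character line is reported
```

To check a real file, pass its contents, for example the result of reading an .rst file with pathlib.Path.read_text().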

Page headings are a simple line of underlined text:

Heading Level 1
===============

Heading Level 2
---------------

Heading Level 3
~~~~~~~~~~~~~~~

Each document must have exactly one level-1 heading; otherwise the page will not render properly.
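The one-heading rule can be verified before running a build. The sketch below assumes the convention shown above, in which level-1 headings are underlined with = characters; the function name level1_headings is hypothetical.

```python
def level1_headings(text):
    """Return the titles that are underlined with '=' characters.

    Assumption: '=' marks level 1, as in the convention shown above.
    """
    lines = text.splitlines()
    titles = []
    for title, underline in zip(lines, lines[1:]):
        title = title.strip()
        rule = underline.strip()
        is_rule = bool(rule) and set(rule) == {"="}
        title_is_rule = bool(title) and set(title) == {"="}
        # The underline must be at least as long as the title text.
        if title and not title_is_rule and is_rule and len(rule) >= len(title):
            titles.append(title)
    return titles


doc = "Page title\n==========\n\nSection\n-------\n\nSome text.\n"
assert len(level1_headings(doc)) == 1  # exactly one level-1 heading
```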

Bold text, italic text and literal text are written as follows (note that literals use double backticks; getting this wrong is a frequent cause of formatting errors):

**bold text**
*italic text*
``literal text``

Bulleted and ordered lists are done similarly to Markdown:

- list item 1;
- list item 2;
- a longer list item, that might need to be
  carried over to the next line.

1. ordered list item 1

2. ordered list item 2

   This is the next paragraph of list item 2.

The content of each list item can be arbitrarily complex, as long as it is properly indented.

Code blocks

There are two main ways to format a block of code. The simplest way is to finish the previous paragraph with a double-colon :: and then start the next paragraph (code) indented with 4 spaces:

Here is a code example::

    print("Hello, world!", flush=True)

A slightly more advanced method is to use an explicit xcode directive:

.. xcode:: shell

    $ pip install datatable

This directive allows you to explicitly select the language of your code snippet, which will affect how it is highlighted. The code inside xcode must be indented, and there has to be an empty line between the .. xcode:: declaration and the actual code.

Advanced directives

All RST documents are arranged into a tree. All non-leaf nodes of this tree must include a .. toctree:: directive, which may also be declared hidden:

.. toctree::
    :hidden:

    child_doc_1
    Explicit name <child_doc_2>

The .. image:: directive can be used to insert an image, which may also be a link:

.. image:: <image URL>
    :target: <target URL if the image is a link>

In order to note that some functionality was added or changed in a specific version, use:

.. xversionadded:: 0.10.0

.. versionchanged:: 0.11.0

The .. seealso:: directive adds a Wikipedia-style “see also:” entry at the beginning of a section. The argument of this directive should contain a link to the content that you want the user to see. This directive is best placed immediately after a heading:

.. seealso:: :ref:`columnsets`

Changelog support

RST is a language that supports extensions. One of the custom extensions that we use supports maintaining a changelog. First, the .. changelog:: directive, which is used in releases/vN.N.N.rst files, declares that each of those files describes a particular release of datatable. The format is as follows:

.. changelog::
    :version: <version number>
    :released: <release date>
    :wheels: URL1
             URL2
             ...

    changelog content...

    .. contributors::

        N @username <full name>
        ...
        --
        N @username <full name>
        ...

The effect of this declaration is the following:

  • The title of the page is automatically inserted, together with an anchor that can be used to refer to this page;

  • A Wikipedia-style infobox is added on the right side of the page. This infobox contains the release date, links to the previous/next release, and links to all wheels that were released for that version. The wheels are grouped by Python version and operating system. An sdist link may also be included as one of the “wheels”.

  • Within the .. changelog:: directive, a special form of list items is supported:

    -[new] New feature that was added
    
    -[enh] Improvement of an existing feature or function
    
    -[fix] Bug fix
    
    -[api] API change
    

    In addition, if any such item ends with the text of the form [#333], then this will be automatically converted into a link to a github issue/PR with that number.

  • The .. contributors:: directive can only be used inside a changelog, and it should list the contributors who participated in the creation of this particular release. The list of contributors is prepared using the script ci/gh.py.
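To make the special list-item format more concrete, here is a rough sketch of how a line such as -[fix] Bug fix [#333] could be split into its tag, its text and an issue link. This is not the code of the actual extension: the names ISSUE_URL, ITEM and parse_item are hypothetical, and the URL pattern is an assumption about where the issues are hosted.

```python
import re

# Assumption: issues and PRs live in the h2oai/datatable GitHub repository.
ISSUE_URL = "https://github.com/h2oai/datatable/issues/{}"

# -[tag] text, optionally ending with [#number]
ITEM = re.compile(r"^-\[(new|enh|fix|api)\]\s+(.*?)(?:\s*\[#(\d+)\])?\s*$")


def parse_item(line):
    """Return (tag, text, link) for a changelog item, or None."""
    match = ITEM.match(line)
    if match is None:
        return None
    tag, text, issue = match.groups()
    link = ISSUE_URL.format(issue) if issue else None
    return tag, text, link


print(parse_item("-[fix] Bug fix [#333]"))
print(parse_item("-[new] New feature that was added"))
```

Lines that do not start with one of the four tags are left alone (the function returns None), which mirrors the fact that ordinary list items remain valid inside a changelog.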

Documenting API

When it comes to documenting specific functions/classes/methods of the datatable module, we use another extension: .. xfunction:: (or .. xclass::, .. xmethod::, etc). This is because this part of the documentation is declared within the C++ code, so that it can be available from within a regular python session.

Inside the documentation tree, each function/method/etc that has to be documented is declared as follows:

.. xfunction:: datatable.rbind
    :src: src/core/frame/rbind.cc py_rbind
    :doc: src/core/frame/rbind.cc doc_py_rbind
    :tests: tests/munging/test-rbind.py

Here we declare the function datatable.rbind(), whose source code is located in file src/core/frame/rbind.cc in function py_rbind(). The docstring of this function is located in the same file in a variable static const char* doc_py_rbind. The content of the latter variable will be pre-processed and then rendered as RST. The :doc: parameter is optional; if omitted, the directive will attempt to find the docstring automatically.

The optional :tests: parameter should point to a file where the tests for this function are located. This will be included as a link in the rendered output.

In order to document a getter/setter property of a class, use the following:

.. xdata:: datatable.Frame.key
    :src: src/core/frame/key.cc Frame::get_key Frame::set_key
    :doc: src/core/frame/key.cc doc_key
    :tests: tests/test-keys.py
    :settable: newkey
    :deletable:

The :src: parameter can now accept two function names: the getter and the setter. In addition, the :settable: parameter gives the name of the value passed to the setter, as it will be displayed in the docs. Lastly, :deletable: marks this class property as deletable.

The docstring of the function/method/etc is preprocessed before it is rendered into the RST document. This processing includes the following steps:

  • The “Parameters” section is parsed and the definitions of all function parameters are extracted.

  • The contents of the “Examples” section are parsed as if they were a literal block and converted from python-console format into Jupyter-style code blocks. In addition, if the output of any command contains a datatable Frame, it will also be converted into a Jupyter-style table.

  • All other sections are displayed as-is.

Here’s an example of a docstring:

static const char* doc_rbind =
R"(rbind(self, *frames, force=False, bynames=True)
--

Append rows of `frames` to the current frame.

This method modifies the current frame in-place. If you do not want
the current frame modified, then use the :func:`dt.rbind()` function.

Parameters
----------
frames: Frame | List[Frame]
    One or more frames to append.

force: bool
    If True, then the frames are allowed to have mismatching set of
    columns. Any gaps in the data will be filled with NAs.

bynames: bool
    If True (default), the columns in frames are matched by their
    names, otherwise by their order.

Examples
--------
>>> DT = dt.Frame(A=[1, 2, 3], B=[4, 7, 0])
>>> frame1 = dt.Frame(A=[-1], B=[None])
>>> DT.rbind(frame1)
>>> DT
   |  A   B
-- + --  --
 0 |  1   4
 1 |  2   7
 2 |  3   0
 3 | -1  NA
--
[4 rows x 2 columns]
)";
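The preprocessing steps described above can be illustrated with a simplified sketch. It is not the code of the actual extension: the function names split_sections and parse_console are hypothetical, and the sketch only shows how dash-underlined sections and python-console input might be recognized.

```python
def split_sections(docstring):
    """Split a docstring into {section title: list of body lines}.

    A section starts at a title followed by a line of dashes at least
    as long as the title. Text before the first section is stored
    under the empty key "".
    """
    lines = docstring.splitlines()
    sections = {"": []}
    current = ""
    i = 0
    while i < len(lines):
        title = lines[i].strip()
        rule = lines[i + 1].strip() if i + 1 < len(lines) else ""
        if title and set(rule) == {"-"} and len(rule) >= len(title):
            current = title
            sections[current] = []
            i += 2  # skip the title and its underline
        else:
            sections[current].append(lines[i])
            i += 1
    return sections


def parse_console(lines):
    """Group python-console lines into (command, output lines) pairs."""
    cells = []
    for line in lines:
        if line.startswith(">>> "):
            cells.append((line[4:], []))
        elif cells and line.strip():
            cells[-1][1].append(line)  # output of the latest command
    return cells


sample = """\
Append rows.

Parameters
----------
force: bool
    Allow mismatching columns.

Examples
--------
>>> DT = dt.Frame(A=[1, 2, 3])
>>> DT.nrows
3
"""

sections = split_sections(sample)
print(list(sections))
print(parse_console(sections["Examples"]))
```

Note that the short -- line under the signature and the -- line that closes a Frame printout are not mistaken for section underlines, because they are shorter than the line above them.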