datatable.fillna()

Added in version 1.1.0

For each column in cols, fill the missing values either with value or with the closest previous/subsequent non-missing value in that column. When by() is present, the filling is performed group-wise.
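
The two filling modes described above can be modeled in plain Python. This is a minimal sketch of the semantics for a single column, not datatable's implementation: `fill_column` is a hypothetical helper, `None` stands in for a missing value, and the sketch assumes that omitting `value` selects the directional fill.

```python
from typing import Any, List, Optional


def fill_column(values: List[Optional[Any]], value: Any = None,
                reverse: bool = False) -> List[Optional[Any]]:
    """Hypothetical plain-Python model of filling one column."""
    if value is not None:
        # Constant fill: every missing entry becomes `value`.
        return [value if x is None else x for x in values]
    # Directional fill: carry the closest non-missing value along.
    # Walking the column backwards makes "closest" mean "subsequent".
    order = range(len(values) - 1, -1, -1) if reverse else range(len(values))
    result = list(values)
    last_seen = None
    for i in order:
        if result[i] is None:
            result[i] = last_seen  # stays None if nothing has been seen yet
        else:
            last_seen = result[i]
    return result


column = [None, 1, None, None, 4, None]
constant = fill_column(column, value=0)        # [0, 1, 0, 0, 4, 0]
filled_down = fill_column(column)              # [None, 1, 1, 1, 4, 4]
filled_up = fill_column(column, reverse=True)  # [1, 1, 4, 4, 4, None]
```

In this model a missing value with no non-missing neighbour in the fill direction stays missing.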

Parameters

cols
FExpr

Input columns.

value
None | bool | int | float | str | list | tuple | dict | FExpr

A scalar, a list/tuple/dict of values or an f-expression to impute missing values with. The number of elements in this argument, unless it is a scalar, must match the number of columns in the input data.

reverse
bool

If False, each missing value is replaced with the closest previous non-missing value. If True, the closest subsequent non-missing value is used instead.

return
FExpr

f-expression that converts input columns into the columns filled with value, or with the previous/subsequent non-missing values.

except
ValueError

The exception is raised when the number of elements in value, unless it is a scalar, does not match the number of columns in cols.
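
The length rule for value can be illustrated with a small plain-Python sketch. It only models the rule stated above for list and tuple inputs; `expand_fill_values` is a hypothetical helper, the error message is made up for the illustration, and dict and f-expression inputs are not covered.

```python
from typing import Any, List


def expand_fill_values(value: Any, ncols: int) -> List[Any]:
    """Hypothetical helper: produce one fill value per input column."""
    if isinstance(value, (list, tuple)):
        if len(value) != ncols:
            # Illustrative message only, not datatable's wording.
            raise ValueError(
                f"value has {len(value)} elements, but cols has {ncols} columns"
            )
        return list(value)
    # Anything else (including a string) is treated as a scalar and repeated.
    return [value] * ncols


per_column = expand_fill_values([0.0, -1], ncols=2)
repeated = expand_fill_values(2, ncols=4)

try:
    expand_fill_values([1, 2, 3], ncols=2)
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
```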

Examples

Create a sample datatable frame:

from datatable import dt, f, by

DT = dt.Frame({'building': ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b'],
               'var1': [1.5, None, 2.1, 2.2, 1.2, 1.3, 2.4, None],
               'var2': [100, 110, 105, None, 102, None, 103, 107],
               'var3': [10, 11, None, None, None, None, None, None],
               'var4': [1, 2, 3, 4, 5, 6, 7, 8]})
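   | building     var1   var2   var3   var4
   | str32     float64  int32  int32  int32
-- + --------  -------  -----  -----  -----
 0 | a             1.5    100     10      1
 1 | a              NA    110     11      2
 2 | b             2.1    105     NA      3
 3 | b             2.2     NA     NA      4
 4 | a             1.2    102     NA      5
 5 | a             1.3     NA     NA      6
 6 | b             2.4    103     NA      7
 7 | b              NA    107     NA      8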

Fill all the missing values in a column with a single value:

DT[:, dt.fillna(f.var1, 2)]
   |    var1
   | float64
-- + -------
 0 |     1.5
 1 |       2
 2 |     2.1
 3 |     2.2
 4 |     1.2
 5 |     1.3
 6 |     2.4
 7 |       2

Fill all the missing values in multiple columns with a single value:

DT[:, dt.fillna(f[1:], 2)]
   |    var1   var2   var3   var4
   | float64  int32  int32  int32
-- + -------  -----  -----  -----
 0 |     1.5    100     10      1
 1 |       2    110     11      2
 2 |     2.1    105      2      3
 3 |     2.2      2      2      4
 4 |     1.2    102      2      5
 5 |     1.3      2      2      6
 6 |     2.4    103      2      7
 7 |       2    107      2      8

For the grouped frame, fill missing values with the group’s mean:

DT[:, dt.fillna(f[:], dt.mean(f[:])), by('building')]
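   | building     var1     var2     var3     var4
   | str32     float64  float64  float64  float64
-- + --------  -------  -------  -------  -------
 0 | a             1.5      100       10        1
 1 | a         1.33333      110       11        2
 2 | a             1.2      102     10.5        5
 3 | a             1.3      104     10.5        6
 4 | b             2.1      105       NA        3
 5 | b             2.2      105       NA        4
 6 | b             2.4      103       NA        7
 7 | b         2.23333      107       NA        8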
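
The imputed var1 values in this output can be checked by hand. The plain-Python sketch below recomputes the per-building means from the sample data using only the standard library. It is a check of the arithmetic, not a description of how datatable evaluates the expression, and it keeps the original row order rather than the grouped order shown above.

```python
from statistics import mean

building = ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b']
var1 = [1.5, None, 2.1, 2.2, 1.2, 1.3, 2.4, None]

# Collect the non-missing values of each building.
group_values = {}
for key, x in zip(building, var1):
    group_values.setdefault(key, [])
    if x is not None:
        group_values[key].append(x)

# Mean of the non-missing values per building.
group_means = {key: mean(xs) for key, xs in group_values.items()}

# Replace each missing entry with the mean of its own building.
filled = [group_means[key] if x is None else x
          for key, x in zip(building, var1)]
```

Here `group_means['a']` is (1.5 + 1.2 + 1.3) / 3 and `group_means['b']` is (2.1 + 2.2 + 2.4) / 3, which round to the 1.33333 and 2.23333 shown in the output.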

Fill down on a single column:

DT[:, dt.fillna(f.var1)]
   |    var1
   | float64
-- + -------
 0 |     1.5
 1 |     1.5
 2 |     2.1
 3 |     2.2
 4 |     1.2
 5 |     1.3
 6 |     2.4
 7 |     2.4

Fill up on a single column:

DT[:, dt.fillna(f.var1, reverse = True)]
   |    var1
   | float64
-- + -------
 0 |     1.5
 1 |     2.1
 2 |     2.1
 3 |     2.2
 4 |     1.2
 5 |     1.3
 6 |     2.4
 7 |      NA

Fill down on multiple columns:

DT[:, dt.fillna(f['var1':])]
   |    var1   var2   var3   var4
   | float64  int32  int32  int32
-- + -------  -----  -----  -----
 0 |     1.5    100     10      1
 1 |     1.5    110     11      2
 2 |     2.1    105     11      3
 3 |     2.2    105     11      4
 4 |     1.2    102     11      5
 5 |     1.3    102     11      6
 6 |     2.4    103     11      7
 7 |     2.4    107     11      8

Fill up on multiple columns:

DT[:, dt.fillna(f['var1':], reverse = True)]
   |    var1   var2   var3   var4
   | float64  int32  int32  int32
-- + -------  -----  -----  -----
 0 |     1.5    100     10      1
 1 |     2.1    110     11      2
 2 |     2.1    105     NA      3
 3 |     2.2    102     NA      4
 4 |     1.2    102     NA      5
 5 |     1.3    103     NA      6
 6 |     2.4    103     NA      7
 7 |      NA    107     NA      8

Fill down the grouped frame:

DT[:, dt.fillna(f['var1':]), by('building')]
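   | building     var1   var2   var3   var4
   | str32     float64  int32  int32  int32
-- + --------  -------  -----  -----  -----
 0 | a             1.5    100     10      1
 1 | a             1.5    110     11      2
 2 | a             1.2    102     11      5
 3 | a             1.3    102     11      6
 4 | b             2.1    105     NA      3
 5 | b             2.2    105     NA      4
 6 | b             2.4    103     NA      7
 7 | b             2.4    107     NA      8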
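
Group-wise filling means that a value is never carried across a group boundary. The sketch below models this for var2 in plain Python by remembering the last non-missing value per building. It is an illustration of the behavior shown above under that assumption, not datatable's algorithm.

```python
building = ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b']
var2 = [100, 110, 105, None, 102, None, 103, 107]

# Last non-missing value seen so far, tracked separately per building.
last_seen = {}
filled = []
for key, x in zip(building, var2):
    if x is None:
        x = last_seen.get(key)  # None if the group has no earlier value
    else:
        last_seen[key] = x
    filled.append(x)

# Regroup the rows by building to mirror the grouped output order.
by_group = {}
for key, x in zip(building, filled):
    by_group.setdefault(key, []).append(x)
```

With the sample data, `by_group['a']` is `[100, 110, 102, 102]` and `by_group['b']` is `[105, 105, 103, 107]`, matching the var2 column in the output above.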

Fill up the grouped frame:

DT[:, dt.fillna(f['var1':], reverse = True), by('building')]
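   | building     var1   var2   var3   var4
   | str32     float64  int32  int32  int32
-- + --------  -------  -----  -----  -----
 0 | a             1.5    100     10      1
 1 | a             1.2    110     11      2
 2 | a             1.2    102     NA      5
 3 | a             1.3     NA     NA      6
 4 | b             2.1    105     NA      3
 5 | b             2.2    103     NA      4
 6 | b             2.4    103     NA      7
 7 | b              NA    107     NA      8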