datatable.fillna()
For each column from cols, either fill the missing values with value, or with the previous/subsequent non-missing values in that column. In the presence of by(), the filling is performed group-wise.
Parameters
cols: FExpr
Input columns.
value: None | bool | int | float | str | list | tuple | dict | FExpr
A scalar, a list/tuple/dict of values, or an f-expression to impute missing values with. Unless this argument is a scalar, its number of elements must match the number of columns in the input data.
reverse: bool
If False, the missing values are filled with the closest previous non-missing values. If True, the closest subsequent non-missing values are used.
return: FExpr
An f-expression that converts the input columns into columns filled with value, or with the previous/subsequent non-missing values.
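The two filling modes can be made concrete with a minimal plain-Python sketch of the per-column semantics. This is only an illustration, not datatable's implementation: the helper names `fill_with_value` and `fill_directional` are hypothetical, `None` stands in for a missing value, and the sample column reuses the `var1` values from the Examples section.

```python
# Illustrative only: a plain-Python model of the fill semantics, not datatable code.
# `None` stands in for NA; both helper names are hypothetical.

def fill_with_value(column, value):
    """Replace every missing entry with a fixed value."""
    return [value if x is None else x for x in column]


def fill_directional(column, reverse=False):
    """Fill each missing entry from its closest non-missing neighbour.

    reverse=False uses the closest previous value (fill down);
    reverse=True uses the closest subsequent value (fill up).
    Entries with no such neighbour stay missing.
    """
    # Walk the column backwards when filling up, forwards otherwise.
    items = reversed(column) if reverse else column
    filled, last_seen = [], None
    for x in items:
        if x is not None:
            last_seen = x
        filled.append(last_seen)
    # Restore the original order after a backward walk.
    return filled[::-1] if reverse else filled


var1 = [1.5, None, 2.1, 2.2, 1.2, 1.3, 2.4, None]
print(fill_with_value(var1, 2))
print(fill_directional(var1))
print(fill_directional(var1, reverse=True))
```

In this model, a trailing missing value survives a fill-up and a leading one survives a fill-down, because no neighbour exists in the chosen direction.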
Examples
Create a sample datatable frame:
from datatable import dt, f, by
DT = dt.Frame({'building': ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b'],
               'var1': [1.5, None, 2.1, 2.2, 1.2, 1.3, 2.4, None],
               'var2': [100, 110, 105, None, 102, None, 103, 107],
               'var3': [10, 11, None, None, None, None, None, None],
               'var4': [1, 2, 3, 4, 5, 6, 7, 8]})
|   | building | var1 | var2 | var3 | var4 |
|---|---|---|---|---|---|
|   | str32 | float64 | int32 | int32 | int32 |
| 0 | a | 1.5 | 100 | 10 | 1 |
| 1 | a | NA | 110 | 11 | 2 |
| 2 | b | 2.1 | 105 | NA | 3 |
| 3 | b | 2.2 | NA | NA | 4 |
| 4 | a | 1.2 | 102 | NA | 5 |
| 5 | a | 1.3 | NA | NA | 6 |
| 6 | b | 2.4 | 103 | NA | 7 |
| 7 | b | NA | 107 | NA | 8 |
Fill all the missing values in a column with a single value:
DT[:, dt.fillna(f.var1, 2)]
|   | var1 |
|---|---|
|   | float64 |
| 0 | 1.5 |
| 1 | 2 |
| 2 | 2.1 |
| 3 | 2.2 |
| 4 | 1.2 |
| 5 | 1.3 |
| 6 | 2.4 |
| 7 | 2 |
Fill all the missing values in multiple columns with a single value:
DT[:, dt.fillna(f[1:], 2)]
|   | var1 | var2 | var3 | var4 |
|---|---|---|---|---|
|   | float64 | int32 | int32 | int32 |
| 0 | 1.5 | 100 | 10 | 1 |
| 1 | 2 | 110 | 11 | 2 |
| 2 | 2.1 | 105 | 2 | 3 |
| 3 | 2.2 | 2 | 2 | 4 |
| 4 | 1.2 | 102 | 2 | 5 |
| 5 | 1.3 | 2 | 2 | 6 |
| 6 | 2.4 | 103 | 2 | 7 |
| 7 | 2 | 107 | 2 | 8 |
In a grouped frame, fill missing values with each group's mean:
DT[:, dt.fillna(f[:], dt.mean(f[:])), by('building')]
|   | building | var1 | var2 | var3 | var4 |
|---|---|---|---|---|---|
|   | str32 | float64 | float64 | float64 | float64 |
| 0 | a | 1.5 | 100 | 10 | 1 |
| 1 | a | 1.33333 | 110 | 11 | 2 |
| 2 | a | 1.2 | 102 | 10.5 | 5 |
| 3 | a | 1.3 | 104 | 10.5 | 6 |
| 4 | b | 2.1 | 105 | NA | 3 |
| 5 | b | 2.2 | 105 | NA | 4 |
| 6 | b | 2.4 | 103 | NA | 7 |
| 7 | b | 2.23333 | 107 | NA | 8 |
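The group-wise mean fill can be modelled in plain Python to show where each replacement value comes from. This is a sketch under stated assumptions rather than datatable's implementation: `fill_with_group_mean` is a hypothetical helper, `None` stands in for NA, and the sample lists reuse the `building`, `var2` and `var3` data from the frame above.

```python
# Illustrative only: a plain-Python model of group-wise mean imputation.
# `None` stands in for NA; `fill_with_group_mean` is a hypothetical helper.
from collections import defaultdict
from statistics import mean


def fill_with_group_mean(groups, column):
    """Fill missing entries with the mean of the non-missing values in the same group."""
    # Collect the non-missing values observed in each group.
    observed = defaultdict(list)
    for g, x in zip(groups, column):
        if x is not None:
            observed[g].append(x)
    # A group with no observed values has no mean, so its entries stay missing.
    means = {g: mean(vals) for g, vals in observed.items()}
    return [means.get(g) if x is None else x for g, x in zip(groups, column)]


building = ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b']
var2 = [100, 110, 105, None, 102, None, 103, 107]
var3 = [10, 11, None, None, None, None, None, None]

print(fill_with_group_mean(building, var2))
print(fill_with_group_mean(building, var3))
```

Unlike the table above, which lists the rows group by group, this sketch keeps the original row order. It does show why `var3` stays NA for building `b`: that group has no observed values to average.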
Fill down on a single column:
DT[:, dt.fillna(f.var1)]
|   | var1 |
|---|---|
|   | float64 |
| 0 | 1.5 |
| 1 | 1.5 |
| 2 | 2.1 |
| 3 | 2.2 |
| 4 | 1.2 |
| 5 | 1.3 |
| 6 | 2.4 |
| 7 | 2.4 |
Fill up on a single column:
DT[:, dt.fillna(f.var1, reverse=True)]
|   | var1 |
|---|---|
|   | float64 |
| 0 | 1.5 |
| 1 | 2.1 |
| 2 | 2.1 |
| 3 | 2.2 |
| 4 | 1.2 |
| 5 | 1.3 |
| 6 | 2.4 |
| 7 | NA |
Fill down on multiple columns:
DT[:, dt.fillna(f['var1':])]
|   | var1 | var2 | var3 | var4 |
|---|---|---|---|---|
|   | float64 | int32 | int32 | int32 |
| 0 | 1.5 | 100 | 10 | 1 |
| 1 | 1.5 | 110 | 11 | 2 |
| 2 | 2.1 | 105 | 11 | 3 |
| 3 | 2.2 | 105 | 11 | 4 |
| 4 | 1.2 | 102 | 11 | 5 |
| 5 | 1.3 | 102 | 11 | 6 |
| 6 | 2.4 | 103 | 11 | 7 |
| 7 | 2.4 | 107 | 11 | 8 |
Fill up on multiple columns:
DT[:, dt.fillna(f['var1':], reverse=True)]
|   | var1 | var2 | var3 | var4 |
|---|---|---|---|---|
|   | float64 | int32 | int32 | int32 |
| 0 | 1.5 | 100 | 10 | 1 |
| 1 | 2.1 | 110 | 11 | 2 |
| 2 | 2.1 | 105 | NA | 3 |
| 3 | 2.2 | 102 | NA | 4 |
| 4 | 1.2 | 102 | NA | 5 |
| 5 | 1.3 | 103 | NA | 6 |
| 6 | 2.4 | 103 | NA | 7 |
| 7 | NA | 107 | NA | 8 |
Fill down the grouped frame:
DT[:, dt.fillna(f['var1':]), by('building')]
|   | building | var1 | var2 | var3 | var4 |
|---|---|---|---|---|---|
|   | str32 | float64 | int32 | int32 | int32 |
| 0 | a | 1.5 | 100 | 10 | 1 |
| 1 | a | 1.5 | 110 | 11 | 2 |
| 2 | a | 1.2 | 102 | 11 | 5 |
| 3 | a | 1.3 | 102 | 11 | 6 |
| 4 | b | 2.1 | 105 | NA | 3 |
| 5 | b | 2.2 | 105 | NA | 4 |
| 6 | b | 2.4 | 103 | NA | 7 |
| 7 | b | 2.4 | 107 | NA | 8 |
Fill up the grouped frame:
DT[:, dt.fillna(f['var1':], reverse=True), by('building')]
|   | building | var1 | var2 | var3 | var4 |
|---|---|---|---|---|---|
|   | str32 | float64 | int32 | int32 | int32 |
| 0 | a | 1.5 | 100 | 10 | 1 |
| 1 | a | 1.2 | 110 | 11 | 2 |
| 2 | a | 1.2 | 102 | NA | 5 |
| 3 | a | 1.3 | NA | NA | 6 |
| 4 | b | 2.1 | 105 | NA | 3 |
| 5 | b | 2.2 | 103 | NA | 4 |
| 6 | b | 2.4 | 103 | NA | 7 |
| 7 | b | NA | 107 | NA | 8 |
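The grouped fills above never carry a value across a group boundary, which is why `var2` in the last `a` row stays NA when filling up. A minimal plain-Python sketch of that behaviour follows. It is an illustration only, not datatable's implementation: `fill_by_group` is a hypothetical helper, `None` stands in for NA, and sorting the groups by key is an assumption made to mirror the row order seen in the tables above.

```python
# Illustrative only: a plain-Python model of a group-wise directional fill.
# `None` stands in for NA; `fill_by_group` is a hypothetical helper.

def fill_by_group(groups, column, reverse=False):
    """Fill down (or up, if reverse=True) within each group separately.

    Returns (group, value) pairs listed group by group, with groups sorted
    by key and the original row order kept inside each group.
    """
    # Split the column into per-group lists, keeping row order inside each group.
    per_group = {}
    for g, x in zip(groups, column):
        per_group.setdefault(g, []).append(x)

    result = []
    for g in sorted(per_group):
        values = per_group[g]
        # Walk backwards for a fill-up, forwards for a fill-down.
        walk = values[::-1] if reverse else values
        filled, last_seen = [], None
        for x in walk:
            if x is not None:
                last_seen = x
            filled.append(last_seen)
        if reverse:
            filled.reverse()
        result.extend((g, x) for x in filled)
    return result


building = ['a', 'a', 'b', 'b', 'a', 'a', 'b', 'b']
var1 = [1.5, None, 2.1, 2.2, 1.2, 1.3, 2.4, None]
var2 = [100, 110, 105, None, 102, None, 103, 107]

print(fill_by_group(building, var1))
print(fill_by_group(building, var2, reverse=True))
```

Resetting `last_seen` for every group is the step that keeps the groups independent; without it, the fill would behave like the ungrouped examples.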