datatable.models.aggregate()
Aggregate a frame into clusters. Each cluster consists of a set of members, i.e. a subset of the input frame, and is represented by an exemplar, i.e. one of the members.
For one- and two-column frames the aggregation is based on the standard equal-interval binning for numeric columns, and grouping for string columns.
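To make "equal-interval binning" concrete, here is a minimal pure-Python sketch for a single numeric column. The helper name `equal_interval_bins` is hypothetical, and the code only illustrates the general technique; it is not datatable's implementation and may differ from it in edge-case handling.

```python
def equal_interval_bins(values, n_bins):
    """Assign each value to one of n_bins equally wide intervals.

    Hypothetical helper for illustration only; not part of datatable.
    """
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column collapses into a single bin.
        return [0 for _ in values]
    width = (hi - lo) / n_bins
    # The maximum value would land in bin n_bins, so clamp it into the last bin.
    return [min(int((v - lo) / width), n_bins - 1) for v in values]


bins = equal_interval_bins([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10], n_bins=5)
print(bins)
```

Rows that share a bin index form one cluster. For two numeric columns the same idea applies per column, with a cluster identified by the pair of bin indices.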
When the input frame has more than two columns, a parallel one-pass ad hoc algorithm is employed; see the description of the Aggregator<T>::group_nd() method for more details. This algorithm takes only the numeric columns into account and ignores all string columns.
Parameters
Frame
The input frame containing numeric or string columns.
int
Number of bins for 1D aggregation.
int
Number of bins for the first column for 2D aggregation.
int
Number of bins for the second column for 2D aggregation.
int
Maximum number of exemplars for ND aggregation. It is guaranteed that the ND algorithm will return fewer than nd_max_bins exemplars, but the exact number may vary from run to run due to parallelization.
int
Number of columns at which the projection method is used for ND aggregation.
int
Seed to be used for the projection method.
bool
An option to indicate whether double precision, i.e. float64, or single precision, i.e. float32, arithmetic should be used for computations.
float
Fixed radius for ND aggregation; use it with caution. If set, nd_max_bins will have no effect, and in the worst case the number of exemplars may be equal to the number of rows in the data. For big data this may result in extremely long execution times. Since all the columns are normalized to [0, 1), the fixed_radius value should be chosen accordingly.
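The interaction between normalization and a fixed radius can be illustrated with a simplified, sequential sketch. Everything here is an assumption made for illustration: the helper names `normalize` and `leader_clusters` are hypothetical, the Euclidean distance and the "first exemplar within the radius" rule are a generic leader-style clustering, and the actual ND algorithm is parallel and may differ in its details.

```python
import math


def normalize(column):
    """Rescale a numeric column to [0, 1); a constant column maps to 0.0.

    Hypothetical helper; the library's own normalization may differ.
    """
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    scale = (1.0 - 1e-9) / (hi - lo)  # keep the maximum strictly below 1
    return [(v - lo) * scale for v in column]


def leader_clusters(rows, radius):
    """Assign each row to the first exemplar within `radius`.

    A row with no exemplar in range becomes a new exemplar itself.
    Returns (exemplar row indices, exemplar index for every row).
    """
    exemplars = []
    exemplar_of = []
    for i, row in enumerate(rows):
        for e in exemplars:
            if math.dist(row, rows[e]) <= radius:
                exemplar_of.append(e)
                break
        else:
            exemplars.append(i)
            exemplar_of.append(i)
    return exemplars, exemplar_of


x = normalize([0, 1, 2, 10])
y = normalize([0, 1, 2, 10])
rows = list(zip(x, y))

coarse = leader_clusters(rows, radius=0.2)
fine = leader_clusters(rows, radius=1e-6)
print(coarse)
print(len(fine[0]))
```

With the very small radius every row becomes its own exemplar, which is the worst case described above. Because the normalized values lie in [0, 1), a useful radius is a fraction of the unit range rather than a value in the original units of the data.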
Tuple[Frame, Frame]
The first element in the tuple is the aggregated frame, i.e. the frame containing exemplars, with the shape of (nexemplars, frame.ncols + 1), where nexemplars is the number of gathered exemplars. The first frame.ncols columns are the columns from the input frame, and the last column is members_count, which has stype int32 and contains the number of members per exemplar.
The second element in the tuple is the members frame with the shape of (frame.nrows, 1). Each row in this frame corresponds to the row with the same id in the input frame. Its only column, exemplar_id, has stype int32 and contains the id of the exemplar that a particular member belongs to. These ids are effectively the ids of the exemplars' rows in the input frame.
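The relationship between the two returned frames can be sketched with plain Python lists standing in for the exemplar_id column. The data and the helper name `members_by_exemplar` are made up for illustration; the sketch only shows how the per-member ids relate to the per-exemplar member counts described above.

```python
from collections import defaultdict


def members_by_exemplar(exemplar_ids):
    """Group member row indices by the exemplar id they are assigned to.

    Hypothetical helper operating on a plain list, not on a Frame.
    """
    groups = defaultdict(list)
    for row, exemplar in enumerate(exemplar_ids):
        groups[exemplar].append(row)
    return dict(groups)


# Made-up exemplar_id column for a six-row input frame.
exemplar_ids = [0, 0, 2, 3, 2, 0]

groups = members_by_exemplar(exemplar_ids)
members_count = {exemplar: len(rows) for exemplar, rows in groups.items()}

print(groups)
print(members_count)
```

Summing the counts over all exemplars gives back the number of rows in the input, since every row belongs to exactly one exemplar.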
ValueError
The exception is raised if the input frame is missing.
TypeError
The exception is raised when one of the frame's columns has an unsupported stype, i.e. the column is neither numeric nor string.