Aggregator Class
Defines an aggregation against specified columns identified with join keys.
Inheritance
builtins.object -> Aggregator
Constructor
Aggregator()
Remarks
Aggregators are typically not instantiated directly. Instead, specify the type of aggregator when using an enricher such as the HolidayEnricher object.
Derived aggregators include AggregatorAll, AggregatorAvg, AggregatorMax, AggregatorMin, AggregatorTop.
The process(env, customer_data, public_data, join_keys, debug) method performs the aggregation.
Methods
Name | Description |
---|---|
get_log_property | Get the log property tuple, or None if there is no property. |
process | Left join customer_data with public_data on join_keys, then drop all columns in join_keys and all columns listed in to_be_cleaned_up_column_names. |
process_public_dataset | Perform aggregation on specified public data columns. |
get_log_property
Get the log property tuple, or None if there is no property.
get_log_property()
process
Left join customer_data with public_data on join_keys.
Afterward, drop all columns in join_keys and all columns listed in to_be_cleaned_up_column_names.
process(env: SparkEnv | PandasEnv, customer_data: CustomerData, public_data: PublicData, join_keys: list, debug: bool)
Parameters
Name | Description |
---|---|
env (required) | The runtime environment. |
customer_data (required) | The customer data. |
public_data (required) | The public data. |
join_keys (required) | A list of join key pairs. |
debug (required) | Indicates whether to print debug info. |
Returns
Type | Description |
---|---|
tuple | A tuple of (a new instance of class CustomerData, the unchanged instance of PublicData, a new joined instance of class CustomerData, join keys (list of tuple)). |
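
The join-then-clean-up behavior described for process can be illustrated with plain pandas. This is only a minimal sketch, not the library's implementation: the helper name left_join_and_clean, the sample column names, and the assumption that each join key pair is (customer column, public column) are all hypothetical.

```python
import pandas as pd


def left_join_and_clean(customer_df, public_df, join_keys,
                        to_be_cleaned_up_column_names=()):
    """Hypothetical helper: left join, then drop join-key and clean-up columns.

    join_keys is assumed to be a list of (customer_column, public_column) pairs.
    """
    left_on = [customer_col for customer_col, _ in join_keys]
    right_on = [public_col for _, public_col in join_keys]

    # Left join keeps every customer row, matched or not.
    joined = customer_df.merge(
        public_df, how="left", left_on=left_on, right_on=right_on
    )

    # Drop the join-key columns from both sides plus any clean-up columns.
    to_drop = set(left_on) | set(right_on) | set(to_be_cleaned_up_column_names)
    return joined.drop(columns=[c for c in joined.columns if c in to_drop])


# Illustrative data only.
customer = pd.DataFrame(
    {"cust_date": ["2020-01-01", "2020-01-02"], "sales": [10, 20]}
)
public = pd.DataFrame(
    {"pub_date": ["2020-01-01"], "holiday": ["New Year's Day"], "tmp": [1]}
)

result = left_join_and_clean(
    customer, public, [("cust_date", "pub_date")], ["tmp"]
)
print(result)
```

Both customer rows survive the join; the second row has no matching public record, so its holiday value is missing.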
process_public_dataset
Perform aggregation on specified public data columns.
process_public_dataset(env: RuntimeEnv, _public_dataset: object, cols: List[str] | None = None, join_keys: List[Tuple[str, str]] = []) -> object
Parameters
Name | Description |
---|---|
env (required) | The runtime environment. |
_public_dataset (required) | A public dataset dataframe. |
cols | A list of column names to retrieve. Default value: None |
join_keys | A list of join keys to use. Default value: [] |
Returns
Type | Description |
---|---|
object | A new DataFrame of the public dataset. |
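
As a rough illustration of what aggregating public data columns by join key could look like, the sketch below groups a pandas DataFrame by the public-side join keys and averages the selected columns. It is an assumption-laden example, not the library's code: the helper name aggregate_public_columns, the how parameter, and the sample columns are hypothetical, and the actual aggregation depends on which derived aggregator is used.

```python
import pandas as pd


def aggregate_public_columns(public_df, cols, join_keys, how="mean"):
    """Hypothetical helper: aggregate `cols` per public-side join key.

    join_keys is assumed to be a list of (customer_column, public_column) pairs.
    """
    public_keys = [public_col for _, public_col in join_keys]
    # One output row per distinct combination of public join-key values.
    return public_df.groupby(public_keys, as_index=False)[cols].agg(how)


# Illustrative data only.
public = pd.DataFrame(
    {
        "pub_date": ["2020-01-01", "2020-01-01", "2020-01-02"],
        "temperature": [10.0, 20.0, 5.0],
    }
)

aggregated = aggregate_public_columns(
    public, ["temperature"], [("cust_date", "pub_date")]
)
print(aggregated)
```

Passing how="max" or how="min" instead of the default would mimic a maximum or minimum aggregation in the same way.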
Attributes
should_direct_join
should_direct_join = True