ToKey Class
Converts input values (words, numbers, etc.) to index in a dictionary.
Inheritance
- nimbusml.internal.core.preprocessing._tokey.ToKey
- nimbusml.base_transform.BaseTransform
- sklearn.base.TransformerMixin
Constructor
ToKey(max_num_terms=1000000, term=None, sort='ByOccurrence', text_key_values=False, columns=None, **params)
Parameters
- columns
a dictionary of key-value pairs, where key is the output column name and value is the input column name.
Multiple key-value pairs are allowed.
Input column type: numeric or string.
Output column type: key.
If the output column names are the same as the input column names, simply specify columns as a list of strings.
The << operator can be used to set this value (see Column Operator).
For example:
ToKey(columns={'out1':'input1', 'out2':'input2'})
ToKey() << {'out1':'input1', 'out2':'input2'}
For more details see Columns.
- max_num_terms
Maximum number of keys to keep per column when auto-training.
- term
List of terms.
- sort
How items should be ordered when vectorized. By default ('ByOccurrence'), they are kept in the order encountered. If sorted by value, items are ordered according to their default comparison; for example, text sorting is case sensitive ('A', then 'Z', then 'a').
- text_key_values
Whether key value metadata should be text, regardless of the actual input type.
- params
Additional arguments sent to compute engine.
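The effect of sort and max_num_terms can be illustrated with a small pure-Python sketch. It does not call nimbusml and is not the library's implementation: the helper name build_dictionary, its by_value flag, the 0-based indices, and the choice to keep the first terms encountered when the cap is reached are all assumptions made for illustration.

```python
def build_dictionary(values, by_value=False, max_num_terms=1000000):
    """Map each distinct value to an integer index (hypothetical helper)."""
    # Keep distinct values in the order they are first encountered.
    terms = list(dict.fromkeys(values))
    # Cap the number of terms (assumption: the first ones encountered are kept).
    terms = terms[:max_num_terms]
    if by_value:
        # Default comparison; for strings this is case sensitive.
        terms = sorted(terms)
    return {term: index for index, term in enumerate(terms)}


words = ['a', 'Z', 'A', 'a']
by_occurrence = build_dictionary(words)
by_value_order = build_dictionary(words, by_value=True)
print(by_occurrence)   # {'a': 0, 'Z': 1, 'A': 2}
print(by_value_order)  # {'A': 0, 'Z': 1, 'a': 2}
```

The second result shows the case-sensitive ordering described for sort: uppercase letters come before lowercase ones under Python's default string comparison.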
Examples
###############################################################################
# ToKey
import numpy
from nimbusml import FileDataStream
from nimbusml.datasets import get_dataset
from nimbusml.preprocessing import ToKey
# data input (as a FileDataStream)
path = get_dataset('infert').as_filepath()
data = FileDataStream.read_csv(path, sep=',', numeric_dtype=numpy.float32,
                               names={0: 'id'})
print(data.head())
# age case education id induced parity pooled.stratum spontaneous ...
# 0 26.0 1.0 0-5yrs 1.0 1.0 6.0 3.0 2.0 ...
# 1 42.0 1.0 0-5yrs 2.0 1.0 1.0 1.0 0.0 ...
# 2 39.0 1.0 0-5yrs 3.0 2.0 6.0 4.0 0.0 ...
# 3 34.0 1.0 0-5yrs 4.0 2.0 4.0 2.0 0.0 ..
# 4 35.0 1.0 6-11yrs 5.0 1.0 3.0 32.0 1.0 ..
# transform usage
xf = ToKey(columns={'id_1': 'id', 'edu_1': 'education'})
# fit and transform
features = xf.fit_transform(data)
print(features.head())
# age case edu_1 education id id_1 induced parity ...
# 0 26.0 1.0 0-5yrs 0-5yrs 1.0 0 1.0 6.0 ...
# 1 42.0 1.0 0-5yrs 0-5yrs 2.0 1 1.0 1.0 ...
# 2 39.0 1.0 0-5yrs 0-5yrs 3.0 2 2.0 6.0 ...
# 3 34.0 1.0 0-5yrs 0-5yrs 4.0 3 2.0 4.0 ...
# 4 35.0 1.0 6-11yrs 6-11yrs 5.0 4 1.0 3.0 ...
Remarks
The ToKey transform converts a column of text to key values using a dictionary. This operation can be reversed by using FromKey to obtain the original values.
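The round trip between values and keys can be sketched in plain Python. This is only a conceptual illustration, not how nimbusml implements ToKey or FromKey: the functions to_key and from_key are hypothetical, indices are assumed to be 0-based, and values missing from the dictionary are assumed to map to None.

```python
def to_key(values, dictionary):
    # Look up each value's index; unknown values become None (sketch assumption).
    return [dictionary.get(value) for value in values]


def from_key(keys, dictionary):
    # Invert the dictionary so each index maps back to its original term.
    reverse = {index: term for term, index in dictionary.items()}
    return [reverse.get(key) for key in keys]


education = ['0-5yrs', '0-5yrs', '6-11yrs', '0-5yrs']
# Build the dictionary from distinct values in order of first occurrence.
dictionary = {term: i for i, term in enumerate(dict.fromkeys(education))}

keys = to_key(education, dictionary)
restored = from_key(keys, dictionary)
print(keys)                   # [0, 0, 1, 0]
print(restored == education)  # True
```

Because the dictionary is a one-to-one mapping, inverting it recovers every value that was present when the dictionary was built.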
Methods
get_params: Get the parameters for this operator.
get_params
Get the parameters for this operator.
get_params(deep=False)
Parameters
- deep