Understanding top-level entities in managed feature store

This document describes the top level entities in the managed feature store.

Diagram depicting the main components of managed feature store.

For more information on the managed feature store, see What is managed feature store?

Feature store

You can create and manage feature sets through a feature store. Feature sets are a collection of features. You can optionally associate a materialization store (offline store connection) with a feature store, to regularly precompute and persist the features. It can make feature retrieval during training or inference faster and more reliable.

For more information about the configuration, see CLI (v2) feature store YAML schema

Entities

Entities encapsulate the index columns for logical entities in an enterprise. Examples of entities include account entity, customer entity, etc. Entities help enforce, as best practice, the use of the same index column definitions across the feature sets that use the same logical entities.

Entities are typically created once and then reused across feature-sets. Entities are versioned.

For more information about the configuration, see CLI (v2) feature entity YAML schema

Feature set specification and asset

Feature sets are a collection of features generated by applying transformations on source system data. Feature sets encapsulate a source, the transformation function, and the materialization settings. We currently support PySpark feature transformation code.

Start by creating a feature set specification. A feature set specification is a self-contained definition of a feature set that you can locally develop and test.

A feature set specification typically consists of the following parameters:

  • source: What source(s) does this feature map to
  • transformation (optional): The transformation logic, applied to the source data, to create features. In our case, we use Spark as the supported compute.
  • Names of the columns that represent the index_columns and the timestamp_column: These names are required when users try to join feature data with observation data (more about this later)
  • materialization_settings(optional): Required, to cache the feature values in a materialization store for efficient retrieval.

After development and testing the feature set spec in your local/dev environment, you can register the spec as a feature set asset with the feature store. The feature set asset provides managed capabilities, such as versioning and materialization.

For more information about the feature set YAML specification, see CLI (v2) feature set specification YAML schema

Feature retrieval specification

A feature retrieval specification is a portable definition of a feature list associated with a model. It can help streamline machine learning model development and operationalization. A feature retrieval specification is typically an input to the training pipeline. It helps generate the training data. It can be packaged with the model. Additionally, inference step uses it to look up the features. It integrates all phases of the machine learning lifecycle. Changes to your training and inference pipeline can be minimized as you experiment and deploy.

Use of a feature retrieval specification and the built-in feature retrieval component are optional. You can directly use the get_offline_features() API if you want.

For more information about the feature retrieval YAML specification, see CLI (v2) feature retrieval specification YAML schema.

Next steps