OpenDatasetBase Class

Base class for open datasets, intended to be subclassed.

Construct open datasets.

Inheritance
OpenDatasetBase

Constructor

OpenDatasetBase(cols: List[str] | None = None, enable_telemetry: bool = True, **kwargs)

Parameters

Name Description
cols

A list of column names to load from the dataset. Defaults to None.

Default value: None
enable_telemetry

Whether to enable telemetry on this dataset. Defaults to True.

Default value: True
kwargs
Required

Keyword arguments used for filtering.
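
The constructor parameters above can be illustrated with a small stand-in class. This is only a sketch, not the library's implementation: `SketchOpenDataset`, its `filters` attribute, and the `country` filter name are hypothetical and exist only to show how `cols`, `enable_telemetry`, and filter keyword arguments might be passed.

```python
from typing import Any, Dict, List, Optional


class SketchOpenDataset:
    """Hypothetical stand-in that mirrors the documented constructor signature."""

    def __init__(self, cols: Optional[List[str]] = None,
                 enable_telemetry: bool = True, **kwargs: Any) -> None:
        # None means no column restriction was requested.
        self.cols = cols
        self.enable_telemetry = enable_telemetry
        # Assumption: extra keyword arguments are kept as filter arguments.
        self.filters: Dict[str, Any] = dict(kwargs)


# "country" is a made-up filter name used only for illustration.
ds = SketchOpenDataset(cols=["date", "value"], enable_telemetry=False, country="US")
default_ds = SketchOpenDataset()
print(ds.cols, ds.enable_telemetry, ds.filters)
```

Leaving `cols` at its default of None and omitting all keyword arguments corresponds to the documented defaults.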

Methods

get_file_dataset

Get the file dataset for the open dataset.

get_tabular_dataset

Initialize AbstractTabularOpenDataset with a blob URL.

to_pandas_dataframe

Convert the dataset to a pandas DataFrame.

to_spark_dataframe

Convert the dataset to a Spark DataFrame.

get_file_dataset

Get the file dataset for the open dataset.

get_file_dataset(start_date: datetime = None, end_date: datetime = None, enable_telemetry: bool = True, **kwargs) -> FileDataset

Parameters

Name Description
cls
Required

The current class.

start_date

The start date. Defaults to None.

Default value: None

end_date

The end date. Defaults to None.

Default value: None

enable_telemetry

Whether to enable telemetry. Defaults to True.

Default value: True

Returns

Type Description
FileDataset

The file dataset.

get_tabular_dataset

Initialize AbstractTabularOpenDataset with a blob URL.

get_tabular_dataset(start_date: datetime = None, end_date: datetime = None, cols: List[str] = None, enable_telemetry: bool = True, **kwargs) -> TabularDataset

Parameters

Name Description
cls
Required

The type name of the open dataset.

start_date

The start date of the query range, inclusive.

Default value: None

end_date

The end date of the query range, inclusive.

Default value: None

cols

A list of column names to retrieve. None retrieves all columns.

Default value: None

enable_telemetry

Whether to enable telemetry. Disable it only for unit tests.

Default value: True

Returns

Type Description
TabularDataset
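
The parameter semantics described above (both date bounds inclusive, and `cols=None` meaning all columns) can be sketched in plain Python. The `select_rows` helper, its `time_column` argument, and the sample rows are hypothetical and only illustrate the filtering rules; they are not how the library loads data.

```python
from datetime import datetime
from typing import Dict, List, Optional

Row = Dict[str, object]


def select_rows(rows: List[Row],
                start_date: Optional[datetime] = None,
                end_date: Optional[datetime] = None,
                cols: Optional[List[str]] = None,
                time_column: str = "date") -> List[Row]:
    """Keep rows with start_date <= time <= end_date, projected onto cols."""
    selected = []
    for row in rows:
        ts = row[time_column]
        # A bound of None means that side of the range is open.
        if start_date is not None and ts < start_date:
            continue
        if end_date is not None and ts > end_date:
            continue
        # cols=None keeps every column; otherwise keep only the named ones.
        selected.append(dict(row) if cols is None else {c: row[c] for c in cols})
    return selected


rows = [
    {"date": datetime(2020, 1, 1), "value": 1},
    {"date": datetime(2020, 1, 2), "value": 2},
    {"date": datetime(2020, 1, 3), "value": 3},
]
result = select_rows(rows, datetime(2020, 1, 1), datetime(2020, 1, 2), cols=["value"])
print(result)  # [{'value': 1}, {'value': 2}]
```

Because both bounds are inclusive, the rows dated exactly on `start_date` and `end_date` are kept.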

to_pandas_dataframe

Convert the dataset to a pandas DataFrame.

to_pandas_dataframe() -> DataFrame

to_spark_dataframe

Convert the dataset to a Spark DataFrame.

to_spark_dataframe()

Attributes

cols

Get the list of column names to retrieve.

data

Get the data of the open dataset object.

id

Get the location ID of the open data.

log_properties

Get log properties.

registry_id

Get the registry ID of this public dataset registered at the backend.

This registry ID is used to get the latest metadata, such as the storage location. All public data subclasses are expected to assign _registry_id.

Returns

Type Description
str

Registry ID string.
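
The subclass contract described above, where each public data subclass assigns `_registry_id`, can be sketched as follows. `SketchBase`, `SketchHolidays`, and the ID value are hypothetical stand-ins, and raising `NotImplementedError` when the ID is missing is an assumption made for illustration rather than documented behavior.

```python
from typing import Optional


class SketchBase:
    """Hypothetical stand-in base class; not the library's OpenDatasetBase."""

    _registry_id: Optional[str] = None

    @property
    def registry_id(self) -> str:
        # Assumption: fail loudly if a subclass forgot to assign the ID.
        if self._registry_id is None:
            raise NotImplementedError("subclass must assign _registry_id")
        return self._registry_id


class SketchHolidays(SketchBase):
    # Made-up registry ID, used only for illustration.
    _registry_id = "sketch-holidays"


print(SketchHolidays().registry_id)
```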

time_column_name

Get the time column name.