OpenDatasetBase Class
Base class that open dataset classes inherit from.
Constructs an open dataset.
Inheritance
OpenDatasetBase
Constructor
OpenDatasetBase(cols: List[str] | None = None, enable_telemetry: bool = True, **kwargs)
Parameters
Name | Description |
---|---|
cols | A list of column names to load from the dataset. Default value: None |
enable_telemetry | Whether to enable telemetry on this dataset. Default value: True |
kwargs | Additional keyword arguments used for the filter. |
Methods
get_file_dataset | Get the file dataset for the open dataset. |
get_tabular_dataset | Initialize AbstractTabularOpenDataset with a blob URL. |
to_pandas_dataframe | Convert to a pandas DataFrame. |
to_spark_dataframe | Convert to a Spark DataFrame. |
get_file_dataset
Get the file dataset for the open dataset.
get_file_dataset(start_date: datetime = None, end_date: datetime = None, enable_telemetry: bool = True, **kwargs) -> FileDataset
Parameters
Name | Description |
---|---|
cls | The current class. It is not part of the call signature above. |
start_date | The start date. Default value: None |
end_date | The end date. Default value: None |
enable_telemetry | Whether to enable telemetry. Default value: True |
Returns
Type | Description |
---|---|
FileDataset | The file dataset. |
get_tabular_dataset
Initialize AbstractTabularOpenDataset with a blob URL.
get_tabular_dataset(start_date: datetime = None, end_date: datetime = None, cols: List[str] = None, enable_telemetry: bool = True, **kwargs) -> TabularDataset
Parameters
Name | Description |
---|---|
cls | The type name of the open dataset. It is not part of the call signature above. |
start_date | The start date to query, inclusive. Default value: None |
end_date | The end date to query, inclusive. Default value: None |
cols | A list of column names to retrieve. None retrieves all columns. Default value: None |
enable_telemetry | Whether to enable telemetry; disabled for unit tests only. Default value: True |
Returns
Type | Description |
---|---|
TabularDataset | The tabular dataset. |
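The inclusive date bounds and the "None will get all columns" convention described above can be illustrated with a small, self-contained sketch. It does not call the SDK: `select_rows`, the `Row` alias, and the sample records are hypothetical stand-ins, and treating a bound of None as "open on that side" is an assumption made for the illustration.

```python
from datetime import datetime
from typing import Dict, List, Optional

Row = Dict[str, object]


def select_rows(
    rows: List[Row],
    time_column: str,
    start_date: Optional[datetime] = None,
    end_date: Optional[datetime] = None,
    cols: Optional[List[str]] = None,
) -> List[Row]:
    """Hypothetical helper: inclusive date filter followed by column projection."""
    selected = []
    for row in rows:
        stamp = row[time_column]
        # Assumption: a bound of None leaves that side of the range open.
        if start_date is not None and stamp < start_date:
            continue
        if end_date is not None and stamp > end_date:
            continue
        # cols=None keeps every column; otherwise keep only the requested ones.
        selected.append(dict(row) if cols is None else {c: row[c] for c in cols})
    return selected


# Illustrative sample data, not taken from any real open dataset.
sample = [
    {"pickup": datetime(2018, 1, 1), "fare": 10.0, "tip": 1.0},
    {"pickup": datetime(2018, 1, 2), "fare": 12.5, "tip": 0.0},
    {"pickup": datetime(2018, 1, 3), "fare": 8.0, "tip": 2.0},
]

# Both end points are included, so the rows for Jan 1 and Jan 2 are returned.
in_range = select_rows(
    sample, "pickup", datetime(2018, 1, 1), datetime(2018, 1, 2), cols=["fare"]
)
```

Because both bounds are inclusive, passing the same value for start_date and end_date in this sketch selects the rows stamped exactly at that instant rather than returning nothing.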
to_pandas_dataframe
Convert to a pandas DataFrame.
to_pandas_dataframe() -> DataFrame
to_spark_dataframe
Convert to a Spark DataFrame.
to_spark_dataframe()
Attributes
cols
Get the column name list to retrieve.
data
Get the data of the OpenDataset Object.
id
Get the location ID of the open data.
log_properties
Get log properties.
registry_id
Get the registry ID of this public dataset registered at the backend.
This registry ID is used to get the latest metadata, such as the storage location. All public data subclasses are expected to assign _registry_id.
Returns
Type | Description |
---|---|
str | The registry ID string. |
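The expectation that subclasses assign _registry_id can be sketched as follows. This is a minimal illustration only: `OpenDatasetBaseSketch` and `SampleWeatherData` are hypothetical stand-ins rather than SDK classes, the registry ID value is invented, and raising NotImplementedError when the ID is missing is an assumption, not documented behavior.

```python
from typing import List, Optional


class OpenDatasetBaseSketch:
    """Hypothetical stand-in for the base class; not the real implementation."""

    # Subclasses are expected to override this with their own registry ID.
    _registry_id: Optional[str] = None

    def __init__(self, cols: Optional[List[str]] = None, enable_telemetry: bool = True):
        self._cols = cols
        self._enable_telemetry = enable_telemetry

    @property
    def cols(self) -> Optional[List[str]]:
        """Column name list to retrieve; None means all columns."""
        return self._cols

    @property
    def registry_id(self) -> str:
        """Registry ID assigned by the subclass."""
        if self._registry_id is None:
            # Assumption for this sketch: fail loudly if no ID was assigned.
            raise NotImplementedError("Subclasses are expected to assign _registry_id.")
        return self._registry_id


class SampleWeatherData(OpenDatasetBaseSketch):
    """Hypothetical public dataset subclass."""

    _registry_id = "sample-weather"  # invented value for illustration


dataset = SampleWeatherData(cols=["temperature"])
```

Keeping the ID as a class-level attribute means every instance of a given subclass reports the same registry ID without any constructor argument, which matches the read-only attribute shape documented here.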
time_column_name
Get the time column name.