rx_get_partitions

Usage

revoscalepy.rx_get_partitions(input_data: revoscalepy.datasource.RxXdfData.RxXdfData = None,
    **kwargs)

Description

Gets the enumeration of partitions in a partitioned .xdf file data source.

Arguments

input_data

An existing partitioned data source object that was created by RxXdfData with create_partition_set = True and then constructed by rx_partition.

Returns

A Pandas data frame with (n+1) columns. The first n columns are the partitioning columns specified by vars_to_partition in rx_partition. The (n+1)th column is a data source column in which each row contains the Xdf data source object for one partition.
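As an illustration of that layout, the following minimal sketch turns the returned data frame into a dictionary keyed by the partitioning values. The helper name partitions_by_key is hypothetical and is not part of revoscalepy; it assumes only that the data source column is the last column, as described above.

```python
def partitions_by_key(partitions_df):
    """Map each partition's key values to its data source object.

    Assumes the layout described in Returns: the last column holds the
    per-partition data source, and every preceding column holds the
    value of one partitioning variable.
    """
    lookup = {}
    # name=None yields plain tuples, so column names such as "car.age"
    # (not valid Python identifiers) cause no trouble.
    for row in partitions_df.itertuples(index=False, name=None):
        *key_values, data_source = row
        lookup[tuple(key_values)] = data_source
    return lookup
```

Called on the result of rx_get_partitions, this would give one entry per partition, with a tuple of the vars_to_partition values as the key and the partition's data source object as the value.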

See also

rx_exec_by, RxXdfData, rx_partition

Example

import os, tempfile
from revoscalepy import RxOptions, RxXdfData, rx_partition, rx_get_partitions
data_path = RxOptions.get_option("sampleDataDir")

# input xdf data source
xdf_file = os.path.join(data_path, "claims.xdf")
xdf_ds = RxXdfData(xdf_file)

# create a partitioned xdf data source object
out_xdf_file = os.path.join(tempfile.gettempdir(), "outPartitions")
out_xdf = RxXdfData(out_xdf_file, create_partition_set = True)

# do partitioning for input data set
partitions = rx_partition(input_data = xdf_ds, output_data = out_xdf,
    vars_to_partition = ["car.age","type"], append = "none", overwrite = True)

# use rx_get_partitions to load an existing partitioned xdf
out_xdf_1 = RxXdfData(out_xdf_file)
partitions_1 = rx_get_partitions(out_xdf_1)
print(partitions_1)