I am a beginner with Databricks and Python, so this may be basic for you, but it is still new to me.
I am trying to parse the JSON below so that all child items of the "valid" section, plus "date_of_creation", become columns in a table that I will later write to Parquet:
    {
      "valid": [
        {
          "retailer_id": 11,
          "retailer_name": "abc",
          "country": "AA",
          "point_of_sales": [
            66,
            68655
          ]
        },
        {
          "retailer_id": 22,
          "retailer_name": "def",
          "country": "AA",
          "point_of_sales": [
            2067,
            2068,
            68690,
            68691
          ]
        },
        {
          "retailer_id": 33,
          "retailer_name": "ghi",
          "country": "AA",
          "point_of_sales": [
            70,
            71,
            68694
          ]
        }
      ],
      "invalid": {
        "retailers": [],
        "points_of_sale": []
      },
      "date_of_creation": "04/26/2021 14:11:16"
    }
Here is my code, which fails with the error "Cannot call display(<class 'method'>)":
    # read json file
    json = spark.read.option("multiline", "true").json(input_file)
    # parse json
    from pyspark.sql.functions import explode, col
    valid = json.select(explode("valid").alias("valid"))
    valid_childitems = valid.select
    (
        col("valid.retailer_id").alias("retailer_id"),
        col("valid.retailer_name").alias("retailer_name"),
        col("valid.country").alias("country"),
        explode("valid.point_of_sales").alias("point_of_sales")
    )
    display(valid_childitems)
I followed the post: https://adatis.co.uk/parsing-nested-json-lists-in-databricks-using-python/
Can someone point out what I am doing wrong?
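
One possible reading of the error, sketched in plain Python so that it runs without Spark: in the code above, the opening parenthesis sits on the line after `valid.select`, so Python ends the assignment at the line break and binds `valid_childitems` to the method itself, while the parenthesized column list becomes a separate expression that is evaluated and discarded. The `Frame` class below is a hypothetical stand-in for a DataFrame and is not part of PySpark; it only reproduces the layout problem.

```python
class Frame:
    """Hypothetical stand-in for a DataFrame; not part of PySpark."""

    def select(self, *cols):
        # Return the requested column names so the result is easy to inspect.
        return list(cols)


frame = Frame()

# Same layout as in the question: the assignment ends at the line break,
# so the name is bound to the method object, not to the result of a call.
not_called = frame.select
(
    "retailer_id",
    "country"
)

# Opening parenthesis on the same line as the method name: the call happens.
called = frame.select(
    "retailer_id",
    "country"
)

print(type(not_called).__name__)  # the bound method, which is what display() received
print(called)                     # the actual result of select
```

If that reading is right, writing `valid_childitems = valid.select(` on a single line, with the columns and the closing parenthesis on the following lines, should make `display` receive a DataFrame. To also carry "date_of_creation" through, one option (an assumption about the intended output, not something tested against the real file) is to select it next to the first `explode`, as in `json.select(explode("valid").alias("valid"), "date_of_creation")`, and then include `col("date_of_creation")` in the second `select`.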