Parsing JSON in Databricks / Python

Question

I am a complete beginner with Databricks and Python, so this may be basic for you, but it is still new to me.

I am trying to parse the JSON below to get all child items from the "valid" and "date_of_creation" sections as columns in a table, which I will later write to Parquet:

{  
	"valid": [  
		{  
			"retailer_id": 11,  
			"retailer_name": "abc",  
			"country": "AA",  
			"point_of_sales": [  
				66,  
				68655  
			]  
		},  
		{  
			"retailer_id": 22,  
			"retailer_name": "def",  
			"country": "AA",  
			"point_of_sales": [  
				2067,  
				2068,  
				68690,  
				68691  
			]  
		},  
		{  
			"retailer_id": 33,  
			"retailer_name": "ghi",  
			"country": "AA",  
			"point_of_sales": [  
				70,  
				71,  
				68694  
			]  
		}  
	],  
	"invalid": {  
		"retailers": [],  
		"points_of_sale": []  
	},  
	"date_of_creation": "04/26/2021 14:11:16"  
}

Here is my code, which fails with a "Cannot call display()" error:

#read json file  
json = spark.read.option("multiline", "true").json(input_file)  
  
#parse json  
from pyspark.sql.functions import explode, col  
  
valid = json.select(explode("valid").alias("valid"))  
valid_childitems = valid.select  
(  
  col("valid.retailer_id").alias("retailer_id"),  
  col("valid.retailer_name").alias("retailer_name"),  
  col("valid.country").alias("country"),  
  explode("valid.point_of_sales").alias("point_of_sales")  
)  
  
display (valid_childitems)

I followed the post: https://adatis.co.uk/parsing-nested-json-lists-in-databricks-using-python/

Can someone point out what I am doing wrong?

Accepted Answer

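Hi @braxx,

Thanks for using Microsoft Q&A!
You need to keep the opening parenthesis '(' on the same line as the select call, i.e. write "valid_childitems = valid.select(" and then the column list. When the parenthesis starts a new line, Python ends the statement at "valid.select", so valid_childitems holds the select method itself rather than a DataFrame, and the parenthesized column list underneath is evaluated as a separate expression and thrown away. display() then fails because it receives a method instead of a DataFrame.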
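The pitfall is plain Python behaviour, not something specific to Spark, so it can be reproduced without a cluster. The sketch below is only an illustration: FakeDataFrame is a hypothetical stand-in that mimics the call shape of DataFrame.select and simply returns the names it was given.

```python
# FakeDataFrame is hypothetical: it only mimics the call shape of
# DataFrame.select so the line-break pitfall can run without Spark.
class FakeDataFrame:
    def select(self, *columns):
        # Return the requested column names so we can see what was passed.
        return list(columns)


valid = FakeDataFrame()

# Broken: the statement ends after "valid.select", so valid_childitems is the
# bound method itself. The parenthesized lines below form a separate tuple
# expression that is evaluated and discarded.
valid_childitems = valid.select
(
    "retailer_id",
    "retailer_name",
)
print(callable(valid_childitems))  # True: a method, not a result

# Fixed: the opening parenthesis is on the same line, so Python keeps reading
# the arguments across line breaks and actually calls select().
valid_childitems_fixed = valid.select(
    "retailer_id",
    "retailer_name",
)
print(valid_childitems_fixed)  # ['retailer_id', 'retailer_name']
```

In the real notebook the same one-character move applies: keep "(" right after "valid.select" and leave the col(...) and explode(...) arguments on the following lines.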

With that change, display(valid_childitems) returns one row per point of sale: the three retailers in the sample file expand to nine rows.
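The question also asks for date_of_creation as a column. In PySpark, one possible approach (a sketch, not run here) is to carry that top-level field through the first select, e.g. json.select(col("date_of_creation"), explode("valid").alias("valid")), add col("date_of_creation") to the second select, optionally convert it with to_timestamp(col("date_of_creation"), "MM/dd/yyyy HH:mm:ss") (also imported from pyspark.sql.functions), and then write the result with valid_childitems.write.parquet(output_path), where output_path is a placeholder. To check which rows to expect without a cluster, here is a plain-Python sketch of the same two explode steps on the sample file. The flatten helper is hypothetical, and the date parsing assumes the month/day/year order shown in the sample.

```python
import json
from datetime import datetime

# Sample document from the question, compacted onto fewer lines.
raw = """
{
  "valid": [
    {"retailer_id": 11, "retailer_name": "abc", "country": "AA",
     "point_of_sales": [66, 68655]},
    {"retailer_id": 22, "retailer_name": "def", "country": "AA",
     "point_of_sales": [2067, 2068, 68690, 68691]},
    {"retailer_id": 33, "retailer_name": "ghi", "country": "AA",
     "point_of_sales": [70, 71, 68694]}
  ],
  "invalid": {"retailers": [], "points_of_sale": []},
  "date_of_creation": "04/26/2021 14:11:16"
}
"""


def flatten(document):
    """Hypothetical helper mirroring explode("valid") followed by
    explode("valid.point_of_sales"), repeating date_of_creation on each row."""
    # Assumes month/day/year order, as in the sample value.
    created = datetime.strptime(document["date_of_creation"], "%m/%d/%Y %H:%M:%S")
    rows = []
    for retailer in document["valid"]:          # first explode: one row per retailer
        for pos in retailer["point_of_sales"]:  # second explode: one row per point of sale
            rows.append((
                retailer["retailer_id"],
                retailer["retailer_name"],
                retailer["country"],
                pos,
                created,
            ))
    return rows


rows = flatten(json.loads(raw))
for row in rows:
    print(row)
print(len(rows))  # 9
```

Like explode, the inner loop emits nothing for a retailer whose point_of_sales list is empty, so such retailers would not appear in the output.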

----------

Please do not forget to "Accept the answer" if the information provided helps you; this also helps others in the community.

Thanks
Saurabh
