Spark SQL: How to get the 5th column from a Spark SQL query

Rajaniesh Kaushikk 476 Reputation points
2020-06-15T13:48:04.437+00:00

Hi,

I have a headerless, comma-separated file that I am reading with spark.read to create a DataFrame. Now I want to get the value of the 5th column from the file. How can I achieve this? I know it is possible in T-SQL, but I am not sure how to do it in Spark SQL.
Example of the file

myname,1/4/1977,xyz,50,60

Now I want to get the value 60 (the 5th column).

Regards
Rajaniesh

Azure Databricks
An Apache Spark-based analytics platform optimized for Azure.

Accepted answer
  1. HimanshuSinha-msft 19,376 Reputation points Microsoft Employee
    2020-06-16T00:35:08.56+00:00

    Hello Rajaniesh ,

    The piece of code below should do the trick. Please do let me know how it goes.

    cols = ['_c4']  
    df.select(*cols).show()  
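    This works because Spark assigns default names `_c0`, `_c1`, … to the columns of a headerless CSV, so the 5th column is `_c4`. A minimal end-to-end sketch (the file path `data.csv` is a placeholder for your own file), including the Spark SQL variant since the question asks about SQL:

    ```python
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.master("local[1]").appName("fifth-col").getOrCreate()

    # Headerless CSV: Spark names the columns _c0, _c1, _c2, _c3, _c4 in order
    df = spark.read.csv("data.csv", header=False)

    # DataFrame API: select the 5th column by its default name
    df.select("_c4").show()

    # Spark SQL equivalent: register a temp view and query it
    df.createOrReplaceTempView("myfile")
    spark.sql("SELECT _c4 FROM myfile").show()
    ```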
    


    Himanshu


    Please do consider clicking "Accept Answer" and "Up-vote" on the post that helped you, as it can be beneficial to other community members.


1 additional answer

Sort by: Most helpful
  1. Dimitri B 66 Reputation points
    2020-06-15T23:06:33.467+00:00

    One way to do it would be to read the file with the "schema" option, assign arbitrary names to the columns, e.g. col_1, col_2, and then reference the column by that name.
    Spark will also name the columns for you even if you don't provide a schema: for a headerless CSV they default to _c0, _c1, and so on.
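    A sketch of the explicit-schema approach; the file path `data.csv` and the column names here are placeholders you would adapt to your data:

    ```python
    from pyspark.sql import SparkSession
    from pyspark.sql.types import StructType, StructField, StringType, IntegerType

    spark = SparkSession.builder.master("local[1]").appName("schema-demo").getOrCreate()

    # Hypothetical names/types for the five comma-separated fields
    schema = StructType([
        StructField("name", StringType()),
        StructField("dob", StringType()),
        StructField("code", StringType()),
        StructField("col_4", IntegerType()),
        StructField("col_5", IntegerType()),
    ])

    df = spark.read.csv("data.csv", schema=schema)

    # Reference the 5th column by the name given in the schema
    df.select("col_5").show()
    ```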
