Need to Remove Empty Columns from the .csv file

Question

I want to remove empty columns from a CSV file. For example, if column 2 and column 5 are empty, either because they have no header or because the entire column is empty, how can we remove them? Any ideas, please?

Answer

Hi @Dinesh Prajapati ,

Welcome to Microsoft Q&A platform and thanks for posting your question here.

As I understand your query, you are trying to delete the columns that have empty column names or empty column values from the dataset. Please let me know if that is not what you are asking.

You can build a list of the column names, drop the blank name (" ") from it, and pass the remaining names to the select function in PySpark. See below:

# Sample DataFrame: two of the four columns have a blank name and blank values
df = sqlContext.createDataFrame([(1,"", "a"," "), (2,"", "b"," "), (5,"", "c"," "), (8,"", "d"," ")], ("id"," ", "name"," "))
df.show()

+---+---+----+---+
| id|   |name|   |
+---+---+----+---+
|  1|   |   a|   |
|  2|   |   b|   |
|  5|   |   c|   |
|  8|   |   d|   |
+---+---+----+---+
  
# Keep only the columns whose names are not blank, preserving their original order
a = [c for c in df.columns if c.strip()]
df = df.select(a)
df.show()

+---+----+
| id|name|
+---+----+
|  1|   a|
|  2|   b|
|  5|   c|
|  8|   d|
+---+----+
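
The snippet above only drops columns whose header is blank. The question also mentions columns that have a header but no values. As a rough sketch of that case, here is a plain-Python version that uses only the standard csv module rather than PySpark. The helper name drop_empty_columns and the sample data (including the extra "empty" column) are made up for illustration, and treating whitespace-only cells as empty is an assumption.

```python
import csv
import io


def drop_empty_columns(rows):
    """Drop columns whose header is blank or whose data cells are all blank."""
    if not rows:
        return []
    header, body = rows[0], rows[1:]
    keep = [
        i for i, name in enumerate(header)
        # Keep a column only if it has a non-blank header and, when data rows
        # exist, at least one non-blank value.
        if name.strip()
        and (not body or any(i < len(r) and r[i].strip() for r in body))
    ]
    # Rebuild every row from the kept column positions (short rows are padded).
    return [[r[i] if i < len(r) else "" for i in keep] for r in rows]


# Hypothetical sample: same data as the DataFrame above, plus an "empty"
# column that has a header but no values.
sample = "id,,name, ,empty\n1,,a, ,\n2,,b, ,\n5,,c, ,\n8,,d, ,\n"
rows = list(csv.reader(io.StringIO(sample)))
cleaned = drop_empty_columns(rows)

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(cleaned)
print(out.getvalue())
```

For a real file, the same function can be fed from csv.reader over an opened file instead of the in-memory string. This reads all rows into memory, so it suits small files; for large datasets the PySpark approach is likely the better fit.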

Hope this helps. Please let us know if you have any further queries.
