How To: Output a Hive table as CSV in Azure HDInsight

One of the common questions our team gets is how to output a Hive table to CSV. Hive does not provide a direct way in the query language to dump a table to a CSV file. The INSERT OVERWRITE DIRECTORY command writes the table as plain text delimited by Hive's default field separator (Ctrl-A, \001), which we then have to convert to CSV ourselves.

Let's take an example to understand this better.

Every HDInsight cluster comes with hivesampletable, a default sample table on which you can run basic Hive queries. If you run the following command in the Hive console, you will see the contents of the table printed on the screen.

SELECT * FROM hivesampletable;
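If you are connected to the cluster head node over SSH, Beeline is one way to get to a Hive prompt and run the query above. The JDBC URL shown here (headnodehost, port 10001, HTTP transport) is the one typically used on HDInsight, but it may differ on your cluster:

beeline -u 'jdbc:hive2://headnodehost:10001/;transportMode=http'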

If you want to output this table as a CSV, the first step is to write it out as a delimited text file using the INSERT OVERWRITE DIRECTORY command shown below. Replace containername and accountname with the container name and storage account name of your Azure Blob storage account.

INSERT OVERWRITE DIRECTORY 'wasb://containername@accountname.blob.core.windows.net/outputdir/' SELECT * FROM hivesampletable;
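Depending on how many map and reduce tasks the query runs, Hive may write one file or several (000000_0, 000001_0, and so on). If you want to confirm what was written, you can list the output directory, using the same containername and accountname placeholders as above:

hdfs dfs -ls wasb://containername@accountname.blob.core.windows.net/outputdir/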

In the common case you will see a single file named 000000_0, with fields separated by the Ctrl-A (\001) delimiter. Next, you can either download this file and convert it to CSV locally, or log in to the head node and convert it there using the command shown below:

hdfs dfs -cat /outputdir/000000_0 | tr "\001" "," >> output.csv
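A couple of caveats: tr does a blind character substitution, so if any field value itself contains a comma the resulting CSV will not be quoted correctly, and the files written by Hive contain no column names. The sketch below is a fuller variant of the command above. It assumes the standard hivesampletable schema (adjust the header row if your columns differ) and that the container used above is the cluster's default storage, so /outputdir resolves against it. It concatenates all part files, prepends a header row, and optionally copies the CSV back to blob storage so it can be downloaded from the container:

echo "clientid,querytime,market,deviceplatform,devicemake,devicemodel,state,country,querydwelltime,sessionid,sessionpagevieworder" > output.csv
hdfs dfs -cat /outputdir/0* | tr "\001" "," >> output.csv
hdfs dfs -put -f output.csv /outputdir/output.csv    # optional: copy the CSV back to the storage container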

Hope this helps you. If you have a better way to output as CSV, feel free to share it with us in the comments section.

Comments

  • Anonymous
    January 18, 2016
    How about writing a Hive table to Data Lake Store? I tried your code but replaced the directory with a folder in my Data Lake Store, and it failed.
    • Anonymous
      March 09, 2017
      I find that I am able to save to Azure Data Lake Store by using adl:///