Datalake to Synapse pipeline failure

Girish Tharwani 0 Reputation points
2024-09-24T12:15:45.28+00:00

Hi,

My pipeline fails with the error below when copying data from Data Lake Gen2 to a Synapse dedicated SQL pool.

{ "errorCode": "2200", "message": "ErrorCode=ParquetJavaInvocationException,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=An error occurred when invoking java, message: java.io.IOException:O\ntotal entry:15\r\ncom.microsoft.datatransfer.bridge.io.parquet.IoBridge.inputStreamRead(Native Method)\r\ncom.microsoft.datatransfer.bridge.io.parquet.BridgeInputFileStream.fillBuffer(BridgeInputFileStream.java:88)\r\ncom.microsoft.datatransfer.bridge.io.parquet.BridgeInputFileStream.read(BridgeInputFileStream.java:42)\r\njava.io.DataInputStream.read(DataInputStream.java:149)\r\norg.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:102)\r\norg.apache.parquet.io.DelegatingSeekableInputStream.readFullyHeapBuffer(DelegatingSeekableInputStream.java:127)\r\norg.apache.parquet.io.DelegatingSeekableInputStream.readFully(DelegatingSeekableInputStream.java:91)\r\norg.apache.parquet.hadoop.ParquetFileReader$ConsecutivePartList.readAll(ParquetFileReader.java:1850)\r\norg.apache.parquet.hadoop.ParquetFileReader.internalReadRowGroup(ParquetFileReader.java:990)\r\norg.apache.parquet.hadoop.ParquetFileReader.readNextRowGroup(ParquetFileReader.java:940)\r\norg.apache.parquet.hadoop.ParquetFileReader.readNextFilteredRowGroup(ParquetFileReader.java:1082)\r\norg.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:130)\r\norg.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:230)\r\norg.apache.parquet.hadoop.ParquetReader.read(ParquetReader.java:132)\r\ncom.microsoft.datatransfer.bridge.parquet.ParquetBatchReaderBridge.nextBuffer(ParquetBatchReaderBridge.java:168)\r\n.,Source=Microsoft.DataTransfer.Richfile.ParquetTransferPlugin,''Type=Microsoft.DataTransfer.Richfile.JniExt.JavaBridgeException,Message=,Source=Microsoft.DataTransfer.Richfile.HiveOrcBridge,'", "failureType": "UserError",

Azure Synapse Analytics

1 answer

  1. Ganesh Gurram 320 Reputation points Microsoft Vendor
    2024-09-25T10:20:37.8566667+00:00

    @Girish Tharwani - Thanks for the question and for using the MS Q&A platform.

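    Your pipeline appears to be failing while reading a Parquet file. The error (ParquetJavaInvocationException wrapping a java.io.IOException) indicates that the Java-based Parquet reader invoked by the copy activity hit an I/O error, and the stack trace shows it happened while reading a row group from the source file (ParquetFileReader.readNextRowGroup).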

    To troubleshoot this issue, try the following steps:

    • Retry the operation: If the failure is intermittent, rerunning the pipeline may be enough to resolve it.
    • Check the source data: Verify that the source files are valid Parquet and that no data quality problem (for example, a truncated or corrupted file) is causing the failure.
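
    As a rough illustration of the "check the source data" step, here is a minimal sketch that tests whether a file has the basic Parquet layout: the 4-byte magic PAR1 at the start and end, and a footer length that fits inside the file. It assumes you have downloaded a copy of the source file locally and that the file is unencrypted; the function name check_parquet_file is hypothetical. It only catches gross truncation or a non-Parquet file, so a file that passes can still contain a damaged row group.

```python
import os
import struct

PARQUET_MAGIC = b"PAR1"  # unencrypted Parquet files start and end with these 4 bytes


def check_parquet_file(path):
    """Return a list of layout problems; an empty list means the basic layout looks intact.

    Hypothetical helper for illustration only. It does not decode any data pages.
    """
    size = os.path.getsize(path)
    # Smallest possible layout: header magic (4) + footer length (4) + trailing magic (4).
    if size < 12:
        return [f"file is only {size} bytes; too small to be Parquet"]

    with open(path, "rb") as f:
        header = f.read(4)
        f.seek(-8, os.SEEK_END)
        footer_len_bytes = f.read(4)
        trailer = f.read(4)

    problems = []
    if header != PARQUET_MAGIC:
        problems.append("missing PAR1 magic at start of file")
    if trailer != PARQUET_MAGIC:
        problems.append("missing PAR1 magic at end of file (possibly truncated)")
    else:
        # The footer length is a 4-byte little-endian integer just before the trailing magic.
        (footer_len,) = struct.unpack("<I", footer_len_bytes)
        if footer_len + 12 > size:
            problems.append("footer length exceeds file size")
    return problems
```

    Calling check_parquet_file on each downloaded source file and looking for a non-empty result is a quick way to spot files that were only partially written before the pipeline picked them up.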

    For more details, see this MS Q&A thread that addresses a similar issue: Azure Data Factory error 2200 writing to parquet file - Microsoft Q&A

    If you are still experiencing the issue, could you please share details about the source and the file format used to copy data from Data Lake Gen2 to the Synapse dedicated pool?

    Hope this helps. Do let us know if you have any further queries.

