New BigQuery connector CopyData error when there's no data

Fredy Treboux 0 Reputation points
2024-09-27T17:04:22.26+00:00

I am trying to replace the Google BigQuery connector with the newer one before the deprecation, but the new Google BigQuery connector seems to make the "Copy Data" task fail when the source is a query and no data is returned from it.

This is the main issue I'm having right now (after sorting out a few others, like authentication).

We are running a self-hosted integration runtime. I've manually updated it to what I think is the latest version, 5.46.9020.1 (the automatic update indicated that no update was needed, but this was not true).

I don't know how to get around this, which is a problem given that we have many pipelines that run every couple of hours to fetch newer data from BigQuery, if it's available.

I suppose one option could be to just ignore that the task has failed and go on, but that seems like a bad idea, as the task could fail for other reasons, in which case I don't want things to keep running.

Also, the error message is not at all helpful; the only reason I concluded that the problem was the lack of data on the source side was that I saw some posts suggesting this, and then tested the situation myself.

"ErrorCode: UnknownError

Object reference not set to an instance of an object."

Is this actually expected? What can be done about it?

Azure Data Factory

1 answer

  1. youzedev@gmail.com 640 Reputation points
    2024-09-27T17:09:24.92+00:00

    The issue you’re encountering with the new Google BigQuery connector in Azure Data Factory (ADF) or Synapse Pipelines, where a “Copy Data” task fails when the source query returns no data, seems to be related to how the connector handles empty result sets. The error message (Object reference not set to an instance of an object) suggests that the connector may not be handling empty results gracefully. Here are some potential solutions and workarounds you can try:

    1. Check for Empty Results Before the Copy Activity

    One way to handle the situation where no data is returned from BigQuery is to check for empty results before running the “Copy Data” task. You can do this by adding a “Lookup” activity in your pipeline before the “Copy Data” activity. The Lookup activity can execute the query and check for results.

    •	Steps:
    
    1.	Add a Lookup activity to your pipeline, placed before the "Copy Data" activity.
    
    2.	In the Settings of the Lookup activity, wrap your query in a count (for example, SELECT COUNT(*) AS cnt FROM (your original query)) and enable the First row only option, so you don't retrieve the whole dataset. This also avoids the Lookup itself hitting the same empty-result error, since COUNT(*) always returns a row.
    
    3.	Use the @activity('LookupActivity').output.firstRow.cnt expression to check whether the query returned any data. (Note that with First row only enabled, the Lookup output exposes firstRow rather than count.)
    
    4.	Add an "If Condition" activity to evaluate whether the result count is greater than 0.
    
    5.	If the result count is greater than 0, run the Copy Data activity; if not, skip it.
    

    This way, the pipeline will only proceed with the copy operation if there is data to process, avoiding the null reference error caused by empty result sets.
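    For illustration, here is an abbreviated sketch of how the Lookup and If Condition could be wired up in the pipeline JSON. The names LookupActivity and CopyFromBigQuery and the cnt alias are placeholders, assuming the Lookup query was wrapped as SELECT COUNT(*) AS cnt FROM (...) with First row only enabled, as described above; the Copy activity body is elided:

    {
        "name": "IfDataExists",
        "type": "IfCondition",
        "dependsOn": [
            { "activity": "LookupActivity", "dependencyConditions": [ "Succeeded" ] }
        ],
        "typeProperties": {
            "expression": {
                "value": "@greater(activity('LookupActivity').output.firstRow.cnt, 0)",
                "type": "Expression"
            },
            "ifTrueActivities": [
                { "name": "CopyFromBigQuery", "type": "Copy" }
            ]
        }
    }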

    2. Enable Fault Tolerance in the Copy Activity

    Another approach is to enable fault tolerance for the “Copy Data” activity. This allows the task to continue even if errors occur (like when the query returns no data).

    •	Steps:
    
    1.	Open the Copy Data activity in your pipeline.
    
    2.	Go to the Settings tab.
    
    3.	Under Fault Tolerance, select Skip incompatible rows so the copy skips problematic rows instead of failing. Note that this feature targets malformed rows rather than empty result sets, so it may not catch this particular error.
    

    However, this approach could mask other issues, so you should use it with caution.
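    For reference, this is roughly where the setting lives in the Copy activity JSON. The activity name, source/sink types, and linked service reference below are placeholders, and the exact source type depends on which BigQuery connector version you use:

    {
        "name": "CopyFromBigQuery",
        "type": "Copy",
        "typeProperties": {
            "source": { "type": "GoogleBigQuerySource", "query": "SELECT ..." },
            "sink": { "type": "AzureSqlSink" },
            "enableSkipIncompatibleRow": true,
            "redirectIncompatibleRowSettings": {
                "linkedServiceName": { "referenceName": "ErrorLogStorage", "type": "LinkedServiceReference" },
                "path": "copy-errors"
            }
        }
    }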

    3. Modify the BigQuery Query to Return a Dummy Row

    As a workaround, you can modify the query to always return a row, even if no actual data exists. This way, the connector doesn’t encounter an empty result set and can proceed without throwing the error.

    For example, you can add a UNION ALL clause so that a dummy row is returned when the main query is empty:

    SELECT * FROM your_dataset.your_table WHERE your_conditions
    UNION ALL
    SELECT NULL AS column1, NULL AS column2 -- one NULL per column of the original query; CAST them if the types don't line up
    FROM (SELECT 1) AS dummy
    WHERE NOT EXISTS (SELECT * FROM your_dataset.your_table WHERE your_conditions);

    This ensures that the query always returns at least one row, preventing the “Copy Data” activity from failing due to an empty result set.

    4. Update the Self-hosted Integration Runtime

    It’s possible that you’re still facing this issue because of an outdated version of the Self-hosted Integration Runtime (SHIR), even after the manual update. The issue could be related to the connector itself, and newer versions might have fixed it.

    •	Ensure you have the latest version of the Google BigQuery connector and the Self-hosted Integration Runtime (version 5.46.9020.1 or newer).
    
    •	You can verify this by checking the version in the Integration Runtime Configuration Manager or by running the following command to get the installed SHIR version:
    

    # Requires the Az.DataFactory PowerShell module; replace the placeholder names with your own.
    # The -Status switch fetches live details for the runtime, including the installed version.
    $IRInstance = Get-AzDataFactoryV2IntegrationRuntime -ResourceGroupName "YourResourceGroup" -DataFactoryName "YourDataFactory" -Name "YourIntegrationRuntimeName" -Status

    $IRInstance.Version

    5. Handle the Error in the Pipeline with Retry Logic

    If the pipeline error is truly due to empty data, and it’s safe to ignore this specific error, you can handle it by adding retry logic or custom error handling.

    •	You can set a Retry Policy on the Copy Data activity so that it retries a few times before failing.
    
    •	Alternatively, you can use the On Failure output of the Copy Data activity to trigger a different activity, such as logging the issue or sending a notification, while allowing the pipeline to continue without a hard failure (see the sketch after this list).
    
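    As a sketch (the activity names are placeholders and the notification activity body is elided), the retry settings sit on the activity's policy block, and the failure branch is simply a dependency with a Failed condition:

    {
        "name": "CopyFromBigQuery",
        "type": "Copy",
        "policy": {
            "timeout": "0.12:00:00",
            "retry": 3,
            "retryIntervalInSeconds": 60
        }
    },
    {
        "name": "NotifyOnCopyFailure",
        "type": "WebActivity",
        "dependsOn": [
            { "activity": "CopyFromBigQuery", "dependencyConditions": [ "Failed" ] }
        ]
    }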
    6. Raise a Support Ticket with Microsoft

    Since this issue could be specific to the new version of the Google BigQuery connector, it’s also a good idea to raise a support ticket with Azure Support. They might have more insight into whether this is a known bug with the current version of the connector or if a patch is available to fix it.

    Conclusion

    Here are your best options to handle the issue:

    1.	Pre-check for data using a Lookup activity and conditional logic.
    
    2.	Enable fault tolerance on the Copy Data activity to handle empty results.
    
    3.	Modify the BigQuery query to ensure a dummy row is always returned.
    
    4.	Ensure the Self-hosted Integration Runtime and BigQuery connector are updated.
    
    5.	Add retry or custom error handling to the pipeline.
    

    If none of these solutions work or you still encounter issues, it’s advisable to open a support request with Azure as this might be a bug in the latest connector.

    If my answer is helpful to you, please accept it. Thank you!

