Bulk Inserts in Stream Analytics Jobs

Raj Bisht 20 Reputation points
2024-09-16T15:51:36.9366667+00:00

Hi Microsoft Team,

I have a Stream Analytics job that uses an event hub as input and a dedicated SQL pool (formerly SQL DW) as output.

In the query activity of this dedicated SQL pool, I can see multiple queries running that use a bulk insert command to put data into the SQL DW table configured as the output of the Stream Analytics job.

When I try to backtrack and trace the origin of these bulk insert queries, I am unable to do so.

I checked the Query section of the ASA job under Job topology, but no bulk insert commands are mentioned there. Are these bulk inserts a default option in ASA jobs whenever SQL Database is used as an output?

Where can we see these bulk insert commands? Are they hardcoded, or do they exist in the backend and are not visible to customers?

I have this concern because these bulk insert commands recently filled up the transaction logs, and a lot of job failures were seen. Microsoft support therefore suggested optimizing these bulk insert commands and using the recommendation below:

"To continue with bulk insert operations, you can minimize the load on the transaction log database by following these best practices:

| Primary Index | Load Scenario | Logging Mode |
| --- | --- | --- |
| Heap | Any | Minimal |
| Clustered Index | Empty target table | Minimal |
| Clustered Index | Loaded rows do not overlap with existing pages in target | Minimal |

" BUT in order to use above optimization we need to find these bulk insert commands somewhere and then hardcode to use indexes like heap, clustered index, etc.

Please help me clarify this doubt: can we actually modify these bulk insert commands in ASA jobs and apply the above recommendations of heap, clustered index, etc.?

Azure Stream Analytics
An Azure real-time analytics service designed for mission-critical workloads.

Accepted answer
    Ganesh Gurram · 945 Reputation points · Microsoft Vendor
    2024-09-18T10:27:49.5533333+00:00

    @Raj Bisht - Thanks for the question and for using the MS Q&A platform.

    Here is the response shared by the internal team: 

    Yes, bulk inserts are the default and only option when using the SQL Database output. Internally, Azure Stream Analytics uses the ADO.NET SqlBulkCopy API to issue these bulk inserts.
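    You can surface these requests on the dedicated SQL pool side through the requests DMV; a minimal sketch (the LIKE filter is an assumption about how the SqlBulkCopy loads appear in the command text):

    ```sql
    -- Recent requests on the dedicated SQL pool, filtered to bulk loads;
    -- adjust the filter if the loads surface with different command text.
    SELECT request_id,
           session_id,
           submit_time,
           total_elapsed_time,
           status,
           command
    FROM   sys.dm_pdw_exec_requests
    WHERE  command LIKE '%INSERT BULK%'
    ORDER BY submit_time DESC;
    ```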

    Based on my understanding, these two things can be done to achieve minimal logging.   

    1. Table Lock: Use the TABLOCK hint to lock the table during the bulk insert operation. This is necessary for minimal logging. 
    2. Non-Clustered Indexes: If your table has non-clustered indexes, minimal logging might not be fully achievable. Consider dropping the indexes before the bulk insert and recreating them afterward, as sketched below.
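    For #2, a minimal sketch of the drop-and-recreate pattern, assuming a hypothetical table dbo.StreamEvents with one non-clustered index:

    ```sql
    -- Drop the non-clustered index before the heavy load window
    -- so the bulk inserts can be minimally logged.
    DROP INDEX IX_StreamEvents_EventTime ON dbo.StreamEvents;

    -- After the load completes, recreate the index.
    CREATE INDEX IX_StreamEvents_EventTime ON dbo.StreamEvents (EventTime);
    ```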

    Regarding #1, Azure Stream Analytics already specifies TABLOCK unless the destination table is a memory-optimized table.

    If #2 is also not an issue for your table, please engage SQL CSS (Customer Support Services) for further recommendations.

    Another option for writing to SQL DW is to try the Synapse output, which internally uses the COPY command to write to the DW table. You can validate whether this option works better in your setup.
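    For reference, the COPY statement that the Synapse output relies on looks roughly like this when issued directly (a sketch; the storage path and credential are placeholders):

    ```sql
    -- Load staged files from Blob/ADLS into the DW table via COPY.
    COPY INTO dbo.StreamEvents
    FROM 'https://<storageaccount>.blob.core.windows.net/<container>/events/'
    WITH (
        FILE_TYPE = 'CSV',
        CREDENTIAL = (IDENTITY = 'Managed Identity')
    );
    ```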

    Regarding the document you shared, Optimizing transactions in dedicated SQL pool in Azure Synapse Analytics: it is applicable to Azure Synapse Analytics.

    Hope this helps. Do let us know if you have any further queries.

    ------------  

    If this answers your query, do click Accept Answer and Yes for "Was this answer helpful". And if you have any further query, do let us know.

    2 people found this answer helpful.

0 additional answers
