Loading files with ever-changing columns

bk 466 Reputation points
2020-11-05T13:53:37.39+00:00

Hi All,
I am using ADF to load files from an SFTP site into ADLS. The issue is that the incoming files are not consistent column-wise: sometimes columns are added and there is no way we get notified. Is there a way in ADF to copy the new columns automatically, without manual intervention?
Please advise.
Thanks

Azure Data Factory
An Azure service for ingesting, preparing, and transforming data at scale.

Accepted answer
  1. HimanshuSinha-msft 19,471 Reputation points Microsoft Employee
    2020-11-10T20:13:11.413+00:00

    Hello @bk ,

    Thanks for being so detailed in the ask; believe me, that helps.
    So in your case the sink is flat files and the source is SQL MI. This is what I was thinking: do you really need the new columns? If not, you can use tabular source to tabular sink mapping; please read about that here:
    (https://video2.skills-academy.com/en-us/azure/data-factory/copy-activity-schema-and-type-mapping#tabular-source-to-tabular-sink)

    Now let's go ahead with the second possibility, where you need to append the new columns to the SQL table. As far as I know, you can use the auto-create table option in a data flow to create the staging table, and then use a stored procedure to read the staging table and ALTER the actual table.
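
    If it helps, here is a minimal sketch of such a stored procedure, assuming the tables live in the dbo schema; the procedure name, its parameters, and the nvarchar(100) type used for the new columns are my own assumptions, not something ADF generates.

    CREATE OR ALTER PROCEDURE dbo.usp_AddMissingColumns
        @StagingTable sysname,
        @TargetTable  sysname
    AS
    BEGIN
        SET NOCOUNT ON;
        DECLARE @sql nvarchar(max) = N'';

        -- Build one ALTER TABLE ... ADD per column that exists in staging but not yet in the target
        SELECT @sql = @sql
            + N'ALTER TABLE dbo.' + QUOTENAME(@TargetTable)
            + N' ADD ' + QUOTENAME(s.COLUMN_NAME) + N' nvarchar(100);'
        FROM INFORMATION_SCHEMA.COLUMNS AS s
        WHERE s.TABLE_NAME = @StagingTable
          AND s.COLUMN_NAME NOT IN (
                SELECT t.COLUMN_NAME
                FROM INFORMATION_SCHEMA.COLUMNS AS t
                WHERE t.TABLE_NAME = @TargetTable);

        IF @sql <> N'' EXEC sp_executesql @sql;
    END

    You could call it from a Stored Procedure activity that runs after the staging load and before the copy into the main table.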

    Let me know if this helps.

    Updated on 11/17/2020

    After talking to a few members of the team, we are sure that schema drift being supported on the source side does not mean schema drift is supported on the sink side. But I did try the workaround below in ADF, and as per my testing it seems to work. I assume the file looks like this today:

    11,22,33,44,55,66,77,88

    and tomorrow it will be:

    11,22,33,44,55,66,77,88,999,10000,11
    21,22,23,24,25,26,27,28,299,20000,21

    Here is the logic:

    1. Copy the data to an intermediate table using the "Auto create table" option (a sketch of what that table might look like follows this list).
    2. A second Copy activity copies the data from the intermediate table to the main table, but uses the pre-copy script option to check whether the intermediate and final tables have the same schema; if they do not, the script adds the missing columns.
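
    For reference, the intermediate table that "Auto create table" generates for tomorrow's 11-column file might look roughly like the sketch below; the Prop_N column names follow ADF's default naming for header-less delimited files, and the data types are an assumption on my part.

    CREATE TABLE dbo.fooSourceTable
    (
        Prop_0 nvarchar(100), Prop_1 nvarchar(100), Prop_2  nvarchar(100), Prop_3  nvarchar(100),
        Prop_4 nvarchar(100), Prop_5 nvarchar(100), Prop_6  nvarchar(100), Prop_7  nvarchar(100),
        Prop_8 nvarchar(100), Prop_9 nvarchar(100), Prop_10 nvarchar(100)
    );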

    Here is the SQL script I used as the pre-copy script (in my example the final table is named foofinalTable and the intermediate table is fooSourceTable):

    -- Collect one ALTER TABLE statement for every column that exists in the
    -- intermediate table but not yet in the final table
    DECLARE @SQLCommand table ( id int identity, command varchar(1000) )

    INSERT INTO @SQLCommand ( command )
    SELECT 'ALTER TABLE ' + TABLE_SCHEMA + '.foofinalTable ADD ' + COLUMN_NAME + ' nvarchar(100)' AS command
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = N'fooSourceTable'
      AND COLUMN_NAME NOT IN (
          SELECT COLUMN_NAME
          FROM INFORMATION_SCHEMA.COLUMNS
          WHERE TABLE_NAME = N'foofinalTable')

    -- Execute the collected statements one by one
    DECLARE @I INT
    DECLARE @CMD VARCHAR(1000)
    DECLARE @Max INT
    SELECT @I = 1
    SELECT @Max = MAX(id) FROM @SQLCommand

    WHILE @I <= @Max
    BEGIN
        SELECT @CMD = command FROM @SQLCommand WHERE id = @I
        EXEC (@CMD)
        SET @I = @I + 1
    END
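
    If you want to try the pre-copy script outside ADF first, a hypothetical local test (reusing the assumed Prop_N column names) is to create the final table with today's eight columns and the intermediate table with tomorrow's eleven, run the script above, and then list the final table's columns.

    -- Today's shape of the final table and tomorrow's shape of the intermediate table
    CREATE TABLE dbo.foofinalTable
    (
        Prop_0 nvarchar(100), Prop_1 nvarchar(100), Prop_2 nvarchar(100), Prop_3 nvarchar(100),
        Prop_4 nvarchar(100), Prop_5 nvarchar(100), Prop_6 nvarchar(100), Prop_7 nvarchar(100)
    );
    CREATE TABLE dbo.fooSourceTable
    (
        Prop_0 nvarchar(100), Prop_1 nvarchar(100), Prop_2  nvarchar(100), Prop_3  nvarchar(100),
        Prop_4 nvarchar(100), Prop_5 nvarchar(100), Prop_6  nvarchar(100), Prop_7  nvarchar(100),
        Prop_8 nvarchar(100), Prop_9 nvarchar(100), Prop_10 nvarchar(100)
    );

    -- Run the pre-copy script above, then check which columns foofinalTable has now
    SELECT COLUMN_NAME
    FROM INFORMATION_SCHEMA.COLUMNS
    WHERE TABLE_NAME = N'foofinalTable'
    ORDER BY ORDINAL_POSITION;   -- Prop_8, Prop_9 and Prop_10 should now appear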

    (Attachment: 40504-changingcol.gif)

    Thanks,
    Himanshu
    Please consider clicking "Accept Answer" and "Upvote" on the post that helps you, as it can be beneficial to other community members.


0 additional answers
