Data Preview in ADF Source from CSV showing NULL Values

tevin.sales 40 Reputation points
2024-04-08T09:15:54.9933333+00:00

Hello I am having an issue with mapping/creating a data flow between a CSV in a blob storage and an ADX table.

I have created the CSV source dataset linked to a blob storage folder. I imported its schema, and when I preview the data on the dataset it displays correctly. However, when I use it as a source in a Data Flow and try to copy it to an ADX sink, only a few of the fields come across correctly. There are 18 fields, and only 6 fields' data are found properly. When I Data Preview the data flow source, more than half of the values are NULL. For reference, I have 3 other files processed in a different data flow where I am not running into the same issue.

If I run Import Projection, it only imports 9 of the 18 fields, but at least when I view the data with Data Preview, those fields are not NULL. I don't know why the data can be read correctly in the source dataset preview, but once it gets to a data flow source node it cannot be read correctly.
Example:


Source without Import Projection

source(output(
        Date as string,
        Type as string,
        StoreName as string,
        {Job開始時間(YYYY/MM/DD hh:mm:ss)} as string,
        {Job完了時間(YYYY/MM/DD hh:mm:ss)} as string,
        {Job結果} as string,
        {陳列成功可否} as string,
        {陳列対象の商品JANコード} as string,
        {商品名} as string,
        {陳列対象のロケーションID} as string,
        {ロケーションX} as string,
        {ロケーションY} as string,
        {ロケーションZ} as string,
        ErrCode as string,
        ManualJob as string,
        JobId as string,
        Priority as string,
        EdgeType as string
    ),
    allowSchemaDrift: false,
    validateSchema: false,
    ignoreNoFilesFound: false) ~> source1
source1 sink(allowSchemaDrift: true,
    validateSchema: false,
    input(
        StartTime as timestamp,
        FinishTime as timestamp,
        StoreName as string,
        Type as string,
        JobResult as string,
        IsSuccess as boolean,
        SkuId as string,
        SkuName as string,
        LocationId as string,
        ErrorCode as string,
        IsManualJob as boolean,
        JobId as string,
        Priority as integer,
        EdgeType as string
    ),
    format: 'table',
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true,
    mapColumn(
        StartTime = {Job開始時間(YYYY/MM/DD hh:mm:ss)},
        FinishTime = {Job完了時間(YYYY/MM/DD hh:mm:ss)},
        StoreName,
        Type,
        JobResult = {Job結果},
        IsSuccess = {陳列成功可否},
        SkuId = {陳列対象の商品JANコード},
        SkuName = {商品名},
        LocationId = {陳列対象のロケーションID},
        ErrorCode = ErrCode,
        IsManualJob = ManualJob,
        JobId,
        Priority,
        EdgeType
    )) ~> sink1

With Import Projection:

source(output(
        Date as date 'yyyyMMdd',
        Type as string,
        StoreName as string,
        {Job開始時間(YYYY/MM/DD hh:mm:ss)} as string,
        {Job完了時間(YYYY/MM/DD hh:mm:ss)} as string,
        {Job結果} as string,
        {陳列成功可否} as boolean,
        {陳列対象の商品JANコード} as long,
        {商品} as string
    ),
    allowSchemaDrift: false,
    validateSchema: false,
    ignoreNoFilesFound: false) ~> source1
source1 sink(allowSchemaDrift: true,
    validateSchema: false,
    input(
        StartTime as timestamp,
        FinishTime as timestamp,
        StoreName as string,
        Type as string,
        JobResult as string,
        IsSuccess as boolean,
        SkuId as string,
        SkuName as string,
        LocationId as string,
        ErrorCode as string,
        IsManualJob as boolean,
        JobId as string,
        Priority as integer,
        EdgeType as string
    ),
    format: 'table',
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true,
    mapColumn(
        StartTime = {Job開始時間(YYYY/MM/DD hh:mm:ss)},
        FinishTime = {Job完了時間(YYYY/MM/DD hh:mm:ss)},
        StoreName,
        Type,
        JobResult = {Job結果},
        IsSuccess = {陳列成功可否},
        SkuId = {陳列対象の商品JANコード},
        SkuName = {商品名},
        LocationId = {陳列対象のロケーションID},
        ErrorCode = ErrCode,
        IsManualJob = ManualJob,
        JobId,
        Priority,
        EdgeType
    )) ~> sink1


1 answer

  1. PRADEEPCHEEKATLA-MSFT 89,466 Reputation points Microsoft Employee
    2024-04-09T04:07:05.1866667+00:00

    @tevin.sales - Thanks for the question and using MS Q&A platform.

    It seems that you are having an issue mapping a data flow between a CSV in blob storage and an ADX table. When you create a source in a Data Flow and copy it to an ADX sink, only a few fields are copied over correctly: of the 18 fields, only 6 fields' data come through properly, and when you Data Preview the source, more than half of the values are NULL.

    It is possible that the issue is caused by the schema inference process. By default, ADF uses a sample of rows (for example, the top 100 or 1,000 rows) to infer the schema, and the inferred result is then used as the schema to read the data. So if your data store has extra columns that don't appear in the sampled rows, the data in those extra columns is not read, moved, or transferred into the sink data store.
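    As a toy sketch in plain Python (this is an illustration of the idea, not ADF's actual implementation): when a schema is inferred from only the first few rows, columns that first appear later never enter the schema, so their values are silently lost downstream.

```python
def infer_schema(rows, sample_size=3):
    """Collect the column names seen in the first sample_size rows only."""
    cols = []
    for row in rows[:sample_size]:
        for k in row:
            if k not in cols:
                cols.append(k)
    return cols

def read_rows(rows, sample_size=3):
    """Read every row, keeping only columns in the inferred schema.
    Columns the sample never saw are dropped; inferred columns missing
    from a row come back as None (i.e. NULL)."""
    schema = infer_schema(rows, sample_size)
    return [{c: row.get(c) for c in schema} for row in rows]

# 'ErrCode' only appears after the sampled rows, so it is dropped entirely.
rows = [
    {"JobId": "1"},
    {"JobId": "2"},
    {"JobId": "3", "ErrCode": "E-42"},
]
print(read_rows(rows, sample_size=2))
# -> [{'JobId': '1'}, {'JobId': '2'}, {'JobId': '3'}]
```

    The column names here are taken from the question for familiarity; the mechanism is the point.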

    To override the default behavior and bring in the additional fields, ADF provides options to customize the source schema. You can specify, in the data flow source projection, any columns that are missing from the schema-inference result so that their data is read.

    You can try to customize the source schema by following the steps below:

    1. Go to the source transformation in the data flow.
    2. Click on the "Projection" tab.
    3. Click on the "Import Schema" button.
    4. Select the "From Connection" option.
    5. Select the CSV file that you are using as the source.
    6. Click on the "Import" button.
    7. In the "Projection" tab, you can add or remove columns as needed.

    For more details, refer to Source transformation in mapping data flows.

    If this does not solve the issue, you can try to use the "Import Projection" option to import the schema from the CSV file. This will import the schema for all the columns in the CSV file, even if they are not present in the sample rows used for schema inference.
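    As a quick local sanity check outside ADF (a hedged sketch: the header names are taken from your script, and the sample string stands in for a downloaded copy of the real file), you could confirm the file itself parses into all expected columns, since a file that doesn't parse locally won't project correctly in ADF either:

```python
import csv
import io

# Hypothetical sample mimicking the file; only a few of the 18 columns shown.
sample_csv = (
    "Date,Type,StoreName,Job開始時間(YYYY/MM/DD hh:mm:ss),ErrCode\n"
    "20240408,Shelf,Store01,2024/04/08 09:00:00,0\n"
)

reader = csv.DictReader(io.StringIO(sample_csv))
print(reader.fieldnames)     # every expected column name should appear here
first_row = next(reader)
print(first_row["ErrCode"])  # and each column should hold a value, not be empty
```

    Against the real file you would open it with the encoding it was written in (e.g. `open(path, encoding="utf-8")`) and check that `reader.fieldnames` lists all 18 columns.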

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

