Data Preview in ADF Source from CSV showing NULL Values

tevin.sales 40 Reputation points
2024-04-08T09:15:54.9933333+00:00

Hello I am having an issue with mapping/creating a data flow between a CSV in a blob storage and an ADX table.

I have created the CSV source dataset linked to a blob storage folder. I imported its schema, and when I preview the data on the dataset it displays correctly. However, when I use it as a source in a Data Flow and try to copy it to an ADX sink, only a few of the fields come across correctly. There are 18 fields, and only 6 fields' data are found properly. When I Data Preview the data flow source, more than half of the values are NULL. For reference, I have 3 other files processed in a different data flow where I am not running into the same issue.

If I run Import Projection, it only imports 9 of the 18 fields, but at least when I view the data with Data Preview, those fields are not NULL. I don't know why the data can be read correctly in the source dataset preview, but once it gets to a data flow source node it cannot be read correctly.
Example:


Source without Import Projection

source(output(
        Date as string,
        Type as string,
        StoreName as string,
        {Job開始時間(YYYY/MM/DD hh:mm:ss)} as string,
        {Job完了時間(YYYY/MM/DD hh:mm:ss)} as string,
        {Job結果} as string,
        {陳列成功可否} as string,
        {陳列対象の商品JANコード} as string,
        {商品名} as string,
        {陳列対象のロケーションID} as string,
        {ロケーションX} as string,
        {ロケーションY} as string,
        {ロケーションZ} as string,
        ErrCode as string,
        ManualJob as string,
        JobId as string,
        Priority as string,
        EdgeType as string
    ),
    allowSchemaDrift: false,
    validateSchema: false,
    ignoreNoFilesFound: false) ~> source1
source1 sink(allowSchemaDrift: true,
    validateSchema: false,
    input(
        StartTime as timestamp,
        FinishTime as timestamp,
        StoreName as string,
        Type as string,
        JobResult as string,
        IsSuccess as boolean,
        SkuId as string,
        SkuName as string,
        LocationId as string,
        ErrorCode as string,
        IsManualJob as boolean,
        JobId as string,
        Priority as integer,
        EdgeType as string
    ),
    format: 'table',
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true,
    mapColumn(
        StartTime = {Job開始時間(YYYY/MM/DD hh:mm:ss)},
        FinishTime = {Job完了時間(YYYY/MM/DD hh:mm:ss)},
        StoreName,
        Type,
        JobResult = {Job結果},
        IsSuccess = {陳列成功可否},
        SkuId = {陳列対象の商品JANコード},
        SkuName = {商品名},
        LocationId = {陳列対象のロケーションID},
        ErrorCode = ErrCode,
        IsManualJob = ManualJob,
        JobId,
        Priority,
        EdgeType
    )) ~> sink1

With Import Projection:

source(output(
        Date as date 'yyyyMMdd',
        Type as string,
        StoreName as string,
        {Job開始時間(YYYY/MM/DD hh:mm:ss)} as string,
        {Job完了時間(YYYY/MM/DD hh:mm:ss)} as string,
        {Job結果} as string,
        {陳列成功可否} as boolean,
        {陳列対象の商品JANコード} as long,
        {商品} as string
    ),
    allowSchemaDrift: false,
    validateSchema: false,
    ignoreNoFilesFound: false) ~> source1
source1 sink(allowSchemaDrift: true,
    validateSchema: false,
    input(
        StartTime as timestamp,
        FinishTime as timestamp,
        StoreName as string,
        Type as string,
        JobResult as string,
        IsSuccess as boolean,
        SkuId as string,
        SkuName as string,
        LocationId as string,
        ErrorCode as string,
        IsManualJob as boolean,
        JobId as string,
        Priority as integer,
        EdgeType as string
    ),
    format: 'table',
    skipDuplicateMapInputs: true,
    skipDuplicateMapOutputs: true,
    mapColumn(
        StartTime = {Job開始時間(YYYY/MM/DD hh:mm:ss)},
        FinishTime = {Job完了時間(YYYY/MM/DD hh:mm:ss)},
        StoreName,
        Type,
        JobResult = {Job結果},
        IsSuccess = {陳列成功可否},
        SkuId = {陳列対象の商品JANコード},
        SkuName = {商品名},
        LocationId = {陳列対象のロケーションID},
        ErrorCode = ErrCode,
        IsManualJob = ManualJob,
        JobId,
        Priority,
        EdgeType
    )) ~> sink1


1 answer

  1. PRADEEPCHEEKATLA-MSFT 89,466 Reputation points Microsoft Employee
    2024-04-09T04:07:05.1866667+00:00

    @tevin.sales - Thanks for the question and using MS Q&A platform.

    It seems that you are having an issue mapping a data flow between a CSV in blob storage and an ADX table. When you create a source in a Data Flow and copy it to an ADX sink, only a few fields are copied over correctly: of the 18 fields, only 6 fields' data come through properly, and when you Data Preview the source, more than half of the values are NULL.

    It is possible that the issue is caused by the schema inference process. By default, ADF uses a sample of rows (for example, the top 100 or 1,000 rows) to infer the schema, and the inferred result is then used as the schema to read the data. So if your data store has extra columns that don't appear in the sampled rows, the data in those extra columns is not read, moved, or transferred into the sink data store.
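    As a toy sketch in plain Python (this is an illustration of the idea, not ADF's actual implementation): when a schema is inferred from only the first few rows, columns that first appear later never enter the schema, so their values are silently lost downstream.

```python
def infer_schema(rows, sample_size=3):
    """Collect the column names seen in the first sample_size rows only."""
    cols = []
    for row in rows[:sample_size]:
        for k in row:
            if k not in cols:
                cols.append(k)
    return cols

def read_rows(rows, sample_size=3):
    """Read every row, keeping only columns in the inferred schema.
    Columns the sample never saw are dropped; inferred columns missing
    from a row come back as None (i.e. NULL)."""
    schema = infer_schema(rows, sample_size)
    return [{c: row.get(c) for c in schema} for row in rows]

# 'ErrCode' only appears after the sampled rows, so it is dropped entirely.
rows = [
    {"JobId": "1"},
    {"JobId": "2"},
    {"JobId": "3", "ErrCode": "E-42"},
]
print(read_rows(rows, sample_size=2))
# -> [{'JobId': '1'}, {'JobId': '2'}, {'JobId': '3'}]
```

    The column names here are taken from the question for familiarity; the mechanism is the point.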

    To override the default behavior and bring in the additional fields, ADF provides options to customize the source schema. You can specify, in the data flow source projection, any columns that are missing from the schema-inference result so that their data is read.

    You can try to customize the source schema by following the steps below:

    1. Go to the source transformation in the data flow.
    2. Click on the "Projection" tab.
    3. Click on the "Import Schema" button.
    4. Select the "From Connection" option.
    5. Select the CSV file that you are using as the source.
    6. Click on the "Import" button.
    7. In the "Projection" tab, you can add or remove columns as needed.

    For more details, refer to Source transformation in mapping data flows.

    If this does not solve the issue, you can try to use the "Import Projection" option to import the schema from the CSV file. This will import the schema for all the columns in the CSV file, even if they are not present in the sample rows used for schema inference.
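    As a quick local sanity check outside ADF (a hedged sketch: the header names are taken from your script, and the sample string stands in for a downloaded copy of the real file), you could confirm the file itself parses into all expected columns, since a file that doesn't parse locally won't project correctly in ADF either:

```python
import csv
import io

# Hypothetical sample mimicking the file; only a few of the 18 columns shown.
sample_csv = (
    "Date,Type,StoreName,Job開始時間(YYYY/MM/DD hh:mm:ss),ErrCode\n"
    "20240408,Shelf,Store01,2024/04/08 09:00:00,0\n"
)

reader = csv.DictReader(io.StringIO(sample_csv))
print(reader.fieldnames)     # every expected column name should appear here
first_row = next(reader)
print(first_row["ErrCode"])  # and each column should hold a value, not be empty
```

    Against the real file you would open it with the encoding it was written in (e.g. `open(path, encoding="utf-8")`) and check that `reader.fieldnames` lists all 18 columns.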

    Hope this helps. Do let us know if you have any further queries.


    If this answers your query, do click Accept Answer and Yes for was this answer helpful. And, if you have any further query do let us know.

