XML format in Data Factory in Microsoft Fabric

06/25/2024

This article outlines how to configure XML format in the data pipeline of Data Factory in Microsoft Fabric.

Supported capabilities

XML format is supported for the following activities and connectors as source.

Category	Connector/Activity
Supported connector	Amazon S3
	Amazon S3 Compatible
	Azure Blob Storage
	Azure Data Lake Storage Gen1
	Azure Data Lake Storage Gen2
	Azure Files
	File system
	FTP
	Google Cloud Storage
	HTTP
	Lakehouse Files
	Oracle Cloud Storage
	SFTP
Supported activity	Copy activity (source/-)
	Lookup activity
	GetMetadata activity
	Delete activity

XML format in copy activity

To configure XML format, choose your connection in the source of the data pipeline copy activity, and then select XML in the File format drop-down list. Select Settings to configure the format further.

Screenshot showing file format settings.

XML as source

After you select Settings in the File format section, the following properties are shown in the pop-up File format settings dialog box.

Screenshot showing selecting file format.

Compression type: The compression codec used to read XML files. You can choose None, bzip2, gzip, deflate, ZipDeflate, TarGZip, or tar in the drop-down list.

If you select ZipDeflate as the compression type, Preserve zip file name as folder is displayed under Advanced settings in the Source tab.
- Preserve zip file name as folder: Indicates whether to preserve the source zip file name as a folder structure during copy.
  - If this box is checked (default), the service writes unzipped files to <specified file path>/<folder named as source zip file>/.
  - If this box is unchecked, the service writes unzipped files directly to <specified file path>. Make sure that different source zip files don't contain duplicate file names, to avoid race conditions or unexpected behavior.

If you select TarGZip or tar as the compression type, Preserve compression file name as folder is displayed under Advanced settings in the Source tab.
- Preserve compression file name as folder: Indicates whether to preserve the source compressed file name as a folder structure during copy.
  - If this box is checked (default), the service writes decompressed files to <specified file path>/<folder named as source compressed file>/.
  - If this box is unchecked, the service writes decompressed files directly to <specified file path>. Make sure that different source files don't contain duplicate file names, to avoid race conditions or unexpected behavior.

Compression level: Specify the compression ratio when you select a compression type. You can choose Fastest or Optimal.
- Fastest: The compression operation should complete as quickly as possible, even if the resulting file is not optimally compressed.
- Optimal: The compression operation should be optimally compressed, even if the operation takes longer to complete. For more information, see the Compression Level topic.

Encoding: Specify the encoding type used to read text files. Select one type from the drop-down list. The default value is UTF-8.

Null value: Specifies the string representation of a null value. The default value is an empty string.
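
The folder options for ZipDeflate, TarGZip, and tar described above can be illustrated with a short Python sketch. It is not the service's implementation: the helper names `target_folder` and `colliding_names` are hypothetical, and the sketch assumes the folder is named exactly like the archive file (the article does not say whether the extension is kept).

```python
from collections import Counter
from pathlib import PurePosixPath


def target_folder(specified_path: str, source_archive: str,
                  preserve_name_as_folder: bool = True) -> str:
    """Return the folder that decompressed files would be written to."""
    base = PurePosixPath(specified_path)
    if preserve_name_as_folder:
        # Assumption: the folder carries the archive's file name unchanged.
        return str(base / PurePosixPath(source_archive).name)
    # Option unchecked: files go directly into the specified path.
    return str(base)


def colliding_names(archives: dict[str, list[str]]) -> set[str]:
    """Return file names that appear in more than one source archive."""
    counts = Counter(
        name for names in archives.values() for name in set(names)
    )
    return {name for name, count in counts.items() if count > 1}


# Hypothetical archive contents, used only for illustration.
archives = {
    "orders.zip": ["a.xml", "b.xml"],
    "returns.zip": ["b.xml"],
}
print(target_folder("landing/xml", "incoming/orders.zip"))
print(target_folder("landing/xml", "incoming/orders.zip",
                    preserve_name_as_folder=False))
print(colliding_names(archives))
```

Running a check like `colliding_names` before clearing the option shows which files would compete for the same destination path.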

Under Advanced settings in the Source tab, the following XML format related properties are displayed.

Validation mode: Specifies whether to validate the XML schema. Select one mode from the drop-down list.
- None: Select this to skip validation.
- xsd: Select this to validate the XML schema using XSD.
- dtd: Select this to validate the XML schema using DTD.

Namespaces: Specify whether to enable namespaces when parsing the XML files. It is selected by default.

Namespace prefix pairs: If Namespaces is enabled, select + New and specify the URL and Prefix. You can add more pairs by selecting + New again. The namespace URI to prefix mapping is used to name fields when parsing the XML file. If an XML file has a namespace and namespaces are enabled, by default the field name is the same as it is in the XML document. If an item is defined for the namespace URI in this map, the field name is prefix:fieldName.

Detect data type: Specify whether to detect integer, double, and Boolean data types. It is selected by default.
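
To make the prefix:fieldName rule concrete, here is a minimal Python sketch that uses the standard library's `xml.etree.ElementTree`. It only imitates the naming rule and is not how the service parses files. The `field_name` helper, the sample document, and the `http://example.com/books` URI are made up for illustration. The sample uses a default namespace (no prefix in the document), because ElementTree does not retain the prefixes written in the source file.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document with a default namespace.
SAMPLE = (
    '<catalog xmlns="http://example.com/books">'
    "<book><title>XML Basics</title></book>"
    "</catalog>"
)


def field_name(tag: str, namespace_prefixes: dict[str, str]) -> str:
    """Name a field from an ElementTree tag such as '{uri}local'."""
    if not tag.startswith("{"):
        return tag  # element is not in any namespace
    uri, local = tag[1:].split("}", 1)
    prefix = namespace_prefixes.get(uri)
    # With a mapping for the URI the name becomes prefix:fieldName;
    # otherwise this sketch falls back to the local name.
    return f"{prefix}:{local}" if prefix else local


root = ET.fromstring(SAMPLE)
mapping = {"http://example.com/books": "bk"}

with_mapping = [field_name(el.tag, mapping) for el in root.iter()]
without_mapping = [field_name(el.tag, {}) for el in root.iter()]
print(with_mapping)
print(without_mapping)
```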

Table summary

XML as source

The following properties are supported in the copy activity Source section when using XML format.

Name	Description	Value	Required	JSON script property
File format	The file format that you want to use.	XML	Yes	type (under `datasetSettings`): Xml
Compression type	The compression codec used to read XML files.	None, bzip2, gzip, deflate, ZipDeflate, TarGZip, tar	No	type (under `compression`): bzip2, gzip, deflate, ZipDeflate, TarGZip, tar
Compression level	The compression ratio.	Fastest, Optimal	No	level (under `compression`): Fastest, Optimal
Encoding	The encoding type used to read text files.	"UTF-8" (by default), "UTF-8 without BOM", "UTF-16LE", "UTF-16BE", "UTF-32LE", "UTF-32BE", "US-ASCII", "UTF-7", "BIG5", "EUC-JP", "EUC-KR", "GB2312", "GB18030", "JOHAB", "SHIFT-JIS", "CP875", "CP866", "IBM00858", "IBM037", "IBM273", "IBM437", "IBM500", "IBM737", "IBM775", "IBM850", "IBM852", "IBM855", "IBM857", "IBM860", "IBM861", "IBM863", "IBM864", "IBM865", "IBM869", "IBM870", "IBM01140", "IBM01141", "IBM01142", "IBM01143", "IBM01144", "IBM01145", "IBM01146", "IBM01147", "IBM01148", "IBM01149", "ISO-2022-JP", "ISO-2022-KR", "ISO-8859-1", "ISO-8859-2", "ISO-8859-3", "ISO-8859-4", "ISO-8859-5", "ISO-8859-6", "ISO-8859-7", "ISO-8859-8", "ISO-8859-9", "ISO-8859-13", "ISO-8859-15", "WINDOWS-874", "WINDOWS-1250", "WINDOWS-1251", "WINDOWS-1252", "WINDOWS-1253", "WINDOWS-1254", "WINDOWS-1255", "WINDOWS-1256", "WINDOWS-1257", "WINDOWS-1258"	No	encodingName
Preserve zip file name as folder	Indicates whether to preserve the source zip file name as a folder structure during copy.	Selected (default) or unselected	No	preserveZipFileNameAsFolder (under `compressionProperties`->`type` as `ZipDeflateReadSettings`): true (default) or false
Preserve compression file name as folder	Indicates whether to preserve the source compressed file name as a folder structure during copy.	Selected (default) or unselected	No	preserveCompressionFileNameAsFolder (under `compressionProperties`->`type` as `TarGZipReadSettings` or `TarReadSettings`): true (default) or false
Null value	The string representation of a null value.	<your null value>, empty string (by default)	No	nullValue
Validation mode	Whether to validate the XML schema.	None, xsd, dtd	No	validationMode: xsd, dtd
Namespaces	Whether to enable namespace when parsing the XML files.	Selected (default) or unselected	No	namespaces: true (default) or false
Namespace prefix pairs	Namespace URI to prefix mapping, which is used to name fields when parsing the XML file. If an XML file has namespace and namespace is enabled, by default, the field name is the same as it is in the XML document. If there is an item defined for the namespace URI in this map, the field name is `prefix:fieldName`.	< url >:< prefix >	No	namespacePrefixes: < url >:< prefix >
Detect data type	Whether to detect integer, double, and Boolean data types.	Selected (default) or unselected	No	detectDataType: true (default) or false
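
As a rough illustration of how the JSON script properties in the table might fit together, the following Python sketch builds a source fragment and prints it as JSON. Only the property names and the `datasetSettings`, `compression`, and `compressionProperties` placements come from the table. The `XmlSource` and `XmlReadSettings` type names and the `typeProperties` and `formatSettings` containers are assumptions borrowed from the Azure Data Factory XML format convention, and `build_xml_source` is a hypothetical helper. Connection-specific settings are omitted. Check the result against the JSON that your own pipeline generates.

```python
import json

# Read-settings type and folder flag per compression type (from the table).
_READ_SETTINGS = {
    "ZipDeflate": ("ZipDeflateReadSettings", "preserveZipFileNameAsFolder"),
    "TarGZip": ("TarGZipReadSettings", "preserveCompressionFileNameAsFolder"),
    "tar": ("TarReadSettings", "preserveCompressionFileNameAsFolder"),
}


def build_xml_source(compression_type=None, compression_level=None,
                     encoding_name="UTF-8", null_value="",
                     validation_mode=None, namespaces=True,
                     namespace_prefixes=None, detect_data_type=True,
                     preserve_name_as_folder=True):
    """Assemble a copy activity source fragment for XML (a sketch)."""
    # Assumption: encoding, null value, and compression sit under
    # datasetSettings.typeProperties.
    type_properties = {"encodingName": encoding_name, "nullValue": null_value}
    if compression_type:
        compression = {"type": compression_type}
        if compression_level:
            compression["level"] = compression_level
        type_properties["compression"] = compression

    # Assumption: parsing options sit under formatSettings (XmlReadSettings).
    format_settings = {
        "type": "XmlReadSettings",
        "namespaces": namespaces,
        "detectDataType": detect_data_type,
    }
    if validation_mode:
        format_settings["validationMode"] = validation_mode
    if namespace_prefixes:
        # Maps namespace URI -> prefix.
        format_settings["namespacePrefixes"] = dict(namespace_prefixes)
    if compression_type in _READ_SETTINGS:
        settings_type, flag = _READ_SETTINGS[compression_type]
        format_settings["compressionProperties"] = {
            "type": settings_type,
            flag: preserve_name_as_folder,
        }

    return {
        "type": "XmlSource",  # assumed source type name
        "formatSettings": format_settings,
        "datasetSettings": {"type": "Xml", "typeProperties": type_properties},
    }


example = build_xml_source(
    compression_type="ZipDeflate",
    compression_level="Fastest",
    validation_mode="xsd",
    namespace_prefixes={"http://example.com/books": "bk"},  # made-up URI
)
print(json.dumps(example, indent=2))
```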

Related content

Connectors overview
