IndexingParametersConfiguration Class

A dictionary of indexer-specific configuration properties. Each name is the name of a specific property. Each value must be of a primitive type.

Inheritance: azure.search.documents.indexes._generated._serialization.Model

IndexingParametersConfiguration

Constructor

IndexingParametersConfiguration(*, additional_properties: Dict[str, Any] | None = None, parsing_mode: str | _models.BlobIndexerParsingMode = 'default', excluded_file_name_extensions: str = '', indexed_file_name_extensions: str = '', fail_on_unsupported_content_type: bool = False, fail_on_unprocessable_document: bool = False, index_storage_metadata_only_for_oversized_documents: bool = False, delimited_text_headers: str | None = None, delimited_text_delimiter: str | None = None, first_line_contains_headers: bool = True, document_root: str | None = None, data_to_extract: str | _models.BlobIndexerDataToExtract = 'contentAndMetadata', image_action: str | _models.BlobIndexerImageAction = 'none', allow_skillset_to_read_file_data: bool = False, pdf_text_rotation_algorithm: str | _models.BlobIndexerPDFTextRotationAlgorithm = 'none', execution_environment: str | _models.IndexerExecutionEnvironment = 'standard', query_timeout: str = '00:05:00', **kwargs: Any)
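As a rough illustration of how the keyword names above relate to the camelCase property names quoted in the descriptions (for example "imageAction"), the sketch below builds a plain dict. It is a stand-in, not the SDK class: `to_rest_key` and `build_configuration` are hypothetical helpers, and the snake_case to camelCase rule is an assumption made for the example.

```python
import json


def to_rest_key(name: str) -> str:
    """Convert a snake_case keyword name to camelCase (hypothetical helper)."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def build_configuration(**options):
    """Build a dict keyed by camelCase names, skipping options left as None."""
    return {to_rest_key(key): value for key, value in options.items() if value is not None}


# Example: settings for pipe-delimited CSV blobs.
config = build_configuration(
    parsing_mode="delimitedText",
    first_line_contains_headers=True,
    delimited_text_delimiter="|",
    fail_on_unprocessable_document=False,
    query_timeout=None,  # left unset, so it is omitted from the result
)

print(json.dumps(config, indent=2))
```

With the real class, the same keyword names would be passed to the constructor directly, and every value must be a primitive type as stated at the top of this page.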

Keyword-Only Parameters

Name	Description
additional_properties	dict[str, any] Unmatched properties from the message are deserialized to this collection.
parsing_mode	str or BlobIndexerParsingMode Represents the parsing mode for indexing from an Azure blob data source. Known values are: "default", "text", "delimitedText", "json", "jsonArray", and "jsonLines". Default value: default
excluded_file_name_extensions	str Comma-delimited list of filename extensions to ignore when processing from Azure blob storage. For example, you could exclude ".png, .mp4" to skip over those files during indexing.
indexed_file_name_extensions	str Comma-delimited list of filename extensions to select when processing from Azure blob storage. For example, you could focus indexing on specific application files ".docx, .pptx, .msg" to specifically include those file types.
fail_on_unsupported_content_type	bool For Azure blobs, set to false if you want to continue indexing when an unsupported content type is encountered, and you don't know all the content types (file extensions) in advance.
fail_on_unprocessable_document	bool For Azure blobs, set to false if you want to continue indexing if a document fails indexing.
index_storage_metadata_only_for_oversized_documents	bool For Azure blobs, set this property to true to still index storage metadata for blob content that is too large to process. Oversized blobs are treated as errors by default. For limits on blob size, see https://video2.skills-academy.com/azure/search/search-limits-quotas-capacity.
delimited_text_headers	str For CSV blobs, specifies a comma-delimited list of column headers, useful for mapping source fields to destination fields in an index.
delimited_text_delimiter	str For CSV blobs, specifies the end-of-line single-character delimiter for CSV files where each line starts a new document (for example, "|").
first_line_contains_headers	bool For CSV blobs, indicates that the first (non-blank) line of each blob contains headers. Default value: True
document_root	str For JSON arrays, given a structured or semi-structured document, you can specify a path to the array using this property.
data_to_extract	str or BlobIndexerDataToExtract Specifies the data to extract from Azure blob storage and tells the indexer which data to extract from image content when "imageAction" is set to a value other than "none". This applies to embedded image content in a .PDF or other application, or image files such as .jpg and .png, in Azure blobs. Known values are: "storageMetadata", "allMetadata", and "contentAndMetadata". Default value: contentAndMetadata
image_action	str or BlobIndexerImageAction Determines how to process embedded images and image files in Azure blob storage. Setting the "imageAction" configuration to any value other than "none" requires that a skillset also be attached to that indexer. Known values are: "none", "generateNormalizedImages", and "generateNormalizedImagePerPage". Default value: none
allow_skillset_to_read_file_data	bool If true, creates a path /document/file_data that is an object representing the original file data downloaded from your blob data source. This allows you to pass the original file data to a custom skill for processing within the enrichment pipeline, or to the Document Extraction skill.
pdf_text_rotation_algorithm	str or BlobIndexerPDFTextRotationAlgorithm Determines algorithm for text extraction from PDF files in Azure blob storage. Known values are: "none" and "detectAngles". Default value: none
execution_environment	str or IndexerExecutionEnvironment Specifies the environment in which the indexer should execute. Known values are: "standard" and "private". Default value: standard
query_timeout	str Increases the timeout beyond the 5-minute default for Azure SQL database data sources, specified in the format "hh:mm:ss". Default value: 00:05:00
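The three CSV-related settings in the table above (delimited_text_headers, delimited_text_delimiter, first_line_contains_headers) interact with each other. The sketch below shows one plausible reading of that interaction using only the standard library. It is not the service's parser, and `parse_delimited_text` is a hypothetical name.

```python
import csv


def parse_delimited_text(blob_text, delimiter=",", first_line_contains_headers=True, headers=None):
    """Turn delimited text into one dict per line (illustrative only)."""
    # Skip blank lines so the first non-blank line can supply the headers.
    lines = [line for line in blob_text.splitlines() if line.strip()]
    rows = list(csv.reader(lines, delimiter=delimiter))
    if not rows:
        return []
    if first_line_contains_headers:
        names, rows = rows[0], rows[1:]
    elif headers:
        # Mirrors delimited_text_headers: a comma-delimited list of column names.
        names = [name.strip() for name in headers.split(",")]
    else:
        raise ValueError("headers are required when the first line holds data")
    return [dict(zip(names, row)) for row in rows]


blob = "id|title\n1|First\n2|Second\n"
documents = parse_delimited_text(blob, delimiter="|")
print(documents)
```

Each non-header line becomes its own document, which matches the "each line starts a new document" wording in the delimited_text_delimiter description.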

Variables

Name	Description
additional_properties	dict[str, any] Unmatched properties from the message are deserialized to this collection.
parsing_mode	str or BlobIndexerParsingMode Represents the parsing mode for indexing from an Azure blob data source. Known values are: "default", "text", "delimitedText", "json", "jsonArray", and "jsonLines".
excluded_file_name_extensions	str Comma-delimited list of filename extensions to ignore when processing from Azure blob storage. For example, you could exclude ".png, .mp4" to skip over those files during indexing.
indexed_file_name_extensions	str Comma-delimited list of filename extensions to select when processing from Azure blob storage. For example, you could focus indexing on specific application files ".docx, .pptx, .msg" to specifically include those file types.
fail_on_unsupported_content_type	bool For Azure blobs, set to false if you want to continue indexing when an unsupported content type is encountered, and you don't know all the content types (file extensions) in advance.
fail_on_unprocessable_document	bool For Azure blobs, set to false if you want to continue indexing if a document fails indexing.
index_storage_metadata_only_for_oversized_documents	bool For Azure blobs, set this property to true to still index storage metadata for blob content that is too large to process. Oversized blobs are treated as errors by default. For limits on blob size, see https://video2.skills-academy.com/azure/search/search-limits-quotas-capacity.
delimited_text_headers	str For CSV blobs, specifies a comma-delimited list of column headers, useful for mapping source fields to destination fields in an index.
delimited_text_delimiter	str For CSV blobs, specifies the end-of-line single-character delimiter for CSV files where each line starts a new document (for example, "|").
first_line_contains_headers	bool For CSV blobs, indicates that the first (non-blank) line of each blob contains headers.
document_root	str For JSON arrays, given a structured or semi-structured document, you can specify a path to the array using this property.
data_to_extract	str or BlobIndexerDataToExtract Specifies the data to extract from Azure blob storage and tells the indexer which data to extract from image content when "imageAction" is set to a value other than "none". This applies to embedded image content in a .PDF or other application, or image files such as .jpg and .png, in Azure blobs. Known values are: "storageMetadata", "allMetadata", and "contentAndMetadata".
image_action	str or BlobIndexerImageAction Determines how to process embedded images and image files in Azure blob storage. Setting the "imageAction" configuration to any value other than "none" requires that a skillset also be attached to that indexer. Known values are: "none", "generateNormalizedImages", and "generateNormalizedImagePerPage".
allow_skillset_to_read_file_data	bool If true, creates a path /document/file_data that is an object representing the original file data downloaded from your blob data source. This allows you to pass the original file data to a custom skill for processing within the enrichment pipeline, or to the Document Extraction skill.
pdf_text_rotation_algorithm	str or BlobIndexerPDFTextRotationAlgorithm Determines algorithm for text extraction from PDF files in Azure blob storage. Known values are: "none" and "detectAngles".
execution_environment	str or IndexerExecutionEnvironment Specifies the environment in which the indexer should execute. Known values are: "standard" and "private".
query_timeout	str Increases the timeout beyond the 5-minute default for Azure SQL database data sources, specified in the format "hh:mm:ss".
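Because query_timeout is a string in "hh:mm:ss" form, it can be convenient to derive it from a duration rather than typing it by hand. The helper below is a minimal sketch using the standard library. `format_query_timeout` is a hypothetical name and is not part of the SDK.

```python
from datetime import timedelta


def format_query_timeout(timeout: timedelta) -> str:
    """Format a positive duration as "hh:mm:ss" (illustrative helper)."""
    total_seconds = int(timeout.total_seconds())
    if total_seconds <= 0:
        raise ValueError("timeout must be positive")
    hours, remainder = divmod(total_seconds, 3600)
    minutes, seconds = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d}"


print(format_query_timeout(timedelta(minutes=10)))
```

The resulting string could then be passed as the query_timeout keyword argument.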

Methods

as_dict	Return a dict that can be serialized using json.dump. An optional key transformer callback controls how keys are written. If you want XML serialization, pass the keyword argument is_xml=True.
deserialize	Parse a str using the RestAPI syntax and return a model.
enable_additional_properties_sending
from_dict	Parse a dict using the given key extractors and return a model. By default, the key extractors rest_key_case_insensitive_extractor, attribute_key_case_insensitive_extractor, and last_rest_key_case_insensitive_extractor are considered.
is_xml_model
serialize	Return the JSON that would be sent to the server from this model. This is an alias to as_dict(full_restapi_key_transformer, keep_readonly=False). If you want XML serialization, pass the keyword argument is_xml=True.

as_dict

Return a dict that can be serialized using json.dump.

Advanced usage can pass a callback as the key_transformer parameter. The callback receives three arguments:

Key is the attribute name used in Python.
Attr_desc is a dict of metadata. It currently contains 'type' with the msrest type and 'key' with the RestAPI encoded key.
Value is the current value in this object.

The string returned by the callback is used to serialize the key. If the callback returns a list, the result is treated as a hierarchical dict.

See the three examples in this file:

attribute_transformer
full_restapi_key_transformer
last_restapi_key_transformer

If you want XML serialization, pass the keyword argument is_xml=True.

as_dict(keep_readonly: bool = True, key_transformer: Callable[[str, Dict[str, Any], Any], Any] = attribute_transformer, **kwargs: Any) -> MutableMapping[str, Any]

Parameters

Name	Description
key_transformer	function A key transformer function. Default value: attribute_transformer
keep_readonly	bool Whether to include read-only attributes in the returned dict. Default value: True

Returns

Type	Description
dict	A dict JSON compatible object
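The key transformer callback described above can be illustrated with a simplified stand-in. This is not the SDK's code: `ATTRIBUTE_MAP`, `to_dict_sketch`, and the three transformers are hypothetical, and the sketch assumes each transformer returns a (key, value) pair where the key is either a string or a list of path segments.

```python
from typing import Any, Callable, Dict, List, Tuple, Union

# Hypothetical, trimmed attribute map: Python name -> metadata with the REST key and type.
ATTRIBUTE_MAP = {
    "parsing_mode": {"key": "parsingMode", "type": "str"},
    "query_timeout": {"key": "queryTimeout", "type": "str"},
}

Key = Union[str, List[str]]
Transformer = Callable[[str, Dict[str, str], Any], Tuple[Key, Any]]


def python_name_transformer(key, attr_desc, value):
    """Keep the Python attribute name as the output key."""
    return key, value


def rest_name_transformer(key, attr_desc, value):
    """Use the REST-encoded key from the attribute metadata."""
    return attr_desc["key"], value


def nested_transformer(key, attr_desc, value):
    """Return a list of path segments to request a hierarchical result."""
    return ["configuration", attr_desc["key"]], value


def to_dict_sketch(values: Dict[str, Any], key_transformer: Transformer = python_name_transformer):
    result: Dict[str, Any] = {}
    for name, attr_desc in ATTRIBUTE_MAP.items():
        new_key, new_value = key_transformer(name, attr_desc, values.get(name))
        if isinstance(new_key, list):
            # A list key is walked segment by segment to build nested dicts.
            target = result
            for part in new_key[:-1]:
                target = target.setdefault(part, {})
            target[new_key[-1]] = new_value
        else:
            result[new_key] = new_value
    return result


values = {"parsing_mode": "json", "query_timeout": "00:10:00"}
print(to_dict_sketch(values))
print(to_dict_sketch(values, rest_name_transformer))
print(to_dict_sketch(values, nested_transformer))
```

The three calls differ only in the callback, which is the point of the design: the traversal stays the same while the callback decides how each key is written.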

deserialize

Parse a str using the RestAPI syntax and return a model.

deserialize(data: Any, content_type: str | None = None) -> ModelType

Parameters

Name	Description
data Required	str A str using RestAPI structure. JSON by default.
content_type	str JSON by default, set application/xml if XML. Default value: None

Returns

Type	Description
ModelType	An instance of this model

Exceptions

Type	Description
DeserializationError	If something went wrong
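To make the deserialize behavior more concrete, here is a minimal standard-library sketch of parsing a JSON string keyed by REST names, with unmatched properties collected the way additional_properties is described earlier on this page. `REST_TO_PYTHON`, `SketchDeserializationError`, and `deserialize_sketch` are hypothetical, and the function returns a plain dict rather than a model instance.

```python
import json

# Hypothetical, trimmed mapping from REST keys to Python attribute names.
REST_TO_PYTHON = {
    "parsingMode": "parsing_mode",
    "queryTimeout": "query_timeout",
    "imageAction": "image_action",
}


class SketchDeserializationError(ValueError):
    """Stand-in for the DeserializationError named in the table above."""


def deserialize_sketch(data: str) -> dict:
    try:
        payload = json.loads(data)
    except json.JSONDecodeError as exc:
        raise SketchDeserializationError(f"invalid JSON: {exc}") from exc
    if not isinstance(payload, dict):
        raise SketchDeserializationError("expected a JSON object")
    # Known REST keys are renamed; everything else is kept as additional properties.
    result = {REST_TO_PYTHON[k]: v for k, v in payload.items() if k in REST_TO_PYTHON}
    result["additional_properties"] = {k: v for k, v in payload.items() if k not in REST_TO_PYTHON}
    return result


parsed = deserialize_sketch('{"parsingMode": "json", "customFlag": true}')
print(parsed)
```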

enable_additional_properties_sending

enable_additional_properties_sending() -> None

from_dict

Parse a dict using the given key extractors and return a model.

By default, the following key extractors are considered: rest_key_case_insensitive_extractor, attribute_key_case_insensitive_extractor, and last_rest_key_case_insensitive_extractor.

from_dict(data: Any, key_extractors: Callable[[str, Dict[str, Any], Any], Any] | None = None, content_type: str | None = None) -> ModelType

Parameters

Name	Description
data Required	dict A dict using RestAPI structure
content_type	str JSON by default, set application/xml if XML. Default value: None
key_extractors	Default value: None

Returns

Type	Description
ModelType	An instance of this model

Exceptions

Type	Description
DeserializationError	If something went wrong
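The case-insensitive key extractors mentioned for from_dict can be pictured with the simplified sketch below. It is not the SDK's implementation: the attribute map, both extractor functions, and `from_dict_sketch` are hypothetical, and the sketch assumes key_extractors is a list of callables tried in order until one returns a value.

```python
from typing import Any, Callable, Dict, List, Optional

# Hypothetical, trimmed attribute map: Python name -> metadata with the REST key.
ATTRIBUTE_MAP = {
    "parsing_mode": {"key": "parsingMode"},
    "query_timeout": {"key": "queryTimeout"},
}

Extractor = Callable[[str, Dict[str, str], Dict[str, Any]], Any]


def rest_key_case_insensitive(attr, attr_desc, data):
    """Match the REST key, ignoring case."""
    wanted = attr_desc["key"].lower()
    for key, value in data.items():
        if key.lower() == wanted:
            return value
    return None


def attribute_key_case_insensitive(attr, attr_desc, data):
    """Match the Python attribute name, ignoring case."""
    wanted = attr.lower()
    for key, value in data.items():
        if key.lower() == wanted:
            return value
    return None


def from_dict_sketch(data: Dict[str, Any], key_extractors: Optional[List[Extractor]] = None):
    extractors = key_extractors or [rest_key_case_insensitive, attribute_key_case_insensitive]
    result = {}
    for attr, attr_desc in ATTRIBUTE_MAP.items():
        # The first extractor that finds a value wins.
        for extractor in extractors:
            value = extractor(attr, attr_desc, data)
            if value is not None:
                result[attr] = value
                break
    return result


model_values = from_dict_sketch({"PARSINGMODE": "json", "Query_Timeout": "00:10:00"})
print(model_values)
```

Trying several extractors in order lets the same input accept either REST-style or Python-style keys, at the cost of a linear scan per attribute.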

is_xml_model

is_xml_model() -> bool

serialize

Return the JSON that would be sent to server from this model.

This is an alias to as_dict(full_restapi_key_transformer, keep_readonly=False).

If you want XML serialization, you can pass the kwargs is_xml=True.

serialize(keep_readonly: bool = False, **kwargs: Any) -> MutableMapping[str, Any]

Parameters

Name	Description
keep_readonly	bool Whether to serialize the read-only attributes. Default value: False

Returns

Type	Description
dict	A dict JSON compatible object
