MicrosoftLanguageStemmingTokenizer Class

Divides text using language-specific rules and reduces words to their base forms.

All required parameters must be populated before the model is sent to the server.

Inheritance: azure.search.documents.indexes._generated.models._models_py3.LexicalTokenizer -> MicrosoftLanguageStemmingTokenizer

Constructor

MicrosoftLanguageStemmingTokenizer(*, name: str, max_token_length: int = 255, is_search_tokenizer: bool = False, language: str | _models.MicrosoftStemmingTokenizerLanguage | None = None, **kwargs: Any)

Keyword-Only Parameters

Name	Description
name	str The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. Required.
max_token_length	int The maximum token length. Tokens longer than the maximum length are split. The largest value that can be used is 300 characters. Tokens longer than 300 characters are first split into tokens of length 300, and then each of those tokens is split based on the maximum token length set. Default value: 255
is_search_tokenizer	bool A value indicating how the tokenizer is used. Set to True if used as the search tokenizer, or False if used as the indexing tokenizer. Default value: False
language	str or MicrosoftStemmingTokenizerLanguage The language to use. The default is English. Known values are: "arabic", "bangla", "bulgarian", "catalan", "croatian", "czech", "danish", "dutch", "english", "estonian", "finnish", "french", "german", "greek", "gujarati", "hebrew", "hindi", "hungarian", "icelandic", "indonesian", "italian", "kannada", "latvian", "lithuanian", "malay", "malayalam", "marathi", "norwegianBokmaal", "polish", "portuguese", "portugueseBrazilian", "punjabi", "romanian", "russian", "serbianCyrillic", "serbianLatin", "slovak", "slovenian", "spanish", "swedish", "tamil", "telugu", "turkish", "ukrainian", and "urdu".
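
The naming rule for the name parameter can be checked on the client before a request is made. The following is a minimal sketch, not part of the SDK: the helper is_valid_tokenizer_name is hypothetical, and it assumes that "letters" and "digits" mean ASCII characters, which the service may interpret more broadly.

```python
import re

# Assumption: only ASCII letters and digits are accepted.
# First and last characters must be alphanumeric; total length is 1 to 128.
_NAME_PATTERN = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9 _-]{0,126}[A-Za-z0-9])?")


def is_valid_tokenizer_name(name: str) -> bool:
    """Hypothetical client-side check mirroring the documented naming rule."""
    return _NAME_PATTERN.fullmatch(name) is not None


if __name__ == "__main__":
    for candidate in ("my-stemmer_1", "-bad-start", "bad end "):
        print(repr(candidate), is_valid_tokenizer_name(candidate))
```

The service remains the authority on what it accepts; a check like this only catches obvious mistakes early.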

Variables

Name	Description
odata_type	str A URI fragment specifying the type of tokenizer. Required.
name	str The name of the tokenizer. It must only contain letters, digits, spaces, dashes or underscores, can only start and end with alphanumeric characters, and is limited to 128 characters. Required.
max_token_length	int The maximum token length. Tokens longer than the maximum length are split. The largest value that can be used is 300 characters. Tokens longer than 300 characters are first split into tokens of length 300, and then each of those tokens is split based on the maximum token length set. Default is 255.
is_search_tokenizer	bool A value indicating how the tokenizer is used. Set to True if used as the search tokenizer, or False if used as the indexing tokenizer. Default is False.
language	str or MicrosoftStemmingTokenizerLanguage The language to use. The default is English. Known values are: "arabic", "bangla", "bulgarian", "catalan", "croatian", "czech", "danish", "dutch", "english", "estonian", "finnish", "french", "german", "greek", "gujarati", "hebrew", "hindi", "hungarian", "icelandic", "indonesian", "italian", "kannada", "latvian", "lithuanian", "malay", "malayalam", "marathi", "norwegianBokmaal", "polish", "portuguese", "portugueseBrazilian", "punjabi", "romanian", "russian", "serbianCyrillic", "serbianLatin", "slovak", "slovenian", "spanish", "swedish", "tamil", "telugu", "turkish", "ukrainian", and "urdu".
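
The two-stage splitting described for max_token_length can be illustrated in plain Python. This sketch is a literal reading of the description above, not the service's implementation: the function split_long_token and the constant HARD_LIMIT are hypothetical names, and the real tokenizer also applies language-specific rules that are not modeled here.

```python
from typing import List

HARD_LIMIT = 300  # documented upper bound for max_token_length


def split_long_token(token: str, max_token_length: int = 255) -> List[str]:
    """Split one token the way the parameter description reads (assumed behavior)."""
    if not 1 <= max_token_length <= HARD_LIMIT:
        raise ValueError("max_token_length must be between 1 and 300")
    # Stage 1: cut the token into pieces of at most 300 characters.
    coarse = [token[i:i + HARD_LIMIT] for i in range(0, len(token), HARD_LIMIT)]
    # Stage 2: cut each piece again using the configured maximum length.
    pieces: List[str] = []
    for chunk in coarse:
        pieces.extend(
            chunk[i:i + max_token_length]
            for i in range(0, len(chunk), max_token_length)
        )
    return pieces


if __name__ == "__main__":
    lengths = [len(p) for p in split_long_token("x" * 700)]
    print(lengths)
```

Under this reading, a 700-character token with the default setting is first cut into pieces of 300, 300, and 100 characters, and the two 300-character pieces are then each cut into 255 and 45.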

Methods

Name	Description
as_dict	Return a dict that can be serialized using json.dump.
deserialize	Parse a str using the RestAPI syntax and return a model.
enable_additional_properties_sending
from_dict	Parse a dict using the given key extractors and return a model.
is_xml_model
serialize	Return the JSON that would be sent to the server from this model. This is an alias to as_dict(full_restapi_key_transformer, keep_readonly=False).

as_dict

Return a dict that can be serialized using json.dump.

Advanced usage can pass a callback as the key_transformer parameter. The callback receives three arguments:

key: the attribute name used in Python.
attr_desc: a dict of metadata. It currently contains 'type' with the msrest type and 'key' with the RestAPI encoded key.
value: the current value of the attribute in this object.

The string returned is used to serialize the key. If the returned key is a list, the result is treated as a hierarchical dict.

Three example transformers are available:

attribute_transformer
full_restapi_key_transformer
last_restapi_key_transformer

For XML serialization, pass the keyword argument is_xml=True.

as_dict(keep_readonly: bool = True, key_transformer: Callable[[str, Dict[str, Any], Any], Any] = attribute_transformer, **kwargs: Any) -> MutableMapping[str, Any]

Parameters

Name	Description
key_transformer	function A key transformer function. Default value: attribute_transformer
keep_readonly	bool Whether to serialize the readonly attributes. Default value: True

Returns

Type	Description
dict	A dict JSON compatible object
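
A custom key transformer follows the three-argument shape described for as_dict. The sketch below is illustrative only: rest_key_transformer is a hypothetical function, the metadata dict is written by hand rather than taken from a real model, and returning a (key, value) pair is an assumption based on how msrest-style built-in transformers are commonly written.

```python
from typing import Any, Dict, Tuple


def rest_key_transformer(
    key: str, attr_desc: Dict[str, Any], value: Any
) -> Tuple[str, Any]:
    """Hypothetical callback: emit the REST-encoded key instead of the Python name."""
    # attr_desc["key"] holds the RestAPI encoded key, per the description above.
    return attr_desc["key"], value


if __name__ == "__main__":
    # Hand-written metadata for illustration; real metadata comes from the model.
    sample_desc = {"key": "maxTokenLength", "type": "int"}
    print(rest_key_transformer("max_token_length", sample_desc, 255))
```

If the return shape assumed here matches the installed SDK version, such a function could be passed as key_transformer to as_dict; check the built-in transformers before relying on it.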

deserialize

Parse a str using the RestAPI syntax and return a model.

deserialize(data: Any, content_type: str | None = None) -> ModelType

Parameters

Name	Description
data Required	str A str using RestAPI structure. JSON by default.
content_type	str JSON by default, set application/xml if XML. Default value: None

Returns

Type	Description
ModelType	An instance of this model

Exceptions

Type	Description
DeserializationError	If something went wrong

enable_additional_properties_sending

enable_additional_properties_sending() -> None

from_dict

Parse a dict using the given key extractors and return a model.

By default, the following key extractors are considered: rest_key_case_insensitive_extractor, attribute_key_case_insensitive_extractor, and last_rest_key_case_insensitive_extractor.

from_dict(data: Any, key_extractors: Callable[[str, Dict[str, Any], Any], Any] | None = None, content_type: str | None = None) -> ModelType

Parameters

Name	Description
data Required	dict A dict using RestAPI structure
content_type	str JSON by default, set application/xml if XML. Default value: None
key_extractors	Default value: None

Returns

Type	Description
ModelType	An instance of this model

Exceptions

Type	Description
DeserializationError	If something went wrong
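
The idea behind case-insensitive key extraction can be sketched without the SDK. The helpers below are hypothetical stand-ins, not the library's extractors: they assume a value is looked up first by its REST key and then by its Python attribute name, ignoring case in both lookups.

```python
from typing import Any, Dict, Optional


def find_case_insensitive(data: Dict[str, Any], wanted: str) -> Optional[Any]:
    """Return the value whose key matches `wanted`, ignoring case, else None."""
    lowered = wanted.lower()
    for key, value in data.items():
        if key.lower() == lowered:
            return value
    return None


def extract(data: Dict[str, Any], rest_key: str, attr_name: str) -> Optional[Any]:
    """Try the REST key first, then the Python attribute name (assumed order)."""
    found = find_case_insensitive(data, rest_key)
    if found is None:
        found = find_case_insensitive(data, attr_name)
    return found


if __name__ == "__main__":
    print(extract({"MAXTOKENLENGTH": 100}, "maxTokenLength", "max_token_length"))
    print(extract({"Max_Token_Length": 50}, "maxTokenLength", "max_token_length"))
```

This is why from_dict can accept dicts keyed by either naming style, as long as one of the configured extractors recognizes the key.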

is_xml_model

is_xml_model() -> bool

serialize

Return the JSON that would be sent to the server from this model.

This is an alias to as_dict(full_restapi_key_transformer, keep_readonly=False).

For XML serialization, pass the keyword argument is_xml=True.

serialize(keep_readonly: bool = False, **kwargs: Any) -> MutableMapping[str, Any]

Parameters

Name	Description
keep_readonly	bool Whether to serialize the readonly attributes. Default value: False

Returns

Type	Description
dict	A dict JSON compatible object
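
To give a sense of what a serialized tokenizer looks like, the sketch below builds a comparable dict by hand. It does not call the SDK: sketch_rest_payload is a hypothetical helper, the camelCase key names and the @odata.type string follow Azure AI Search REST API naming and should be treated as assumptions here, and omitting language when it is None is also an assumption.

```python
import json
from typing import Any, Dict, Optional


def sketch_rest_payload(
    name: str,
    max_token_length: int = 255,
    is_search_tokenizer: bool = False,
    language: Optional[str] = None,
) -> Dict[str, Any]:
    """Hand-built approximation of the JSON-compatible dict for this tokenizer."""
    payload: Dict[str, Any] = {
        # Assumed type discriminator for this tokenizer.
        "@odata.type": "#Microsoft.Azure.Search.MicrosoftLanguageStemmingTokenizer",
        "name": name,
        "maxTokenLength": max_token_length,
        "isSearchTokenizer": is_search_tokenizer,
    }
    if language is not None:
        payload["language"] = language  # assumption: unset values are left out
    return payload


if __name__ == "__main__":
    print(json.dumps(sketch_rest_payload("my-stemmer", language="french"), indent=2))
```

Comparing such a hand-built dict with the actual output of serialize() is a quick way to confirm the key names used by the installed SDK version.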
