TextCatalog Class

Definition

Namespace:: Microsoft.ML

Assembly:: Microsoft.ML.Transforms.dll

Package:: Microsoft.ML v1.0.0, v1.1.0, v1.2.0, v1.3.1, v1.4.0, v1.5.5, v1.6.0, v1.7.0, v2.0.0, v3.0.1

Important

Some information relates to prerelease product that may be substantially modified before it’s released. Microsoft makes no warranties, express or implied, with respect to the information provided here.

Collection of extension methods for the TransformsCatalog.

public static class TextCatalog

type TextCatalog = class

Public Module TextCatalog

Inheritance: Object → TextCatalog
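
As a rough illustration of how these extension methods are reached, the C# sketch below assumes the Microsoft.ML NuGet package is referenced and calls NormalizeText through the `Transforms.Text` property of an `MLContext`. The column names "Text" and "NormalizedText" are placeholders chosen for this example, not names defined by the library.

```csharp
using System;
using Microsoft.ML;

public static class Program
{
    public static void Main()
    {
        // MLContext is the entry point for ML.NET operations.
        var mlContext = new MLContext(seed: 0);

        // TextCatalog methods extend TransformsCatalog.TextTransforms,
        // which MLContext exposes as Transforms.Text.
        // "Text" and "NormalizedText" are placeholder column names.
        var estimator = mlContext.Transforms.Text.NormalizeText(
            outputColumnName: "NormalizedText",
            inputColumnName: "Text");

        // The call only builds an estimator; nothing is computed until Fit is called.
        Console.WriteLine(estimator.GetType().FullName);
    }
}
```

Because the methods are extensions, no instance of TextCatalog is ever created; the `using Microsoft.ML;` directive is what brings them into scope.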

Methods

ApplyWordEmbedding(TransformsCatalog+TextTransforms, String, String, String)	Create a WordEmbeddingEstimator, which is a text featurizer that converts vectors of text into numerical vectors using pre-trained embedding models.
ApplyWordEmbedding(TransformsCatalog+TextTransforms, String, String, WordEmbeddingEstimator+PretrainedModelKind)	Create a WordEmbeddingEstimator, which is a text featurizer that converts vectors of text into numerical vectors using pre-trained embedding models.
FeaturizeText(TransformsCatalog+TextTransforms, String, String)	Create a TextFeaturizingEstimator, which transforms a text column into a featurized vector of Single that represents normalized counts of n-grams and char-grams.
FeaturizeText(TransformsCatalog+TextTransforms, String, TextFeaturizingEstimator+Options, String[])	Create a TextFeaturizingEstimator, which transforms a text column into a featurized vector of Single that represents normalized counts of n-grams and char-grams.
LatentDirichletAllocation(TransformsCatalog+TextTransforms, String, String, Int32, Single, Single, Int32, Int32, Int32, Int32, Int32, Int32, Int32, Boolean)	Create a LatentDirichletAllocationEstimator, which uses LightLDA to transform text (represented as a vector of floats) into a vector of Single indicating the similarity of the text with each topic identified.
NormalizeText(TransformsCatalog+TextTransforms, String, String, TextNormalizingEstimator+CaseMode, Boolean, Boolean, Boolean)	Create a TextNormalizingEstimator, which normalizes incoming text in `inputColumnName` by optionally changing case and removing diacritical marks, punctuation marks, and numbers, and outputs the new text as `outputColumnName`.
ProduceHashedNgrams(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32, Boolean)	Create an NgramHashingEstimator, which copies the data from the column specified in `inputColumnName` to a new column: `outputColumnName` and produces a vector of counts of hashed n-grams.
ProduceHashedNgrams(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32, Boolean)	Create an NgramHashingEstimator, which takes the data from the multiple columns specified in `inputColumnNames` and produces a vector of counts of hashed n-grams in a new column: `outputColumnName`.
ProduceHashedWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32)	Create a WordHashBagEstimator, which maps the column specified in `inputColumnName` to a vector of counts of hashed n-grams in a new column named `outputColumnName`.
ProduceHashedWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Int32, Boolean, UInt32, Boolean, Int32)	Create a WordHashBagEstimator, which maps the multiple columns specified in `inputColumnNames` to a vector of counts of hashed n-grams in a new column named `outputColumnName`.
ProduceNgrams(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)	Create an NgramExtractingEstimator, which produces a vector of counts of n-grams (sequences of consecutive words) encountered in the input text.
ProduceWordBags(TransformsCatalog+TextTransforms, String, Char, Char, String, Int32)	Create a WordBagEstimator, which maps the column specified in `inputColumnName` to a vector of n-gram counts in a new column named `outputColumnName`.
ProduceWordBags(TransformsCatalog+TextTransforms, String, String, Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)	Create a WordBagEstimator, which maps the column specified in `inputColumnName` to a vector of n-gram counts in a new column named `outputColumnName`.
ProduceWordBags(TransformsCatalog+TextTransforms, String, String[], Int32, Int32, Boolean, Int32, NgramExtractingEstimator+WeightingCriteria)	Create a WordBagEstimator, which maps the multiple columns specified in `inputColumnNames` to a vector of n-gram counts in a new column named `outputColumnName`.
RemoveDefaultStopWords(TransformsCatalog+TextTransforms, String, String, StopWordsRemovingEstimator+Language)	Create a StopWordsRemovingEstimator, which copies the data from the column specified in `inputColumnName` to a new column: `outputColumnName` and removes a predefined set of text specific to `language` from it.
RemoveStopWords(TransformsCatalog+TextTransforms, String, String, String[])	Create a CustomStopWordsRemovingEstimator, which copies the data from the column specified in `inputColumnName` to a new column: `outputColumnName` and removes text specified in `stopwords` from it.
TokenizeIntoCharactersAsKeys(TransformsCatalog+TextTransforms, String, String, Boolean)	Create a TokenizingByCharactersEstimator, which tokenizes by splitting text into sequences of characters using a sliding window.
TokenizeIntoWords(TransformsCatalog+TextTransforms, String, String, Char[])	Create a WordTokenizingEstimator, which tokenizes input text using `separators` as separators.
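
The methods listed above are typically chained into a pipeline. The following C# sketch, which assumes the Microsoft.ML NuGet package is referenced, combines NormalizeText, TokenizeIntoWords, RemoveDefaultStopWords, and FeaturizeText with their default options. The `Review` and `ProcessedReview` classes, the sample sentences, and all column names are hypothetical and exist only for this illustration.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;

// Hypothetical input row type; the property name becomes the column name.
public class Review
{
    public string Text { get; set; }
}

// Hypothetical output row type used to read transformed columns back.
public class ProcessedReview
{
    public string[] Tokens { get; set; }
    public float[] Features { get; set; }
}

public static class Program
{
    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);

        // Invented sample data, loaded into an IDataView.
        var reviews = new List<Review>
        {
            new Review { Text = "The new release is a clear improvement over the old one." },
            new Review { Text = "Installation failed twice, and the docs were not helpful." }
        };
        IDataView data = mlContext.Data.LoadFromEnumerable(reviews);

        // Each call passes the output column name first, then the input column name.
        var pipeline = mlContext.Transforms.Text
            .NormalizeText("Normalized", "Text")
            .Append(mlContext.Transforms.Text.TokenizeIntoWords("Words", "Normalized"))
            .Append(mlContext.Transforms.Text.RemoveDefaultStopWords("Tokens", "Words"))
            .Append(mlContext.Transforms.Text.FeaturizeText("Features", "Text"));

        // Fit learns any required state (for example, the n-gram dictionary);
        // Transform applies the fitted chain to the data.
        ITransformer model = pipeline.Fit(data);
        IDataView transformed = model.Transform(data);

        foreach (var row in mlContext.Data.CreateEnumerable<ProcessedReview>(
                     transformed, reuseRowObject: false))
        {
            Console.WriteLine(string.Join(" | ", row.Tokens));
            Console.WriteLine($"Feature vector length: {row.Features.Length}");
        }
    }
}
```

FeaturizeText is applied to the raw "Text" column here because it performs its own normalization and tokenization internally; the explicit normalize, tokenize, and stop-word steps are shown separately only to make the intermediate columns visible.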
