minCount: Feature Selection Count Mode

Article
07/04/2024

Count mode of feature selection, used in the feature selection transform selectFeatures.

Usage

  minCount(count = 1, ...)

Arguments

`count`

The threshold for count-based feature selection. A feature is selected if and only if at least `count` examples have a non-default value in that feature. The default value is 1.

`...`

Additional arguments to be passed directly to the Microsoft Compute Engine.
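The `count` threshold can be illustrated with a small base-R sketch. This is only an illustration of the selection rule, not the MicrosoftML implementation: the helper `select_by_min_count` is hypothetical, and the sketch assumes that the default value of a numeric feature is 0.

```r
# Illustrative sketch of the count rule; NOT the MicrosoftML implementation.
# Assumption: the "default" value of a numeric feature is 0.
select_by_min_count <- function(x, count = 1) {
  # Number of examples (rows) with a non-default value, per feature (column).
  non_default <- colSums(x != 0)
  # Keep a feature only if at least `count` examples populate it.
  x[, non_default >= count, drop = FALSE]
}

# Four examples, three features: f1 is populated 3 times, f2 once, f3 never.
x <- matrix(c(1, 0, 0,
              1, 1, 0,
              0, 0, 0,
              1, 0, 0),
            ncol = 3, byrow = TRUE,
            dimnames = list(NULL, c("f1", "f2", "f3")))

kept_default <- colnames(select_by_min_count(x))             # count = 1
kept_two     <- colnames(select_by_min_count(x, count = 2))  # stricter threshold
print(kept_default)
print(kept_two)
```

With the default threshold only the never-populated feature is dropped; raising `count` also drops features that are populated too rarely.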

Details

When the count mode is used in the feature selection transform, a feature is selected if at least the specified `count` of examples have a non-default value in that feature. Count-mode feature selection is especially useful when applied together with a categorical hash transform (see also categoricalHash): it can remove the features generated by the hash transform that have no data in the examples.
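Why hashing leaves empty features can be sketched in base R. The function `toy_hash` below is a hypothetical stand-in (a character-code sum), not the hash that categoricalHash actually uses; the point is only that a handful of distinct values populate very few of the 2^hashBits slots, and a count threshold of 1 removes the rest.

```r
# Toy stand-in for a categorical hash; the hash MicrosoftML uses is different.
toy_hash <- function(s, hash_bits) {
  # Map a string to one of 2^hash_bits slots (1-based index).
  (sum(utf8ToInt(s)) %% 2^hash_bits) + 1
}

reviews <- c("Love it", "I hate it", "Love it", "I hate it", "This is great")
hash_bits <- 7
n_slots <- 2^hash_bits  # 128 candidate features

# One row per example, one column per slot; 1 marks the slot a review hashed to.
slots <- vapply(reviews, toy_hash, numeric(1), hash_bits = hash_bits,
                USE.NAMES = FALSE)
features <- matrix(0, nrow = length(reviews), ncol = n_slots)
features[cbind(seq_along(reviews), slots)] <- 1

# Count rule with count = 1: drop every slot that no example populated.
populated <- colSums(features != 0) >= 1
n_kept <- sum(populated)
cat("slots before:", n_slots, "kept:", n_kept, "\n")
```

The number of slots kept can never exceed the number of distinct input values, so most of the 128 columns carry no information and are safe to drop.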

Value

A character string that defines the count mode.

Author(s)

Microsoft Corporation Microsoft Technical Support

See also

mutualInformation, selectFeatures

Examples


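 # rxLogisticRegression, categoricalHash, selectFeatures and minCount come from MicrosoftML.
 library(MicrosoftML)

 trainReviews <- data.frame(review = c(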
         "This is great",
         "I hate it",
         "Love it",
         "Do not like it",
         "Really like it",
         "I hate it",
         "I like it a lot",
         "I kind of hate it",
         "I do like it",
         "I really hate it",
         "It is very good",
         "I hate it a bunch",
         "I love it a bunch",
         "I hate it",
         "I like it very much",
         "I hate it very much.",
         "I really do love it",
         "I really do hate it",
         "Love it!",
         "Hate it!",
         "I love it",
         "I hate it",
         "I love it",
         "I hate it",
         "I love it"),
      like = c(TRUE, FALSE, TRUE, FALSE, TRUE,
         FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE,
         FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, TRUE, 
         FALSE, TRUE, FALSE, TRUE), stringsAsFactors = FALSE
     )

     testReviews <- data.frame(review = c(
         "This is great",
         "I hate it",
         "Love it",
         "Really like it",
         "I hate it",
         "I like it a lot",
         "I love it",
         "I do like it",
         "I really hate it",
         "I love it"), stringsAsFactors = FALSE)

 # Use a categorical hash transform, which generates 128 features (hashBits = 7, so 2^7 = 128).
 outModel1 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7)))
 summary(outModel1)

 # Apply a categorical hash transform and a count feature selection transform,
 # which selects only those hash features that have a value in at least one example.
 outModel2 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(
   categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7), 
   selectFeatures("reviewCatHash", mode = minCount())))
 summary(outModel2)

 # Apply a categorical hash transform and a count feature selection transform,
 # which selects only those features that have a value in at least 5 examples.
 outModel3 <- rxLogisticRegression(like~reviewCatHash, data = trainReviews, l1Weight = 0, 
     mlTransforms = list(
   categoricalHash(vars = c(reviewCatHash = "review"), hashBits = 7), 
   selectFeatures("reviewCatHash", mode = minCount(count = 5))))
 summary(outModel3)
