Cross-Validation Formulas
When you generate a cross-validation report, it contains accuracy measures for each model. Which measures appear depends on the type of mining model (that is, the algorithm that was used to create the model), the data type of the predictable attribute, and the predictable attribute value, if one is specified.
This section lists the measures used in the cross-validation report and describes the method of calculation.
For a breakdown of the accuracy measures by model type, see Measures in the Cross-Validation Report.
Formulas used for Cross-Validation Measures
Note
These measures of accuracy are computed for each target attribute. For each attribute, you can specify or omit a target value. If a case in the data set has no value for the target attribute, the case is treated as having a special value called the missing value. Rows that have missing values are not counted when the accuracy measure for that target attribute is computed. Because the scores are computed for each attribute individually, a case that has a value for the target attribute but is missing values for other attributes still counts toward the score for the target attribute.
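The per-attribute handling of missing values described in the note can be illustrated with a minimal Python sketch. The attribute names, the sample cases, and the `scored_cases` helper are hypothetical, and `None` stands in for the missing value; this is not how the product itself stores or filters cases.

```python
# Hypothetical cases; None stands in for the special "missing value".
cases = [
    {"Bike Buyer": 1, "Income": 52000.0},
    {"Bike Buyer": None, "Income": 31000.0},  # no value for Bike Buyer
    {"Bike Buyer": 0, "Income": None},        # no value for Income
]


def scored_cases(all_cases, target):
    """Return only the cases that have a value for the target attribute."""
    return [case for case in all_cases if case.get(target) is not None]


# Each target attribute is filtered on its own column only.
bike_buyer_rows = scored_cases(cases, "Bike Buyer")
income_rows = scored_cases(cases, "Income")
```

In this sketch the third case still counts toward the score for `Bike Buyer` even though its `Income` is missing, because the filter looks only at the attribute being scored.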
| Measure | Applies To | Implementation |
|---|---|---|
| True positive | Discrete attribute, value is specified | Count of cases that meet these conditions: the case contains the target value, and the model predicted that the case contains the target value. |
| True negative | Discrete attribute, value is specified | Count of cases that meet these conditions: the case does not contain the target value, and the model predicted that the case does not contain the target value. |
| False positive | Discrete attribute, value is specified | Count of cases that meet these conditions: the case does not contain the target value, but the model predicted that the case contains the target value. |
| False negative | Discrete attribute, value is specified | Count of cases that meet these conditions: the case contains the target value, but the model predicted that the case does not contain the target value. |
| Pass/fail | Discrete attribute, no specified target | Count of cases that meet these conditions: |
| Lift | Discrete attribute. Target value can be specified but is not required. | The mean log likelihood for all rows with values for the target attribute, where the log likelihood for each case is calculated as Log(ActualProbability/MarginalProbability). To compute the mean, the sum of the log likelihood values is divided by the number of rows in the input dataset, excluding rows with missing values for the target attribute. Lift can be either negative or positive. A positive value indicates an effective model that outperforms a random guess. |
| Log score | Discrete attribute. Target value can be specified but is not required. | Log of the actual probability for each case, summed, and then divided by the number of rows in the input dataset, excluding rows with missing values for the target attribute. Because a probability is never greater than 1, log scores are never positive. A score that is closer to 0 is a better score. |
| Case likelihood | Cluster | Sum of the cluster likelihood scores for all cases, divided by the number of cases in the partition, excluding rows with missing values for the target attribute. |
| Mean absolute error | Continuous attribute | Sum of the absolute error for all cases in the partition, divided by the number of cases in the partition. |
| Root mean square error | Continuous attribute | Square root of the mean squared error for the partition. |
| Root mean squared error | Discrete attribute. Target value can be specified but is not required. | Square root of the mean of the squares of the complement of the probability score (1 minus the probability). The mean is taken over the cases in the partition, excluding rows with missing values for the target attribute. |
| Root mean squared error | Discrete attribute, no specified target. | Square root of the mean of the squares of the complement of the probability score (1 minus the probability). The mean is taken over the cases in the partition, excluding cases with missing values for the target attribute. |
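As a rough illustration of the calculations in the table, the following Python sketch computes several of the measures from plain lists. It is a sketch under stated assumptions, not the product's implementation: all function names are hypothetical, the natural logarithm is assumed because the table does not state a base, "actual probability" is read as the probability the model assigned to the actual state, the confusion counts use the usual definitions relative to a target value, and the inputs are assumed to already exclude rows with missing values for the target attribute.

```python
import math


def confusion_counts(actual, predicted, target):
    """Count TP, TN, FP, FN for a discrete attribute with a specified target value."""
    tp = tn = fp = fn = 0
    for a, p in zip(actual, predicted):
        if a == target and p == target:
            tp += 1
        elif a != target and p != target:
            tn += 1
        elif a != target and p == target:
            fp += 1
        else:
            fn += 1
    return tp, tn, fp, fn


def lift(actual_probs, marginal_probs):
    """Mean of log(ActualProbability / MarginalProbability) over the cases."""
    logs = [math.log(p / m) for p, m in zip(actual_probs, marginal_probs)]
    return sum(logs) / len(logs)


def log_score(actual_probs):
    """Sum of log(ActualProbability), divided by the number of cases."""
    return sum(math.log(p) for p in actual_probs) / len(actual_probs)


def mean_absolute_error(actual, predicted):
    """Sum of absolute errors, divided by the number of cases (continuous attribute)."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def root_mean_square_error(actual, predicted):
    """Square root of the mean squared error (continuous attribute)."""
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    return math.sqrt(mse)


def rmse_discrete(actual_probs):
    """Square root of the mean squared complement (1 - probability), discrete attribute."""
    mean_sq = sum((1.0 - p) ** 2 for p in actual_probs) / len(actual_probs)
    return math.sqrt(mean_sq)


# Hypothetical inputs for illustration only.
probs = [0.8, 0.6, 0.9]      # probability assigned to each case's actual state
marginals = [0.5, 0.5, 0.5]  # marginal probability of that state
counts = confusion_counts(["yes", "no", "yes", "no"],
                          ["yes", "yes", "no", "no"], "yes")
```

With these sample inputs, every actual probability exceeds its marginal probability, so `lift(probs, marginals)` is positive, while `log_score(probs)` is negative because each probability is below 1.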