rxCovCor: Covariance/Correlation Matrix
Description
Calculate the covariance, correlation, or sum of squares / cross-product matrix for a set of variables.
Usage
rxCovCor(formula, data, pweights = NULL, fweights = NULL, rowSelection = NULL,
transforms = NULL, transformObjects = NULL,
transformFunc = NULL, transformVars = NULL,
transformPackages = NULL, transformEnvir = NULL,
keepAll = TRUE, varTol = 1e-12, type = "Cov",
blocksPerRead = rxGetOption("blocksPerRead"),
reportProgress = rxGetOption("reportProgress"), verbose = 0,
computeContext = rxGetOption("computeContext"), ...)
rxCov(formula, data, pweights = NULL, fweights = NULL, rowSelection = NULL,
transforms = NULL, transformObjects = NULL,
transformFunc = NULL, transformVars = NULL,
transformPackages = NULL, transformEnvir = NULL,
keepAll = TRUE, varTol = 1e-12,
blocksPerRead = rxGetOption("blocksPerRead"),
reportProgress = rxGetOption("reportProgress"), verbose = 0,
computeContext = rxGetOption("computeContext"), ...)
rxCor(formula, data, pweights = NULL, fweights = NULL, rowSelection = NULL,
transforms = NULL, transformObjects = NULL,
transformFunc = NULL, transformVars = NULL,
transformPackages = NULL, transformEnvir = NULL,
keepAll = TRUE, varTol = 1e-12,
blocksPerRead = rxGetOption("blocksPerRead"),
reportProgress = rxGetOption("reportProgress"), verbose = 0,
computeContext = rxGetOption("computeContext"), ...)
rxSSCP(formula, data, pweights = NULL, fweights = NULL, rowSelection = NULL,
transforms = NULL, transformObjects = NULL,
transformFunc = NULL, transformVars = NULL,
transformPackages = NULL, transformEnvir = NULL,
keepAll = TRUE, varTol = 1e-12,
blocksPerRead = rxGetOption("blocksPerRead"),
reportProgress = rxGetOption("reportProgress"), verbose = 0,
computeContext = rxGetOption("computeContext"), ...)
## S3 method for class `rxCovCor':
print(x, header = TRUE, ...)
Arguments
formula
formula, as described in rxFormula, with all the terms on the right-hand side of the ~ separated by + operators. Each term may be a single variable, a transformed variable, or the interaction of (transformed) variables separated by the : operator, e.g. ~ x1 + log(x2) + x3 : x4.
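A formula of this shape can be previewed with base R's model.matrix(), which expands each term into one numeric column. This is a sketch under assumptions, not rxFormula itself: the data frame df and its columns x1 to x4 are hypothetical, and the "- 1" drops the intercept column that model.matrix() would otherwise add.

```r
# Sketch in base R (not rxFormula): hypothetical data with four numeric columns
set.seed(1)
df <- data.frame(x1 = rnorm(10), x2 = runif(10, 1, 5),
                 x3 = rnorm(10), x4 = rnorm(10))

# Expand ~ x1 + log(x2) + x3:x4 into a numeric matrix without an intercept
X <- model.matrix(~ x1 + log(x2) + x3:x4 - 1, data = df)
colnames(X)

# The interaction column is the elementwise product of its two variables
stopifnot(all.equal(unname(X[, "x3:x4"]), df$x3 * df$x4))

# Covariance of the three derived columns
cov(X)
```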
data
either a data source object, a character string specifying a .xdf file, or a data frame object.
pweights
character string specifying the variable to use as probability weights for the observations. Only one of pweights and fweights may be specified at a time.
fweights
character string specifying the variable to use as frequency weights for the observations. Only one of pweights and fweights may be specified at a time.
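Under the usual definition of frequency weights, a row with weight f counts as f identical observations. The base R sketch below assumes that convention, with divisor sum(f) - 1, and checks it against cov() on explicitly replicated rows. The helper fwCov() is hypothetical, and whether rxCov uses exactly this divisor is an assumption rather than something stated on this page.

```r
# Hypothetical helper (assumed convention): frequency-weighted covariance where
# weight f[i] means row i occurs f[i] times; divisor is sum(f) - 1
fwCov <- function(X, f) {
  X <- as.matrix(X)
  n <- sum(f)
  mu <- colSums(X * f) / n             # weighted column means
  Xc <- sweep(X, 2, mu)                # centre each column
  crossprod(Xc * sqrt(f)) / (n - 1)    # sum of f[i] * outer(xc_i, xc_i), over n - 1
}

X <- as.matrix(iris[1:10, 1:4])
f <- c(1, 2, 1, 3, 1, 1, 2, 1, 1, 1)

# Same result as replicating each row f[i] times and calling cov()
stopifnot(all.equal(fwCov(X, f), cov(X[rep(seq_len(nrow(X)), f), ])))
```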
rowSelection
name of a logical variable in the data set (in quotes) or a logical expression using variables in the data set to specify row selection. For example, rowSelection = "old" will use only observations in which the value of the variable old is TRUE. Similarly, rowSelection = (age > 20) & (age < 65) & (log(income) > 10) will use only observations in which the value of the age variable is between 20 and 65 and the value of the log of the income variable is greater than 10. The row selection is performed after processing any data transformations (see the arguments transforms or transformFunc). As with all expressions, rowSelection can be defined outside of the function call using the expression function.
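The idea of a logical expression that is defined ahead of time with expression() and later evaluated against the data's variables can be mimicked in base R. This is only a sketch of the semantics, not how RevoScaleR evaluates rowSelection: iris stands in for the data set and the selection condition is made up.

```r
# Sketch in base R: define the selection outside any function call
sel <- expression((Sepal.Length > 5) & (Petal.Length < 4))

# Evaluate it with the data frame's columns in scope
keep <- eval(sel[[1]], envir = iris)

# Compute the covariance on the selected rows only
covSel <- cov(iris[keep, 1:4])
covSel
```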
transforms
an expression of the form list(name = expression, ...) representing the first round of variable transformations. As with all expressions, transforms (or rowSelection) can be defined outside of the function call using the expression function.
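Because row selection runs after transformations (see rowSelection), a selection can refer to a variable that only exists once transforms has been applied. The base R sketch below imitates that order; the new variable name logSL and the threshold are hypothetical.

```r
# Sketch in base R: a transforms-style list, defined outside the call
tf <- expression(list(logSL = log(Sepal.Length)))

# First round: evaluate the transformation and add the new column
dat <- data.frame(iris, eval(tf[[1]], envir = iris))

# Then apply a row selection that uses the transformed variable
keep <- eval(quote(logSL > log(6)), envir = dat)
cov(dat[keep, c("Sepal.Length", "logSL")])
```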
transformObjects
a named list containing objects that can be referenced by transforms, transformFunc, and rowSelection.
transformFunc
variable transformation function. See rxTransform for details.
transformVars
character vector of input data set variables needed for the transformation function. See rxTransform for details.
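As we understand rxTransform, a transformation function receives a named list of the variables named in transformVars and returns a named list. The base R sketch below imitates that contract on a single in-memory chunk; addLogs() and the one-chunk setup are hypothetical, and rxTransform remains the authoritative description.

```r
# Hypothetical transformation function (assumed contract): takes a named list
# of variables and returns the list with new variables added
addLogs <- function(dataList) {
  dataList$logSL <- log(dataList$Sepal.Length)
  dataList$logPL <- log(dataList$Petal.Length)
  dataList
}

transformVars <- c("Sepal.Length", "Petal.Length")

# Imitate one chunk: pass only the requested variables, get a list back
chunk <- addLogs(as.list(iris[transformVars]))
X <- as.matrix(as.data.frame(chunk))
cov(X)
```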
transformPackages
character vector defining additional R packages (outside of those specified in rxGetOption("transformPackages")) to be made available and preloaded for use in variable transformation functions, e.g., those explicitly defined in RevoScaleR functions via their transforms and transformFunc arguments or those defined implicitly via their formula or rowSelection arguments. The transformPackages argument may also be NULL, indicating that no packages outside rxGetOption("transformPackages") will be preloaded.
transformEnvir
user-defined environment to serve as a parent to all environments developed internally and used for variable data transformation. If transformEnvir = NULL, a new "hash" environment with parent baseenv() is used instead.
keepAll
logical value. If TRUE, all of the columns are kept in the returned matrix. If FALSE, columns (and corresponding rows in the returned matrix) that are symbolic linear combinations of other columns (see alias) are dropped.
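To show what a column that is a linear combination of other columns looks like, the base R sketch below builds one deliberately and detects it with a pivoted QR decomposition. This is a numerical stand-in for the symbolic check that keepAll = FALSE performs, not the same technique; the column name Sum is hypothetical.

```r
# Sketch in base R: the third column is an exact linear combination of the first two
X <- as.matrix(iris[, 1:2])
X <- cbind(X, Sum = X[, 1] + X[, 2])

# Pivoted QR: columns pivoted beyond the numerical rank are the dependent ones
q <- qr(X)
dropped <- colnames(X)[q$pivot[-seq_len(q$rank)]]
dropped

# Keeping only the independent columns gives a full-rank covariance matrix
keptCov <- cov(X[, setdiff(colnames(X), dropped)])
keptCov
```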
varTol
numeric tolerance used to identify columns in the data matrix that have near-zero variance. If the variance of a column is less than or equal to varTol and keepAll=TRUE, that column is dropped from the data matrix.
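The near-zero-variance test itself is a simple comparison. The base R sketch below flags columns whose sample variance is at or below a tolerance; the helper lowVarCols() and the constant column Const are hypothetical, and the default of 1e-12 simply mirrors the varTol default.

```r
# Hypothetical helper: names of columns whose variance is <= varTol
lowVarCols <- function(X, varTol = 1e-12) {
  v <- apply(X, 2, var)
  colnames(X)[v <= varTol]
}

X <- cbind(as.matrix(iris[, 1:4]), Const = 1)  # Const has zero variance
lowVarCols(X)
```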
type
character string specifying the type of matrix to return. The supported types are:
"SSCP": sums of squares / cross-products matrix.
"Cov": covariance matrix.
"Cor": correlation matrix.
The type argument is case insensitive, e.g. "SSCP" and "sscp" are equivalent.
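The three types are closely related: the correlation matrix is the covariance matrix rescaled by the standard deviations. The base R sketch below computes each type for a numeric matrix and matches type case-insensitively. covCorSketch() is hypothetical; following the Details section, its SSCP result includes a leading column of 1s, but it is not a drop-in replacement for rxCovCor().

```r
# Hypothetical helper mirroring the three supported types (sketch only)
covCorSketch <- function(X, type = "Cov") {
  X <- as.matrix(X)
  switch(tolower(type),                         # case-insensitive match
    sscp = crossprod(cbind(Intercept = 1, X)),  # leading column of 1s
    cov  = cov(X),
    cor  = cov2cor(cov(X)),                     # rescale covariance to correlation
    stop("type must be one of 'SSCP', 'Cov', 'Cor'"))
}

X <- iris[, 1:4]
stopifnot(all.equal(covCorSketch(X, "cor"), cor(X)))
stopifnot(identical(covCorSketch(X, "SSCP"), covCorSketch(X, "sscp")))
```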
blocksPerRead
number of blocks to read for each chunk of data read from the data source.
reportProgress
integer value with options:
0: no progress is reported.
1: the number of processed rows is printed and updated.
2: rows processed and timings are reported.
3: rows processed and all timings are reported.
verbose
integer value. If 0, no additional output is printed. If 1, additional summary information is printed.
computeContext
a valid RxComputeContext. The RxSpark and RxHadoopMR compute contexts distribute the computation among the nodes specified by the compute context; for other compute contexts, the computation is distributed if possible on the local computer.
...
additional arguments to be passed directly to the Revolution Compute Engine.
x
an object of class "rxCovCor".
header
logical value. If TRUE, header information is printed.
Details
The rxCovCor function, along with its convenience functions rxCov, rxCor, and rxSSCP, calculates the covariance matrix, Pearson's correlation matrix, or sums of squares/cross-product matrix, optionally using probability or frequency weights.
The sums of squares/cross-product matrix differs from the other two output types in that an initial column of 1s (or of the square roots of the weights, if weights are specified) is added to the data matrix before multiplication. The first row and first column must therefore be dropped from the output to obtain the cross-product of the specified data matrix alone.
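The extra column is useful because it carries the row count and column sums. In the base R sketch below, with unit weights the [1, 1] element of crossprod(cbind(1, X)) is the number of rows and the rest of the first row holds the column sums, from which the covariance can be recovered (assuming the usual n - 1 divisor). The weighted part assumes that each data row is also scaled by sqrt(w), which the text above does not state explicitly; the weight vector w is made up.

```r
X <- as.matrix(iris[, 1:4])
n <- nrow(X)

# Unweighted: prepend a column of 1s, then form Z'Z
S <- crossprod(cbind(1, X))
stopifnot(S[1, 1] == n)                           # number of rows
stopifnot(all.equal(S[1, -1], colSums(X)))        # column sums
stopifnot(all.equal(S[-1, -1], crossprod(X)))     # drop first row and column

# The covariance can be recovered from the pieces of S
covFromS <- (S[-1, -1] - outer(S[1, -1], S[1, -1]) / n) / (n - 1)
stopifnot(all.equal(covFromS, cov(X)))

# Weighted (assumed form): scale each row of [1, X] by sqrt(w), so the
# first column becomes sqrt(w) and the product equals Z' diag(w) Z
w <- rep(c(1, 2, 3), length.out = n)
Z <- cbind(1, X)
Sw <- crossprod(sqrt(w) * Z)
stopifnot(all.equal(Sw, t(Z) %*% diag(w) %*% Z))
```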
Value
For rxCovCor
, an object of class "rxCovCor" with the following list elements:
CovCor
numeric matrix representing either the (weighted) covariance, correlation, or sum of squares/cross-product.
StdDevs
For type = "Cor" and "Cov", numeric vector of (weighted) standard deviations of the columns. For type = "SSCP", the standard deviations are not calculated and the return value is numeric(0).
Means
numeric vector containing the (weighted) column means.
valid.obs
number of valid observations.
missing.obs
number of missing observations.
SumOfWeights
either the sum of the weights or NA if no weights are specified.
DroppedVars
character vector containing the names of the data columns that were dropped during the calculations.
DroppedVarIndexes
integer vector containing the indices of the data columns that were dropped during the calculations.
params
parameters sent to the Microsoft R Services Compute Engine.
call
the matched call.
formula
formula as described in rxFormula.
For rxCov, a covariance matrix.
For rxCor, a correlation matrix.
For rxSSCP, a sum of squares/cross-product matrix.
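To make the relationships among the returned components concrete, the base R sketch below assembles a list with some of the same element names for a "Cov"-type result. It is hypothetical throughout: it assumes rows with any missing value are dropped before computing (listwise deletion) and that StdDevs is the square root of the covariance diagonal; neither is guaranteed to match how rxCovCor computes valid.obs, missing.obs, or StdDevs.

```r
# Hypothetical stand-in for a "Cov"-type result (assumes listwise deletion)
covCorList <- function(X) {
  X <- as.matrix(X)
  ok <- complete.cases(X)              # rows with no missing values
  Xv <- X[ok, , drop = FALSE]
  C <- cov(Xv)
  list(CovCor = C,
       StdDevs = sqrt(diag(C)),        # standard deviations from the diagonal
       Means = colMeans(Xv),
       valid.obs = sum(ok),
       missing.obs = sum(!ok),
       SumOfWeights = NA)              # no weights in this sketch
}

X <- as.matrix(iris[, 1:4])
X[3, 2] <- NA                          # introduce one missing value
res <- covCorList(X)
stopifnot(res$valid.obs == 149, res$missing.obs == 1)
stopifnot(all.equal(res$StdDevs, apply(X[-3, ], 2, sd)))
```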
Author(s)
Microsoft Corporation Microsoft Technical Support
See Also
cov, cor, rxCovData, rxCorData.
Examples
# Obtain all components from rxCovCor
form <- ~ Sepal.Length + Sepal.Width + Petal.Length + Petal.Width
allCov <- rxCovCor(form, data = iris, type = "Cov")
allCov
# Direct access to covariance or correlation matrix
rxCov(form, data = iris, reportProgress = 0)
cov(iris[,1:4])
rxCor(form, data = iris, reportProgress = 0)
cor(iris[,1:4])
# Cross-product of data matrix (need to drop first row and column of output)
rxSSCP(form, data = iris, reportProgress = 0)[-1, -1]
crossprod(as.matrix(iris[,1:4]))