Index
MatrixLM.Mlm
MatrixLM.Predictors
MatrixLM.RawData
MatrixLM.Response
MatrixLM.add_intercept
MatrixLM.calc_coeffs
MatrixLM.calc_preds
MatrixLM.calc_preds!
MatrixLM.calc_resid
MatrixLM.calc_resid!
MatrixLM.calc_sigma
MatrixLM.calc_sigma
MatrixLM.calc_var
MatrixLM.center
MatrixLM.coef
MatrixLM.contr
MatrixLM.cov_est
MatrixLM.design_matrix
MatrixLM.design_matrix_names
MatrixLM.diagonal
MatrixLM.diagonal
MatrixLM.diagonal
MatrixLM.fitted
MatrixLM.get_X
MatrixLM.get_Y
MatrixLM.get_Z
MatrixLM.get_dummy
MatrixLM.get_dummy
MatrixLM.kron_diag
MatrixLM.mlm
MatrixLM.mlm_fit
MatrixLM.mlm_fit
MatrixLM.mlm_perms
MatrixLM.perm_pvals
MatrixLM.predict
MatrixLM.remove_intercept
MatrixLM.resid
MatrixLM.shrink_sigma
MatrixLM.shuffle_cols
MatrixLM.shuffle_rows
MatrixLM.t_stat
MatrixLM.@mlmformula
Description
MatrixLM.Mlm
— TypeMlm(B::Array{Float64,2}, varB::Array{Float64,2}, sigma::Array{Float64,2},
data::RawData, weights, targetType, lambda::Float64)
Type for storing the results of an mlm model fit.
MatrixLM.Predictors
— TypePredictors(X::AbstractArray{Float64,2}, Z::AbstractArray{Float64,2},
hasXIntercept::Bool, hasZIntercept::Bool)
Type for storing predictor (covariate) matrices. Also stores boolean variables hasXIntercept and hasZIntercept (if they are not supplied, they default to false).
MatrixLM.RawData
— TypeRawData(response::Response, predictors::Predictors)
Type for storing response and predictor matrices
Also stores dimensions of matrices as n, m, p, and q.
n
: number of rows of X = number of rows of Ym
: number of rows of Z = number of columns of Yp
: number of columns of Xq
: number of columns of Z
The constructor will compute n, m, p, and q based on the response and predictor matrices and assert that they are consistent.
MatrixLM.Response
— TypeResponse(Y::AbstractArray{Float64,2})
Type for storing response matrix
MatrixLM.add_intercept
— Methodadd_intercept(A::AbstractArray{Float64,2})
Insert an intercept column (column of ones) at the beginning of a 2d array.
Arguments
A::AbstractArray{Float64,2}
: 2d array of floats
Value
Returns A with an intercept column
MatrixLM.calc_coeffs
— Methodcalc_coeffs(X::AbstractArray{Float64,2}, Y::AbstractArray{Float64,2},
Z::AbstractArray{Float64,2}, XTX::AbstractArray{Float64,2},
ZTZ::AbstractArray{Float64,2})
Calculates the the coefficient estimates
Arguments
X::AbstractArray{Float64,2}
: The row covariates, with all categorical variables coded in appropriate contrastsY::AbstractArray{Float64,2}
: The multivariate responseZ::AbstractArray{Float64,2}
: The column covariates, with all categorical variables coded in appropriate contrastsXTX::AbstractArray{Float64,2}
: X*transpose(X) product as a 2d array of floatsZTZ::AbstractArray{Float64,2}
: Z*transpose(Z) product as a 2d array of floats
Value
2d array of floats
MatrixLM.calc_preds!
— Methodcalc_preds!(preds::AbstractArray{Float64,2},
X::AbstractArray{Float64,2},
Z::AbstractArray{Float64,2},
B::AbstractArray{Float64,2})
Predict values in place
Arguments
preds::AbstractArray{Float64,2}
: The predicted values, to be updated in placeX::AbstractArray{Float64,2}
: The row covariates, standardized as necessaryZ::AbstractArray{Float64,2}
: The column covariates, standardized as necessaryB::AbstractArray{Float64,2}
: Coefficient estimates
Value
None; updates predicted values in place.
MatrixLM.calc_preds
— Methodcalc_preds(X::AbstractArray{Float64,2},
Z::AbstractArray{Float64,2},
B::AbstractArray{Float64,2})
Predict values
Arguments
X::AbstractArray{Float64,2}
: The row covariates, standardized as necessaryZ::AbstractArray{Float64,2}
: The column covariates, standardized as necessaryB::AbstractArray{Float64,2}
: Coefficient estimates
Value
2d array of floats
MatrixLM.calc_resid!
— Methodcalc_resid!(resid::AbstractArray{Float64,2},
X::AbstractArray{Float64,2},
Y::AbstractArray{Float64,2},
Z::AbstractArray{Float64,2},
B::AbstractArray{Float64,2})
Calculate residuals in place
Arguments
resid
: 2d array of floats consisting of the residuals, to be updated in placeX
: 2d array of floats consisting of the row covariates, standardized as necessaryY
: 2d array of floats consisting of the multivariate response observations, standardized as necessaryZ
: 2d array of floats consisting of the column covariates, standardized as necessaryB
: 2d array of floats consisting of coefficient estimates
Value
None; updates residuals in place.
MatrixLM.calc_resid
— Methodcalc_resid(X::AbstractArray{Float64,2}, Y::AbstractArray{Float64,2},
Z::AbstractArray{Float64,2}, B::AbstractArray{Float64,2})
Calculate residuals
Arguments
X::AbstractArray{Float64,2}
: The row covariates, standardized as necessaryY::AbstractArray{Float64,2}
: The multivariate response observations, standardized as necessaryZ::AbstractArray{Float64,2}
: The column covariates, standardized as necessaryB::AbstractArray{Float64,2}
: The coefficient estimates
Value
2d array of floats
MatrixLM.calc_sigma
— Methodcalc_sigma(resid::AbstractArray{Float64,2}, targetType::AbstractString)
Estimates variance of errors and the shrinkage coefficient, with variance shrinkage.
Arguments
resid::AbstractArray{Float64,2}
: 2d array of floats consisting of the residualstargetType::AbstractString
: Indicating the target type toward which to shrink the variance. Acceptable inputs are "A", "B", "C", and "D".- "A": Target is identity matrix
- "B": Target is diagonal matrix with constant diagonal
- "C": Target is has same diagonal element, and same off-diagonal element
- "D": Target is diagonal matrix with unequal entries
Value
Tuple
sigma
: 2d array of floats; shrunk estimated variance of errorslambda
: floating scalar; estimated shrinkage coefficient (0 = no shrinkage, 1 = complete shrinkage)
MatrixLM.calc_sigma
— Methodcalc_sigma(resid::AbstractArray{Float64,2}, targetType::Nothing)
Estimates variance of errors and the shrinkage coefficient, without variance shrinkage.
Arguments
resid::AbstractArray{Float64,2}:
2d array of floats consisting of the residualstargetType
:nothing
Value
Tuple
sigma
: 2d array of floats; estimated variance of errorslambda
: 0.0
Some notes
Since this version of calc_sigma
does not implement variance shrinkage, the shrinkage coefficient lambda is 0.
MatrixLM.calc_var
— Methodcalc_var(X::AbstractArray{Float64,2}, Z::AbstractArray{Float64,2},
XTX::AbstractArray{Float64,2}, ZTZ::AbstractArray{Float64,2},
sigma::AbstractArray{Float64,2})
Calculate the variance (diagonal of the covariance matrix) of the coefficient estimates.
Arguments
X::AbstractArray{Float64,2}
: The row covariates, with all categorical variables coded in appropriate contrastsZ::AbstractArray{Float64,2}
: The column covariates, with all categorical variables coded in appropriate contrastsXTX::AbstractArray{Float64,2}
: X*transpose(X) product as a 2d array of floatsZTZ::AbstractArray{Float64,2}
: Z*transpose(Z) product as a 2d array of floatssigma::AbstractArray{Float64,2}
: 2d array of floats consisting of the estimated sigma
Value
2d array of floats
MatrixLM.center
— Methodcenter(A::AbstractArray{Float64,2})
Centers columns of a 2d array
Arguments
A::AbstractArray{Float64,2}
: 2d array of floats
Value
2d array of floats
MatrixLM.coef
— Methodcoef(MLM::Mlm)
Extracts coefficients from Mlm object
Arguments
MLM::Mlm
: Mlm object
Value
2d array of floats
MatrixLM.contr
— Functioncontr(df::DataFrames.DataFrame, cVars::AbstractArray{Symbol,1},
cTypes::AbstractArray{String,1}=repeat(["treat"], inner=length(cVars)),
trtRefs::AbstractArray= repeat([nothing], inner=length(cVars)))
Converts categorical variables in a DataFrame to specified contrast types. All other variables are left as-is.
Arguments
df::DataFrames.DataFrame
: DataFrame of variablescVar::Symbol
: symbol for the categorical variable in df to be convertedcTypes::AbstractArray{String,1}
: 1d array of character strings of the same length ascVars
, indicating the types of contrasts to use. Defaults to treatment contrasts ("treat") for all variables incVars
. Other options include "sum" for sum contrasts, "noint" for treatment contrasts with no intercept, and "sumnoint" for sum contrasts with no intercept. For "treat"cTypes
, you can also specify the level to use as the reference treatment usingtrtRefs
.trtRefs::AbstractArray
: optional 1d array of character strings of the same length ascVars
, specifying the level to use as the references for treatment contrasts. Defaults to nothing for all variables incVars
.
Value
DataFrame with same variables as the original DataFrame, but categorical variables converted to dummy contrasts.
Some notes
If cVars
consists of only an empty Symbol, i.e. cVars=[Symbol()]
, this will signal to the function that no contrasts should be created. The original DataFrame will be returned.
MatrixLM.cov_est
— Methodcov_est(resid::AbstractArray{Float64,2})
Estimates error variance and its variance/covariance
Arguments
resid::AbstractArray{Float64,2}
: 2d array of floats consisting of the residuals
Value
Tuple
- est: 2d array of floats; estimate
- varest: 2d array of floats; variance/covariance estimate
2d array of floats
MatrixLM.design_matrix
— Methoddesign_matrix(frml, df::DataFrame,cntrst::Dict{Symbol, AbstractContrasts})
Returns the design matrix based on the formula terms and the data source.
# Arguments
- `frml`: formula terms generated by the macro `@mlmformula`
- `df`: dataframe containing the data source table
- `cntrst`: dictionnary describing encoding method for categorical or ordinal variables, based on `StatsModels.jl`
design_matrix(frml, df::DataFrame, cntrst::Vector)
Returns the design matrix based on the formula terms and the data source.
# Arguments
- `frml`: formula terms generated by the macro `@mlmformula`
- `df`: dataframe containing the data source table
- `cntrst`: A vector containing tuples of variable names and corresponding encoding function.
design_matrix(frml, df::DataFrame)
Returns the default design matrix based on the formula terms and the data source, where
all the categorical variables are dummy coded.
# Arguments
- `frml`: formula terms generated by the macro `@mlmformula`
- `df`: dataframe containing the data source table
MatrixLM.design_matrix_names
— Methoddesign_matrix_names(frml, df,cntrst::Dict{Symbol, AbstractContrasts})
Returns the columns names of the design matrix based on the formula terms and the data source.
# Arguments
- `frml`: formula terms generated by the macro `@mlmformula`
- `df`: dataframe containing the data source table
- `cntrst`: dictionnary describing encoding method for categorical or ordinal variables, based on `StatsModels.jl`
design_matrix_names(frml, df::DataFrame, cntrst::Vector)
Returns the design matrix based on the formula terms and the data source.
# Arguments
- `frml`: formula terms generated by the macro `@mlmformula`
- `df`: dataframe containing the data source table
- `cntrst`: A vector containing tuples of variable names and corresponding encoding function.
design_matrix_names(frml, df::DataFrame)
Returns the columns names of the design matrix based on the formula terms and the data source, where
all the categorical variables are dummy coded.
# Arguments
- `frml`: formula terms generated by the macro `@mlmformula`
- `df`: dataframe containing the data source table
MatrixLM.diagonal
— Methoddiagonal(A::AbstractArray{Float64,2})
Get the diagonal of a 2d array of floats. This just calls the base diag function.
Arguments
A::AbstractArray{Float64,2}
: 2d array of floats
Value
1d array of floats
MatrixLM.diagonal
— Methoddiagonal(A::AbstractArray{Float64,1})
Get the diagonal of a 1d array of floats. Behaves like an identity function (returns itself).
Arguments
A::AbstractArray{Float64,1}
: 1d array of floats
Value
1d array of floats
Some notes
Originally intended for use when A is a 1 by 1 array, so may have unintended consequences for a 1d array of length > 1.
MatrixLM.diagonal
— Methoddiagonal(A::Float64)
Get the diagonal of a single scalar (float) value. Behaves like an identity function (returns itself).
Arguments
A::Float64
: floating scalar
Value
Floating scalar
MatrixLM.fitted
— Methodfitted(MLM::Mlm)
Calculate fitted values of an Mlm object
Arguments
MLM::Mlm
: Mlm object
Value
Response object
MatrixLM.get_X
— Methodget_X(data::RawData)
Extract X matrix from RawData object
Arguments
data::RawData
: RawData object
Value
2d array
MatrixLM.get_Y
— Methodget_Y(data::RawData)
Extract Y matrix from RawData object
Arguments
data::RawData
: RawData object
Value
2d array
MatrixLM.get_Z
— Methodget_Z(data::RawData)
Extract Z matrix from RawData object
Arguments
data::RawData
: RawData object
Value
2d array
MatrixLM.get_dummy
— Methodget_dummy(df::DataFrames.DataFrame, cVar::Symbol,
cType::String, trtRef::Nothing)
Convert categorical variable to dummy indicators using specified contrast type. This covers all cases except for treatment contrasts with a specified reference level.
Arguments
df::DataFrames.DataFrame
: DataFrame of variablescVar::Symbol
: symbol for the categorical variable in df to be convertedcType::String
: character string indicating the type of contrast to use forcVar
trtRef::Nothing
: nothing
Value
DataFrame of dummy variables for the specified categorical variable
MatrixLM.get_dummy
— Methodget_dummy(df::DataFrames.DataFrame, cVar::Symbol,
cType::String, trtRef::String)
Convert categorical variables to for treatment contrasts with a specified reference level.
Arguments
df::DataFrames.DataFrame
: DataFrame of variablescVar::Symbol
: symbol for the categorical variable in df to be convertedcType::String
: character string indicating the type of contrast to use forcVar
trtRef::String
: character string specifying the level in cVar to use as the reference
Value
DataFrame of dummy variables for the specified categorical variable
MatrixLM.kron_diag
— Methodkron_diag(A, B)
Compute the diagonal of the Kronecker product of arrays or scalars
Arguments
A
: square 2d array of floats, a 1d array of floats, or a scalarB
: square 2d array of floats, a 1d array of floats, or a scalar
Value
2d array of floats
MatrixLM.mlm
— Methodmlm(data::RawData; addXIntercept::Bool=true, addZIntercept::Bool=true, weights=nothing, targetType=nothing)
Matrix linear model using least squares method. Column weighted least squares and shrinkage of the variance of the errors are options.
Arguments
data::RawData
: RawData object
Keyword arguments
addXIntercept::Bool
: boolean flag indicating whether or not to include anX
intercept (row main effects). Defaults totrue
.addZIntercept::Bool
: boolean flag indicating whether or not to include aZ
intercept (column main effects). Defaults totrue
.weights
: 1d array of floats to use as column weights forY
, ornothing
. If the former, must be the same length as the number of columns ofY
. Defaults tonothing
.targetType
: string indicating the target type toward which to shrink the error variance, ornothing
. If the former, acceptable inputs are "A", "B", "C", and "D". Defaults tonothing
.- "A": Target is identity matrix
- "B": Target is diagonal matrix with constant diagonal
- "C": Target is has same diagonal element, and same off-diagonal element
- "D": Target is diagonal matrix with unequal entries
Value
An Mlm object
MatrixLM.mlm_fit
— Methodmlm_fit(data::RawData, weights::Nothing, targetType)
Matrix linear model using least squares method. Optionally incorporates shrinkage of the variance of the errors.
Arguments
data::RawData
: RawData objectweights::Nothing
:nothing
targetType
: string indicating the target type toward which to shrink the error variance, ornothing
. If the former, acceptable inputs are "A", "B", "C", and "D".- "A": Target is identity matrix
- "B": Target is diagonal matrix with constant diagonal
- "C": Target is has same diagonal element, and same off-diagonal element
- "D": Target is diagonal matrix with unequal entries
Value
An Mlm object
MatrixLM.mlm_fit
— Methodmlm_fit(data::RawData, weights::Array{Float64,1}, targetType)
Matrix linear model using column weighted least squares method. Optionally incorporates shrinkage of the variance of the errors.
Arguments
data
::RawData : RawData objectweights
::Array{Float64,1} : 1d array of floats to use as column weights forY
. Must be the same length as the number of columns ofY
.targetType
: string indicating the target type toward which to shrink the error variance, ornothing
. If the former, acceptable inputs are "A", "B", "C", and "D".- "A": Target is identity matrix
- "B": Target is diagonal matrix with constant diagonal
- "C": Target is has same diagonal element, and same off-diagonal element
- "D": Target is diagonal matrix with unequal entries
Value
An Mlm object
MatrixLM.mlm_perms
— Functionmlm_perms(data::RawData, nPerms::Int64=1000;
permFun::Function=shuffle_rows,
addXIntercept::Bool=true, addZIntercept::Bool=true,
weights=nothing, targetType=nothing, isMainEff::Bool=false)
Obtains permutation p-values for MLM t-statistics.
Arguments
data::RawData
: RawData objectnPerms::Int64=1000
: Number of permutations. Defaults to1000
.
Keyword arguments
- permFun::Function: function used to permute
Y
. Defaults toshuffle_rows
(shuffles rows ofY
). - addXIntercept::Bool=true: Boolean flag indicating whether or not to include an
X
intercept (row main effects). Defaults totrue
. - addZIntercept::Bool=true: Boolean flag indicating whether or not to include a
Z
intercept (column main effects). Defaults totrue
. - weights: 1d array of floats to use as column weights for
Y
, ornothing
. If the former, must be the same length as the number of columns ofY
. Defaults tonothing
. - targetType: string indicating the target type toward which to shrink the error variance, or
nothing
. If the former, acceptable inputs are "A", "B", "C", and "D". Defaults tonothing
.- "A": Target is identity matrix
- "B": Target is diagonal matrix with constant diagonal
- "C": Target is has same diagonal element, and same off-diagonal element
- "D": Target is diagonal matrix with unequal entries
- isMainEff::Bool: boolean flag indicating whether or not to include p-values for the main effects
Value
Tuple
tStats
: 2d array of floats; t-statisticspvals
: 2d array of floats; permutation p-values
Some notes
Permutations are computed in parallel when possible.
MatrixLM.perm_pvals
— Functionperm_pvals(fun::Function, data::RawData, nPerms::Int64=1000;
permFun::Function=shuffle_rows, funArgs...)
Obtains permutation p-values.
Arguments
fun::Function
: function that returns a test statisticdata::RawData
: RawData objectnPerms::Int64
: number of permutations. Defaults to1000
.
Keyword arguments
permFun::Function
: function used to permuteY
. Defaults toshuffle_rows
(shuffles rows ofY
).funArgs
: variable keyword arguments to be passed intofun
Value
Tuple
testStats
: 2d array of floats; t-statisticspvals
: 2d array of floats; permutation p-values
Some notes
Permutations are computed in parallel when possible.
MatrixLM.predict
— Functionpredict(MLM::Mlm, newPredictors::Predictors=MLM.data.predictors)
Calculates new predictions based on Mlm object
Arguments
MLM::Mlm
: Mlm objectnewPredictors::Predictors
: Predictors object. Defaults to thedata.predictors
field in the Mlm object used to fit the model.
Value
Response object
MatrixLM.remove_intercept
— Methodremove_intercept(A::AbstractArray{Float64,2})
Remove the intercept column, assumed to be the first column of a 2d array.
Arguments
A::AbstractArray{Float64,2}
: 2d array of floats
Value
Returns A without the intercept column
MatrixLM.resid
— Functionresid(MLM::Mlm, newData::RawData=MLM.data)
Calculates residuals of an Mlm object
Arguments
MLM::Mlm
: Mlm objectnewData::RawData
: RawData object. Defaults to thedata
field in the Mlm object used to fit the model.
Value
2d array of floats
MatrixLM.shrink_sigma
— Methodshrink_sigma(resid::AbstractArray{Float64,2}, targetType::String)
Estimates variance of errors and the shrinkage coefficient
Arguments
resid::AbstractArray{Float64,2}
: 2d array of floats consisting of the residualstargetType::String
: string indicating the target type toward which to shrink the variance. Acceptable inputs are "A", "B", "C", and "D".- "A": Target is identity matrix
- "B": Target is diagonal matrix with constant diagonal
- "C": Target is has same diagonal element, and same off-diagonal element
- "D": Target is diagonal matrix with unequal entries
Value
Tuple
sigma
: 2d array of floats; shrunk estimated variance of errorslambda
: floating scalar; estimated shrinkage coefficient (0 = no shrinkage, 1 = complete shrinkage)
Reference
Ledoit, O., & Wolf, M. (2003). Improved estimation of the covariance matrix of stock returns with an application to portfolio selection. Journal of empirical finance, 10(5), 603-621.
MatrixLM.shuffle_cols
— Methodshuffle_cols(A::AbstractArray{Float64,2})
Shuffles columns of a 2d array
Arguments
A::AbstractArray{Float64,2}
: 2d array of floats
Value
Returns A with columns shuffled
MatrixLM.shuffle_rows
— Methodshuffle_rows(A::AbstractArray{Float64,2})
Shuffles rows of a 2d array
Arguments
A::AbstractArray{Float64,2}
: 2d array of floats
Value
Returns A with rows shuffled
MatrixLM.t_stat
— Functiont_stat(MLM::Mlm, isMainEff::Bool=false)
Calculates t-statistics of an Mlm object
Arguments
MLM::Mlm
: Mlm objectisMainEff::Bool
: boolean flag indicating whether or not to include t-statistics for the main effects
Value
2d array of floats
MatrixLM.@mlmformula
— Macromlmformula(ex)
Capture and parse a formula expression for matrix linear model.
The `@mlmformula` domain-specific language serves the purpose of facilitating table-to-matrix transformations.
It is structured to be intuitive for users who have experience with other statistical software.
An elementary formula in this language consists of individual terms. These terms may either be symbols that reference
data columns or literal numbers `0` or `1`. They are combined by the operators `+`, `&`, and `*`.
To ensure correct parsing of the formula, the `@mlmformula`` macro needs to be invoked within parentheses.
This macro is built upon the `@formula` macro from the `StatsModels.jl` package.
# Example
```julia
julia> @mlmformula(1 + varA * VarB)
1
varA(unknown)
VarB(unknown)
varA(unknown) & VarB(unknown)
```