Functions and Types
Flexible Multivariate Linear Mixed Models (flxMLMM)
FlxQTL.FlxQTL
— ModuleFlxQTL
flexible Multivariate Linear Mixed Model based QTL analysis tools for structured multiple traits.
FlxQTL.flxMLMM
— ModuleflxMLMM
A module designed for fitting a Multivariate Linear Mixed Model run by Nesterov's Accelerated Gradient with restarting scheme incorporated with Expectation Conditional Maximization.
The model:
$Y=XBZ'+R+E$, where $E(vec(Y))= (Z \otimes X)vec(B)$, $var(vec(Y))= \tau^2 K_G \otimes K_C + I_n \otimes \Sigma$
FlxQTL.flxMLMM.K2Eig
— Function K2Eig(Kg,Kc::Array{Float64,2},LOCO::Bool=false)
Returns a two pairs of eigenvectors and eigenvalues for genetic and climatic relatedness matrices.
Arguments
Kg
: A matrix of a genetic kinship, or 3-d array of that ifLOCO
sets to betrue
.Kc
: A matrix of a climatic relatedness.LOCO
: Boolean. Default isfalse
(no LOCO). (Leave One Chromosome Out).LOCO
is only connected to the genetic kinship (Kg
).
Output
Tg
: A matrix of eigenvectors forKg
, or 3-d array of eigenvectors ifLOCO
sets to betrue
.λg
: A vector of eigenvalues forKg
, or matrix of eigenvalues ifLOCO
sets to betrue
.Tc
: A matrix of eigenvectors forKc
.λc
: A vector of eigenvalues forKc
See K2eig
.
Examples
For a genetic kinship calculated under LOCO
(3-d array of kinship),
Tg,λg,Tc,λc = K2Eig(Kg,Kc,true)
produces a 3-d array of Tg
, matrices of λg
, Tc
, and a vector of λc
.
FlxQTL.flxMLMM.K2eig
— Function K2eig(K,LOCO::Bool=false)
Returns eigenvectors and eigenvalues of a (genetic, climatic) relatedness, or 3-d array of these of a genetic relatedness if LOCO
is true
.
Arguments
K
: A matrix of (genetic or climatic) relatedness (Default). 3-d array of genetic relatedness (LOCO
sets to be true.)LOCO
: Boolean. Default isfalse
(no LOCO). (Leave One Chromosome Out).
Output
T
: A matrix of eigenvectors, or 3-d array of eigenvectors ifLOCO
sets to betrue
.λ
: A vector of eigenvalues, or matrix of eigenvalues ifLOCO
sets to betrue
.
See also K2Eig
.
Examples
For a (climatic) relatedness, or genetic relatedness for LOCO =false
,
T, λ = K2eig(K)
produces a matrix of T
and a vector of λ
.
For a genetic kinship calculated under LOCO
(3-d array of kinship),
T, λ = K2eig(K,true)
produces a 3-d array of T
and a matrix of λ
.
FlxQTL.flxMLMM.envScan
— Function envScan(Midx::Array{Int64,1},cross::Int64,Tg,Tc::Array{Float64,2},Λg,λc::Array{Float64,1},
Y0::Array{Float64,2},XX::Markers,Z0::Array{Float64,2},LOCO::Bool=false;
Xnul::Array{Float64,2}=ones(1,size(Y0,2)),itol=1e-4,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)
Implement environment scan conditional on a genetic marker of interest (QTL). Each of q
trait covariate data is scanned (regressed) given a major QTL selected from genome scan, geneScan
to obtain LOD scores.
Arguments
Midx
: A vector of genetic marker indices (major QTL) selected based on LOD scores fromgeneScan
.cross
: An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.Tg
: A n x n matrix of eigenvectors fromK2eig
, orK2Eig
. Returns 3d-array of eigenvectors as many as Chromosomes ifLOCO
is true.Tc
: A m x m matrix of eigenvectors from climatic relatedness matrix.Λg
: A n x 1 vector of eigenvalues from kinship. Returns a matrix of eigenvalues ifLOCO
is true.λc
: A m x 1 vector of eigenvalues from climatic relatedness matrix. Useones(m)
for no climatic information added.Y0
: A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e.Y0[1,:]
(a vector) ->Y[[1],:]
(a matrix) .XX
: A type ofMarkers
.Z0
: A m x q matrix of low-dimensional trait covariate data for environment scan, i.e. minimum or maximum monthly temperature data, monthly photoperiod data, etc.LOCO
: Boolean. Default isfalse
(no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out).
Keyword Arguments
Xnul
: A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.itol
: A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is1e-3
.tol0
: A tolerance controlling ECM under H1: existence of QTL. Default is1e-3
.tol
: A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is1e-4
.ρ
: A tunning parameter controlling $\tau^2$. Default is0.001
.
!!! Note
- When some LOD scores return negative values, reduce tolerences for ECM to
tol0 = 1e-4
. It works in most cases. If not, can reduce bothtol0
andtol
to1e-4
or further.
Output
LODs
: A vector of LOD scores by envrionment scan when including each major QTL.B
: A 3-d array ofB
(fixed effects) matrices under H1: existence of an environment factor (covariate) conditional on a major QTL.est0
: A type ofEcmNestrv.Approx
including parameter estimates under H0: no environment factor conditional on a major QTL.
FlxQTL.flxMLMM.gene2Scan
— Functiongene2Scan(cross::Int64,Tg,Tc::Array{Float64,2},Λg,λc::Array{Float64,1},Y::Array{Float64,2},
XX::Markers,Z::Array{Float64,2},LOCO::Bool=false;ρ=0.001,Xnul::Array{Float64,2}=ones(1,size(Y,2)),
df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*5,itol=1e-4,tol0=1e-3,tol::Float64=1e-4)
gene2Scan(cross::Int64,Tg,Λg,Y::Array{Float64,2},XX::Markers,Z::Array{Float64,2},LOCO::Bool=false;
Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*5,
itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)
gene2Scan(cross::Int64,Tg,Λg,Y::Array{Float64,2},XX::Markers,LOCO::Bool=false;Xnul::Array{Float64,2}=ones(1,size(Y,2)),
df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*5,itol=1e-4,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)
Implement 2d-genome scan with/without LOCO (Leave One Chromosome Out). Note that the second gene2Scan
includes getKc
for precomputing Kc
– no need of precomputing and doing eigen-decomposition to Kc
separately. The last gene2Scan
is run by a conventional MLMM (no Z, i.e. Z=I):
\[vec(Y) \sim MVN((Z \otimes X)vec(B) (or XBZ') , K \otimes \Sigma_1 +I \otimes \Sigma_2),\]
where K
is a genetic kinship, $\Sigma_1, \Sigma_2$ are covariance matrices for random and error terms, respectively. Z
can be replaced with an identity matrix.
Arguments
cross
: An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.Tg
: A n x n matrix of eigenvectors fromK2eig
, orK2Eig
. Returns 3d-array of eigenvectors as many as Chromosomes ifLOCO
is true.Tc
: A m x m matrix of eigenvectors from climatic relatedness matrix.Λg
: A n x 1 vector of eigenvalues from kinship. Returns a matrix of eigenvalues ifLOCO
is true.λc
: A m x 1 vector of eigenvalues from climatic relatedness matrix. Useones(m)
for no climatic information added.Y
: A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e.Y0[1,:]
(a vector) ->Y[[1],:]
(a matrix) .XX
: A type ofMarkers
.Z
: An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). To use the identity matrix just as the conventional MLMM, simply typediagm(ones(m))
.LOCO
: Boolean. Default isfalse
(no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out).
Keyword Arguments
Xnul
: A matrix of covariates. Default is intercepts (1's). Unless adding covariates, just leave as it is. SeegeneScan
.Prior
: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. A large scaled covariance matrix (a weakly informative prior) is default.df_prior
: degrees of freedom, $\nu_0$ for Inverse-Wishart distributon.m+1
(weakly informative) is default.itol
: A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is1e-3
.tol0
: A tolerance controlling ECM under H1: existence of QTL. Default is1e-3
.tol
: A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is1e-4
.ρ
: A tunning parameter controlling $\tau^2$. Default is0.001
.
!!! Note
- When some LOD scores return negative values, reduce tolerences for ECM to
tol0 = 1e-4
. It works in most cases. If not, can reduce bothtol0
andtol
to1e-4
or further. 2D scan is usually computationally heavy and takes much time especially for large data with high-dimensional traits, i.e. largem
andn
. This will be worse when the conventional LMM is chosen.
Output
LODs
: LOD scores. Can change to $- \log_{10}{P}$ usinglod2logP
.est0
: A type ofEcmNestrv.Approx
including parameter estimates under H0: no QTL.
FlxQTL.flxMLMM.geneScan
— FunctiongeneScan(cross::Int64,Tg,Tc::Array{Float64,2},Λg,λc::Array{Float64,1},Y::Array{Float64,2},XX::Markers,Z::Array{Float64,2},
LOCO::Bool=false;m=size(Y,1),tdata::Bool=false,LogP::Bool=false,Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,
Prior::Matrix{Float64}=cov(Y,dims=2)*5,itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)
geneScan(cross::Int64,Tg::Union{Array{Float64,3},Array{Float64,2}},Tc::Array{Float64,2},Λg::Union{Array{Float64,2},Array{Float64,1}},
λc::Array{Float64,1},Y::Array{Float64,2},XX::Markers,LOCO::Bool=false;m=size(Y,1),LogP::Bool=false,Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,
Prior::Matrix{Float64}=cov(Y,dims=2)*5,itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)
geneScan(cross::Int64,Tg,Λg,Y::Array{Float64,2},XX::Markers,LOCO::Bool=false;m=size(Y,1),Xnul::Array{Float64,2}=ones(1,size(Y,2)),
df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*5,tdata::Bool=false,LogP::Bool=false,
itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)
gene1Scan(cross::Int64,Tg,Λg,Y::Array{Float64,2},XX::Markers,Z::Array{Float64,2},LOCO::Bool=false;m=size(Y,1),
Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*5,
tdata::Bool=false,LogP::Bool=false,itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)
Implement 1d-genome scan with/without LOCO (Leave One Chromosome Out). Note that gene1Scan
includes getKc
for precomputing Kc
– no need of precomputing and doing eigen-decomposition to Kc
separately. The third geneScan()
is based on a conventional MLMM:
\[vec(Y) \sim MVN((Z \otimes X)vec(B) (or XBZ'), K \otimes \Sigma_1 +I \otimes \Sigma_2),\]
where K
is a genetic kinship, $\Sigma_1, \Sigma_2$ are covariance matrices for random and error terms, respectively. Z
can be replaced with an identity matrix.
Arguments
cross
: An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.Tg
: A n x n matrix of eigenvectors fromK2eig
, orK2Eig
. Returns 3d-array of eigenvectors as many as Chromosomes ifLOCO
is true.Tc
: A m x m matrix of eigenvectors from climatic relatedness matrix.Λg
: A n x 1 vector of eigenvalues from kinship. Returns a matrix of eigenvalues ifLOCO
is true.λc
: A m x 1 vector of eigenvalues from climatic relatedness matrix. Useones(m)
for no climatic information added.Y
: A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e.Y0[1,:]
(a vector) ->Y[[1],:]
(a matrix) .XX
: A type ofMarkers
.Z
: An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If nothing to insert inZ
, just exclude it or insert an identity matrix,Matrix(1.0I,m,m)
. m traits x q phenotypic covariates.LOCO
: Boolean. Default isfalse
(no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out).
Keyword Arguments
Xnul
: A matrix of covariates. Default is intercepts (1's):Xnul= ones(1,size(Y0))
. Adding covariates (C) isXnul= vcat(ones(1,m),C)
wheresize(C)=(c,m)
form = size(Y0,1)
.Prior
: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. A large scaled covariance matrix (a weakly informative prior) is default.df_prior
: degrees of freedom, $\nu_0$ for Inverse-Wishart distributon.m+1
(weakly informative) is default.itol
: A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is1e-3
.tol0
: A tolerance controlling ECM under H1: existence of QTL. Default is1e-3
.tol
: A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is1e-4
.ρ
: A tunning parameter controlling $\tau^2$. Default is0.001
.LogP
: Boolean. Default isfalse
. Returns $-\log_{10}{P}$ instead of LOD scores iftrue
.
!!! Note
- When some LOD scores return negative values, reduce tolerences for ECM to
tol0 = 1e-4
. It works in most cases. If not, can reduce bothtol0
andtol
to1e-4
or further.
Output
LODs
(orlogP
) : LOD scores. Can change to $- \log_{10}{P}$ inlod2logP
ifLogP = true
.B
: A 3-d array ofB
(fixed effects) matrices under H1: existence of QTL. If covariates are added toXnul
:Xnul= [ones(1,size(Y0)); Covariates]
, ex. For sex covariates in 4-way cross analysis, B[:,2,100], B[:,3:5,100] are effects for sex, the rest genotypes of the 100th QTL, respectively.est0
: A type ofEcmNestrv.Approx
including parameter estimates under H0: no QTL.
FlxQTL.flxMLMM.getKc
— MethodgetKc(Y::Array{Float64,2};m=size(Y,1),Z=diagm(ones(m)), df_prior=m+1,
Prior::Matrix{Float64}=cov(Y,dims=2)*5,Xnul::Array{Float64,2}=ones(1,size(Y,2)),
itol=1e-2,tol::Float64=1e-3,ρ=0.001)
Pre-estimate Kc
by regressing Y
on Xnul
, i.e. estimating environmental covariates under H0: no QTL
.
Argument
Y
: A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e.Y0[1,:]
(a vector) ->Y[[1],:]
(a matrix) .
Keyword Arguments
Z
: An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). An identity matrix, $I_m$, is default.Xnul
: A matrix of covariates. Default is intercepts (1's):Xnul= ones(1,size(Y0))
. Adding covariates (C) isXnul= vcat(ones(1,m),C)
wheresize(C)=(c,m)
form = size(Y0,1)
.Prior
: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. A large scaled covariance matrix (a weakly informative prior) is default.df_prior
: degrees of freedom, $\nu_0$ for Inverse-Wishart distributon.m+1
(weakly informative) is default.itol
: A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is1e-3
.tol
: A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is1e-4
.ρ
: A tunning parameter controlling $\tau^2$. Default is0.001
.
Output
InitKc
: A type of struct of arrays, including pre-estimatedKc
,and null estimates of B
,Σ
,τ2
used as initial values insidegene1Scan
, one ofgeneScan
functions, orgene2Scan
.
Examples
julia> K0 = getKc(Y)
julia> K0.Kc # for Kc
julia> K0.B # for B under H0
FlxQTL.flxMLMM.permTest
— Method permTest(nperm::Int64,cross,Kg,Kc,Y,XX::Markers;pval=[0.05 0.01],m=size(Y,1),Z=diagm(ones(m)),df_prior=m+1,
Prior::Matrix{Float64}=cov(Y,dims=2)*5,Xnul=ones(1,size(Y,2)),itol=1e-4,tol0=1e-3,tol=1e-4,ρ=0.001)
permTest(nperm::Int64,cross,Kg,Y,XX::Markers;pval=[0.05 0.01],df_prior=m+1,
Prior::Matrix{Float64}=cov(Y,dims=2)*5,Xnul=ones(1,size(Y,2)),itol=1e-4,tol0=1e-3,tol=1e-4,ρ=0.001)
Implement permutation test to get thresholds at the levels of type 1 error, α
. Note that the last permTest()
is for the conventional MLMM:
\[vec(Y)\sim MVN((I \otimes X)vec(B) (or BX), K \otimes \Sigma_1 +I \otimes \Sigma_2),\]
where K
is a genetic kinship, $\Sigma_1, \Sigma_2$ are covariance matrices for random and error terms, respectively.
Arguments
nperm
: An integer indicating the number of permutation to be implemented.cross
: An integer indicating the number of alleles or genotypes. Ex.2
for RIF,4
for four-way cross,8
for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.Kg
: A n x n genetic kinship matrix. Should be symmetric positive definite.Kc
: A m x m climatic relatedness matrix. Should be symmetric positive definite.Y
: A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e.Y[1,:]
(a vector) ->Y[[1],:]
(a matrix) .XX
: A type ofMarkers
.
Keyword Arguments
pval
: A vector of p-values to get their quantiles. Default is[0.05 0.01]
(without comma).Xnul
: A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.Z
: An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). Default is an identity matrix for the dimension of m traits x q phenotypic covariates.Prior
: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. A large scaled covariance matrix (a weakly informative prior) is default.df_prior
: degrees of freedom, $\nu_0$ for Inverse-Wishart distributon.m+1
(weakly informative) is default.itol
: A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is1e-3
.tol0
: A tolerance controlling ECM under H1: existence of QTL. Default is1e-3
.tol
: A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is1e-4
.ρ
: A tunning parameter controlling $\tau^2$. Default is0.001
.
Output
maxLODs
: A nperm x 1 vector of maximal LOD scores by permutation.H1par_perm
: A type of struct,EcmNestrv.Approx(B,τ2,Σ,loglik)
including parameter estimates orEcmNestrv.Result(B,Vc,Σ,loglik)
for a conventional MLMM under H0: no QTL by permutation.cutoff
: A vector of thresholds corresponding topval
.
FlxQTL.EcmNestrv
— ModuleEcmNestrv
A module for base algorithms using ECM (Expectation-Conditional Maxization) with Speed restarting Nesterov's accelerated gradient method to fit a flexible multivariate linear mixed model (flxMLMM).
Genetic Relatedness Matrices (GRM)
FlxQTL.GRM
— Module GRM
A module for computing Genetic Relatedness Matrix (or kinship).
FlxQTL.GRM.kinship4way
— Method kinship4way(genmat::Array{Float64,2})
Computes a kinship for four-way cross data counting different alleles between two markers: ex. AB-AB=0; AB-AC=1; AB-CD=2,$\dots$ Note: In R/qtl, genotypes are labeled as 1=AC; 2=BC; 3=AD; 4=BD by the function, read.cross
.
Argument
genmat
: A matrix of genotypes forfour-way cross
$(1,2, \dots)$. size(genematrix)= (p,n), forp
genetic markers xn
individuals(or lines).
Output
Returns a n x n symmetric matrix containing 1's on the diagonal.
FlxQTL.GRM.kinshipCtr
— Method kinshipCtr(genmat::Array{Float64,2})
Calculates a kinship by a centered genotype matrix (linear kernel), i.e. genotypes subtracted by marker mean.
Argument
genmat
: A matrix of genotype data (0,1,2). size(genmat)=(p,n) forp
markers xn
individuals
Output
Returns a n x n symmetric matrix. See also kinshipStd
.
FlxQTL.GRM.kinshipGs
— Method kinshipGs(climate::Array{Float64,2},ρ::Float64)
Computes a kinship matrix using the Gaussian Kernel.
Arguments
climate
: A matrix of genotype, or climate information data. size(climate)=(r,m), such thatr
genotype markers (or days/years of climate factors, i.e. precipitations, temperatures, etc.), andm
individuals (or environments/sites)ρ
: A free parameter determining the width of the kernel. Could be attained empirically.
Output
Returns a m x m symmetric (positive definite) matrix containing 1's on the diagonal.
FlxQTL.GRM.kinshipLin
— MethodkinshipLin(mat,cross)
Calculates a kinship (or climatic relatedness, kinshipGs
) matrix by linear kernel.
Arguments
mat
: A matrix of genotype (or allele) probabilities usually extracted from R/qtl, R/qtl2, or the counterpart packages. size(mat)= (p,n) for p genetic markers x n individuals.cross
: A scalar indicating instances of alleles or genotypes in a genetic marker. ex. 1 for genotypes (labeled as 0,1,2), 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
Output
Returns a n x n symmetric (positive definite) matrix containing 1's on the diagonal.
See also kinshipCtr
, kinshipStd
for genetype data.
FlxQTL.GRM.kinshipLoco
— Function kinshipLoco(kin,g::Markers,cross::Int64=1)
Generates a 3-d array of symmetric positive definite kinship matrices using LOCO (Leave One Chromosome Out) witout shrinkage intensity estimation. When a kinship is not positive definite, a tweak like a weighted average of kinship and Identity is used to correct minuscule negative eigenvalues.
Arguments
kin
: A function of computing a kinship. Can only use withkinshipCtr
,kinshipStd
for genotypes, and withkinshipLin
for genotype (or allele) probabilities.g
: A struct of arrays, typeMarkers
.cross
: A scalar indicating instances of alleles or genotypes in a genetic marker. ex. 1 for genotypes (0,1,2) as default, 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
Output
Returns 3-d array of n x n symmetric positive definite matrices as many as Chromosomes. Refer to shrinkgLoco
.
FlxQTL.GRM.kinshipMan
— Method kinshipMan(genematrix::Array{Float64,2})
Calculates a kinship matrix using a manhattan distance. Missing values need to be either omitted or imputed. This function is for recombinant inbred line (RIL) (AA/BB), not for 4-way cross genotype data. See kinship4way
.
Argument
genematrix
: A matrix of genotypes, i.e. 0,1 (or 1,2). size(genematrix)= (p,n) forp
genetic markers xn
individuals(or lines).
Output
Returns a n x n symmetric matrix containing 1's on the diagonal.
FlxQTL.GRM.kinshipStd
— Method kinshipStd(genmat::Array{Float64,2})
Calculates a kinship by a standardized (or normalized) genotype matrix (linear kernel), i.e. genotypes subtracted by marker mean and divided by marker standard deviation. Can also do with climatic information data. See kinshipGs
.
Argument
genmat
: A matrix of genotype data (0,1,2). size(genmat)=(p,n) forp
markers xn
individuals
Output
Returns a n x n symmetric matrix. See also kinshipCtr
.
FlxQTL.GRM.shrinkg
— Method shrinkg(f,nb::Int64,geno)
Estimates a full-rank positive definite kinship matrix by shrinkage intensity estimation (bootstrap). Can only use with kinshipMan
, kinship4way
. This function runs faster by CPU parallelization. Add workers/processes using addprocs()
function before running for speedup.
Arguments
f
: A function of computing a kinship. Can only use withkinshipMan
,kinship4way
.nb
: An integer indicating the number of bootstrap. It does not have to be a large number.geno
: A matrix of genotypes. SeekinshipMan
,kinship4way
for dimension.
Example
julia> using FlxQTL
julia> addprocs(8)
julia> K = shinkage(kinshipMan,20,myGeno)
Output
Returns a full-rank symmetric positive definite matrix.
FlxQTL.GRM.shrinkgLoco
— Method shrinkgLoco(kin,nb,g::Markers)
Generates 3-d array of full-rank positive definite kinship matrices by shrinkage intensity estimation (bootstrap) using a LOCO (Leave One Chromosome Out) scheme.
Argument
kin
: A function of computing a kinship. Can only use withkinshipMan
,kinship4way
nb
: An integer indicating the number of bootstrap.g
: A struct of arrays, typeMarkers
.
Output
Returns 3-d array of n x n symmetric positive definite matrices as many as Chromosomes.
Multivariate Linear Models (MLM)
FlxQTL.MLM
— Module MLM
A module for fitting general Multivariate Linear Models motivated by functional data analysis via mle or reml. The default fitting method is mle. ( i.e. reml=false)
The model: $Y=XBZ'+E$, where $E(Y)=XBZ'$ (or $E(vec(Y))= (Z \otimes X)vec(B)$ ), $var(vec(E))=\Sigma \otimes I.$ size(Y)=(n,m), size(X)=(n,p), size(Z)=(m,q).
FlxQTL.MLM.Estimat
— Type Estimat(B::Array{Float64,2},Σ::Array{Float64,2},loglik::Float64)
A struct of arrays for results by fitting a multivariate linear model, mGLM()
. The results are B
(fixed effects), Σ
(m x m covariance matrix), loglik
(a value of log-likelihood by mle or reml).
FlxQTL.MLM.mGLM
— Function mGLM(Y::Array{Float64,2},X::Array{Float64,2},Z::Array{Float64,2},reml::Bool=false)
mGLM(Y::Array{Float64,2},X::Array{Float64,2},reml::Bool=false)
Fitting multivariate General Linear Models via MLE (or REML) and returns a type of a struct Estimat
.
Arguments
Y
: A matrix of response variables, i.e. traits. size(Y)=(n,m) for n individuals x m traitsX
: A matrix of independent variables, i.e. genotypes or genotype probabilities including intercept or/and covariates. size(X)=(n,p) for n individuals x p markers including intercept or/and covariatesZ
: An optional matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If nothing to insert inZ
, just exclude it or insertMatrix(1.0I,m,m)
. size(Z)=(m,q) for m traits x q phenotypic covariates.reml
: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented iftrue
.
Output
Returns Estimat
.
Utility Functions (Util)
FlxQTL.Util
— Module Util
A module for utility functions.
FlxQTL.Util.Markers
— TypeMarkers(name::Array{String,1},chr::Array{Any,1},pos::Array{Float64,1},X::Array{Float64,2})
A struct of arrays creating genotype or genotype probability data for genome scan.
Arguments
name
: A vector of marker nameschr
: A vector of Chromosomespos
: A vector of marker positions (cM)X
: A matrix of genotypes or genotype probabilities
FlxQTL.Util.Y_huber
— MethodY_huber(Y::Array{Float64,2})
Rescale Y (phenotype or trait data) to be less sensitive to outliers using by Huber loss function and MAD (median absolute deviation). size(Y)=(m,n) for m
trait and n
individuals.
FlxQTL.Util.array2mat
— Method array2mat(cross::Int64,X0::Array{Float64,3})
Returns a 3-d array to a matrix of genotype probabilities. size(X0)=(p,cross,n) –> (p1,n), where p1
= cross*p
for p
markers, cross
alleles or genotypes, and n
individuals. See mat2array
.
FlxQTL.Util.getFinoidx
— MethodgetFinoidx(phenoData::Array{Union{Missing,Float64},2})
Attains indices of phenotype data without missing values.
Argument
phenoData
: A matrix of phenotype (or trait) data including missing values. size(phenoData) = (m,n) form
traits andn
individuals.
FlxQTL.Util.getGenoidx
— FunctiongetGenoidx(GenoData::Union{Array{Any,2},Array{Float64,2}},maf::Float64=0.025)
Attains genotype indices to drop correlated or bad markers.
Arguments
GenoData
: A matrix of genotype data. size(GenoData)= (p,n) forp
markers andn
individuals.maf
: A scalar for dropping criterion of markers. Default is0.025
i.e. markers of MAF < 0.025 are dropped.
FlxQTL.Util.lod2logP
— Methodlod2logP(LODs::Union{Array{Float64,1},Array{Any,1}},v::Int64)
Caculates $-\log_{10}{P}$ from LOD scores.
Arguments
LODs
: A vector of LOD scores computed from genome scan.v
: Degrees of freedom for Chi-squared distribution.
!!! NOTE
- To compute degrees of freedom for 1D-genome scan (
geneScan
),v = prod(size(B1))-prod(size(B0)
, whereB1
andB0
are effect size matrices under H1 of a full model (intercept + covariates + QTL) and H0 of no QTL (intercept + covariates), respectively.
FlxQTL.Util.mat2array
— Method mat2array(cross::Int64,X0)
Returns a matrix of genotype probabilities to 3-d array. size(X0)=(p1,n) –> (p,cross,n), where p1
= cross*p
for p
markers, cross
alleles or genotypes, and n
individuals.
Argument
cross
: An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.X0
: A matrix of genotype probability data computed from r/qtl or r/qtl2.
See array2mat
.
FlxQTL.Util.mat2vec
— Method mat2vec(mat)
Stacks a matrix to a vector, i.e. vectorizing a matrix.
FlxQTL.Util.newMarkers
— FunctionnewMarkers(XX::Markers,cross::Int64,cM::Int64=2)
Returns a struct of Markers by keeping only markers positioned in every cM
centimorgans for 2-d genome scan to avoid singularity.
Arguments
XX
: A type ofMarkers
. SeeMarkers
.cross
: An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.cM
: An integer of dropping criterion of markers. Default is 2, i.e. keeping only markers in every 2 cM, or dropping markers within 2cM between 2 markers.
FlxQTL.Util.ordrMarkers
— MethodordrMarkers(markers::Array{Any,2})
Rearrange by CPU parallelization marker information composed of marker name, chr, position obtained from rqtl2, which is not listed in order (excluding X
chromosome).
Argument
markers
: An array of marker information.
FlxQTL.Util.setSeed
— MethodsetSeed(seedNum::Int64)
Assigns different numbers of seeds to workers (or processes) for reproducibility.
Arguments
seedNum
: An integer. A minimum seed number to assign a worker. For distributed computing, seed numbers are generated for multiple workers by increasing 1, e.g. seedNum = 123 & 10 workers,setSeed
generates seeds from 123 to 132 and assigns to corresponding workers (processes).
Examples
julia> using Distributed
julia> addprocs(10)
julia> @everywhere using flxQTL
julia> setSeed(123)
FlxQTL.Util.sortBycM
— Function sortBycM(Chr::Any,XX::Markers,cross::Int64,cM::Int64=2)
Returns marker indices in Chromosome Chr
and the corresponding genotype probabilities keeping only markers positioned in every cM
centimorgans for 2-d genome scan to avoid singularity.
Arguments
Chr
: A type of Any indicating a particular chromosome to sort markers out.XX
: A type ofMarkers
.cross
: An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.cM
: An integer of dropping criterion of markers. Default is 2, i.e. keeping only markers in every 2 cM, or dropping markers within 2cM between 2 markers.
See also newMarkers