Functions and Types
Flexible Multivariate Linear Mixed Models (flxMLMM)
FlxQTL.FlxQTL — ModuleFlxQTLFlexible QTL analysis tools for structured multiple traits fitting a Multivariate Linear Mixed Model or a Multivariate Linear Model.
FlxQTL.flxMLMM — ModuleflxMLMMA module designed for fitting a Multivariate Linear Mixed Model run by the Nesterov's Accelerated Gradient with restarting scheme embedding the Expectation Conditional Maximization to estimate MLEs. REML is not supported.
The general form of Multivariate Linear Mixed model is
\[vec(Y) \sim MVN((X' \otimes Z)vec(B) (or ZBX), K \otimes \V_C +I \otimes \Sigma),\]
where $Z = I_m$, K is a genetic kinship, and $V_C, \Sigma$ are variance component and error matrices, respectively.
The FlxQTL model (flxMLMM) estimates a scalar parameter $\tau^2$ under H1 to efficiently estimate the high dimensional variance component, i.e. $\Omega \approx \tau^2 V_C$ as well as $Z \neq I_m$ to estimate much smaller B than the former model with Z = I. dim(Y) = (m traits, n individuals), dim(X) = (p markers, n), dim(Z) = (m, q trait covariates).
FlxQTL.flxMLMM.K2eig — Function K2eig(K,LOCO::Bool=false)Returns eigenvectors and eigenvalues of genetic relatedness, or 3-d array of these of a genetic relatedness if LOCO is true.
Arguments
K: A matrix of genetic relatedness (Default). 3-d array of genetic relatedness (LOCOsets to be true.)LOCO: Boolean. Default isfalse(no LOCO). (Leave One Chromosome Out).
Output
T: A matrix of eigenvectors, or a 3-d array of eigenvectors ifLOCOsets to betrue.λ: A vector of eigenvalues, or a matrix of eigenvalues ifLOCOsets to betrue.
Examples
For a null variance component, or genetic relatedness for LOCO =false,
T, λ = K2eig(K)
produces a matrix of T and a vector of λ.
For a genetic kinship calculated under LOCO (a 3-d array of kinship matrices),
T, λ = K2eig(K,true)
produces a 3-d array of matrices T and a matrix of λ.
FlxQTL.flxMLMM.gene2Scan — Functiongene2Scan(cross::Int64,Tg::Union{Array{Float64,3},Matrix{Float64}},Λg::Union{Matrix{Float64},Vector{Float64}},
Y::Array{Float64,2},XX::Markers,LOCO::Bool=false;m=size(Y,1),Z=diagm(ones(m)),
Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*3,
itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)
gene2Scan(Tg,Λg,Y,XX,cross,false;m=size(Y,1),Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,
Prior::Matrix{Float64}=cov(Y,dims=2)*3,kmin::Int64=1,itol=1e-4,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)Implement 2d-genome scan with/without LOCO (Leave One Chromosome Out). The first function is a FlxQTL model with/without Z that preestimate a null variance component matrix ($V_C$) under H0 : no QTL, followed by its adjustment by a scalar parameter under H1 : existing QTL. The third geneScan() is based on a conventional MLMM that estimate all parameters under H0/H1. The conventional MLMM is defined as
\[vec(Y) \sim MVN((X' \otimes Z)vec(B) (or ZBX), K \otimes \V_C +I \otimes \Sigma),\]
where Z is identity, K is a genetic kinship, and $V_C, \Sigma$ are variance component and error matrices, respectively.
The FlxQTL model estimates a scalar parameter $\tau^2$ under H1 to efficiently estimate the high dimensional variance component, i.e. $\Omega \approx \tau^2 V_C$
Arguments
cross: An integer (Int64) indicating the occurrence of combination of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degrees of freedom for the effect size of the genetic marker when doing genome scan.Tg: A n x n matrix of eigenvectors, or a 3d-array of eigenvectors ifLOCOis true. See alsoK2eig.Λg: A n x 1 vector of eigenvalues from kinship. Returns a matrix of eigenvalues ifLOCOis true.Y: A m x n matrix of response variables, i.e. m traits by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e.Y0[1,:](a vector) ->Y[[1],:](a matrix) .XX: A type ofMarkers.Z: An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (wavelet, polynomials, B-splines, etc.). If no assumption among traits, insert an identity matrix,Matrix(1.0I,m,m), or use the secondgeneScan().LOCO: Boolean. Default isfalse(no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out) iftrue.
Keyword Arguments
Xnul: A matrix of covariates. Default is intercepts (1's):Xnul= ones(1,size(Y,2)). Adding covariates (C) isXnul= vcat(ones(1,n),C)wheresize(C)=(c,n).Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. An amplified empirical covariance matrix is default.df_prior: Degrees of freedom, $\nu_0$ for Inverse-Wishart distributon.m+1(weakly informative) is default.itol: A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is1e-3.tol0: A tolerance controlling ECM under H1: existence of QTL. Default is1e-3.tol: A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is1e-4.ρ: A tunning parameter controlling $\tau^2$. Default is0.001.
!!! Note
- When some LOD scores return negative values, reduce tolerences for ECM to
tol0 = 1e-4, or increasedf_prior, such that $m+1 \le df_prior \le 2m$. The easiest setting isdf_prior = Int64(ceil(1.9m))for numerical stability.
Output
LODs: LOD scores. Can change to $- \log_{10}{P}$ usinglod2logP.est0: A type ofEcmNestrv.Approxincluding parameter estimates under H0: no QTL.
FlxQTL.flxMLMM.geneScan — FunctiongeneScan(cross::Int64,Tg::Union{Array{Float64,3},Matrix{Float64}},Λg::Union{Matrix{Float64},Vector{Float64}},
Y::Array{Float64,2},XX::Markers,Z::Array{Float64,2},LOCO::Bool=false;m=size(Y,1),H0_up::Bool=false,
Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*3,
LogP::Bool=false,itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)
geneScan(cross,Tg,Λg,Y,XX,false;m=size(Y,1),H0_up::Bool=false,Xnul::Array{Float64,2}=ones(1,size(Y,2)),
df_prior=m+1,Prior=cov(Y,dims=2)*3,LogP=false,itol=1e-3,tol0=1e-3,tol=1e-4,ρ=0.001)
geneScan(Tg,Λg,Y,XX,cross,false;Xnul=ones(1,size(Y,2)),df_prior=m+1,Prior=cov(Y,dims=2)*3,LogP=false,
itol=1e-3,tol0=1e-3,tol=1e-4,ρ=0.001)Implement 1d-genome scan with/without LOCO (Leave One Chromosome Out). The first two functions are FlxQTL models with/without Z that preestimate a null variance component matrix ($V_C$) under H0 : no QTL, followed by its adjustment by a scalar parameter under H1 : existing QTL. The third geneScan() is based on a conventional MLMM that estimate all parameters under H0/H1. The conventional MLMM is defined as
\[vec(Y) \sim MVN((X' \otimes Z)vec(B) (or ZBX), K \otimes \V_C +I_n \otimes \Sigma),\]
where Z is identity, K is a genetic kinship, and $V_C, \Sigma$ are variance component and error matrices, respectively.
The FlxQTL model estimates a scalar parameter $\tau^2$ under H1 to efficiently estimate the high dimensional variance component, i.e. $\Omega \approx \tau^2 V_C$, where $Z \neq I$.
Arguments
cross: An integer (Int64) indicating the occurrence of combination of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degrees of freedom for the effect size of the genetic marker when doing genome scan.Tg: A n x n matrix of eigenvectors, or a 3d-array of eigenvectors ifLOCOis true. See alsoK2eig.Λg: A n x 1 vector of eigenvalues from kinship. Returns a matrix of eigenvalues ifLOCOis true.Y: A m x n matrix of response variables, i.e. m traits by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e.Y0[1,:](a vector) ->Y[[1],:](a matrix) .XX: A type ofMarkers.Z: An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (wavelet, polynomials, B-splines, etc.). If no assumption among traits, insert an identity matrix,Matrix(1.0I,m,m), or use the secondgeneScan().LOCO: Boolean. Default isfalse(no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out) iftrue.
Keyword Arguments
Xnul: A matrix of covariates. Default is intercepts (1's):Xnul= ones(1,size(Y,2)). Adding covariates (C) isXnul= vcat(ones(1,n),C)wheresize(C)=(c,n).Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. An amplified empirical covariance matrix is default.df_prior: Degrees of freedom, $\nu_0$ for Inverse-Wishart distributon.m+1(weakly informative) is default.H0_up: Default returns null estimates,est0from the conventional MLMM. It is recommended settingH0_up=truefor $m \ge 11$ to avoid negative LODs.itol: A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is1e-3.tol0: A tolerance controlling ECM under H1: existence of QTL. Default is1e-3.tol: A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is1e-4.ρ: A tunning parameter controlling $\tau^2$. Default is0.001.LogP: Boolean. Default isfalse. Returns $-\log_{10}{P}$ instead of LOD scores iftrue.
!!! Note
- When some LOD scores return negative values, reduce tolerences for ECM to
tol0 = 1e-4, or increasedf_prior, such that $m+1 \le df_prior \le 2m$. The easiest setting isdf_prior = Int64(ceil(1.9m))for numerical stability.
Output
LODs(orlogP) : LOD scores. Can change to $- \log_{10}{P}$ inlod2logPifLogP = true.B: A 3-d array ofB(fixed effects) matrices under H1: existence of QTL. If covariates are added toXnul:Xnul= [ones(1,size(Y,2)); Covariates], ex. For sex covariates in 4-way cross analysis, B[:,2,100], B[:,3:5,100] are effects for sex, the rest genotypes of the 100th QTL, respectively.est0: A type ofEcmNestrv.Resultincluding parameter estimates under H0: no QTL for both functions. ReturnsEcmNestrv.ApproxifH0_up=truefor the FlxQTL model only.
FlxQTL.flxMLMM.permTest — Method permTest(nperm::Int64,cross,Kg,Y,XX::Markers;pval=[0.05 0.01],m=size(Y,1),Z=diagm(ones(m)),df_prior=m+1,
Prior::Matrix{Float64}=cov(Y,dims=2)*3,Xnul=ones(1,size(Y,2)),itol=1e-4,tol0=1e-3,tol=1e-4,ρ=0.001)
mlmmTest(nperm::Int64,cross,Kg,Y,XX::Markers;pval=[0.05 0.01],df_prior=m+1,
Prior::Matrix{Float64}=cov(Y,dims=2)*3,Xnul=ones(1,size(Y,2)),itol=1e-4,tol0=1e-3,tol=1e-4,ρ=0.001)Implement permutation test to get thresholds at the levels of type 1 error, α. Note that mlmmTest() is based on the conventional MLMM (Z=I). The FlxQTL model is defined as
\[vec(Y)\sim MVN((X' \otimes Z)vec(B) (or ZBX), K \otimes \Omega +I \otimes \Sigma),\]
where K is a genetic kinship, and $\Omega \approx \tau^2V_C$, $\Sigma$ are covariance matrices for random and error terms, respectively. $V_C$ is pre-estimated under the null model (H0) of no QTL from the conventiona MLMM, which is equivalent to the FlxQTL model for $\tau^2 =1$.
!!! NOTE
permutationTest()is implemented bygeneScanwith LOCO.
Arguments
nperm: An integer indicating the number of permutation to be implemented.cross: An integer indicating the number of alleles or genotypes. Ex.2for RIF,4for four-way cross,8for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.Kg: A n x n genetic kinship matrix. Should be symmetric positive definite.Y: A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e.Y[1,:](a vector) ->Y[[1],:](a matrix) .XX: A type ofMarkers.
Keyword Arguments
pval: A vector of p-values to get their quantiles. Default is[0.05 0.01](without comma).Xnul: A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.Z: An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just useZ = I(default).Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. An amplified empirical covariance matrix is default.df_prior: Degrees of freedom, $\nu_0$ for Inverse-Wishart distributon.m+1(weakly informative) is default.itol: A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is1e-3.tol0: A tolerance controlling ECM under H1: existence of QTL. Default is1e-3.tol: A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is1e-4.ρ: A tunning parameter controlling $\tau^2$. Default is0.001.
!!! Note
- When some LOD scores return negative values, reduce tolerences for ECM to
tol0 = 1e-4, or increasedf_prior, such that $m+1 \le df_prior \le 2m$. The easiest setting isdf_prior = Int64(ceil(1.9m))for numerical stability.
Output
maxLODs: A nperm x 1 vector of maximal LOD scores by permutation.H1par_perm: A type of struct,EcmNestrv.Approx(B,τ2,Σ,loglik)including parameter estimates orEcmNestrv.Result(B,Vc,Σ,loglik)for a conventional MLMM under H0: no QTL by permutation.cutoff: A vector of thresholds corresponding topval.
FlxQTL.flxMLMM.permutationTest — Method permutationTest(nperm,cross,Kg,Y,XX;pval=[0.05 0.01],Z=diagm(ones(m)),Xnul=ones(1,size(Y,2)),
df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*3,
LOCO_all::Bool=false,itol=1e-4,tol0=1e-3,tol=1e-4,ρ=0.001,δ =0.01)Implement permutation test from the distribution of maximum LOD scores by LOCO to get thresholds at the levels of type 1 error, α. The FlxQTL model is defined as
\[vec(Y)\sim MVN((X' \otimes Z)vec(B) (or ZBX), K \otimes \Omega +I \otimes \Sigma),\]
where K is a genetic kinship, and $\Omega \approx \tau^2V_C$, $\Sigma$ are covariance matrices for random and error terms, respectively. $V_C$ is pre-estimated under the null model (H0) of no QTL from the conventiona MLMM, which is equivalent to the FlxQTL model for $\tau^2 =1$.
!!! NOTE
permTest()andmlmmTest()are implemented bygeneScanwithout LOCO.
Arguments
nperm: An integer (Int64) indicating the number of permutation to be implemented.cross: An integer (Int64) indicating the number of combination of alleles or genotypes. Ex.2for RIF,4for four-way cross,8for HS mouse (allele probabilities), etc. This value is related to degrees of freedom for the effect size of a genetic marker when doing genome scan.Kg: A 3d-array of n x n genetic kinship matrices. Should be symmetric positive definite.Y: A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e.Y[1,:](a vector) ->Y[[1],:](a matrix) .XX: A type ofMarkers.
Keyword Arguments
pval: A vector of p-values to get their quantiles. Default is[0.05 0.01](without comma).Xnul: A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.Z: An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just useZ = diagm(ones(m)) or Matrix(1.0I,m,m)(default).Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. An amplified empirical covariance matrix is default.df_prior: Degrees of freedom, $\nu_0$ for Inverse-Wishart distributon.m+1(weakly informative) is default.LOCO_all: Boolean. Default isfalse, which implementsgeneScan(LOCO=true)with permuted data but a null variance component (Vc) preestimated only once with a kinship (kinshipLinfor genotype (or allele) probabilities, orkinshipStdfor genotypes) byLOCO=falseimplicitly. It is recommended settingtruefor high-dimensional traits for better accuracy, i.e. $m\le 16 \sim 18$.itol: A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is1e-3.tol0: A tolerance controlling ECM under H1: existence of QTL. Default is1e-3.tol: A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is1e-4.ρ: A tunning parameter controlling $\tau^2$. Default is0.001.δ: A tuning parameter to correct q non-positive definite kinship without LOCO to pre-estimate a null variance component for low- to medium-dimensional traits ($m \le 16 \sim 18$) only. Thisno-LOCOkinship is computed inside the function for efficient computation.
!!! Note
- When some LOD scores return negative values, reduce tolerences for ECM to
tol0 = 1e-4, or increasedf_prior, such that $m+1 \le df_prior \le 2m$. The easiest setting isdf_prior = Int64(ceil(1.9m))for numerical stability.
Output
maxLODs: A nperm x 1 vector of maximal LOD scores by permutation.H1par_perm: A vector of structs,EcmNestrv.Approx(B,τ2,Σ,loglik)for each Chromosome per permutation, i.e.# of Chromosomesxnperm.cutoff: A vector of thresholds corresponding topval.
FlxQTL.EcmNestrv — ModuleEcmNestrvA module for base algorithms using the ECM (Expectation-Conditional Maxization) with the Speed restarting Nesterov's accelerated gradient method to fit a flexible multivariate linear mixed model (flxMLMM).
Genetic Relatedness Matrices (GRM)
FlxQTL.GRM — Module GRMA module for computing Genetic Relatedness Matrix (or kinship).
FlxQTL.GRM.kinship4way — Method kinship4way(genmat::Array{Float64,2})Computes a kinship for four-way cross data counting the occurrence of different alleles between two markers: ex. AB-AB=0; AB-AC=1; AB-CD=2,$\dots$ Note: In R/qtl, genotypes are labeled as 1=AC; 2=BC; 3=AD; 4=BD by the function, read.cross().
Argument
genmat: A matrix of genotypes forfour-way cross$(1,2, \dots)$. size(genematrix)= (p,n), forpgenetic markers xnindividuals(or lines).
Output
Returns a n x n symmetric matrix containing 1's on the diagonal.
FlxQTL.GRM.kinshipCtr — Method kinshipCtr(genmat::Array{Float64,2})Calculates a kinship with a centered genotype matrix (linear kernel), where a marker mean is subtracted from its genotypes.
Argument
genmat: A matrix of genotype data (0,1,2). size(genmat)=(p,n) forpmarkers xnindividuals
Output
Returns a n x n symmetric matrix. See also kinshipStd.
FlxQTL.GRM.kinshipGK — Method kinshipGK(G::Array{Float64,2},ρ::Float64)Computes a non-linear kinship matrix using a Gaussian Kernel.
Arguments
G: A matrix of genotype. size(G)=(r,c), such thatrgenotype markers andcindividuals (or lines).ρ: A free parameter determining the width of the kernel. Could be attained empirically.
Output
Returns a c x c symmetric (positive definite) matrix containing 1's on the diagonal.
FlxQTL.GRM.kinshipLin — MethodkinshipLin(mat,cross)Calculates a kinship matrix by linear kernel for genotype (or allele) probabilities. This is a general form of kinshipCtr, kinshipStd for genetype data, i.e. XX'.
Arguments
mat: A matrix of genotype (or allele) probabilities usually extracted from R/qtl, R/qtl2, or the counterpart packages. size(mat)= (p,n) for p genetic markers x n individuals.cross(>1) : An integer indicating the occurrence or combination of alleles or genotypes in a genetic marker. ex. 1 for genotypes (labeled as 0,1,2), 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
Output
Returns a n x n symmetric (positive definite) matrix containing 1's on the diagonal.
FlxQTL.GRM.kinshipLoco — Function kinshipLoco(kin,g::Markers,cross::Int64=1,ρ=0.01)Generates a 3-d array of symmetric positive definite kinship matrices using LOCO (Leave One Chromosome Out) witout shrinkage intensity estimation. This is designed for linear kernel based kinship functions. See Arguments.
Arguments
kin: A function of computing a kinship. Can only use withkinshipCtr,kinshipStdfor genotypes, and withkinshipLinfor genotype (or allele) probabilities.g: A struct of arrays, typeMarkers.cross: A scalar indicating the occurrence or combination of alleles or genotypes in a genetic marker. Simply, it coincides with the number of columns per marker in genotype (or allele) probability data; for instance, 1 for genotypes (0,1,2) as default, 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.ρ: A real number to correct a non-positive definite kinship. Default is 0.01. When a kinship is not corrected to be positive definite with the default value, slight increase ofρcan adjust to minuscule negative eigenvalues.
Output
Returns a 3-d array of n x n symmetric positive definite matrices as many as Chromosomes. Refer to shrinkgLoco for comparison.
FlxQTL.GRM.kinshipMan — Method kinshipMan(genematrix::Array{Float64,2})Calculates a kinship matrix using a manhattan distance method. Missing values need to be either omitted or imputed. This function is designed for recombinant inbred line (RIL) (AA/BB), not for 4-way cross genotype data. See kinship4way.
Argument
genematrix: A matrix of genotypes, i.e. 0,1 (or 1,2). size(genematrix)= (p,n) forpgenetic markers xnindividuals(or lines).
Output
Returns a n x n symmetric matrix containing 1's on the diagonal.
FlxQTL.GRM.kinshipStd — Method kinshipStd(genmat::Array{Float64,2})Produces a kinship with a standardized (or normalized) genotype matrix (linear kernel), i.e., centered genetypes divided by the standard deviation of a marker.
Argument
genmat: A matrix of genotype data (0,1,2). size(genmat)=(p,n) forpmarkers xnindividuals
Output
Returns a n x n symmetric matrix. See also kinshipCtr.
FlxQTL.GRM.shrinkg — Method shrinkg(f,nb::Int64,geno)Estimates a full-rank positive definite kinship matrix by shrinkage intensity estimation (bootstrap). Can only use with kinshipMan, kinship4way. This function runs fast by CPU parallelization. Add workers/processes using an addprocs() function before running it for speedup.
Arguments
f: A function of computing a kinship. Can only use withkinshipMan,kinship4way.nb: An integer indicating the number of bootstrap. It does not have to be a large number.geno: A matrix of genotypes. size(geno) = (p,n) forpmarkers xnindividuals.
Example
julia> using FlxQTL
julia> addprocs(8)
julia> K = shinkage(kinshipMan,20,myGeno)Output
Returns a full-rank symmetric positive definite matrix.
FlxQTL.GRM.shrinkgLoco — Method shrinkgLoco(kin,nb,g::Markers)Generates a 3-d array of full-rank positive definite kinship matrices by shrinkage intensity estimation (bootstrap) using a LOCO (Leave One Chromosome Out) scheme. Can only use with kinshipMan, kinship4way.
Arguments
kin: A function of computing a kinship. Can only use withkinshipMan,kinship4waynb: An integer indicating the number of bootstrap.g: A struct of arrays, typeMarkers.
Output
Returns 3-d array of n x n symmetric positive definite matrices as many as Chromosomes.
Flexible Multivariate Linear Models (flxMLM)
FlxQTL.flxMLM — ModuleflxMLMA module designed for fitting a Multivariate Linear Model by the (Residual) Maximum Likelihood (REML or MLE) method. The model:
$Y=XBZ'+E$, where $E(vec(Y))= (Z \otimes X)vec(B)$, $Var(vec(Y))= \Sigma \otimes I_n$,
where dim(Y)= (n individuals, m traits), dim(X) = (n,p markers), and dim(Z) = (m, q trait covariates).
FlxQTL.flxMLM.mlm1Scan — Functionmlm1Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,Z::Matrix{Float64},reml::Bool=false;LogP::Bool=false,
Xnul::Matrix{Float64}=ones(size(Y,1),1))
mlm1Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,reml::Bool=false;LogP::Bool=false,
Xnul::Matrix{Float64}=ones(size(Y,1),1))Implement 1d-genome scan. The second mlm1Scan() is for Z=I case; one can also run the first by inserting an identity matrix (Matrix(1.0I,m,m)) into Z. math vec(Y) \sim MVN ((Z \otimes X)vec(B) (or XBZ'), \Sigma \otimes I_n), where size(Y)=(n,m), size(X)=(n,p), and size(Z)=(m,q).
Arguments
cross: An integer indicating the occurrence of alleles or genotypes. ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.Y: A n x m matrix of response variables, i.e. n individuals (or lines) by m traits (or environments). For univariate phenotypes, use square brackets in the arguement. i.e.Y0[:,1](a vector) ->Y[:,[1]](a matrix) .XX: A type ofMarkers. Be cautious when combining genotype infomation into the struct ofMarkers;size(X) = (n,p).Z: An optional m x q matrix of low-dimensional phenotypic (or trait) covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just useZ = I.reml: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented iftrue.
Keyword Arguments
Xnul: A matrix of covariates. Default is intercepts (1's):Xnul= ones(size(Y,1),1). Adding covariates (C) isXnul= hcat(ones(n),C)wheresize(C)=(c,n)forn = size(Y,1).LogP: Boolean. Default isfalse. Returns $-\log_{10}{P}$ instead of LOD scores iftrue.
Output
LODs(orlogP) : LOD scores. Can change to $- \log_{10}{P}$ inlod2logPifLogP = true.B: A 3-d array ofB(fixed effects) matrices under H1: existence of QTL. If covariates are added toXnul:Xnul= [ones(size(Y,1)) Covariates], ex. For sex covariates in 4-way cross analysis, B[2,:,100], B[3:5,:,100] are effects for sex, the rest genotypes of the 100th QTL, respectively.est0: A type ofMLM.Estimatincluding parameter estimates under H0: no QTL.
FlxQTL.flxMLM.mlm2Scan — Function mlm2Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,Z::Matrix{Float64},reml::Bool=false;Xnul::Matrix{Float64}=ones(size(Y,1),1))
mlm2Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,reml::Bool=false;Xnul::Matrix{Float64}=ones(size(Y,1),1))Implement 2d-genome scan. The second mlm2Scan() is for Z=I case; one can also run the first by inserting an identity matrix (Matrix(1.0I,m,m)) into Z. math vec(Y) \sim MVN ((Z \otimes X)vec(B) (or XBZ'), \Sigma \otimes I_n), where size(Y)=(n,m), size(X)=(n,p), and size(Z)=(m,q).
Arguments
cross: An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.Y: A n x m matrix of response variables, i.e. n individuals (or lines) by m traits (or environments). For univariate phenotypes, use square brackets in arguement. i.e.Y0[:,1](a vector) ->Y[:,[1]](a matrix) .XX: A type ofMarkers. Be cautious when combining genotype infomation into the struct ofMarkers;size(X) = (n,p).Z: An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just useZ = I.reml: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented iftrue.
Keyword Arguments
Xnul: A matrix of covariates. Default is intercepts (1's). Unless adding covariates, just leave as it is. Seemlm1Scan.
Output
LODs: LOD scores. Can change to $- \log_{10}{P}$ usinglod2logP.est0: A type ofMLM.Estimatincluding parameter estimates under H0: no QTL.
FlxQTL.flxMLM.mlmTest — Function mlmTest(nperm::Int64,cross::Int64,Y::Matrix{Float64},XX::Markers,Z::Matrix{Float64},reml::Bool=false;
Xnul::Matrix{Float64}=ones(size(Y,1),1),pval=[0.05 0.01])
mlmTest(nperm::Int64,cross::Int64,Y::Matrix{Float64},XX::Markers,reml::Bool=false;
Xnul::Matrix{Float64}=ones(size(Y,1),1),pval=[0.05 0.01])Implement permutation test to get thresholds at the levels of type 1 error, α. The second mlmTest() is for Z=I case; one can also run the first by inserting an identity matrix (Matrix(1.0I,m,m)) into Z. math vec(Y) \sim MVN ((Z \otimes X)vec(B) (or XBZ'), \Sigma \otimes I_n), where size(Y)=(n,m), size(X)=(n,p), and size(Z)=(m,q).
Arguments
nperm: An integer indicating the number of permutation to be implemented.cross: An integer indicating the incidence of alleles or genotypes per marker. Ex.2for RIF,4for four-way cross,8for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.Y: A n x m matrix of response variables, i.e. n individuals (or lines) by m traits (or environments). For univariate phenotypes, use square brackets in arguement. i.e.Y[:,1](a vector) ->Y[:,[1]](a matrix) .XX: A type ofMarkers. Be cautious when combining genotype infomation into the struct ofMarkers;size(X) = (n,p).reml: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented iftrue.
Keyword Arguments
pval: A vector of p-values to get their quantiles. Default is[0.05 0.01](without comma).Xnul: A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.Z: An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just useZ = I.
Output
maxLODs: A nperm x 1 vector of maximal LOD scores by permutation.H1par_perm: A type of struct,MLM.Estimat(B,Σ,loglik)including parameter estimates for a MLM under H0: no QTL by permutation.cutoff: A vector of thresholds corresponding topval.
FlxQTL.MLM — Module MLMA module for fitting general Multivariate Linear Models motivated by functional data analysis via mle or reml. The default fitting method is mle. ( i.e. reml=false)
The model: $Y=XBZ'+E$, where $E(Y)=XBZ'$ (or $E(vec(Y))= (Z \otimes X)vec(B)$ ), $var(vec(E))=\Sigma \otimes I.$ size(Y)=(n,m), size(X)=(n,p), size(Z)=(m,q).
FlxQTL.MLM.Estimat — Type Estimat(B::Array{Float64,2},Σ::Array{Float64,2},loglik::Float64)A struct of arrays for results by fitting a multivariate linear model, mGLM(). It includes B(fixed effects), Σ (m x m covariance matrix), loglik(a value of log-likelihood by mle or reml).
FlxQTL.MLM.mGLM — Function mGLM(Y::Array{Float64,2},X::Array{Float64,2},Z::Array{Float64,2},reml::Bool=false)
mGLM(Y::Array{Float64,2},X::Array{Float64,2},reml::Bool=false)Fitting multivariate General Linear Models via MLE (or REML) and returns a type of a struct Estimat.
Arguments
Y: A matrix of response variables, i.e. traits. size(Y)=(n,m) for n individuals x m traitsX: A matrix of independent variables, i.e. genotypes or genotype probabilities including intercept or/and covariates. size(X)=(n,p) for n individuals x p markers including intercept or/and covariatesZ: An optional matrix of low-dimensional phenotypic or trait covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). size(Z)=(m,q) for m traits x q phenotypic covariates. If the data does not assume any particular trait relation, just useZ = Matrix(1.0I,m,m)or the second functon.reml: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented iftrue.
Output
Returns Estimat .
Utility Functions (Util)
FlxQTL.Util — Module UtilA module for utility functions.
FlxQTL.Util.Markers — TypeMarkers(name::Array{String,1},chr::Array{Any,1},pos::Array{Float64,1},X::Array{Float64,2})A struct of arrays creating genotype or genotype probability data for genome scan.
Arguments
name: A vector of marker nameschr: A vector of Chromosomespos: A vector of marker positions (cM)X: A matrix of genotypes or genotype probabilities
FlxQTL.Util.Y_huber — MethodY_huber(Y::Array{Float64,2})Rescale Y (phenotype or trait data) to be less sensitive to outliers using by Huber loss function and MAD (median absolute deviation). size(Y)=(m,n) for m trait and n individuals.
FlxQTL.Util.array2mat — Method array2mat(cross::Int64,X0::Array{Float64,3})Returns a 3-d array to a matrix of genotype probabilities. size(X0)=(cross,n,p) –> (p1,n), where p1 = cross*p for p markers, cross alleles or genotypes, and n individuals. See mat2array.
FlxQTL.Util.getFinoidx — MethodgetFinoidx(phenoData::Array{Union{Missing,Float64},2})Attains indices of phenotype data without missing values.
Argument
phenoData: A matrix of phenotype (or trait) data including missing values. size(phenoData) = (m,n) formtraits andnindividuals.
FlxQTL.Util.getGenoidx — FunctiongetGenoidx(GenoData::Union{Array{Any,2},Array{Float64,2}},maf::Float64=0.025)Attains genotype indices to drop correlated or bad markers.
Arguments
GenoData: A matrix of genotype data. size(GenoData)= (p,n) forpmarkers andnindividuals.maf: A scalar for dropping criterion of markers. Default is0.025i.e. markers of MAF < 0.025 are dropped.
FlxQTL.Util.lod2logP — Methodlod2logP(LODs::Union{Array{Float64,1},Array{Any,1}},v::Int64)Caculates $-\log_{10}{P}$ from LOD scores.
Arguments
LODs: A vector of LOD scores computed from genome scan.v: Degrees of freedom for Chi-squared distribution.
!!! NOTE
- To compute degrees of freedom for 1D-genome scan (
geneScan),v = prod(size(B1))-prod(size(B0), whereB1andB0are effect size matrices under H1 of a full model (intercept + covariates + QTL) and H0 of no QTL (intercept + covariates), respectively.
FlxQTL.Util.mat2array — Method mat2array(cross::Int64,X0)Returns a matrix of genotype probabilities to 3-d array. size(X0)=(p1,n) –> (cross,n,p), where p1 = cross*p for p markers, cross alleles or genotypes, and n individuals.
Argument
cross: An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.X0: A matrix of genotype probability data computed from r/qtl or r/qtl2.
See array2mat.
FlxQTL.Util.mat2vec — Method mat2vec(mat)Stacks a matrix to a vector, i.e. vectorizing a matrix.
FlxQTL.Util.newMarkers — FunctionnewMarkers(XX::Markers,cross::Int64,cM::Int64=2)Returns a struct of Markers by keeping only markers positioned in every cM centimorgans for any genome scan to avoid singularity.
Arguments
XX: A type ofMarkers. SeeMarkers.cross: An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.cM: An integer of dropping criterion of markers. Default is 2, i.e. keeping only markers in every 2 cM, or dropping markers within 2cM between 2 markers.
FlxQTL.Util.ordrMarkers — MethodordrMarkers(markers::Array{Any,2})Rearrange by CPU parallelization marker information composed of marker name, chr, position obtained from rqtl2, which is not listed in order (excluding X chromosome).
Argument
markers: An array of marker information.
FlxQTL.Util.setSeed — MethodsetSeed(seedNum::Int64)Assigns different numbers of seeds to workers (or processes) for reproducibility.
Arguments
seedNum: An integer. A minimum seed number to assign a worker. For distributed computing, seed numbers are generated for multiple workers by increasing 1, e.g. seedNum = 123 & 10 workers,setSeedgenerates seeds from 123 to 132 and assigns to corresponding workers (processes).
Examples
julia> using Distributed
julia> addprocs(10)
julia> @everywhere using flxQTL
julia> setSeed(123)
FlxQTL.Util.sortBycM — Function sortBycM(Chr::Any,XX::Markers,cross::Int64,cM::Int64=2)Returns marker indices in Chromosome Chr and the corresponding genotype probabilities keeping only markers positioned in every cM centimorgans for any genome scan to avoid singularity.
Arguments
Chr: A type of Any indicating a particular chromosome to sort markers out.XX: A type ofMarkers.cross: An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.cM: An integer of dropping criterion of markers. Default is 2, i.e. keeping only markers in every 2 cM, or dropping markers within 2cM between 2 markers.
See also newMarkers