Functions and Types
Flexible Multivariate Linear Mixed Models (flxMLMM)
FlxQTL.FlxQTL — Module
FlxQTLFlexible QTL analysis tools for structured multiple traits fitting a Multivariate Linear Mixed Model or a Multivariate Linear Model.
FlxQTL.flxMLMM — Module
flxMLMMA module designed for fitting a Multivariate Linear Mixed Model run by the Nesterov's Accelerated Gradient with restarting scheme embedding the Expectation Conditional Maximization to estimate MLEs. REML is not supported. The FlxQTL model is defined as
\[vec(Y)\sim MVN((X' \otimes Z)vec(B) (or ZBX), K \otimes \Omega +I \otimes \Sigma),\]
where K is a genetic kinship, and $\Omega \approx \tau^2V_C$, $\Sigma$ are covariance matrices for random and error terms, respectively. $V_C$ is pre-estimated under the null model (H0) of no QTL from the conventional MLMM, which is equivalent to the FlxQTL model for $\tau^2 =1$. $Z \neq I_m$ estimates much smaller B than the former model with Z = I, where dim(Y) = (m traits, n individuals), and dim(X) = (p markers, n), dim(Z) = (m, q trait covariates).
FlxQTL.flxMLMM.K2eig — Function
K2eig(K,LOCO::Bool=false)Returns eigenvectors and eigenvalues of genetic relatedness, or 3-d array of these of a genetic relatedness if LOCO is true.
Arguments
K: A matrix of genetic relatedness (Default). 3-d array of genetic relatedness (LOCOsets to be true.)LOCO: Boolean. Default isfalse(no LOCO). (Leave One Chromosome Out).
Output
T: A matrix of eigenvectors, or a 3-d array of eigenvectors ifLOCOsets to betrue.λ: A vector of eigenvalues, or a matrix of eigenvalues ifLOCOsets to betrue.
Examples
For a null variance component, or genetic relatedness for LOCO =false,
T, λ = K2eig(K)
produces a matrix of T and a vector of λ.
For a genetic kinship calculated under LOCO (a 3-d array of kinship matrices),
T, λ = K2eig(K,true)
produces a 3-d array of matrices T and a matrix of λ.
FlxQTL.flxMLMM.gene1Scan — Function
gene1Scan(cross::Int64,Tg::Union{Array{Float64,3},Matrix{Float64}},Λg::Union{Matrix{Float64},Vector{Float64}},
Y::Array{Float64,2},XX::Markers,LOCO::Bool=false;penalize::Bool=false,Z=diagm(ones(m)),H0_up::Bool=false,
Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*3,
LogP::Bool=false,itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)
gene1Scan(Tg,Λg,Y::Array{Float64,2},XX::Markers,cross::Int64,LOCO::Bool=false;penalize::Bool=false,Xnul::Array{Float64,2}=ones(1,size(Y,2)),
df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*3,LogP::Bool=false,
itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)Implement 1d-genome scan with/without LOCO (Leave One Chromosome Out) fitted by FlxQTL models with/without Z and with the option of penalization that preestimate a null variance component matrix ($V_C$) under H0 : no QTL, followed by its adjustment by a scalar parameter under H1 : existing QTL. The second type gene1Scan() is fitted by the conventional MLMM that estimate all parameters under H0/H1 with the option of penalization. The FlxQTL model is defined as
\[vec(Y)\sim MVN((X' \otimes Z)vec(B) (or ZBX), Kg \otimes \Omega +I \otimes \Sigma),\]
where Kg is a genetic kinship, and $\Omega \approx \tau^2V_C$, $\Sigma$ are covariance matrices for random and error terms, respectively. $V_C$ is pre-estimated under the null model (H0) of no QTL from the conventional MLMM, which is equivalent to the FlxQTL model for $\tau^2 =1$.
Arguments
cross: An integer (Int64) indicating the occurrence of combination of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degrees of freedom for the effect size of the genetic marker when doing genome scan.Tg: A n x n matrix of eigenvectors, or a 3d-array of eigenvectors ifLOCOis true. See alsoK2eig.Λg: A n x 1 vector of eigenvalues from kinship. Returns a matrix of eigenvalues ifLOCOis true.Y: A m x n matrix of response variables, i.e. m traits by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e.Y0[1,:](a vector) ->Y[[1],:](a matrix) .XX: A type ofMarkers.Z: An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (wavelet, polynomials, B-splines, etc.). If no assumption among traits, insert an identity matrix,Matrix(1.0I,m,m), or use the secondgeneScan().LOCO: Boolean. Default isfalse(no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out) iftrue.
Keyword Arguments
penalize: Boolean. Default isfalse(no penalization). For higher dimensional traits, i.e. largem=size(Y,1), penalization is recommended, i.e. setpenalize=truefor numerical stability with adjustment ofdf_prioror/andPriorif necessary.Xnul: A matrix of covariates. Default is intercepts (1's):Xnul= ones(1,size(Y,2)). Adding covariates (C) isXnul= vcat(ones(1,n),C)wheresize(C)=(c,n).Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. An amplified empirical covariance matrix is default.df_prior: Degrees of freedom, $\nu_0$ for Inverse-Wishart distributon.m+1(weakly informative) is default.H0_up: Default returns null estimates,est0from the conventional MLMM. It is recommended settingH0_up=truefor higher dimensional traits, e.g. $m \ge 18 or more$ depending on the data, to avoid negative LODs.itol: A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is1e-3.tol0: A tolerance controlling ECM under H1: existence of QTL. Default is1e-3.tol: A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is1e-4.ρ: A tunning parameter controlling $\tau^2$. Default is0.001.LogP: Boolean. Default isfalse. Returns $-\log_{10}{P}$ instead of LOD scores iftrue.
!!! Note
- If some LOD scores return negative values under penalization or no penalization, then you may reduce tolerences for ECM to e.g.,
tol0 = 1e-4(no penalization), setH0_up=true, or (and) switch to penalization (penalize=true) followed by adjustingdf_prior, such that $m+1 \le$df_prior$< 2m$ to avoid singularity errors. The last resort could bedf_prior = Int64(ceil(1.9m))when any of them would not work. Adjustingdf_prioris more effective than doingPrior; we do recommend this adjustment for higher dimensional traits with genotype probability data, depending on the data–we have tested no penalization option witout error up to $m = 16$ with genotype probabilities orm = 30with genotypes. Lower dimensional traits with penalization slow performance.
Output
result: A vector of LOD scores,LODsas default or $- \log_{10}{P}$ bylod2logPifLogP = true.B: A 3-d array ofB(fixed effects) matrices under H1: existence of QTL. If sex covariates, e.g. size(C)=(1,n), are added toXnul:Xnul= [ones(1,size(Y,2)); C]in 4-way cross analysis, B[:,2,100], B[:,3:5,100] are effects for sex, the rest genotypes of the 100th QTL, respectively.H0est: A type ofEcmNestrv.Resultincluding parameter estimates under H0: no QTL for both functions.
FlxQTL.flxMLMM.gene2Scan — Function
gene2Scan(cross::Int64,Tg::Union{Array{Float64,3},Matrix{Float64}},Λg::Union{Matrix{Float64},Vector{Float64}},
Y::Array{Float64,2},XX::Markers,LOCO::Bool=false,penalize::Bool=false;m=size(Y,1),Z=diagm(ones(m)),
Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*3,
itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)
gene2Scan(Tg,Λg,Y,XX,cross,LOCO,penalize;Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,
Prior::Matrix{Float64}=cov(Y,dims=2)*3,itol=1e-4,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)Implement 2d-genome scan with/without LOCO (Leave One Chromosome Out). The first function is a FlxQTL model with/without Z and with the option of penalization that preestimate a null variance component matrix ($V_C$) under H0 : no QTL, followed by its adjustment by a scalar parameter under H1 : existing QTL. The second type of gene2Scan() is fitted by the conventional MLMM that estimate all parameters under H0/H1 with the option of penalization. The FlxQTL model is defined as
\[vec(Y)\sim MVN((X' \otimes Z)vec(B) (or ZBX), Kg \otimes \Omega +I \otimes \Sigma),\]
where Kg is a genetic kinship, and $\Omega \approx \tau^2V_C$, $\Sigma$ are covariance matrices for random and error terms, respectively. $V_C$ is pre-estimated under the null model (H0) of no QTL from the conventional MLMM, which is equivalent to the FlxQTL model for $\tau^2 =1$.
Arguments
cross: An integer (Int64) indicating the occurrence of combination of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degrees of freedom for the effect size of the genetic marker when doing genome scan.Tg: A n x n matrix of eigenvectors, or a 3d-array of eigenvectors ifLOCOis true. See alsoK2eig.Λg: A n x 1 vector of eigenvalues from kinship. Returns a matrix of eigenvalues ifLOCOis true.Y: A m x n matrix of response variables, i.e. m traits by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e.Y0[1,:](a vector) ->Y[[1],:](a matrix) .XX: A type ofMarkers.Z: An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (wavelet, polynomials, B-splines, etc.). If no assumption among traits, insert an identity matrix,Matrix(1.0I,m,m), or use the secondgeneScan().LOCO: Boolean. Default isfalse(no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out) iftrue.
Keyword Arguments
penalize: Boolean. Default isfalse(no prior used for penalization). For higher dimensional traits, i.e.large m=size(Y,1), penalization is recommended; setpenalize=truewith adjustment ofdf_prioror/andPriorif necessary.Xnul: A matrix of covariates. Default is intercepts (1's):Xnul= ones(1,size(Y,2)). Adding covariates (C) isXnul= vcat(ones(1,n),C)wheresize(C)=(c,n).Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. An amplified empirical covariance matrix is default.df_prior: Degrees of freedom, $\nu_0$ for Inverse-Wishart distributon.m+1(weakly informative) is default.itol: A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is1e-3.tol0: A tolerance controlling ECM under H1: existence of QTL. Default is1e-3.tol: A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is1e-4.ρ: A tunning parameter controlling $\tau^2$. Default is0.001.
!!! Note
- If some LOD scores return negative values under penalization or no penalization, you may reduce tolerences for ECM to e.g.,
tol0 = 1e-4(no penalization), or switch to penalization (penalize=true), follwed by adjustingdf_prior, such that $m+1 \le$df_prior$< 2m$ to avoid singluarity errors. The last resort could bedf_prior = Int64(ceil(1.9m))unless any of them would work. Adjustingdf_prioris more effective than doingPrior; we do recommend this adjustment for higher dimensional traits with genotype probability data, depending on the data–we have tested no penalization option witout error up to $m = 16$ with genotype probabilities orm = 30with genotypes. Lower dimensional traits with penalization slow performance.
Output
LODs: A matrix of LOD scores. Can change to $- \log_{10}{P}$ usinglod2logP.H0est: A type ofEcmNestrv.Resultincluding parameter estimates under H0: no QTL.
FlxQTL.flxMLMM.mlmmTest — Function
mlmmTest(nperm::Int64,cross::Int64,Kg,Y,XX::Markers;pval=[0.05 0.01],df_prior=m+1,
Prior::Matrix{Float64}=cov(Y,dims=2)*3,Xnul=ones(1,size(Y,2)),itol=1e-4,tol0=1e-3,tol=1e-4,ρ=0.001)Implement permutation test without LOCO to get thresholds at the levels of type 1 error, α by the conventional MLMM (Z=I) with an option of penalization. See also permutationTest.
Arguments
nperm: An integer indicating the number of permutation to be implemented.cross: An integer indicating the number of alleles or genotypes. Ex.2for RIF,4for four-way cross,8for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.Kg: A n x n genetic kinship matrix. Should be symmetric positive definite.Y: A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e.Y[1,:](a vector) ->Y[[1],:](a matrix) .XX: A type ofMarkers.penalize: Boolean. Default isfalse(no penalization). For higher dimensional traits, i.e.large m=size(Y,1), penalization is recommended; setpenalize=truewith adjustment ofdf_prioror/andPriorif necessary.
Keyword Arguments
pval: A vector of p-values to get their quantiles. Default is[0.05 0.01](without comma).Xnul: A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. An amplified empirical covariance matrix is default.df_prior: Degrees of freedom, $\nu_0$ for Inverse-Wishart distributon.m+1(weakly informative) is default.itol: A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is1e-4.tol0: A tolerance controlling ECM under H1: existence of QTL. Default is1e-3.tol: A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is1e-4.ρ: A tunning parameter controlling $\tau^2$. Default is0.001.
!!! Note
- When some LOD scores return negative values, you may reduce tolerence for ECM (
tol0) to run longer, or increasedf_prior, where $m+1 \le$df_prior$< 2m$ to avoid singularity errors. The last resort could bedf_prior = Int64(ceil(1.9m))unless any of them would works.
Output
maxLODs: A nperm x 1 vector of maximal LOD scores by permutation.H1par_perm: A type of struct,EcmNestrv.Approx(B,τ2,Σ,loglik)including parameter estimates orEcmNestrv.Result(B,Vc,Σ,loglik)for a conventional MLMM under H0: no QTL by permutation.cutoff: A vector of thresholds corresponding topval.
FlxQTL.flxMLMM.permutationTest — Function
permutationTest(nperm::Int64,cross::Int64,Kg::Array{Float64,3},Y::Array{Float64,2},XX::Markers,LOCO::Bool=false,penalize::Bool=false;pval=[0.05 0.01],
Z=diagm(ones(m)),df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*3,Xnul=ones(1,size(Y,2)),
itol=1e-4,tol0=1e-4,tol=1e-4,ρ=0.001)
permTest(nperm,cross,Kg::Array{Float64,3},Y,XX;pval=[0.05 0.01],Z=diagm(ones(m)),df_prior=m+1,Prior=cov(Y,dims=2)*3,Xnul=ones(1,size(Y,2)),
itol=1e-4,tol0=1e-4,tol=1e-4,ρ=0.001,δ=0.01)Implement permutation test by estimating the distribution of maximum LOD scores to get thresholds at the levels of type 1 error, α. The FlxQTL model is defined as
\[vec(Y)\sim MVN((X' \otimes Z)vec(B) (or ZBX), Kg \otimes \Omega +I \otimes \Sigma),\]
where Kg is a genetic kinship, and $\Omega \approx \tau^2V_C$, $\Sigma$ are covariance matrices for random and error terms, respectively. $V_C$ is estimated under the null model (H0) of no QTL from the conventional MLMM, which is equivalent to the FlxQTL model for $\tau^2 =1$.
!!! NOTE
permutationTest()has options of penalization as well as LOCO. Depending on the data size and computer capacity, one can select any combination. There may exist slight to moderate differences in the choice of LOCO and penalization.permTest()is implemented by permutation test with LOCO, but $\Omega$ is estimated with a genetic kinship bykinshipLinwithout LOCO.mlmmTestis implemented by the conventional MLMM without LOCO.
Arguments
nperm: An integer (Int64) indicating the number of permutation to be implemented.cross: An integer (Int64) indicating the number of combination of alleles or genotypes. Ex.2for RIF,4for four-way cross,8for HS mouse (allele probabilities), etc. This value is related to degrees of freedom for the effect size of a genetic marker when doing genome scan.Kg: A 3d-array of n x n genetic kinship matrices (LOCO = trueor only forpermTest()) or a matrix of genetic kinship for the default option ofLOCO = false. Should be symmetric positive definite.Y: A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e.Y[1,:](a vector) ->Y[[1],:](a matrix) .XX: A type ofMarkers.LOCO: Boolean. Default isfalse(no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out) iftrue.penalize: Boolean. Default isfalse(no penalization). For higher dimensional traits, i.e. largem=size(Y,1), penalization is recommended, i.e. setpenalize=truefor numerical stability with adjustment ofdf_prioror/andPriorif necessary.
Keyword Arguments
pval: A vector of p-values to get their quantiles. Default is[0.05 0.01](without comma).Xnul: A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.Z: An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just useZ = diagm(ones(m)) or Matrix(1.0I,m,m)(default).Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. An amplified empirical covariance matrix is default.df_prior: Degrees of freedom, $\nu_0$ for Inverse-Wishart distributon.m+1(weakly informative) is default.itol: A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is1e-3.tol0: A tolerance controlling ECM under H1: existence of QTL. Default is1e-3.tol: A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is1e-4.ρ: A tunning parameter controlling $\tau^2$. Default is0.001.δ: A tuning parameter inpermTest()to correct a non-positive definite kinship without LOCO to pre-estimate a null variance component. Thisno-LOCOkinship is computed inside the function for efficient computation.
!!! Note
- When some LOD scores return negative values, you may reduce tolerences for ECM to
tol0 = 1e-4, or increasedf_prior, where $m+1 \le$df_prior$< 2m$. The last resort could bedf_prior = Int64(ceil(1.9m))to avoid sigularity errors unless any of them works. Adjustingdf_priorworks better than doing'Prior; we do recommend this adjustment for higher dimensional traits with genotype probability data, depending on the data–we have tested no penalization option witout error up to $m = 16$ with genotype probabilities orm = 30with genotypes. Lower dimensional traits with penalization slow performance. - This LOCO version of permutation test is desirable to be implemented by high-performance computers.
Output
maxLODs: A nperm x 1 vector of maximal LOD scores by permutation.H1par_perm: A vector of structs,EcmNestrv.Approx(B,τ2,Σ,loglik)for each Chromosome per permutation, i.e.# of Chromosomesxnperm.cutoff: A vector of thresholds corresponding topval.
FlxQTL.EcmNestrv — Module
EcmNestrvA module for base algorithms using the ECM (Expectation-Conditional Maxization) with the Speed restarting Nesterov's accelerated gradient method to fit a flexible multivariate linear mixed model (flxMLMM).
Genetic Relatedness Matrices (GRM)
FlxQTL.GRM — Module
GRMA module for computing Genetic Relatedness Matrix (or kinship).
FlxQTL.GRM.kinship4way — Method
kinship4way(genmat::Array{Float64,2})Computes a kinship for four-way cross data counting the occurrence of different alleles between two markers: ex. AB-AB=0; AB-AC=1; AB-CD=2,$\dots$ Note: In R/qtl, genotypes are labeled as 1=AC; 2=BC; 3=AD; 4=BD by the function, read.cross().
Argument
genmat: A matrix of genotypes forfour-way cross$(1,2, \dots)$. size(genematrix)= (p,n), forpgenetic markers xnindividuals(or lines).
Output
Returns a n x n symmetric matrix containing 1's on the diagonal.
FlxQTL.GRM.kinshipCtr — Method
kinshipCtr(genmat::Array{Float64,2})Calculates a kinship with a centered genotype matrix (linear kernel), where a marker mean is subtracted from its genotypes.
Argument
genmat: A matrix of genotype data (0,1,2). size(genmat)=(p,n) forpmarkers xnindividuals
Output
Returns a n x n symmetric matrix. See also kinshipStd.
FlxQTL.GRM.kinshipGK — Method
kinshipGK(G::Array{Float64,2},ρ::Float64)Computes a non-linear kinship matrix using a Gaussian Kernel.
Arguments
G: A matrix of genotype. size(G)=(r,c), such thatrgenotype markers andcindividuals (or lines).ρ: A free parameter determining the width of the kernel. Could be attained empirically.
Output
Returns a c x c symmetric (positive definite) matrix containing 1's on the diagonal.
FlxQTL.GRM.kinshipLin — Method
kinshipLin(mat,cross)Calculates a kinship matrix by linear kernel for genotype (or allele) probabilities. This is a general form of kinshipCtr, kinshipStd for genetype data, i.e. XX'.
Arguments
mat: A matrix of genotype (or allele) probabilities usually extracted from R/qtl, R/qtl2, or the counterpart packages. size(mat)= (p,n) for p genetic markers x n individuals.cross(>1) : An integer indicating the occurrence or combination of alleles or genotypes in a genetic marker. ex. 1 for genotypes (labeled as 0,1,2), 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
Output
Returns a n x n symmetric (positive definite) matrix containing 1's on the diagonal.
FlxQTL.GRM.kinshipLoco — Function
kinshipLoco(kin,g::Markers,cross::Int64=1,ρ=0.01)Generates a 3-d array of symmetric positive definite kinship matrices using LOCO (Leave One Chromosome Out) witout shrinkage intensity estimation. This is designed for linear kernel based kinship functions. See Arguments.
Arguments
kin: A function of computing a kinship. Can only use withkinshipCtr,kinshipStdfor genotypes, and withkinshipLinfor genotype (or allele) probabilities.g: A struct of arrays, typeMarkers.cross: A scalar indicating the occurrence or combination of alleles or genotypes in a genetic marker. Simply, it coincides with the number of columns per marker in genotype (or allele) probability data; for instance, 1 for genotypes (0,1,2) as default, 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.ρ: A real number to correct a non-positive definite kinship. Default is 0.01. When a kinship is not corrected to be positive definite with the default value, slight increase ofρcan adjust to minuscule negative eigenvalues.
Output
Returns a 3-d array of n x n symmetric positive definite matrices as many as Chromosomes. Refer to shrinkgLoco for comparison.
FlxQTL.GRM.kinshipMan — Method
kinshipMan(genematrix::Array{Float64,2})Calculates a kinship matrix using a manhattan distance method. Missing values need to be either omitted or imputed. This function is designed for recombinant inbred line (RIL) (AA/BB), not for 4-way cross genotype data. See kinship4way.
Argument
genematrix: A matrix of genotypes, i.e. 0,1 (or 1,2). size(genematrix)= (p,n) forpgenetic markers xnindividuals(or lines).
Output
Returns a n x n symmetric matrix containing 1's on the diagonal.
FlxQTL.GRM.kinshipStd — Method
kinshipStd(genmat::Array{Float64,2})Produces a kinship with a standardized (or normalized) genotype matrix (linear kernel), i.e., centered genetypes divided by the standard deviation of a marker.
Argument
genmat: A matrix of genotype data (0,1,2). size(genmat)=(p,n) forpmarkers xnindividuals
Output
Returns a n x n symmetric matrix. See also kinshipCtr.
FlxQTL.GRM.shrinkg — Method
shrinkg(f,nb::Int64,geno)Estimates a full-rank positive definite kinship matrix by shrinkage intensity estimation (bootstrap). Can only use with kinshipMan, kinship4way. This function runs fast by CPU parallelization. Add workers/processes using an addprocs() function before running it for speedup.
Arguments
f: A function of computing a kinship. Can only use withkinshipMan,kinship4way.nb: An integer indicating the number of bootstrap. It does not have to be a large number.geno: A matrix of genotypes. size(geno) = (p,n) forpmarkers xnindividuals.
Example
julia> using FlxQTL
julia> addprocs(8)
julia> K = shinkage(kinshipMan,20,myGeno)Output
Returns a full-rank symmetric positive definite matrix.
FlxQTL.GRM.shrinkgLoco — Method
shrinkgLoco(kin,nb,g::Markers)Generates a 3-d array of full-rank positive definite kinship matrices by shrinkage intensity estimation (bootstrap) using a LOCO (Leave One Chromosome Out) scheme. Can only use with kinshipMan, kinship4way.
Arguments
kin: A function of computing a kinship. Can only use withkinshipMan,kinship4waynb: An integer indicating the number of bootstrap.g: A struct of arrays, typeMarkers.
Output
Returns 3-d array of n x n symmetric positive definite matrices as many as Chromosomes.
Flexible Multivariate Linear Models (flxMLM)
FlxQTL.flxMLM — Module
flxMLMA module designed for fitting a Multivariate Linear Model by the (Residual) Maximum Likelihood (REML or MLE) method. The model:
$Y=XBZ'+E$,
where $E(vec(Y))= (Z \otimes X)vec(B)$, $Var(vec(Y))= \Sigma \otimes I_n$,
dim(Y)= (n individuals, m traits), dim(X) = (n,p markers), and dim(Z) = (m, q trait covariates).
FlxQTL.flxMLM.mlm1Scan — Function
mlm1Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,Z::Matrix{Float64},reml::Bool=false;LogP::Bool=false,
Xnul::Matrix{Float64}=ones(size(Y,1),1))
mlm1Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,reml::Bool=false;LogP::Bool=false,
Xnul::Matrix{Float64}=ones(size(Y,1),1))Implement 1d-genome scan. The second mlm1Scan() is for Z=I case; one can also run the first by inserting an identity matrix (Matrix(1.0I,m,m)) into Z.
$vec(Y) \sim MVN ((Z \otimes X)vec(B) (or XBZ'), \Sigma \otimes I_n),$
where size(Y)=(n,m), size(X)=(n,p), and size(Z)=(m,q).
Arguments
cross: An integer indicating the occurrence of alleles or genotypes. ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.Y: A n x m matrix of response variables, i.e. n individuals (or lines) by m traits (or environments). For univariate phenotypes, use square brackets in the arguement. i.e.Y0[:,1](a vector) ->Y[:,[1]](a matrix) .XX: A type ofMarkers. Be cautious when combining genotype infomation into the struct ofMarkers;size(X) = (n,p).Z: An optional m x q matrix of low-dimensional phenotypic (or trait) covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just useZ = I.reml: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented iftrue.
Keyword Arguments
Xnul: A matrix of covariates. Default is intercepts (1's):Xnul= ones(size(Y,1),1). Adding covariates (C) isXnul= hcat(ones(n),C)wheresize(C)=(c,n)forn = size(Y,1).LogP: Boolean. Default isfalse. Returns $-\log_{10}{P}$ instead of LOD scores iftrue.
Output
LODs(orlogP) : LOD scores. Can change to $- \log_{10}{P}$ inlod2logPifLogP = true.B: A 3-d array ofB(fixed effects) matrices under H1: existence of QTL. If covariates are added toXnul:Xnul= [ones(size(Y,1)) Covariates], ex. For sex covariates in 4-way cross analysis, B[2,:,100], B[3:5,:,100] are effects for sex, the rest genotypes of the 100th QTL, respectively.est0: A type ofMLM.Estimatincluding parameter estimates under H0: no QTL.
FlxQTL.flxMLM.mlm2Scan — Function
mlm2Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,Z::Matrix{Float64},reml::Bool=false;Xnul::Matrix{Float64}=ones(size(Y,1),1))
mlm2Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,reml::Bool=false;Xnul::Matrix{Float64}=ones(size(Y,1),1))Implement 2d-genome scan. The second mlm2Scan() is for Z=I case; one can also run the first by inserting an identity matrix (Matrix(1.0I,m,m)) into Z.
$vec(Y) \sim MVN ((Z \otimes X)vec(B) (or XBZ'), \Sigma \otimes I_n),$
where size(Y)=(n,m), size(X)=(n,p), and size(Z)=(m,q).
Arguments
cross: An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.Y: A n x m matrix of response variables, i.e. n individuals (or lines) by m traits (or environments). For univariate phenotypes, use square brackets in arguement. i.e.Y0[:,1](a vector) ->Y[:,[1]](a matrix) .XX: A type ofMarkers. Be cautious when combining genotype infomation into the struct ofMarkers;size(X) = (n,p).Z: An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just useZ = I.reml: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented iftrue.
Keyword Arguments
Xnul: A matrix of covariates. Default is intercepts (1's). Unless adding covariates, just leave as it is. Seemlm1Scan.
Output
LODs: LOD scores. Can change to $- \log_{10}{P}$ usinglod2logP.est0: A type ofMLM.Estimatincluding parameter estimates under H0: no QTL.
FlxQTL.flxMLM.mlmTest — Function
mlmTest(nperm::Int64,cross::Int64,Y::Matrix{Float64},XX::Markers,Z::Matrix{Float64},reml::Bool=false;
Xnul::Matrix{Float64}=ones(size(Y,1),1),pval=[0.05 0.01])
mlmTest(nperm::Int64,cross::Int64,Y::Matrix{Float64},XX::Markers,reml::Bool=false;
Xnul::Matrix{Float64}=ones(size(Y,1),1),pval=[0.05 0.01])Implement permutation test to get thresholds at the levels of type 1 error, α. The second mlmTest() is for Z=I case; one can also run the first by inserting an identity matrix (Matrix(1.0I,m,m)) into Z.
$vec(Y) \sim MVN ((Z \otimes X)vec(B) (or XBZ'), \Sigma \otimes I_n),$
where size(Y)=(n,m), size(X)=(n,p), and size(Z)=(m,q).
Arguments
nperm: An integer indicating the number of permutation to be implemented.cross: An integer indicating the incidence of alleles or genotypes per marker. Ex.2for RIF,4for four-way cross,8for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.Y: A n x m matrix of response variables, i.e. n individuals (or lines) by m traits (or environments). For univariate phenotypes, use square brackets in arguement. i.e.Y[:,1](a vector) ->Y[:,[1]](a matrix) .XX: A type ofMarkers. Be cautious when combining genotype infomation into the struct ofMarkers;size(X) = (n,p).reml: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented iftrue.
Keyword Arguments
pval: A vector of p-values to get their quantiles. Default is[0.05 0.01](without comma).Xnul: A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.Z: An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just useZ = I.
Output
maxLODs: A nperm x 1 vector of maximal LOD scores by permutation.H1par_perm: A type of struct,MLM.Estimat(B,Σ,loglik)including parameter estimates for a MLM under H0: no QTL by permutation.cutoff: A vector of thresholds corresponding topval.
FlxQTL.MLM — Module
MLMA module for fitting general Multivariate Linear Models motivated by functional data analysis via mle or reml. The default fitting method is mle. ( i.e. reml=false)
The model: $Y=XBZ'+E$,
where $E(Y)=XBZ'$ (or $E(vec(Y))= (Z \otimes X)vec(B)$ ),
$var(vec(E))=\Sigma \otimes I,$
and size(Y)=(n,m), size(X)=(n,p), and size(Z)=(m,q).
FlxQTL.MLM.Estimat — Type
Estimat(B::Array{Float64,2},Σ::Array{Float64,2},loglik::Float64)A struct of arrays for results by fitting a multivariate linear model, mGLM(). It includes B(fixed effects), Σ (m x m covariance matrix), loglik(a value of log-likelihood by mle or reml).
FlxQTL.MLM.mGLM — Function
mGLM(Y::Array{Float64,2},X::Array{Float64,2},Z::Array{Float64,2},reml::Bool=false)
mGLM(Y::Array{Float64,2},X::Array{Float64,2},reml::Bool=false)Fitting multivariate General Linear Models via MLE (or REML) and returns a type of a struct Estimat.
Arguments
Y: A matrix of response variables, i.e. traits. size(Y)=(n,m) for n individuals x m traitsX: A matrix of independent variables, i.e. genotypes or genotype probabilities including intercept or/and covariates. size(X)=(n,p) for n individuals x p markers including intercept or/and covariatesZ: An optional matrix of low-dimensional phenotypic or trait covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). size(Z)=(m,q) for m traits x q phenotypic covariates. If the data does not assume any particular trait relation, just useZ = Matrix(1.0I,m,m)or the second functon.reml: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented iftrue.
Output
Returns Estimat .
Utility Functions (Util)
FlxQTL.Util — Module
UtilA module for utility functions.
FlxQTL.Util.Markers — Type
Markers(name::Array{String,1},chr::Array{Any,1},pos::Array{Float64,1},X::Array{Float64,2})A struct of arrays creating genotype or genotype probability data for genome scan.
Arguments
name: A vector of marker nameschr: A vector of Chromosomespos: A vector of marker positions (cM)X: A matrix of genotypes or genotype probabilities
FlxQTL.Util.Y_huber — Method
Y_huber(Y::Array{Float64,2})Rescale Y (phenotype or trait data) to be less sensitive to outliers using by Huber loss function and MAD (median absolute deviation). size(Y)=(m,n) for m trait and n individuals.
FlxQTL.Util.array2mat — Method
array2mat(cross::Int64,X0::Array{Float64,3})Returns a 3-d array to a matrix of genotype probabilities. size(X0)=(cross,n,p) –> (p1,n), where p1 = cross*p for p markers, cross alleles or genotypes, and n individuals. See mat2array.
FlxQTL.Util.getFinoidx — Method
getFinoidx(phenoData::Union{Array{Union{Missing,Float64},2},Matrix{Float64}})Attains indices of phenotype data without missing values.
Argument
phenoData: A matrix of phenotype (or trait) data including missing values. size(phenoData) = (m,n) formtraits andnindividuals.
FlxQTL.Util.getGenoidx — Function
getGenoidx(GenoData::Union{Array{Any,2},Array{Float64,2}},maf::Float64=0.025)Attains genotype indices to drop correlated or bad markers.
Arguments
GenoData: A matrix of genotype data. size(GenoData)= (p,n) forpmarkers andnindividuals.maf: A scalar for dropping criterion of markers. Default is0.025i.e. markers of MAF < 0.025 are dropped.
FlxQTL.Util.lod2logP — Method
lod2logP(LODs::Union{Array{Float64,1},Array{Any,1}},v::Int64)Caculates $-\log_{10}{P}$ from LOD scores.
Arguments
LODs: A vector of LOD scores computed from genome scan.v: Degrees of freedom for Chi-squared distribution.
!!! NOTE
- To compute degrees of freedom for 1D-genome scan (
geneScan),v = prod(size(B1))-prod(size(B0), whereB1andB0are effect size matrices under H1 of a full model (intercept + covariates + QTL) and H0 of no QTL (intercept + covariates), respectively.
FlxQTL.Util.mat2array — Method
mat2array(cross::Int64,X0)Returns a matrix of genotype probabilities to 3-d array. size(X0)=(p1,n) –> (cross,n,p), where p1 = cross*p for p markers, cross alleles or genotypes, and n individuals.
Argument
cross: An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.X0: A matrix of genotype probability data computed from r/qtl or r/qtl2.
See array2mat.
FlxQTL.Util.mat2vec — Method
mat2vec(mat)Stacks a matrix to a vector, i.e. vectorizing a matrix.
FlxQTL.Util.newMarkers — Function
newMarkers(XX::Markers,cross::Int64,cM::Int64=2)Returns a struct of Markers by keeping only markers positioned in every cM centimorgans for any genome scan to avoid singularity.
Arguments
XX: A type ofMarkers. SeeMarkers.cross: An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.cM: An integer of dropping criterion of markers. Default is 2, i.e. keeping only markers in every 2 cM, or dropping markers within 2cM between 2 markers.
FlxQTL.Util.ordrMarkers — Method
ordrMarkers(markers::Array{Any,2})Rearrange by CPU parallelization marker information composed of marker name, chr, position obtained from rqtl2, which is not listed in order (excluding X chromosome).
Argument
markers: An array of marker information.
FlxQTL.Util.setSeed — Method
setSeed(seed::Integer))Assigns different seed numbers to generated worker(s) in a deterministic and reproducibile way.
Arguments
seed: An integer.
Examples
julia> using Distributed, Random
julia> addprocs(10)
julia> @everywhere using FlxQTL
julia> setSeed(123)FlxQTL.Util.sortBycM — Function
sortBycM(Chr::Any,XX::Markers,cross::Int64,cM::Int64=2)Returns marker indices in Chromosome Chr and the corresponding genotype probabilities keeping only markers positioned in every cM centimorgans for any genome scan to avoid singularity.
Arguments
Chr: A type of Any indicating a particular chromosome to sort markers out.XX: A type ofMarkers.cross: An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.cM: An integer of dropping criterion of markers. Default is 2, i.e. keeping only markers in every 2 cM, or dropping markers within 2cM between 2 markers.
See also newMarkers