Functions and Types

Flexible Multivariate Linear Mixed Models (flxMLMM)

FlxQTL.FlxQTLModule
FlxQTL

Flexible QTL analysis tools for structured multiple traits fitting a Multivariate Linear Mixed Model or a Multivariate Linear Model.

source
FlxQTL.flxMLMMModule
flxMLMM

A module designed for fitting a Multivariate Linear Mixed Model run by the Nesterov's Accelerated Gradient with restarting scheme embedding the Expectation Conditional Maximization to estimate MLEs. REML is not supported. The FlxQTL model is defined as

\[vec(Y)\sim MVN((X' \otimes Z)vec(B) (or ZBX), K \otimes \Omega +I \otimes \Sigma),\]

where K is a genetic kinship, and $\Omega \approx \tau^2V_C$, $\Sigma$ are covariance matrices for random and error terms, respectively. $V_C$ is pre-estimated under the null model (H0) of no QTL from the conventional MLMM, which is equivalent to the FlxQTL model for $\tau^2 =1$. $Z \neq I_m$ estimates much smaller B than the former model with Z = I, where dim(Y) = (m traits, n individuals), and dim(X) = (p markers, n), dim(Z) = (m, q trait covariates).

source
FlxQTL.flxMLMM.K2eigFunction
   K2eig(K,LOCO::Bool=false)

Returns eigenvectors and eigenvalues of genetic relatedness, or 3-d array of these of a genetic relatedness if LOCO is true.

Arguments

  • K : A matrix of genetic relatedness (Default). 3-d array of genetic relatedness (LOCO sets to be true.)
  • LOCO : Boolean. Default is false (no LOCO). (Leave One Chromosome Out).

Output

  • T : A matrix of eigenvectors, or a 3-d array of eigenvectors if LOCO sets to be true.
  • λ : A vector of eigenvalues, or a matrix of eigenvalues if LOCO sets to be true.

Examples

For a null variance component, or genetic relatedness for LOCO =false,

 T, λ = K2eig(K)

produces a matrix of T and a vector of λ.

For a genetic kinship calculated under LOCO (a 3-d array of kinship matrices),

 T, λ = K2eig(K,true)

produces a 3-d array of matrices T and a matrix of λ.

source
FlxQTL.flxMLMM.gene1ScanFunction
 gene1Scan(cross::Int64,Tg::Union{Array{Float64,3},Matrix{Float64}},Λg::Union{Matrix{Float64},Vector{Float64}},
           Y::Array{Float64,2},XX::Markers,LOCO::Bool=false;penalize::Bool=false,Z=diagm(ones(m)),H0_up::Bool=false,
           Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*3,
           LogP::Bool=false,itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)
 gene1Scan(Tg,Λg,Y::Array{Float64,2},XX::Markers,cross::Int64,LOCO::Bool=false;penalize::Bool=false,Xnul::Array{Float64,2}=ones(1,size(Y,2)),
            df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*3,LogP::Bool=false,
            itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)

Implement 1d-genome scan with/without LOCO (Leave One Chromosome Out) fitted by FlxQTL models with/without Z and with the option of penalization that preestimate a null variance component matrix ($V_C$) under H0 : no QTL, followed by its adjustment by a scalar parameter under H1 : existing QTL. The second type gene1Scan() is fitted by the conventional MLMM that estimate all parameters under H0/H1 with the option of penalization. The FlxQTL model is defined as

\[vec(Y)\sim MVN((X' \otimes Z)vec(B) (or ZBX), Kg \otimes \Omega +I \otimes \Sigma),\]

where Kg is a genetic kinship, and $\Omega \approx \tau^2V_C$, $\Sigma$ are covariance matrices for random and error terms, respectively. $V_C$ is pre-estimated under the null model (H0) of no QTL from the conventional MLMM, which is equivalent to the FlxQTL model for $\tau^2 =1$.

Arguments

  • cross : An integer (Int64) indicating the occurrence of combination of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degrees of freedom for the effect size of the genetic marker when doing genome scan.
  • Tg : A n x n matrix of eigenvectors, or a 3d-array of eigenvectors if LOCO is true. See also K2eig.
  • Λg : A n x 1 vector of eigenvalues from kinship. Returns a matrix of eigenvalues if LOCO is true.
  • Y : A m x n matrix of response variables, i.e. m traits by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e. Y0[1,:] (a vector) ->Y[[1],:] (a matrix) .
  • XX : A type of Markers.
  • Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (wavelet, polynomials, B-splines, etc.). If no assumption among traits, insert an identity matrix, Matrix(1.0I,m,m), or use the second geneScan().
  • LOCO : Boolean. Default is false (no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out) if true.

Keyword Arguments

  • penalize : Boolean. Default is false (no penalization). For higher dimensional traits, i.e. large m=size(Y,1), penalization is recommended, i.e. set penalize=true for numerical stability with adjustment of df_prior or/and Prior if necessary.
  • Xnul : A matrix of covariates. Default is intercepts (1's): Xnul= ones(1,size(Y,2)). Adding covariates (C) is Xnul= vcat(ones(1,n),C) where size(C)=(c,n).
  • Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. An amplified empirical covariance matrix is default.
  • df_prior: Degrees of freedom, $\nu_0$ for Inverse-Wishart distributon. m+1 (weakly informative) is default.
  • H0_up : Default returns null estimates, est0 from the conventional MLMM. It is recommended setting H0_up=true for higher dimensional traits, e.g. $m \ge 18 or more$ depending on the data, to avoid negative LODs.
  • itol : A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is 1e-3.
  • tol0 : A tolerance controlling ECM under H1: existence of QTL. Default is 1e-3.
  • tol : A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is 1e-4.
  • ρ : A tunning parameter controlling $\tau^2$. Default is 0.001.
  • LogP : Boolean. Default is false. Returns $-\log_{10}{P}$ instead of LOD scores if true.

!!! Note

  • If some LOD scores return negative values under penalization or no penalization, then you may reduce tolerences for ECM to e.g., tol0 = 1e-4 (no penalization), set H0_up=true, or (and) switch to penalization (penalize=true) followed by adjusting df_prior, such that $m+1 \le$ df_prior $< 2m$ to avoid singularity errors. The last resort could be df_prior = Int64(ceil(1.9m)) when any of them would not work. Adjusting df_prior is more effective than doing Prior; we do recommend this adjustment for higher dimensional traits with genotype probability data, depending on the data–we have tested no penalization option witout error up to $m = 16$ with genotype probabilities or m = 30 with genotypes. Lower dimensional traits with penalization slow performance.

Output

  • result : A vector of LOD scores,LODs as default or $- \log_{10}{P}$ by lod2logP if LogP = true.
  • B : A 3-d array of B (fixed effects) matrices under H1: existence of QTL. If sex covariates, e.g. size(C)=(1,n), are added to Xnul : Xnul= [ones(1,size(Y,2)); C] in 4-way cross analysis, B[:,2,100], B[:,3:5,100] are effects for sex, the rest genotypes of the 100th QTL, respectively.
  • H0est : A type of EcmNestrv.Result including parameter estimates under H0: no QTL for both functions.
source
FlxQTL.flxMLMM.gene2ScanFunction
gene2Scan(cross::Int64,Tg::Union{Array{Float64,3},Matrix{Float64}},Λg::Union{Matrix{Float64},Vector{Float64}},
           Y::Array{Float64,2},XX::Markers,LOCO::Bool=false,penalize::Bool=false;m=size(Y,1),Z=diagm(ones(m)),
           Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*3,
           itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)
gene2Scan(Tg,Λg,Y,XX,cross,LOCO,penalize;Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,
              Prior::Matrix{Float64}=cov(Y,dims=2)*3,itol=1e-4,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)

Implement 2d-genome scan with/without LOCO (Leave One Chromosome Out). The first function is a FlxQTL model with/without Z and with the option of penalization that preestimate a null variance component matrix ($V_C$) under H0 : no QTL, followed by its adjustment by a scalar parameter under H1 : existing QTL. The second type of gene2Scan() is fitted by the conventional MLMM that estimate all parameters under H0/H1 with the option of penalization. The FlxQTL model is defined as

\[vec(Y)\sim MVN((X' \otimes Z)vec(B) (or ZBX), Kg \otimes \Omega +I \otimes \Sigma),\]

where Kg is a genetic kinship, and $\Omega \approx \tau^2V_C$, $\Sigma$ are covariance matrices for random and error terms, respectively. $V_C$ is pre-estimated under the null model (H0) of no QTL from the conventional MLMM, which is equivalent to the FlxQTL model for $\tau^2 =1$.

Arguments

  • cross : An integer (Int64) indicating the occurrence of combination of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degrees of freedom for the effect size of the genetic marker when doing genome scan.
  • Tg : A n x n matrix of eigenvectors, or a 3d-array of eigenvectors if LOCO is true. See also K2eig.
  • Λg : A n x 1 vector of eigenvalues from kinship. Returns a matrix of eigenvalues if LOCO is true.
  • Y : A m x n matrix of response variables, i.e. m traits by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e. Y0[1,:] (a vector) ->Y[[1],:] (a matrix) .
  • XX : A type of Markers.
  • Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (wavelet, polynomials, B-splines, etc.). If no assumption among traits, insert an identity matrix, Matrix(1.0I,m,m), or use the second geneScan().
  • LOCO : Boolean. Default is false (no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out) if true.

Keyword Arguments

  • penalize : Boolean. Default is false (no prior used for penalization). For higher dimensional traits, i.e. large m=size(Y,1), penalization is recommended; set penalize=true with adjustment of df_prior or/and Prior if necessary.
  • Xnul : A matrix of covariates. Default is intercepts (1's): Xnul= ones(1,size(Y,2)). Adding covariates (C) is Xnul= vcat(ones(1,n),C) where size(C)=(c,n).
  • Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. An amplified empirical covariance matrix is default.
  • df_prior: Degrees of freedom, $\nu_0$ for Inverse-Wishart distributon. m+1 (weakly informative) is default.
  • itol : A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is 1e-3.
  • tol0 : A tolerance controlling ECM under H1: existence of QTL. Default is 1e-3.
  • tol : A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is 1e-4.
  • ρ : A tunning parameter controlling $\tau^2$. Default is 0.001.

!!! Note

  • If some LOD scores return negative values under penalization or no penalization, you may reduce tolerences for ECM to e.g., tol0 = 1e-4 (no penalization), or switch to penalization (penalize=true), follwed by adjusting df_prior, such that $m+1 \le$ df_prior $< 2m$ to avoid singluarity errors. The last resort could be df_prior = Int64(ceil(1.9m)) unless any of them would work. Adjusting df_prior is more effective than doing Prior; we do recommend this adjustment for higher dimensional traits with genotype probability data, depending on the data–we have tested no penalization option witout error up to $m = 16$ with genotype probabilities or m = 30 with genotypes. Lower dimensional traits with penalization slow performance.

Output

  • LODs : A matrix of LOD scores. Can change to $- \log_{10}{P}$ using lod2logP.
  • H0est : A type of EcmNestrv.Result including parameter estimates under H0: no QTL.
source
FlxQTL.flxMLMM.mlmmTestFunction
 mlmmTest(nperm::Int64,cross::Int64,Kg,Y,XX::Markers;pval=[0.05 0.01],df_prior=m+1,
             Prior::Matrix{Float64}=cov(Y,dims=2)*3,Xnul=ones(1,size(Y,2)),itol=1e-4,tol0=1e-3,tol=1e-4,ρ=0.001)

Implement permutation test without LOCO to get thresholds at the levels of type 1 error, α by the conventional MLMM (Z=I) with an option of penalization. See also permutationTest.

Arguments

  • nperm : An integer indicating the number of permutation to be implemented.
  • cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
  • Kg : A n x n genetic kinship matrix. Should be symmetric positive definite.
  • Y : A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e. Y[1,:] (a vector) -> Y[[1],:] (a matrix) .
  • XX : A type of Markers.
  • penalize : Boolean. Default is false (no penalization). For higher dimensional traits, i.e. large m=size(Y,1), penalization is recommended; set penalize=true with adjustment of df_prior or/and Prior if necessary.

Keyword Arguments

  • pval : A vector of p-values to get their quantiles. Default is [0.05 0.01] (without comma).
  • Xnul : A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.
  • Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. An amplified empirical covariance matrix is default.
  • df_prior: Degrees of freedom, $\nu_0$ for Inverse-Wishart distributon. m+1 (weakly informative) is default.
  • itol : A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is 1e-4.
  • tol0 : A tolerance controlling ECM under H1: existence of QTL. Default is 1e-3.
  • tol : A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is 1e-4.
  • ρ : A tunning parameter controlling $\tau^2$. Default is 0.001.

!!! Note

  • When some LOD scores return negative values, you may reduce tolerence for ECM (tol0) to run longer, or increase df_prior, where $m+1 \le$ df_prior $< 2m$ to avoid singularity errors. The last resort could be df_prior = Int64(ceil(1.9m)) unless any of them would works.

Output

  • maxLODs : A nperm x 1 vector of maximal LOD scores by permutation.
  • H1par_perm : A type of struct, EcmNestrv.Approx(B,τ2,Σ,loglik) including parameter estimates or EcmNestrv.Result(B,Vc,Σ,loglik) for a conventional MLMM under H0: no QTL by permutation.
  • cutoff : A vector of thresholds corresponding to pval.
source
FlxQTL.flxMLMM.permutationTestFunction
 permutationTest(nperm::Int64,cross::Int64,Kg::Array{Float64,3},Y::Array{Float64,2},XX::Markers,LOCO::Bool=false,penalize::Bool=false;pval=[0.05 0.01],
                 Z=diagm(ones(m)),df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*3,Xnul=ones(1,size(Y,2)),
                 itol=1e-4,tol0=1e-4,tol=1e-4,ρ=0.001)

 permTest(nperm,cross,Kg::Array{Float64,3},Y,XX;pval=[0.05 0.01],Z=diagm(ones(m)),df_prior=m+1,Prior=cov(Y,dims=2)*3,Xnul=ones(1,size(Y,2)),
          itol=1e-4,tol0=1e-4,tol=1e-4,ρ=0.001,δ=0.01)

Implement permutation test by estimating the distribution of maximum LOD scores to get thresholds at the levels of type 1 error, α. The FlxQTL model is defined as

\[vec(Y)\sim MVN((X' \otimes Z)vec(B) (or ZBX), Kg \otimes \Omega +I \otimes \Sigma),\]

where Kg is a genetic kinship, and $\Omega \approx \tau^2V_C$, $\Sigma$ are covariance matrices for random and error terms, respectively. $V_C$ is estimated under the null model (H0) of no QTL from the conventional MLMM, which is equivalent to the FlxQTL model for $\tau^2 =1$.

!!! NOTE

  • permutationTest() has options of penalization as well as LOCO. Depending on the data size and computer capacity, one can select any combination. There may exist slight to moderate differences in the choice of LOCO and penalization.
  • permTest() is implemented by permutation test with LOCO, but $\Omega$ is estimated with a genetic kinship by kinshipLin without LOCO.
  • mlmmTest is implemented by the conventional MLMM without LOCO.

Arguments

  • nperm : An integer (Int64) indicating the number of permutation to be implemented.
  • cross : An integer (Int64) indicating the number of combination of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degrees of freedom for the effect size of a genetic marker when doing genome scan.
  • Kg : A 3d-array of n x n genetic kinship matrices (LOCO = true or only for permTest()) or a matrix of genetic kinship for the default option of LOCO = false. Should be symmetric positive definite.
  • Y : A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e. Y[1,:] (a vector) -> Y[[1],:] (a matrix) .
  • XX : A type of Markers.
  • LOCO : Boolean. Default is false (no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out) if true.
  • penalize : Boolean. Default is false (no penalization). For higher dimensional traits, i.e. large m=size(Y,1), penalization is recommended, i.e. set penalize=true for numerical stability with adjustment of df_prior or/and Prior if necessary.

Keyword Arguments

  • pval : A vector of p-values to get their quantiles. Default is [0.05 0.01] (without comma).
  • Xnul : A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.
  • Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just use Z = diagm(ones(m)) or Matrix(1.0I,m,m) (default).
  • Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. An amplified empirical covariance matrix is default.
  • df_prior: Degrees of freedom, $\nu_0$ for Inverse-Wishart distributon. m+1 (weakly informative) is default.
  • itol : A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is 1e-3.
  • tol0 : A tolerance controlling ECM under H1: existence of QTL. Default is 1e-3.
  • tol : A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is 1e-4.
  • ρ : A tunning parameter controlling $\tau^2$. Default is 0.001.
  • δ : A tuning parameter in permTest() to correct a non-positive definite kinship without LOCO to pre-estimate a null variance component. This no-LOCO kinship is computed inside the function for efficient computation.

!!! Note

  • When some LOD scores return negative values, you may reduce tolerences for ECM to tol0 = 1e-4, or increase df_prior, where $m+1 \le$ df_prior $< 2m$. The last resort could be df_prior = Int64(ceil(1.9m)) to avoid sigularity errors unless any of them works. Adjusting df_prior works better than doing 'Prior; we do recommend this adjustment for higher dimensional traits with genotype probability data, depending on the data–we have tested no penalization option witout error up to $m = 16$ with genotype probabilities or m = 30 with genotypes. Lower dimensional traits with penalization slow performance.
  • This LOCO version of permutation test is desirable to be implemented by high-performance computers.

Output

  • maxLODs : A nperm x 1 vector of maximal LOD scores by permutation.
  • H1par_perm : A vector of structs, EcmNestrv.Approx(B,τ2,Σ,loglik) for each Chromosome per permutation, i.e. # of Chromosomes x nperm.
  • cutoff : A vector of thresholds corresponding to pval.
source
FlxQTL.EcmNestrvModule
EcmNestrv

A module for base algorithms using the ECM (Expectation-Conditional Maxization) with the Speed restarting Nesterov's accelerated gradient method to fit a flexible multivariate linear mixed model (flxMLMM).

source

Genetic Relatedness Matrices (GRM)

FlxQTL.GRMModule
 GRM

A module for computing Genetic Relatedness Matrix (or kinship).

source
FlxQTL.GRM.kinship4wayMethod
 kinship4way(genmat::Array{Float64,2})

Computes a kinship for four-way cross data counting the occurrence of different alleles between two markers: ex. AB-AB=0; AB-AC=1; AB-CD=2,$\dots$ Note: In R/qtl, genotypes are labeled as 1=AC; 2=BC; 3=AD; 4=BD by the function, read.cross().

Argument

  • genmat : A matrix of genotypes for four-way cross $(1,2, \dots)$. size(genematrix)= (p,n), for p genetic markers x n individuals(or lines).

Output

Returns a n x n symmetric matrix containing 1's on the diagonal.

source
FlxQTL.GRM.kinshipCtrMethod
 kinshipCtr(genmat::Array{Float64,2})

Calculates a kinship with a centered genotype matrix (linear kernel), where a marker mean is subtracted from its genotypes.

Argument

  • genmat : A matrix of genotype data (0,1,2). size(genmat)=(p,n) for p markers x n individuals

Output

Returns a n x n symmetric matrix. See also kinshipStd.

source
FlxQTL.GRM.kinshipGKMethod
 kinshipGK(G::Array{Float64,2},ρ::Float64)

Computes a non-linear kinship matrix using a Gaussian Kernel.

Arguments

  • G : A matrix of genotype. size(G)=(r,c), such that r genotype markers and c individuals (or lines).
  • ρ : A free parameter determining the width of the kernel. Could be attained empirically.

Output

Returns a c x c symmetric (positive definite) matrix containing 1's on the diagonal.

source
FlxQTL.GRM.kinshipLinMethod
kinshipLin(mat,cross)

Calculates a kinship matrix by linear kernel for genotype (or allele) probabilities. This is a general form of kinshipCtr, kinshipStd for genetype data, i.e. XX'.

Arguments

  • mat : A matrix of genotype (or allele) probabilities usually extracted from R/qtl, R/qtl2, or the counterpart packages. size(mat)= (p,n) for p genetic markers x n individuals.
  • cross (>1) : An integer indicating the occurrence or combination of alleles or genotypes in a genetic marker. ex. 1 for genotypes (labeled as 0,1,2), 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.

Output

Returns a n x n symmetric (positive definite) matrix containing 1's on the diagonal.

source
FlxQTL.GRM.kinshipLocoFunction
 kinshipLoco(kin,g::Markers,cross::Int64=1,ρ=0.01)

Generates a 3-d array of symmetric positive definite kinship matrices using LOCO (Leave One Chromosome Out) witout shrinkage intensity estimation. This is designed for linear kernel based kinship functions. See Arguments.

Arguments

  • kin : A function of computing a kinship. Can only use with kinshipCtr, kinshipStd for genotypes, and with kinshipLin for genotype (or allele) probabilities.
  • g : A struct of arrays, type Markers.
  • cross : A scalar indicating the occurrence or combination of alleles or genotypes in a genetic marker. Simply, it coincides with the number of columns per marker in genotype (or allele) probability data; for instance, 1 for genotypes (0,1,2) as default, 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
  • ρ : A real number to correct a non-positive definite kinship. Default is 0.01. When a kinship is not corrected to be positive definite with the default value, slight increase of ρ can adjust to minuscule negative eigenvalues.

Output

Returns a 3-d array of n x n symmetric positive definite matrices as many as Chromosomes. Refer to shrinkgLoco for comparison.

source
FlxQTL.GRM.kinshipManMethod
  kinshipMan(genematrix::Array{Float64,2})

Calculates a kinship matrix using a manhattan distance method. Missing values need to be either omitted or imputed. This function is designed for recombinant inbred line (RIL) (AA/BB), not for 4-way cross genotype data. See kinship4way.

Argument

  • genematrix : A matrix of genotypes, i.e. 0,1 (or 1,2). size(genematrix)= (p,n) for p genetic markers x n individuals(or lines).

Output

Returns a n x n symmetric matrix containing 1's on the diagonal.

source
FlxQTL.GRM.kinshipStdMethod
 kinshipStd(genmat::Array{Float64,2})

Produces a kinship with a standardized (or normalized) genotype matrix (linear kernel), i.e., centered genetypes divided by the standard deviation of a marker.

Argument

  • genmat : A matrix of genotype data (0,1,2). size(genmat)=(p,n) for p markers x n individuals

Output

Returns a n x n symmetric matrix. See also kinshipCtr.

source
FlxQTL.GRM.shrinkgMethod
 shrinkg(f,nb::Int64,geno)

Estimates a full-rank positive definite kinship matrix by shrinkage intensity estimation (bootstrap). Can only use with kinshipMan, kinship4way. This function runs fast by CPU parallelization. Add workers/processes using an addprocs() function before running it for speedup.

Arguments

  • f: A function of computing a kinship. Can only use with kinshipMan, kinship4way.
  • nb : An integer indicating the number of bootstrap. It does not have to be a large number.
  • geno : A matrix of genotypes. size(geno) = (p,n) for p markers x n individuals.

Example

julia> using FlxQTL
julia> addprocs(8)
julia> K = shinkage(kinshipMan,20,myGeno)

Output

Returns a full-rank symmetric positive definite matrix.

source
FlxQTL.GRM.shrinkgLocoMethod
   shrinkgLoco(kin,nb,g::Markers)

Generates a 3-d array of full-rank positive definite kinship matrices by shrinkage intensity estimation (bootstrap) using a LOCO (Leave One Chromosome Out) scheme. Can only use with kinshipMan, kinship4way.

Arguments

  • kin : A function of computing a kinship. Can only use with kinshipMan, kinship4way
  • nb : An integer indicating the number of bootstrap.
  • g : A struct of arrays, type Markers.

Output

Returns 3-d array of n x n symmetric positive definite matrices as many as Chromosomes.

source

Flexible Multivariate Linear Models (flxMLM)

FlxQTL.flxMLMModule
flxMLM

A module designed for fitting a Multivariate Linear Model by the (Residual) Maximum Likelihood (REML or MLE) method. The model:

$Y=XBZ'+E$,

where $E(vec(Y))= (Z \otimes X)vec(B)$, $Var(vec(Y))= \Sigma \otimes I_n$,

dim(Y)= (n individuals, m traits), dim(X) = (n,p markers), and dim(Z) = (m, q trait covariates).

source
FlxQTL.flxMLM.mlm1ScanFunction
mlm1Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,Z::Matrix{Float64},reml::Bool=false;LogP::Bool=false,
          Xnul::Matrix{Float64}=ones(size(Y,1),1))
mlm1Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,reml::Bool=false;LogP::Bool=false,
          Xnul::Matrix{Float64}=ones(size(Y,1),1))

Implement 1d-genome scan. The second mlm1Scan() is for Z=I case; one can also run the first by inserting an identity matrix (Matrix(1.0I,m,m)) into Z.

$vec(Y) \sim MVN ((Z \otimes X)vec(B) (or XBZ'), \Sigma \otimes I_n),$

where size(Y)=(n,m), size(X)=(n,p), and size(Z)=(m,q).

Arguments

  • cross : An integer indicating the occurrence of alleles or genotypes. ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
  • Y : A n x m matrix of response variables, i.e. n individuals (or lines) by m traits (or environments). For univariate phenotypes, use square brackets in the arguement. i.e. Y0[:,1] (a vector) ->Y[:,[1]] (a matrix) .
  • XX : A type of Markers. Be cautious when combining genotype infomation into the struct of Markers; size(X) = (n,p).
  • Z : An optional m x q matrix of low-dimensional phenotypic (or trait) covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just use Z = I.
  • reml: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented if true.

Keyword Arguments

  • Xnul : A matrix of covariates. Default is intercepts (1's): Xnul= ones(size(Y,1),1). Adding covariates (C) is Xnul= hcat(ones(n),C) where size(C)=(c,n) for n = size(Y,1).
  • LogP : Boolean. Default is false. Returns $-\log_{10}{P}$ instead of LOD scores if true.

Output

  • LODs (or logP) : LOD scores. Can change to $- \log_{10}{P}$ in lod2logP if LogP = true.
  • B : A 3-d array of B (fixed effects) matrices under H1: existence of QTL. If covariates are added to Xnul : Xnul= [ones(size(Y,1)) Covariates], ex. For sex covariates in 4-way cross analysis, B[2,:,100], B[3:5,:,100] are effects for sex, the rest genotypes of the 100th QTL, respectively.
  • est0 : A type of MLM.Estimat including parameter estimates under H0: no QTL.
source
FlxQTL.flxMLM.mlm2ScanFunction
 mlm2Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,Z::Matrix{Float64},reml::Bool=false;Xnul::Matrix{Float64}=ones(size(Y,1),1))
 mlm2Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,reml::Bool=false;Xnul::Matrix{Float64}=ones(size(Y,1),1))

Implement 2d-genome scan. The second mlm2Scan() is for Z=I case; one can also run the first by inserting an identity matrix (Matrix(1.0I,m,m)) into Z.

$vec(Y) \sim MVN ((Z \otimes X)vec(B) (or XBZ'), \Sigma \otimes I_n),$

where size(Y)=(n,m), size(X)=(n,p), and size(Z)=(m,q).

Arguments

  • cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
  • Y : A n x m matrix of response variables, i.e. n individuals (or lines) by m traits (or environments). For univariate phenotypes, use square brackets in arguement. i.e. Y0[:,1] (a vector) ->Y[:,[1]] (a matrix) .
  • XX : A type of Markers. Be cautious when combining genotype infomation into the struct of Markers; size(X) = (n,p).
  • Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just use Z = I.
  • reml: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented if true.

Keyword Arguments

  • Xnul : A matrix of covariates. Default is intercepts (1's). Unless adding covariates, just leave as it is. See mlm1Scan.

Output

  • LODs : LOD scores. Can change to $- \log_{10}{P}$ using lod2logP.
  • est0 : A type of MLM.Estimat including parameter estimates under H0: no QTL.
source
FlxQTL.flxMLM.mlmTestFunction
 mlmTest(nperm::Int64,cross::Int64,Y::Matrix{Float64},XX::Markers,Z::Matrix{Float64},reml::Bool=false;
          Xnul::Matrix{Float64}=ones(size(Y,1),1),pval=[0.05 0.01])
 mlmTest(nperm::Int64,cross::Int64,Y::Matrix{Float64},XX::Markers,reml::Bool=false;
          Xnul::Matrix{Float64}=ones(size(Y,1),1),pval=[0.05 0.01])

Implement permutation test to get thresholds at the levels of type 1 error, α. The second mlmTest() is for Z=I case; one can also run the first by inserting an identity matrix (Matrix(1.0I,m,m)) into Z.

$vec(Y) \sim MVN ((Z \otimes X)vec(B) (or XBZ'), \Sigma \otimes I_n),$

where size(Y)=(n,m), size(X)=(n,p), and size(Z)=(m,q).

Arguments

  • nperm : An integer indicating the number of permutation to be implemented.
  • cross : An integer indicating the incidence of alleles or genotypes per marker. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
  • Y : A n x m matrix of response variables, i.e. n individuals (or lines) by m traits (or environments). For univariate phenotypes, use square brackets in arguement. i.e. Y[:,1] (a vector) -> Y[:,[1]] (a matrix) .
  • XX : A type of Markers. Be cautious when combining genotype infomation into the struct of Markers; size(X) = (n,p).
  • reml: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented if true.

Keyword Arguments

  • pval : A vector of p-values to get their quantiles. Default is [0.05 0.01] (without comma).
  • Xnul : A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.
  • Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just use Z = I.

Output

  • maxLODs : A nperm x 1 vector of maximal LOD scores by permutation.

  • H1par_perm : A type of struct, MLM.Estimat(B,Σ,loglik) including parameter estimates for a MLM under H0: no QTL by permutation.

  • cutoff : A vector of thresholds corresponding to pval.

source
FlxQTL.MLMModule
  MLM

A module for fitting general Multivariate Linear Models motivated by functional data analysis via mle or reml. The default fitting method is mle. ( i.e. reml=false)

The model: $Y=XBZ'+E$,

where $E(Y)=XBZ'$ (or $E(vec(Y))= (Z \otimes X)vec(B)$ ),

$var(vec(E))=\Sigma \otimes I,$

and size(Y)=(n,m), size(X)=(n,p), and size(Z)=(m,q).

source
FlxQTL.MLM.EstimatType
 Estimat(B::Array{Float64,2},Σ::Array{Float64,2},loglik::Float64)

A struct of arrays for results by fitting a multivariate linear model, mGLM(). It includes B(fixed effects), Σ (m x m covariance matrix), loglik(a value of log-likelihood by mle or reml).

source
FlxQTL.MLM.mGLMFunction
  mGLM(Y::Array{Float64,2},X::Array{Float64,2},Z::Array{Float64,2},reml::Bool=false)
  mGLM(Y::Array{Float64,2},X::Array{Float64,2},reml::Bool=false)

Fitting multivariate General Linear Models via MLE (or REML) and returns a type of a struct Estimat.

Arguments

  • Y : A matrix of response variables, i.e. traits. size(Y)=(n,m) for n individuals x m traits
  • X : A matrix of independent variables, i.e. genotypes or genotype probabilities including intercept or/and covariates. size(X)=(n,p) for n individuals x p markers including intercept or/and covariates
  • Z : An optional matrix of low-dimensional phenotypic or trait covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). size(Z)=(m,q) for m traits x q phenotypic covariates. If the data does not assume any particular trait relation, just use Z = Matrix(1.0I,m,m) or the second functon.
  • reml : Boolean. Default is fitting the model via mle. Resitricted MLE is implemented if true.

Output

Returns Estimat .

source

Utility Functions (Util)

FlxQTL.Util.MarkersType
Markers(name::Array{String,1},chr::Array{Any,1},pos::Array{Float64,1},X::Array{Float64,2})

A struct of arrays creating genotype or genotype probability data for genome scan.

Arguments

  • name : A vector of marker names
  • chr : A vector of Chromosomes
  • pos : A vector of marker positions (cM)
  • X : A matrix of genotypes or genotype probabilities
source
FlxQTL.Util.Y_huberMethod
Y_huber(Y::Array{Float64,2})

Rescale Y (phenotype or trait data) to be less sensitive to outliers using by Huber loss function and MAD (median absolute deviation). size(Y)=(m,n) for m trait and n individuals.

source
FlxQTL.Util.array2matMethod
 array2mat(cross::Int64,X0::Array{Float64,3})

Returns a 3-d array to a matrix of genotype probabilities. size(X0)=(cross,n,p) –> (p1,n), where p1 = cross*p for p markers, cross alleles or genotypes, and n individuals. See mat2array.

source
FlxQTL.Util.getFinoidxMethod
getFinoidx(phenoData::Union{Array{Union{Missing,Float64},2},Matrix{Float64}})

Attains indices of phenotype data without missing values.

Argument

  • phenoData : A matrix of phenotype (or trait) data including missing values. size(phenoData) = (m,n) for m traits and n individuals.
source
FlxQTL.Util.getGenoidxFunction
getGenoidx(GenoData::Union{Array{Any,2},Array{Float64,2}},maf::Float64=0.025)

Attains genotype indices to drop correlated or bad markers.

Arguments

  • GenoData : A matrix of genotype data. size(GenoData)= (p,n) for p markers and n individuals.
  • maf : A scalar for dropping criterion of markers. Default is 0.025 i.e. markers of MAF < 0.025 are dropped.
source
FlxQTL.Util.lod2logPMethod
lod2logP(LODs::Union{Array{Float64,1},Array{Any,1}},v::Int64)

Caculates $-\log_{10}{P}$ from LOD scores.

Arguments

  • LODs : A vector of LOD scores computed from genome scan.
  • v : Degrees of freedom for Chi-squared distribution.

!!! NOTE

  • To compute degrees of freedom for 1D-genome scan (geneScan), v = prod(size(B1))-prod(size(B0), where B1 and B0 are effect size matrices under H1 of a full model (intercept + covariates + QTL) and H0 of no QTL (intercept + covariates), respectively.
source
FlxQTL.Util.mat2arrayMethod
 mat2array(cross::Int64,X0)

Returns a matrix of genotype probabilities to 3-d array. size(X0)=(p1,n) –> (cross,n,p), where p1 = cross*p for p markers, cross alleles or genotypes, and n individuals.

Argument

  • cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
  • X0 : A matrix of genotype probability data computed from r/qtl or r/qtl2.

See array2mat.

source
FlxQTL.Util.newMarkersFunction
newMarkers(XX::Markers,cross::Int64,cM::Int64=2)

Returns a struct of Markers by keeping only markers positioned in every cM centimorgans for any genome scan to avoid singularity.

Arguments

  • XX : A type of Markers. See Markers.
  • cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
  • cM : An integer of dropping criterion of markers. Default is 2, i.e. keeping only markers in every 2 cM, or dropping markers within 2cM between 2 markers.
source
FlxQTL.Util.ordrMarkersMethod
ordrMarkers(markers::Array{Any,2})

Rearrange by CPU parallelization marker information composed of marker name, chr, position obtained from rqtl2, which is not listed in order (excluding X chromosome).

Argument

  • markers : An array of marker information.
source
FlxQTL.Util.setSeedMethod
setSeed(seed::Integer))

Assigns different seed numbers to generated worker(s) in a deterministic and reproducibile way.

Arguments

  • seed: An integer.

Examples

julia> using Distributed, Random
julia> addprocs(10)
julia> @everywhere using FlxQTL
julia> setSeed(123)
source
FlxQTL.Util.sortBycMFunction
    sortBycM(Chr::Any,XX::Markers,cross::Int64,cM::Int64=2)

Returns marker indices in Chromosome Chr and the corresponding genotype probabilities keeping only markers positioned in every cM centimorgans for any genome scan to avoid singularity.

Arguments

  • Chr : A type of Any indicating a particular chromosome to sort markers out.
  • XX : A type of Markers.
  • cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
  • cM : An integer of dropping criterion of markers. Default is 2, i.e. keeping only markers in every 2 cM, or dropping markers within 2cM between 2 markers.

See also newMarkers

source