Functions and Types

Flexible Multivariate Linear Mixed Models (flxMLMM)

FlxQTL.FlxQTLModule
FlxQTL

Flexible QTL analysis tools for structured multiple traits fitting a Multivariate Linear Mixed Model or a Multivariate Linear Model.

source
FlxQTL.flxMLMMModule
flxMLMM

A module designed for fitting a Multivariate Linear Mixed Model run by the Nesterov's Accelerated Gradient with restarting scheme embedding the Expectation Conditional Maximization to estimate MLEs. REML is not supported.

The general form of Multivariate Linear Mixed model is

\[vec(Y) \sim MVN((X' \otimes Z)vec(B) (or ZBX), K \otimes \V_C +I \otimes \Sigma),\]

where $Z = I_m$, K is a genetic kinship, and $V_C, \Sigma$ are variance component and error matrices, respectively.

The FlxQTL model (flxMLMM) estimates a scalar parameter $\tau^2$ under H1 to efficiently estimate the high dimensional variance component, i.e. $\Omega \approx \tau^2 V_C$ as well as $Z \neq I_m$ to estimate much smaller B than the former model with Z = I. dim(Y) = (m traits, n individuals), dim(X) = (p markers, n), dim(Z) = (m, q trait covariates).

source
FlxQTL.flxMLMM.K2eigFunction
   K2eig(K,LOCO::Bool=false)

Returns eigenvectors and eigenvalues of genetic relatedness, or 3-d array of these of a genetic relatedness if LOCO is true.

Arguments

  • K : A matrix of genetic relatedness (Default). 3-d array of genetic relatedness (LOCO sets to be true.)
  • LOCO : Boolean. Default is false (no LOCO). (Leave One Chromosome Out).

Output

  • T : A matrix of eigenvectors, or a 3-d array of eigenvectors if LOCO sets to be true.
  • λ : A vector of eigenvalues, or a matrix of eigenvalues if LOCO sets to be true.

Examples

For a null variance component, or genetic relatedness for LOCO =false,

 T, λ = K2eig(K)

produces a matrix of T and a vector of λ.

For a genetic kinship calculated under LOCO (a 3-d array of kinship matrices),

 T, λ = K2eig(K,true)

produces a 3-d array of matrices T and a matrix of λ.

source
FlxQTL.flxMLMM.gene2ScanFunction
gene2Scan(cross::Int64,Tg::Union{Array{Float64,3},Matrix{Float64}},Λg::Union{Matrix{Float64},Vector{Float64}},
          Y::Array{Float64,2},XX::Markers,LOCO::Bool=false;m=size(Y,1),Z=diagm(ones(m)),
          Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*3,
          itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)

gene2Scan(Tg,Λg,Y,XX,cross,false;m=size(Y,1),Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,
        Prior::Matrix{Float64}=cov(Y,dims=2)*3,kmin::Int64=1,itol=1e-4,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)

Implement 2d-genome scan with/without LOCO (Leave One Chromosome Out). The first function is a FlxQTL model with/without Z that preestimate a null variance component matrix ($V_C$) under H0 : no QTL, followed by its adjustment by a scalar parameter under H1 : existing QTL. The third geneScan() is based on a conventional MLMM that estimate all parameters under H0/H1. The conventional MLMM is defined as

\[vec(Y) \sim MVN((X' \otimes Z)vec(B) (or ZBX), K \otimes \V_C +I \otimes \Sigma),\]

where Z is identity, K is a genetic kinship, and $V_C, \Sigma$ are variance component and error matrices, respectively.

The FlxQTL model estimates a scalar parameter $\tau^2$ under H1 to efficiently estimate the high dimensional variance component, i.e. $\Omega \approx \tau^2 V_C$

Arguments

  • cross : An integer (Int64) indicating the occurrence of combination of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degrees of freedom for the effect size of the genetic marker when doing genome scan.
  • Tg : A n x n matrix of eigenvectors, or a 3d-array of eigenvectors if LOCO is true. See also K2eig.
  • Λg : A n x 1 vector of eigenvalues from kinship. Returns a matrix of eigenvalues if LOCO is true.
  • Y : A m x n matrix of response variables, i.e. m traits by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e. Y0[1,:] (a vector) ->Y[[1],:] (a matrix) .
  • XX : A type of Markers.
  • Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (wavelet, polynomials, B-splines, etc.). If no assumption among traits, insert an identity matrix, Matrix(1.0I,m,m), or use the second geneScan().
  • LOCO : Boolean. Default is false (no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out) if true.

Keyword Arguments

  • Xnul : A matrix of covariates. Default is intercepts (1's): Xnul= ones(1,size(Y,2)). Adding covariates (C) is Xnul= vcat(ones(1,n),C) where size(C)=(c,n).
  • Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. An amplified empirical covariance matrix is default.
  • df_prior: Degrees of freedom, $\nu_0$ for Inverse-Wishart distributon. m+1 (weakly informative) is default.
  • itol : A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is 1e-3.
  • tol0 : A tolerance controlling ECM under H1: existence of QTL. Default is 1e-3.
  • tol : A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is 1e-4.
  • ρ : A tunning parameter controlling $\tau^2$. Default is 0.001.

!!! Note

  • When some LOD scores return negative values, reduce tolerences for ECM to tol0 = 1e-4, or increase df_prior, such that $m+1 \le df_prior \le 2m$. The easiest setting is df_prior = Int64(ceil(1.9m)) for numerical stability.

Output

  • LODs : LOD scores. Can change to $- \log_{10}{P}$ using lod2logP.
  • est0 : A type of EcmNestrv.Approx including parameter estimates under H0: no QTL.
source
FlxQTL.flxMLMM.geneScanFunction
geneScan(cross::Int64,Tg::Union{Array{Float64,3},Matrix{Float64}},Λg::Union{Matrix{Float64},Vector{Float64}},
         Y::Array{Float64,2},XX::Markers,Z::Array{Float64,2},LOCO::Bool=false;m=size(Y,1),H0_up::Bool=false,
         Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*3,
         LogP::Bool=false,itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)

geneScan(cross,Tg,Λg,Y,XX,false;m=size(Y,1),H0_up::Bool=false,Xnul::Array{Float64,2}=ones(1,size(Y,2)),
         df_prior=m+1,Prior=cov(Y,dims=2)*3,LogP=false,itol=1e-3,tol0=1e-3,tol=1e-4,ρ=0.001)

geneScan(Tg,Λg,Y,XX,cross,false;Xnul=ones(1,size(Y,2)),df_prior=m+1,Prior=cov(Y,dims=2)*3,LogP=false,
         itol=1e-3,tol0=1e-3,tol=1e-4,ρ=0.001)

Implement 1d-genome scan with/without LOCO (Leave One Chromosome Out). The first two functions are FlxQTL models with/without Z that preestimate a null variance component matrix ($V_C$) under H0 : no QTL, followed by its adjustment by a scalar parameter under H1 : existing QTL. The third geneScan() is based on a conventional MLMM that estimate all parameters under H0/H1. The conventional MLMM is defined as

\[vec(Y) \sim MVN((X' \otimes Z)vec(B) (or ZBX), K \otimes \V_C +I_n \otimes \Sigma),\]

where Z is identity, K is a genetic kinship, and $V_C, \Sigma$ are variance component and error matrices, respectively.

The FlxQTL model estimates a scalar parameter $\tau^2$ under H1 to efficiently estimate the high dimensional variance component, i.e. $\Omega \approx \tau^2 V_C$, where $Z \neq I$.

Arguments

  • cross : An integer (Int64) indicating the occurrence of combination of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degrees of freedom for the effect size of the genetic marker when doing genome scan.
  • Tg : A n x n matrix of eigenvectors, or a 3d-array of eigenvectors if LOCO is true. See also K2eig.
  • Λg : A n x 1 vector of eigenvalues from kinship. Returns a matrix of eigenvalues if LOCO is true.
  • Y : A m x n matrix of response variables, i.e. m traits by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e. Y0[1,:] (a vector) ->Y[[1],:] (a matrix) .
  • XX : A type of Markers.
  • Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (wavelet, polynomials, B-splines, etc.). If no assumption among traits, insert an identity matrix, Matrix(1.0I,m,m), or use the second geneScan().
  • LOCO : Boolean. Default is false (no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out) if true.

Keyword Arguments

  • Xnul : A matrix of covariates. Default is intercepts (1's): Xnul= ones(1,size(Y,2)). Adding covariates (C) is Xnul= vcat(ones(1,n),C) where size(C)=(c,n).
  • Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. An amplified empirical covariance matrix is default.
  • df_prior: Degrees of freedom, $\nu_0$ for Inverse-Wishart distributon. m+1 (weakly informative) is default.
  • H0_up : Default returns null estimates, est0 from the conventional MLMM. It is recommended setting H0_up=true for $m \ge 11$ to avoid negative LODs.
  • itol : A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is 1e-3.
  • tol0 : A tolerance controlling ECM under H1: existence of QTL. Default is 1e-3.
  • tol : A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is 1e-4.
  • ρ : A tunning parameter controlling $\tau^2$. Default is 0.001.
  • LogP : Boolean. Default is false. Returns $-\log_{10}{P}$ instead of LOD scores if true.

!!! Note

  • When some LOD scores return negative values, reduce tolerences for ECM to tol0 = 1e-4, or increase df_prior, such that $m+1 \le df_prior \le 2m$. The easiest setting is df_prior = Int64(ceil(1.9m)) for numerical stability.

Output

  • LODs (or logP) : LOD scores. Can change to $- \log_{10}{P}$ in lod2logP if LogP = true.
  • B : A 3-d array of B (fixed effects) matrices under H1: existence of QTL. If covariates are added to Xnul : Xnul= [ones(1,size(Y,2)); Covariates], ex. For sex covariates in 4-way cross analysis, B[:,2,100], B[:,3:5,100] are effects for sex, the rest genotypes of the 100th QTL, respectively.
  • est0 : A type of EcmNestrv.Result including parameter estimates under H0: no QTL for both functions. Returns EcmNestrv.Approx if H0_up=true for the FlxQTL model only.
source
FlxQTL.flxMLMM.permTestMethod
  permTest(nperm::Int64,cross,Kg,Y,XX::Markers;pval=[0.05 0.01],m=size(Y,1),Z=diagm(ones(m)),df_prior=m+1,
          Prior::Matrix{Float64}=cov(Y,dims=2)*3,Xnul=ones(1,size(Y,2)),itol=1e-4,tol0=1e-3,tol=1e-4,ρ=0.001)
  
  mlmmTest(nperm::Int64,cross,Kg,Y,XX::Markers;pval=[0.05 0.01],df_prior=m+1,
           Prior::Matrix{Float64}=cov(Y,dims=2)*3,Xnul=ones(1,size(Y,2)),itol=1e-4,tol0=1e-3,tol=1e-4,ρ=0.001)

Implement permutation test to get thresholds at the levels of type 1 error, α. Note that mlmmTest() is based on the conventional MLMM (Z=I). The FlxQTL model is defined as

\[vec(Y)\sim MVN((X' \otimes Z)vec(B) (or ZBX), K \otimes \Omega +I \otimes \Sigma),\]

where K is a genetic kinship, and $\Omega \approx \tau^2V_C$, $\Sigma$ are covariance matrices for random and error terms, respectively. $V_C$ is pre-estimated under the null model (H0) of no QTL from the conventiona MLMM, which is equivalent to the FlxQTL model for $\tau^2 =1$.

!!! NOTE

  • permutationTest() is implemented by geneScan with LOCO.

Arguments

  • nperm : An integer indicating the number of permutation to be implemented.
  • cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
  • Kg : A n x n genetic kinship matrix. Should be symmetric positive definite.
  • Y : A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e. Y[1,:] (a vector) -> Y[[1],:] (a matrix) .
  • XX : A type of Markers.

Keyword Arguments

  • pval : A vector of p-values to get their quantiles. Default is [0.05 0.01] (without comma).
  • Xnul : A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.
  • Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just use Z = I (default).
  • Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. An amplified empirical covariance matrix is default.
  • df_prior: Degrees of freedom, $\nu_0$ for Inverse-Wishart distributon. m+1 (weakly informative) is default.
  • itol : A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is 1e-3.
  • tol0 : A tolerance controlling ECM under H1: existence of QTL. Default is 1e-3.
  • tol : A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is 1e-4.
  • ρ : A tunning parameter controlling $\tau^2$. Default is 0.001.

!!! Note

  • When some LOD scores return negative values, reduce tolerences for ECM to tol0 = 1e-4, or increase df_prior, such that $m+1 \le df_prior \le 2m$. The easiest setting is df_prior = Int64(ceil(1.9m)) for numerical stability.

Output

  • maxLODs : A nperm x 1 vector of maximal LOD scores by permutation.
  • H1par_perm : A type of struct, EcmNestrv.Approx(B,τ2,Σ,loglik) including parameter estimates or EcmNestrv.Result(B,Vc,Σ,loglik) for a conventional MLMM under H0: no QTL by permutation.
  • cutoff : A vector of thresholds corresponding to pval.
source
FlxQTL.flxMLMM.permutationTestMethod
  permutationTest(nperm,cross,Kg,Y,XX;pval=[0.05 0.01],Z=diagm(ones(m)),Xnul=ones(1,size(Y,2)),
                 df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*3,
                 LOCO_all::Bool=false,itol=1e-4,tol0=1e-3,tol=1e-4,ρ=0.001,δ =0.01)

Implement permutation test from the distribution of maximum LOD scores by LOCO to get thresholds at the levels of type 1 error, α. The FlxQTL model is defined as

\[vec(Y)\sim MVN((X' \otimes Z)vec(B) (or ZBX), K \otimes \Omega +I \otimes \Sigma),\]

where K is a genetic kinship, and $\Omega \approx \tau^2V_C$, $\Sigma$ are covariance matrices for random and error terms, respectively. $V_C$ is pre-estimated under the null model (H0) of no QTL from the conventiona MLMM, which is equivalent to the FlxQTL model for $\tau^2 =1$.

!!! NOTE

  • permTest() and mlmmTest() are implemented by geneScan without LOCO.

Arguments

  • nperm : An integer (Int64) indicating the number of permutation to be implemented.
  • cross : An integer (Int64) indicating the number of combination of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degrees of freedom for the effect size of a genetic marker when doing genome scan.
  • Kg : A 3d-array of n x n genetic kinship matrices. Should be symmetric positive definite.
  • Y : A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e. Y[1,:] (a vector) -> Y[[1],:] (a matrix) .
  • XX : A type of Markers.

Keyword Arguments

  • pval : A vector of p-values to get their quantiles. Default is [0.05 0.01] (without comma).
  • Xnul : A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.
  • Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just use Z = diagm(ones(m)) or Matrix(1.0I,m,m) (default).
  • Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. An amplified empirical covariance matrix is default.
  • df_prior: Degrees of freedom, $\nu_0$ for Inverse-Wishart distributon. m+1 (weakly informative) is default.
  • LOCO_all : Boolean. Default is false, which implements geneScan(LOCO=true) with permuted data but a null variance component (Vc) preestimated only once with a kinship (kinshipLin for genotype (or allele) probabilities, or kinshipStd for genotypes) by LOCO=false implicitly. It is recommended setting true for high-dimensional traits for better accuracy, i.e. $m\le 16 \sim 18$.
  • itol : A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is 1e-3.
  • tol0 : A tolerance controlling ECM under H1: existence of QTL. Default is 1e-3.
  • tol : A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is 1e-4.
  • ρ : A tunning parameter controlling $\tau^2$. Default is 0.001.
  • δ : A tuning parameter to correct q non-positive definite kinship without LOCO to pre-estimate a null variance component for low- to medium-dimensional traits ($m \le 16 \sim 18$) only. This no-LOCO kinship is computed inside the function for efficient computation.

!!! Note

  • When some LOD scores return negative values, reduce tolerences for ECM to tol0 = 1e-4, or increase df_prior, such that $m+1 \le df_prior \le 2m$. The easiest setting is df_prior = Int64(ceil(1.9m)) for numerical stability.

Output

  • maxLODs : A nperm x 1 vector of maximal LOD scores by permutation.
  • H1par_perm : A vector of structs, EcmNestrv.Approx(B,τ2,Σ,loglik) for each Chromosome per permutation, i.e. # of Chromosomes x nperm.
  • cutoff : A vector of thresholds corresponding to pval.
source
FlxQTL.EcmNestrvModule
EcmNestrv

A module for base algorithms using the ECM (Expectation-Conditional Maxization) with the Speed restarting Nesterov's accelerated gradient method to fit a flexible multivariate linear mixed model (flxMLMM).

source

Genetic Relatedness Matrices (GRM)

FlxQTL.GRMModule
 GRM

A module for computing Genetic Relatedness Matrix (or kinship).

source
FlxQTL.GRM.kinship4wayMethod
 kinship4way(genmat::Array{Float64,2})

Computes a kinship for four-way cross data counting the occurrence of different alleles between two markers: ex. AB-AB=0; AB-AC=1; AB-CD=2,$\dots$ Note: In R/qtl, genotypes are labeled as 1=AC; 2=BC; 3=AD; 4=BD by the function, read.cross().

Argument

  • genmat : A matrix of genotypes for four-way cross $(1,2, \dots)$. size(genematrix)= (p,n), for p genetic markers x n individuals(or lines).

Output

Returns a n x n symmetric matrix containing 1's on the diagonal.

source
FlxQTL.GRM.kinshipCtrMethod
 kinshipCtr(genmat::Array{Float64,2})

Calculates a kinship with a centered genotype matrix (linear kernel), where a marker mean is subtracted from its genotypes.

Argument

  • genmat : A matrix of genotype data (0,1,2). size(genmat)=(p,n) for p markers x n individuals

Output

Returns a n x n symmetric matrix. See also kinshipStd.

source
FlxQTL.GRM.kinshipGKMethod
 kinshipGK(G::Array{Float64,2},ρ::Float64)

Computes a non-linear kinship matrix using a Gaussian Kernel.

Arguments

  • G : A matrix of genotype. size(G)=(r,c), such that r genotype markers and c individuals (or lines).
  • ρ : A free parameter determining the width of the kernel. Could be attained empirically.

Output

Returns a c x c symmetric (positive definite) matrix containing 1's on the diagonal.

source
FlxQTL.GRM.kinshipLinMethod
kinshipLin(mat,cross)

Calculates a kinship matrix by linear kernel for genotype (or allele) probabilities. This is a general form of kinshipCtr, kinshipStd for genetype data, i.e. XX'.

Arguments

  • mat : A matrix of genotype (or allele) probabilities usually extracted from R/qtl, R/qtl2, or the counterpart packages. size(mat)= (p,n) for p genetic markers x n individuals.
  • cross (>1) : An integer indicating the occurrence or combination of alleles or genotypes in a genetic marker. ex. 1 for genotypes (labeled as 0,1,2), 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.

Output

Returns a n x n symmetric (positive definite) matrix containing 1's on the diagonal.

source
FlxQTL.GRM.kinshipLocoFunction
 kinshipLoco(kin,g::Markers,cross::Int64=1,ρ=0.01)

Generates a 3-d array of symmetric positive definite kinship matrices using LOCO (Leave One Chromosome Out) witout shrinkage intensity estimation. This is designed for linear kernel based kinship functions. See Arguments.

Arguments

  • kin : A function of computing a kinship. Can only use with kinshipCtr, kinshipStd for genotypes, and with kinshipLin for genotype (or allele) probabilities.
  • g : A struct of arrays, type Markers.
  • cross : A scalar indicating the occurrence or combination of alleles or genotypes in a genetic marker. Simply, it coincides with the number of columns per marker in genotype (or allele) probability data; for instance, 1 for genotypes (0,1,2) as default, 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
  • ρ : A real number to correct a non-positive definite kinship. Default is 0.01. When a kinship is not corrected to be positive definite with the default value, slight increase of ρ can adjust to minuscule negative eigenvalues.

Output

Returns a 3-d array of n x n symmetric positive definite matrices as many as Chromosomes. Refer to shrinkgLoco for comparison.

source
FlxQTL.GRM.kinshipManMethod
  kinshipMan(genematrix::Array{Float64,2})

Calculates a kinship matrix using a manhattan distance method. Missing values need to be either omitted or imputed. This function is designed for recombinant inbred line (RIL) (AA/BB), not for 4-way cross genotype data. See kinship4way.

Argument

  • genematrix : A matrix of genotypes, i.e. 0,1 (or 1,2). size(genematrix)= (p,n) for p genetic markers x n individuals(or lines).

Output

Returns a n x n symmetric matrix containing 1's on the diagonal.

source
FlxQTL.GRM.kinshipStdMethod
 kinshipStd(genmat::Array{Float64,2})

Produces a kinship with a standardized (or normalized) genotype matrix (linear kernel), i.e., centered genetypes divided by the standard deviation of a marker.

Argument

  • genmat : A matrix of genotype data (0,1,2). size(genmat)=(p,n) for p markers x n individuals

Output

Returns a n x n symmetric matrix. See also kinshipCtr.

source
FlxQTL.GRM.shrinkgMethod
 shrinkg(f,nb::Int64,geno)

Estimates a full-rank positive definite kinship matrix by shrinkage intensity estimation (bootstrap). Can only use with kinshipMan, kinship4way. This function runs fast by CPU parallelization. Add workers/processes using an addprocs() function before running it for speedup.

Arguments

  • f: A function of computing a kinship. Can only use with kinshipMan, kinship4way.
  • nb : An integer indicating the number of bootstrap. It does not have to be a large number.
  • geno : A matrix of genotypes. size(geno) = (p,n) for p markers x n individuals.

Example

julia> using FlxQTL
julia> addprocs(8)
julia> K = shinkage(kinshipMan,20,myGeno)

Output

Returns a full-rank symmetric positive definite matrix.

source
FlxQTL.GRM.shrinkgLocoMethod
   shrinkgLoco(kin,nb,g::Markers)

Generates a 3-d array of full-rank positive definite kinship matrices by shrinkage intensity estimation (bootstrap) using a LOCO (Leave One Chromosome Out) scheme. Can only use with kinshipMan, kinship4way.

Arguments

  • kin : A function of computing a kinship. Can only use with kinshipMan, kinship4way
  • nb : An integer indicating the number of bootstrap.
  • g : A struct of arrays, type Markers.

Output

Returns 3-d array of n x n symmetric positive definite matrices as many as Chromosomes.

source

Flexible Multivariate Linear Models (flxMLM)

FlxQTL.flxMLMModule
flxMLM

A module designed for fitting a Multivariate Linear Model by the (Residual) Maximum Likelihood (REML or MLE) method. The model:

$Y=XBZ'+E$, where $E(vec(Y))= (Z \otimes X)vec(B)$, $Var(vec(Y))= \Sigma \otimes I_n$,

where dim(Y)= (n individuals, m traits), dim(X) = (n,p markers), and dim(Z) = (m, q trait covariates).

source
FlxQTL.flxMLM.mlm1ScanFunction
mlm1Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,Z::Matrix{Float64},reml::Bool=false;LogP::Bool=false,
          Xnul::Matrix{Float64}=ones(size(Y,1),1))
mlm1Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,reml::Bool=false;LogP::Bool=false,
          Xnul::Matrix{Float64}=ones(size(Y,1),1))

Implement 1d-genome scan. The second mlm1Scan() is for Z=I case; one can also run the first by inserting an identity matrix (Matrix(1.0I,m,m)) into Z. math vec(Y) \sim MVN ((Z \otimes X)vec(B) (or XBZ'), \Sigma \otimes I_n), where size(Y)=(n,m), size(X)=(n,p), and size(Z)=(m,q).

Arguments

  • cross : An integer indicating the occurrence of alleles or genotypes. ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
  • Y : A n x m matrix of response variables, i.e. n individuals (or lines) by m traits (or environments). For univariate phenotypes, use square brackets in the arguement. i.e. Y0[:,1] (a vector) ->Y[:,[1]] (a matrix) .
  • XX : A type of Markers. Be cautious when combining genotype infomation into the struct of Markers; size(X) = (n,p).
  • Z : An optional m x q matrix of low-dimensional phenotypic (or trait) covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just use Z = I.
  • reml: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented if true.

Keyword Arguments

  • Xnul : A matrix of covariates. Default is intercepts (1's): Xnul= ones(size(Y,1),1). Adding covariates (C) is Xnul= hcat(ones(n),C) where size(C)=(c,n) for n = size(Y,1).
  • LogP : Boolean. Default is false. Returns $-\log_{10}{P}$ instead of LOD scores if true.

Output

  • LODs (or logP) : LOD scores. Can change to $- \log_{10}{P}$ in lod2logP if LogP = true.
  • B : A 3-d array of B (fixed effects) matrices under H1: existence of QTL. If covariates are added to Xnul : Xnul= [ones(size(Y,1)) Covariates], ex. For sex covariates in 4-way cross analysis, B[2,:,100], B[3:5,:,100] are effects for sex, the rest genotypes of the 100th QTL, respectively.
  • est0 : A type of MLM.Estimat including parameter estimates under H0: no QTL.
source
FlxQTL.flxMLM.mlm2ScanFunction
 mlm2Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,Z::Matrix{Float64},reml::Bool=false;Xnul::Matrix{Float64}=ones(size(Y,1),1))
 mlm2Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,reml::Bool=false;Xnul::Matrix{Float64}=ones(size(Y,1),1))

Implement 2d-genome scan. The second mlm2Scan() is for Z=I case; one can also run the first by inserting an identity matrix (Matrix(1.0I,m,m)) into Z. math vec(Y) \sim MVN ((Z \otimes X)vec(B) (or XBZ'), \Sigma \otimes I_n), where size(Y)=(n,m), size(X)=(n,p), and size(Z)=(m,q).

Arguments

  • cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
  • Y : A n x m matrix of response variables, i.e. n individuals (or lines) by m traits (or environments). For univariate phenotypes, use square brackets in arguement. i.e. Y0[:,1] (a vector) ->Y[:,[1]] (a matrix) .
  • XX : A type of Markers. Be cautious when combining genotype infomation into the struct of Markers; size(X) = (n,p).
  • Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just use Z = I.
  • reml: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented if true.

Keyword Arguments

  • Xnul : A matrix of covariates. Default is intercepts (1's). Unless adding covariates, just leave as it is. See mlm1Scan.

Output

  • LODs : LOD scores. Can change to $- \log_{10}{P}$ using lod2logP.
  • est0 : A type of MLM.Estimat including parameter estimates under H0: no QTL.
source
FlxQTL.flxMLM.mlmTestFunction
 mlmTest(nperm::Int64,cross::Int64,Y::Matrix{Float64},XX::Markers,Z::Matrix{Float64},reml::Bool=false;
          Xnul::Matrix{Float64}=ones(size(Y,1),1),pval=[0.05 0.01])
 mlmTest(nperm::Int64,cross::Int64,Y::Matrix{Float64},XX::Markers,reml::Bool=false;
          Xnul::Matrix{Float64}=ones(size(Y,1),1),pval=[0.05 0.01])

Implement permutation test to get thresholds at the levels of type 1 error, α. The second mlmTest() is for Z=I case; one can also run the first by inserting an identity matrix (Matrix(1.0I,m,m)) into Z. math vec(Y) \sim MVN ((Z \otimes X)vec(B) (or XBZ'), \Sigma \otimes I_n), where size(Y)=(n,m), size(X)=(n,p), and size(Z)=(m,q).

Arguments

  • nperm : An integer indicating the number of permutation to be implemented.
  • cross : An integer indicating the incidence of alleles or genotypes per marker. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
  • Y : A n x m matrix of response variables, i.e. n individuals (or lines) by m traits (or environments). For univariate phenotypes, use square brackets in arguement. i.e. Y[:,1] (a vector) -> Y[:,[1]] (a matrix) .
  • XX : A type of Markers. Be cautious when combining genotype infomation into the struct of Markers; size(X) = (n,p).
  • reml: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented if true.

Keyword Arguments

  • pval : A vector of p-values to get their quantiles. Default is [0.05 0.01] (without comma).
  • Xnul : A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.
  • Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If the data does not assume any particular trait relation, just use Z = I.

Output

  • maxLODs : A nperm x 1 vector of maximal LOD scores by permutation.

  • H1par_perm : A type of struct, MLM.Estimat(B,Σ,loglik) including parameter estimates for a MLM under H0: no QTL by permutation.

  • cutoff : A vector of thresholds corresponding to pval.

source
FlxQTL.MLMModule
  MLM

A module for fitting general Multivariate Linear Models motivated by functional data analysis via mle or reml. The default fitting method is mle. ( i.e. reml=false)

The model: $Y=XBZ'+E$, where $E(Y)=XBZ'$ (or $E(vec(Y))= (Z \otimes X)vec(B)$ ), $var(vec(E))=\Sigma \otimes I.$ size(Y)=(n,m), size(X)=(n,p), size(Z)=(m,q).

source
FlxQTL.MLM.EstimatType
 Estimat(B::Array{Float64,2},Σ::Array{Float64,2},loglik::Float64)

A struct of arrays for results by fitting a multivariate linear model, mGLM(). It includes B(fixed effects), Σ (m x m covariance matrix), loglik(a value of log-likelihood by mle or reml).

source
FlxQTL.MLM.mGLMFunction
  mGLM(Y::Array{Float64,2},X::Array{Float64,2},Z::Array{Float64,2},reml::Bool=false)
  mGLM(Y::Array{Float64,2},X::Array{Float64,2},reml::Bool=false)

Fitting multivariate General Linear Models via MLE (or REML) and returns a type of a struct Estimat.

Arguments

  • Y : A matrix of response variables, i.e. traits. size(Y)=(n,m) for n individuals x m traits
  • X : A matrix of independent variables, i.e. genotypes or genotype probabilities including intercept or/and covariates. size(X)=(n,p) for n individuals x p markers including intercept or/and covariates
  • Z : An optional matrix of low-dimensional phenotypic or trait covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). size(Z)=(m,q) for m traits x q phenotypic covariates. If the data does not assume any particular trait relation, just use Z = Matrix(1.0I,m,m) or the second functon.
  • reml : Boolean. Default is fitting the model via mle. Resitricted MLE is implemented if true.

Output

Returns Estimat .

source

Utility Functions (Util)

FlxQTL.Util.MarkersType
Markers(name::Array{String,1},chr::Array{Any,1},pos::Array{Float64,1},X::Array{Float64,2})

A struct of arrays creating genotype or genotype probability data for genome scan.

Arguments

  • name : A vector of marker names
  • chr : A vector of Chromosomes
  • pos : A vector of marker positions (cM)
  • X : A matrix of genotypes or genotype probabilities
source
FlxQTL.Util.Y_huberMethod
Y_huber(Y::Array{Float64,2})

Rescale Y (phenotype or trait data) to be less sensitive to outliers using by Huber loss function and MAD (median absolute deviation). size(Y)=(m,n) for m trait and n individuals.

source
FlxQTL.Util.array2matMethod
 array2mat(cross::Int64,X0::Array{Float64,3})

Returns a 3-d array to a matrix of genotype probabilities. size(X0)=(cross,n,p) –> (p1,n), where p1 = cross*p for p markers, cross alleles or genotypes, and n individuals. See mat2array.

source
FlxQTL.Util.getFinoidxMethod
getFinoidx(phenoData::Array{Union{Missing,Float64},2})

Attains indices of phenotype data without missing values.

Argument

  • phenoData : A matrix of phenotype (or trait) data including missing values. size(phenoData) = (m,n) for m traits and n individuals.
source
FlxQTL.Util.getGenoidxFunction
getGenoidx(GenoData::Union{Array{Any,2},Array{Float64,2}},maf::Float64=0.025)

Attains genotype indices to drop correlated or bad markers.

Arguments

  • GenoData : A matrix of genotype data. size(GenoData)= (p,n) for p markers and n individuals.
  • maf : A scalar for dropping criterion of markers. Default is 0.025 i.e. markers of MAF < 0.025 are dropped.
source
FlxQTL.Util.lod2logPMethod
lod2logP(LODs::Union{Array{Float64,1},Array{Any,1}},v::Int64)

Caculates $-\log_{10}{P}$ from LOD scores.

Arguments

  • LODs : A vector of LOD scores computed from genome scan.
  • v : Degrees of freedom for Chi-squared distribution.

!!! NOTE

  • To compute degrees of freedom for 1D-genome scan (geneScan), v = prod(size(B1))-prod(size(B0), where B1 and B0 are effect size matrices under H1 of a full model (intercept + covariates + QTL) and H0 of no QTL (intercept + covariates), respectively.
source
FlxQTL.Util.mat2arrayMethod
 mat2array(cross::Int64,X0)

Returns a matrix of genotype probabilities to 3-d array. size(X0)=(p1,n) –> (cross,n,p), where p1 = cross*p for p markers, cross alleles or genotypes, and n individuals.

Argument

  • cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
  • X0 : A matrix of genotype probability data computed from r/qtl or r/qtl2.

See array2mat.

source
FlxQTL.Util.newMarkersFunction
newMarkers(XX::Markers,cross::Int64,cM::Int64=2)

Returns a struct of Markers by keeping only markers positioned in every cM centimorgans for any genome scan to avoid singularity.

Arguments

  • XX : A type of Markers. See Markers.
  • cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
  • cM : An integer of dropping criterion of markers. Default is 2, i.e. keeping only markers in every 2 cM, or dropping markers within 2cM between 2 markers.
source
FlxQTL.Util.ordrMarkersMethod
ordrMarkers(markers::Array{Any,2})

Rearrange by CPU parallelization marker information composed of marker name, chr, position obtained from rqtl2, which is not listed in order (excluding X chromosome).

Argument

  • markers : An array of marker information.
source
FlxQTL.Util.setSeedMethod
setSeed(seedNum::Int64)

Assigns different numbers of seeds to workers (or processes) for reproducibility.

Arguments

  • seedNum: An integer. A minimum seed number to assign a worker. For distributed computing, seed numbers are generated for multiple workers by increasing 1, e.g. seedNum = 123 & 10 workers, setSeed generates seeds from 123 to 132 and assigns to corresponding workers (processes).

Examples

julia> using Distributed
julia> addprocs(10)
julia> @everywhere using flxQTL
julia> setSeed(123)
source
FlxQTL.Util.sortBycMFunction
    sortBycM(Chr::Any,XX::Markers,cross::Int64,cM::Int64=2)

Returns marker indices in Chromosome Chr and the corresponding genotype probabilities keeping only markers positioned in every cM centimorgans for any genome scan to avoid singularity.

Arguments

  • Chr : A type of Any indicating a particular chromosome to sort markers out.
  • XX : A type of Markers.
  • cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
  • cM : An integer of dropping criterion of markers. Default is 2, i.e. keeping only markers in every 2 cM, or dropping markers within 2cM between 2 markers.

See also newMarkers

source