Functions and Types

Flexible Multivariate Linear Mixed Models (flxMLMM)

FlxQTL.FlxQTL — Module

FlxQTL

flexible QTL analysis tools for structured multiple traits fitting a Multivariate Linear Mixed Model or a Multivariate Linear Model.

source

FlxQTL.flxMLMM — Module

flxMLMM

A module designed for fitting a Multivariate Linear Mixed Model run by Nesterov's Accelerated Gradient with restarting scheme incorporated with Expectation Conditional Maximization to estimate MLEs. REML is not supported.

The model:

$Y=XBZ'+R+E$, where $E(vec(Y))= (Z \otimes X)vec(B)$, $var(vec(Y))= \tau^2 K_G \otimes K_C + I_n \otimes \Sigma$

source

FlxQTL.flxMLMM.K2Eig — Function

  K2Eig(Kg,Kc::Array{Float64,2},LOCO::Bool=false)

Returns a two pairs of eigenvectors and eigenvalues for genetic and climatic relatedness matrices.

Arguments

Kg : A matrix of a genetic kinship, or 3-d array of that if LOCO sets to be true.
Kc : A matrix of a climatic relatedness.
LOCO : Boolean. Default is false (no LOCO). (Leave One Chromosome Out). LOCO is only connected to the genetic kinship (Kg).

Output

Tg : A matrix of eigenvectors for Kg, or 3-d array of eigenvectors if LOCO sets to be true.
λg : A vector of eigenvalues for Kg, or matrix of eigenvalues if LOCO sets to be true.
Tc : A matrix of eigenvectors for Kc.
λc : A vector of eigenvalues for Kc

See K2eig.

Examples

For a genetic kinship calculated under LOCO (3-d array of kinship),

 Tg,λg,Tc,λc = K2Eig(Kg,Kc,true)

produces a 3-d array of Tg, matrices of λg, Tc, and a vector of λc.

source

FlxQTL.flxMLMM.K2eig — Function

   K2eig(K,LOCO::Bool=false)

Returns eigenvectors and eigenvalues of a (genetic, climatic) relatedness, or 3-d array of these of a genetic relatedness if LOCO is true.

Arguments

K : A matrix of (genetic or climatic) relatedness (Default). 3-d array of genetic relatedness (LOCO sets to be true.)
LOCO : Boolean. Default is false (no LOCO). (Leave One Chromosome Out).

Output

T : A matrix of eigenvectors, or 3-d array of eigenvectors if LOCO sets to be true.
λ : A vector of eigenvalues, or matrix of eigenvalues if LOCO sets to be true.

See also K2Eig.

Examples

For a (climatic) relatedness, or genetic relatedness for LOCO =false,

 T, λ = K2eig(K)

produces a matrix of T and a vector of λ.

For a genetic kinship calculated under LOCO (3-d array of kinship),

 T, λ = K2eig(K,true)

produces a 3-d array of T and a matrix of λ.

source

FlxQTL.flxMLMM.gene2Scan — Function

gene2Scan(cross::Int64,Tg,Tc::Array{Float64,2},Λg,λc::Array{Float64,1},Y::Array{Float64,2},
         XX::Markers,Z::Array{Float64,2},LOCO::Bool=false;ρ=0.001,Xnul::Array{Float64,2}=ones(1,size(Y,2)), 
         df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*5,itol=1e-4,tol0=1e-3,tol::Float64=1e-4)
gene2Scan(cross::Int64,Tg,Λg,Y::Array{Float64,2},XX::Markers,Z::Array{Float64,2},LOCO::Bool=false;
           Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*5,
           itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)
gene2Scan(cross::Int64,Tg,Λg,Y::Array{Float64,2},XX::Markers,LOCO::Bool=false;Xnul::Array{Float64,2}=ones(1,size(Y,2)),
           df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*5,itol=1e-4,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)

Implement 2d-genome scan with/without LOCO (Leave One Chromosome Out). Note that the second gene2Scan includes getKc for precomputing Kc– no need of precomputing and doing eigen-decomposition to Kc separately. The last gene2Scan is run by a conventional MLMM (no Z, i.e. Z=I):

\[vec(Y) \sim MVN((Z \otimes X)vec(B) (or XBZ') , K \otimes \Sigma_1 +I \otimes \Sigma_2),\]

where K is a genetic kinship, $\Sigma_1, \Sigma_2$ are covariance matrices for random and error terms, respectively. Z can be replaced with an identity matrix.

Arguments

cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
Tg : A n x n matrix of eigenvectors from K2eig, or K2Eig. Returns 3d-array of eigenvectors as many as Chromosomes if LOCO is true.
Tc : A m x m matrix of eigenvectors from the precomputed covariance matrix of Kc under the null model of no QTL.
Λg : A n x 1 vector of eigenvalues from kinship. Returns a matrix of eigenvalues if LOCO is true.
λc : A m x 1 vector of eigenvalues from Kc.
Y : A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e. Y0[1,:] (a vector) -> Y[[1],:] (a matrix) .
XX : A type of Markers.
Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). To use the identity matrix just as the conventional MLMM, simply type diagm(ones(m)).
LOCO : Boolean. Default is false (no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out).

Keyword Arguments

Xnul : A matrix of covariates. Default is intercepts (1's). Unless adding covariates, just leave as it is. See geneScan.
Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. A large scaled covariance matrix (a weakly informative prior) is default.
df_prior: degrees of freedom, $\nu_0$ for Inverse-Wishart distributon. m+1 (weakly informative) is default.
itol : A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is 1e-3.
tol0 : A tolerance controlling ECM under H1: existence of QTL. Default is 1e-3.
tol : A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is 1e-4.
ρ : A tunning parameter controlling $\tau^2$. Default is 0.001.

!!! Note

When some LOD scores return negative values, reduce tolerences for ECM to tol0 = 1e-4. It works in most cases. If not, can reduce both tol0 and tol to 1e-4 or further. 2D scan is usually computationally heavy and takes much time especially for large data with high-dimensional traits, i.e. large m and n. This will be worse when the conventional LMM is chosen.

Output

LODs : LOD scores. Can change to $- \log_{10}{P}$ using lod2logP.
est0 : A type of EcmNestrv.Approx including parameter estimates under H0: no QTL.

source

FlxQTL.flxMLMM.geneScan — Function

geneScan(cross::Int64,Tg,Tc::Array{Float64,2},Λg,λc::Array{Float64,1},Y::Array{Float64,2},XX::Markers,Z::Array{Float64,2},
         LOCO::Bool=false;m=size(Y,1),tdata::Bool=false,LogP::Bool=false,Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,
            Prior::Matrix{Float64}=cov(Y,dims=2)*5,itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)
geneScan(cross::Int64,Tg::Union{Array{Float64,3},Array{Float64,2}},Tc::Array{Float64,2},Λg::Union{Array{Float64,2},Array{Float64,1}},
         λc::Array{Float64,1},Y::Array{Float64,2},XX::Markers,LOCO::Bool=false;m=size(Y,1),LogP::Bool=false,Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,
    Prior::Matrix{Float64}=cov(Y,dims=2)*5,itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)
geneScan(cross::Int64,Tg,Λg,Y::Array{Float64,2},XX::Markers,LOCO::Bool=false;m=size(Y,1),Xnul::Array{Float64,2}=ones(1,size(Y,2)),
            df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*5,tdata::Bool=false,LogP::Bool=false,
           itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)
gene1Scan(cross::Int64,Tg,Λg,Y::Array{Float64,2},XX::Markers,Z::Array{Float64,2},LOCO::Bool=false;m=size(Y,1),
           Xnul::Array{Float64,2}=ones(1,size(Y,2)),df_prior=m+1,Prior::Matrix{Float64}=cov(Y,dims=2)*5,
             tdata::Bool=false,LogP::Bool=false,itol=1e-3,tol0=1e-3,tol::Float64=1e-4,ρ=0.001)

Implement 1d-genome scan with/without LOCO (Leave One Chromosome Out). Note that gene1Scan includes getKc for precomputing Kc– no need of precomputing and doing eigen-decomposition to Kc separately. The third geneScan() is based on a conventional MLMM:

\[vec(Y) \sim MVN((Z \otimes X)vec(B) (or XBZ'), K \otimes \Sigma_1 +I \otimes \Sigma_2),\]

where K is a genetic kinship, $\Sigma_1, \Sigma_2$ are covariance matrices for random and error terms, respectively. Z can be replaced with an identity matrix.

Arguments

cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
Tg : A n x n matrix of eigenvectors from K2eig, or K2Eig. Returns 3d-array of eigenvectors as many as Chromosomes if LOCO is true.
Tc : A m x m matrix of eigenvectors from the precomputed covariance matrix of Kc under the null model of no QTL.
Λg : A n x 1 vector of eigenvalues from kinship. Returns a matrix of eigenvalues if LOCO is true.
λc : A m x 1 vector of eigenvalues from the precomputed covariance matrix of Kc.
Y : A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e. Y0[1,:] (a vector) ->Y[[1],:] (a matrix) .
XX : A type of Markers.
Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If nothing to insert in Z, just exclude it or insert an identity matrix, Matrix(1.0I,m,m). m traits x q phenotypic covariates.
LOCO : Boolean. Default is false (no LOCO). Runs genome scan using LOCO (Leave One Chromosome Out).

Keyword Arguments

Xnul : A matrix of covariates. Default is intercepts (1's): Xnul= ones(1,size(Y,2)). Adding covariates (C) is Xnul= vcat(ones(1,n),C) where size(C)=(c,n) for n = size(Y0,2).
Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. A large scaled covariance matrix (a weakly informative prior) is default.
df_prior: degrees of freedom, $\nu_0$ for Inverse-Wishart distributon. m+1 (weakly informative) is default.
itol : A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is 1e-3.
tol0 : A tolerance controlling ECM under H1: existence of QTL. Default is 1e-3.
tol : A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is 1e-4.
ρ : A tunning parameter controlling $\tau^2$. Default is 0.001.
LogP : Boolean. Default is false. Returns $-\log_{10}{P}$ instead of LOD scores if true.

!!! Note

When some LOD scores return negative values, reduce tolerences for ECM to tol0 = 1e-4. It works in most cases. If not, can reduce both tol0 and tol to 1e-4 or further.

Output

LODs (or logP) : LOD scores. Can change to $- \log_{10}{P}$ in lod2logP if LogP = true.
B : A 3-d array of B (fixed effects) matrices under H1: existence of QTL. If covariates are added to Xnul : Xnul= [ones(1,size(Y,2)); Covariates], ex. For sex covariates in 4-way cross analysis, B[:,2,100], B[:,3:5,100] are effects for sex, the rest genotypes of the 100th QTL, respectively.
est0 : A type of EcmNestrv.Approx including parameter estimates under H0: no QTL.

source

FlxQTL.flxMLMM.getKc — Method

getKc(Y::Array{Float64,2};m=size(Y,1),Z=diagm(ones(m)), df_prior=m+1,
       Prior::Matrix{Float64}=cov(Y,dims=2)*5,Xnul::Array{Float64,2}=ones(1,size(Y,2)),
       itol=1e-2,tol::Float64=1e-3,ρ=0.001)

Pre-estimate Kc by regressing Y on Xnul, i.e. estimating environmental covariates under H0: no QTL.

Argument

Y : A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e. Y0[1,:] (a vector) ->Y[[1],:] (a matrix) .

Keyword Arguments

Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). An identity matrix, $I_m$, is default.
Xnul : A matrix of covariates. Default is intercepts (1's): Xnul= ones(1,size(Y0)). Adding covariates (C) is Xnul= vcat(ones(1,m),C) where size(C)=(c,m) for m = size(Y0,1).
Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. A large scaled covariance matrix (a weakly informative prior) is default.
df_prior: degrees of freedom, $\nu_0$ for Inverse-Wishart distributon. m+1 (weakly informative) is default.
itol : A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is 1e-3.
tol : A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is 1e-4.
ρ : A tunning parameter controlling $\tau^2$. Default is 0.001.

Output

InitKc : A type of struct of arrays, including pre-estimated Kc,and null estimates of B, Σ,τ2used as initial values inside gene1Scan, one of geneScan functions, or gene2Scan.

Examples

julia> K0 = getKc(Y)  
julia> K0.Kc  # for Kc
julia> K0.B # for B under H0

source

FlxQTL.flxMLMM.permTest — Method

  permTest(nperm::Int64,cross,Kg,Kc,Y,XX::Markers;pval=[0.05 0.01],m=size(Y,1),Z=diagm(ones(m)),df_prior=m+1,
         Prior::Matrix{Float64}=cov(Y,dims=2)*5,Xnul=ones(1,size(Y,2)),itol=1e-4,tol0=1e-3,tol=1e-4,ρ=0.001)
  permTest(nperm::Int64,cross,Kg,Y,XX::Markers;pval=[0.05 0.01],df_prior=m+1,
             Prior::Matrix{Float64}=cov(Y,dims=2)*5,Xnul=ones(1,size(Y,2)),itol=1e-4,tol0=1e-3,tol=1e-4,ρ=0.001)

Implement permutation test to get thresholds at the levels of type 1 error, α. Note that the last permTest() is for the conventional MLMM:

\[vec(Y)\sim MVN((I \otimes X)vec(B) (or BX), K \otimes \Sigma_1 +I \otimes \Sigma_2),\]

where K is a genetic kinship, $\Sigma_1, \Sigma_2$ are covariance matrices for random and error terms, respectively.

Arguments

nperm : An integer indicating the number of permutation to be implemented.
cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
Kg : A n x n genetic kinship matrix. Should be symmetric positive definite.
Kc : A m x m precomputed covariance matrix of Kc under the null model of no QTL. Should be symmetric positive definite.
Y : A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e. Y[1,:] (a vector) -> Y[[1],:] (a matrix) .
XX : A type of Markers.

Keyword Arguments

pval : A vector of p-values to get their quantiles. Default is [0.05 0.01] (without comma).
Xnul : A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.
Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). Default is an identity matrix for the dimension of m traits x q phenotypic covariates.
Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. A large scaled covariance matrix (a weakly informative prior) is default.
df_prior: degrees of freedom, $\nu_0$ for Inverse-Wishart distributon. m+1 (weakly informative) is default.
itol : A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is 1e-3.
tol0 : A tolerance controlling ECM under H1: existence of QTL. Default is 1e-3.
tol : A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is 1e-4.
ρ : A tunning parameter controlling $\tau^2$. Default is 0.001.

Output

maxLODs : A nperm x 1 vector of maximal LOD scores by permutation.
H1par_perm : A type of struct, EcmNestrv.Approx(B,τ2,Σ,loglik) including parameter estimates or EcmNestrv.Result(B,Vc,Σ,loglik) for a conventional MLMM under H0: no QTL by permutation.
cutoff : A vector of thresholds corresponding to pval.

source

FlxQTL.flxMLMM.permutationTest — Method

  permutationTest(nperm::Int64,cross,Kg,Kc,Y,XX::Markers;pval=[0.05 0.01],m=size(Y,1),Z=diagm(ones(m)),df_prior=m+1,
      Prior::Matrix{Float64}=cov(Y,dims=2)*5,Xnul=ones(1,size(Y,2)),itol=1e-4,tol0=1e-3,tol=1e-4,ρ=0.001)
  permutationTest(nperm::Int64,cross,Kg,Y,XX::Markers;pval=[0.05 0.01],m=size(Y,1),df_prior=m+1,
             Prior::Matrix{Float64}=cov(Y,dims=2)*5,Xnul=ones(1,size(Y,2)),itol=1e-4,tol0=1e-3,tol=1e-4,ρ=0.001)

Implement permutation test by geneScan with LOCO to get thresholds at the levels of type 1 error, α. Note that the last permutationTest() is for the conventional MLMM:

\[vec(Y)\sim MVN((I \otimes X)vec(B) (or BX), K \otimes \Sigma_1 +I \otimes \Sigma_2),\]

where K is a genetic kinship, $\Sigma_1, \Sigma_2$ are covariance matrices for random and error terms, respectively.

!!! NOTE

permTest() is implemented by geneScan without LOCO.

Arguments

nperm : An integer indicating the number of permutation to be implemented.
cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
Kg : A n x n genetic kinship matrix. Should be symmetric positive definite.
Kc : A m x m precomputed covariance matrix of Kc under the null model of no QTL. Should be symmetric positive definite.
Y : A m x n matrix of response variables, i.e. m traits (or environments) by n individuals (or lines). For univariate phenotypes, use square brackets in arguement. i.e. Y[1,:] (a vector) -> Y[[1],:] (a matrix) .
XX : A type of Markers.

Keyword Arguments

pval : A vector of p-values to get their quantiles. Default is [0.05 0.01] (without comma).
Xnul : A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.
Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). Default is an identity matrix for the dimension of m traits x q phenotypic covariates.
Prior: A positive definite scale matrix, $\Psi$, of prior Inverse-Wishart distributon, i.e. $\Sigma \sim W^{-1}_m (\Psi, \nu_0)$. A large scaled covariance matrix (a weakly informative prior) is default.
df_prior: degrees of freedom, $\nu_0$ for Inverse-Wishart distributon. m+1 (weakly informative) is default.
itol : A tolerance controlling ECM (Expectation Conditional Maximization) under H0: no QTL. Default is 1e-3.
tol0 : A tolerance controlling ECM under H1: existence of QTL. Default is 1e-3.
tol : A tolerance of controlling Nesterov Acceleration Gradient method under both H0 and H1. Default is 1e-4.
ρ : A tunning parameter controlling $\tau^2$. Default is 0.001.

Output

maxLODs : A nperm x 1 vector of maximal LOD scores by permutation.
H1par_perm : A vector of structs, EcmNestrv.Approx(B,τ2,Σ,loglik) including parameter estimates or EcmNestrv.Result(B,Vc,Σ,loglik) for a conventional MLMM under H0: no QTL by permutation.
cutoff : A vector of thresholds corresponding to pval.

source

FlxQTL.EcmNestrv — Module

EcmNestrv

A module for base algorithms using the ECM (Expectation-Conditional Maxization) with the Speed restarting Nesterov's accelerated gradient method to fit a flexible multivariate linear mixed model (flxMLMM).

source

Genetic Relatedness Matrices (GRM)

FlxQTL.GRM — Module

GRM

A module for computing Genetic Relatedness Matrix (or kinship).

source

FlxQTL.GRM.kinship4way — Method

 kinship4way(genmat::Array{Float64,2})

Computes a kinship for four-way cross data counting different alleles between two markers: ex. AB-AB=0; AB-AC=1; AB-CD=2,$\dots$ Note: In R/qtl, genotypes are labeled as 1=AC; 2=BC; 3=AD; 4=BD by the function, read.cross.

Argument

genmat : A matrix of genotypes for four-way cross $(1,2, \dots)$. size(genematrix)= (p,n), for p genetic markers x n individuals(or lines).

Output

Returns a n x n symmetric matrix containing 1's on the diagonal.

source

FlxQTL.GRM.kinshipCtr — Method

 kinshipCtr(genmat::Array{Float64,2})

Calculates a kinship by a centered genotype matrix (linear kernel), i.e. genotypes subtracted by marker mean.

Argument

genmat : A matrix of genotype data (0,1,2). size(genmat)=(p,n) for p markers x n individuals

Output

Returns a n x n symmetric matrix. See also kinshipStd.

source

FlxQTL.GRM.kinshipLin — Method

kinshipLin(mat,cross)

Calculates a kinship matrix by linear kernel.

Arguments

mat : A matrix of genotype (or allele) probabilities usually extracted from R/qtl, R/qtl2, or the counterpart packages. size(mat)= (p,n) for p genetic markers x n individuals.
cross : A scalar indicating instances of alleles or genotypes in a genetic marker. ex. 1 for genotypes (labeled as 0,1,2), 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.

Output

Returns a n x n symmetric (positive definite) matrix containing 1's on the diagonal.

See also kinshipCtr, kinshipStd for genetype data.

source

FlxQTL.GRM.kinshipLoco — Function

 kinshipLoco(kin,g::Markers,cross::Int64=1)

Generates a 3-d array of symmetric positive definite kinship matrices using LOCO (Leave One Chromosome Out) witout shrinkage intensity estimation. When a kinship is not positive definite, a tweak like a weighted average of kinship and Identity is used to correct minuscule negative eigenvalues.

Arguments

kin : A function of computing a kinship. Can only use with kinshipCtr, kinshipStd for genotypes, and with kinshipLin for genotype (or allele) probabilities.
g : A struct of arrays, type Markers.
cross : A scalar indicating instances of alleles or genotypes in a genetic marker. ex. 1 for genotypes (0,1,2) as default, 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.

Output

Returns 3-d array of n x n symmetric positive definite matrices as many as Chromosomes. Refer to shrinkgLoco.

source

FlxQTL.GRM.kinshipMan — Method

  kinshipMan(genematrix::Array{Float64,2})

Calculates a kinship matrix using a manhattan distance. Missing values need to be either omitted or imputed. This function is for recombinant inbred line (RIL) (AA/BB), not for 4-way cross genotype data. See kinship4way.

Argument

genematrix : A matrix of genotypes, i.e. 0,1 (or 1,2). size(genematrix)= (p,n) for p genetic markers x n individuals(or lines).

Output

Returns a n x n symmetric matrix containing 1's on the diagonal.

source

FlxQTL.GRM.kinshipStd — Method

 kinshipStd(genmat::Array{Float64,2})

Calculates a kinship by a standardized (or normalized) genotype matrix (linear kernel), i.e. genotypes subtracted by marker mean and divided by marker standard deviation.

Argument

genmat : A matrix of genotype data (0,1,2). size(genmat)=(p,n) for p markers x n individuals

Output

Returns a n x n symmetric matrix. See also kinshipCtr.

source

FlxQTL.GRM.shrinkg — Method

 shrinkg(f,nb::Int64,geno)

Estimates a full-rank positive definite kinship matrix by shrinkage intensity estimation (bootstrap). Can only use with kinshipMan, kinship4way. This function runs faster by CPU parallelization. Add workers/processes using addprocs() function before running for speedup.

Arguments

f: A function of computing a kinship. Can only use with kinshipMan, kinship4way.
nb : An integer indicating the number of bootstrap. It does not have to be a large number.
geno : A matrix of genotypes. See kinshipMan, kinship4way for dimension.

Example

julia> using FlxQTL
julia> addprocs(8)
julia> K = shinkage(kinshipMan,20,myGeno)

Output

Returns a full-rank symmetric positive definite matrix.

source

FlxQTL.GRM.shrinkgLoco — Method

   shrinkgLoco(kin,nb,g::Markers)

Generates 3-d array of full-rank positive definite kinship matrices by shrinkage intensity estimation (bootstrap) using a LOCO (Leave One Chromosome Out) scheme.

Argument

kin : A function of computing a kinship. Can only use with kinshipMan, kinship4way
nb : An integer indicating the number of bootstrap.
g : A struct of arrays, type Markers.

Output

Returns 3-d array of n x n symmetric positive definite matrices as many as Chromosomes.

source

Flexible Multivariate Linear Models (flxMLM)

FlxQTL.flxMLM — Module

flxMLM

A module designed for fitting a Multivariate Linear Model by the (Residual) Maximum Likelihood (REML or MLE) method. The model:

$Y=XBZ'+E$, where $E(vec(Y))= (Z \otimes X)vec(B)$, $var(vec(Y))= I_n \otimes \Sigma$

source

FlxQTL.flxMLM.mlm1Scan — Function

mlm1Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,Z::Matrix{Float64},reml::Bool=false;LogP::Bool=false,
          Xnul::Matrix{Float64}=ones(size(Y,1),1))
mlm1Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,reml::Bool=false;LogP::Bool=false,
          Xnul::Matrix{Float64}=ones(size(Y,1),1))

Implement 1d-genome scan. The second mlm1Scan() is for Z=I case; one can also run the first by inserting an identity matrix (I) into Z. math vec(Y) \sim MVN ( (Z \otimes X)vec(B) (or XBZ'), \Sigma \otimes I), where size(Y)=(n,m), size(X)=(n,p), size(Z)=(m,q).

Arguments

cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
Y : A n x m matrix of response variables, i.e. n individuals (or lines) by m traits (or environments). For univariate phenotypes, use square brackets in arguement. i.e. Y0[:,1] (a vector) ->Y[:,[1]] (a matrix) .
XX : A type of Markers. Be cautious when combining genotype infomation into the struct of Markers; size(X) = (n,p).
Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If nothing to insert in Z, just exclude it or insert an identity matrix, Matrix(1.0I,m,m). m traits x q phenotypic covariates.
reml: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented if true.

Keyword Arguments

Xnul : A matrix of covariates. Default is intercepts (1's): Xnul= ones(size(Y,1),1). Adding covariates (C) is Xnul= hcat(ones(n),C) where size(C)=(c,n) for n = size(Y,1).
LogP : Boolean. Default is false. Returns $-\log_{10}{P}$ instead of LOD scores if true.

Output

LODs (or logP) : LOD scores. Can change to $- \log_{10}{P}$ in lod2logP if LogP = true.
B : A 3-d array of B (fixed effects) matrices under H1: existence of QTL. If covariates are added to Xnul : Xnul= [ones(size(Y,1)) Covariates], ex. For sex covariates in 4-way cross analysis, B[2,:,100], B[3:5,:,100] are effects for sex, the rest genotypes of the 100th QTL, respectively.
est0 : A type of MLM.Estimat including parameter estimates under H0: no QTL.

source

FlxQTL.flxMLM.mlm2Scan — Function

 mlm2Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,Z::Matrix{Float64},reml::Bool=false;Xnul::Matrix{Float64}=ones(size(Y,1),1))
 mlm2Scan(cross::Int64,Y::Matrix{Float64},XX::Markers,reml::Bool=false;Xnul::Matrix{Float64}=ones(size(Y,1),1))

Implement 2d-genome scan. The second mlm2Scan() is for Z=I case; one can also run the first by inserting an identity matrix (I) into Z. math vec(Y) \sim MVN ( (Z \otimes X)vec(B) (or XBZ'), \Sigma \otimes I), where size(Y)=(n,m), size(X)=(n,p), size(Z)=(m,q).

Arguments

cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
Y : A n x m matrix of response variables, i.e. n individuals (or lines) by m traits (or environments). For univariate phenotypes, use square brackets in arguement. i.e. Y0[:,1] (a vector) ->Y[:,[1]] (a matrix) .
XX : A type of Markers. Be cautious when combining genotype infomation into the struct of Markers; size(X) = (n,p).
Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If nothing to insert in Z, just exclude it or insert an identity matrix, Matrix(1.0I,m,m). m traits x q phenotypic covariates.
reml: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented if true.

Keyword Arguments

Xnul : A matrix of covariates. Default is intercepts (1's). Unless adding covariates, just leave as it is. See mlm1Scan.

Output

LODs : LOD scores. Can change to $- \log_{10}{P}$ using lod2logP.
est0 : A type of MLM.Estimat including parameter estimates under H0: no QTL.

source

FlxQTL.flxMLM.mlmTest — Function

 mlmTest(nperm::Int64,cross::Int64,Y::Matrix{Float64},XX::Markers,Z::Matrix{Float64},reml::Bool=false;
          Xnul::Matrix{Float64}=ones(size(Y,1),1),pval=[0.05 0.01])
 mlmTest(nperm::Int64,cross::Int64,Y::Matrix{Float64},XX::Markers,reml::Bool=false;
          Xnul::Matrix{Float64}=ones(size(Y,1),1),pval=[0.05 0.01])

Implement permutation test to get thresholds at the levels of type 1 error, α. The second mlmTest() is for Z=I case; one can also run the first by inserting an identity matrix (I) into Z. math vec(Y) \sim MVN ( (Z \otimes X)vec(B) (or XBZ'), \Sigma \otimes I), where size(Y)=(n,m), size(X)=(n,p), size(Z)=(m,q).

Arguments

nperm : An integer indicating the number of permutation to be implemented.
cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc. This value is related to degree of freedom when doing genome scan.
Y : A n x m matrix of response variables, i.e. n individuals (or lines) by m traits (or environments). For univariate phenotypes, use square brackets in arguement. i.e. Y[:,1] (a vector) -> Y[:,[1]] (a matrix) .
XX : A type of Markers. Be cautious when combining genotype infomation into the struct of Markers; size(X) = (n,p).
reml: Boolean. Default is fitting the model via mle. Resitricted MLE is implemented if true.

Keyword Arguments

pval : A vector of p-values to get their quantiles. Default is [0.05 0.01] (without comma).
Xnul : A matrix of covariates. Default is intercepts (1's). Unless plugging in particular covariates, just leave as it is.
Z : An optional m x q matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). Default is an identity matrix for the dimension of m traits x q phenotypic covariates.

Output

maxLODs : A nperm x 1 vector of maximal LOD scores by permutation.
H1par_perm : A type of struct, MLM.Estimat(B,Σ,loglik) including parameter estimates for a MLM under H0: no QTL by permutation.
cutoff : A vector of thresholds corresponding to pval.

source

FlxQTL.MLM — Module

MLM

A module for fitting general Multivariate Linear Models motivated by functional data analysis via mle or reml. The default fitting method is mle. ( i.e. reml=false)

The model: $Y=XBZ'+E$, where $E(Y)=XBZ'$ (or $E(vec(Y))= (Z \otimes X)vec(B)$ ), $var(vec(E))=\Sigma \otimes I.$ size(Y)=(n,m), size(X)=(n,p), size(Z)=(m,q).

source

FlxQTL.MLM.Estimat — Type

 Estimat(B::Array{Float64,2},Σ::Array{Float64,2},loglik::Float64)

A struct of arrays for results by fitting a multivariate linear model, mGLM(). The results are B(fixed effects), Σ (m x m covariance matrix), loglik(a value of log-likelihood by mle or reml).

source

FlxQTL.MLM.mGLM — Function

  mGLM(Y::Array{Float64,2},X::Array{Float64,2},Z::Array{Float64,2},reml::Bool=false)
  mGLM(Y::Array{Float64,2},X::Array{Float64,2},reml::Bool=false)

Fitting multivariate General Linear Models via MLE (or REML) and returns a type of a struct Estimat.

Arguments

Y : A matrix of response variables, i.e. traits. size(Y)=(n,m) for n individuals x m traits
X : A matrix of independent variables, i.e. genotypes or genotype probabilities including intercept or/and covariates. size(X)=(n,p) for n individuals x p markers including intercept or/and covariates
Z : An optional matrix of low-dimensional phenotypic covariates, i.e. contrasts, basis functions (fourier, wavelet, polynomials, B-splines, etc.). If nothing to insert in Z, just exclude it or insert Matrix(1.0I,m,m). size(Z)=(m,q) for m traits x q phenotypic covariates.
reml : Boolean. Default is fitting the model via mle. Resitricted MLE is implemented if true.

Output

Returns Estimat .

source

Utility Functions (Util)

FlxQTL.Util — Module

 Util

A module for utility functions.

source

FlxQTL.Util.Markers — Type

Markers(name::Array{String,1},chr::Array{Any,1},pos::Array{Float64,1},X::Array{Float64,2})

A struct of arrays creating genotype or genotype probability data for genome scan.

Arguments

name : A vector of marker names
chr : A vector of Chromosomes
pos : A vector of marker positions (cM)
X : A matrix of genotypes or genotype probabilities

source

FlxQTL.Util.Y_huber — Method

Y_huber(Y::Array{Float64,2})

Rescale Y (phenotype or trait data) to be less sensitive to outliers using by Huber loss function and MAD (median absolute deviation). size(Y)=(m,n) for m trait and n individuals.

source

FlxQTL.Util.array2mat — Method

 array2mat(cross::Int64,X0::Array{Float64,3})

Returns a 3-d array to a matrix of genotype probabilities. size(X0)=(cross,n,p) –> (p1,n), where p1 = cross*p for p markers, cross alleles or genotypes, and n individuals. See mat2array.

source

FlxQTL.Util.getFinoidx — Method

getFinoidx(phenoData::Array{Union{Missing,Float64},2})

Attains indices of phenotype data without missing values.

Argument

phenoData : A matrix of phenotype (or trait) data including missing values. size(phenoData) = (m,n) for m traits and n individuals.

source

FlxQTL.Util.getGenoidx — Function

getGenoidx(GenoData::Union{Array{Any,2},Array{Float64,2}},maf::Float64=0.025)

Attains genotype indices to drop correlated or bad markers.

Arguments

GenoData : A matrix of genotype data. size(GenoData)= (p,n) for p markers and n individuals.
maf : A scalar for dropping criterion of markers. Default is 0.025 i.e. markers of MAF < 0.025 are dropped.

source

FlxQTL.Util.lod2logP — Method

lod2logP(LODs::Union{Array{Float64,1},Array{Any,1}},v::Int64)

Caculates $-\log_{10}{P}$ from LOD scores.

Arguments

LODs : A vector of LOD scores computed from genome scan.
v : Degrees of freedom for Chi-squared distribution.

!!! NOTE

To compute degrees of freedom for 1D-genome scan (geneScan), v = prod(size(B1))-prod(size(B0), where B1 and B0 are effect size matrices under H1 of a full model (intercept + covariates + QTL) and H0 of no QTL (intercept + covariates), respectively.

source

FlxQTL.Util.mat2array — Method

 mat2array(cross::Int64,X0)

Returns a matrix of genotype probabilities to 3-d array. size(X0)=(p1,n) –> (cross,n,p), where p1 = cross*p for p markers, cross alleles or genotypes, and n individuals.

Argument

cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
X0 : A matrix of genotype probability data computed from r/qtl or r/qtl2.

See array2mat.

source

FlxQTL.Util.mat2vec — Method

 mat2vec(mat)

Stacks a matrix to a vector, i.e. vectorizing a matrix.

source

FlxQTL.Util.newMarkers — Function

newMarkers(XX::Markers,cross::Int64,cM::Int64=2)

Returns a struct of Markers by keeping only markers positioned in every cM centimorgans for 2-d genome scan to avoid singularity.

Arguments

XX : A type of Markers. See Markers.
cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
cM : An integer of dropping criterion of markers. Default is 2, i.e. keeping only markers in every 2 cM, or dropping markers within 2cM between 2 markers.

source

FlxQTL.Util.ordrMarkers — Method

ordrMarkers(markers::Array{Any,2})

Rearrange by CPU parallelization marker information composed of marker name, chr, position obtained from rqtl2, which is not listed in order (excluding X chromosome).

Argument

markers : An array of marker information.

source

FlxQTL.Util.setSeed — Method

setSeed(seedNum::Int64)

Assigns different numbers of seeds to workers (or processes) for reproducibility.

Arguments

seedNum: An integer. A minimum seed number to assign a worker. For distributed computing, seed numbers are generated for multiple workers by increasing 1, e.g. seedNum = 123 & 10 workers, setSeed generates seeds from 123 to 132 and assigns to corresponding workers (processes).

Examples

julia> using Distributed
julia> addprocs(10)
julia> @everywhere using flxQTL
julia> setSeed(123)

source

FlxQTL.Util.sortBycM — Function

    sortBycM(Chr::Any,XX::Markers,cross::Int64,cM::Int64=2)

Returns marker indices in Chromosome Chr and the corresponding genotype probabilities keeping only markers positioned in every cM centimorgans for 2-d genome scan to avoid singularity.

Arguments

Chr : A type of Any indicating a particular chromosome to sort markers out.
XX : A type of Markers.
cross : An integer indicating the number of alleles or genotypes. Ex. 2 for RIF, 4 for four-way cross, 8 for HS mouse (allele probabilities), etc.
cM : An integer of dropping criterion of markers. Default is 2, i.e. keeping only markers in every 2 cM, or dropping markers within 2cM between 2 markers.