Skip to content

Functions for matrices

BigRiverJunbi.check_mad Method
julia
check_mad(mat::Matrix{<:Real}; dims::Int = 2)

Checks if the MAD (median absolute deviation) is zero for each column of a matrix. If it is, then errors and displays the list of columns with zero MAD.

Arguments

  • mat: The matrix to check the MAD for.
source
BigRiverJunbi.check_mad Method
julia
check_mad(x::Vector{<:Real})

Checks if the MAD (median absolute deviation) is zero for a vector. If it is, then errors.

Arguments

  • x: The vector to check the MAD for.
source
BigRiverJunbi.huberize Method
julia
huberize(
    mat::Matrix{<:Real}; alpha::Real = 1,
    error_on_zero_mad::Bool = true
)

Performs Huberization for sample intensities.

Arguments

  • mat: The matrix to normalize.

  • alpha: The alpha parameter for Huberization. Default is 1.

  • error_on_zero_mad: Whether to throw an error if the MAD is zero. Default is true.

Warning

If you set error_on_zero_mad to false, this function will return a result with NaN values if the MAD is zero. This can be useful if you are expecting this behavior and want to handle it yourself, but should be used with caution.

Examples

julia
julia> mat = [0.5 1 2 3 3.5;
              7 3 5 1.5 4.5;
              8 2 7 6 9]
3×5 Matrix{Float64}:
 0.5  1.0  2.0  3.0  3.5
 7.0  3.0  5.0  1.5  4.5
 8.0  2.0  7.0  6.0  9.0

julia> BigRiverJunbi.huberize(mat)
3×5 Matrix{Float64}:
 2.86772  1.0  2.0002  3.0      3.5
 7.0      3.0  5.0     1.5      4.5
 8.0      2.0  7.0     5.89787  7.83846
source
BigRiverJunbi.huberize Method
julia
huberize(
    x::Vector{<:Real}; alpha::Real = 1,
    error_on_zero_mad::Bool = true
)

Performs Huberization for a single vector.

Arguments

  • x: The vector to Huberize.

  • alpha: The alpha parameter for the Huberization. Default is 1.0.

  • error_on_zero_mad: Whether to throw an error if the MAD is zero. Default is true.

Warning

If you set error_on_zero_mad to false, this function will return a result with NaN values if the MAD is zero. This can be useful if you are expecting this behavior and want to handle it yourself, but should be used with caution.

source
BigRiverJunbi.huberloss Method
julia
huberloss(x::Real; alpha::Real = 1)

Computes the Huber loss for a given value. This is defined as:

L(x)={12x2if |x|αα(|x|α22)if |x|>α

Arguments

  • x: The value to compute the Huber loss for.

  • alpha: The alpha parameter for the Huber loss. Default is 1.

source
BigRiverJunbi.imputeKNN Function
julia
imputeKNN(
    data::AbstractMatrix{Union{Missing, Float64}},
    k::Int = 1;
    threshold::Float64 = 0.5,
    dims::Union{Nothing, Int} = nothing,
    distance::NearestNeighbors.MinkowskiMetric = Euclidean()
)

Replaces missing elements based on k-nearest neighbors (KNN). Returns a new matrix without modifying the original matrix. This method is almost an exact copy of the KNN imputation method from Impute.jl.

Arguments

  • data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.

  • k: number of nearest neighbors to use for imputation.

  • threshold: threshold for the number of missing neighbors above which the imputation is skipped.

  • dims: dimension along which the statistic is calculated.

  • distance: distance metric to use for the nearest neighbors search, taken from Distances.jl. Default is Euclidean(). This can only be one of the Minkowski metrics i.e. Euclidean, Cityblock, Minkowski and Chebyshev.

source
BigRiverJunbi.imputeKNN! Method
julia
imputeKNN!(
    data::AbstractMatrix{Union{Missing, Float64}},
    k::Int, threshold::Float64, dims::Union{Nothing, Int},
    distance::NearestNeighbors.MinkowskiMetric
)

Replaces missing elements based on k-nearest neighbors (KNN) imputation. Writes the result back to the original matrix. This method is almost an exact copy of the KNN imputation method from Impute.jl.

Arguments

  • data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.

  • k: number of nearest neighbors to use for imputation.

  • threshold: threshold for the number of missing neighbors above which the imputation is skipped.

  • dims: dimension along which the statistic is calculated.

  • distance: distance metric to use for the nearest neighbors search, taken from Distances.jl. Default is Euclidean(). This can only be one of the Minkowski metrics i.e. Euclidean, Cityblock, Minkowski and Chebyshev.

source
BigRiverJunbi.impute_QRILC Method
julia
impute_QRILC(
    data::Matrix{<:Union{Missing, Real}};
    tune_sigma::Float64 = 1.0,
    eps::Float64 = 0.005
)

Returns imputated matrix based on the "Quantile regression Imputation for left-censored data" (QRILC) method. The function is based on the function impute.QRILC from the imputeLCMD R package, with one difference: the default value of eps is set to 0.005 instead of 0.001.

Arguments

  • data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.

  • tune_sigma: coefficient that controls the SD of the MNAR distribution: - 1 if the complete data distribution is supposed to be gaussian. - 0 < tune_sigma < 1 if the complete data distribution is supposed to be left-censored. Default is 1.0.

  • eps: small value added to the quantile for stability.

  • rng: random number generator. Default is Random.default_rng().

source
BigRiverJunbi.impute_half_min! Method
julia
impute_half_min!(data::Matrix{<:Union{Missing, Real}}; dims::Union{Nothing, Int} = nothing)

Replaces missing elements with half of the minimum value of the non-missing elements and writes the result back to the original matrix.

Note

For integer matrices, the half of the minimum value is calculated by integer division i.e. the result of the division is rounded down to the nearest integer if the result is not an integer.

Arguments

  • data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.

  • dims: dimension along which the minimum values are calculated. Default is nothing, which means the whole matrix is used.

source
BigRiverJunbi.impute_half_min Method
julia
impute_half_min(data::Matrix{<:Union{Missing, Real}}; dims::Union{Nothing, Int} = nothing)

Replaces missing elements with half of the minimum value of the non-missing elements and returns a new matrix without modifying the original matrix.

Note

For integer matrices, the half of the minimum value is calculated by integer division i.e. the result of the division is rounded down to the nearest integer if the result is not an integer.

Arguments

  • data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.

  • dims: dimension along which the minimum values are calculated. Default is nothing, which means the whole matrix is used.

source
BigRiverJunbi.impute_median_cat! Method
julia
impute_median_cat!(data::Matrix{<:Union{Missing, Real}})

Imputes missing elements based on a categorical imputation: - 0: Missing values - 1: Values below the median - 2: Values equal to or above the median Writes the result back to the original matrix.

Arguments

  • data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.
source
BigRiverJunbi.impute_median_cat Method
julia
impute_median_cat(data::Matrix{<:Union{Missing, Real}})

Imputes missing elements based on a categorical imputation: - 0: Missing values - 1: Values below the median - 2: Values equal to or above the median Returns a new matrix without modifying the original matrix.

Arguments

  • data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.
source
BigRiverJunbi.impute_min! Method
julia
impute_min!(data::Matrix{<:Union{Missing, Real}}; dims::Union{Nothing, Int} = nothing)

Replaces missing elements with the minimum value of the non-missing elements and writes the result back to the original matrix.

Arguments

  • data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.

  • dims: dimension along which the minimum values are calculated. Default is nothing, which means the whole matrix is used.

source
BigRiverJunbi.impute_min Method
julia
impute_min(data::Matrix{<:Union{Missing, Real}}; dims::Union{Nothing, Int} = nothing)

Replaces missing elements with the minimum value of the non-missing elements and returns a new matrix without modifying the original matrix.

Arguments

  • data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.

  • dims: dimension along which the minimum values are calculated. Default is nothing, which means the whole matrix is used.

source
BigRiverJunbi.impute_min_prob Function
julia
impute_min_prob(
    data::Matrix{<:Union{Missing, Real}}, q::Float64 = 0.01;
    tune_sigma::Float64 = 1.0, dims::Int = 1,
    rng::AbstractRNG = Random.default_rng()
)

Replaces missing values with random draws from a gaussian distribution centered in the minimum value observed and with standard deviation equal to the median value of the population of line-wise standard deviations. Returns a new matrix without modifying the original matrix.

Arguments

  • data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.

  • q: quantile of the minimum values to use for imputation. Default is 0.01.

  • dims: dimension along which the minimum values are calculated. Default is 1.

  • tune_sigma: coefficient that controls the sd of the MNAR distribution: - 1 if the complete data distribution is supposed to be gaussian. - 0 < tune_sigma < 1 if the complete data distribution is supposed to be left-censored. Default is 1.0.

source
BigRiverJunbi.impute_min_prob! Function
julia
impute_min_prob!(
    data::Matrix{Union{Missing, Float64}}, q = 0.01;
    tune_sigma = 1, dims::Int = 1,
    rng::AbstractRNG = Random.default_rng()
)

Replaces missing values with random draws from a gaussian distribution centered around the minimum value observed and with standard deviation equal to the median value of the population of line-wise standard deviations. Writes the result back to the original matrix.

Arguments

  • data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.

  • q: quantile of the minimum values to use for imputation. Default is 0.01.

  • dims: dimension along which the minimum values are calculated. Default is 1.

  • tune_sigma: coefficient that controls the sd of the MNAR distribution: - 1 if the complete data distribution is supposed to be gaussian. - 0 < tune_sigma < 1 if the complete data distribution is supposed to be left-censored. Default is 1.0.

source
BigRiverJunbi.impute_zero! Method
julia
impute_zero!(data::Matrix{<:Union{Missing, Real}})

Replace missing elements with zero and writes the result back to the original matrix.

Arguments

  • data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.
source
BigRiverJunbi.impute_zero Method
julia
impute_zero(data::Matrix{<:Union{Missing, Real}})

Returns a matrix with missing elements replaced with zero without modifying the original matrix.

Arguments

  • data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.
source
BigRiverJunbi.intnorm Method
julia
intnorm(mat::Matrix{<:Real}; dims::Int64 = 2, lambda::Real = 1)

Total Area Normalization for each row or column. By default, it normalizes each row. This requires that the matrix has all positive values.

Arguments

  • mat: The matrix to normalize.

  • dims: The dimension to normalize across. Default is 2.

  • lambda: The lambda parameter for the normalization. Default is 1.

Examples

julia
julia> mat = [0.5 1 2 3 3.5;
              7 3 5 1.5 4.5;
              8 2 7 6 9]
3×5 Matrix{Float64}:
 0.5  1.0  2.0  3.0  3.5
 7.0  3.0  5.0  1.5  4.5
 8.0  2.0  7.0  6.0  9.0

julia> BigRiverJunbi.intnorm(mat)
3×5 Matrix{Float64}:
 0.05      0.1       0.2       0.3        0.35
 0.333333  0.142857  0.238095  0.0714286  0.214286
 0.25      0.0625    0.21875   0.1875     0.28125
source
BigRiverJunbi.log_tx Method
julia
log_tx(mat::Matrix{<:Real}; base::Real = 2, constant::Real = 0)

Computes logarithm on a matrix, adding a constant to all values (for instance, to avoid log(0)). Default base is 2, default constant is 0.

Arguments

  • mat: The matrix to transform.

  • base: The base of the logarithm. Default is 2.

  • constant: The constant to add to all values. Default is 0.

Examples

julia
julia> mat = [0.5 1 2 3 3.5;
             7 3 5 0 3.5;
             8 2 5 6 0]
3×5 Matrix{Float64}:
 0.5  1.0  2.0  3.0  3.5
 7.0  3.0  5.0  0.0  3.5
 8.0  2.0  5.0  6.0  0.0

julia> BigRiverJunbi.log_tx(mat; constant = 1)
3×5 Matrix{Float64}:
 0.584963  1.0      1.58496  2.0      2.16993
 3.0       2.0      2.58496  0.0      2.16993
 3.16993   1.58496  2.58496  2.80735  0.0
source
BigRiverJunbi.meancenter_tx Function
julia
meancenter_tx(mat::Matrix{Float64}, dims::Int64 = 1)

Mean center a matrix across the specified dimension. This requires that the matrix has all positive values.

Arguments

  • mat: The matrix to transform.

  • dims: The dimension to mean center across. Default is 1.

Examples

julia
julia> mat = [0.5 1 2 3 3.5;
             7 3 5 0 3.5;
             8 2 5 6 0]
3×5 Matrix{Float64}:
 0.5  1.0  2.0  3.0  3.5
 7.0  3.0  5.0  0.0  3.5
 8.0  2.0  5.0  6.0  0.0

julia> BigRiverJunbi.meancenter_tx(mat)
3×5 Matrix{Float64}:
 -4.66667  -1.0  -2.0   0.0   1.16667
  1.83333   1.0   1.0  -3.0   1.16667
  2.83333   0.0   1.0   3.0  -2.33333
source
BigRiverJunbi.pqnorm Method
julia
pqnorm(mat::Matrix{<:Real}; lambda::Real = 1)

Performs a probabilistic quotient normalization (PQN) for sample intensities. This assumes that the matrix is organized as samples x features and requires that the matrix have all positive values.

Arguments

  • mat: The matrix to normalize.

Examples

julia
julia> mat = [0.5 1 2 3 3.5;
              7 3 5 1.5 4.5;
              8 2 7 6 9]
3×5 Matrix{Float64}:
 0.5  1.0  2.0  3.0  3.5
 7.0  3.0  5.0  1.5  4.5
 8.0  2.0  7.0  6.0  9.0

julia> BigRiverJunbi.pqnorm(mat)
3×5 Matrix{Float64}:
 0.05     0.1      0.2      0.3       0.35
 0.30625  0.13125  0.21875  0.065625  0.196875
 0.25     0.0625   0.21875  0.1875    0.28125
source
BigRiverJunbi.quantilenorm Method
julia
quantilenorm(data::Matrix{<:Real})

Performs quantile normalization for sample intensities. This assumes that the matrix is organized as samples x features.

Arguments

  • data: The matrix to normalize.

Examples

julia
julia> mat = [0.5 1 2 3 3.5;
              7 3 5 1.5 4.5;
              8 2 7 6 9]
3×5 Matrix{Float64}:
 0.5  1.0  2.0  3.0  3.5
 7.0  3.0  5.0  1.5  4.5
 8.0  2.0  7.0  6.0  9.0

julia> BigRiverJunbi.quantilenorm(mat)
3×5 Matrix{Float64}:
 1.7  1.7  1.7  4.3  1.7
 4.3  6.6  4.3  1.7  4.3
 6.6  4.3  6.6  6.6  6.6
source
BigRiverJunbi.standardize Method
julia
standardize(mat::Matrix{<:Real}; center::Bool = true)

Standardize a matrix i.e. scale to unit variance, with the option of centering or not.

Arguments

  • mat: The matrix to standardize.

  • center: Whether to center the data. Default is true.

source
BigRiverJunbi.substitute! Method
julia
substitute!(
    data::AbstractArray{<:Union{Missing, Real}},
    statistic::Function;
    dims::Union{Nothing, Int} = nothing
)

Substitutes missing values with the value calculated by the statistic function along the specified dimension and writes the result back to the original array.

Arguments

  • data: array of values. One example: matrix of metabolomics data, where the rows are the features and the columns are the samples.

  • statistic: function that calculates the value to substitute the missing values. The function must return a value of the same type as the data.

  • dims: dimension along which the statistic is calculated.

source
BigRiverJunbi.substitute Method
julia
substitute(
    data::AbstractArray{<:Union{Missing, Real}},
    statistic::Function;
    dims::Union{Nothing, Int} = nothing
)

Substitutes missing values with the value calculated by the statistic function along the specified dimension and returns a new array without modifying the original array.

Arguments

  • data: array of values. One example: matrix of metabolomics data, where the rows are the features and the columns are the samples.

  • statistic: function that calculates the value to substitute the missing values. The function must return a value of the same type as the data.

  • dims: dimension along which the statistic is calculated.

source