Functions for matrices

Imputation

julia

imputeKNN(
    data::AbstractMatrix{Union{Missing, Float64}},
    k::Int = 1;
    threshold::Float64 = 0.5,
    dims::Union{Nothing, Int} = nothing,
    distance::NearestNeighbors.MinkowskiMetric = Euclidean()
)

Replaces missing elements based on k-nearest neighbors (KNN). Returns a new matrix without modifying the original matrix. This method is almost an exact copy of the KNN imputation method from Impute.jl.

Arguments

data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.
k: number of nearest neighbors to use for imputation.
threshold: threshold for the number of missing neighbors above which the imputation is skipped.
dims: dimension along which the statistic is calculated.
distance: distance metric to use for the nearest neighbors search, taken from Distances.jl. Default is Euclidean(). This can only be one of the Minkowski metrics i.e. Euclidean, Cityblock, Minkowski and Chebyshev.

source

BigRiverJunbi.imputeKNN! Method

julia

imputeKNN!(
    data::AbstractMatrix{Union{Missing, Float64}},
    k::Int, threshold::Float64, dims::Union{Nothing, Int},
    distance::NearestNeighbors.MinkowskiMetric
)

Replaces missing elements based on k-nearest neighbors (KNN) imputation. Writes the result back to the original matrix. This method is almost an exact copy of the KNN imputation method from Impute.jl.

Arguments

data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.
k: number of nearest neighbors to use for imputation.
threshold: threshold for the number of missing neighbors above which the imputation is skipped.
dims: dimension along which the statistic is calculated.
distance: distance metric to use for the nearest neighbors search, taken from Distances.jl. Default is Euclidean(). This can only be one of the Minkowski metrics i.e. Euclidean, Cityblock, Minkowski and Chebyshev.

source

BigRiverJunbi.impute_QRILC Method

julia

impute_QRILC(
    data::Matrix{<:Union{Missing, Real}};
    tune_sigma::Float64 = 1.0,
    eps::Float64 = 0.005
)

Returns imputated matrix based on the "Quantile regression Imputation for left-censored data" (QRILC) method. The function is based on the function impute.QRILC from the imputeLCMD R package, with one difference: the default value of eps is set to 0.005 instead of 0.001.

Arguments

data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.
tune_sigma: coefficient that controls the SD of the MNAR distribution: - 1 if the complete data distribution is supposed to be gaussian. - 0 < tune_sigma < 1 if the complete data distribution is supposed to be left-censored. Default is 1.0.
eps: small value added to the quantile for stability.
rng: random number generator. Default is Random.default_rng().

source

BigRiverJunbi.impute_half_min! Method

julia

impute_half_min!(data::Matrix{<:Union{Missing, Real}}; dims::Union{Nothing, Int} = nothing)

Replaces missing elements with half of the minimum value of the non-missing elements and writes the result back to the original matrix.

Note

For integer matrices, the half of the minimum value is calculated by integer division i.e. the result of the division is rounded down to the nearest integer if the result is not an integer.

Arguments

data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.
dims: dimension along which the minimum values are calculated. Default is nothing, which means the whole matrix is used.

source

BigRiverJunbi.impute_half_min Method

julia

impute_half_min(data::Matrix{<:Union{Missing, Real}}; dims::Union{Nothing, Int} = nothing)

Replaces missing elements with half of the minimum value of the non-missing elements and returns a new matrix without modifying the original matrix.

Note

For integer matrices, the half of the minimum value is calculated by integer division i.e. the result of the division is rounded down to the nearest integer if the result is not an integer.

Arguments

data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.
dims: dimension along which the minimum values are calculated. Default is nothing, which means the whole matrix is used.

source

BigRiverJunbi.impute_median_cat! Method

julia

impute_median_cat!(data::Matrix{<:Union{Missing, Real}})

Imputes missing elements based on a categorical imputation: - 0: Missing values - 1: Values below the median - 2: Values equal to or above the median Writes the result back to the original matrix.

Arguments

data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.

source

BigRiverJunbi.impute_median_cat Method

julia

impute_median_cat(data::Matrix{<:Union{Missing, Real}})

Imputes missing elements based on a categorical imputation: - 0: Missing values - 1: Values below the median - 2: Values equal to or above the median Returns a new matrix without modifying the original matrix.

Arguments

data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.

source

BigRiverJunbi.impute_min! Method

julia

impute_min!(data::Matrix{<:Union{Missing, Real}}; dims::Union{Nothing, Int} = nothing)

Replaces missing elements with the minimum value of the non-missing elements and writes the result back to the original matrix.

Arguments

data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.
dims: dimension along which the minimum values are calculated. Default is nothing, which means the whole matrix is used.

source

BigRiverJunbi.impute_min Method

julia

impute_min(data::Matrix{<:Union{Missing, Real}}; dims::Union{Nothing, Int} = nothing)

Replaces missing elements with the minimum value of the non-missing elements and returns a new matrix without modifying the original matrix.

Arguments

data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.
dims: dimension along which the minimum values are calculated. Default is nothing, which means the whole matrix is used.

source

BigRiverJunbi.impute_min_prob Function

julia

impute_min_prob(
    data::Matrix{<:Union{Missing, Real}}, q::Float64 = 0.01;
    tune_sigma::Float64 = 1.0, dims::Int = 1,
    rng::AbstractRNG = Random.default_rng()
)

Replaces missing values with random draws from a gaussian distribution centered in the minimum value observed and with standard deviation equal to the median value of the population of line-wise standard deviations. Returns a new matrix without modifying the original matrix.

Arguments

data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.
q: quantile of the minimum values to use for imputation. Default is 0.01.
dims: dimension along which the minimum values are calculated. Default is 1.
tune_sigma: coefficient that controls the sd of the MNAR distribution: - 1 if the complete data distribution is supposed to be gaussian. - 0 < tune_sigma < 1 if the complete data distribution is supposed to be left-censored. Default is 1.0.

source

BigRiverJunbi.impute_min_prob! Function

julia

impute_min_prob!(
    data::Matrix{Union{Missing, Float64}}, q = 0.01;
    tune_sigma = 1, dims::Int = 1,
    rng::AbstractRNG = Random.default_rng()
)

Replaces missing values with random draws from a gaussian distribution centered around the minimum value observed and with standard deviation equal to the median value of the population of line-wise standard deviations. Writes the result back to the original matrix.

Arguments

data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.
q: quantile of the minimum values to use for imputation. Default is 0.01.
dims: dimension along which the minimum values are calculated. Default is 1.
tune_sigma: coefficient that controls the sd of the MNAR distribution: - 1 if the complete data distribution is supposed to be gaussian. - 0 < tune_sigma < 1 if the complete data distribution is supposed to be left-censored. Default is 1.0.

source

BigRiverJunbi.impute_zero! Method

julia

impute_zero!(data::Matrix{<:Union{Missing, Real}})

Replace missing elements with zero and writes the result back to the original matrix.

Arguments

data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.

source

BigRiverJunbi.impute_zero Method

julia

impute_zero(data::Matrix{<:Union{Missing, Real}})

Returns a matrix with missing elements replaced with zero without modifying the original matrix.

Arguments

data: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.

source

BigRiverJunbi.substitute! Method

julia

substitute!(
    data::AbstractArray{<:Union{Missing, Real}},
    statistic::Function;
    dims::Union{Nothing, Int} = nothing
)

Substitutes missing values with the value calculated by the statistic function along the specified dimension and writes the result back to the original array.

Arguments

data: array of values. One example: matrix of metabolomics data, where the rows are the features and the columns are the samples.
statistic: function that calculates the value to substitute the missing values. The function must return a value of the same type as the data.
dims: dimension along which the statistic is calculated.

source

BigRiverJunbi.substitute Method

julia

substitute(
    data::AbstractArray{<:Union{Missing, Real}},
    statistic::Function;
    dims::Union{Nothing, Int} = nothing
)

Substitutes missing values with the value calculated by the statistic function along the specified dimension and returns a new array without modifying the original array.

Arguments

data: array of values. One example: matrix of metabolomics data, where the rows are the features and the columns are the samples.
statistic: function that calculates the value to substitute the missing values. The function must return a value of the same type as the data.
dims: dimension along which the statistic is calculated.

source

Normalization

BigRiverJunbi.huberize Method

julia

huberize(
    mat::Matrix{<:Real}; alpha::Real = 1,
    error_on_zero_mad::Bool = true
)

Performs Huberization for sample intensities.

Arguments

mat: The matrix to normalize.
alpha: The alpha parameter for Huberization. Default is 1.
error_on_zero_mad: Whether to throw an error if the MAD is zero. Default is true.

Warning

If you set error_on_zero_mad to false, this function will return a result with NaN values if the MAD is zero. This can be useful if you are expecting this behavior and want to handle it yourself, but should be used with caution.

Example

julia

julia> mat = [0.5 1 2 3 3.5;
              7 3 5 1.5 4.5;
              8 2 7 6 9]
3×5 Matrix{Float64}:
 0.5  1.0  2.0  3.0  3.5
 7.0  3.0  5.0  1.5  4.5
 8.0  2.0  7.0  6.0  9.0

julia> BigRiverJunbi.huberize(mat)
3×5 Matrix{Float64}:
 2.86772  1.0  2.0002  3.0      3.5
 7.0      3.0  5.0     1.5      4.5
 8.0      2.0  7.0     5.89787  7.83846

source

BigRiverJunbi.huberize Method

julia

huberize(
    x::Vector{<:Real}; alpha::Real = 1,
    error_on_zero_mad::Bool = true
)

Performs Huberization for a single vector.

Arguments

x: The vector to Huberize.
alpha: The alpha parameter for the Huberization. Default is 1.0.
error_on_zero_mad: Whether to throw an error if the MAD is zero. Default is true.

Warning

source

BigRiverJunbi.huberloss Method

julia

huberloss(x::Real; alpha::Real = 1)

Computes the Huber loss for a given value. This is defined as:

L (x) = {\begin{cases} \frac{1}{2} x^{2} & if | x | \leq α \\ α (| x | - \frac{α^{2}}{2}) & if | x | > α \end{cases}

Arguments

x: The value to compute the Huber loss for.
alpha: The alpha parameter for the Huber loss. Default is 1.

source

BigRiverJunbi.intnorm Method

julia

intnorm(mat::Matrix{<:Real}; dims::Int64 = 2, lambda::Real = 1)

Total Area Normalization for each row or column. By default, it normalizes each row. This requires that the matrix has all positive values.

Arguments

mat: The matrix to normalize.
dims: The dimension to normalize across. Default is 2.
lambda: The lambda parameter for the normalization. Default is 1.

Example

julia

julia> mat = [0.5 1 2 3 3.5;
              7 3 5 1.5 4.5;
              8 2 7 6 9]
3×5 Matrix{Float64}:
 0.5  1.0  2.0  3.0  3.5
 7.0  3.0  5.0  1.5  4.5
 8.0  2.0  7.0  6.0  9.0

julia> BigRiverJunbi.intnorm(mat)
3×5 Matrix{Float64}:
 0.05      0.1       0.2       0.3        0.35
 0.333333  0.142857  0.238095  0.0714286  0.214286
 0.25      0.0625    0.21875   0.1875     0.28125

source

BigRiverJunbi.pqnorm Method

julia

pqnorm(mat::Matrix{<:Real}; lambda::Real = 1)

Performs a probabilistic quotient normalization (PQN) for sample intensities. This assumes that the matrix is organized as samples x features and requires that the matrix have all positive values.

Arguments

mat: The matrix to normalize.

Example

julia

julia> mat = [0.5 1 2 3 3.5;
              7 3 5 1.5 4.5;
              8 2 7 6 9]
3×5 Matrix{Float64}:
 0.5  1.0  2.0  3.0  3.5
 7.0  3.0  5.0  1.5  4.5
 8.0  2.0  7.0  6.0  9.0

julia> BigRiverJunbi.pqnorm(mat)
3×5 Matrix{Float64}:
 0.05     0.1      0.2      0.3       0.35
 0.30625  0.13125  0.21875  0.065625  0.196875
 0.25     0.0625   0.21875  0.1875    0.28125

source

BigRiverJunbi.quantilenorm Method

julia

quantilenorm(data::Matrix{<:Real})

Performs quantile normalization for sample intensities. This assumes that the matrix is organized as samples x features.

Arguments

data: The matrix to normalize.

Example

julia

julia> mat = [0.5 1 2 3 3.5;
              7 3 5 1.5 4.5;
              8 2 7 6 9]
3×5 Matrix{Float64}:
 0.5  1.0  2.0  3.0  3.5
 7.0  3.0  5.0  1.5  4.5
 8.0  2.0  7.0  6.0  9.0

julia> BigRiverJunbi.quantilenorm(mat)
3×5 Matrix{Float64}:
 1.7  1.7  1.7  4.3  1.7
 4.3  6.6  4.3  1.7  4.3
 6.6  4.3  6.6  6.6  6.6

source

BigRiverJunbi.standardize Method

julia

standardize(mat::Matrix{<:Real}; center::Bool = true)

Standardize a matrix i.e. scale to unit variance, with the option of centering or not.

Arguments

mat: The matrix to standardize.
center: Whether to center the data. Default is true.

source

Transformation

BigRiverJunbi.log_tx Method

julia

log_tx(mat::Matrix{<:Real}; base::Real = 2, constant::Real = 0)

Computes logarithm on a matrix, adding a constant to all values (for instance, to avoid log(0)). Default base is 2, default constant is 0.

Arguments

mat: The matrix to transform.
base: The base of the logarithm. Default is 2.
constant: The constant to add to all values. Default is 0.

Example

julia

julia> mat = [0.5 1 2 3 3.5;
             7 3 5 0 3.5;
             8 2 5 6 0]
3×5 Matrix{Float64}:
 0.5  1.0  2.0  3.0  3.5
 7.0  3.0  5.0  0.0  3.5
 8.0  2.0  5.0  6.0  0.0

julia> BigRiverJunbi.log_tx(mat; constant = 1)
3×5 Matrix{Float64}:
 0.584963  1.0      1.58496  2.0      2.16993
 3.0       2.0      2.58496  0.0      2.16993
 3.16993   1.58496  2.58496  2.80735  0.0

source

BigRiverJunbi.meancenter_tx Function

julia

meancenter_tx(mat::Matrix{Float64}, dims::Int64 = 1)

Mean center a matrix across the specified dimension. This requires that the matrix has all positive values.

Arguments

mat: The matrix to transform.
dims: The dimension to mean center across. Default is 1.

Example

julia

julia> mat = [0.5 1 2 3 3.5;
             7 3 5 0 3.5;
             8 2 5 6 0]
3×5 Matrix{Float64}:
 0.5  1.0  2.0  3.0  3.5
 7.0  3.0  5.0  0.0  3.5
 8.0  2.0  5.0  6.0  0.0

julia> BigRiverJunbi.meancenter_tx(mat)
3×5 Matrix{Float64}:
 -4.66667  -1.0  -2.0   0.0   1.16667
  1.83333   1.0   1.0  -3.0   1.16667
  2.83333   0.0   1.0   3.0  -2.33333

source

Functions for matrices ​

Imputation ​

Normalization ​

Transformation ​

Functions for matrices

Imputation

Normalization

Transformation