Functions for matrices
BigRiverJunbi.check_mad
BigRiverJunbi.check_mad
BigRiverJunbi.huberize
BigRiverJunbi.huberize
BigRiverJunbi.huberloss
BigRiverJunbi.imputeKNN
BigRiverJunbi.imputeKNN!
BigRiverJunbi.impute_QRILC
BigRiverJunbi.impute_half_min
BigRiverJunbi.impute_half_min!
BigRiverJunbi.impute_median_cat
BigRiverJunbi.impute_median_cat!
BigRiverJunbi.impute_min
BigRiverJunbi.impute_min!
BigRiverJunbi.impute_min_prob
BigRiverJunbi.impute_min_prob!
BigRiverJunbi.impute_zero
BigRiverJunbi.impute_zero!
BigRiverJunbi.intnorm
BigRiverJunbi.log_tx
BigRiverJunbi.meancenter_tx
BigRiverJunbi.pqnorm
BigRiverJunbi.quantilenorm
BigRiverJunbi.standardize
BigRiverJunbi.substitute
BigRiverJunbi.substitute!
BigRiverJunbi.check_mad Method
check_mad(mat::Matrix{<:Real}; dims::Int = 2)
Checks if the MAD (median absolute deviation) is zero for each column of a matrix. If it is, then errors and displays the list of columns with zero MAD.
Arguments
mat
: The matrix to check the MAD for.
BigRiverJunbi.check_mad Method
check_mad(x::Vector{<:Real})
Checks if the MAD (median absolute deviation) is zero for a vector. If it is, then errors.
Arguments
x
: The vector to check the MAD for.
BigRiverJunbi.huberize Method
huberize(
mat::Matrix{<:Real}; alpha::Real = 1,
error_on_zero_mad::Bool = true
)
Performs Huberization for sample intensities.
Arguments
mat
: The matrix to normalize.alpha
: The alpha parameter for Huberization. Default is 1.error_on_zero_mad
: Whether to throw an error if the MAD is zero. Default istrue
.
Warning
If you set error_on_zero_mad
to false
, this function will return a result with NaN values if the MAD is zero. This can be useful if you are expecting this behavior and want to handle it yourself, but should be used with caution.
Examples
julia> mat = [0.5 1 2 3 3.5;
7 3 5 1.5 4.5;
8 2 7 6 9]
3×5 Matrix{Float64}:
0.5 1.0 2.0 3.0 3.5
7.0 3.0 5.0 1.5 4.5
8.0 2.0 7.0 6.0 9.0
julia> BigRiverJunbi.huberize(mat)
3×5 Matrix{Float64}:
2.86772 1.0 2.0002 3.0 3.5
7.0 3.0 5.0 1.5 4.5
8.0 2.0 7.0 5.89787 7.83846
BigRiverJunbi.huberize Method
huberize(
x::Vector{<:Real}; alpha::Real = 1,
error_on_zero_mad::Bool = true
)
Performs Huberization for a single vector.
Arguments
x
: The vector to Huberize.alpha
: The alpha parameter for the Huberization. Default is 1.0.error_on_zero_mad
: Whether to throw an error if the MAD is zero. Default istrue
.
Warning
If you set error_on_zero_mad
to false
, this function will return a result with NaN values if the MAD is zero. This can be useful if you are expecting this behavior and want to handle it yourself, but should be used with caution.
BigRiverJunbi.huberloss Method
huberloss(x::Real; alpha::Real = 1)
Computes the Huber loss for a given value. This is defined as:
Arguments
x
: The value to compute the Huber loss for.alpha
: The alpha parameter for the Huber loss. Default is 1.
BigRiverJunbi.imputeKNN Function
imputeKNN(
data::AbstractMatrix{Union{Missing, Float64}},
k::Int = 1;
threshold::Float64 = 0.5,
dims::Union{Nothing, Int} = nothing,
distance::NearestNeighbors.MinkowskiMetric = Euclidean()
)
Replaces missing elements based on k-nearest neighbors (KNN). Returns a new matrix without modifying the original matrix. This method is almost an exact copy of the KNN imputation method from Impute.jl.
Arguments
data
: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.k
: number of nearest neighbors to use for imputation.threshold
: threshold for the number of missing neighbors above which the imputation is skipped.dims
: dimension along which the statistic is calculated.distance
: distance metric to use for the nearest neighbors search, taken from Distances.jl. Default isEuclidean()
. This can only be one of the Minkowski metrics i.e. Euclidean, Cityblock, Minkowski and Chebyshev.
BigRiverJunbi.imputeKNN! Method
imputeKNN!(
data::AbstractMatrix{Union{Missing, Float64}},
k::Int, threshold::Float64, dims::Union{Nothing, Int},
distance::NearestNeighbors.MinkowskiMetric
)
Replaces missing elements based on k-nearest neighbors (KNN) imputation. Writes the result back to the original matrix. This method is almost an exact copy of the KNN imputation method from Impute.jl.
Arguments
data
: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.k
: number of nearest neighbors to use for imputation.threshold
: threshold for the number of missing neighbors above which the imputation is skipped.dims
: dimension along which the statistic is calculated.distance
: distance metric to use for the nearest neighbors search, taken from Distances.jl. Default isEuclidean()
. This can only be one of the Minkowski metrics i.e. Euclidean, Cityblock, Minkowski and Chebyshev.
BigRiverJunbi.impute_QRILC Method
impute_QRILC(
data::Matrix{<:Union{Missing, Real}};
tune_sigma::Float64 = 1.0,
eps::Float64 = 0.005
)
Returns imputated matrix based on the "Quantile regression Imputation for left-censored data" (QRILC) method. The function is based on the function impute.QRILC
from the imputeLCMD
R package, with one difference: the default value of eps
is set to 0.005 instead of 0.001.
Arguments
data
: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.tune_sigma
: coefficient that controls the SD of the MNAR distribution: - 1 if the complete data distribution is supposed to be gaussian. - 0 < tune_sigma < 1 if the complete data distribution is supposed to be left-censored. Default is 1.0.eps
: small value added to the quantile for stability.rng
: random number generator. Default isRandom.default_rng()
.
BigRiverJunbi.impute_half_min! Method
impute_half_min!(data::Matrix{<:Union{Missing, Real}}; dims::Union{Nothing, Int} = nothing)
Replaces missing elements with half of the minimum value of the non-missing elements and writes the result back to the original matrix.
Note
For integer matrices, the half of the minimum value is calculated by integer division i.e. the result of the division is rounded down to the nearest integer if the result is not an integer.
Arguments
data
: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.dims
: dimension along which the minimum values are calculated. Default is nothing, which means the whole matrix is used.
BigRiverJunbi.impute_half_min Method
impute_half_min(data::Matrix{<:Union{Missing, Real}}; dims::Union{Nothing, Int} = nothing)
Replaces missing elements with half of the minimum value of the non-missing elements and returns a new matrix without modifying the original matrix.
Note
For integer matrices, the half of the minimum value is calculated by integer division i.e. the result of the division is rounded down to the nearest integer if the result is not an integer.
Arguments
data
: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.dims
: dimension along which the minimum values are calculated. Default is nothing, which means the whole matrix is used.
BigRiverJunbi.impute_median_cat! Method
impute_median_cat!(data::Matrix{<:Union{Missing, Real}})
Imputes missing elements based on a categorical imputation: - 0: Missing values - 1: Values below the median - 2: Values equal to or above the median Writes the result back to the original matrix.
Arguments
data
: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.
BigRiverJunbi.impute_median_cat Method
impute_median_cat(data::Matrix{<:Union{Missing, Real}})
Imputes missing elements based on a categorical imputation: - 0: Missing values - 1: Values below the median - 2: Values equal to or above the median Returns a new matrix without modifying the original matrix.
Arguments
data
: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.
BigRiverJunbi.impute_min! Method
impute_min!(data::Matrix{<:Union{Missing, Real}}; dims::Union{Nothing, Int} = nothing)
Replaces missing elements with the minimum value of the non-missing elements and writes the result back to the original matrix.
Arguments
data
: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.dims
: dimension along which the minimum values are calculated. Default is nothing, which means the whole matrix is used.
BigRiverJunbi.impute_min Method
impute_min(data::Matrix{<:Union{Missing, Real}}; dims::Union{Nothing, Int} = nothing)
Replaces missing elements with the minimum value of the non-missing elements and returns a new matrix without modifying the original matrix.
Arguments
data
: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.dims
: dimension along which the minimum values are calculated. Default is nothing, which means the whole matrix is used.
BigRiverJunbi.impute_min_prob Function
impute_min_prob(
data::Matrix{<:Union{Missing, Real}}, q::Float64 = 0.01;
tune_sigma::Float64 = 1.0, dims::Int = 1,
rng::AbstractRNG = Random.default_rng()
)
Replaces missing values with random draws from a gaussian distribution centered in the minimum value observed and with standard deviation equal to the median value of the population of line-wise standard deviations. Returns a new matrix without modifying the original matrix.
Arguments
data
: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.q
: quantile of the minimum values to use for imputation. Default is 0.01.dims
: dimension along which the minimum values are calculated. Default is 1.tune_sigma
: coefficient that controls the sd of the MNAR distribution: - 1 if the complete data distribution is supposed to be gaussian. - 0 < tune_sigma < 1 if the complete data distribution is supposed to be left-censored. Default is 1.0.
BigRiverJunbi.impute_min_prob! Function
impute_min_prob!(
data::Matrix{Union{Missing, Float64}}, q = 0.01;
tune_sigma = 1, dims::Int = 1,
rng::AbstractRNG = Random.default_rng()
)
Replaces missing values with random draws from a gaussian distribution centered around the minimum value observed and with standard deviation equal to the median value of the population of line-wise standard deviations. Writes the result back to the original matrix.
Arguments
data
: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.q
: quantile of the minimum values to use for imputation. Default is 0.01.dims
: dimension along which the minimum values are calculated. Default is 1.tune_sigma
: coefficient that controls the sd of the MNAR distribution: - 1 if the complete data distribution is supposed to be gaussian. - 0 < tune_sigma < 1 if the complete data distribution is supposed to be left-censored. Default is 1.0.
BigRiverJunbi.impute_zero! Method
impute_zero!(data::Matrix{<:Union{Missing, Real}})
Replace missing elements with zero and writes the result back to the original matrix.
Arguments
data
: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.
BigRiverJunbi.impute_zero Method
impute_zero(data::Matrix{<:Union{Missing, Real}})
Returns a matrix with missing elements replaced with zero without modifying the original matrix.
Arguments
data
: matrix of omics value, e.g., metabolomics matrix, where the rows are the samples and the columns are the features.
BigRiverJunbi.intnorm Method
intnorm(mat::Matrix{<:Real}; dims::Int64 = 2, lambda::Real = 1)
Total Area Normalization for each row or column. By default, it normalizes each row. This requires that the matrix has all positive values.
Arguments
mat
: The matrix to normalize.dims
: The dimension to normalize across. Default is 2.lambda
: The lambda parameter for the normalization. Default is 1.
Examples
julia> mat = [0.5 1 2 3 3.5;
7 3 5 1.5 4.5;
8 2 7 6 9]
3×5 Matrix{Float64}:
0.5 1.0 2.0 3.0 3.5
7.0 3.0 5.0 1.5 4.5
8.0 2.0 7.0 6.0 9.0
julia> BigRiverJunbi.intnorm(mat)
3×5 Matrix{Float64}:
0.05 0.1 0.2 0.3 0.35
0.333333 0.142857 0.238095 0.0714286 0.214286
0.25 0.0625 0.21875 0.1875 0.28125
BigRiverJunbi.log_tx Method
log_tx(mat::Matrix{<:Real}; base::Real = 2, constant::Real = 0)
Computes logarithm on a matrix, adding a constant to all values (for instance, to avoid log(0)). Default base is 2, default constant is 0.
Arguments
mat
: The matrix to transform.base
: The base of the logarithm. Default is 2.constant
: The constant to add to all values. Default is 0.
Examples
julia> mat = [0.5 1 2 3 3.5;
7 3 5 0 3.5;
8 2 5 6 0]
3×5 Matrix{Float64}:
0.5 1.0 2.0 3.0 3.5
7.0 3.0 5.0 0.0 3.5
8.0 2.0 5.0 6.0 0.0
julia> BigRiverJunbi.log_tx(mat; constant = 1)
3×5 Matrix{Float64}:
0.584963 1.0 1.58496 2.0 2.16993
3.0 2.0 2.58496 0.0 2.16993
3.16993 1.58496 2.58496 2.80735 0.0
BigRiverJunbi.meancenter_tx Function
meancenter_tx(mat::Matrix{Float64}, dims::Int64 = 1)
Mean center a matrix across the specified dimension. This requires that the matrix has all positive values.
Arguments
mat
: The matrix to transform.dims
: The dimension to mean center across. Default is 1.
Examples
julia> mat = [0.5 1 2 3 3.5;
7 3 5 0 3.5;
8 2 5 6 0]
3×5 Matrix{Float64}:
0.5 1.0 2.0 3.0 3.5
7.0 3.0 5.0 0.0 3.5
8.0 2.0 5.0 6.0 0.0
julia> BigRiverJunbi.meancenter_tx(mat)
3×5 Matrix{Float64}:
-4.66667 -1.0 -2.0 0.0 1.16667
1.83333 1.0 1.0 -3.0 1.16667
2.83333 0.0 1.0 3.0 -2.33333
BigRiverJunbi.pqnorm Method
pqnorm(mat::Matrix{<:Real}; lambda::Real = 1)
Performs a probabilistic quotient normalization (PQN) for sample intensities. This assumes that the matrix is organized as samples x features and requires that the matrix have all positive values.
Arguments
mat
: The matrix to normalize.
Examples
julia> mat = [0.5 1 2 3 3.5;
7 3 5 1.5 4.5;
8 2 7 6 9]
3×5 Matrix{Float64}:
0.5 1.0 2.0 3.0 3.5
7.0 3.0 5.0 1.5 4.5
8.0 2.0 7.0 6.0 9.0
julia> BigRiverJunbi.pqnorm(mat)
3×5 Matrix{Float64}:
0.05 0.1 0.2 0.3 0.35
0.30625 0.13125 0.21875 0.065625 0.196875
0.25 0.0625 0.21875 0.1875 0.28125
BigRiverJunbi.quantilenorm Method
quantilenorm(data::Matrix{<:Real})
Performs quantile normalization for sample intensities. This assumes that the matrix is organized as samples x features.
Arguments
data
: The matrix to normalize.
Examples
julia> mat = [0.5 1 2 3 3.5;
7 3 5 1.5 4.5;
8 2 7 6 9]
3×5 Matrix{Float64}:
0.5 1.0 2.0 3.0 3.5
7.0 3.0 5.0 1.5 4.5
8.0 2.0 7.0 6.0 9.0
julia> BigRiverJunbi.quantilenorm(mat)
3×5 Matrix{Float64}:
1.7 1.7 1.7 4.3 1.7
4.3 6.6 4.3 1.7 4.3
6.6 4.3 6.6 6.6 6.6
BigRiverJunbi.standardize Method
standardize(mat::Matrix{<:Real}; center::Bool = true)
Standardize a matrix i.e. scale to unit variance, with the option of centering or not.
Arguments
mat
: The matrix to standardize.center
: Whether to center the data. Default istrue
.
BigRiverJunbi.substitute! Method
substitute!(
data::AbstractArray{<:Union{Missing, Real}},
statistic::Function;
dims::Union{Nothing, Int} = nothing
)
Substitutes missing values with the value calculated by the statistic function along the specified dimension and writes the result back to the original array.
Arguments
data
: array of values. One example: matrix of metabolomics data, where the rows are the features and the columns are the samples.statistic
: function that calculates the value to substitute the missing values. The function must return a value of the same type as the data.dims
: dimension along which the statistic is calculated.
BigRiverJunbi.substitute Method
substitute(
data::AbstractArray{<:Union{Missing, Real}},
statistic::Function;
dims::Union{Nothing, Int} = nothing
)
Substitutes missing values with the value calculated by the statistic function along the specified dimension and returns a new array without modifying the original array.
Arguments
data
: array of values. One example: matrix of metabolomics data, where the rows are the features and the columns are the samples.statistic
: function that calculates the value to substitute the missing values. The function must return a value of the same type as the data.dims
: dimension along which the statistic is calculated.