Getting Started
Here are some simple examples of writing and reading Helium files, and of converting CSV files to and from the Helium format with various arguments.
Write and Read Helium Files
Key function
The function writehe() requires at least two arguments: the data matrix and the file's path. The function readhe() requires only one argument, the file's path, and returns the matrix stored in the Helium file.
writehe(mat, heFile::String; colNames::Array{String,1} = [""],
rowNames::Array{String,1} = [""],
supplement::Array{String,2} = ["" ""])
Arguments
mat
: data matrix (the second example below also writes a 3-dimensional array).
heFile
: a string that indicates the path of the Helium .he file.
colNames
: an array of strings containing the names of the columns of the data matrix.
rowNames
: an array of strings containing the names of the rows of the data matrix.
supplement
: a matrix of strings holding supplemental information associated with the data matrix. It has the same number of rows as the data matrix. It may include column names (as an extra first row, as in the getsupp() example further down) only if the data matrix itself has column names.
readhe(heFile::String)
Arguments
heFile
: a string that indicates the path of the Helium .he file.
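A minimal round-trip sketch, assuming the writehe/readhe signatures listed above and that getcolnames/getrownames (described later on this page) return the names passed to writehe. The file name and labels are made up for illustration. It writes into a temporary directory instead of "~/Project/data": Julia's own file functions (open, isfile, ...) do not expand "~", so unless Helium expands it internally, passing home-relative paths through expanduser is the safer choice.

```julia
using Helium

# Throwaway temporary directory; the file name is hypothetical.
dir = mktempdir()
heFile = joinpath(dir, "labeled.he")

mat  = [1.5 8 12 24; 7 22 24 70]          # promoted to Matrix{Float64}
cols = ["col1", "col2", "col3", "col4"]   # one name per column
rows = ["s1", "s2"]                       # one name per row

Helium.writehe(mat, heFile; colNames = cols, rowNames = rows)

back = Helium.readhe(heFile)
@show back == mat
@show Helium.getcolnames(heFile) Helium.getrownames(heFile)
```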
Examples
julia> using Helium
julia> toymat = [1.5 8 12 24;7 22 24 70]
2×4 Array{Float64,2}:
1.5 8.0 12.0 24.0
7.0 22.0 24.0 70.0
julia> Helium.writehe(toymat, "~/Project/data/testFile.he")
julia> Helium.readhe("~/Project/data/testFile.he")
2×4 Array{Float64,2}:
1.5 8.0 12.0 24.0
7.0 22.0 24.0 70.0
julia> toymat = [1.5 8; 12 24;;;7 22; 24 70]
2×2×2 Array{Float64, 3}:
[:, :, 1] =
1.5 8.0
12.0 24.0
[:, :, 2] =
7.0 22.0
24.0 70.0
julia> Helium.writehe(toymat, "~/Project/data/testFile.he")
julia> Helium.readhe("~/Project/data/testFile.he")
2×2×2 Array{Float64, 3}:
[:, :, 1] =
1.5 8.0
12.0 24.0
[:, :, 2] =
7.0 22.0
24.0 70.0
CSV to Helium: key function
Helium.csv2he converts a CSV file containing matrix-like data into the Helium format.
csv2he(csvFile::String, heFile::String, matType::DataType;
hasColNames::Bool=true, hasRowNames::Bool=false,
strMiss::String="na", sep::String=",", skipCol::Int64=0)
Arguments
csvFile
: a string that indicates the path of the CSV file.
heFile
: a string that indicates the path of the Helium .he file.
matType
: the type of data (e.g., Float64, Int64, ...).
hasColNames
: a boolean. If true (default: true), the CSV file is assumed to include the column names.
hasRowNames
: a boolean. If true (default: false), the CSV file is assumed to include the row names.
strMiss
: a string in the CSV file that identifies missing elements in the matrix. By default, "NA" and "missing" are treated as missing data and are mapped to NaN inside the matrix. The string strMiss is an additional marker to look for missing or NA elements. It is not case sensitive.
sep
: a string delimiter that separates the elements. By default, the delimiter is a comma ",".
skipCol
: the number of columns to skip (after the row names, if any) before starting to read the matrix in the CSV file. By default, its value is 0. If it is greater than zero, the skipped columns are saved as supplemental data in the Helium file.
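None of the examples on this page use sep, so here is a hedged sketch, assuming the csv2he signature above: it writes a small semicolon-delimited file with a header row and converts it with sep = ";". The file names and contents are invented for illustration.

```julia
using Helium

dir = mktempdir()
csvFile = joinpath(dir, "semicolon.csv")   # hypothetical file names
heFile  = joinpath(dir, "semicolon.he")

# Header row plus two data rows, separated by ';' instead of ','.
write(csvFile, "a;b;c\n1;2;3\n4;5;6\n")

# hasColNames defaults to true, so "a;b;c" becomes the column names.
Helium.csv2he(csvFile, heFile, Float64; sep = ";")

m = Helium.readhe(heFile)
@show size(m) Helium.getcolnames(heFile)
```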
CSV to Helium: no metadata
In this example, we consider a simple CSV file without column names and without row names. Our CSV file, for instance, looks like the following:
1.5,3,12,24
7,22,24,70
julia> using Helium
julia> Helium.csv2he("~/Project/data/testFile.csv", "~/Project/data/testFile.he", Float64,
hasColNames = false)
julia> Helium.readhe("~/Project/data/testFile.he")
2×4 Array{Float64,2}:
1.5 3.0 12.0 24.0
7.0 22.0 24.0 70.0
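The same kind of headerless file can be read as integers by passing Int64 as matType. A sketch under the assumption that every entry parses as an integer (so the 1.5 from the example above is replaced by 3 here); the block creates its own CSV so it runs on its own.

```julia
using Helium

dir = mktempdir()
csvFile = joinpath(dir, "ints.csv")   # hypothetical file names
heFile  = joinpath(dir, "ints.he")

# No header and no row names, only integer values.
write(csvFile, "3,8,12,24\n7,22,24,70\n")

Helium.csv2he(csvFile, heFile, Int64; hasColNames = false)

m = Helium.readhe(heFile)
@show eltype(m) size(m)
```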
CSV to Helium: with row and column names
In the next example, we consider a CSV file that includes the column names and the row names. Here is what the CSV file looks like:
ID,col1,col2,col3,col4
1,1.5,8,12,24
2,7,22,24,70
julia> using Helium
julia> Helium.csv2he("~/Project/data/testFile.csv", "~/Project/data/testFile.he", Float64,
hasRowNames = true)
julia> Helium.readhe("~/Project/data/testFile.he")
2×4 Array{Float64,2}:
1.5 8.0 12.0 24.0
7.0 22.0 24.0 70.0
During the conversion to the Helium format, the variable names and the sample IDs are embedded in the Helium file. Once the file is created, the column and row names can be retrieved with the functions getcolnames() and getrownames(). Both functions take the file's path as an argument and return an Array{String,1}.
julia> Helium.getcolnames("~/Project/data/testFile.he")
4-element Array{String,1}:
"col1"
"col2"
"col3"
"col4"
julia> Helium.getrownames("~/Project/data/testFile.he")
2-element Array{String,1}:
"1"
"2"
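Since getrownames and getcolnames return the names in row and column order, they can be paired with the matrix from readhe to look values up by name. A sketch, assuming the names are aligned with the matrix as in the output above; the CSV is recreated in a temporary directory so the block is self-contained, and byID is just an illustrative variable name.

```julia
using Helium

dir = mktempdir()
csvFile = joinpath(dir, "named.csv")   # hypothetical file names
heFile  = joinpath(dir, "named.he")
write(csvFile, "ID,col1,col2,col3,col4\n1,1.5,8,12,24\n2,7,22,24,70\n")
Helium.csv2he(csvFile, heFile, Float64; hasRowNames = true)

data = Helium.readhe(heFile)
ids  = Helium.getrownames(heFile)
cols = Helium.getcolnames(heFile)

# Map each row name (e.g. "2") to its row of values.
byID = Dict(id => data[i, :] for (i, id) in enumerate(ids))
# Index of a column given its name.
j = findfirst(==("col3"), cols)
@show byID["2"] data[:, j]
```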
CSV to Helium: missing data
Next, we give an example where we specify which string corresponds to missing data. By default, "NA" and "NaN" entries are read as NaN in our matrix of floats or integers, but we can also add a custom string representing missing data. In our CSV file, "X" denotes missing data:
1.5,8,12,X,24
7,22,24,NA,70
julia> using Helium
julia> Helium.csv2he("~/Project/data/testFile.csv", "~/Project/data/testFile.he", Float64,
hasColNames = false, strMiss = "X")
julia> Helium.readhe("~/Project/data/testFile.he")
2×5 Array{Float64,2}:
1.5 8.0 12.0 NaN 24.0
7.0 22.0 24.0 NaN 70.0
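When checking such a result in code, keep in mind that NaN != NaN in Julia, so comparing with == returns false even if the file was read correctly. A sketch (same data as above, written to a temporary file) that checks the matrix with isequal, which treats NaN as equal to NaN, and locates the missing entries with isnan:

```julia
using Helium

dir = mktempdir()
csvFile = joinpath(dir, "missing.csv")   # hypothetical file names
heFile  = joinpath(dir, "missing.he")
write(csvFile, "1.5,8,12,X,24\n7,22,24,NA,70\n")
Helium.csv2he(csvFile, heFile, Float64; hasColNames = false, strMiss = "X")

m = Helium.readhe(heFile)
expected = [1.5 8 12 NaN 24; 7 22 24 NaN 70]

@show m == expected          # false whenever both contain NaN
@show isequal(m, expected)   # elementwise isequal treats NaN as equal to NaN
@show findall(isnan, m)      # CartesianIndex positions of the missing entries
```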
CSV to Helium: extra columns
The argument skipCol gives the option to skip an arbitrary number of columns before reading the matrix data. The skipped columns are preserved as a supplemental Array{String,2} stored in the Helium file, and the function getsupp() retrieves them. Let us consider the following CSV file as an example, where we skip 2 columns after the sample IDs (since strMiss is not case sensitive, strMiss = "x" also matches the "X" entry):
ID,var1,var2,var3,var4,var5
ID1,Xtra1,3,1.5,X,12
ID2,Xtra2,10,7.0,22,70
julia> using Helium
julia> Helium.csv2he("~/Project/data/testFile.csv", "~/Project/data/testFile.he", Float64,
hasRowNames = true, strMiss = "x", skipCol = 2)
julia> Helium.readhe("~/Project/data/testFile.he")
2×3 Array{Float64,2}:
1.5 NaN 12.0
7.0 22.0 70.0
julia> Helium.getcolnames("~/Project/data/testFile.he")
3-element Array{String,1}:
"var3"
"var4"
"var5"
julia> Helium.getrownames("~/Project/data/testFile.he")
2-element Array{String,1}:
"ID1"
"ID2"
julia> Helium.getsupp("~/Project/data/testFile.he")
3×2 Array{String,2}:
"var1" "var2"
"Xtra1" "3"
"Xtra2" "10"
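The same file can presumably be produced without a CSV by passing supplement to writehe. A hedged sketch, assuming (from the getsupp() output above and the supplement description earlier) that when column names are present the supplement carries a header row followed by one row per data row; the data mirror this example. It also drops that header row so the supplemental columns line up with the row names.

```julia
using Helium

dir = mktempdir()
heFile = joinpath(dir, "supp.he")   # hypothetical file name

mat  = [1.5 NaN 12.0; 7.0 22.0 70.0]
supp = ["var1" "var2"; "Xtra1" "3"; "Xtra2" "10"]   # header row + one row per data row

Helium.writehe(mat, heFile;
               colNames = ["var3", "var4", "var5"],
               rowNames = ["ID1", "ID2"],
               supplement = supp)

s = Helium.getsupp(heFile)
extra = s[2:end, :]   # drop the header row to align with the data rows
@show hcat(Helium.getrownames(heFile), extra)
```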
Convert Helium into a CSV format
Key function
Helium.he2csv converts a Helium file into a CSV file.
he2csv(heFile::String, csvFile::String;
strMiss::String="NaN", nameColRows::String="ID", sep::String=",")
Arguments
heFile
: a string that indicates the path of the Helium .he file.
csvFile
: a string that indicates the path of the CSV file.
strMiss
: a string that is used in the CSV file to indicate missing or NA elements in the matrix. By default, "NaN" is used. It is case sensitive.
nameColRows
: a string that sets the column name for the row names in the CSV file. By default, the name is "ID". nameColRows is used only if the Helium file has both row names and column names.
sep
: a string delimiter that separates the elements. By default, the delimiter is a comma ",".
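A sketch of the optional arguments together, assuming the he2csv signature above: it exports a small labeled matrix as a tab-separated file with a custom header for the row-name column and "NA" as the missing marker. The data and names are invented, and the exact number formatting in the output is not assumed, so only the header and the missing marker are checked.

```julia
using Helium

dir = mktempdir()
heFile  = joinpath(dir, "export.he")    # hypothetical file names
csvFile = joinpath(dir, "export.tsv")

Helium.writehe([1.0 2.0; 3.0 NaN], heFile;
               colNames = ["a", "b"], rowNames = ["r1", "r2"])

# Tab-separated output, "sample" as the row-name header, "NA" for missing values.
Helium.he2csv(heFile, csvFile; sep = "\t", nameColRows = "sample", strMiss = "NA")

csvLines = readlines(csvFile)
foreach(println, csvLines)
```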
Example
In this example, let us suppose that the file testFile.he contains a data matrix with row and column names. We can check its contents with the functions readhe, getrownames, and getcolnames, and then convert the Helium file into a CSV file with he2csv.
julia> using Helium
julia> Helium.readhe("~/Project/data/testFile.he")
2×3 Array{Float64,2}:
NaN 3.0 12.0
7.0 22.0 70.0
julia> Helium.getcolnames("~/Project/data/testFile.he")
3-element Array{String,1}:
"var1"
"var2"
"var3"
julia> Helium.getrownames("~/Project/data/testFile.he")
2-element Array{String,1}:
"ID1"
"ID2"
julia> Helium.he2csv("~/Project/data/testFile.he", "~/Project/data/testFile.csv", strMiss = "X")
The resulting CSV file looks like this, with the NaN entry written as "X":
ID,var1,var2,var3
ID1,X,3.0,12.0
ID2,7.0,22.0,70.0
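To check that nothing is lost, the exported CSV can be converted back with csv2he, using the same marker for strMiss and hasRowNames = true because the file has an ID column. A sketch assuming both functions behave as documented on this page; it rebuilds the example's data in a temporary directory and compares the two Helium files.

```julia
using Helium

dir = mktempdir()
he1 = joinpath(dir, "orig.he")    # hypothetical file names
csv = joinpath(dir, "out.csv")
he2 = joinpath(dir, "again.he")

Helium.writehe([NaN 3.0 12.0; 7.0 22.0 70.0], he1;
               colNames = ["var1", "var2", "var3"], rowNames = ["ID1", "ID2"])
Helium.he2csv(he1, csv; strMiss = "X")

# The exported file has a header row and an ID column, so read both back.
Helium.csv2he(csv, he2, Float64; hasRowNames = true, strMiss = "X")

@show isequal(Helium.readhe(he1), Helium.readhe(he2))   # isequal: NaN matches NaN
@show Helium.getcolnames(he2) Helium.getrownames(he2)
```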