Overview
The GeneNetworkAPI package provides access to the GeneNetwork database and analysis functions using the GeneNetwork REST API.
Credits
Pjotr Prins and Zach Sloan are the main contributors to the GeneNetwork REST API. Karl Broman wrote the GNapi R package, which provides access to GeneNetwork from R. This package closely follows the structure and functions of that package.
Note on terminology
GeneNetwork collects data on genetically segregating populations (called groups) in several species, including humans. Most of the phenotype data are "omic" data, which are organized into datasets.
Check connection
To check if the website is responding properly:
julia> check_gn()
GeneNetwork is alive. 200
Get species list
Which species have data on them?
julia> list_species()
12×4 DataFrame
 Row │ FullName                           Id     Name         TaxonomyId
     │ String                             Int64  String       Int64
─────┼───────────────────────────────────────────────────────────────────
   1 │ Mus musculus                           1  mouse             10090
   2 │ Rattus norvegicus                      2  rat               10116
   3 │ Arabidopsis thaliana                   3  arabidopsis        3702
   4 │ Homo sapiens                           4  human              9606
   5 │ Hordeum vulgare                        5  barley             4513
   6 │ Fly (Drosophila melanogaster dm6)      6  drosophila         7227
   7 │ Macaca mulatta                         7  monkey             9544
   8 │ Glycine max                            8  soybean            3847
   9 │ Solanum lycopersicum                   9  tomato             4081
  10 │ Populus trichocarpa                   10  poplar             3689
  11 │ Oryzias latipes (Japanese medaka)     11  medaka             8090
  12 │ Bat (Glossophaga soricina)            12  bat               27638
To get information on a single species:
julia> list_species("rat")
1×4 DataFrame
 Row │ FullName           Id     Name    TaxonomyId
     │ String             Int64  String  Int64
─────┼──────────────────────────────────────────────
   1 │ Rattus norvegicus      2  rat          10116
You can also subset the full species list (the safer approach):
julia> GeneNetworkAPI.subset(list_species(), :Name => x->x.=="rat")
1×4 DataFrame
 Row │ FullName           Id     Name    TaxonomyId
     │ String             Int64  String  Int64
─────┼──────────────────────────────────────────────
   1 │ Rattus norvegicus      2  rat          10116
List groups for a species
Since the data are organized by segregating population ("group"), it is useful to list the groups available for the species you are interested in.
julia> list_groups("rat")
7×8 DataFrame
 Row │ DisplayName                        FullName                           GeneticType  Id     MappingMethodId  Name             SpeciesId  public  ⋯
     │ String                             String                             String       Int64  String           String           Int64      Int64   ⋯
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ Hybrid Rat Diversity Panel (Incl…  Hybrid Rat Diversity Panel (Incl…  None            10  1                HXBBXH                   2       2  ⋯
   2 │ UIOWA SRxSHRSP F2                  UIOWA SRxSHRSP F2                  intercross      24  1                SRxSHRSPF2               2       2
   3 │ NIH Heterogeneous Stock (RGSMC 2…  NIH Heterogeneous Stock (RGSMC 2…  None            42  1                HSNIH-RGSMC              2       2
   4 │ NIH Heterogeneous Stock (Palmer)   NIH Heterogeneous Stock (Palmer)   None            55  None             HSNIH-Palmer             2       2
   5 │ NWU WKYxF344 F2 Behavior           NWU WKYxF344 F2 Behavior           intercross      82  3                NWU_WKYxF344_F2          2       2  ⋯
   6 │ HIV-1Tg and Control                HIV-1Tg and Control                None            83  1                HIV-1Tg                  2       2
   7 │ HRDP-HXB/BXH Brain Proteome        HRDP-HXB/BXH Brain Proteome        None            87  1                HRDP_HXB-BXH-BP          2       2
The GeneticType column shows the type of population. Note the short name in the Name column, as it is used in queries involving that population (group).
Get genotypes for a group
To get the genotypes of a group:
julia> get_geno("BXD") |> (x->first(x,10))
10×239 DataFrame
 Row │ Chr      Locus           cM       Mb       BXD1     BXD2     BXD5     BXD6     BXD8     BXD9     BXD11    BXD12    BXD13    BXD14    BXD15   ⋯
     │ String3  String31        Float64  Float64  String1  String1  String1  String1  String1  String1  String1  String1  String1  String1  String1 ⋯
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 1        rsm10000000001      0.0  3.00149  B        B        D        D        D        B        B        D        B        B        D       ⋯
   2 │ 1        rs31443144         0.11  3.01027  B        B        D        D        D        B        B        D        B        B        D
   3 │ 1        rs6269442          0.21   3.4922  B        B        D        D        D        B        B        D        B        B        D
   4 │ 1        rs32285189         0.32   3.5112  B        B        D        D        D        B        B        D        B        B        D
   5 │ 1        rs258367496        0.43   3.6598  B        B        D        D        D        B        B        D        B        B        D       ⋯
   6 │ 1        rs32430919         0.53  3.77702  B        B        D        D        D        B        B        D        B        B        D
   7 │ 1        rs36251697         0.64  3.81227  B        B        D        D        D        B        B        D        B        B        D
   8 │ 1        rs30658298         0.75  4.43062  B        B        D        D        D        B        B        D        B        B        D
   9 │ 1        rs31879829         0.85  4.51871  B        B        D        D        D        B        B        D        B        B        D       ⋯
  10 │ 1        rs36742481         0.96  4.77632  B        B        D        D        D        B        B        D        B        B        D
224 columns omitted
Currently, only the .geno format is supported. It returns a data frame of genotypes with markers as rows and individuals as columns; the first four columns give the chromosome (Chr), marker name (Locus), and position in cM and Mb.
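As a sketch of how these genotype calls might be turned into numbers for downstream analysis, the helper below recodes one column of BXD calls. It is hypothetical (not part of GeneNetworkAPI) and assumes that "B" and "D" denote the two parental alleles and that any other symbol (for example a heterozygous or unknown code) should become missing.

```julia
# Hypothetical helper, not part of GeneNetworkAPI.
# Assumption: "B" and "D" are the two parental alleles in a BXD-style cross;
# every other symbol (e.g. a heterozygous or unknown code) is treated as missing.
function recode_geno(calls::AbstractVector{<:AbstractString})
    map(calls) do g
        g == "B" ? 0 : g == "D" ? 1 : missing
    end
end

# Calls for BXD1, BXD2, BXD5, BXD6 at rs31443144, taken from the output above
recode_geno(["B", "B", "D", "D"])   # [0, 0, 1, 1]
```

Applied to a strain column of the result (e.g. `geno.BXD1`), this yields one numeric genotype vector per individual.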
List datasets for a group
To list the (omic) datasets available for a group, use the group's short name as listed in the Name column of list_groups:
julia> list_datasets("HSNIH-Palmer")
10×11 DataFrame
 Row │ AvgID  CreateTime                     DataScale  FullName                           Id     Long_Abbreviation              ProbeFreezeId  Shor ⋯
     │ Int64  String                         String     String                             Int64  String                         Int64          Stri ⋯
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │    24  Mon, 27 Aug 2018 00:00:00 GMT  log2       HSNIH-Palmer Nucleus Accumbens C…    860  HSNIH-Rat-Acbc-RSeq-Aug18                347  HSNI ⋯
   2 │    24  Sun, 26 Aug 2018 00:00:00 GMT  log2       HSNIH-Palmer Infralimbic Cortex …    861  HSNIH-Rat-IL-RSeq-Aug18                  348  HSNI
   3 │    24  Sat, 25 Aug 2018 00:00:00 GMT  log2       HSNIH-Palmer Lateral Habenula RN…    862  HSNIH-Rat-LHB-RSeq-Aug18                 349  HSNI
   4 │    24  Fri, 24 Aug 2018 00:00:00 GMT  log2       HSNIH-Palmer Prelimbic Cortex RN…    863  HSNIH-Rat-PL-RSeq-Aug18                  350  HSNI
   5 │    24  Thu, 23 Aug 2018 00:00:00 GMT  log2       HSNIH-Palmer Orbitofrontal Corte…    864  HSNIH-Rat-VoLo-RSeq-Aug18                351  HSNI ⋯
   6 │    24  Fri, 14 Sep 2018 00:00:00 GMT  log2       HSNIH-Palmer Nucleus Accumbens C…    868  HSNIH-Rat-Acbc-RSeqlog2-Aug18            347  HSNI
   7 │    24  Fri, 14 Sep 2018 00:00:00 GMT  log2       HSNIH-Palmer Infralimbic Cortex …    869  HSNIH-Rat-IL-RSeqlog2-Aug18              348  HSNI
   8 │    24  Fri, 14 Sep 2018 00:00:00 GMT  log2       HSNIH-Palmer Lateral Habenula RN…    870  HSNIH-Rat-LHB-RSeqlog2-Aug18             349  HSNI
   9 │    24  Fri, 14 Sep 2018 00:00:00 GMT  log2       HSNIH-Palmer Prelimbic Cortex RN…    871  HSNIH-Rat-PL-RSeqlog2-Aug18              350  HSNI ⋯
  10 │    24  Fri, 14 Sep 2018 00:00:00 GMT  log2       HSNIH-Palmer Orbitofrontal Corte…    872  HSNIH-Rat-VoLo-RSeqlog2-Aug18            351  HSNI
4 columns omitted
List metadata for genotype files
To list metadata on the genotype files available for a group, use the group's short name as listed by list_groups:
julia> list_geno("BXD")
3×4 DataFrame
 Row │ f1s     mat       pat     location
     │ String  String    String  String
─────┼──────────────────────────────────────
   1 │ B6D2F1  C57BL/6J  DBA/2J  BXD.8.geno
   2 │ D2B6F1                    BXD.geno*
   3 │                           BXD.4.geno
When multiple genotype datasets are associated with a group, an asterisk (*) marks the default genotype dataset.
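To pick the default file programmatically, one minimal approach is to look for the entry ending in an asterisk. The helper default_geno below is hypothetical; it takes a vector of location strings such as the location column of the list_geno result and returns the file name without the marker, or nothing if no entry is marked.

```julia
# Hypothetical helper, not part of GeneNetworkAPI.
# Returns the location marked as default (trailing "*"), with the "*" removed.
function default_geno(locations::AbstractVector{<:AbstractString})
    i = findfirst(s -> endswith(s, "*"), locations)
    i === nothing && return nothing
    return String(chop(locations[i]))   # chop drops the final character, the "*"
end

# Locations from the BXD output above
default_geno(["BXD.8.geno", "BXD.geno*", "BXD.4.geno"])   # "BXD.geno"
```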
Get sample data for a group
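This returns a data frame with rows as individuals (samples or strains) and columns as "clinical" (non-omic) phenotypes; the first column, id, identifies each individual. The number after the underscore in a column name (e.g., 10001 in HSR_10001) is the phenotype number, which is used later to query trait information. Values may be missing.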
julia> get_pheno("HSNIH-Palmer") |> (x->x[81:100,:]) |> show
20×509 DataFrame
 Row │ id          HSR_10001  HSR_10002  HSR_10003  HSR_10004  HSR_10005  HSR_10006  HSR_10007  HSR_10008  HSR_10009  HSR_10010  HSR_100 ⋯
     │ String15    Float64?   Float64?   Float64?   Float64?   Float64?   Float64?   Float64?   Float64?   Float64?   Float64?   Float64 ⋯
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ 000721E489    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing ⋯
   2 │ 00072AAC0D    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing
   3 │ 00072AC972    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing
   4 │ 00077E61DC    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing
   5 │ 00077E61EC    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing ⋯
   6 │ 00077E61F3    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing
   7 │ 00077E61F5    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing
   8 │ 00077E6204    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing
  ⋮  │     ⋮           ⋮          ⋮          ⋮          ⋮          ⋮          ⋮          ⋮          ⋮          ⋮          ⋮          ⋮  ⋱
  14 │ 00077E634B    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing ⋯
  15 │ 00077E63D9    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing
  16 │ 00077E641E    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing
  17 │ 00077E6433    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing
  18 │ 00077E64B3    8672.99     86.414    4762.08     63.416    24076.1     87.118       84.0      43.57    6614.97     22.526    1955 ⋯
  19 │ 00077E64BA    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing
  20 │ 00077E64C1    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing    missing
498 columns and 5 rows omitted
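Because the phenotype number is embedded in each column name, it can be extracted with plain string functions. The helper phenotype_number below is a hypothetical sketch; it returns the digits after the last underscore as a String, which is the form the trait number takes in calls such as info_dataset("HSNIH-Palmer","10308").

```julia
# Hypothetical helper, not part of GeneNetworkAPI.
# "HSR_10001" -> "10001": the text after the last underscore in a get_pheno column name.
phenotype_number(colname::AbstractString) = String(last(split(colname, '_')))

phenotype_number("HSR_10001")   # "10001"
```

To collect all phenotype numbers at once, the helper could be broadcast over the column names after the id column, e.g. `phenotype_number.(names(pheno)[2:end])`.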
Get information about traits
To get information on a particular (non-omic) trait, use the group name and the trait number:
julia> info_dataset("HSNIH-Palmer","10308")
1×4 DataFrame
 Row │ dataset_type  description                        id     name
     │ String        String                             Int64  String
─────┼───────────────────────────────────────────────────────────────────────────────
   1 │ phenotype     Central nervous system, behavior…  10308  reaction_time_pint1_5
To get information on a dataset of omic traits, pass the dataset name (for example, a Long_Abbreviation as listed by list_datasets):
julia> info_dataset("HSNIH-Rat-Acbc-RSeq-Aug18")
1×10 DataFrame
 Row │ confidential  data_scale  dataset_type     full_name                          id     name                      public  short_name ⋯
     │ Int64         String      String           String                             Int64  String                    Int64   String ⋯
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │            0  log2        mRNA expression  HSNIH-Palmer Nucleus Accumbens C…    860  HSNIH-Rat-Acbc-RSeq-0818        1  HSNIH-Palmer Nucleus A ⋯
3 columns omitted
Summary information on traits
Get a list of each trait's maximum LRS (likelihood ratio statistic) and the position at which it occurs:
julia> info_pheno("HXBBXH") |> (x->first(x,10))
10×11 DataFrame
 Row │ Additive    Authors                            Chr     Description ⋯
     │ Float64?    String                             String  String ⋯
─────┼──────────────────────────────────────────────────────────────────────────────────────────────
   1 │  0.0499968  Pravenec M, Zidek V, Musilova A,…  8       Original post publication descri… ⋯
   2 │ -0.0926364  Pravenec M, Zidek V, Musilova A,…  14      Original post publication descri…
   3 │    0.60189  Pravenec M, Zidek V, Musilova A,…  20      Original post publication descri…
   4 │   0.992576  Pravenec M, Zidek V, Musilova A,…  8       Original post publication descri…
   5 │ 0.00854221  Pravenec M, Zidek V, Musilova A,…  8       Original post publication descri… ⋯
   6 │ -0.0355208  Pravenec M, Zidek V, Musilova A,…  8       Original post publication descri…
   7 │   0.413279  Pravenec M, Zidek V, Musilova A,…  2       Original post publication descri…
   8 │  -0.936806  Pravenec M, Zidek V, Musilova A,…  3       Original post publication descri…
   9 │    1.23913  Pravenec M, Zidek V, Musilova A,…  7       Original post publication descri… ⋯
  10 │     1.2982  Pravenec M, Zidek V, Musilova A,…  7       Original post publication descri…
7 columns omitted
You can also specify a group and a trait number, or a dataset and a probe name:
julia> info_pheno("BXD","10001")
1×4 DataFrame
 Row │ additive  id     locus       lrs
     │ Float64   Int64  String      Float64
─────┼──────────────────────────────────────
   1 │  2.39444      4  rs48756159  13.4975
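These summaries report LRS rather than a LOD score. Since LRS = 2 ln(LR) and LOD = log10(LR) for the same likelihood ratio LR, the two differ only by the constant factor 2 ln 10 ≈ 4.605. The small conversion function below is a hypothetical helper, not something the package exports.

```julia
# Hypothetical helper, not part of GeneNetworkAPI.
# LRS = 2 ln(LR) and LOD = log10(LR), so LOD = LRS / (2 ln 10) ≈ LRS / 4.605.
lrs_to_lod(lrs::Real) = lrs / (2 * log(10))

lrs_to_lod(13.4975)   # ≈ 2.93 for BXD trait 10001 above
```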
julia> info_pheno("HC_M2_0606_P","1436869_at")
1×13 DataFrame
 Row │ additive   alias                              chr     description                id     locus      lrs      mb       mean     name        p_v ⋯
     │ Float64    String                             String  String                     Int64  String     Float64  Float64  Float64  String      Flo ⋯
─────┼────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
   1 │ -0.214088  HHG1; HLP3; HPE3; SMMCI; Dsh; Hh…  5       sonic hedgehog (hedgehog)  99602  rs8253327  12.7711  28.4572  9.27909  1436869_at  0 ⋯
3 columns omitted
Analysis commands
GEMMA
julia> run_gemma("BXDPublish","10015",use_loco=true) |> (x->first(x,10))
10×6 DataFrame
 Row │ Mb       additive  chr  lod_score  name            p_value
     │ Float64  Float64   Any  Float64    String          Float64
─────┼─────────────────────────────────────────────────────────────
   1 │ 3.00149  0.496892    1   0.548213  rsm10000000001  0.283001
   2 │ 3.01027  0.496892    1   0.548213  rs31443144      0.283001
   3 │  3.4922  0.496892    1   0.548213  rs6269442       0.283001
   4 │  3.5112  0.496892    1   0.548213  rs32285189      0.283001
   5 │  3.6598  0.496892    1   0.548213  rs258367496     0.283001
   6 │ 3.77702  0.496892    1   0.548213  rs32430919      0.283001
   7 │ 3.81227  0.496892    1   0.548213  rs36251697      0.283001
   8 │ 4.43062  0.496892    1   0.548213  rs30658298      0.283001
   9 │ 4.51871  0.496892    1   0.548213  rs31879829      0.283001
  10 │ 4.77632  0.496892    1   0.548213  rs36742481      0.283001
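In the rows shown, the lod_score column equals -log10(p_value): for example, -log10(0.283001) ≈ 0.5482. If that relationship holds throughout, sorting by either column gives the same ranking of markers. The hypothetical helper top_marker below picks the marker with the smallest p-value from the name and p_value columns passed as plain vectors; calling it as `top_marker(res.name, res.p_value)` on the run_gemma result is an assumed usage.

```julia
# Hypothetical helper, not part of GeneNetworkAPI.
# Returns the marker with the smallest p-value (equivalently, the largest -log10 p).
function top_marker(names::AbstractVector, pvalues::AbstractVector{<:Real})
    length(names) == length(pvalues) ||
        throw(ArgumentError("names and pvalues must have the same length"))
    i = argmin(pvalues)   # index of the smallest p-value; ties resolve to the first
    return (name = names[i], p_value = pvalues[i], minus_log10_p = -log10(pvalues[i]))
end

-log10(0.283001)   # ≈ 0.5482, matching lod_score in the output above
```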
R/qtl
run_rqtl performs a one-dimensional genome scan with R/qtl. Its arguments are:
- db (required) - name of the database containing the trait (the Short_Abbreviation listed when you query for datasets)
- trait (required) - ID of the trait being mapped
- method - hk (default) | ehk | em | imp | mr | mr-imp | mr-argmax; corresponds to the "method" option of the R/qtl scanone function
- model - normal (default) | binary | 2-part | np; corresponds to the "model" option of the R/qtl scanone function
- n_perm - number of permutations; 0 by default
- control_marker - name of the marker to use as a control (covariate); you need to know the marker name in advance
- interval_mapping - whether to use interval mapping; false by default
julia> run_rqtl("BXDPublish", "10015") |> (x->first(x,10))
https://genenetwork.org/api/v_pre1/mapping?trait_id=10015&db=BXDPublish&method=rqtl&rqtl_method=hk&rqtl_model=normal&num_perm=0&interval_mapping=false
10×5 DataFrame
 Row │ Mb       cM       chr  lod_score  name
     │ Float64  Float64  Any  Float64    String
─────┼───────────────────────────────────────────────
   1 │ 3.01027  3.01027    1   0.116927  rs31443144
   2 │  3.4922   3.4922    1   0.117404  rs6269442
   3 │  3.5112   3.5112    1   0.117424  rs32285189
   4 │  3.6598   3.6598    1   0.117573  rs258367496
   5 │ 3.77702  3.77702    1   0.117691  rs32430919
   6 │ 3.81227  3.81227    1   0.117727  rs36251697
   7 │ 4.43062  4.43062    1   0.118356  rs30658298
   8 │ 4.44674  4.44674    1   0.118372  rs51852623
   9 │ 4.51871  4.51871    1   0.118447  rs31879829
  10 │ 4.77632  4.77632    1   0.118714  rs36742481
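The printed URL shows how the arguments map onto the REST query: method becomes rqtl_method, model becomes rqtl_model, and n_perm becomes num_perm. The sketch below rebuilds that URL from the same arguments. It is a hypothetical helper, not GeneNetworkAPI's internal code; the endpoint and parameter names are copied from the URL above, control_marker is left out because its query-parameter name is not shown, and values are not percent-encoded (fine for names like BXDPublish, but arbitrary strings would need escaping).

```julia
# Hypothetical helper, not part of GeneNetworkAPI: rebuilds the R/qtl mapping URL
# printed by run_rqtl. Parameter names are taken from that URL; values are not escaped.
function rqtl_url(db::AbstractString, trait::AbstractString;
                  method::AbstractString = "hk",
                  model::AbstractString = "normal",
                  n_perm::Integer = 0,
                  interval_mapping::Bool = false,
                  base::AbstractString = "https://genenetwork.org/api/v_pre1/mapping")
    params = [
        "trait_id" => trait,
        "db" => db,
        "method" => "rqtl",
        "rqtl_method" => method,
        "rqtl_model" => model,
        "num_perm" => n_perm,
        "interval_mapping" => interval_mapping,
    ]
    query = join(("$k=$v" for (k, v) in params), "&")
    return "$base?$query"
end

rqtl_url("BXDPublish", "10015")   # reproduces the URL printed above
```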
Correlation
run_correlation correlates a trait in a dataset against all traits in a target database. Its arguments are:
- trait_id (required) - ID of the trait used for the correlation
- db (required) - name of the database containing the trait (the Short_Abbreviation listed when you query for datasets)
- target_db (required) - name of the target database to correlate against
- type - sample (default) | tissue
- method - pearson (default) | spearman
- return - number of results to return (default = 500)
julia> run_correlation("1427571_at","HC_M2_0606_P","BXDPublish") |> (x->first(x,10))
10×4 DataFrame
 Row │ #_strains  p_value      sample_r   trait
     │ Int64      Float64      Float64    Int64
─────┼──────────────────────────────────────────
   1 │         6   0.00480466  -0.942857  20511
   2 │         6   0.00480466  -0.942857  20724
   3 │        12   1.82889e-5  -0.923362  13536
   4 │         7   0.00680719   0.892857  10157
   5 │         7   0.00680719  -0.892857  20392
   6 │         6    0.0188455   0.885714  20479
   7 │        12  0.000189298  -0.875658  12762
   8 │        12  0.000245942   0.868653  12760
   9 │         7    0.0136973  -0.857143  20559
  10 │        10   0.00222003  -0.842424  10925
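To illustrate how the two method options differ, here is a local sketch using only Julia's Statistics standard library: Pearson's r is cor on the raw values, while Spearman's rho is cor on ranks, with tied values sharing their average rank. The helpers tied_ranks and spearman are hypothetical; this only illustrates the statistics and does not reproduce how GeneNetwork computes sample_r or p_value on the server.

```julia
using Statistics

# Hypothetical helper: ranks 1..n, with tied values receiving the mean of their positions.
function tied_ranks(x::AbstractVector{<:Real})
    p = sortperm(x)
    r = zeros(Float64, length(x))
    i = 1
    while i <= length(x)
        j = i
        # extend j over the run of values equal to x[p[i]]
        while j < length(x) && x[p[j + 1]] == x[p[i]]
            j += 1
        end
        r[p[i:j]] .= (i + j) / 2   # average rank for the whole tied run
        i = j + 1
    end
    return r
end

# Spearman's rho is Pearson's r computed on ranks.
spearman(x, y) = cor(tied_ranks(x), tied_ranks(y))

x = [1, 2, 3, 4, 5]
y = [1, 4, 9, 16, 25]
cor(x, y)        # Pearson: below 1, because the relationship is not linear
spearman(x, y)   # ≈ 1.0, because the relationship is monotonic
```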