# R programming basics

---

In this section, we will present some basic concepts and definitions in R programming.

## How does R work?

When you start R, and get the command prompt, you are essentially getting an empty table ("workspace").  You type in actions ("commands"), they get executed, and these create "objects" in your workspace.  Objects are structured containers with data.  Some objects can operate on data -- these are "functions."  Think of these as tools.  Your R session is a series of actions performed using tools (functions) that operate on data (objects) to modify those objects, or to create new ones.

![How R Works](images/HowRWorks.svg)   


## What are comments in computer programming and how to write them in R?

Comments in computer programming are notes that explain the source code. They are usually written so that any programmer can understand what the code does. In R, everything after a `#` sign on a line is a comment and is ignored when the code is run. Comments make the source code easier to understand for others, and for the original programmer when revisiting the code after a long period. Adding comments is good practice and highly recommended for clarity.

In [None]:
# Print "Hello!"
print("Hello!")

## What is a variable in programming?


In programming, a "variable" could be described as a container (an object) that stores a value (or data) _that may change_. As the code is executed, the stored value may change in line with the programmed instructions (hence the name variable).  There are objects/values that may not change (constants); they are rarely used in R and hence not discussed here.

The programmer can choose almost any name for a variable so long as it respects the rules of the programming language. In R, a variable name:

* must begin with a letter, or with a period that is not followed by a digit
* can be spelled only with letters, digits, periods and underscores
* must be different from the reserved words, whose meanings are fixed in the R language   

List of reserved words: if, else, while, for, function, in, repeat, break, next, TRUE, FALSE, NULL, Inf, NaN, NA, NA_integer_, NA_real_, NA_complex_, NA_character_.   

Tip: *to name a variable, use words that clearly indicate that variable's purpose.*
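
One way to check the naming rules above is the base R function `make.names()`, which turns arbitrary strings into syntactically valid names: a string that comes back unchanged was already valid. The candidate names below are made up for illustration only.

```R
# Candidate names (hypothetical, chosen to exercise each rule)
candidates <- c("numberStudents", ".hidden", "my_var2", "2fast", ".2way", "if")

# A name is syntactically valid if make.names() leaves it unchanged
isValid <- make.names(candidates) == candidates
names(isValid) <- candidates

# The first three are valid; "2fast", ".2way" and "if" are not
isValid
```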

## How to assign a value to a variable?

In R, the operators `<-` and `=` both assign a value to a variable, but `=` performs assignment only at the top level; inside a function call, it names an argument instead. The operator `<-` is generally preferred in the R community since it works anywhere. 

In [None]:
# Example
# Assign the number of female students in the classroom to the variable numberStudentsFemale
numberStudentsFemale <- 12

# Print out the value of the variable numberStudentsFemale
numberStudentsFemale

In [None]:
# Example
# Assign a new number of female students in the classroom to the variable numberStudentsFemale
numberStudentsFemale = 8

# Print out the value of the variable numberStudentsFemale
numberStudentsFemale

```R
# Let's try '=' inside a function call, where it generates an error
print(numberStudentsFemale = 5)

Error in print.default(numberStudentsFemale = 5): argument "x" is missing, with no default
Traceback:

1. print(numberStudentsFemale = 5)
2. print.default(numberStudentsFemale = 5)
```

In [None]:
# Let's try '<-' inside a function call
print(numberStudentsFemale <- 5)
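
The difference between the two operators inside a function call can be sketched as follows. The variable name `scores` is hypothetical, and `seq_len()` is used only because its argument is called `length.out`, a name that is unlikely to exist as a variable in the workspace.

```R
# '<-' inside a call assigns first, then passes the value on to the function
total <- sum(scores <- c(1, 2, 3))
total     # 6
scores    # 1 2 3: the variable was created

# '=' inside a call only names an argument; no variable is created
seq_len(length.out = 3)
exists("length.out")    # FALSE
```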

In [None]:
# Assign the number of male students in the classroom to the variable numberStudentsMale
numberStudentsMale <- 4

# Since the variable names clearly describe their purpose,
# the following line is easy to read: it computes the total number of students in the classroom.
numberStudentsTotal <- numberStudentsFemale + numberStudentsMale

# Print out the total number of students
numberStudentsTotal

## How to do arithmetic in R?

The following operators perform arithmetic on numeric or complex vectors:

|  Operator   | Operation         | 
|:------------|:------------------|
|       +     | addition          |
|       -     | subtraction       |
|       *     | multiplication    |
|       /     | division          |
|       ^     | exponentiation    |
|      %%     | modulus           |
|      %/%    | integer division  |

In [None]:
# Addition 
2 + 3

# Subtraction
8 - 1

# Multiplication
5 * 4

In [None]:
5%%3
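
The remaining operators from the table can be tried in the same way. This is a small sketch with arbitrary numbers; the variable names are chosen only for this illustration.

```R
# Division returns a decimal (numeric) result
quotient <- 7 / 2           # 3.5

# Exponentiation
power <- 2 ^ 3              # 8

# Integer division keeps the whole part of the quotient
wholePart <- 7 %/% 2        # 3

# Modulus keeps the remainder of the division
remainder <- 7 %% 2         # 1

# Operators work element by element on vectors
doubled <- c(1, 2, 3) * 2   # 2 4 6
doubled
```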

## What is a function in R programming?

In R programming, a function is a set of instructions that performs a specific task. R offers a large number of built-in functions; it is also possible to create or customize functions. Generally, a function takes one or more arguments as inputs, processes them in its body, and finally may return an output.

The standard syntax to create a function generally looks like the following example:    
    
```
function_name <- function(arg_1, arg_2, arg_3){
    statements_to_be_processed
    return(output_variable_name)
}
```

In [None]:
# Create a function that prints "Hello"
function_hello <- function(){
    print("Hello")
}

In [None]:
function_hello()
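
A function that takes arguments and returns a value follows the same pattern. The sketch below uses a hypothetical function `count_total()`; its second argument has a default value, so it can be left out when calling the function.

```R
# Hypothetical function: adds two counts and returns the total
count_total <- function(count_a, count_b = 0){
    total <- count_a + count_b
    return(total)
}

count_total(12, 4)   # 16
count_total(12)      # 12, because count_b falls back to its default of 0
```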

## How to get help in R?

There are various alternatives to seek help in R:   

* Getting help directly from the R console
* Getting help in RStudio
* Getting help from the web

### Getting help from the R console

We can use the built-in function `help( )`, which takes a topic name as its argument, given here as a string.

In [None]:
# Get help documentation for the function mean( )
help("mean")

We can also use the command `?` followed by the name of the built-in function we need help with.


In [None]:
?mean

The `help.search()` function scans the documentation for packages installed in your library. Its first argument is a character string or regular expression. For example, `help.search("^glm")` searches for help pages, vignettes, and code demos that have help “aliases,” “concepts,” or titles that begin (case-insensitively) with "glm".

If we do not know the full name of the function we are looking for, we can use `help.search()`, or its shorthand `??`, to list the help pages where our word occurs.

In [None]:
??mean

Another option is to show only some examples of how to utilize a built-in function by using `example( )`. 

In [None]:
example(mean)

### Getting help in RStudio

In RStudio, we can enter the name of the topic we need help on into the Search box on the Help tab.

![Help in RStudio](images/helpRstudio.png)

### Getting help from the web


Using a general search engine such as Google is straightforward, but filtering through the many links can be laborious.
Some search sites specialize in R programming, such as [rseek.org](https://rseek.org).   
Another source of help is the collection of cheat sheets on the RStudio website: [https://www.rstudio.com/resources/cheatsheets/](https://www.rstudio.com/resources/cheatsheets/).   
For example, let us see how to use packages:

![Cheat Sheet package in RStudio](images/RMarkdownCheatSheet.png)

## Basic data types

A data type is an attribute of a variable (*i.e.*, data) which indicates to the programming system how the programmer expects to use the variable. The primary and most frequently used data types are described in the following table:

|  Data type  | Description       | Example  |
|:------------|:------------------|---------:|
| integer     | whole numbers     |     5L   |
| numeric     | decimal numbers   | 3.14     |
| complex     | complex numbers   | 2 + 7i   |
| logical     | boolean           | TRUE     |
| character   | string(text)      | "apples" |

R does not require the data type to be declared when creating a variable. R automatically determines the data type of a variable from the value it contains. There are several ways to check the data type of a variable, using one of the following commands:
* `typeof()`
* `class()`
* `mode()`
* `str()`   


In [None]:
# Check type of my variable/object
typeof(numberStudentsTotal)
class(numberStudentsTotal)
mode(numberStudentsTotal)
str(numberStudentsTotal)
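
These functions do not always give the same answer, because each reports a slightly different aspect of the value. A short comparison, using two hypothetical example variables:

```R
# Hypothetical example values
wholeNumber   <- 5L
decimalNumber <- 3.14

# typeof() reports the internal storage type
typeof(wholeNumber)      # "integer"
typeof(decimalNumber)    # "double"

# class() reports "numeric" for decimal values
class(wholeNumber)       # "integer"
class(decimalNumber)     # "numeric"

# mode() groups integer and double together as "numeric"
mode(wholeNumber)        # "numeric"
mode(decimalNumber)      # "numeric"
```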

By default, when an integer is assigned to a variable, R will automatically choose numeric as the data type. One way to specify that R should recognize the data type as an integer is to append the character "L" after the value.

In [None]:
# Compare the class of 7 (numeric) and 7L (integer)
x = 7
class(x)
y = 7L
class(y)

An alternative way to test for a data type (*i.e.*, returning TRUE or FALSE) is to check for a specific type with a command such as:   
* `is.integer()`
* `is.numeric()`
* `is.complex()`
* `is.logical()`
* `is.character()`

It is also possible to make some type conversions in R; for instance, converting a character to numeric and vice versa by using the commands:  
* `as.integer()`
* `as.numeric()`
* `as.complex()`
* `as.logical()`
* `as.character()`


In [None]:
# For example, assign the character "5" to the variable x
x <- "5"
# Check whether x is numeric, then whether it is character
is.numeric(x)
is.character(x)
# Now convert the variable x to an integer
as.integer(x)
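
A few more conversions, as a sketch with arbitrary example values. Note that a conversion that cannot succeed yields `NA` together with a warning, which is silenced here with `suppressWarnings()`.

```R
# Character to numeric
convertedNumber <- as.numeric("3.14")     # 3.14

# Numeric to integer drops the fractional part
convertedInteger <- as.integer(3.9)       # 3

# Numeric to character
convertedText <- as.character(42)         # "42"

# Character to logical
convertedLogical <- as.logical("TRUE")    # TRUE

# Text that is not a number cannot be converted: the result is NA
failedConversion <- suppressWarnings(as.numeric("five"))
failedConversion
```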

## How to store multiple values?

In R programming, there exist different data structures for storing multiple values, and each of them presents different advantages.
Type conversions will be discussed in more detail in a different chapter. In the next sections, we present how to store multiple values using the following data structures:

* vector
* list
* matrix
* data frame


### Vector

A vector is a collection of elements that all have the same data type. The data type can be any basic type such as integer, numeric, complex, logical or character.

In [None]:
# To create an empty vector you need to specify the mode (i.e. data type) and the length  
vec1 <- vector(mode = "character", length = 3)
vec1

In [None]:
# It is possible to create and fill a vector directly using the function c() which 
# combines its arguments as a collection of elements 
vec1 <- c("March", "July", "September")
vec2 <- c(7, 22, 19)
vec2

To obtain the data type of a vector's elements, we can use one of the following functions: `typeof()`, `class()`, `mode()`, `str()`. 

In [None]:
mode(vec2)

To access elements of a vector, we use the square brackets `[ ]` and indexes. In R, indexes start at 1.

In [None]:
# To access the second element
vec2[2]
# To access the first and the third element
vec2[c(1,3)]

The combine function `c( )` can also be used to append elements to a vector. 

In [None]:
# Add one element
c(vec2, 4)
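
Note that `c( )` returns a new vector and leaves the original unchanged; to keep the longer vector, the result has to be assigned. A minimal sketch, using a hypothetical vector `dayNumbers` so that `vec2` is left untouched:

```R
# Hypothetical vector used only for this illustration
dayNumbers <- c(7, 22, 19)

# c() returns a new vector; dayNumbers itself still has 3 elements
c(dayNumbers, 4)
lengthBefore <- length(dayNumbers)    # 3

# Assign the result back to make the change permanent
dayNumbers <- c(dayNumbers, 4)
lengthAfter <- length(dayNumbers)     # 4
dayNumbers
```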

In [None]:
# To display the vector as a row, we can use the function t(), which returns the transpose of the vector
t(c(vec2, 4))

What happens if we try to combine elements of different data types?    
R will convert every element to a single common data type that can represent all the elements.

In [None]:
# If it contains at least one character type then the vector data type will be character
t(c("1", 2.1, 3))

In [None]:
# If it contains at least one complex and no character type then the vector data type will be complex
t(c(TRUE, 2+1i, 3.14))

In [None]:
# If it contains at least one numeric and neither character nor complex type then the vector data type will be numeric
t(c(TRUE, 2, 3.14))
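
The conversion order described above (logical, then integer, then numeric, then complex, then character) can be confirmed with `class()`. The variable names below are chosen only for this sketch.

```R
# Logical and integer: the result is integer
classInteger <- class(c(TRUE, 2L))              # "integer"

# Logical and numeric: the result is numeric (TRUE becomes 1)
classNumeric <- class(c(TRUE, 2, 3.14))         # "numeric"

# At least one complex, no character: the result is complex
classComplex <- class(c(TRUE, 2+1i, 3.14))      # "complex"

# At least one character: the result is character
classCharacter <- class(c("1", 2.1, 3))         # "character"

c(classInteger, classNumeric, classComplex, classCharacter)
```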

### List

A list is a collection of elements whose data type can be different from each other.

In [None]:
list1 <- list("1", 3.14, TRUE)
list1

To see which data types are present in a list, we use the function `str()`. 

In [None]:
str(list1)

A list can contain elements with different data structures.

In [None]:
list2 <- list(3.14, vec2)
list2

To access elements of a list, we use the square brackets `[ ]` and indexes; the result is itself a list.

In [None]:
list2[2]

To directly access an element of a multi-valued structure stored in a list, a vector for instance, we use the double square bracket `[[ ]]` operator and indexes. 

In [None]:
list2[[2]][3]
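
The difference between `[ ]` and `[[ ]]` can be seen by checking the class of what each returns. This sketch uses a hypothetical list `exampleList` with the same content as the original `list2`.

```R
# Hypothetical list: a number and a numeric vector
exampleList <- list(3.14, c(7, 22, 19))

# Single brackets return a (shorter) list
singleBracket <- class(exampleList[2])      # "list"

# Double brackets return the element itself
doubleBracket <- class(exampleList[[2]])    # "numeric"

# Only then can we index into the vector stored inside the list
thirdValue <- exampleList[[2]][3]           # 19
thirdValue
```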

We can change a list content directly.

In [None]:
list2[1] <- "Number one"
list2

It is also possible to name each element in a list. We have to specify the name of the element, called a "key", and the value of the element. 

In [None]:
list3 <- list(name = "John", info = vec2)
list3

We can get the keys by using the function `names( )`.   
To directly access an element in a list with keys, we use the list name and the key separated by `$`.

In [None]:
list3$name
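
A short sketch of working with keys, using a hypothetical list `person` built like `list3`. The double bracket operator also accepts a key given as a string.

```R
# Hypothetical named list
person <- list(name = "John", info = c(7, 22, 19))

# List the keys
personKeys <- names(person)         # "name" "info"

# Access by key with [[ ]] and a string, equivalent to person$info
personInfo <- person[["info"]]

# Combine $ with an index to reach a single value
secondInfo <- person$info[2]        # 22
secondInfo
```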

### Matrix

A matrix is a collection of data elements organized in a two-dimensional rectangular layout, with rows and columns.   
Similarly to a vector, all elements of a matrix must be of the same data type.   
To build a matrix directly with data elements, we use the function `matrix( )` and fill the matrix content along the column orientation by default.

In [None]:
mat1 <- matrix(c(1, 2, 3, 4, 5, 6, 7, 8, 9),
               nrow = 3, ncol = 3)
mat1

To access directly an element of a matrix, we use the square bracket `[ ]` operator and the indexes (*i.e.*, \[row number, column number\] or column wise index). 

In [None]:
mat1[2,3]

In [None]:
mat1[8]
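
To fill a matrix row by row instead, `matrix( )` accepts the argument `byrow = TRUE`. The sketch below uses a hypothetical matrix `matByRow`; note that a single index still counts down the columns, whatever the fill order was.

```R
# Fill the matrix along the rows instead of the columns
matByRow <- matrix(1:9, nrow = 3, ncol = 3, byrow = TRUE)
matByRow

# Element in row 2, column 3
cellValue <- matByRow[2, 3]     # 6

# Leaving the column index empty selects the whole row
secondRow <- matByRow[2, ]      # 4 5 6

# A single index counts column by column: 1 4 7 2 5 8 3 6 9
eighthValue <- matByRow[8]      # 6

# Number of rows and columns
matDim <- dim(matByRow)         # 3 3
```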

### Data frame

A data frame is a two-dimensional structure, and it is commonly used to store tabular data.
A data frame is structured as a list of vectors having the same length, and where each vector may have a different data type. 
In addition to column names, data frames can have row names.    
The following built-in functions retrieve some information about the attributes and the structure of a data frame:

* `head( )` shows first 6 rows
* `tail( )` shows last 6 rows
* `dim( )` returns the dimensions of data frame (i.e. number of rows and number of columns)
* `nrow( )` returns number of rows
* `ncol( )` returns number of columns
* `str( )` shows the structure of data frame 

In [None]:
# R offers some preloaded data frames as examples
data(iris)
head(iris)

We can build data frame from existing vectors.

In [None]:
# Build a data frame from the previous vectors vec1 and vec2 
df1 <- data.frame(month = vec1, day = vec2)
df1

In [None]:
df1$month <- as.character(df1$month)
str(df1)

Similarly to a list, we can access elements (or cells) of a data frame by using square brackets `[ ]` and the dollar character `$`.

In [None]:
df1$day

In [None]:
df1[[1]][2]
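
Like a matrix, a data frame can also be indexed with `[row, column]`, where the column can be given by name. This is a sketch on a hypothetical data frame `exampleDf` with the same content as `df1`; `stringsAsFactors = FALSE` is set explicitly so that the text column stays character in older R versions too.

```R
# Hypothetical data frame used only for this illustration
exampleDf <- data.frame(month = c("March", "July", "September"),
                        day = c(7, 22, 19),
                        stringsAsFactors = FALSE)

# One cell: row 2 of the column "day"
cellDay <- exampleDf[2, "day"]             # 22

# A whole row (the result is a one-row data frame)
exampleDf[2, ]

# A column by name, then one of its elements
thirdMonth <- exampleDf[["month"]][3]      # "September"

# Number of rows and columns
dfDim <- dim(exampleDf)                    # 3 2
```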

We can also do arithmetic directly with the column vectors or with a specific cell.

In [None]:
df1$day <- df1$day + 1
df1

In [None]:
df1$day[2]<- df1$day[2] + 1
df1

## How is missing data represented in R?   
Missing values are represented by `NA`.

In [None]:
# For example, if we miss the value of an element in a vector, we replace it with NA
vec3 <- c(3, NA, 7)
t(vec3)

The function `anyNA( )` checks whether the vector contains any missing values.

In [None]:
anyNA(vec3)

The function `is.na( )` indicates where the missing values are.

In [None]:
t(is.na(vec3))
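
Missing values propagate through most computations, so they usually have to be located or excluded explicitly. A minimal sketch, using a hypothetical vector `measurements`:

```R
# Hypothetical vector with one missing value
measurements <- c(3, NA, 7)

# Position of the missing values
missingPosition <- which(is.na(measurements))       # 2

# Number of missing values
missingCount <- sum(is.na(measurements))            # 1

# A single NA makes the mean NA as well
meanWithNA <- mean(measurements)                    # NA

# na.rm = TRUE drops the missing values before computing
meanWithoutNA <- mean(measurements, na.rm = TRUE)   # 5
meanWithoutNA
```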

## How to list all R environment variables?
The functions `ls()` and `objects()` return a character vector containing the names of the objects in the current environment, including any functions the user has defined.

In [None]:
ls()

In [None]:
objects()

## How to clear all R environment variables?

In [None]:
remove(list= ls())
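
To remove a single object rather than the whole workspace, pass its name to `rm( )` (the short form of `remove( )`). A small sketch with a hypothetical variable `temporaryValue`:

```R
# Create a hypothetical variable and confirm it exists
temporaryValue <- 42
existedBefore <- exists("temporaryValue")    # TRUE

# Remove only this variable; everything else stays in the workspace
rm(temporaryValue)
existsAfter <- exists("temporaryValue")      # FALSE
existsAfter
```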

The data frame is a very useful data structure in R programming and probably the most used for statistics. Its attributes and properties make it a versatile object. Later on, we will see in more detail how to store and manipulate data frames, including:

* Importing data sets into data frames
* Exploring data frames
* Extracting subsets from data frames
* Filtering data frames
* Merging data frames