Author
Shreyas Meher
Published
August 12, 2024
Introduction
Welcome to this introduction to basic R syntax and data types! Here, we’ll explore the fundamental building blocks of R programming. By the end of this session, you’ll be familiar with R’s basic syntax and its primary data types.
Learning Tip
Don’t worry if you don’t memorize everything immediately. Programming is about practice and repetition. The more you use these concepts, the more natural they’ll become.
Basic R Syntax
R is a powerful language for statistical computing and data analysis. Let’s start with its basic syntax.
Assignment Operator
In R, we use <- to assign values to variables. The = sign can also be used, but <- is more common in R.
Code
x <- 5
y = 10
Note
The <- operator is preferred by most R style guides. It always means assignment, whereas = is also used to name arguments inside function calls, so <- is unambiguous in more contexts than =.
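One concrete place where the two operators behave differently is inside a function call. The following is a minimal sketch; the helper names `x_after_equals` and `x_after_arrow` are illustrative only and not part of the tutorial.

```r
x <- 5

# Inside a call, `=` only names the argument; the variable x is untouched.
median(x = c(1, 2, 3))
x_after_equals <- x
print(x_after_equals)  # 5

# Inside a call, `<-` assigns to x first, then passes the value along.
median(x <- c(1, 2, 3))
x_after_arrow <- x
print(x_after_arrow)   # 1 2 3
```

Assigning inside a call like this is rarely a good idea in practice; it appears here only to show the difference.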
Comments
Comments in R start with #. Everything after # on a line is ignored by R.
Code
# This is a comment
x <- 5 # This is also a comment
Basic Arithmetic
R can perform all standard arithmetic operations:
Code
a <- 10
b <- 3
sum <- a + b
difference <- a - b
product <- a * b
quotient <- a / b
power <- a ^ b
modulus <- a %% b
print(sum)
print(difference)
print(product)
print(quotient)
print(power)
print(modulus)
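Two related details are worth a quick look: integer division with %/% and the order in which operators are evaluated. This is a small sketch; the names `int_quotient` and `remainder` are chosen for illustration.

```r
a <- 10
b <- 3

int_quotient <- a %/% b  # integer division: 3
remainder <- a %% b      # remainder after division: 1

# The two results recombine into the original value.
print(int_quotient * b + remainder == a)  # TRUE

# Multiplication is evaluated before addition; parentheses override that.
print(2 + 3 * 4)    # 14
print((2 + 3) * 4)  # 20
```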
Important
Exercise 1: - Create two variables c and d with values of your choice. Perform all the above operations on these variables and print the results.
Functions
R has many built-in functions. Here are a few examples:
Code
# Absolute value
abs(-5)
# Square root
sqrt(16)
# Rounding
round(3.7)
Note
To get help on any function, type ?function_name in the console. For example, ?sqrt will give you information about the square root function.
Data Types in R
R has several basic data types. Let’s explore them:
- Numeric: Numeric data types include both integers and floating-point numbers.
Code
x <- 5L  # integer (the L suffix requests integer storage)
y <- 5.5 # double
class(x) # "integer"
class(y) # "numeric"
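A number typed without a suffix is stored as a double, even when it has no decimal part. The sketch below compares class() with typeof(), which reports the underlying storage type; the variable names `whole` and `counted` are illustrative.

```r
whole <- 5     # no suffix: stored as a double
counted <- 5L  # L suffix: stored as an integer

print(typeof(whole))    # "double"
print(typeof(counted))  # "integer"
print(class(whole))     # "numeric"
print(class(counted))   # "integer"

# Both are treated as numeric in arithmetic.
print(is.numeric(whole))    # TRUE
print(is.numeric(counted))  # TRUE
```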
- Character: Character data types are used for text.
Code
name <- "Alice"
class(name)
- Logical: Logical data types can be either TRUE or FALSE.
Code
is_student <- TRUE
class(is_student)
Important
Exercise 2: - Create variables of each data type we’ve discussed so far (numeric, character, logical). Use the class() function to verify their types.
- Vectors: Vectors are one-dimensional arrays that can hold data of the same type.
Code
numeric_vector <- c(1, 2, 3, 4, 5)
character_vector <- c("apple", "banana", "cherry")
logical_vector <- c(TRUE, FALSE, TRUE, TRUE)
print(numeric_vector)
print(character_vector)
print(logical_vector)
Note
The c() function is used to create vectors in R.
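Because a vector holds a single type, c() converts mixed inputs to the most general type among them. A short sketch of that coercion, with illustrative variable names:

```r
# Mixing numbers, text, and logicals yields a character vector.
mixed <- c(1, "a", TRUE)
print(mixed)         # "1" "a" "TRUE"
print(class(mixed))  # "character"

# Mixing numbers and logicals yields a numeric vector (TRUE becomes 1).
num_and_logical <- c(1, TRUE, FALSE)
print(num_and_logical)         # 1 1 0
print(class(num_and_logical))  # "numeric"
```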
- Factors: Factors are used to represent categorical data.
Code
colors <- factor(c("red", "blue", "green", "red", "green"))
print(colors)
levels(colors)
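Internally, a factor stores an integer code for each value plus a table of levels, which are sorted alphabetically by default. The sketch below inspects those codes and counts each category.

```r
colors <- factor(c("red", "blue", "green", "red", "green"))

# Levels are "blue", "green", "red", so red maps to code 3.
print(as.integer(colors))  # 3 1 2 3 2

# Count how often each category occurs.
color_counts <- table(colors)
print(color_counts)

print(nlevels(colors))  # 3
```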
- Lists: Lists can contain elements of different types.
Code
my_list <- list(name = "Bob", age = 30, is_student = FALSE)
print(my_list)
- Data Frames: Data frames are table-like structures that can hold different types of data.
Code
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  is_student = c(TRUE, FALSE, TRUE)
)
print(df)
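Once a data frame exists, a few base functions give a quick overview of its shape and column types. A minimal sketch follows; `column_types` is an illustrative name.

```r
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35),
  is_student = c(TRUE, FALSE, TRUE)
)

print(nrow(df))   # 3 rows
print(ncol(df))   # 3 columns
print(names(df))  # "name" "age" "is_student"

# Compact overview of the structure: one line per column.
str(df)

# Class of each column. Text columns are "character" in R 4.0 or later;
# older versions converted them to factors by default.
column_types <- sapply(df, class)
print(column_types)
```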
Checking and Converting Data Types
You can check the type of any object using the class() function:
Code
x <- 5
class(x)
To convert between types, R provides several functions:
Code
# Convert to numeric
as.numeric("5")
# Convert to character
as.character(5)
# Convert to logical
as.logical(1)
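Conversions do not always succeed. When a value cannot be converted, R returns NA (its marker for a missing value) rather than stopping. The sketch below shows a few such cases; suppressWarnings() is used only to keep the output quiet.

```r
# Text that is not a number converts to NA (normally with a warning).
not_a_number <- suppressWarnings(as.numeric("abc"))
print(not_a_number)         # NA
print(is.na(not_a_number))  # TRUE

# Only a few spellings are recognized as logical values.
print(as.logical("TRUE"))  # TRUE
print(as.logical("yes"))   # NA

# Numbers convert to logical: 0 is FALSE, anything else is TRUE.
print(as.logical(0))     # FALSE
print(as.numeric(TRUE))  # 1
```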
Tip
Next Steps - Practice creating and manipulating different data types. Try combining them in various ways. The more you experiment, the more comfortable you’ll become with R’s syntax and data structures.
Basic Data Manipulation
Now that we understand basic data types, let’s look at some simple ways to manipulate them.
Indexing Vectors
In R, we use square brackets [] to access elements of a vector. Remember, R uses 1-based indexing (the first element is at position 1, not 0).
Code
fruits <- c("apple", "banana", "cherry", "date")
print(fruits[2])       # Access the second element
print(fruits[c(1, 3)]) # Access first and third elements
print(fruits[-2])      # All elements except the second
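Square brackets also accept a logical vector, which keeps only the positions where the condition is TRUE. This is a short sketch with the illustrative name `long_names`.

```r
fruits <- c("apple", "banana", "cherry", "date")

# Keep only the fruits whose names have more than 4 characters.
long_names <- fruits[nchar(fruits) > 4]
print(long_names)  # "apple" "banana" "cherry"

# which() returns the positions where a condition holds.
print(which(fruits == "cherry"))  # 3

# Asking for a position past the end returns NA instead of an error.
print(fruits[10])  # NA
```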
- Indexing Lists: For lists, we can use [] to get a sublist, or [[]] to extract an element.
Code
my_list <- list(name = "Alice", age = 30, scores = c(85, 90, 95))
print(my_list["name"])   # Returns a list
print(my_list[["name"]]) # Returns the value
print(my_list$name)      # Another way to access elements
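The difference between [] and [[]] shows up in the class of the result, and the same accessors can be used to change or add elements. In this sketch the `city` element and its value are hypothetical additions for illustration.

```r
my_list <- list(name = "Alice", age = 30, scores = c(85, 90, 95))

print(class(my_list["name"]))    # "list"
print(class(my_list[["name"]]))  # "character"

# Indexing can be chained: the second score inside the list.
second_score <- my_list$scores[2]
print(second_score)  # 90

# Assigning through $ updates an element or adds a new one.
my_list$age <- 31
my_list$city <- "Dallas"  # hypothetical new element
print(names(my_list))     # "name" "age" "scores" "city"
```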
- Indexing Data Frames: Data frames can be indexed like lists (to access columns) or like matrices (to access specific cells).
Code
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35)
)
print(df$name)  # Access a column
print(df[1, 2]) # Access a specific cell (row 1, column 2)
print(df[1, ])  # Access the first row
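Matrix-style indexing also accepts a logical condition in the row position, which is a common way to filter rows. A minimal sketch follows; the names `older`, `names_only`, and `age_next_year` are illustrative.

```r
df <- data.frame(
  name = c("Alice", "Bob", "Charlie"),
  age = c(25, 30, 35)
)

# Keep the rows where age is above 26 (all columns).
older <- df[df$age > 26, ]
print(older)

# Same filter, but return only the name column.
names_only <- df[df$age > 26, "name"]
print(names_only)

# Assigning to a new column name adds that column.
df$age_next_year <- df$age + 1
print(df)
```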
Important
Exercise 4: - Create a vector of numbers from 1 to 10. Then, use indexing to:
- Extract the 5th element
- Extract all elements except the 3rd
- Extract the 2nd, 4th, and 6th elements
Useful Built-in Functions
R has many built-in functions that are incredibly useful for data manipulation and analysis.
Statistical Functions
Code
numbers <- c(10, 20, 30, 40, 50)
print(mean(numbers))   # Average
print(median(numbers)) # Median
print(sd(numbers))     # Standard deviation
print(sum(numbers))    # Sum
print(max(numbers))    # Maximum value
print(min(numbers))    # Minimum value
- String Functions
Code
text <- "Hello, World!"
print(toupper(text))      # Convert to uppercase
print(tolower(text))      # Convert to lowercase
print(nchar(text))        # Number of characters
print(substr(text, 1, 5)) # Extract substring
- Utility Functions
Code
print(length(numbers))      # Number of elements
print(seq(1, 10, by = 2))   # Generate a sequence
print(rep("A", 5))          # Repeat a value
Note
These are just a few of the many built-in functions in R. As you progress, you’ll discover many more that can help you in your data analysis tasks.
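Real data often contains missing values, and many of the statistical functions above return NA when they encounter one. The sketch below shows the na.rm argument that mean(), sum(), and similar functions accept; the `readings` vector is made up for illustration.

```r
readings <- c(10, NA, 30)

# A single missing value makes the result missing.
print(mean(readings))  # NA

# na.rm = TRUE drops missing values before computing.
mean_without_na <- mean(readings, na.rm = TRUE)
print(mean_without_na)  # 20

# Count how many values are missing.
print(sum(is.na(readings)))  # 1
```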
Conditional Statements
Conditional statements allow you to execute code based on certain conditions. The most common is the if-else statement:
Code
x <- 10
if (x > 5) {
  print("x is greater than 5")
} else if (x == 5) {
  print("x is equal to 5")
} else {
  print("x is less than 5")
}
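An if statement expects a single TRUE or FALSE. To apply a condition to every element of a vector, base R offers the vectorized ifelse() function. A small sketch, with illustrative names and labels:

```r
values <- c(3, 5, 8)

# ifelse() tests each element and picks the matching label.
size_labels <- ifelse(values > 5, "greater than 5", "5 or less")
print(size_labels)  # "5 or less" "5 or less" "greater than 5"
```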
Important
Exercise 5: - Write a conditional statement that checks if a number is positive, negative, or zero, and prints an appropriate message for each case.
Matrices in R
Creating Matrices
Matrices are two-dimensional arrays that can hold elements of the same type. You can create a matrix in R using the matrix() function.
Code
# Create a 2x3 matrix
matrix_data <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
print(matrix_data)
The matrix() function takes a vector of data and organizes it into a matrix with a specified number of rows (nrow) and columns (ncol). The data is filled column-wise by default.
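To see what column-wise filling means, the sketch below builds the same data both ways, using the byrow argument of matrix() to fill across rows instead. The names `by_column` and `by_row` are illustrative.

```r
# Default: values fill down each column in turn.
by_column <- matrix(1:6, nrow = 2, ncol = 3)
print(by_column)
#      [,1] [,2] [,3]
# [1,]    1    3    5
# [2,]    2    4    6

# byrow = TRUE: values fill across each row in turn.
by_row <- matrix(1:6, nrow = 2, ncol = 3, byrow = TRUE)
print(by_row)
#      [,1] [,2] [,3]
# [1,]    1    2    3
# [2,]    4    5    6

print(dim(by_row))  # 2 3
```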
Matrix Operations
You can perform various operations on matrices, including arithmetic and element-wise operations.
Code
# Create another matrix of the same dimensions
matrix_data2 <- matrix(c(6, 5, 4, 3, 2, 1), nrow = 2, ncol = 3)
# Matrix addition
sum_matrix <- matrix_data + matrix_data2
print(sum_matrix)
# Element-wise multiplication
prod_matrix <- matrix_data * matrix_data2
print(prod_matrix)
Matrices can be added or multiplied element-wise if they have the same dimensions. The + operator adds corresponding elements, and the * operator multiplies them.
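Element-wise multiplication with * is different from the matrix product of linear algebra, which R writes as %*%. The sketch below multiplies a 2x3 matrix by the transpose of another (3x2) to get a 2x2 result; `true_product` is an illustrative name.

```r
matrix_data <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)
matrix_data2 <- matrix(c(6, 5, 4, 3, 2, 1), nrow = 2, ncol = 3)

# t() transposes a matrix; %*% is the linear-algebra matrix product.
# The inner dimensions must match: (2x3) %*% (3x2) gives a 2x2 matrix.
true_product <- matrix_data %*% t(matrix_data2)
print(true_product)
#      [,1] [,2]
# [1,]   28   19
# [2,]   40   28
```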
Accessing Elements in a Matrix
You can access specific elements in a matrix using square brackets [], specifying the row and column indices.
Code
# Access the element in the first row and second column
element <- matrix_data[1, 2]
print(element)
# Access the entire first row
first_row <- matrix_data[1, ]
print(first_row)
# Access the entire second column
second_column <- matrix_data[, 2]
print(second_column)
Use matrix[row, column] to access specific elements. You can omit the row or column index to select an entire row or column.
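One detail that can surprise newcomers: selecting a single row or column returns a plain vector, not a matrix. Passing drop = FALSE keeps the result two-dimensional. A brief sketch with illustrative names:

```r
matrix_data <- matrix(c(1, 2, 3, 4, 5, 6), nrow = 2, ncol = 3)

# A single row comes back as a plain vector with no dimensions.
row_as_vector <- matrix_data[1, ]
print(dim(row_as_vector))  # NULL

# drop = FALSE keeps the result as a 1x3 matrix.
row_as_matrix <- matrix_data[1, , drop = FALSE]
print(dim(row_as_matrix))  # 1 3
```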
Loops in R
For Loops
Loops are used to repeat a block of code multiple times. The for loop is commonly used to iterate over elements in a vector or a sequence.
Code
# Create a vector of numbers
numbers <- c(1, 2, 3, 4, 5)
# Initialize a variable to store the sum
total_sum <- 0
# Loop over each element in the vector
for (number in numbers) {
  total_sum <- total_sum + number # Add each number to the total_sum
}
print(total_sum) # Output the total sum
The for loop iterates over each element in the numbers vector. The loop variable (number) takes the value of each element, and the code inside the loop is executed for each iteration.
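A loop can also iterate over positions rather than values, which is useful when each result needs to be stored. The sketch below uses seq_along() to build a running total and then compares it with the built-in cumsum(); `running_total` is an illustrative name.

```r
numbers <- c(1, 2, 3, 4, 5)

# Pre-allocate a vector to hold one result per element.
running_total <- numeric(length(numbers))
total <- 0

# seq_along(numbers) produces the positions 1, 2, ..., 5.
for (i in seq_along(numbers)) {
  total <- total + numbers[i]
  running_total[i] <- total
}

print(running_total)  # 1 3 6 10 15

# The vectorized built-in gives the same result without a loop.
print(identical(running_total, cumsum(numbers)))  # TRUE
```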
While Loop
The while loop repeats a block of code as long as a specified condition is TRUE.
Code
# Initialize a counter variable
counter <- 1
# Loop until the counter reaches 5
while (counter <= 5) {
  print(paste("Counter is:", counter)) # Print the current value of counter
  counter <- counter + 1 # Increment the counter
}
The while loop continues to execute as long as the condition (counter <= 5) is TRUE. After each iteration, the counter is incremented until the condition becomes FALSE.
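A while loop is most useful when the number of iterations is not known in advance. As a small sketch, the loop below counts how many times a value must be doubled before it exceeds 100; the variable names and the limit of 100 are chosen for illustration.

```r
value <- 1
steps <- 0

# Keep doubling until the value exceeds 100.
while (value <= 100) {
  value <- value * 2
  steps <- steps + 1
}

print(steps)  # 7
print(value)  # 128
```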
Important
Exercise 6: Create a vector of 5 numbers and a vector of 5 names. Combine them into a data frame where each number corresponds to an age and each name corresponds to a person. Then, calculate the mean age and display a summary of the data frame.
Important
Advanced Exercise: Create a custom function that takes a vector of numbers as input and returns a list containing the following:
1. The square of each number in the vector.
2. A count of how many numbers in the vector are greater than a specified threshold.
3. The mean of the numbers in the vector, but only include numbers greater than a specified threshold in the calculation.
Test your function with a vector of random numbers, using a threshold of your choice.
Hint:
- Squaring Numbers: Remember that you can square all elements in a vector at once using vectorization (e.g., numbers^2). There’s no need to loop through each element individually, though you can if you want to practice using loops.
- Counting Elements: To count how many numbers are greater than the threshold, use a logical comparison (e.g., numbers > threshold). This will return a logical vector (TRUE or FALSE), and you can sum it up with the sum() function, since TRUE is treated as 1 in R.
- Filtering for Mean Calculation: Use the logical comparison to filter your vector before calculating the mean (e.g., numbers[numbers > threshold]).
Conclusion
Congratulations! You’ve now been introduced to the basic syntax of R, its primary data types, and some fundamental operations for data manipulation. This knowledge forms the foundation for your journey into data analysis and statistical computing with R.
Remember, the key to mastering R is practice. Try to use these concepts in real-world scenarios, experiment with different data types and functions, and don’t hesitate to consult R’s extensive documentation and online resources when you encounter challenges.