This is a continuation of the series “Learn R Programming in 3 hours”. If you have not gone through the previous sections (Part-1, Part-2), please do so before you proceed.
So far we have discussed the following:
Installing R and RStudio
The basic RStudio interface
Import data into R
Check the number of rows, the number of columns and the structure of your data
View the data that you imported
Create computed columns and use basic numeric functions
Sort the data in R
Derive the value of a column based on a condition
Apply basic character functions to your data
Convert data types in R
Deal with basic date columns
Do mathematical calculations with your data
Remove duplicates in your data
Find and replace missing values in your data
Remove or keep columns in a dataset
In this chapter, we will continue from where we left off. By the end of this chapter you should be able to do the following in R:
Join tables (left, right, inner, outer)
Do basic summaries and aggregations of data
Subset the data
Create new datasets
Create basic plots
Export a dataset to a file
So let's begin…
Join the tables
Now let us join the two tables (aka data frames) names and salary with the merge() function and create a new table “newdata”. We will join them on the common column “id”. This is a full outer join because we set all=TRUE.
Let's check the number of rows we got and view the top 5 rows.
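The original code appeared as a screenshot that is not reproduced here. The sketch below shows the outer join and the row checks with merge(), nrow() and head(), using two small hypothetical data frames named names and salary; the gender and age columns and all values are invented for illustration.

```r
# Hypothetical stand-ins for the tutorial's data frames (values invented for illustration)
names <- data.frame(
  id     = 1:4,
  gender = c("f", "m", "f", "m"),
  age    = c(29, 41, 35, 52)
)
salary <- data.frame(
  id     = c(2, 3, 4, 5),
  Income = c(52000, 61000, 75000, 48000)
)

# Full outer join on the common column "id": all = TRUE keeps unmatched rows from both tables
newdata <- merge(names, salary, by = "id", all = TRUE)

nrow(newdata)      # number of rows in the joined table
head(newdata, 5)   # top 5 rows; cells with no match on the other side are NA
```

Here id 1 has no salary row and id 5 has no names row, so both survive the outer join with NA in the missing columns.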
Similarly, we can do left and right joins by modifying the all argument. x denotes the left table and y the right table, so all.x=TRUE keeps all rows from the left table and all.y=TRUE keeps all rows from the right table.
A cross join can be achieved with by=NULL, and an inner join by leaving out the all argument altogether (its default is all=FALSE).
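A minimal sketch of the other join types, again assuming small hypothetical names and salary tables with invented values; only the arguments to merge() change between the four calls.

```r
# Hypothetical tables: ids 2-4 appear in both, id 1 only in names, id 5 only in salary
names  <- data.frame(id = 1:4, gender = c("f", "m", "f", "m"))
salary <- data.frame(id = c(2, 3, 4, 5), Income = c(52000, 61000, 75000, 48000))

left  <- merge(names, salary, by = "id", all.x = TRUE)  # every row of names (x)
right <- merge(names, salary, by = "id", all.y = TRUE)  # every row of salary (y)
inner <- merge(names, salary, by = "id")                # default all = FALSE: matching ids only
cross <- merge(names, salary, by = NULL)                # every row of x paired with every row of y

# Row counts of each result
sapply(list(left = left, right = right, inner = inner, cross = cross), nrow)
```

The cross join returns 4 × 4 = 16 rows, and because both tables have an id column, merge() suffixes the copies as id.x and id.y.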
Aggregation and Summary
Now that we have employee info and salary info in one table, let's do some basic aggregation. Let's find the count of employees of each gender using the table() function. Here I am telling R to count the rows in newdata and group them by the gender column:
We can also calculate basic summary statistics on the column “Income” of the table “newdata” with the summary() function.
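A minimal sketch of the count, assuming a small hypothetical stand-in for newdata (columns id, gender, age and Income, with invented values):

```r
# Hypothetical stand-in for the merged table "newdata" (values invented for illustration)
newdata <- data.frame(
  id     = 1:6,
  gender = c("f", "m", "f", "m", "f", "f"),
  age    = c(29, 41, 35, 52, 47, 24),
  Income = c(52000, 61000, 75000, 48000, 66000, 39000)
)

# Count the rows for each value of the gender column
gender_counts <- table(newdata$gender)
gender_counts
```

Note that table() drops NA values by default, so rows with a missing gender (as can happen after an outer join) are not counted.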
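A minimal sketch of summary() on a single column, using a hypothetical two-column stand-in for newdata with invented values:

```r
# Hypothetical stand-in for "newdata" (values invented for illustration)
newdata <- data.frame(
  gender = c("f", "m", "f", "m", "f", "f"),
  Income = c(52000, 61000, 75000, 48000, 66000, 39000)
)

# Min., 1st Qu., Median, Mean, 3rd Qu. and Max. of the Income column
income_stats <- summary(newdata$Income)
income_stats
```

The result is a named vector, so a single statistic can be pulled out with income_stats[["Median"]].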
We can also aggregate: here we calculate the summary statistics for the column Income, grouped by gender, using the aggregate() function.
Here I am telling R to calculate the summary statistics (FUN=summary) on the column “Income” of the table newdata (newdata$Income) and group them by gender (by=list(newdata$gender)).
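A minimal sketch of the grouped summary on a hypothetical stand-in for newdata. One small assumed change from the description above: the grouping list element is named (gender = ...), so the output column is called gender instead of the default Group.1.

```r
# Hypothetical stand-in for "newdata" (values invented for illustration)
newdata <- data.frame(
  gender = c("f", "m", "f", "m", "f", "f"),
  Income = c(52000, 61000, 75000, 48000, 66000, 39000)
)

# Apply summary() to Income separately within each gender group
income_by_gender <- aggregate(newdata$Income,
                              by  = list(gender = newdata$gender),
                              FUN = summary)
income_by_gender
```

Because summary() returns six values per group, the x column of the result is a matrix whose columns are named Min., 1st Qu., Median and so on.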
Subset the dataset and create a new dataset
Now that we have checked our summary, let's see how we can subset the data. We will use the subset() function to create a data frame newdata_f that contains information only about female employees.
Here I am telling R to create a data frame newdata_f by subsetting the data frame newdata on the column gender, keeping the rows where gender == "f". Note the double equals sign: == tests for equality, while a single = would not filter the rows at all. Then let's view the last few rows of the newly created data frame using the tail() function.
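A minimal sketch of the subset and tail() steps, assuming a hypothetical stand-in for newdata with invented values:

```r
# Hypothetical stand-in for "newdata" (values invented for illustration)
newdata <- data.frame(
  id     = 1:6,
  gender = c("f", "m", "f", "m", "f", "f"),
  age    = c(29, 41, 35, 52, 47, 24),
  Income = c(52000, 61000, 75000, 48000, 66000, 39000)
)

# Keep only the rows where gender equals "f" (double equals sign)
newdata_f <- subset(newdata, gender == "f")

# Last 3 rows of the new data frame
tail(newdata_f, 3)
```

The original row names (1, 3, 5, 6) are kept in newdata_f, which makes it easy to see which rows of newdata survived the filter.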
Scatter Plot
Let's create a simple scatter plot depicting the salaries of the female staff. Remember that our data frame newdata_f only has information about female employees, so no filters are required.
Here I am telling R to create a scatter plot of two variables, age and salary, from the data frame newdata_f.
Oops, I got an error!
The problem is that the figure region created for the chart is too small to hold even the default margins. I will try to fix this by reducing the margins and then run the plot() function again.
Voilà! It works.
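The original fix is not shown. One common approach, assumed here, is to shrink the margins with par(mar = ...) before plotting; enlarging the RStudio Plots pane also gives the figure more room. The newdata_f values and the Income column name used for salary are hypothetical.

```r
# Hypothetical stand-in for "newdata_f" (values invented; Income assumed to hold the salary)
newdata_f <- data.frame(
  age    = c(29, 35, 47, 24),
  Income = c(52000, 75000, 66000, 39000)
)

# Shrink the margins (bottom, left, top, right, in lines of text); par() returns the old settings
old_par <- par(mar = c(4, 4, 2, 1))

plot(newdata_f$age, newdata_f$Income,
     main = "Salaries of female staff",
     xlab = "Age", ylab = "Income")

# Restore the previous margins so later plots are unaffected
par(old_par)
```

Saving the value returned by par() and passing it back afterwards is a simple way to keep the margin change local to this one plot.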
Bar Plot
Now let's try a bar chart as well. Remember the table() function, which counts the rows and groups them by gender? We will use it again, but this time save the result into the object “counts”.
Next, let's use the barplot() function to plot counts, with the main title “Employees per Gender” and the Y axis labelled “No. of Employees”.
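A minimal sketch of the two steps, assuming a hypothetical gender column with invented values:

```r
# Hypothetical stand-in for "newdata" (values invented for illustration)
newdata <- data.frame(gender = c("f", "m", "f", "m", "f", "f"))

# Save the per-gender counts, then draw one bar per gender
counts <- table(newdata$gender)
bar_mids <- barplot(counts,
                    main = "Employees per Gender",
                    ylab = "No. of Employees")
```

barplot() invisibly returns the x positions of the bar midpoints (saved here as bar_mids), which can be handy for placing text labels on the bars.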
Write a dataset to a file
Let's create a tab-delimited (sep="\t") file “newdata.txt” in the folder
c:\temp\System\SAMPLE DATA-STATS\r-tutorial
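The original code is not shown; write.table() with sep = "\t" is assumed here. The sketch writes a hypothetical newdata to R's temporary folder so it runs on any machine. Inside an R string, the Windows path must use forward slashes or doubled backslashes, because a single backslash starts an escape sequence ("\t" is a tab).

```r
# Hypothetical stand-in for "newdata" (values invented for illustration)
newdata <- data.frame(
  id     = 1:3,
  gender = c("f", "m", "f"),
  Income = c(52000, 61000, 75000)
)

# Temporary location so the sketch runs anywhere; for the tutorial's folder you could use
# "c:/temp/System/SAMPLE DATA-STATS/r-tutorial/newdata.txt"
out_file <- file.path(tempdir(), "newdata.txt")

# Tab-separated, no row-name column, no quotes around text values
write.table(newdata, file = out_file, sep = "\t", row.names = FALSE, quote = FALSE)

# Read the file back to check its contents
back <- read.delim(out_file)
back
```

row.names = FALSE is a design choice: without it, write.table() adds an unnamed first column of row names, which shifts the header by one column when the file is opened in a spreadsheet.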
You can open and view the file.
This marks the end of our journey of learning basic R programming. Do let me know your feedback via comments.
Some additional tips:
# To clear the console: press Ctrl+L
# Remove an object
rm(newdata_f)
# List all objects
ls()
# Multiple assignment
x <- y <- newdata
# Remove all objects from the workspace
rm(list = ls())
# Get help in R, e.g.:
?subset
Sometimes R needs additional packages to do certain tasks. For example, to import an xlsx file: if you call the function read.xlsx() straight away, you will get an error, because it comes from the xlsx package rather than base R.
You need to install this package first and then attach it:
# To install a package
install.packages("xlsx")
# To attach a package
library(xlsx)
Now you can call read.xlsx() and read an xlsx file without errors.
Some Functions in R
Numeric
Function | Description or example |
abs(x) | absolute value |
sqrt(x) | square root |
ceiling(x) | ceiling(3.475) is 4 |
floor(x) | floor(3.475) is 3 |
trunc(x) | trunc(5.99) is 5 |
round(x, digits=n) | round(3.475, digits=2) is 3.48 |
signif(x, digits=n) | signif(3.475, digits=2) is 3.5 |
cos(x), sin(x), tan(x) | also acos(x), cosh(x), acosh(x), etc. |
log(x) | natural logarithm |
log10(x) | common logarithm |
exp(x) | e^x |
Character
x <- "abcdef"
substr(x, 2, 4) is "bcd"
after substr(x, 2, 4) <- "22222", x is "a222ef" (only the 3 characters that fit are replaced)
toupper(x) | uppercase |
tolower(x) | lowercase |
Contributed by: Ubaid Darwaish