Learn R Programming in 3 Hours: Part 3

Ubaid Darwaish

This is a continuation of the series “Learn R Programming in 3 Hours”. If you have not gone through the previous parts (Part 1 and Part 2), please do so before you proceed.

So far we have covered the following:

Installing R and RStudio
The basic RStudio interface
Importing data into R
Checking the number of rows, the number of columns and the structure of your data
Viewing the data that you imported
Creating computed columns and using basic numeric functions
Sorting data in R
Deriving the value of a column based on a condition
Applying basic character functions to your data
Converting data types in R
Dealing with basic date columns
Doing mathematical calculations with your data
Removing duplicates from your data
Finding and replacing missing values in your data
Removing or keeping columns in a dataset

In this chapter we continue from where we left off. By the end of it, you should be able to do the following in R:

Do joins (left, right, inner, outer)
Do basic summaries and aggregations of data
Subset the data
Create new datasets
Create basic plots from data
Export a dataset to a file

So let's begin…

Join the tables

Now let us join the two tables (aka data frames) names and salary and create a new table, newdata. We join them on the common column id. This is a full outer join because we set all=TRUE.

Let's check the number of rows we got and view the top 5 rows.
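The join code itself only appears in screenshots. Here is a minimal sketch of the same steps. The small names and salary tables are hypothetical: their columns (name, gender, Income) and all of the values are made up for illustration.

```r
# Hypothetical toy tables standing in for the tutorial's names and salary data
names <- data.frame(id     = 1:4,
                    name   = c("Asha", "Bilal", "Chen", "Dana"),
                    gender = c("f", "m", "f", "f"))
salary <- data.frame(id     = 2:5,
                     Income = c(52000, 48000, 61000, 39000))

# Full outer join on the common column id: all = TRUE keeps unmatched rows
# from both tables and fills the columns they lack with NA
newdata <- merge(names, salary, by = "id", all = TRUE)

nrow(newdata)     # 5 rows: ids 1 to 5
head(newdata, 5)  # id 1 has no Income, id 5 has no name or gender
```

Naming a data frame names works, because R skips non-function objects when it looks up a function call. A less generic name would still be easier to read.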

Similarly, we can do left and right joins by changing the all argument. x denotes the left table and y the right table, so all.x=TRUE keeps all rows from the left table and all.y=TRUE keeps all rows from the right table.

A cross join can be achieved with by=NULL, and an inner join by leaving out the all argument altogether (its default is FALSE).
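Here is a sketch of the other join types with merge(). It uses the same kind of hypothetical toy tables as before, which are assumptions rather than the tutorial's actual data.

```r
# Hypothetical toy tables: ids 1-4 in names, ids 2-5 in salary
names  <- data.frame(id = 1:4, gender = c("f", "m", "f", "f"))
salary <- data.frame(id = 2:5, Income = c(52000, 48000, 61000, 39000))

left  <- merge(names, salary, by = "id", all.x = TRUE)  # all rows of names: ids 1-4
right <- merge(names, salary, by = "id", all.y = TRUE)  # all rows of salary: ids 2-5
inner <- merge(names, salary, by = "id")                # only matching ids: 2-4
cross <- merge(names, salary, by = NULL)                # every pairing: 4 x 4 rows

nrow(left); nrow(right); nrow(inner); nrow(cross)
```

In the cross join, both tables contain an id column. merge() therefore renames them to id.x and id.y.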

Aggregation and Summary

Now that we have the employee info and the salary info in one table, let's do some basic aggregation. Let's find the count of employees of each gender using the table() function. Here I am telling R to count the rows in newdata, grouped by the gender column:
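Here is a sketch of the count on a hypothetical newdata. The values are made up, and one gender is missing, as can happen after a full outer join. By default table() silently drops NA values. The useNA = "ifany" argument makes them visible.

```r
# Hypothetical joined table; the last employee has no gender recorded
newdata <- data.frame(id     = 1:6,
                      gender = c("f", "m", "f", "f", "m", NA),
                      Income = c(50000, 62000, 45000, 71000, 58000, 53000))

table(newdata$gender)                   # f: 3, m: 2 (the NA row is not counted)
table(newdata$gender, useNA = "ifany")  # adds a count of 1 for <NA>
```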

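We can also compute basic summary statistics for the column Income of the table newdata with the summary() function. For a numeric column it reports the minimum, first quartile, median, mean, third quartile and maximum, plus the number of NA values if there are any.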
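Here is a sketch on a hypothetical newdata with made-up incomes:

```r
# Hypothetical joined table
newdata <- data.frame(id     = 1:6,
                      gender = c("f", "m", "f", "f", "m", "f"),
                      Income = c(50000, 62000, 45000, 71000, 58000, 53000))

summary(newdata$Income)
# Min. 45000, Median 55500, Mean 56500, Max. 71000
```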

We can also aggregate: calculate the summary statistics for the column Income, grouped by gender, with the aggregate() function.

Here I am telling R to calculate the summary statistics (FUN=summary) on the column Income of the table newdata (newdata$Income) and group them by gender (by=list(newdata$gender)).
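Here is the same call as a sketch on a hypothetical newdata. Because summary() returns several values per group, aggregate() stores them as a matrix column x, with one column per statistic.

```r
# Hypothetical joined table
newdata <- data.frame(id     = 1:6,
                      gender = c("f", "m", "f", "f", "m", "f"),
                      Income = c(50000, 62000, 45000, 71000, 58000, 53000))

agg <- aggregate(newdata$Income, by = list(newdata$gender), FUN = summary)
agg                 # one row per gender (Group.1) with Min., Median, Mean, ... in x
agg$x[, "Mean"]     # mean income per gender: 54750 for f, 60000 for m
```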

Subset the dataset and create a new dataset

Now that we have checked our summary, let's see how we can subset the data. We will use the subset() function to create a data frame, newdata_f, that contains information only about female employees.

Here I am telling R to create the data frame newdata_f by subsetting the data frame newdata on the column gender, keeping the rows where gender == "f". (There is a mistake in the snapshot below: I missed one '=' sign. The comparison needs ==. With a single =, subset() does not filter at all.) Then let's view the last few rows of the new data frame using the tail() function.

[Screenshot: creating newdata_f with subset() and viewing it with tail()]
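Here is a sketch of the corrected call on a hypothetical newdata with made-up values. It also shows the single-= slip: gender = "f" is taken as an unused named argument rather than a condition, so every row comes back.

```r
# Hypothetical joined table
newdata <- data.frame(id     = 1:6,
                      gender = c("f", "m", "f", "f", "m", "f"),
                      age    = c(34, 45, 29, 51, 38, 42),
                      Income = c(50000, 62000, 45000, 71000, 58000, 53000))

newdata_f <- subset(newdata, gender == "f")  # keep only female employees
tail(newdata_f)                              # last rows (up to 6 by default)

wrong <- subset(newdata, gender = "f")       # single '=': nothing is filtered
nrow(wrong)                                  # still all 6 rows
```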

Scatter Plot

Let's create a simple scatter plot depicting the salaries of the female staff. Remember that our data frame newdata_f only has information about female employees, so no filters are required.

Here I am telling R to create a scatter plot of two variables, age and salary, from the data frame newdata_f.
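Here is a sketch of the scatter plot. It assumes the salary is stored in the Income column, as in the summary step, and uses a hypothetical newdata with made-up values.

```r
# Hypothetical joined table
newdata <- data.frame(id     = 1:6,
                      gender = c("f", "m", "f", "f", "m", "f"),
                      age    = c(34, 45, 29, 51, 38, 42),
                      Income = c(50000, 62000, 45000, 71000, 58000, 53000))
newdata_f <- subset(newdata, gender == "f")

# Age on the x-axis, salary (Income) on the y-axis
plot(newdata_f$age, newdata_f$Income,
     main = "Salary vs. age (female staff)",
     xlab = "Age", ylab = "Income")
```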

Oops, I got an error!

The problem is that the figure region available for the chart is too small to hold even the default margins. I will fix this using the code below and then run the plot() function again.

[Screenshot: the code used to fix the margin error before re-running plot()]
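The exact fix in the screenshot is not reproduced in the text. The explanation matches R's "figure margins too large" error. One common fix, shown here as a sketch and not necessarily what the author ran, is to shrink the margins with par(mar = ...) before plotting. Enlarging the RStudio Plots pane or resetting the graphics device with dev.off() also often helps. The data is hypothetical.

```r
# Hypothetical female-only data
newdata_f <- data.frame(age    = c(34, 29, 51, 42),
                        Income = c(50000, 45000, 71000, 53000))

# Margins are given in lines as c(bottom, left, top, right); the default is
# c(5.1, 4.1, 4.1, 2.1). par() returns the old settings so they can be restored.
op <- par(mar = c(4, 4, 2, 1))
plot(newdata_f$age, newdata_f$Income, xlab = "Age", ylab = "Income")
par(op)  # restore the previous margins
```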

Voilà! It works.

Bar Plot

Now let's try a bar chart as well. Remember the table() function, which counts the rows and groups them by gender? We will use it again, but this time we save the result into an object called counts.

Next, let's use the barplot() function to plot counts, with the main title “Employees per Gender” and the y-axis label “No. of Employees”.
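Here are both steps as a sketch on a hypothetical newdata with made-up values. barplot() also invisibly returns the x positions of the bar centres, which this sketch keeps in mids.

```r
# Hypothetical joined table
newdata <- data.frame(id     = 1:6,
                      gender = c("f", "m", "f", "f", "m", "f"))

counts <- table(newdata$gender)   # f: 4, m: 2
mids <- barplot(counts,
                main = "Employees per Gender",
                ylab = "No. of Employees")
```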

Write a dataset to a file

Let's create a tab-delimited (sep="\t") file, newdata.txt, in the path

c:\temp\System\SAMPLE DATA-STATS\r-tutorial
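Here is a sketch of the export with write.table(). Inside an R string a backslash starts an escape sequence ("\t" is a tab), so a Windows path like the one above must be written with forward slashes or with doubled backslashes. This sketch writes to a temporary folder so that it runs anywhere. The data and the choices row.names = FALSE and quote = FALSE are assumptions.

```r
# Hypothetical joined table
newdata <- data.frame(id     = 1:6,
                      gender = c("f", "m", "f", "f", "m", "f"),
                      Income = c(50000, 62000, 45000, 71000, 58000, 53000))

# For the tutorial's folder you would write, for example:
# "c:/temp/System/SAMPLE DATA-STATS/r-tutorial/newdata.txt"
out_file <- file.path(tempdir(), "newdata.txt")
write.table(newdata, file = out_file, sep = "\t",
            row.names = FALSE, quote = FALSE)

# Read the file back to check what was written
check <- read.table(out_file, header = TRUE, sep = "\t")
head(check)
```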

You can open and view the file.

This marks the end of our journey of learning basic R programming. Do let me know your feedback in the comments.

Some additional tips:

# To clear the console: Ctrl+L

# To remove an object: rm(newdata_f)

# To list all objects: ls()

# Multiple assignment: x <- y <- newdata

# To remove all objects from the workspace: rm(list = ls())

# To get help in R, e.g.: ?subset

Sometimes R needs additional packages to do certain tasks. For example, to import an xlsx file: if you call the function read.xlsx() straight away, R stops with the error could not find function "read.xlsx".

You need to install the package first (once per machine) and then attach it (once per R session):

# To install a package: install.packages("xlsx")

# To attach a package: library(xlsx)

Now you can call the function read.xlsx() and read an xlsx file without any errors.
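A common pattern is to install a package only when it is missing and then attach it. Below is a minimal sketch of that pattern. The helper name use_package is hypothetical and not part of R. It is demonstrated with tools, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper: install pkg only if it is not installed yet, then attach it
use_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)
  }
  library(pkg, character.only = TRUE)
}

use_package("tools")    # already installed with R, so it is only attached
# use_package("xlsx")   # would install xlsx on first use, then attach it
```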

Some Functions in R

Numeric

Function                Description
abs(x)                  absolute value
sqrt(x)                 square root
ceiling(x)              ceiling(3.475) is 4
floor(x)                floor(3.475) is 3
trunc(x)                trunc(5.99) is 5
round(x, digits=n)      round(3.475, digits=2) is 3.48
signif(x, digits=n)     signif(3.475, digits=2) is 3.5
cos(x), sin(x), tan(x)  also acos(x), cosh(x), acosh(x), etc.
log(x)                  natural logarithm
log10(x)                common logarithm
exp(x)                  e^x
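Here is a quick sketch to try these functions at the console. The round() line uses 3.476 instead of 3.475. A value like 3.475 sits exactly on a rounding boundary and has no exact binary representation, so its rounded result hinges on tiny floating-point details. 3.476 rounds to 3.48 unambiguously. The trunc()/floor() pair on a negative number shows how the two differ.

```r
abs(-3)                      # 3
sqrt(16)                     # 4
ceiling(3.475)               # 4
floor(3.475)                 # 3
trunc(5.99)                  # 5
trunc(-5.99); floor(-5.99)   # -5 and -6: trunc() drops decimals, floor() rounds down
round(3.476, digits = 2)     # 3.48
signif(3.475, digits = 2)    # 3.5
cos(0)                       # 1
log(exp(2))                  # 2: log() is the natural logarithm, the inverse of exp()
log10(1000)                  # 3
```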

Character

x <- "abcdef"
substr(x, 2, 4) is "bcd"
substr(x, 2, 4) <- "22222" changes x to "a222ef"

toupper(x)    Uppercase
tolower(x)    Lowercase
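Here are the same character examples as runnable code. Type straight quotes (") when copying these into R. The curly quotes that web pages often produce are not valid R string delimiters and cause a parse error. Note that substr() replacement only overwrites the positions you name (2 to 4 here), so the extra characters of "22222" are dropped.

```r
x <- "abcdef"
substr(x, 2, 4)              # "bcd"
substr(x, 2, 4) <- "22222"   # only positions 2-4 are overwritten
x                            # "a222ef"
toupper(x)                   # "A222EF"
tolower("ABC")               # "abc"
```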

Contributed by: Ubaid Darwaish