Learn R Programming in 3 Hours - Part 2

Ubaid Darwaish

This is a continuation of the series “Learn R Programming in 3 Hours”. If you have not gone through the previous part, please do so before you proceed.

In the previous part (Learn R Programming in 3 Hours - Part 1) we discussed the following:

Installing R and R Studio
The basic R Studio interface.
Import data into R
Check the number of rows, the number of columns and the structure of your data
View the data that you imported
Create computed columns and use basic numeric functions
Sort the data in R

In this chapter, we will continue from where we left off. By the end of this chapter you should be able to do the following in R:

• You should be able to derive the value of a column based on a condition
• You should be able to apply basic character functions to your data
• You should be able to convert data types in R
• You should be able to deal with basic date columns
• You should be able to do mathematical calculations with your data
• You should be able to remove duplicates in your data
• You should be able to find and replace missing values in your data
• You should be able to remove or keep columns in a dataset

So let's get started:

Create New Column based on Condition

We now need to create a new column called gender, based on the values of the column sex. We will use the ifelse() function, which works as follows:

ifelse(condition, if_true, if_false)


Here I am telling R to create a column gender in the table names: if the column sex of names has the value 'boy', put the value Male, otherwise put the value Female. Then I view the output using the head function.

You might have noticed that I am using something like c('Male'). c() is a function that combines values into a vector. A vector is a sequence of data elements that are all of the same type.

Strictly speaking, c() is not needed here: a single value such as 'Male' is already a vector of length one. The same holds if we assign binary values such as 1 and 0 instead of Male and Female.
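The step above can be sketched as follows. This is only an illustration: the small data frame below is a made-up stand-in for the names data imported in Part 1, and the id and sex values are invented.

```r
# Hypothetical stand-in for the `names` data frame imported in Part 1.
names <- data.frame(
  id  = c(1001L, 1002L, 1003L),
  sex = c("boy", "girl", "boy"),
  stringsAsFactors = FALSE
)

# ifelse(condition, if_true, if_false) is applied to every row.
names$gender <- ifelse(names$sex == "boy", "Male", "Female")

# The same pattern with binary values instead of labels
# (is_male is a hypothetical column name).
names$is_male <- ifelse(names$sex == "boy", 1, 0)

head(names)
```

ifelse() is vectorised, so one call fills the whole column without an explicit loop.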


Modify a column 

Previously we set the values in the column gender to Male or Female. Now we will recreate the column gender by applying the substr function to extract only the first letter, i.e. 'M' or 'F'.

Here is how this function works: substr(text, start_position, stop_position). Note that the third argument is the position of the last character to extract, not the number of characters.

We will view the output using the head function.

Now let's change the case to lowercase with tolower() and view the output.
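A minimal sketch of both steps, using an invented gender column rather than the real data from Part 1:

```r
# Hypothetical stand-in for the `names` data frame.
names <- data.frame(
  gender = c("Male", "Female", "Male"),
  stringsAsFactors = FALSE
)

# substr(x, start, stop): keep the characters from position 1 to position 1.
names$gender <- substr(names$gender, 1, 1)
head(names)

# tolower() converts every character to lowercase.
names$gender <- tolower(names$gender)
head(names)
```

Because start and stop are both 1, only the first letter of each value is kept.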


Convert from Factor to Character and then to date

Now let's take a step back and talk briefly about data types in R. There are many data types in R, but we will discuss only a few.

Numeric: Decimal values are called numerics in R.

Integer: Whole numbers can be stored as integers in R, for example by writing 5L or by using as.integer(). A plain 5 typed at the console is stored as numeric.

Factor: R stores nominal values by making them a factor. The factor stores the nominal values as a vector of integers in the range [1 … k] (where k is the number of unique values in the nominal variable).

Character: is used to represent string values in R. We convert objects into character values with the as.character() function.

Date: Dates are internally represented as the number of days since 1970-01-01, with negative values for earlier dates. You can use the as.Date() function to convert character data to dates.

Data frame: a table/dataset in R is called a data frame.
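The types listed above can be inspected with class(). The variable names below are made up for illustration only.

```r
x_num  <- 3.14                            # numeric
x_int  <- 5L                              # integer (note the L suffix)
x_fac  <- factor(c("boy", "girl", "boy")) # factor with levels "boy", "girl"
x_chr  <- as.character(x_fac)             # character
x_date <- as.Date("1970-01-02")           # Date

class(x_num)
class(x_int)
class(x_fac)
class(x_chr)
class(x_date)

# Integer codes behind the factor: one code per level.
as.integer(x_fac)

# Number of days since 1970-01-01.
as.numeric(x_date)
```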

Coming back to the console, let's check the structure of the data frame names by using the str() function.

We can see that id is stored as int; name, sex and year_of_birth as Factor; and gender as character.

We can simply convert name and sex into character by using the as.character() function. But to convert year_of_birth into a date, we need to follow certain steps.

First we need to convert year_of_birth from factor to character, and then from character to date.

When we convert it from character to date, we are also required to specify the format in which the text should be read (%d/%m/%Y, e.g. 27/11/1989).

I have done it in a single line: the inner as.character() function runs first, and then as.Date() converts the result.

We will view the output using the head function, and also check with the str() function that year_of_birth is now a date column.
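A minimal sketch of the conversion described above. The data frame and its two birth dates are invented, and the day/month/year format is assumed from the example in the text.

```r
# Hypothetical stand-in for `names`, with year_of_birth stored as a factor.
names <- data.frame(
  id            = c(1001L, 1002L),
  year_of_birth = factor(c("27/11/1989", "03/02/1992"))
)
str(names)

# Factor -> character -> Date, reading the text as day/month/4-digit year.
names$year_of_birth <- as.Date(as.character(names$year_of_birth),
                               format = "%d/%m/%Y")

head(names)
str(names)
```

After the conversion, R prints the dates in its default year-month-day form.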


Add another date column 

The function Sys.Date() returns today's date in R, and date() returns the current date and time as a character string.

Let's create a new column named today in the table names with today's date, and preview the output using the head function.
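As a sketch, with an invented two-row data frame standing in for names:

```r
# Hypothetical stand-in for the `names` data frame.
names <- data.frame(id = c(1001L, 1002L))

# Sys.Date() returns a single Date; R repeats it for every row.
names$today <- Sys.Date()

head(names)
class(names$today)
```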


Compute age

Now let's create another column in the table names called age. We will calculate each person's age by finding the number of days between today and the person's date of birth, and then dividing it by 365 to get an approximate number of years. I am using the as.integer() function to specify the data type of the new column.

I will also use the ceiling() function, which rounds the result up to the next whole number and so gets rid of the decimal part. Let's view the data using the head function.
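The calculation can be sketched like this. The birth dates are invented, and today is fixed to a hypothetical date so that the result is reproducible; in the article it comes from Sys.Date().

```r
# Hypothetical stand-in for `names`; both dates are made up.
names <- data.frame(
  year_of_birth = as.Date(c("1989-11-27", "1992-02-03"))
)

# Fixed date used here only for reproducibility (assumption).
names$today <- as.Date("2020-01-01")

# Subtracting two Dates gives a difference in days.
days_alive <- as.numeric(names$today - names$year_of_birth)

# Divide by 365, round up with ceiling(), and store as integer.
names$age <- as.integer(ceiling(days_alive / 365))

head(names)
```

Note that ceiling() rounds up, so the result counts the year currently in progress. If completed years are wanted instead, floor() would be the function to use.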


Remove Duplicates

Let's run a simple statement to select only the rows where id is 1015 and retrieve all columns.

Remember table_name[operation for rows, operation for columns]?


We see there are multiple rows with the same id, so let's delete the duplicates using the unique() function. Here I am telling R to re-create the salary data frame from the unique rows of the salary data frame.


Then we re-check whether the duplicates have been removed for id 1015.
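A minimal sketch of the three steps (select, de-duplicate, re-check). The salary data frame below is invented; only the id 1015 is taken from the text.

```r
# Hypothetical stand-in for the `salary` data frame, with one duplicated row.
salary <- data.frame(
  id     = c(1014L, 1015L, 1015L, 1016L),
  salary = c(52000, 61000, 61000, 48000)
)

# Rows where id is 1015, all columns (nothing after the comma).
salary[salary$id == 1015, ]

# Keep only one copy of each fully identical row.
salary <- unique(salary)

# Re-check: a single row should remain for id 1015.
salary[salary$id == 1015, ]
```

unique() compares entire rows, so two rows with the same id but different values in another column would both be kept.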

Replace missing values

Let's first view the rows in the data frame names where the value for the column sex is missing.


We see that the people whose value for the column sex is missing have been incorrectly labelled as f (female) in the column gender. Let's change that value to u (unknown).


Here I am telling R to select the rows with missing values of sex and update the column gender with the value u.

We again select the rows with missing values for sex and check that the column gender has been updated.
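A sketch of this step under one assumption: a missing value may appear either as NA or as an empty string, so the condition below checks for both. The data frame is invented for illustration.

```r
# Hypothetical stand-in for `names`; rows 2 and 3 have no value for sex.
names <- data.frame(
  id     = c(1001L, 1002L, 1003L),
  sex    = c("boy", "", NA),
  gender = c("m", "f", "f"),
  stringsAsFactors = FALSE
)

# TRUE where sex is NA or an empty string (assumed encodings of "missing").
missing_sex <- is.na(names$sex) | names$sex == ""

# View the affected rows.
names[missing_sex, ]

# Overwrite gender with "u" (unknown) on those rows only.
names$gender[missing_sex] <- "u"

# Re-check the same rows.
names[missing_sex, ]
```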

Drop/Retain Variables

Let's drop the redundant or unnecessary columns/variables such as sex, year_of_birth and today. We will create a vector (myvars_d) with all the columns that we don't want to keep, and later use it to re-create the table without those columns.
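One way to sketch the drop step, using an invented one-row data frame that carries the column names mentioned in this chapter:

```r
# Hypothetical stand-in for `names`.
names <- data.frame(
  id            = 1001L,
  name          = "Sam",
  sex           = "boy",
  year_of_birth = as.Date("1989-11-27"),
  today         = as.Date("2020-01-01"),
  gender        = "m",
  age           = 31L,
  stringsAsFactors = FALSE
)

# Columns we do not want to keep.
myvars_d <- c("sex", "year_of_birth", "today")

# Keep every column whose name is NOT in myvars_d.
# drop = FALSE keeps the result a data frame even if one column remains.
names <- names[, !(colnames(names) %in% myvars_d), drop = FALSE]

colnames(names)
```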


In the following example we will retain certain columns. I am selecting the columns sex, year_of_birth and today from the table “names” and storing their names in “myvars_k”.

Then I am re-creating “names” from “names” with only the columns listed in myvars_k.
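The retain step can be sketched in the same way. Again, the data frame is an invented stand-in for names.

```r
# Hypothetical stand-in for `names`.
names <- data.frame(
  id            = 1001L,
  sex           = "boy",
  year_of_birth = as.Date("1989-11-27"),
  today         = as.Date("2020-01-01"),
  gender        = "m",
  stringsAsFactors = FALSE
)

# Columns we want to keep.
myvars_k <- c("sex", "year_of_birth", "today")

# Re-create the data frame with only those columns, in that order.
names <- names[, myvars_k, drop = FALSE]

colnames(names)
```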


See you soon with Part 3….

Contributed by: Ubaid Darwaish