Topic outline
-
Welcome to the course on 'R Programming & Big Data Analytics'
In the new data era, data analytics allows businesses to provide more value-added products and services. New discoveries and trends are being identified in fields such as banking, medicine, manufacturing, and sales and marketing, so mastering the appropriate tools to derive knowledge from data has become key. Several tools and platforms have emerged to mine Big Data and provide meaningful insights. In this course, you will discover the power of R integrated in a Big Data environment. You will first be introduced to the basics of R and Big Data before moving on to Big Data analytics with R. Through the guided activities provided, even a novice user can get started with R and Big Data.
-
Instruction to learners
-
Communication
-
Forum
-
-
Overview
This unit covers the basics that will allow you to get the R environment ready for use. You will learn how to install and use R, and how to access the help feature. The topics covered in this unit include:
- Introduction to R
- Installation of RStudio
- Console and Script Editor
- Installation of R Packages
- R Calculator
- R help
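As a brief illustration of the calculator, package and help topics listed above, the sketch below shows the kind of commands that can be typed at the R console. The package name ggplot2 is only an example chosen here, not one the unit specifies.

```r
# Using the R console as a calculator
a <- 2 + 3 * 4      # multiplication binds tighter than addition
b <- (2 + 3) * 4    # parentheses change the order of evaluation
q <- 10 %/% 3       # integer division
r <- 10 %% 3        # remainder
s <- sqrt(16)       # built-in mathematical function
print(c(a, b, q, r, s))

# Installing and loading a package (left commented out because it needs
# network access; "ggplot2" is only an example package name)
# install.packages("ggplot2")
# library(ggplot2)

# Opening help pages only makes sense in an interactive session
if (interactive()) {
  help("mean")               # same as ?mean
  help.search("regression")  # search the installed documentation
}
```

Typing a line in the console runs it immediately, while the same lines saved in the script editor can be re-run later.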
Learning Outcomes
Upon completion of this unit, you should be able to:
- explain the R environment
- install and use RStudio
-
Learning Resources
-
Video Resources
-
-
Learning Activities
-
Assignment
-
References & Summary
-
Page
-
Overview
This unit highlights the fundamentals of the R programming language. It starts with R syntax, discusses variables, takes an in-depth look at the R data structures, identifies the common control structures and ends with an overview of functions.
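The sketch below touches each of these topics once, using base R only. All values and names (radius, scores, rescale and so on) are made up for illustration and are not taken from the course material.

```r
# Variables and operators
radius <- 2
area <- pi * radius^2                  # arithmetic operators
is_large <- area > 10 & radius < 5     # relational and logical operators

# Core data structures
scores <- c(72, 85, 90)                          # numeric vector
grid <- matrix(1:6, nrow = 2)                    # 2 x 3 matrix
student <- list(name = "Asha", scores = scores)  # list holding mixed types
level <- factor(c("low", "high", "low"))         # categorical data

# Decision: choose a label based on the mean score
grade <- if (mean(scores) >= 80) "pass with merit" else "pass"

# Loop: accumulate a running total
total <- 0
for (s in scores) {
  total <- total + s
}

# User-defined function: rescale a numeric vector to the range [0, 1]
rescale <- function(x) {
  (x - min(x)) / (max(x) - min(x))
}
scaled <- rescale(scores)

# Built-in functions
print(length(scores))
print(round(scaled, 2))
```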
Learning Outcomes
Upon completion of this unit, you should be able to:
- use the console window and the script editor
- give an overview of the arithmetic, relational and logical operators
- work with variables
- examine the different data structures that exist in R
- apply the two main control structures: decisions and loops
- work with built-in and user-defined functions
-
Learning Resources
-
Video Resources
-
-
Learning Activities
-
References & Summary
-
Page
-
Overview
This unit provides an overview of how data is analysed and visualised using R. It starts with a description of data frames and how to read and manipulate them. Descriptive statistics are then explained, followed by examples of the visualisation options available in RStudio.
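A minimal sketch of this workflow in base R follows. The sales data frame and its values are invented for illustration, and the file paths are temporary files rather than anything used in the course.

```r
# A small, made-up data frame (values are illustrative only)
sales <- data.frame(
  region = c("North", "South", "East", "West"),
  units  = c(120, 95, 140, 80),
  price  = c(2.5, 3.0, 2.0, 3.5),
  stringsAsFactors = FALSE
)

# Manipulate: add a column, filter rows, sort by revenue (descending)
sales$revenue <- sales$units * sales$price
high <- sales[sales$revenue > 280, ]
sales_sorted <- sales[order(-sales$revenue), ]

# Export to CSV and import it again
csv_path <- tempfile(fileext = ".csv")
write.csv(sales, csv_path, row.names = FALSE)
sales_in <- read.csv(csv_path, stringsAsFactors = FALSE)

# Descriptive statistics
print(summary(sales_in$revenue))
mean_rev <- mean(sales_in$revenue)
sd_rev <- sd(sales_in$revenue)

# Visualise: written to a temporary PDF so the sketch also runs as a script.
# In RStudio, drop pdf() and dev.off() to see the chart in the Plots pane.
pdf(tempfile(fileext = ".pdf"))
barplot(sales_in$revenue, names.arg = sales_in$region,
        main = "Revenue by region", ylab = "Revenue")
dev.off()
```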
Learning Outcomes
Upon completion of this unit, you should be able to:
- identify datasets and explain how they are organised
- manipulate data in a data frame
- import and export data in R
- use R functions for data visualisation
-
Learning Resources
-
Video Resources
-
-
Learning Activities
-
File
-
File
-
Assignment
-
Assignment
-
References & Summary
-
Page
-
This is the first assignment that counts towards the final module mark. The assignment instructions can be downloaded below.
-
File
-
Overview
In this unit, you will gain in-depth knowledge of the Big Data landscape. You will become conversant with the terminology and core concepts behind Big Data, such as its evolution, its characteristics and the challenges that have arisen in the Big Data era as data volumes grow. The unit also covers the application domains where Big Data can be applied, such as healthcare, banking and finance, retail, hospitality and transportation, government and security. Furthermore, the unit includes steps to set up the Cloudera Big Data platform so that you can get acquainted with the Big Data environment.
Learning Outcomes
Upon completion of this unit, you should be able to:
- explain Big Data concepts
- describe the characteristics of Big Data
- recognise the challenges of Big Data
- identify the application domains for Big Data
- set up a Big Data environment
-
Learning Resources
-
Video Resources
-
-
Learning Activities
-
References & Summary
-
Page
-
Overview
In this unit, you will get an overview of the Big Data ecosystem and its components. You will also learn what computer clusters are and what role they play in the Big Data ecosystem. The unit explains the Hadoop Distributed File System (HDFS), which follows a master/slave architecture, and elaborates on the tasks of the NameNode and the DataNode. It also covers the roles of the two components of MapReduce, namely the JobTracker and the TaskTracker. In addition, the unit explains how MapReduce programs are developed and covers the steps for creating a new MapReduce program and running it locally on a small subset of data.
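To make the MapReduce idea concrete, the sketch below simulates the map, shuffle and reduce phases of a word count in plain R on two short input lines. It is an assumption-laden illustration that runs in a single R session and does not use Hadoop, HDFS or any Hadoop API; the names map_fn and reduce_fn are hypothetical.

```r
# Input: a tiny, made-up subset of data, one record per line
input_lines <- c("big data needs big tools", "r handles data")

# Map phase: emit a (word, 1) pair for every word in a line
map_fn <- function(line) {
  words <- strsplit(line, " ", fixed = TRUE)[[1]]
  data.frame(key = words, value = 1L, stringsAsFactors = FALSE)
}
mapped <- do.call(rbind, lapply(input_lines, map_fn))

# Shuffle phase: group all emitted values by their key
grouped <- split(mapped$value, mapped$key)

# Reduce phase: combine the values of each key into one result
reduce_fn <- function(values) sum(values)
counts <- vapply(grouped, reduce_fn, integer(1))

print(counts)
```

On a real cluster the map and reduce functions run in parallel on different nodes, and the framework performs the shuffle between them.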
Learning Outcomes
Upon completion of this unit, you will be able to:
- describe the Hadoop Ecosystem
- explain the Hadoop core components
- describe the concepts of HDFS, MapReduce and master/slave architecture
- elaborate on the functions of the JobTracker and TaskTracker
- describe other related tools in the Hadoop Ecosystem
- run MapReduce programs
-
Learning Resources
-
Video Resources
-
-
Learning Activities
-
References & Summary
-
Page
-
Overview
Units 1-5 have covered the fundamentals of R, Big Data and the Big Data ecosystem. This unit gives an overview of Big Data analytics techniques and explains the phases of the data analytics life cycle. Some real-world examples where Big Data analytics can be applied are also described. The unit emphasises Machine Learning as a technique for analysing big datasets. It discusses supervised Machine Learning techniques, namely Linear Regression, Logistic Regression and Random Forest, and unsupervised Machine Learning techniques, namely the K-Means algorithm and Principal Components Analysis (PCA). Some of these algorithms are applied to a dataset using Spark.
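The difference between supervised and unsupervised techniques can be sketched on a single machine with base R's stats package and the built-in mtcars and iris datasets. This is only an illustration under those assumptions: it does not use Spark or SparkR, and the datasets are not the ones used in the unit.

```r
# Supervised: linear regression predicts a known target (mpg) from a feature (wt)
fit <- lm(mpg ~ wt, data = mtcars)
print(coef(fit))
pred <- predict(fit, newdata = data.frame(wt = 3))

# Unsupervised: k-means groups observations without using any labels
set.seed(42)                                  # make the clustering repeatable
features <- scale(iris[, 1:4])                # standardise the four measurements
clusters <- kmeans(features, centers = 3, nstart = 10)

# Compare the discovered clusters with the species labels (not used in fitting)
print(table(clusters$cluster, iris$Species))

# Unsupervised: PCA summarises the four measurements in fewer dimensions
pca <- prcomp(iris[, 1:4], scale. = TRUE)
print(summary(pca))
```

The regression needs the mpg column as a labelled outcome, whereas k-means and PCA work from the feature columns alone.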
Learning Outcomes
Upon completion of this unit, you will be able to:
- understand the techniques for Big Data Analytics
- discuss the phases of the data analytics project life cycle
- gain insight into Big Data analytics problems
- identify tools for Big Data analytics
- assess the importance of Machine Learning
- differentiate between supervised and unsupervised machine learning algorithms
- apply supervised and unsupervised algorithms using SparkR on a dataset
-
Learning Resources
-
Video Resources
-
-
Learning Activities
-
Assignment
-
Assignment
-
Assignment
-
Assignment
-
Assignment
-
References & Summary
-
Page
-
This is the second assignment that counts towards the final module mark. The assignment instructions can be downloaded below.