Lecture Note Week 3

Note

Learning Objective:

LO1: Understand basic R syntax

LO2: Understand basic Power BI point and click function

Section 1: Introduction to R Language

What is R?

R is a programming language and free software environment for statistical computing and graphics supported by the R Foundation for Statistical Computing. The R language is widely used among statisticians and data miners for developing statistical software and data analysis.

Why R?

  • R is open-source and free
  • R is a powerful tool for data analysis
  • R has a large and active community
  • R has a large number of packages for data analysis
  • R is a programming language and can be used for automating data analysis tasks
  • R is a great tool for reproducible research

RStudio

Rsudio is an integrated development environment (IDE) for R. It includes a console, syntax-highlighting editor that supports direct code execution, as well as tools for plotting, history, debugging and workspace management.

Basic of R

R as a Calculator

R Objects

In R, everything is an object. These objects serve as containers for various types of data. Whether you’re dealing with a single number, a character string (like a word), or a complex structure like the output of a plot or a statistical analysis summary, it’s all represented as an object.

Creating Objects:

To create an object, you simply give it a name. For instance

In this example, an object is created called x and it is assigned the value 5. The <- is the assignment operator. It assigns the value on the right to the object on the left. You can also use = to assign values to objects, but it’s considered bad practice.

Viewing Objects:

To view the value of an object, you simply type the name of the object and press enter. For instance:

R keeps track of all objects in the current workspace during the session. You can see all the objects in the current workspace by typing ls() in the console.

Opearations with Objects:

You can perform operations with objects. For instance:

Objects names:

Object names can contain letters, numbers, periods, and underscores. However, they can only starts with letters or underscore and nothing else. They are case-sensitive, so x and X are different objects. They cannot start with a number or a period. If you would like to insist to have numbers or period as the first character, you can use backticks to define the object name. It is called nonsyntactic names. For instance, you can define the following:

Exercises:

  1. Create an object called a and assign the value 10 to it.
  2. Create an object called b and assign the value 20 to it.
  3. Create an object called star and assign the value a + b to it.
  4. View the value of star.
  5. Which of the following is not the syntactic name for an object?
    • x
    • X
    • 1x
    • x1
    • x.y
    • x_y

Data Types

R has several data types. The most common data types are:

  1. Numeric (double)
  • represents eal numbers (e.g., 3.14, 0.0001, 1000.0).
  • can be positive or negative.
  • can be in scientific notation (e.g., 1.23e-5).
  • used for continuous data like measurements, weights, heights, etc.
  1. Character
  • represents text data (e.g., “hello”, “world”, “R is fun”).
  • must be enclosed in quotes.
  • used for categorical data (e.g., “High School”, “Primary School”, “University”).
  1. Logical (boolean)
  • represents binary data (e.g., TRUE or FALSE).
  • used for logical operations.
  1. Integer
  • represents whole numbers (e.g., 1, 2, 3, 1000).
  • can be positive or negative.
  • used for counting data like number of students, number of cars, etc.
  • sometimes you will see it ends with L (e.g., 1L, 2L, 3L, 1000L). This is to indicate that the number is an integer.
  1. Factors
  • represents categorical data (e.g., “High School”, “Primary School”, “University”).
  • used for categorical data.
  1. Complex
  • represents complex numbers (e.g., 1 + 2i, 3 + 4i, 5 + 6i).
  • used for complex data like electrical engineering, physics, etc.

However, we seldom deal with complex data types. We will focus on the first four data types.

Exercises:

  1. What is the data type of 3.14?
  2. What is the data type of "hello"?
  3. What is the data type of TRUE?
  4. What is the data type of 1L?
  5. What is the data type of factor("High School")?
  6. What is the data type of 1 + 2i?
  7. What is the data type of "1"?

Note: You can check your answers using typeof() function.

Functions

A function is a block of code that performs a specific task. R has a large number of in-built functions and also allows users to define their own functions. We will learn more about how to create functions in the coming weeks. But so far, we will use some of the in-built functions. Anything that starts with ( and end with ) is a function.

Exercises:

Which of the following is a function?

  1. mean()
  2. median
  3. sd()
  4. var()
  5. sum[]

Vectors

Vectors are the most basic data structure in R. They are one-dimensional arrays that can hold numeric, character, or logical data. You can create a vector using the c() function. For instance:

Accessing Elements:

You can access the elements of a vector using the index. For instance:

Exercises:

  1. Create a vector called a with the values 11, 2, 33, 4, 5.
  2. Create a vector called b with the values "a", "bc", "c", "d", "ef".

Data Frames

Data frames are the most common data structure in R. They are used to store tabular data. A data frame is a list of vectors of equal length. Each vector represents a column in the data frame, and each element in the vector represents a row in the data frame.

Creating Data Frames:

You can create a data frame using the data.frame() function. For instance:

Accessing Data Frames:

You can access the elements of a data frame using the row and column indices. For instance:

Exercises:

  1. Create a data frame called students with the following columns: name, age, gender, height, and weight. The data should be as follows:
  • name: “Alice”, “Bob”, “Charlie”, “David”
  • age: 25, 30, 35, 40
  • gender: “F”, “M”, “M”, “M”

Lists

Lists are another data structure in R that can hold elements of different data types. You can create a list using the list() function. For instance:

You can access the elements of a list using the index. For instance:

Exercises:

  1. Create a list called y with the following elements: 1, "b", FALSE.

Packages

R packages are collections of functions and data sets developed by the community. They increase the power of R by improving existing base R functionalities, or by adding new ones.

You can install packages using the install.packages() function, and load them using the library() function. For instance:

You do not need to install packages every time. You just need to install it once, and you can use them directly by loading them using the library() function.

Section 2: Introduction to Power BI

Welcome to Power BI!

Power BI is a powerful business analytics tool that enables you to visualize data, share insights, and make data-driven decisions. This introductory document will guide you through the basics of Power BI and help you get started with creating your first report.

1. What is Power BI?

Power BI is a suite of business analytics tools by Microsoft that includes software, services, and connectors. It transforms your unrelated sources of data into coherent, visually immersive, and interactive insights. Your data may be an Excel spreadsheet, or a collection of cloud-based and on-premises hybrid data warehouses.

Getting Started

1. Download and Install Power BI Desktop

  • Go to the Power BI website and download the Power BI Desktop.

  • Follow the installation instructions to install Power BI Desktop on your computer.

2. Open Power BI Desktop

Once installed, open Power BI Desktop. You will see the following interface:

3. Connect to Data

To start working with data, you need to connect Power BI to your data source. Here’s how:

  1. Click on the “Home” tab.
  2. Click on the “Get Data” button.

  1. Select your data source from the list (e.g., Excel, SQL Server, Web, etc.).

  1. Follow the prompts to load your data into Power BI.

4. Transform Data with Power Query Editor

After loading your data, you might need to clean and transform it. Use Power Query Editor for this purpose:

  • Click on the “Home” tab.

  • Click on the “Transform Data” button.

The Power Query Editor will open, where you can clean and prepare your data.

5. Create Visualizations

Once your data is ready, you can start creating visualizations:

  • Go back to the main Power BI Desktop window.

  • In the “Visualizations” pane, select the type of visualization you want to create (e.g., bar chart, line chart, pie chart, etc.).

  • Drag and drop fields from the “Fields” pane to the “Values”, “Axis”, “Legend”, etc., in the “Visualizations” pane.

6. Build Reports

Combine multiple visualizations into a report:

  • Use the “Report” view in Power BI Desktop.

  • Arrange your visualizations on the canvas.

  • Add titles, labels, and other elements to enhance your report.

7. Save and Publish Your Report 👈 (This feature will not be taught for this course as it needs a paid version)

  • Click on the “File” tab.

  • Select “Save As” to save your report locally.

  • To share your report, click on the “Publish” button.

Follow the prompts to publish your report to the Power BI service, where you can share it with others.

Conclusion

Congratulations! You have successfully created your first Power BI report. Power BI is a versatile tool with many advanced features, so continue exploring and experimenting to unlock its full potential. Happy data analyzing!

Section 3: Data Wrangling in Power BI

1. Introduction to Power BI

Overview of Power BI: Power BI is a suite of business analytics tools that deliver insights throughout your organization. You can connect to hundreds of data sources, simplify data prep, and drive ad hoc analysis. Produce beautiful reports, then publish them for your organization to consume on the web and across mobile devices.

Understanding the Power BI Interface:

  • Home Ribbon (1): Access common actions like getting data, viewing data transformations, and publishing reports.

  • Report View (2): Design your report using various visualizations.

  • Data View (2) : See the data tables and make transformations.

  • Model View (2): Manage the relationships between different tables in your dataset.

  • Fields Pane (3): Contains the tables and fields in your data model.

  • Visualizations Pane (3): Choose from a variety of visualizations to represent your data.

Connecting to Various Data Sources: To get started, open Power BI and load your dataset:

  1. Go to the Home tab and click Get Data.

  2. Choose the type of data source (e.g., Excel, SQL Server, etc.).

  1. Navigate to the location of your dataset and load it into Power BI.

2. Data Import and Transformation

Importing Data: Once the dataset is loaded, you can see the data in the Data View. If needed, you can refresh the data to get the latest updates.

Using Power Query Editor:

  1. Click on Transform Data to open the Power Query Editor.

  2. In the Power Query Editor, you can see a list of queries (tables) on the left, the data preview in the middle, and applied steps on the right.

Video Resources for Power BI (Please refer to lecture video)

Enter data manually

Step 1: Open Power BI Desktop 1. Start Power BI Desktop on your computer.

Step 2: Import Data from a CSV File

1. Click on the Home tab in the top menu.

2. Click on Get Data in the ribbon.

3. From the dropdown menu, select Text/CSV.

4. Locate and select your CSV file from your computer, then click Open.

5. In the preview window, review your data to ensure it looks correct.

6. Click Load to import the data into Power BI.

Step 3: Manually Enter Data

1. Click on the Home tab in the top menu.

2. Click on Enter Data in the ribbon.

3. In the new window that opens, you can manually enter your data. Here is an example: - Column1: Product - Column2: Sales - Row1: “Product A”, 100 - Row2: “Product B”, 200

4. After entering your data, click Load to add the table to your model.

Importing data

  1. Click on the Home tab in the top menu.
  2. Click on Get Data in the ribbon.
  3. Select the data source you want to import (e.g., Excel, SQL Server, Web, etc.).
  4. Follow the prompts to connect to your data source and load the data into Power BI.
  5. Review the data preview to ensure it looks correct.
  6. Click Load to import the data into Power BI.

Removing rows and columns

  1. Under the Home tab, click on Transform Data to open the Power Query Editor.

  2. In the Power Query Editor, select the column or row you want to remove.

  3. Right-click on the selected column or row and choose Remove.

  4. Click Close & Apply to save your changes and return to Power BI.

Changing data types

  1. In the Power Query Editor, select the column you want to change the data type for.

  2. Right-click on the column header and choose Change Type.

Data transformation

  1. In the Power Query Editor, select the column you want to transform.

  2. Click on the Transform tab in the top menu.

  3. Choose from various transformation options, such as splitting columns, merging columns, or replacing values.

  4. Go back to Home tab and click Close & Apply to save your changes and return to Power BI.

Mathematical operations

  1. In the Power Query Editor, select the column you want to perform mathematical operations on.

  2. Click on the Add Column tab in the top menu.

  3. Choose from various mathematical operations, such as addition, subtraction, multiplication, or division.

  4. Go to Standard and select the operation you want to perform.

  5. Now highlight two different columns and you should be able to see Statistics tab is not greyed out anymore. Click on Statistics and select the operation you want to perform.

  6. Go to Statistics and select the operation you want to perform.

  7. Go back to Home tab and click Close & Apply to save your changes and return to Power BI.

Refreshing data source

  1. In Power BI, go to the Home tab.

  2. Click on Refresh to update the data from the original data source.

Filtering values

  1. In the Power Query Editor, select the column you want to filter.

  2. Click on the Transform tab in the top menu.

  3. Click on the variable side drop-down list and choose Text Filters or Number Filters to filter the values based on your criteria.

Replacing values

  1. In the Power Query Editor, select the column you want to replace values in.

  2. Click on the Transform tab in the top menu.

  3. Click on Replace Values and enter the old value and new value you want to replace it with.

Removing duplicates/missing values

  1. In the Power Query Editor, select the column you want to remove duplicates or missing values from.

  2. Click on the Home tab in the top menu.

  3. Click on Remove Rows and choose Remove Duplicates or Remove Blank Rows.

Group By

  1. In the Power Query Editor, select the column you want to group by.

  2. Click on the Home tab in the top menu.

  3. Click on Group By and choose the column you want to group by and the operation you want to perform (e.g., sum, average, count).

Pivot

  1. In the Power Query Editor, select the columns you want to pivot.

  2. Click on the Transform tab in the top menu.

  3. Click on Pivot Column and choose the values you want to pivot.

  4. Click Close & Apply to save your changes and return to Power BI.