class: center, middle, inverse, title-slide

# Statistics with R:
## Data Wrangling | Data Visualisation | Basic Modelling
### Arif P. Sulistiono
### Kelas Data on 28 Oct. 2021, 4 Nov. 2021, and 11 Nov. 2021
An employee of the Republic of Indonesia's Ministry of Finance, currently on study leave to pursue a PhD in the School of Economics, University of Nottingham, funded by the Indonesia Endowment Fund for Education ("Lembaga Pengelola Dana Pendidikan"). Research interests: Indonesia's government bond market and the behaviour of its bondholders.

Also a research assistant at <a href=""></a>.
<br>
<br>
<a href=""><i class="fa fa-envelope fa-fw"></i></a>
<a href=""> <i class="fa fa-facebook fa-fw"></i></a>
<a href=""><i class="fa fa-github fa-fw"></i></a>
<a href=""> <i class="fa fa-instagram fa-fw"></i></a>
<a href=""> <i class="fa fa-medium fa-fw"></i></a>
<a href=""> <i class="fa fa-twitter fa-fw"></i></a>
<a href=""> <i class="fa fa-wordpress fa-fw"></i></a>
]

---

# <span style="color:orange">**Fun**</span>ctions

.panelset[
.panel[.panel-name[1: Data Wrangling]

.pull-left[
## .center[<svg viewBox="0 0 640 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns=""> <path d="M575.2 325.7c.2-1.9.8-3.7.8-5.6 0-35.3-28.7-64-64-64-12.6 0-24.2 3.8-34.1 10-17.6-38.8-56.5-66-101.9-66-61.8 0-112 50.1-112 112 0 3 .7 5.8.9 8.7-49.6 3.7-88.9 44.7-88.9 95.3 0 53 43 96 96 96h272c53 0 96-43 96-96 0-42.1-27.2-77.4-64.8-90.4zm-430.4-22.6c-43.7-43.7-43.7-114.7 0-158.3 43.7-43.7 114.7-43.7 158.4 0 9.7 9.7 16.9 20.9 22.3 32.7 9.8-3.7 20.1-6 30.7-7.5L386 81.1c4-11.9-7.3-23.1-19.2-19.2L279 91.2 237.5 8.4C232-2.8 216-2.8 210.4 8.4L169 91.2 81.1 61.9C69.3 58 58 69.3 61.9 81.1l29.3 87.8-82.8 41.5c-11.2 5.6-11.2 21.5 0 27.1l82.8 41.4-29.3 87.8c-4 11.9 7.3 23.1 19.2 19.2l76.1-25.3c6.1-12.4 14-23.7 23.6-33.5-13.1-5.4-25.4-13.4-36-24zm-4.8-79.2c0 40.8 29.3 74.8 67.9 82.3 8-4.7 16.3-8.8 25.2-11.7 5.4-44.3 31-82.5 67.4-105C287.3 160.4 258 140 224 140c-46.3 0-84 37.6-84 83.9z"></path></svg>]

* Rows: `filter()`, `arrange()`, `recode()`, `slice()`, `slice_min()`, `slice_max()`, `slice_head()`, `slice_tail()`
* Columns: `select()`, `relocate()`, `rename()`
* Both: `count()`, `mutate()`, `transmute()`
]

.pull-right[
## .center[<svg viewBox="0 0 576 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns=""> <path d="M342.8 352.7c5.7-9.6 9.2-20.7 9.2-32.7 0-35.3-28.7-64-64-64-17.2 0-32.8 6.9-44.3 17.9-16.3-29.6-47.5-49.9-83.7-49.9-53 0-96 43-96 96 0 2 .5 3.8.6 
5.7C27.1 338.8 0 374.1 0 416c0 53 43 96 96 96h240c44.2 0 80-35.8 80-80 0-41.9-32.3-75.8-73.2-79.3zm222.5-54.3c-93.1 17.7-178.5-53.7-178.5-147.7 0-54.2 29-104 76.1-130.8 7.3-4.1 5.4-15.1-2.8-16.7C448.4 1.1 436.7 0 425 0 319.1 0 233.1 85.9 233.1 192c0 8.5.7 16.8 1.8 25 5.9 4.3 11.6 8.9 16.7 14.2 11.4-4.7 23.7-7.2 36.4-7.2 52.9 0 96 43.1 96 96 0 3.6-.2 7.2-.6 10.7 23.6 10.8 42.4 29.5 53.5 52.6 54.4-3.4 103.7-29.3 137.1-70.4 5.3-6.5-.5-16.1-8.7-14.5z"></path></svg>]

* Analysing: `group_by()`, `summarise()`/`summarize()`, `rowwise()`
* Merging: `left_join()`, `inner_join()`, `right_join()`, `full_join()`, `semi_join()`, `anti_join()`
* Manipulating: `ifelse()`
* Dealing with `NA`: `fill()`, `replace()`, `zoo::na.approx()`, `drop_na()`
* Combining: `rbind()`, `bind_rows()`, `cbind()`, `bind_cols()`
* Reshaping: `pivot_wider()`, `pivot_longer()`
]
]

.panel[.panel-name[2: Data Visualisation]

.pull-left[
## .center[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns=""> <path d="M416 128c-.6 0-1.1.2-1.6.2 1.1-5.2 1.6-10.6 1.6-16.2 0-44.2-35.8-80-80-80-24.6 0-46.3 11.3-61 28.8C256.4 24.8 219.3 0 176 0 114.1 0 64 50.1 64 112c0 7.3.8 14.3 2.1 21.2C27.8 145.8 0 181.5 0 224c0 53 43 96 96 96h320c53 0 96-43 96-96s-43-96-96-96zM88 374.2c-12.8 44.4-40 56.4-40 87.7 0 27.7 21.5 50.1 48 50.1s48-22.4 48-50.1c0-31.4-27.2-43.1-40-87.7-2.2-8.1-13.5-8.5-16 0zm160 0c-12.8 44.4-40 56.4-40 87.7 0 27.7 21.5 50.1 48 50.1s48-22.4 48-50.1c0-31.4-27.2-43.1-40-87.7-2.2-8.1-13.5-8.5-16 0zm160 0c-12.8 44.4-40 56.4-40 87.7 0 27.7 21.5 50.1 48 50.1s48-22.4 48-50.1c0-31.4-27.2-43.1-40-87.7-2.2-8.1-13.5-8.5-16 0z"></path></svg>]

* Scatter plot: `geom_point()`
* Line chart: `geom_line()`
* Bar plot: `geom_bar()`, `geom_col()`
* Box plot: `geom_boxplot()`
* Histogram: `geom_histogram()`
]
]

.panel[.panel-name[3: Basic Modelling]

.pull-left[
## .center[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns=""> <path 
d="M48 352c-26.5 0-48 21.5-48 48s21.5 48 48 48 48-21.5 48-48-21.5-48-48-48zm416 0c-26.5 0-48 21.5-48 48s21.5 48 48 48 48-21.5 48-48-21.5-48-48-48zm-119 11.1c4.6-14.5 1.6-30.8-9.8-42.3-11.5-11.5-27.8-14.4-42.3-9.9-7-13.5-20.7-23-36.9-23s-29.9 9.5-36.9 23c-14.5-4.6-30.8-1.6-42.3 9.9-11.5 11.5-14.4 27.8-9.9 42.3-13.5 7-23 20.7-23 36.9s9.5 29.9 23 36.9c-4.6 14.5-1.6 30.8 9.9 42.3 8.2 8.2 18.9 12.3 29.7 12.3 4.3 0 8.5-1.1 12.6-2.5 7 13.5 20.7 23 36.9 23s29.9-9.5 36.9-23c4.1 1.3 8.3 2.5 12.6 2.5 10.8 0 21.5-4.1 29.7-12.3 11.5-11.5 14.4-27.8 9.8-42.3 13.5-7 23-20.7 23-36.9s-9.5-29.9-23-36.9zM512 224c0-53-43-96-96-96-.6 0-1.1.2-1.6.2 1.1-5.2 1.6-10.6 1.6-16.2 0-44.2-35.8-80-80-80-24.6 0-46.3 11.3-61 28.8C256.4 24.8 219.3 0 176 0 114.1 0 64 50.1 64 112c0 7.3.8 14.3 2.1 21.2C27.8 145.8 0 181.5 0 224c0 53 43 96 96 96h43.4c3.6-8 8.4-15.4 14.8-21.8 13.5-13.5 31.5-21.1 50.8-21.3 13.5-13.2 31.7-20.9 51-20.9s37.5 7.7 51 20.9c19.3.2 37.3 7.8 50.8 21.3 6.4 6.4 11.3 13.8 14.8 21.8H416c53 0 96-43 96-96z"></path></svg>]

* Loading the dataset: `read_csv()` or `read_excel()`
* Correlation: `Hmisc::rcorr(as.matrix())`
* Summary: `stargazer()`
* Linear regression: `lm()`, `lm_robust()`, `dynlm()`
* Reporting: `broom::tidy()`
* Saving: `write_csv()`
]
]
]

---
class: inverse, center, middle

# %>%

.large[Pipe operator: ...then]

---

# Prerequisites

### Clean the environment

```r
# Remove every object from the global environment
rm(list = ls())
# Confirm that the environment is now empty
ls()
```

--

### Install the package & load the library

```r
# Install once per machine, then load in every session
install.packages("tidyverse")
library(tidyverse)
```

--

### Set the working directory

```r
# Show the current working directory
getwd()
# Point R at the project folder (adjust the path to your own machine)
setwd("/Users/arifpras/OneDrive - The University of Nottingham/BB_KelasData")
# List the files in the working directory
dir()
```

---
exclude: false

# Data

## Ages

```r
library(readxl)

# Read the "OP_ages" sheet of the workbook
op_ages <- read_excel(
  path = "/Users/arifpras/OneDrive - The University of Nottingham/BB_KelasData/KelasData/00_Datasets/OP_all.xlsx",
  sheet = "OP_ages"
)

# Show the data as an interactive table, three rows per page
DT::datatable(op_ages, fillContainer = FALSE, options = list(pageLength = 3))
```
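---

# Chaining verbs with `%>%`

A minimal sketch of how the row and column verbs listed earlier chain together with the pipe. The columns of `op_ages` are not shown in these slides, so `toy_ages` and its columns `name` and `age` are hypothetical stand-ins.

```r
library(dplyr)

# Hypothetical stand-in for op_ages; the real column names are an assumption
toy_ages <- data.frame(
  name = c("A", "B", "C", "D"),
  age  = c(19, 21, 20, 19),
  stringsAsFactors = FALSE
)

oldest_first <- toy_ages %>%
  filter(age >= 20) %>%           # keep rows with age of at least 20 ...
  mutate(age_in_5 = age + 5) %>%  # ... then add a derived column ...
  arrange(desc(age))              # ... then sort from oldest to youngest

oldest_first
```

Read each `%>%` as "then": filter, then mutate, then arrange.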
---
exclude: false

# Data

## Powers

```r
library(readxl)

# Read the "OP_powers" sheet of the same workbook
op_powers <- read_excel(
  path = "/Users/arifpras/OneDrive - The University of Nottingham/BB_KelasData/KelasData/00_Datasets/OP_all.xlsx",
  sheet = "OP_powers"
)

# Show the data as an interactive table, three rows per page
DT::datatable(op_powers, fillContainer = FALSE, options = list(pageLength = 3))
```
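---

# Merging two tables

A hedged sketch of the merging verbs from the function list, assuming the two sheets share a key column. The key `name` and the columns `age` and `power` are hypothetical; the real `op_ages` and `op_powers` may be keyed differently.

```r
library(dplyr)

# Hypothetical stand-ins for op_ages and op_powers
toy_ages <- data.frame(
  name = c("A", "B", "C"),
  age  = c(19, 21, 20),
  stringsAsFactors = FALSE
)
toy_powers <- data.frame(
  name  = c("A", "B", "D"),
  power = c(300, 120, 80),
  stringsAsFactors = FALSE
)

# left_join() keeps every row of toy_ages; "C" has no match, so power is NA
joined <- left_join(toy_ages, toy_powers, by = "name")

# inner_join() keeps only the names present in both tables
matched <- inner_join(toy_ages, toy_powers, by = "name")

# Summarise the matched rows
mean_power <- matched %>% summarise(mean_power = mean(power))
```

Comparing `nrow(joined)` with `nrow(matched)` is a quick way to see how many keys failed to match.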
---
class: inverse, center, middle

# Data Visualisation

---

# Datasaurus

.center[
![:scale 40%]( Never trust summary statistics alone.
]

.footnote[.small[Source: `library(datasauRus)`;]]

---

# ggplot2

.pull-left[
### .left[Basic elements:
* Data: `data = ...`
* Geometries: `geom_`
* Aesthetics: `aes(x = ..., y = ..., ...)`
* Scales: `scale_`
* Statistical transformations: `stat_`
* Coordinate system: `coord_`
* Facets: `facet_`
* Visual themes: `theme()`
]
]

---

# Decision trees

<iframe src="" width="100%" height="400px" data-external="1"></iframe>

.left[.footnote[.small[Source:]]]

---

# Assigning colors

.panelset[
.panel[
.panel-name[1: Color palette]
<center><img src="" width="1000px" /></center>
<br><br><br><br><br><br><br><br>
.footnote[.small[Source: "Hands-On Data Visualization" by Jack Dougherty & Ilya Ilyankou]]
]

.panel[
.panel-name[2: Qualitative variables]
<center><img src="" height="400px" /></center>
<br><br>
.footnote[.small[Source:]]
]

.panel[
.panel-name[3: Quantitative variables]
<center><img src="" height="400px" /></center>
<br><br>
.footnote[.small[Source:]]
]
]

---

# Line plot

.center[
<img src="KelasData_files/figure-html/unnamed-chunk-7-1.png" width="70%" style="display: block; margin: auto;" />
]

---

# Animating bar plot

.pull-left[
<br>
<blockquote class="twitter-tweet" data-width="300" data-theme="light" data-cards="hidden" data-dnt="true" align="center"><p lang="in" dir="ltr">A data visualisation of debt and income per capita, made with the R programming language. The code is available at the link <a href=""></a>.<br><br>As a ratio, it is not far from the government debt-to-GDP ratio reported by <a href="">@DJPPRkemenkeu</a>. <a href=""></a></p>— Arif P. 
Sulistiono (@arifpras) <a href="">May 1, 2020</a></blockquote>
<script async src="" charset="utf-8"></script>
]

.pull-right[
<img src="*tKoVzc3cI9ihIuovTFf2jA.gif" style="width: 80%" align="left">
]

---

# Animating line plot

.pull-left[
<br>
<blockquote class="twitter-tweet" data-width="300" data-theme="light" data-cards="hidden" data-dnt="true" align="center"><p lang="in" dir="ltr">A data visualisation of the <a href="">#rupiah</a> exchange rate against the US dollar during the 2008 crisis (<a href="">#krisis</a>) and this year, made with the <a href="">#R</a> programming language. The code is available at the link <a href=""></a>. <a href=""></a></p>— Arif P. Sulistiono (@arifpras) <a href="">May 4, 2020</a></blockquote>
<script async src="" charset="utf-8"></script>
]

.pull-right[
<img src="*W58miPPhRcKO4dYf6eBqvw.gif" style="width: 80%" align="left">
]

---

# Practical sessions

<iframe src="" width="100%" height="400px" data-external="1"></iframe>

.left[.footnote[.small[Source:]]]

---
class: inverse, center, middle

# Basic Modelling

---

# Initial specifications

.center[
![:scale 50%](

$$
\text{Sales}_t = \alpha_0 + \alpha_1 \text{Chapters}_t + \alpha_2 \text{LastMovie}_t + \alpha_3 \text{VIX index}_t + \upsilon_t
$$

$$
\text{Sales}_t = \beta_0 + \beta_1 \text{Pages}_t + \beta_2 \text{LastMovie}_t + \beta_3 \text{VIX index}_t + \epsilon_t
$$
]

---

# Dataset

```r
library(readxl)

# Read the "OP_sales" sheet of the workbook
op_sales <- read_excel(
  path = "/Users/arifpras/OneDrive - The University of Nottingham/BB_KelasData/KelasData/00_Datasets/OP_all.xlsx",
  sheet = "OP_sales"
)

# Show the data as an interactive table, three rows per page
DT::datatable(op_sales, fillContainer = FALSE, options = list(pageLength = 3))
```
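---

# Fitting the two specifications

A sketch of how the two initial specifications could be estimated with `lm()`. The column names of `op_sales` are not shown in these slides, so the example uses simulated data: `toy_sales`, its columns, the coefficients used to generate `sales`, and the treatment of `last_movie` as a 0/1 dummy are all assumptions for illustration only.

```r
set.seed(2021)
n <- 60

# Simulated stand-in for op_sales; every column here is hypothetical
toy_sales <- data.frame(
  chapters   = rpois(n, 10),        # chapters released in period t
  pages      = rpois(n, 190),       # pages released in period t
  last_movie = rbinom(n, 1, 0.2),   # assumed 0/1 dummy
  vix        = runif(n, 10, 40)     # VIX index level
)
toy_sales$sales <- 100 + 5 * toy_sales$chapters +
  20 * toy_sales$last_movie - 0.5 * toy_sales$vix + rnorm(n, sd = 3)

# Specification 1: sales explained by chapters
fit_chapters <- lm(sales ~ chapters + last_movie + vix, data = toy_sales)

# Specification 2: sales explained by pages
fit_pages <- lm(sales ~ pages + last_movie + vix, data = toy_sales)

# Coefficient table (estimate, standard error, t value, p value)
coef(summary(fit_chapters))
```

With the real data, `broom::tidy()` from the function list returns the same coefficient table as a tibble, which is easier to save with `write_csv()`.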
