class: center, middle, inverse, title-slide

# Statistics with R:
## Data Wrangling | Data Visualisation | Basic Modelling
### Arif P. Sulistiono
### Kelas Data on 28 Oct. 2021, 4 Nov. 2021, and 11 Nov. 2021
mof.dac
---
# About me

.center[
![:scale 50%](https://pa1.narvii.com/5869/51088074557b4091944c2007ec33384020920f87_hq.gif)

An employee of the Republic of Indonesia’s Ministry of Finance, currently on study leave to pursue a PhD in the <a href="https://www.nottingham.ac.uk/economics/people/arif.sulistiono">School of Economics, University of Nottingham</a>, funded by the Indonesia Endowment Fund for Education (Lembaga Pengelola Dana Pendidikan). Research interests: Indonesia's government bond market and the behaviour of its bondholders. Also a research assistant at <a href="https://www.tracktheeconomy.ac.uk/arif-sulistiono">tracktheeconomy.ac.uk</a>.
<br>
<br>
<a href="mailto:ap.sulistiono@gmail.com"><i class="fa fa-envelope fa-fw"></i></a> <a href="https://www.facebook.com/arifpras"> <i class="fa fa-facebook fa-fw"></i></a> <a href="https://github.com/arifpras"><i class="fa fa-github fa-fw"></i></a> <a href="https://www.instagram.com/arifpras"> <i class="fa fa-instagram fa-fw"></i></a> <a href="https://arifpras.medium.com/"> <i class="fa fa-medium fa-fw"></i></a> <a href="https://twitter.com/arifpras"> <i class="fa fa-twitter fa-fw"></i></a> <a href="https://arifpras.com"> <i class="fa fa-wordpress fa-fw"></i></a>
]

---
# <span style="color:orange">**Fun**</span>ctions

.panelset[
.panel[.panel-name[1: Data Wrangling]
.pull-left[
## .center[<svg viewBox="0 0 640 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M575.2 325.7c.2-1.9.8-3.7.8-5.6 0-35.3-28.7-64-64-64-12.6 0-24.2 3.8-34.1 10-17.6-38.8-56.5-66-101.9-66-61.8 0-112 50.1-112 112 0 3 .7 5.8.9 8.7-49.6 3.7-88.9 44.7-88.9 95.3 0 53 43 96 96 96h272c53 0 96-43 96-96 0-42.1-27.2-77.4-64.8-90.4zm-430.4-22.6c-43.7-43.7-43.7-114.7 0-158.3 43.7-43.7 114.7-43.7 158.4 0 9.7 9.7 16.9 20.9 22.3 32.7 9.8-3.7 20.1-6 30.7-7.5L386 81.1c4-11.9-7.3-23.1-19.2-19.2L279 91.2 237.5 8.4C232-2.8 216-2.8 210.4 8.4L169 91.2 81.1 61.9C69.3 58 58 69.3 61.9
81.1l29.3 87.8-82.8 41.5c-11.2 5.6-11.2 21.5 0 27.1l82.8 41.4-29.3 87.8c-4 11.9 7.3 23.1 19.2 19.2l76.1-25.3c6.1-12.4 14-23.7 23.6-33.5-13.1-5.4-25.4-13.4-36-24zm-4.8-79.2c0 40.8 29.3 74.8 67.9 82.3 8-4.7 16.3-8.8 25.2-11.7 5.4-44.3 31-82.5 67.4-105C287.3 160.4 258 140 224 140c-46.3 0-84 37.6-84 83.9z"></path></svg>]

* Rows: `filter()`, `arrange()`, `recode()`, `slice()`, `slice_min()`, `slice_max()`, `slice_head()`, `slice_tail()`
* Columns: `select()`, `relocate()`, `rename()`
* Both: `count()`, `mutate()`, `transmute()`
]
.pull-right[
## .center[<svg viewBox="0 0 576 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M342.8 352.7c5.7-9.6 9.2-20.7 9.2-32.7 0-35.3-28.7-64-64-64-17.2 0-32.8 6.9-44.3 17.9-16.3-29.6-47.5-49.9-83.7-49.9-53 0-96 43-96 96 0 2 .5 3.8.6 5.7C27.1 338.8 0 374.1 0 416c0 53 43 96 96 96h240c44.2 0 80-35.8 80-80 0-41.9-32.3-75.8-73.2-79.3zm222.5-54.3c-93.1 17.7-178.5-53.7-178.5-147.7 0-54.2 29-104 76.1-130.8 7.3-4.1 5.4-15.1-2.8-16.7C448.4 1.1 436.7 0 425 0 319.1 0 233.1 85.9 233.1 192c0 8.5.7 16.8 1.8 25 5.9 4.3 11.6 8.9 16.7 14.2 11.4-4.7 23.7-7.2 36.4-7.2 52.9 0 96 43.1 96 96 0 3.6-.2 7.2-.6 10.7 23.6 10.8 42.4 29.5 53.5 52.6 54.4-3.4 103.7-29.3 137.1-70.4 5.3-6.5-.5-16.1-8.7-14.5z"></path></svg>]

* Analysing: `group_by()`, `summarise()`/`summarize()`, `rowwise()`
* Merging: `left_join()`, `inner_join()`, `right_join()`, `full_join()`, `semi_join()`, `anti_join()`
* Manipulating: `ifelse()`
* Dealing with `NA`: `fill()`, `replace()`, `zoo::na.approx()`, `drop_na()`
* Combining: `rbind()`, `bind_rows()`, `cbind()`, `bind_cols()`
* Reshaping: `pivot_wider()`, `pivot_longer()`
]
]
.panel[.panel-name[2: Data Visualisation]
.pull-left[
## .center[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M416 128c-.6 0-1.1.2-1.6.2 1.1-5.2 1.6-10.6 1.6-16.2 0-44.2-35.8-80-80-80-24.6 0-46.3 11.3-61
28.8C256.4 24.8 219.3 0 176 0 114.1 0 64 50.1 64 112c0 7.3.8 14.3 2.1 21.2C27.8 145.8 0 181.5 0 224c0 53 43 96 96 96h320c53 0 96-43 96-96s-43-96-96-96zM88 374.2c-12.8 44.4-40 56.4-40 87.7 0 27.7 21.5 50.1 48 50.1s48-22.4 48-50.1c0-31.4-27.2-43.1-40-87.7-2.2-8.1-13.5-8.5-16 0zm160 0c-12.8 44.4-40 56.4-40 87.7 0 27.7 21.5 50.1 48 50.1s48-22.4 48-50.1c0-31.4-27.2-43.1-40-87.7-2.2-8.1-13.5-8.5-16 0zm160 0c-12.8 44.4-40 56.4-40 87.7 0 27.7 21.5 50.1 48 50.1s48-22.4 48-50.1c0-31.4-27.2-43.1-40-87.7-2.2-8.1-13.5-8.5-16 0z"></path></svg>]

* Scatter plot: `geom_point()`
* Line chart: `geom_line()`
* Bar plot: `geom_bar()`, `geom_col()`
* Box plot: `geom_boxplot()`
* Histogram: `geom_histogram()`
]
]
.panel[.panel-name[3: Basic Modelling]
.pull-left[
## .center[<svg viewBox="0 0 512 512" style="height:1em;position:relative;display:inline-block;top:.1em;" xmlns="http://www.w3.org/2000/svg"> <path d="M48 352c-26.5 0-48 21.5-48 48s21.5 48 48 48 48-21.5 48-48-21.5-48-48-48zm416 0c-26.5 0-48 21.5-48 48s21.5 48 48 48 48-21.5 48-48-21.5-48-48-48zm-119 11.1c4.6-14.5 1.6-30.8-9.8-42.3-11.5-11.5-27.8-14.4-42.3-9.9-7-13.5-20.7-23-36.9-23s-29.9 9.5-36.9 23c-14.5-4.6-30.8-1.6-42.3 9.9-11.5 11.5-14.4 27.8-9.9 42.3-13.5 7-23 20.7-23 36.9s9.5 29.9 23 36.9c-4.6 14.5-1.6 30.8 9.9 42.3 8.2 8.2 18.9 12.3 29.7 12.3 4.3 0 8.5-1.1 12.6-2.5 7 13.5 20.7 23 36.9 23s29.9-9.5 36.9-23c4.1 1.3 8.3 2.5 12.6 2.5 10.8 0 21.5-4.1 29.7-12.3 11.5-11.5 14.4-27.8 9.8-42.3 13.5-7 23-20.7 23-36.9s-9.5-29.9-23-36.9zM512 224c0-53-43-96-96-96-.6 0-1.1.2-1.6.2 1.1-5.2 1.6-10.6 1.6-16.2 0-44.2-35.8-80-80-80-24.6 0-46.3 11.3-61 28.8C256.4 24.8 219.3 0 176 0 114.1 0 64 50.1 64 112c0 7.3.8 14.3 2.1 21.2C27.8 145.8 0 181.5 0 224c0 53 43 96 96 96h43.4c3.6-8 8.4-15.4 14.8-21.8 13.5-13.5 31.5-21.1 50.8-21.3 13.5-13.2 31.7-20.9 51-20.9s37.5 7.7 51 20.9c19.3.2 37.3 7.8 50.8 21.3 6.4 6.4 11.3 13.8 14.8 21.8H416c53 0 96-43 96-96z"></path></svg>]

* Loading the dataset: `read_csv()` or `read_excel()`
* Correlation:
`Hmisc::rcorr(as.matrix())`
* Summary: `stargazer::stargazer()`
* Linear regression: `lm()`, `estimatr::lm_robust()`, `dynlm::dynlm()`
* Reporting: `broom::tidy()`
* Saving: `write_csv()`
]
]
]

---
class: inverse, center, middle

# %>%

.large[Pipe operator: read it as "and then"]

---
# Prerequisites

### Clean the environment

```r
rm(list = ls())  # remove every object from the global environment
ls()             # check: should now return character(0)
```

--

### Install the package & load the library

```r
install.packages("tidyverse")  # run once per machine
library(tidyverse)             # run in every new R session
```

--

### Set the working directory

```r
getwd()  # where R currently reads and writes files
# replace this path with the folder on your own machine
setwd("/Users/arifpras/OneDrive - The University of Nottingham/BB_KelasData")
dir()    # list the files in the new working directory
```

---
exclude: false

# Data

## Ages

```r
library(readxl)
op_ages <- read_excel(path = "/Users/arifpras/OneDrive - The University of Nottingham/BB_KelasData/KelasData/00_Datasets/OP_all.xlsx",
                      sheet = "OP_ages")
DT::datatable(op_ages, fillContainer = FALSE, options = list(pageLength = 3))  # interactive table, 3 rows per page
```
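The wrangling verbs from the Functions slide are usually chained with `%>%`. The sketch below is illustrative, not the course's own code: it uses R's built-in `mtcars` data as a stand-in because the columns of the One Piece sheets are not shown here, and the 100 hp cut-off and the `hp_per_cyl` column are arbitrary choices.

```r
# A sketch, not the course's own code: built-in mtcars stands in for
# the One Piece data; the 100 hp cut-off is an arbitrary example.
library(dplyr)

cars_summary <- mtcars %>%
  tibble::rownames_to_column(var = "model") %>%  # car names become a column
  filter(hp > 100) %>%                           # rows: keep cars above 100 hp
  select(model, cyl, hp) %>%                     # columns: keep three variables
  mutate(hp_per_cyl = hp / cyl) %>%              # add a derived column
  group_by(cyl) %>%                              # analyse by cylinder count
  summarise(n = n(),
            mean_hp = mean(hp),
            mean_hp_per_cyl = mean(hp_per_cyl)) %>%
  arrange(desc(mean_hp))                         # sort groups by average hp

cars_summary
```

Read each `%>%` as "and then": take `mtcars`, and then keep the cars above 100 hp, and then keep three columns, and so on. The result has one row per cylinder count, sorted by average horsepower.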
---
exclude: false

# Data

## Powers

```r
library(readxl)
op_powers <- read_excel(path = "/Users/arifpras/OneDrive - The University of Nottingham/BB_KelasData/KelasData/00_Datasets/OP_all.xlsx",
                        sheet = "OP_powers")
DT::datatable(op_powers, fillContainer = FALSE, options = list(pageLength = 3))
```
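With two sheets loaded, the merging and reshaping functions from the Functions slide become useful. A minimal sketch on hypothetical toy tables: the table names, the `name`, `age` and `power` columns, and every value are placeholders, not the real `OP_ages` or `OP_powers` data.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy tables: placeholder names and made-up values, not
# the actual layout of the OP_ages and OP_powers sheets.
ages   <- tibble(name = c("A", "B", "C"), age   = c(19, 21, 39))
powers <- tibble(name = c("A", "B", "D"), power = c(90, 85, 70))

# left_join(): keep every row of `ages`; "C" has no match, so power is NA
joined <- left_join(ages, powers, by = "name")

# inner_join(): keep only the names found in both tables ("A" and "B")
both <- inner_join(ages, powers, by = "name")

# anti_join(): rows of `ages` with no match in `powers` ("C")
no_power <- anti_join(ages, powers, by = "name")

# pivot_longer(): stack age and power into variable/value pairs ...
long <- both %>%
  pivot_longer(cols = c(age, power), names_to = "variable", values_to = "value")

# ... and pivot_wider(): spread them back into one column per variable
wide <- long %>%
  pivot_wider(names_from = variable, values_from = value)
```

Swapping `left_join()` for `full_join()` would also keep "D", the row that exists only in `powers`.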
---
class: inverse, center, middle

# Data Visualisation

---
# Datasaurus

.center[
![:scale 40%](https://raw.githubusercontent.com/Z3tt/DataViz-Teaching/master/Datasaurus/datasauRus.gif)

Never trust summary statistics alone: the Datasaurus datasets share nearly identical means, standard deviations and correlations, yet look completely different when plotted.
]

.footnote[.small[Source: `library(datasauRus)`; https://github.com/Z3tt/TidyTuesday/tree/master/plots/2020_42]]

---
# ggplot2

.pull-left[
### .left[Basic elements:

* Data: `data = ...`
* Geometries: `geom_`
* Aesthetics: `aes(x = ..., y = ..., ...)`
* Scales: `scale_`
* Statistical transformations: `stat_`
* Coordinate system: `coord_`
* Facets: `facet_`
* Visual themes: `theme()`
]
]

---
# Decision trees for choosing a chart

<iframe src="https://www.data-to-viz.com/#portfolio" width="100%" height="400px" data-external="1"></iframe>

.left[.footnote[.small[Source: https://www.data-to-viz.com]]]

---
# Assigning colours

.panelset[
.panel[
.panel-name[1: Colour palette]
<center><img src="https://d33wubrfki0l68.cloudfront.net/c25e86bc59337d57b4e24c4bf80ecbb12db841f8/59edf/img/ggplot-tutorial/map-principles-color-schemes.png" width="1000px" /></center>
<br><br><br><br><br><br><br><br>
.footnote[.small[Source: "Hands-On Data Visualization" by Jack Dougherty & Ilya Ilyankou]]
]
.panel[
.panel-name[2: Qualitative variables]
<center><img src="https://d33wubrfki0l68.cloudfront.net/1336224150e3b2822a76db24f51728b41f0540f5/25251/img/ggplot-tutorial/nominal_ordinal_binary.png" height="400px" /></center>
<br><br>
.footnote[.small[Source: https://github.com/allisonhorst/stats-illustrations/]]
]
.panel[
.panel-name[3: Quantitative variables]
<center><img src="https://d33wubrfki0l68.cloudfront.net/f9c11a301f597b8e2e5a2f26c691b7c51450c7aa/754d3/img/ggplot-tutorial/continuous_discrete.png" height="400px" /></center>
<br><br>
.footnote[.small[Source: https://github.com/allisonhorst/stats-illustrations/]]
]
]

---
# Line plot

.center[
<img src="KelasData_files/figure-html/unnamed-chunk-7-1.png" width="70%" style="display: block; margin: auto;" />
]

---
# Animating bar plot

.pull-left[
<br>
<blockquote class="twitter-tweet" data-width="300" data-theme="light" data-cards="hidden" data-dnt="true" align="center"><p lang="in" dir="ltr">Data visualisation of debt and income per capita using the R programming language. The code is available at <a href="https://t.co/YzqY3hmGfc">https://t.co/YzqY3hmGfc</a>.<br><br>In ratio terms, the result is not far from the government debt-to-GDP ratio published by <a href="https://twitter.com/DJPPRkemenkeu?ref_src=twsrc%5Etfw">@DJPPRkemenkeu</a>. <a href="https://t.co/1D6k2nmHsW">pic.twitter.com/1D6k2nmHsW</a></p>— Arif P. Sulistiono (@arifpras) <a href="https://twitter.com/arifpras/status/1256101386758115328?ref_src=twsrc%5Etfw">May 1, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
]
.pull-right[
<img src="https://miro.medium.com/max/960/1*tKoVzc3cI9ihIuovTFf2jA.gif" style="width: 80%" align="left">
]

---
# Animating line plot

.pull-left[
<br>
<blockquote class="twitter-tweet" data-width="300" data-theme="light" data-cards="hidden" data-dnt="true" align="center"><p lang="in" dir="ltr">Data visualisation of the <a href="https://twitter.com/hashtag/rupiah?src=hash&ref_src=twsrc%5Etfw">#rupiah</a> exchange rate against the US dollar during the 2008 <a href="https://twitter.com/hashtag/krisis?src=hash&ref_src=twsrc%5Etfw">#krisis</a> (crisis) and this year, using the <a href="https://twitter.com/hashtag/R?src=hash&ref_src=twsrc%5Etfw">#R</a> programming language. The code is available at <a href="https://t.co/YzqY3hmGfc">https://t.co/YzqY3hmGfc</a>. <a href="https://t.co/PMHFjWIJQB">pic.twitter.com/PMHFjWIJQB</a></p>— Arif P.
Sulistiono (@arifpras) <a href="https://twitter.com/arifpras/status/1257366542553055232?ref_src=twsrc%5Etfw">May 4, 2020</a></blockquote> <script async src="https://platform.twitter.com/widgets.js" charset="utf-8"></script>
]
.pull-right[
<img src="https://miro.medium.com/max/960/1*W58miPPhRcKO4dYf6eBqvw.gif" style="width: 80%" align="left">
]

---
# Practical sessions

<iframe src="https://arifpras.github.io/WranglingViz/" width="100%" height="400px" data-external="1"></iframe>

.left[.footnote[.small[Source: https://arifpras.github.io/WranglingViz/]]]

---
class: inverse, center, middle

# Basic Modelling

---
# Initial specifications

.center[
![:scale 50%](https://c.tenor.com/WcoVJ8aQcwkAAAAC/one-piece-monkey-d-luffy.gif)

$$
\text{Sales}_t = \alpha_0 + \alpha_1 \text{Chapters}_t + \alpha_2 \text{LastMovie}_t + \alpha_3 \text{VIX index}_t + \upsilon_t
$$

$$
\text{Sales}_t = \beta_0 + \beta_1 \text{Pages}_t + \beta_2 \text{LastMovie}_t + \beta_3 \text{VIX index}_t + \epsilon_t
$$
]

---
# Dataset

```r
library(readxl)
op_sales <- read_excel(path = "/Users/arifpras/OneDrive - The University of Nottingham/BB_KelasData/KelasData/00_Datasets/OP_all.xlsx",
                       sheet = "OP_sales")
DT::datatable(op_sales, fillContainer = FALSE, options = list(pageLength = 3))
```
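The two specifications on the Initial specifications slide can each be estimated with `lm()`. Because the columns of `op_sales` are not shown here, this sketch simulates a stand-in dataset; the column names (`sales`, `chapters`, `pages`, `last_movie`, `vix`), the treatment of LastMovie as a 0/1 dummy, and the data-generating coefficients are all assumptions for illustration only.

```r
# Simulated stand-in for op_sales: hypothetical column names and a
# made-up data-generating process, for illustration only.
set.seed(2021)
n <- 120
sim_sales <- data.frame(
  chapters   = rpois(n, 4),        # hypothetical chapters released
  pages      = rpois(n, 70),       # hypothetical pages published
  last_movie = rbinom(n, 1, 0.2),  # hypothetical 0/1 movie-release dummy
  vix        = runif(n, 10, 40)    # hypothetical VIX level
)
sim_sales$sales <- with(sim_sales,
  50 + 3 * chapters + 10 * last_movie - 0.5 * vix + rnorm(n, sd = 5))

# First specification: Sales on Chapters, LastMovie and VIX (alphas)
m_alpha <- lm(sales ~ chapters + last_movie + vix, data = sim_sales)

# Second specification: Sales on Pages, LastMovie and VIX (betas)
m_beta <- lm(sales ~ pages + last_movie + vix, data = sim_sales)

# Coefficient table: estimates, standard errors, t values, p values
summary(m_alpha)$coefficients

# Compare the fit of the two specifications
c(alpha = summary(m_alpha)$adj.r.squared,
  beta  = summary(m_beta)$adj.r.squared)
```

Since `sales` was simulated from `chapters` and not `pages`, the first specification should recover coefficients close to 3, 10 and -0.5 and fit better. For heteroskedasticity-robust standard errors, `estimatr::lm_robust()` accepts the same formula and `data` arguments, and `broom::tidy(m_alpha)` returns the coefficient table as a tibble.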
---
# Dependent variable

.center[
<img src="KelasData_files/figure-html/unnamed-chunk-10-1.png" width="70%" style="display: block; margin: auto;" />
]

---
# Independent variables

.center[
<img src="KelasData_files/figure-html/unnamed-chunk-11-1.png" width="70%" style="display: block; margin: auto;" />
]

---
class: center, inverse, middle

.center[
![:scale 50%](https://media.giphy.com/media/f1kifoMrj6g4E/giphy.gif)
]

.large[<span style="color:orange"> Let's practice! </span>]

---
# Practical sessions

<iframe src="https://arifpras.github.io/BasicModelling/" width="100%" height="400px" data-external="1"></iframe>

.left[.footnote[.small[Source: https://arifpras.github.io/BasicModelling/]]]

---
class: center, middle, clear

.center[
![:scale 50%](https://media.giphy.com/media/tIZUToOMEFGM0/giphy.gif)
]

.large[Thank you for listening.]

.small[**All teaching materials are available at https://github.com/arifpras/KelasData**]

.small[
Slides created with the R packages [**xaringan**](https://github.com/yihui/xaringan) and [**xaringanthemer**](https://github.com/gadenbuie/xaringanthemer).
<br>
The chakra comes from [remark.js](https://remarkjs.com), [**knitr**](http://yihui.name/knitr), and [R Markdown](https://rmarkdown.rstudio.com).
]

---
# Acknowledgements

.small[
* Wickham, H., & Grolemund, G. (2017). R for Data Science. O’Reilly Media.
* R Core Team (2019). R: A language and environment for statistical computing. R Foundation for Statistical Computing, Vienna, Austria. https://www.R-project.org/
* Wickham, H., et al. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
* Wilke, C. O. Data Visualization in R. https://wilkelab.org/SDS375/
* Scherer, C. Personal blog. https://www.cedricscherer.com/
* Other sources, e.g. stackoverflow.com and github.com.
### Datasets:

* One Piece chapter and character: https://www.kaggle.com/michau96
* One Piece anime rating: https://www.kaggle.com/aditya2803
* One Piece power ranking: https://www.opfanpage.com/2018/06/29/one-piece-power-ranking-chart/
* One Piece characters' ages: https://listfist.com/list-of-one-piece-characters-by-age
* One Piece by volume: https://listfist.com/list-of-one-piece-volumes
* One Piece by chapter: https://listfist.com/list-of-one-piece-manga-chapters
* One Piece sales: https://erzat.blog/top-sales-according-to-series-for-the-month-of-september-2021/ and https://twitter.com/WSJ_manga/status/1214168838511702016
* VIX index: https://finance.yahoo.com/quote/%5EVIX?p=%5EVIX

### GIF files:

* Luffy, flying: https://pa1.narvii.com/5869/51088074557b4091944c2007ec33384020920f87_hq.gif
* Luffy, thinking: https://c.tenor.com/WcoVJ8aQcwkAAAAC/one-piece-monkey-d-luffy.gif
* Shanks vs. Akainu: https://www.quora.com/What-would-happen-if-Deku-was-sent-to-the-One-Piece-universe
]