Resources

Here I have compiled a list, or a commented bibliography if you will, of resources that I have found useful over the course of my academic studies. I am by no means an expert on these topics, but I do believe that these will help some people out. All the effort that I’ve put into this page is dedicated to the memory of my grandfather, Jorge Pazmiño. Do not hesitate to contact me with suggestions and comments about these.

Learning R

R through RStudio is one of my longtime friends. Here you will find a list of resources which I believe can help you get started on your adventures with this marvelous software. Note that I am including these resources with a practical rather than technical approach, which might be a little biased toward econometrics, so people in other fields may find other resources more helpful.

Probably the best introduction to the software that exists today. It has been endlessly recommended by all sorts of experts on R, probably because it is a clear and practical guide on how to use R for all sorts of tasks (although the title specifically says data science, I strongly believe this intro is fit for anybody who wants to use R). It will teach you what R is, what it isn’t and what it does best. It is a crash course on five major things you can do with the language: Wrangle, Program, Explore, Model and Communicate. However, because it is an introductory textbook, you’ll feel that it only scratches the surface of these five things. In my opinion the “Wrangle” part is pretty comprehensive, but if you need more, you’ll have to go to other resources, especially for the “Communicate” part. The only difficulty you might find with this book is installing the software, but the next resource will help you with that.

A great tutorial series and overall a great YouTube channel, probably best for those who prefer learning by watching rather than by reading. Also, I believe that, at least for installing a distribution of R and the RStudio IDE, it is better to watch videos of other people doing so than to read about it. The installation of the software might be one of the complications that R has compared to other software. The tutorial is done on a Mac, but so far I have not found the installation process on a Windows computer to be terribly different.

-Website: R Tutorials: Basics by Simon Ejdemyr.

A collection of written tutorials online. It might overlap with the other resources that I am including here; however, this set of tutorials is a bit more technical and focuses a bit more on Base R rather than the Tidyverse. Good to complement your knowledge, and I know some of you out there prefer Base R to the Tidyverse.

A popular book which very clearly introduces R and simple statistics knowledge. I’ve seen it recommended consistently as a good guide if you feel you’ve understood R4DS but still feel unclear how to continue. Some good alternatives could be Exploratory Data Analysis with R, R in a Nutshell or the R Cookbook.

If you’re here from my GitHub repository and want to start using R to analyze the AmericasBarometer data, this is the tutorial for you. It will give you a very general overview of R and will teach you the more specific things needed to analyze the AmericasBarometer data. One of the most important things that this tutorial covers is the use of survey design objects from the survey package to produce survey-weighted results. Keep in mind it is written in Spanish, so if you can’t read Spanish and need help, don’t hesitate to contact me. Thomas Lumley’s Complex Surveys: A Guide to Analysis Using R is the only book I’ve found which discusses the use of survey data in R, but I found the book fairly advanced, so as far as I know this resource is the best there is.

Unlike R4DS, this book is an intro to programming in R, “powerful R tools for solving data problems”, rather than to doing the data analysis itself. Suffice it to say this book will teach you about objects, functions, loops, and other programming concepts which are also found in other programming languages. A great complement to R4DS, but maybe not as essential for beginners or for people focused on using R for data analysis only.
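To give a taste of what objects, functions and loops look like in practice, here is a minimal sketch in base R. It is my own toy illustration rather than an excerpt from the book, and the names die, roll and rolls are made up for the example.

```r
# An object: a numeric vector holding the faces of a die
die <- 1:6

# A function: roll `n` dice and return the sum of the faces
roll <- function(n = 2) {
  sum(sample(die, size = n, replace = TRUE))
}

# A loop: repeat the roll many times and store each result
set.seed(123)  # make the random draws reproducible
rolls <- numeric(1000)
for (i in seq_along(rolls)) {
  rolls[i] <- roll()
}

# With two fair dice the average should be close to 7
mean(rolls)
```

The same loop could be written more compactly with replicate(1000, roll()), which is the kind of idiom a programming-oriented book tends to build toward.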

This is another amazing free book which walks you through the R concepts, workflows and packages required to master all the examples and exercises in Introductory Econometrics: A Modern Approach by Jeffrey Wooldridge, but it can also serve as a standalone introduction to the R language and its applications in econometrics. It will be one of the most powerful weapons in an Economics student’s arsenal while in school, considering that I’ve often heard that software applications are neglected in econometrics courses. This book is considerably better for beginners than other classic books on regression and econometrics like An R Companion to Applied Regression and Applied Econometrics with R. It also has a sister textbook, Using Python for Introductory Econometrics, for those interested in applying Python to econometrics as well.

A great book which serves as an introduction to the concepts of causality and research design, and which includes code in R, Stata and Python. This is the next step after URfIE for students of econometrics, as it is simple enough to not require complicated mathematics but comprehensive enough to cover advanced econometric topics. It is very accessible and a pathway into more difficult econometrics books like Causal Inference: The Mixtape, Mastering ’Metrics or Mostly Harmless Econometrics. An alternative/complement to The Effect is Introduction to Econometrics with R by Hanck et al.

A “Rosetta Stone” for statistical software. Think of it as a dictionary of statistical and data wrangling methods where you have definitions and explanations but, most importantly, code implementations for different statistical software, including but not limited to R, Python and Stata. Consider contributing to LOST if you can; there are still many things waiting to be done.

A short but sweet introduction on how to produce almost any type of graph imaginable in R using the famous ggplot2 package. It also includes short introductions to R, the pipe operator %>% and how to graph using base R. It is very useful as a quick desktop reference; however, it won’t actually teach you much about the logic behind ggplot2, the grammar of graphics. For this, I recommend the original piece by Leland Wilkinson and the book that introduces the ggplot2 package.
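As a rough illustration of how the pipe and ggplot2 fit together, here is a small sketch of my own (not a recipe from the book). It assumes the ggplot2 and dplyr packages are installed and uses the built-in mtcars data; the object name p is arbitrary.

```r
library(ggplot2)
library(dplyr)  # loading dplyr also makes the %>% pipe available

# Wrangle with the pipe, then hand the result to ggplot()
p <- mtcars %>%
  filter(cyl != 6) %>%             # keep 4- and 8-cylinder cars only
  mutate(cyl = factor(cyl)) %>%    # treat cylinders as a category
  ggplot(aes(x = wt, y = mpg, colour = cyl)) +
  geom_point() +
  labs(x = "Weight (1000 lbs)", y = "Miles per gallon", colour = "Cylinders")

print(p)
```

Note the switch from %>% to + once ggplot() is called: the pipe passes data along, while + adds layers to a plot. That layering is the grammar of graphics idea that the books mentioned above explain properly.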

At a certain point you’ll want to stop spitting code at the script window and start communicating the results of your analysis to other people, at least to your professor or TA for your homework assignments. While URfIE and Dr. Ejdemyr’s tutorials give you good advice on how to export results to your report or paper, the best way to communicate results is to mix prose and code to produce dynamic documents in RStudio itself. Some of this is explained in R4DS and also in URfIE, but this book delves much, much deeper into this prose-code workflow and explains it in marvelous detail. It will introduce RMarkdown, but most importantly, it will introduce knitr/Sweave and .Rnw files in the R context. The latter are files which combine LaTeX and R, something which is very powerful and almost indispensable when producing complex documents. To this day, I have not found a better exposition of the LaTeX and R integration.
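To make the .Rnw idea a bit more concrete, here is a minimal sketch, assuming the knitr package is installed. The tiny document below is a made-up example written to a temporary file, not something taken from the book. It mixes LaTeX prose, an R chunk and an inline \Sexpr{} expression, and knit() turns it into a .tex file (compiling that to PDF would additionally need a LaTeX installation).

```r
library(knitr)

# A tiny, hypothetical .Rnw document: LaTeX prose plus an R chunk and inline R
rnw_lines <- c(
  "\\documentclass{article}",
  "\\begin{document}",
  "<<speed-summary, echo=TRUE>>=",
  "summary(cars$speed)",
  "@",
  "The mean speed is \\Sexpr{mean(cars$speed)}.",
  "\\end{document}"
)

# Write it to a temporary file so the example is self-contained
rnw_file <- tempfile(fileext = ".Rnw")
writeLines(rnw_lines, rnw_file)

# Run the R code and weave the results into a LaTeX file
tex_file <- knit(rnw_file, output = tempfile(fileext = ".tex"), quiet = TRUE)
tex_lines <- readLines(tex_file)
```

In the resulting .tex file the chunk is replaced by its code and output, and the \Sexpr{} call by the computed number, so the prose updates itself whenever the data change.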

Although LaTeX is more powerful than RMarkdown, it does have a steeper learning curve. Besides, RMarkdown allows you to do a lot of interesting stuff, like publishing webpages using blogdown or Quarto, publishing to RPubs, and outputting to Microsoft Word and PowerPoint,¹ among others. This book discusses document generation with RMarkdown from beginning to end. A more practical approach, similar to Chang’s book on ggplot2, can be found in the RMarkdown Cookbook.

However, in my personal opinion RMarkdown still makes it too difficult to format long, complex documents with strict formatting guidelines. RMarkdown might not give you enough power to do complex stuff, although I have heard that the thesisdown and bookdown packages may give you more power to do this while keeping RMarkdown syntax. I think that if you’ve reached the point where RMarkdown feels too simple for you, you will be better off in the long run learning LaTeX and compiling your documents using .Rnw files as described in the Dynamic Docs book. However, do as you like: de gustibus non est disputandum.

In order to produce research that can be easily reproduced, code that is written clearly and in an ordered manner is necessary. This might be trickier than it sounds, but this resource carefully and concisely presents some coding style guidelines to ensure that the R code you write can be understood by other people. The authors also provide us with other style guides, even one by Hadley Wickham.

A crystal-clear introduction to project version control with Git and GitHub. Version control, although a simple premise, has a somewhat difficult implementation with R and in general with anything that is not Microsoft Word or Google Docs. This book offers a very clear introduction to this with a practical approach aimed at complete beginners.

  • Document(s): Cheatsheets

Many people might not have time to look at a book, video tutorial or even their own previous work when trying to do some task in R. This is what Cheatsheets are for: quick reference guides which work best when you already know the concepts and just need a refresher. A good one is Tom Short’s R Reference Card but there are tons of Cheatsheets out there. RStudio provides you with quick access to some of these by clicking Help -> Cheat Sheets.

  • Document(s): Manuals

This wouldn’t be a complete resource list if I didn’t include the actual manuals, which can be found here. However, I do believe these are quite technical, even when considering the complete beginner’s manual/book R for Beginners by Emmanuel Paradis. Complete beginners might find them difficult to understand, but I’ve realized that when facing difficult issues with no apparent solution it is good to carefully read the documentation, or to ask on Stack Overflow. The most recommended ones are R Installation and Administration and An Introduction to R. These are the manuals for the R language, but chances are that if you’re stuck, it is because you’re stuck on the usage of a package, so you must consult that package’s documentation, which can be found on CRAN.

An absolutely amazing compilation of applied R books that the author has come across and selected. If you’re looking for a book that specifically discusses a specialized topic you need to work on and you’ve already learned the basics of the language, look it up in the Big Book. There’s even a book on R Programming with Minecraft. You can help the author include more books by filling out his Google Forms survey.

There are some field-specific books/websites which I think are interesting and that haven’t been included in the Big Book yet, which I’ll include below. These might be included in the future, as I will ask Oscar to add them.

Below I will include some resources that I haven’t exactly read, but I’ve repeatedly heard that they are important for improving your R skills after picking up the basics:

Some extra advice

Working with R throughout my life as a student and for my personal project, I’ve run into endless issues. I’ve been able to solve most of them by googling my question and reading other people’s Stack Overflow questions, or by reading books. However, there have been complicated problems with rather difficult solutions. In this section I’d like to give some unsolicited advice so that you don’t run into these problems, or can solve them efficiently if you do.

On building descriptive statistics and regression tables

The stargazer package is one of the most time-efficient ways of constructing summary tables in R. You only need to install and load the package, then use the stargazer() function with a data frame or model object. With data, it can print a fairly decent descriptive statistics table. There are additional ways to customize the function’s output (remember you can put ? before the function’s name to see its documentation in RStudio’s Help pane). It can also print side-by-side regression tables if you feed a list of regressions to the function. The package can output the tables to LaTeX, HTML or text. The stargazer() function is also compatible with many different types of model objects.
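A minimal sketch of both uses, assuming the stargazer package is installed. It relies on the built-in mtcars data, and the two regressions are arbitrary examples rather than models from my own work.

```r
library(stargazer)

# Descriptive statistics: pass a data frame
stargazer(mtcars, type = "text", title = "Descriptive statistics")

# Side-by-side regressions: pass a list of model objects
m1 <- lm(mpg ~ wt, data = mtcars)
m2 <- lm(mpg ~ wt + hp, data = mtcars)
stargazer(list(m1, m2), type = "text", title = "Regression results")

# Documentation: run ?stargazer to see every customization option
```

Switching type to "latex" or "html" changes the output format, and the out argument writes the table to a file instead of the console.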

However, this package is not the ultimate solution for building regression tables. It quickly becomes clumsy when you try to add special types of standard errors to your estimated model coefficients, add new goodness-of-fit statistics, etc. I’ve also found that it does not work too well when used with .Rnw and .Rmd files. Below, I will talk about the great package modelsummary; however, I still believe that stargazer is a good alternative when you’re starting out, for quickly visualizing models and descriptive statistics for your data.

modelsummary is the ultimate alternative to stargazer. I used this package extensively in my undergraduate capstone project and found it to be amazing. It supports virtually all R model objects and allows endless customization of your tables. One amazing feature is that you can pass arguments to the modelsummary function which are meant for kable (from the knitr/kableExtra packages), which is particularly useful for complex tables. One example is the longtable = TRUE argument, which isn’t a built-in argument of modelsummary but is an argument for kable. Also, this package works perfectly with .Rnw and .Rmd files. modelsummary is very highly recommended and will probably save you time in the long run if you use it instead of stargazer from the moment you start creating tables from R output. If you want guidance on how to use the package, you can check out all the vignettes made by the package author and the way that I used the package in my papers, all available on my GitHub profile.
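A minimal sketch of the idea, assuming the modelsummary and kableExtra packages are installed. The models and their labels are arbitrary examples on the built-in mtcars data. The pass-through behaviour shown here depends on the package version and on the chosen output backend, so treat it as an illustration and check the documentation of your installed version.

```r
library(modelsummary)

# Arbitrary example models; the list names become the column headers
models <- list(
  "Weight only" = lm(mpg ~ wt, data = mtcars),
  "Weight + HP" = lm(mpg ~ wt + hp, data = mtcars)
)

# A plain side-by-side table printed as markdown
modelsummary(models, output = "markdown")

# Pass-through: longtable is not a modelsummary() argument; with the
# kableExtra backend it is handed on to the kable machinery
# (assumption: the installed version forwards it this way)
tab <- modelsummary(models, output = "kableExtra", longtable = TRUE)
```

The longtable option only matters for LaTeX output, where it lets a table break across pages.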

Survey data and binary-outcome models

During the data analysis for my capstone project, I worked extensively with survey data and eventually got to running binary-outcome models, namely logit and probit models, with the data. In the literature that used the datasets I was using, it was standard practice to run survey-weighted models to get complex sample-adjusted standard errors and adjusted coefficients. I had no trouble doing this using the survey package in R; however, I ran into trouble when I wanted to calculate average partial effects: it was impossible to do so with either the margins() or the marginaleffects() function (each comes from the package of the same name).
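For readers who have not seen a survey-weighted binary-outcome model before, here is a minimal sketch, assuming the survey package is installed. It uses the api example data that ships with that package rather than the datasets from my project, and the outcome and covariates are chosen only for illustration.

```r
library(survey)

data(api)  # example survey data shipped with the survey package

# Declare the design: one-stage cluster sample with sampling weights
design <- svydesign(id = ~dnum, weights = ~pw, fpc = ~fpc, data = apiclus1)

# Survey-weighted logit; quasibinomial() avoids warnings about
# non-integer counts that binomial() raises with weights
fit_logit <- svyglm(sch.wide ~ ell + meals,
                    design = design,
                    family = quasibinomial())

# Survey-weighted probit: same call with a different link
fit_probit <- svyglm(sch.wide ~ ell + meals,
                     design = design,
                     family = quasibinomial(link = "probit"))

summary(fit_logit)
```

The standard errors reported by summary() take the sampling design into account, which is the main reason to prefer svyglm() over a plain glm() with a weights argument.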

After much asking around and reading documentation, I found two solutions. The first one is complex and clunky, and not so effective, but it works. The whole thing is described in this Stack Overflow question that I made when I became desperate one time at 3 am.

The second solution is the one that definitely solved the issue, and I discovered it only recently. It turned out to be possible to calculate average partial effects for the models that had given me issues in the past once I changed the data type of certain variables. Since I had imported my data from Stata files (.dta files) using the haven package, when I ran class() on certain variables I found that they had double as the data type. I found that changing the data type to integer or numeric using the as.* functions solved the issue: running marginaleffects() on the model objects with the updated variables now caused no issues. A certainly strange problem; I might write about it in the Posts section and/or contribute this info to LOST.
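The conversion step can be sketched as follows, assuming the haven package is installed. The voted variable and the survey_df data frame are hypothetical stand-ins for columns imported with read_dta(), built by hand here so the example is self-contained. I have not re-run the marginal effects step in this sketch; it only shows the type change.

```r
library(haven)

# Hypothetical stand-in for a column imported from a Stata file
voted <- labelled(c(0, 1, 1, 0), labels = c(No = 0, Yes = 1))
class(voted)  # a labelled class layered on top of double

# Convert to a plain numeric vector, dropping the labelled class
voted_num <- as.numeric(voted)
class(voted_num)

# The same idea applied to every labelled column of a data frame
survey_df <- data.frame(age = c(25, 40, 31, 58))
survey_df$voted <- voted
is_lab <- vapply(survey_df, is.labelled, logical(1))
survey_df[is_lab] <- lapply(survey_df[is_lab], as.numeric)
```

haven also offers zap_labels(), which strips the value labels from a vector or a whole data frame in one call and may be a tidier alternative to the as.* functions.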

References

1.
Paradis, E. R for Beginners 2005.
2.
Jean Fal et al R Style Guide 3/25/2019.
3.
The Latin American Public Opinion Project Usando los datos del Barómetro de las Américas 2021.
4.
5.
6.
Venables, W.N.; Smith, D.M.; R Core Team An Introduction to R 2022.
7.
Burns, P. The R Inferno 2011.
8.
Adler, J. R in a nutshell; Second edition.; O’Reilly: Sebastopol, CA, 2012; ISBN 9781449312084.
9.
Baruffa, O. Big Book of R 2022.
10.
Brooke Anderson, Karl Broman, Gergely Daróczi, Mario Inchiosa, David Smith and Ali Zaidi R Programming with Minecraft 2020.
11.
Chang, W. R graphics cookbook: Practical recipes for visualizing data; Second edition.; O’Reilly: Sebastopol, CA, 2018; ISBN 9781491978603.
12.
Fox, J.; Weisberg, S. An R companion to applied regression; 2nd edition.; SAGE Publications: Los Angeles, 2011; ISBN 9781452217192.
13.
Grolemund, G. Hands-on programming with R; First edition.; O’Reilly: Sebastopol, CA, 2014; ISBN 9781449359010.
14.
Heiberger, R.M.; Neuwirth, E. R through Excel: A spreadsheet interface for statistics, data analysis, and graphics; Use R!; Springer: Dordrecht; London, 2009; ISBN 9781441900517.
15.
Heiss, F. Using R for introductory econometrics; 2nd edition.; Independently Published: [s.l.], 2020; ISBN 9798648424364.
16.
Heiss, F.; Brunner, D.T. Using Python for introductory econometrics; 1st edition.; [Florian Heiss, Daniel Brunner]; [Manufactured by CreateSpace]: Dusseldorf, Germany; Orlando, FL, 2020; ISBN 9798648436763.
17.
Huntington-Klein, N. The Effect: An introduction to research design and causality; 1st ed.; Chapman & Hall/CRC: Boca Raton, 2021; ISBN 978-1032125787.
18.
Jenny Bryan et al STAT545 2022.
19.
Jockers, M.L. Text Analysis with R for Students of Literature; Quantitative Methods in the Humanities and Social Sciences; 1st ed. 2014.; Springer International Publishing; Imprint: Springer: Cham, 2014; ISBN 9783319031644.
20.
21.
Kleiber, C.; Zeileis, A. Applied Econometrics with R; Springer: New York, 2008; ISBN 9780387773162.
22.
Long, J.D.; Teetor, P. R cookbook: Proven recipes for data analysis, statistics, and graphics; Second edition.; O’Reilly: Beijing, 2019; ISBN 9781492040682.
23.
Lumley, T. Complex surveys: A guide to analysis using R; Wiley series in survey methodology; Wiley-Blackwell: Oxford, 2010; ISBN 9780470284308.
24.
Marvin, F.R. Poems and translations; Sherman French & company: Boston, 1914;
25.
Matloff, N.S. The art of R programming: Tour of statistical software design; No Starch Press: San Francisco, 2011; ISBN 9781593273842.
26.
Peng, R.D. Exploratory data analysis with R; LeanPub: [United States], 2016; ISBN 1365060063.
27.
Short, T. R Reference Card 2004.
28.
Financial, macro and micro econometrics using R; Vinod, H.D., Rao, C.R., Eds.; Handbook of statistics, 0169-7161; North Holland: Amsterdam, 2020; Vol. 42; ISBN 9780128202500.
29.
Wickham, H. ggplot2: Elegant graphics for data analysis; Use R!; Springer: New York; London, 2009; ISBN 9780387981406.
30.
Wickham, H. Advanced R; The R series; CRC Press: Boca Raton, FL, 2015; ISBN 9781466586970.
31.
Wickham, H. R packages; First Edition.; O’Reilly: Beijing, 2015; ISBN 9781491910597.
32.
Wickham, H.; Grolemund, G. R for Data Science: Import, tidy, transform, visualize, and model data; 1st ed.; O’Reilly: Beijing, 2017; ISBN 9781491910399.
33.
Wilkinson, L.; Wills, G. The grammar of graphics; Statistics and computing; 2nd ed.; Springer: New York, 2005; ISBN 9780387245447.
34.
Xie, Y. Dynamic documents with R and knitr; The R series; Second edition.; CRC Press: Boca Raton, FL, 2017; ISBN 9781315360706.
35.
Xie, Y.; Allaire, J.J.; Grolemund, G. R Markdown: The Definitive Guide; Chapman & Hall/CRC the R series; 1st ed.; CRC Press Taylor and Francis Group: Boca Raton, 2018; ISBN 9780429782961.
36.
Xie, Y.; Dervieux, C.; Riederer, E. R Markdown cookbook; Chapman & Hall/CRC the R series; 1st ed.; Chapman & Hall/CRC: Boca Raton, 2020; ISBN 9781000290882.

Footnotes

  1. This is especially useful considering how widespread the use of MS Office is in both industry and academia. However, I still feel that RMarkdown is a bit clumsy when outputting to MS Office. I know that using bookdown and officedown may solve many of the issues found with RMarkdown when outputting to MS Word, but I’m not sure how as of now. I hope to include this material in the future.

  2. Not really a book or a standalone website, but the course material for a course Dr. Andrew Heiss taught at the Andrew Young School of Policy Studies at Georgia State University. All of Dr. Heiss’ material is g-o-l-d. See his page here.

  3. The same as the previous footnote, but this is the course material for Jenny Bryan’s course on statistics at the University of British Columbia. Happy Git is also part of that course.