References

Andersen, P.K., Gill, R.D. (1982). Cox’s regression model for counting processes: a large sample study. Annals of Statistics, 10, 1100-1120.

Agresti, A. (2013). Categorical Data Analysis. Wiley.

Barnard, J., Rubin, D.B. (1999). Small sample degrees of freedom with multiple imputation. Biometrika, 86, 948-955.

Bates, D., Mächler, M., Bolker, B., Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67, 1.

Boehmke, B., Greenwell, B. (2019). Hands-On Machine Learning with R. CRC Press.

Bollen, K.A. (1989). Structural Equations with Latent Variables. Wiley Series in Probability and Mathematical Statistics. Wiley.

Box, G.E. (2013). An Accidental Statistician. Wiley.

Box, G.E., Cox, D.R. (1964). An analysis of transformations. Journal of the Royal Statistical Society: Series B (Methodological), 26(2), 211-243.

Breiman, L., Friedman, J., Stone, C.J., Olshen, R.A. (1984). Classification and Regression Trees. CRC Press.

Breiman, L. (2001). Random forests. Machine Learning, 45(1), 5-32.

Brown, L.D., Cai, T.T., DasGupta, A. (2001). Interval estimation for a binomial proportion. Statistical Science, 16(2), 101-117.

Buolamwini, J., Gebru, T. (2018). Gender shades: Intersectional accuracy disparities in commercial gender classification. Proceedings of Machine Learning Research, 81, 1-15.

Cameron, A.C., Trivedi, P.K. (1990). Regression-based tests for overdispersion in the Poisson model. Journal of Econometrics, 46(3), 347-364.

Casella, G., Berger, R.L. (2002). Statistical Inference. Brooks/Cole.

Charytanowicz, M., Niewczas, J., Kulczycki, P., Kowalski, P.A., Lukasik, S., Zak, S. (2010). A complete gradient clustering algorithm for features analysis of X-ray images. In: Pietka, E., Kawa, J. (eds.), Information Technologies in Biomedicine. Springer-Verlag, Berlin-Heidelberg, 15-24.

Chollet, F., Allaire, J.J. (2022). Deep Learning with R. Second edition. Manning.

Cochran, W.G. (1954). Some methods of strengthening the common \(\chi^2\) tests. Biometrics, 10, 417-451.

Committee on Professional Ethics of the American Statistical Association. (2018). Ethical Guidelines for Statistical Practice. https://www.amstat.org/ASA/Your-Career/Ethical-Guidelines-for-Statistical-Practice.aspx

Cook, R.D., Weisberg, S. (1982). Residuals and Influence in Regression. Chapman & Hall.

Cortez, P., Cerdeira, A., Almeida, F., Matos, T., Reis, J. (2009). Modeling wine preferences by data mining from physicochemical properties. Decision Support Systems, 47(4), 547-553.

Costello, A.B., Osborne, J. (2005). Best practices in exploratory factor analysis: Four recommendations for getting the most from your analysis. Practical Assessment, Research, and Evaluation, 10(1), 7.

Cox, D.R. (1972). Regression models and life-tables. Journal of the Royal Statistical Society: Series B (Methodological), 34(2), 187-202.

Dastin, J. (2018). Amazon scraps secret AI recruiting tool that showed bias against women. Reuters.

Davison, A.C., Hinkley, D.V. (1997). Bootstrap Methods and their Application. Cambridge University Press.

Delacre, M., Lakens, D., Leys, C. (2017). Why psychologists should by default use Welch’s t-test instead of Student’s t-test. International Review of Social Psychology, 30(1).

Drymon, M.M. (2008). Disguised As the Devil: How Lyme Disease Created Witches and Changed History. Wythe Avenue Press.

Eck, K., Hultman, L. (2007). One-sided violence against civilians in war: Insights from new fatality data. Journal of Peace Research, 44(2), 233-246.

Eddelbuettel, D., Balamuta, J.J. (2018). Extending R with C++: a brief introduction to Rcpp. The American Statistician, 72(1), 28-36.

Efron, B. (1983). Estimating the error rate of a prediction rule: improvement on cross-validation. Journal of the American Statistical Association, 78(382), 316-331.

Elston, D.A., Moss, R., Boulinier, T., Arrowsmith, C., Lambin, X. (2001). Analysis of aggregation, a worked example: numbers of ticks on red grouse chicks. Parasitology, 122(05), 563-569.

Fine, J.P., Gray, R.J. (1999). A proportional hazards model for the subdistribution of a competing risk. Journal of the American Statistical Association, 94(446), 496-509.

Fisher, R.A. (1935). The Design of Experiments. Oliver & Boyd.

Fleming, G., Bruce, P.C. (2021). Responsible Data Science: Transparency and Fairness in Algorithms. Wiley.

Franks, B. (Ed.) (2020). 97 Things About Ethics Everyone in Data Science Should Know. O’Reilly Media.

Friedman, J.H. (2002). Stochastic gradient boosting. Computational Statistics and Data Analysis, 38(4), 367-378.

Gao, L.L., Bien, J., Witten, D. (2022). Selective inference for hierarchical clustering. Journal of the American Statistical Association, DOI: 10.1080/01621459.2022.2116331.

Groll, A., Tutz, G. (2014). Variable selection for generalized linear mixed models by L1-penalized estimation. Statistics and Computing, 24(2), 137-154.

Hall, P. (1992). The Bootstrap and Edgeworth Expansion. Springer Science & Business Media.

Hartigan, J.A., Wong, M.A. (1979). Algorithm AS 136: A k-means clustering algorithm. Journal of the Royal Statistical Society: Series C (Applied Statistics), 28(1), 100-108.

Henderson, H.V., Velleman, P.F. (1981). Building multiple regression models interactively. Biometrics, 37, 391-411.

Herr, D.G. (1986). On the history of ANOVA in unbalanced, factorial designs: the first 30 years. The American Statistician, 40(4), 265-270.

Hoerl, A.E., Kennard, R.W. (1970). Ridge regression: biased estimation for nonorthogonal problems. Technometrics, 12(1), 55-67.

Holzinger, K., Swineford, F. (1939). A Study in Factor Analysis: The Stability of a Bifactor Solution. Supplementary Educational Monograph, no. 48. University of Chicago Press.

Hu, L., Bentler, P.M. (1999). Cutoff criteria for fit indexes in covariance structure analysis: conventional criteria versus new alternatives. Structural Equation Modeling, 6(1), 1-55.

Hyndman, R.J., Athanasopoulos, G. (2018). Forecasting: Principles and Practice. OTexts.

Imai, K., Keele, L., Yamamoto, T. (2010). Identification, inference, and sensitivity analysis for causal mediation effects. Statistical Science, 25(1), 51-71.

James, G., Witten, D., Hastie, T., Tibshirani, R. (2021). An Introduction to Statistical Learning with Applications in R. Springer.

Kuznetsova, A., Brockhoff, P.B., Christensen, R.H. (2017). lmerTest package: tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1-26.

Liero, H., Zwanzig, S. (2012). Introduction to the Theory of Statistical Inference. CRC Press.

Liu, X., Swenson, N.G., Lin, D., Mi, X., Umaña, M.N., Schmid, B., Ma, K. (2016). Linking individual-level functional traits to tree growth in a subtropical forest. Ecology (Durham), 97(9), 2396-2405.

Long, J.D., Teetor, P. (2019). The R Cookbook. O’Reilly Media.

Moen, A., Lind, A.L., Thulin, M., Kamali-Moghaddam, M., Roe, C., Gjerstad, J., Gordh, T. (2016). Inflammatory serum protein profiling of patients with lumbar radicular pain one year after disc herniation. International Journal of Inflammation, 2016, Article ID 3874964.

Persson, I., Arnroth, L., Thulin, M. (2019). Multivariate two-sample permutation tests for trials with multiple time-to-event outcomes. Pharmaceutical Statistics, 18(4), 476-485.

Pettersson, T., Högbladh, S., Öberg, M. (2019). Organized violence, 1989-2018 and peace agreements. Journal of Peace Research, 56(4), 589-603.

Picard, R.R., Cook, R.D. (1984). Cross-validation of regression models. Journal of the American Statistical Association, 79(387), 575-583.

Prentice, R.L., Williams, B.J., Peterson, A.V. (1981). On the regression analysis of multivariate failure time data. Biometrika, 68, 373-379.

Rasch, D., Kubinger, K.D., Moder, K. (2011). The two-sample t test: pre-testing its assumptions does not pay off. Statistical Papers, 52(1), 219.

Recht, B., Roelofs, R., Schmidt, L., Shankar, V. (2019). Do ImageNet classifiers generalize to ImageNet? arXiv:1902.10811.

Rubin, D.B. (1987). Multiple Imputation for Nonresponse in Surveys. John Wiley & Sons.

Schoenfeld, D. (1982). Partial residuals for the proportional hazards regression model. Biometrika, 69(1), 239-241.

Scrucca, L., Fop, M., Murphy, T.B., Raftery, A.E. (2016). mclust 5: clustering, classification and density estimation using Gaussian finite mixture models. The R Journal, 8(1), 289.

Smith, G. (2018). Step away from stepwise. Journal of Big Data, 5(1), 32.

Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.

Tibshirani, R., Walther, G., Hastie, T. (2001). Estimating the number of clusters in a data set via the gap statistic. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 63(2), 411-423.

Thulin, M. (2014a). The cost of using exact confidence intervals for a binomial proportion. Electronic Journal of Statistics, 8, 817-840.

Thulin, M. (2014b). On Confidence Intervals and Two-Sided Hypothesis Testing. PhD thesis. Department of Mathematics, Uppsala University.

Thulin, M. (2014c). Decision-theoretic justifications for Bayesian hypothesis testing using credible sets. Journal of Statistical Planning and Inference, 146, 133-138.

Thulin, M. (2016). Two‐sample tests and one‐way MANOVA for multivariate biomarker data with nondetects. Statistics in Medicine, 35(20), 3623-3644.

Thulin, M., Zwanzig, S. (2017). Exact confidence intervals and hypothesis tests for parameters of discrete distributions. Bernoulli, 23(1), 479-502.

Tobin, J. (1958). Estimation of relationships for limited dependent variables. Econometrica, 26, 24-36.

Wasserstein, R.L., Lazar, N.A. (2016). The ASA statement on p-values: context, process, and purpose. The American Statistician, 70(2), 129-133.

Wei, L.J. (1992). The accelerated failure time model: a useful alternative to the Cox regression model in survival analysis. Statistics in Medicine, 11(14‐15), 1871-1879.

Wickham, H. (2019). Advanced R. CRC Press.

Wickham, H., Bryan, J. (2023). R Packages. O’Reilly Media.

Wickham, H., Grolemund, G. (2017). R for Data Science. O’Reilly Media.

Wickham, H., Navarro, D., Lin Pedersen, T. (forthcoming). ggplot2: Elegant Graphics for Data Analysis. Third edition.

Wilke, C.O. (2019). Fundamentals of Data Visualization. O’Reilly Media.

Xie, Y., Allaire, J.J., Grolemund, G. (2018). R Markdown: The Definitive Guide. Chapman & Hall.

Zeileis, A., Hothorn, T., Hornik, K. (2008). Model-based recursive partitioning. Journal of Computational and Graphical Statistics, 17(2), 492-514.

Zhang, D., Fan, C., Zhang, J., Zhang, C.-H. (2009). Nonparametric methods for measurements below detection limit. Statistics in Medicine, 28, 700-715.

Zhang, Y., Yang, Y. (2015). Cross-validation for selecting a model selection procedure. Journal of Econometrics, 187(1), 95-112.

Zou, H., Hastie, T. (2005). Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society: Series B (Methodological), 67(2), 301-320.

Online resources

A number of reference cards and cheat sheets can be found online. I like the one at: https://cran.r-project.org/doc/contrib/Short-refcard.pdf
R-bloggers (https://www.r-bloggers.com/) collects blog posts related to R. A great place to discover new tricks and see how others are using R.
RSeek (http://rseek.org/) provides a custom Google search with the aim of only returning pages related to R.
Stack Overflow (https://stackoverflow.com/questions/tagged/r) and its sister site Cross Validated (https://stats.stackexchange.com/) are question-and-answer sites. They are great places for asking questions, and they already contain a ton of useful information about all things R-related. The RStudio Community (https://community.rstudio.com/) is another good option.
The R Journal (https://journal.r-project.org/) is an open-access peer-reviewed journal containing papers on R, mainly describing new add-on packages and their functionality.

Modern Statistics with R
