Digital Soil Mineralogy relates to the data-driven analysis of soil X-ray powder diffraction (XRPD) data. Such data can be considered precise digital signatures of a soil's mineralogy. They encode all of the information required to identify and quantify the various mineral components of these complex mixtures.

In recent years, various methods for Digital Soil Mineralogy have been developed and published in the peer-reviewed literature. These methods include the use of supervised and unsupervised machine learning to predict and interpret soil properties from XRPD data, the application of novel multivariate statistical methods, and automated approaches for mineral quantification. Each chapter in this course details one such method, providing code and data for reproducible examples that readers can adapt for their own projects or research.

Whilst all data and methods presented herein relate to soil samples, the methods can be considered transferable to all aspects of environmental mineralogy and beyond!


To run the examples provided throughout this document, it is recommended that you have R and RStudio installed on your machine. Once that is set up, additional extensions (packages) can be installed and loaded as they are needed. R and its extensions are designed to be multi-platform, so all material presented here should work on Windows, Mac, or Linux. The only package needed from the very start of the document is powdR; subsequent packages will be introduced in later chapters. To install powdR from CRAN, use install.packages("powdR").
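Because packages are introduced chapter by chapter, it can be convenient to install a package only when it is missing and then load it. The function below is a minimal sketch, not part of powdR or any other package: the name load_packages() is hypothetical, and it relies only on base R's requireNamespace(), install.packages() and require().

```r
# Hypothetical helper (not part of powdR): install any missing packages
# from CRAN, then attach all of them to the current session.
load_packages <- function(pkgs) {
  # TRUE for each package whose namespace can already be loaded
  available <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  if (any(!available)) {
    install.packages(pkgs[!available])
  }
  # Attach each package; invisibly returns a named logical vector of successes
  invisible(vapply(pkgs, require, logical(1),
                   character.only = TRUE, quietly = TRUE))
}
```

For example, load_packages("powdR") would install powdR the first time it is run and simply load it on later runs.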


Code Conventions

This document contains many chunks of R code that provide reproducible examples that can be copied and pasted to run on your own computer, for instance:

# Summarise a vector of integers 1 to 10
summary(1:10)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    3.25    5.50    5.50    7.75   10.00

The R session information when compiling this book is shown below:

## R version 4.2.1 (2022-06-23 ucrt)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19042)
## Matrix products: default
## locale:
## [1] LC_COLLATE=English_United Kingdom.utf8 
## [2] LC_CTYPE=English_United Kingdom.utf8   
## [3] LC_MONETARY=English_United Kingdom.utf8
## [4] LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.utf8    
## attached base packages:
## [1] stats     graphics  grDevices utils     datasets  methods   base     
## loaded via a namespace (and not attached):
##  [1] bookdown_0.27    codetools_0.2-18 digest_0.6.28    R6_2.5.1        
##  [5] jsonlite_1.7.2   magrittr_2.0.1   evaluate_0.15    highr_0.9       
##  [9] stringi_1.7.6    rlang_0.4.11     rstudioapi_0.13  jquerylib_0.1.4 
## [13] bslib_0.3.0      rmarkdown_2.14   tools_4.2.1      stringr_1.4.0   
## [17] xfun_0.31        yaml_2.2.1       fastmap_1.1.0    compiler_4.2.1  
## [21] htmltools_0.5.2  knitr_1.39       sass_0.4.0
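To compare your own set-up with the one used to compile this book, base R's sessionInfo() prints a report in the same format. The short sketch below also extracts a few fields and compares your R version with 4.2.1. Note that treating 4.2.1 as a minimum is an assumption: the book only reports the version it was compiled with.

```r
# Print the full session report for your own machine
info <- sessionInfo()
print(info)

# Pull out individual fields from the returned "sessionInfo" object
info$R.version$version.string  # e.g. "R version 4.2.1 (2022-06-23 ucrt)"
info$running                   # operating system description

# Is your R at least as recent as the version used to compile the book?
getRversion() >= "4.2.1"
```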

Text outputs associated with R code are denoted by two hashes (##) by default, as in the examples above. This is for your convenience when you copy and run the code: the text output is ignored because it is commented out. Package names are in bold text (e.g., powdR), and inline code and filenames are formatted in a typewriter font (e.g., summary(1:10)). Function names can easily be identified by the parentheses that follow them (e.g., mean(1:10)).

What to expect

This document is divided into chapters that each detail specific aspects of Digital Soil Mineralogy. It starts with the basics of handling XRPD data in R and progresses to more advanced manipulation of such data that cannot be realised with proprietary XRPD software. Subsequent chapters provide specific examples of methods for Digital Soil Mineralogy, including high-throughput quantitative analysis, data mining, and cluster analysis. The chapters are as follows:

  • Chapter 1: Loading and handling XRPD data in R
  • Chapter 2: Quantitative analysis of XRPD data using full pattern summation
  • Chapter 3: The use of machine learning to predict and interpret soil properties from XRPD data
  • Chapter 4: The application of cluster analysis to identify discrete groups of soils based on mineralogy
  • Chapter 5: Identifying soil analogues for Martian mineralogy based on XRPD data
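As a taste of the idea behind full pattern summation (Chapter 2), the sketch below treats a measured pattern as a non-negative weighted sum of reference patterns and recovers the weights by least squares across every data point. It is a toy under stated assumptions. The two "phases" are synthetic Gaussian peaks rather than real minerals, the peak() helper is hypothetical, and the fit uses base R's optim() rather than powdR, whose fps() function implements the complete method.

```r
# Synthetic two-theta axis in degrees (a hypothetical scan range)
tth <- seq(5, 45, by = 0.02)

# Hypothetical helper: a Gaussian peak used to build toy reference patterns
peak <- function(x, centre, height, width = 0.1) {
  height * exp(-0.5 * ((x - centre) / width)^2)
}

# Toy reference patterns for two synthetic "phases" (not real minerals)
ref_a <- peak(tth, 20.9, 100) + peak(tth, 26.6, 400)
ref_b <- peak(tth, 27.5, 300) + peak(tth, 28.0, 150)
refs <- cbind(a = ref_a, b = ref_b)

# A "measured" mixture: 0.7 x phase A + 0.3 x phase B, plus small noise
set.seed(1)
measured <- 0.7 * ref_a + 0.3 * ref_b + rnorm(length(tth), sd = 0.5)

# Sum of squared residuals between measured and summed reference patterns
objective <- function(s) sum((measured - refs %*% s)^2)

# Minimise over the full pattern, keeping scaling factors non-negative
fit <- optim(c(0.5, 0.5), objective, method = "L-BFGS-B", lower = c(0, 0))
round(fit$par, 2)
```

Real patterns need further steps that this toy omits, for example peak alignment and the conversion of scaling factors into mineral concentrations.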

Each chapter contains reproducible R code along with written explanations. For those who prefer video tutorials, a number of embedded YouTube videos throughout the course material describe and explain the R code. With the exception of Chapter 1, all chapters are standalone, so there is no need to read everything!

About the authors

  • Benjamin Butler is a Digital Mineralogist at the James Hutton Institute, Aberdeen, UK. His interests centre on the use of XRPD to characterise the mineral composition of environmental mixtures such as soils, construction and demolition waste, and sea ice. An active R user, he is the author of the powdR package, which provides a range of methods for quantitative analysis of XRPD data using full pattern summation. Aside from working with data, he is involved in regular soil surveys on forestry land across Scotland, allowing him to follow soils from the digging to the databases.

  • Steve Hillier is a Soil Mineralogist at the James Hutton Institute, Aberdeen, UK. Steve conceived the concept of digital mineralogy when he realised that attempts to understand how soil properties relate to mineralogical properties did not need to involve complex intermediate steps, such as quantitative mineralogical analysis of XRD data. Instead, they could work directly with the more precise, uninterpreted primary XRD data, linking back to mineralogical interpretations at a later stage in the analysis. Steve is well known for his work on quantitative mineralogical analysis using full pattern methods, as exemplified by his success in the Reynolds Cup Round Robin. He is the outgoing Chair of IUSS Commission 2.4 on soil mineralogy (2018-2022) and has been actively promoting data-driven approaches to understanding the relationships between soil mineralogy and soil properties.


The research that comprises the bulk of this material, and the time required to create it, were kindly funded by the Macaulay Development Trust. Additional support from the International Union of Soil Sciences (IUSS) Stimulus Fund is gratefully acknowledged, as is the support of the Scottish Government's Rural and Environment Science and Analytical Services Division (RESAS).