Machine learning describes an array of computational and nested statistical methods whereby a computer can 'learn' and subsequently make predictions or identify patterns in data. With the increasing volume and variety of numerical data in the geosciences, and the widespread availability of the necessary computing power, machine learning techniques are a logical addition to the many approaches that can be applied to the search for ore deposits. The three core research chapters in this thesis develop the application of machine learning in the context of mineral exploration. Emphasis is placed on the Random Forests algorithm for mapping lithology in a range of settings and at a variety of stages in the exploration process. Information entropy is used to assist in both assessing and communicating complex combinations, and potential inaccuracy, of classification results. Throughout the thesis, methods are employed with future practical usage in mind, such that machine learning may be used by the geologist (as domain expert) in an objective manner. The first of these core studies uses the Random Forests algorithm to re-classify the solid geology lithology map of the Heron South project, located in the Eastern Goldfields of Western Australia. This study uses geophysical and remote sensing data in the absence of geochemical samples and geological ground truthing, as most of the project is under transported cover. This is characteristic of an early-stage, reconnaissance exploration project. A sparse sample of 1.6 percent of the total area is taken as training data, leaving most of the area's geology free to be reclassified. This study demonstrates that Random Forests, with proper consideration given to sampling and training data selection, can be used effectively to produce or improve geological mapping in little-explored areas.
Information entropy is shown to be valuable in predicting where classification is likely to be inaccurate or where a region is highly complex. The second core study uses Random Forests to produce a solid geology map of the Kliyul porphyry prospect in British Columbia, Canada, using a fusion of available geophysical and geochemical data, typical of a greenfields-stage exploration project. Soil and rock chip sample sites were taken as training data and used to classify the remainder of the project area. Assessment of the probability distributions produced by the Random Forests algorithm enabled regions with an elevated probability of intrusions (a key indicator lithology) to be mapped, even where not observed in training data. The results of this study highlight the value of a soft, ensemble classifier such as Random Forests, and the value to be gained from assessing the spatial distribution of class probabilities rather than viewing a final map as a solution in isolation. In the third and final core study, a range of training data sampling paradigms is tested in a data-rich area located in the Domes region of the Central African Copper Belt, which hosts the Sentinel (Cu) and Enterprise (Ni) deposits. This study simulates early- and advanced-stage exploration project maturity by incorporating a priori geological information. It culminates in the use of Random Forests to undertake an objective audit of the present company geological map. In addition, unsupervised clustering is used to produce a geological map in the absence of training data or constraints, by identifying the natural grouping of the data. The results of these studies highlight the importance of proper sample balancing and explore the repercussions of limited and/or non-representative training data. The use of information entropy as a proxy is developed to identify where a classification may depart from the domain represented by the training data.
The ranking of input data that is performed in association with the Random Forests classification can be used to improve clustering results by optimising dataset selection. Across the three core research chapters, a set of practical considerations and recommendations for explorers is provided. It is demonstrated that Random Forests can provide an objective audit and subsequent refinement of a pre-existing geological map. The expression of uncertainty using information entropy, and the assessment of class probabilities, can be used to appraise the results of the machine learning analyses. This includes validation in the case of complex outcome combinations, and the generation of new insights. Ranking of input datasets via Random Forests can enhance understanding of the data and improve both Random Forests classification and clustering results. With proper selection (for example, immobile trace elements) and scaling of datasets, clustering can produce results that correspond well with lithology. Studies presented in this thesis use data from active exploration projects, and methods are distilled into streamlined workflows using industry-standard software and data formats. In summary, these methods, previously the domain of computer and data scientists, are now developed to be more widely accessible to mineral explorers.
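The relationship between Random Forests class probabilities, information entropy and input ranking summarised above can be illustrated with a minimal sketch. It is not the workflow used in the thesis: it assumes Python with scikit-learn, uses synthetic data from `make_classification` as a hypothetical stand-in for gridded geophysical and geochemical inputs, and the helper name `normalised_entropy` is introduced here for illustration only.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for co-registered input layers (hypothetical data,
# not from any of the projects described in the abstract).
X, y = make_classification(
    n_samples=2000, n_features=8, n_informative=5,
    n_classes=4, n_clusters_per_class=1, random_state=0,
)

# Sparse training subset (5 percent here), loosely echoing the small
# training fractions discussed in the abstract.
rng = np.random.default_rng(0)
train_idx = rng.choice(len(X), size=100, replace=False)

forest = RandomForestClassifier(n_estimators=200, random_state=0)
forest.fit(X[train_idx], y[train_idx])

# Soft output: one probability per class for every sample.
proba = forest.predict_proba(X)


def normalised_entropy(p):
    """Shannon entropy of each row of p, scaled to [0, 1] by log(n_classes)."""
    p = np.clip(p, 1e-12, 1.0)  # avoid log(0) for classes with zero votes
    return -(p * np.log(p)).sum(axis=1) / np.log(p.shape[1])


# High values flag samples where the ensemble is split between classes.
entropy = normalised_entropy(proba)

# Hard class map and a ranking of inputs by impurity-based importance.
predicted = forest.classes_[proba.argmax(axis=1)]
ranking = np.argsort(forest.feature_importances_)[::-1]
```

In a mapping context, `entropy` would be plotted alongside `predicted` so that areas of low confidence are visible next to the class map, and `ranking` could guide which inputs to retain for clustering.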
Copyright 2021 the author

Chapter 3 appears to be the equivalent of a post-print version of an article published as: Kuhn, S., Cracknell, M. J., Reading, A. M., 2018. Geophysics, 83(4), B183-B193. Copyright the authors. Published by the Society of Exploration Geophysicists. All article content, except where otherwise noted (including republished material), is licensed under a Creative Commons Attribution 4.0 Unported License (CC BY). See http://creativecommons.org/licenses/by/4.0

Chapter 4 appears to be the equivalent of a pre-print version of an article published as: Kuhn, S., Cracknell, M. J., Reading, A. M., Sykora, S., 2020. Identification of intrusive lithologies in volcanic terrains in British Columbia by machine learning using random forests: The value of using a soft classifier, Geophysics, 85(6), B249-B258. Copyright the authors. Published by the Society of Exploration Geophysicists. All article content, except where otherwise noted (including republished material), is licensed under a Creative Commons Attribution 4.0 Unported License (CC BY). See http://creativecommons.org/licenses/by/4.0

Chapter 5 appears to be the equivalent of a pre-print version of an article published as: Kuhn, S., Cracknell, M. J., Reading, A. M., 2019. Lithological mapping in the Central African Copper Belt using Random Forests and clustering: Strategies for optimised results, Ore Geology Reviews, 112, 103015. Copyright 2019 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license (http://creativecommons.org/licenses/BY-NC-ND/4.0/).