Statistical methodology for tab-charts : data reduction techniques in laser ablation analyses

Yung, CH

doi:10.25959/23241050.v1



thesis

posted on 2023-05-27, 12:31 authored by Yung, CH

Researchers at CODES and the School of Earth Sciences operate a laboratory that studies the composition of rock samples collected in the field. The rock samples are analysed by a machine that produces a time-series plot for each element (called a tab-chart), indicating the distribution of that element in the sample. In a tab-chart, a significant signal change implies a change in the composition of the sample, and a flat part implies a mineral layer (phase) in the sample. Currently, the researchers identify these features using their knowledge and experience. In some cases the features are not obvious or clear, and it is difficult to make a judgement. An automatic and systematic method is therefore required to help solve this problem. In total, 1848 tab-charts (66 samples × 28 elements per sample) of primary and secondary samples were provided by the School of Earth Sciences. These primary and secondary samples are not natural samples; they were created in the laboratory. Their tab-charts share a common shape: background noise (a flat part) in the first stage, a jump (a significant signal change) in the second stage, a plateau (another flat part) in the third stage and a drop (another significant signal change) in the last stage. Although this project focuses on these standard samples only, the analysis and results can be extended to real samples. The first four chapters describe the laboratory equipment, the mechanism and process of the geological analysis of a sample, and the tab-chart itself. These chapters are provided for reference only and are not the main interest of the project, which concentrates on the mathematical analysis of the tab-chart. In mathematical and statistical terms, the problem described above is a change point detection (analysis) or time series segmentation problem.
Many methods have been proposed in the literature to solve this problem, including cumulative sums of differences (CUSUM), perceptually important points (PIP), fuzzy set theory and genetic algorithms. To the best of my knowledge, most of these methods focus on change point detection only. In addition to detection, the researchers would like a method that also provides statistical summaries of each flat part of the tab-chart (i.e. the mean, standard deviation and trend of the element amount in the layer). A time series model is therefore a good choice for achieving both targets. Time series models also have the advantage of being easy to implement in a worksheet. However, some algorithms and modifications are needed so that the time series model can identify change points in the tab-chart. Among the various time series models, the linear Holt exponential smoothing model was selected after comparing several common and popular alternatives. The simple exponential smoothing model has only one estimate (equation), the smooth, and one parameter, α. It was rejected because signal changes (jump and drop) have steep slopes while flat parts (background and plateau) have gentle slopes, and a single estimate cannot reflect this slope property of the tab-chart. The damped-trend linear exponential smoothing model has two estimates (smooth and trend) and three parameters (α, β and γ). Although two estimates are enough to reflect the slope property, three parameters may complicate the analysis, and a better choice is available: the linear Holt exponential smoothing model, which also has two estimates (two equations) but only two parameters (α and β).
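To make the two-equation structure concrete, here is a minimal Python sketch of the standard (unmodified) Holt linear method in its usual textbook form, where the smooth (level) estimate is ℓ_t = α y_t + (1 − α)(ℓ_{t−1} + b_{t−1}) and the trend estimate is b_t = β(ℓ_t − ℓ_{t−1}) + (1 − β) b_{t−1}. The function name `holt_linear` and the initialisation (smooth = first observation, trend = first difference) are assumptions for illustration, not necessarily the choices made in the thesis.

```python
def holt_linear(y, alpha, beta):
    """Holt's linear exponential smoothing (textbook form).

    Returns two lists, the smooth (level) and trend estimates, one entry
    per observation. Initialisation is an assumption: smooth = y[0],
    trend = y[1] - y[0].
    """
    if len(y) < 2:
        raise ValueError("need at least two observations")
    level, trend = float(y[0]), float(y[1] - y[0])
    smooth, trends = [level], [trend]
    for obs in y[1:]:
        prev_level = level
        # Smooth (level) equation: blend the new observation with the
        # one-step-ahead forecast from the previous level and trend.
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        # Trend equation: blend the latest change in level with the old trend.
        trend = beta * (level - prev_level) + (1 - beta) * trend
        smooth.append(level)
        trends.append(trend)
    return smooth, trends


if __name__ == "__main__":
    # Hypothetical toy series: a flat background followed by a step up.
    series = [1.0] * 5 + [10.0] * 5
    s, b = holt_linear(series, alpha=0.5, beta=0.3)
    for obs, lvl, tr in zip(series, s, b):
        print(f"{obs:6.2f} {lvl:8.3f} {tr:8.3f}")
```

With a single fixed pair (α, β), the trend estimate rises during a step and decays on flat parts; the slope information carried by this second equation is what the simple exponential smoothing model lacks.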
This model is not guaranteed to be the best choice, but results and findings from it can help in exploring other time series models in further studies. The linear exponential smoothing model is modified before it is fitted to the tab-chart. The modified model has variable (dynamic) parameters α and β and a threshold value T. During fitting, if the trend estimate of the model exceeds the threshold value, the variable parameters take the values α1 and β1; otherwise they take the values α2 and β2. The reason for this policy is the difference in slope between significant signal changes (i.e. jump and drop) and flat parts (i.e. background and plateau). The parameters and the threshold value are adjusted manually until the model fits the tab-chart well. After fitting the model to all tab-charts, it was found that the threshold value T is more influential in finding a well-fitted model than the two parameters α and β. In addition, the fitted curve is spiky when the threshold value is small and becomes smoother as the value increases. Note that this parameter policy is only an initial trial and is not perfect; the experimental results reveal its drawbacks. For convenience, this modified linear smoothing model is referred to as the HOLT model henceforth. The first algorithm for detecting change points (time series segmentation) with the HOLT model is called the classification method. Its main idea is as follows: a change in the variable parameters (α and β) indicates a stage change in the tab-chart (i.e. from a significant signal change to a flat part or vice versa). For example, in a standard sample the parameters change from (α2, β2) to (α1, β1) as the tab-chart moves from the background stage to the jump stage.
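The threshold policy and the classification method can be sketched in Python as follows. This is a hypothetical reconstruction from the description above, not the thesis implementation: the names `holt_switching` and `parameter_switches` are illustrative, the parameter pair at each step is assumed to be chosen from the previous trend estimate, and the absolute value |trend| is assumed to be compared with T so that drops (negative trends) also trigger the steep pair (α1, β1).

```python
def holt_switching(y, T, steep_params, flat_params):
    """Threshold-switching Holt model (hypothetical sketch of the HOLT idea).

    steep_params = (alpha1, beta1), used when |trend| > T (signal change).
    flat_params  = (alpha2, beta2), used otherwise (flat part).
    Returns smooth, trend and a list of booleans recording whether the
    steep parameters were in force at each time step.
    """
    if len(y) < 2:
        raise ValueError("need at least two observations")
    level, trend = float(y[0]), float(y[1] - y[0])
    smooth, trends, steep_flags = [level], [trend], [abs(trend) > T]
    for obs in y[1:]:
        # Assumption: choose the parameter pair from the previous trend estimate.
        steep = abs(trend) > T
        alpha, beta = steep_params if steep else flat_params
        prev_level = level
        level = alpha * obs + (1 - alpha) * (prev_level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        smooth.append(level)
        trends.append(trend)
        steep_flags.append(steep)
    return smooth, trends, steep_flags


def parameter_switches(steep_flags):
    """Classification method: indices where the parameter pair changes."""
    return [t for t in range(1, len(steep_flags))
            if steep_flags[t] != steep_flags[t - 1]]


if __name__ == "__main__":
    # Hypothetical toy tab-chart: background, jump, plateau, drop.
    series = [1.0] * 10 + [20.0] * 10 + [1.0] * 10
    _, _, flags = holt_switching(series, T=0.5,
                                 steep_params=(0.9, 0.9),
                                 flat_params=(0.2, 0.1))
    print(parameter_switches(flags))
```

Because the switch is triggered only after the trend estimate has crossed T, the detected change points lag the true stage boundaries by a few observations, and the lag depends on T and on (α2, β2). This is consistent with the observation above that T has a strong influence on the fit.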
The classification method is neither practical nor automatic for the researchers, because it requires manual adjustment of the parameters beforehand. However, it serves two purposes. Firstly, it is a step towards developing an automatic classification method. Secondly, it is used as a tool to analyse the data reduction and fitting error of the HOLT model. The second algorithm for detecting change points with the HOLT model is called the classification rules. This algorithm is automatic because it uses a set of rules to divide the tab-chart into stages. The classification rules are developed from the classification method as follows. After each tab-chart is well fitted by the HOLT model, the trend estimate is plotted against the smooth estimate for all tab-charts, and the rules are drawn by comparing these trend-smooth graphs with the tab-charts. In the trend-smooth graphs, the background stage has small trend and smooth values; the jump stage has larger trend and smooth values; the plateau stage has large smooth values but small trend values; and the drop stage has large negative trend values. The performance of the classification rules is verified by applying them to the tab-charts of the standard samples. Guidelines for evaluating performance are set out to minimise personal and biased judgement, since different people may judge some borderline cases differently. The success rate of the classification rules ranges from 45% to 80% across elements. After excluding tab-charts whose background and plateau levels are close, the success rate is at least 65%. This project provides a good method and platform for identifying change points, and one promising way to improve performance is to refine and modify the rules.
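The qualitative trend-smooth rules above can be expressed as a small decision function. The sketch below is a simplified, hypothetical encoding: the cut-off values `smooth_cut` and `trend_cut` are placeholders (the thesis derives its rules by comparing trend-smooth graphs with the tab-charts, and its actual boundaries are not given here), and `classify_point` and `segment` are illustrative names.

```python
def classify_point(smooth, trend, smooth_cut, trend_cut):
    """Assign one stage label from a (smooth, trend) pair.

    Simplified, hypothetical encoding of the trend-smooth rules:
    large negative trend -> drop; large positive trend -> jump;
    small trend with large smooth -> plateau; otherwise background.
    """
    if trend < -trend_cut:
        return "drop"
    if trend > trend_cut:
        return "jump"
    if smooth > smooth_cut:
        return "plateau"
    return "background"


def segment(labels):
    """Collapse consecutive identical labels into (label, start, end) runs."""
    runs = []
    for i, label in enumerate(labels):
        if runs and runs[-1][0] == label:
            runs[-1][2] = i             # extend the current run
        else:
            runs.append([label, i, i])  # start a new run
    return [tuple(run) for run in runs]


if __name__ == "__main__":
    # Hypothetical (smooth, trend) pairs, e.g. from a fitted HOLT model.
    pairs = [(0.5, 0.0), (0.6, 0.1), (5.0, 3.0), (9.0, 2.0),
             (10.0, 0.1), (10.1, 0.0), (6.0, -3.0), (1.0, -0.1)]
    labels = [classify_point(s, b, smooth_cut=3.0, trend_cut=1.0)
              for s, b in pairs]
    print(segment(labels))
    # -> [('background', 0, 1), ('jump', 2, 3), ('plateau', 4, 5), ('drop', 6, 6), ('background', 7, 7)]
```

A single fixed `smooth_cut` also illustrates why tab-charts with close background and plateau levels are hard for rules of this kind: when the two levels lie on the same side of the cut-off, the flat stages become indistinguishable by smooth value alone.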
Although the rules seem to play a more important role in performance than the parameters (α and β), an experiment should be carried out to investigate any effect of the parameters on performance. Both the classification method and the classification rules suffer from misclassification: they perform very poorly on some tab-charts, in which many observations are wrongly classified. All of these tab-charts, however, have background and plateau levels that are very close. This is a possible cause of the misclassification; a gentle signal change (i.e. a jump or drop with a gentle slope) is another. Either way, the problem suggests how to improve both the method and the rules; for example, a separate set of rules could be developed for these tab-charts. In addition, the classification method performs better than the classification rules, because rules can hardly replace human visual judgement. However, the classification rules are automatic and more practical than the classification method. Data reduction is another characteristic of the HOLT model. The model can remove noise and variation from a tab-chart, so the researchers can obtain useful information by observing its fitted curve (values). Besides graphs, the thesis includes a quantitative analysis that examines the data reduction from another angle. The standard error formula s/√n (the sample standard deviation divided by the square root of the sample size) is used to calculate the standard error over the background and the plateau. This formula assumes independent data. Although many tab-charts are autocorrelated (ARIMA models can be fitted to them), this violation does not invalidate the use of the formula here, because the formula is not used to estimate statistical summaries of the underlying process.
It is used to measure the degree of variation and fluctuation in the background and the plateau. Over the plateau, the standard error of the HOLT model (i.e. of the mean of the fitted values) is smaller than that of the tab-chart (i.e. of the mean of the actual observations). This supports the claim that the HOLT model reduces variation over the plateau. However, the situation is reversed over the background. The stand...
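The standard-error comparison over a stage can be sketched in a few lines of Python. This is an illustrative sketch only: `standard_error` and `compare_standard_errors` are hypothetical names, the stage boundaries `start` and `end` are assumed to be already known (for example, from the classification method or rules), and, as noted above, s/√n is used purely as a measure of variation, not as an estimate for the autocorrelated underlying process.

```python
import math
import statistics


def standard_error(values):
    """Standard error s / sqrt(n), with s the sample standard deviation.

    Treats the values as independent; here it is only a measure of
    variation, not an estimate for the underlying autocorrelated process.
    """
    if len(values) < 2:
        raise ValueError("need at least two values")
    return statistics.stdev(values) / math.sqrt(len(values))


def compare_standard_errors(observed, fitted, start, end):
    """Standard errors of the raw observations and of the fitted values
    over the stage spanning indices start..end (inclusive)."""
    return (standard_error(observed[start:end + 1]),
            standard_error(fitted[start:end + 1]))


if __name__ == "__main__":
    # Hypothetical plateau: noisy observations and a smoother fitted curve.
    observed = [10.2, 9.7, 10.4, 9.9, 10.1, 9.6, 10.3]
    fitted = [10.0, 10.0, 10.1, 10.0, 10.0, 9.9, 10.0]
    se_obs, se_fit = compare_standard_errors(observed, fitted, 0, 6)
    print(f"SE observed: {se_obs:.4f}, SE fitted: {se_fit:.4f}")
```

A smaller standard error for the fitted values than for the observations over a stage is the pattern described above for the plateau; the reverse pattern is what is reported for the background.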

History

Publication status

Unpublished

Rights statement

Copyright 2008 the author - The University is continuing to endeavour to trace the copyright owner(s) and in the meantime this item has been reproduced here in good faith. We would be pleased to hear from the copyright owner(s). Thesis (MSc)--University of Tasmania, 2008. Includes bibliographical references