Posted on 2023-05-16, 22:53. Authored by Li, J; Hibbert, DB; Fuller, S; Vaughn, G.
Matching spectra is necessary for database searching, identifying the source of an unknown sample, structure elucidation, and classifying spectra. A direct method of matching is to compare two digitized spectra point by point, yielding a parameter that quantifies their degree of similarity or dissimilarity. The indices studied here are the squared correlation coefficient and the squared Euclidean cosine, each applied both to the raw spectra and to the first differences of absorbance. We show that spectra do not meet the assumptions required for the usual statistical interpretation of the correlation coefficient; in particular, spectral intensities are not normally distributed variables. It is therefore not valid to apply Student's t-test to the correlation coefficient between two spectra in order to test the null hypothesis that they are uncorrelated. We investigated how the similarity indices respond to systematically changing the mean (center) and standard deviation (width) of a single Gaussian peak relative to a reference Gaussian peak, and to changing one peak or many peaks in a simulated 10-peak spectrum. The squared Euclidean cosine is the least sensitive to these changes, and the first-difference indices are the most sensitive to changes in peak mean and standard deviation. Shifting the center of a peak affects the indices more than broadening it, although narrowing a peak does change the indices significantly. If these indices are used to match spectra, we recommend choosing appropriate spectral windows so that the result is not diluted by regions that show no significant change.
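A minimal NumPy sketch of the point-by-point comparison described above may help fix the definitions. It assumes the squared Euclidean cosine is the squared uncentered cosine, (Σxy)² / (Σx² Σy²), i.e. the squared correlation coefficient without mean-centering, and that "first difference" means x[i+1] − x[i]. The helper names (`corr_squared`, `cosine_squared`, `similarity_indices`, `gaussian`), the 0 to 100 axis, the peak positions and widths, and the window are hypothetical illustrations, not the parameters of the authors' 10-peak simulation.

```python
import numpy as np


def corr_squared(x, y):
    """Squared Pearson correlation coefficient of two equal-length spectra."""
    xc = x - x.mean()
    yc = y - y.mean()
    return float(np.dot(xc, yc) ** 2 / (np.dot(xc, xc) * np.dot(yc, yc)))


def cosine_squared(x, y):
    """Squared Euclidean cosine (assumed definition): r**2 without centering."""
    return float(np.dot(x, y) ** 2 / (np.dot(x, x) * np.dot(y, y)))


def similarity_indices(x, y):
    """Four indices: on raw spectra and on first differences x[i+1] - x[i]."""
    dx, dy = np.diff(x), np.diff(y)
    return {
        "r2": corr_squared(x, y),
        "cos2": cosine_squared(x, y),
        "r2_diff": corr_squared(dx, dy),
        "cos2_diff": cosine_squared(dx, dy),
    }


def gaussian(axis, center, sd, height=1.0):
    """Hypothetical helper: one Gaussian band sampled on the given axis."""
    return height * np.exp(-0.5 * ((axis - center) / sd) ** 2)


# Hypothetical sampling grid: 1001 points from 0 to 100 (arbitrary units).
axis = np.linspace(0.0, 100.0, 1001)
reference = gaussian(axis, 50.0, 5.0)

# Single-peak comparisons: shift the center, or change the width.
for label, test in [
    ("shift +2", gaussian(axis, 52.0, 5.0)),
    ("width 5 -> 7", gaussian(axis, 50.0, 7.0)),
    ("width 5 -> 3", gaussian(axis, 50.0, 3.0)),
]:
    scores = similarity_indices(reference, test)
    print(label, {k: round(v, 3) for k, v in scores.items()})

# Dilution: two-peak spectra in which only the peak at 75 moves.
ref2 = gaussian(axis, 25.0, 5.0) + gaussian(axis, 75.0, 5.0)
test2 = gaussian(axis, 25.0, 5.0) + gaussian(axis, 77.0, 5.0)
window = axis >= 55.0  # hypothetical window covering only the changed peak
full = similarity_indices(ref2, test2)
windowed = similarity_indices(ref2[window], test2[window])
print("full-range cos2:", round(full["cos2"], 3))
print("windowed cos2:", round(windowed["cos2"], 3))
```

In this toy case the shifted peak lowers the squared cosine least and the first-difference indices most, in line with the ordering reported above. The windowed comparison scores lower than the full-range one because the unchanged peak at 25 inflates the full-range dot products, which is the dilution effect the recommendation refers to.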