Using pattern recognition entropy to select mass chromatograms to prepare total ion current chromatograms from raw liquid chromatography-mass spectrometry data
The total ion current chromatogram (TICC) obtained by liquid chromatography-mass spectrometry (LC-MS) is often extremely complex and ‘noisy’ in appearance, particularly when an electrospray ionization source is used. Accordingly, meaningful qualitative and quantitative information in LC-MS often has to be extracted by data mining. Here, one or more higher-quality mass chromatograms are identified and combined to form a reduced TICC, in which much of the background mass noise is eliminated and from which quantitative data for chromatographic peaks can be obtained. Pattern Recognition Entropy (PRE) is a new application of Shannon’s statistical concept of entropy. PRE is both a pattern recognition tool and a summary statistic that can be used to identify information-containing mass chromatograms: higher-quality data (mass chromatograms with higher signal-to-noise ratios) usually have lower PRE values. Reduced TICCs are obtained by first calculating the PRE values of the component mass chromatograms. A plot of PRE value vs. m/z for the mass chromatograms is then generated, and the resulting band of PRE values is fit to a piecewise spline polynomial. The distribution of the differences between the individual PRE values and the spline fit is then used to select ‘good’ mass chromatograms. For the data set considered herein, the best results were obtained with a threshold of 0.5 standard deviations below the average value (the value of the spline). PRE reduces the number of component mass chromatograms by an order of magnitude while preserving most of the chemical information they collectively contain. It can also distinguish between mass chromatograms of chemically similar species. PRE is arguably a less computationally intensive alternative to the widely used CODA algorithm for variable reduction. It produces reduced TICCs of comparable, if not higher, quality, and it requires only a single user input for variable selection.
Reduced TICCs generated by PRE can be smoothed to further improve their signal-to-noise ratios.
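The selection procedure described above can be sketched in a few lines of NumPy. This is only an illustration under stated assumptions, not the authors' implementation. The abstract does not give the exact PRE normalization, the spline type, or the smoothing method. Here PRE is assumed to be the Shannon entropy (in bits) of a mass chromatogram whose absolute intensities are scaled to sum to one, the spline is a least-squares cubic regression spline with evenly spaced knots, and smoothing is a plain moving average. All function names and the `n_knots` and `window` parameters are hypothetical.

```python
import numpy as np

def pre_value(chromatogram):
    """Shannon entropy (bits) of one mass chromatogram.

    Assumption: absolute intensities are scaled to sum to 1 and
    treated as a probability distribution.
    """
    x = np.abs(np.asarray(chromatogram, dtype=float))
    total = x.sum()
    if total == 0:
        # Empty channel: assign the maximum entropy so it is never selected.
        return float(np.log2(x.size))
    p = x / total
    p = p[p > 0]  # 0 * log(0) is taken as 0
    return float(-(p * np.log2(p)).sum())

def fit_cubic_spline(mz, y, n_knots=5):
    """Least-squares cubic spline (truncated power basis), evaluated at mz."""
    mz = np.asarray(mz, dtype=float)
    t = (mz - mz.min()) / (mz.max() - mz.min())   # rescale to [0, 1]
    knots = np.linspace(0.0, 1.0, n_knots + 2)[1:-1]  # interior knots only
    basis = [np.ones_like(t), t, t**2, t**3]
    basis += [np.clip(t - k, 0.0, None) ** 3 for k in knots]
    design = np.column_stack(basis)
    coef, *_ = np.linalg.lstsq(design, np.asarray(y, dtype=float), rcond=None)
    return design @ coef

def select_chromatograms(data, mz, n_std=0.5, n_knots=5):
    """Return (mask, pre) for data of shape (n_mz, n_scans).

    A channel is kept when its PRE lies more than n_std standard
    deviations of the residuals below the spline fit.
    """
    pre = np.array([pre_value(row) for row in data])
    residuals = pre - fit_cubic_spline(mz, pre, n_knots)
    mask = residuals < -n_std * residuals.std()
    return mask, pre

def reduced_ticc(data, mask):
    """Sum the selected mass chromatograms into a reduced TICC."""
    return np.asarray(data)[mask].sum(axis=0)

def smooth(signal, window=5):
    """Moving-average smoothing (an assumed, illustrative choice)."""
    kernel = np.ones(window) / window
    return np.convolve(signal, kernel, mode="same")
```

In this sketch the single user input is `n_std`, corresponding to the 0.5 standard deviation threshold reported above. A typical call would be `mask, pre = select_chromatograms(data, mz)`, followed by `smooth(reduced_ticc(data, mask))`.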