Reversed-phase liquid chromatography (RPLC) is the most commonly used chromatographic technique in the pharmaceutical industry. In RPLC, a computer-assisted approach can accelerate method development by predicting the retention behaviour of compounds of interest; this is then followed by an optimisation step to improve chromatographic performance. These objectives can be achieved using a combination of analytical routines and chemometric techniques, and among the available chemometric methods, quantitative structure-retention relationship (QSRR) modelling is a promising solution. QSRR aims to find meaningful relationships between chromatographic parameters and the molecular descriptors of the compounds of interest. QSRR has been applied to the characterisation of columns, the interpretation of retention mechanisms, the prediction of retention, and the identification of unknown compounds. The first part of this thesis illustrates the application of QSRR methodology to predict the retention times of compounds extracted from the literature. Several filters, including Tanimoto similarity, log D and log P, dual-filtering, and the ratio of retention factors (k-ratio), were applied to yield training sets for the construction of QSRR models, and the results were explored and compared. QSRR methodology seeks mathematical equations in which the chromatographic parameters are expressed as functions of molecular descriptors, which were generated in this case using Dragon software. To achieve that, a genetic algorithm (GA), employed as a feature selection method, was combined with partial least squares (PLS) regression to correlate measured retention data with various computed molecular descriptors. Among the constructed models, the filter based on the ratio of retention factors appeared to be the most effective approach to minimising prediction errors.
However, because the k-ratio filter is impractical, it cannot be used directly in QSRR modelling for retention prediction. Nevertheless, the concept of chromatographic similarity should be considered in QSRR modelling and implemented using a measure of similarity that adequately reflects the retention of compounds. The use of a Tanimoto similarity filter based on the chemical structures of the analytes resulted in unacceptable prediction errors because the level of similarity of compounds in the training sets to the target compounds was not sufficiently high (TS > 0.5). Retention predictions generated using log D and log P filters and a dual filter were not sufficiently accurate for screening purposes in chromatographic method development. The results indicated the need for larger and more homogeneous datasets for QSRR modelling, allowing sufficient numbers of compounds with a high pair-wise similarity to be found when using the Tanimoto filter. The second part of this thesis demonstrated and compared the contribution of different sources of molecular descriptors to QSRR prediction performance where training sets were formed using different filtering approaches. Molecular descriptors of compounds were calculated using both Dragon and VolSurf+. Filtering approaches including leave-one-out (LOO), training-test, local compound type (LCT), and local second dominant interaction after hydrophobicity (LSDI) were utilised to allocate compounds to training sets prior to deriving local QSRR models. Rather than being predicted directly, retention was predicted indirectly in this chapter by modelling the five solute coefficients of the Hydrophobic Subtraction Model (HSM). Among these four filtering approaches, the LSDI approach gave the best predictions of the five solute coefficients, followed by the LCT approach, demonstrating that approaches incorporating compound classification yielded better prediction of solute coefficients and, hence, retention.
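As an illustration of the Tanimoto similarity filter discussed above, the following minimal sketch selects a local training set for one target compound. It assumes each compound is represented by a set of "on" fingerprint bit indices; the function names, the example fingerprints, and the 0.5 cut-off with an inclusive comparison are illustrative assumptions, not the thesis's implementation.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto coefficient: shared 'on' bits divided by all 'on' bits."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # two empty fingerprints: treat as dissimilar
    return len(a & b) / len(union)


def select_training_set(target_fp, library, threshold=0.5):
    """Return names of library compounds whose similarity to the target
    meets the cut-off (threshold value is an assumption for illustration)."""
    return [name for name, fp in library.items()
            if tanimoto(target_fp, fp) >= threshold]


# Hypothetical fingerprints, used only to show the call pattern.
target = {1, 2, 3, 4}
library = {"A": {1, 2, 3, 5}, "B": {7, 8}, "C": {1, 2, 3, 4, 6}}
training_set = select_training_set(target, library)
```

With a small or heterogeneous library, few compounds pass the cut-off, which is consistent with the need for larger and more homogeneous datasets noted above.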
No significant difference was observed between the two descriptor sources, which gave comparable results. The HSM can provide sufficiently accurate retention prediction indirectly by fitting the predicted solute coefficients together with the column parameters. In the third and main part of the thesis, the utility of the QSRR approach for modelling solute coefficients in the HSM to predict retention times across a wide range of RPLC columns was demonstrated. Different approaches were utilised to cluster compounds for the training sets prior to deriving local QSRR models. The performance of each filtering approach in enhancing prediction accuracy was compared against the classical approach, in which a global QSRR model is derived using all the compounds in the training set without filtering. The predictive power of the established QSRR models was evaluated using a series of criteria and statistical analysis. In addition, a proof-of-concept demonstration of the use of QSRR was performed by predicting the retention times of five representative compounds on nine columns for which HS coefficients were known. It was shown that modelling the solute coefficient η′ in the HSM is all that is necessary to achieve sufficient prediction accuracy. The results showed improvement in prediction accuracy for the LSDI, LCT and Tanimoto approaches compared with global models derived from the whole dataset without filtering. Of these approaches, the LCT exhibited advantages in its ease of application and the larger number of compounds that can be modelled using this approach. However, the Tanimoto approach was the simplest to apply and, provided that the dataset contained a sufficient number of similar compounds meeting the TS score cut-off, it yielded sufficiently accurate results. The predictions reported are sufficiently accurate to meet the major objective of this study, namely to determine the likelihood of co-elution of analytes.
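For orientation, the HSM in its commonly published form relates retention to solute coefficients and column parameters as log(k/k_ref) = η′H − σ′S* + β′A + α′B + κ′C, where k_ref is the retention factor of a reference solute (ethylbenzene) on the same column. The sketch below shows how predicted solute coefficients could be combined with known column parameters to estimate a retention time. The dictionary keys, function names, and the use of t_R = t0(1 + k) with a user-supplied dead time are assumptions made for illustration and are not taken from the thesis.

```python
def hsm_log_alpha(solute, column):
    """log(k / k_ref) from the five HSM solute/column parameter pairs."""
    return (solute["eta"] * column["H"]        # hydrophobicity
            - solute["sigma"] * column["S"]    # steric resistance (S*)
            + solute["beta"] * column["A"]     # column H-bond acidity
            + solute["alpha"] * column["B"]    # column H-bond basicity
            + solute["kappa"] * column["C"])   # cation exchange


def predict_retention_time(solute, column, k_ref, t0):
    """Estimate retention time from the HSM.

    k_ref: retention factor of the reference solute on this column.
    t0: column dead time (same time units as the result).
    """
    k = k_ref * 10 ** hsm_log_alpha(solute, column)
    return t0 * (1 + k)
```

Setting all solute coefficients except `eta` to zero reduces the calculation to the η′H term alone, which mirrors the finding above that modelling η′ can be sufficient.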
In the last part of the thesis, QSRR models for retention index (RI) prediction were developed that offer useful predictive ability for compounds having the same molecular weight, allowing false positives to be removed during structure identification in non-targeted metabolomics (NTM). A novel dual-filtering approach combining structural similarity and chromatographic similarity was employed to build suitable training sets for target analytes for the accurate prediction of RI. The elimination of false positives from the list of potential candidate compounds produced by exact mass database searches is demonstrated using the proposed QSRR approach. The predicted RI values generated using the dual-filtering-based QSRR models were very well correlated with the measured data, presenting a percentage root mean square error of prediction (%RMSEP) value of 8.45%. By applying the retention index prediction filter to the modelled compounds in each exact-mass group, 53% of groups were found where at least one false positive could be eliminated. The results show that QSRR modelling using dual-filtering as the strategy for generating training sets permits robust and highly accurate prediction of retention index. Additionally, the results demonstrate that the developed QSRR strategy is capable of eliminating false positives, thereby increasing the confidence of structure identification in MS-based non-targeted metabolomics.
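The RI-based filtering step described above can be sketched as follows. This is a minimal illustration under stated assumptions: %RMSEP is taken here as the root mean square error divided by the mean measured value (the abstract does not give the exact definition), and the candidate filter uses a hypothetical percentage tolerance around the measured RI. All names and numbers are illustrative.

```python
import math


def percent_rmsep(measured, predicted):
    """Root mean square error of prediction as a percentage of the mean
    measured value (normalisation is an assumption, see lead-in)."""
    n = len(measured)
    rmse = math.sqrt(sum((p - m) ** 2 for m, p in zip(measured, predicted)) / n)
    return 100.0 * rmse / (sum(measured) / n)


def filter_candidates(measured_ri, candidates, tolerance_pct):
    """Keep candidates (name -> predicted RI) whose predicted RI lies within
    tolerance_pct of the measured RI; the rest are treated as false positives."""
    return [name for name, predicted_ri in candidates.items()
            if abs(predicted_ri - measured_ri) / measured_ri * 100.0 <= tolerance_pct]


# Hypothetical exact-mass group with two candidate structures.
kept = filter_candidates(500.0, {"X": 510.0, "Y": 700.0}, tolerance_pct=10.0)
```

In practice the tolerance would be chosen from the model's validated prediction error rather than fixed in advance.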
Copyright 2018 the author
Parts of chapter 3 appear to be the equivalent of a pre-print version of an article published as: Tyteca, E., Talebi, M., Amos, R., Park, S. H., Taraji, M., Wen, Y., Szucs, R., Pohl, C. A., Dolan, J. W., Haddad, P. R., 2017. Towards a chromatographic similarity index to establish localized quantitative structure-retention models for retention prediction: use of retention factor ratio, Journal of Chromatography A, 1486, 50-58
Chapter 4 appears to be the equivalent of a pre-print version of an article published as: Wen, Y., Talebi, M., Amos, R. I. J., Szucs, R., Dolan, J. W., Pohl, C. A., Haddad, P. R., 2018. Retention prediction in reversed phase high performance liquid chromatography using quantitative structure-retention relationships applied to the Hydrophobic Subtraction Model, Journal of Chromatography A, 1541, 1-11
Chapter 5 appears to be the equivalent of a pre-print version of an article published as: Wen, Y., Amos, R. I. J., Talebi, M., Szucs, R., Dolan, J. W., Pohl, C. A., Haddad, P. R., 2018. Retention index prediction using quantitative structure-retention relationships for improving structure identification in Non-Targeted Metabolomics, Analytical Chemistry, 90(15), 9434-9440