The streamlined "scoping" of analytical methods, leading to reductions in time and cost, is one of the major challenges in chromatographic method development in the pharmaceutical and other industries. The screening (or scoping) of initial chromatographic parameters, such as stationary phase candidates and rough mobile phase conditions, can be accelerated by the in-silico prediction of chromatographic retention. Quantitative Structure-Retention Relationships (QSRRs), which allow the retention time of analytes to be predicted from their chemical structures alone, offer an attractive way to speed up the scoping phase of method development by minimising the number of initial experiments required at this stage. This study investigates the establishment of QSRR models leading to rapid, robust and rugged scoping method development in ion chromatography (IC), a well-established technique for the analysis of inorganic and organic ions. The study commenced with updating, by a new "porting" methodology, the retention data of anions embedded in the Virtual Column® software (Thermo Fisher Scientific, Sunnyvale, CA, USA), a commercial tool for simulating and optimising IC separations. The retention data of inorganic and small organic anions on three Thermo Fisher Scientific columns (49 ions on AS20, 41 on AS19 and 40 on AS11HC) were recalibrated with porting equations, which express the relationship between old (embedded) and new (updated) retention data. The three porting equations for each column under study were derived by performing three isocratic separations with six representative analyte ions (chloride, bromide, iodide, perchlorate, sulfate and thiosulfate).
Average errors in retention times for ten test anions under three eluent concentrations on the three columns were found to be less than 1.3%, showing that the updated retention data were accurate and reliable enough to be employed as anion datasets for the subsequent QSRR studies. The QSRR approach was integrated with the well-known linear solvent strength (LSS) model in IC, which correlates retention factor with eluent concentration: \(\log k = a - b \log[\mathrm{eluent}]\). Predicting the two retention parameters (a and b) of this model has the great advantage of allowing the retention times (\(t_R\)) of ions to be predicted under all eluent compositions. The first QSRR study was performed on the abovementioned updated retention data of inorganic and small organic anions. Molecular descriptors were calculated for each test analyte, and an evolutionary algorithm (EA) was employed to extract the most relevant molecular descriptors, followed by multiple linear regression (MLR) to generate QSRR models correlating the two parameters (a and b) with the selected molecular descriptors. Six QSRR models (one each for a and b on the three columns AS20, AS19 and AS11HC) were successfully generated, resulting in good predictive performance for a- and b-values (\(R^2\) > 0.98 and root-mean-square error (RMSE) < 0.11 for training sets; \(Q^2_{ext(F3)}\) > 0.7 and root-mean-square error of prediction (RMSEP) < 0.4 for external test sets) and hence for \(t_R\)-values (\(R^2\) of 0.98, RMSE of 0.89 min, \(Q^2_{ext(F3)}\) of 0.96 and RMSEP of 1.18 min). The main objective in QSRR modelling is to build models with high predictive power, allowing reliable retention prediction for unknown compounds across the chromatographic space.
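As an illustration of why predicting a and b is sufficient, the LSS relationship above can be evaluated at any eluent concentration once the two parameters are known. The following is a minimal sketch, not the procedure used in the study: it assumes base-10 logarithms, the standard relation \(t_R = t_0(1 + k)\) with a known column void time \(t_0\), and ordinary least squares for recovering a and b from isocratic runs. The function names, the mM unit and the numbers in the usage note are hypothetical.

```python
import math


def retention_factor(a, b, eluent_mM):
    """Retention factor k from the LSS model: log10(k) = a - b * log10([eluent])."""
    return 10 ** (a - b * math.log10(eluent_mM))


def retention_time(a, b, eluent_mM, t0_min):
    """Retention time from k, using t_R = t0 * (1 + k); t0 is the void time (assumed known)."""
    return t0_min * (1.0 + retention_factor(a, b, eluent_mM))


def fit_lss(eluents_mM, k_values):
    """Least-squares estimate of (a, b) from isocratic (eluent, k) pairs.

    Fits the straight line log10(k) = a - b * log10([eluent]); needs at least
    two distinct eluent concentrations.
    """
    xs = [math.log10(c) for c in eluents_mM]
    ys = [math.log10(k) for k in k_values]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, -slope  # a is the intercept, b is the negated slope
```

For example, with hypothetical values a = 2.0 and b = 1.0, a 10 mM eluent gives log k = 1, so k = 10, and a void time of 2 min gives a retention time of 22 min. The same two parameters then yield the retention time at any other eluent concentration without further experiments.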
With the aim of enhancing the predictive power of the models, an approach called "federation of local models" was employed in generating QSRR models. Here, a local model is created for each target ion using only structurally or chromatographically similar ions from the dataset, as opposed to the classical approach in which all the compounds in the dataset are used to derive a "global" predictive model. The role of similarity in QSRR in IC was investigated systematically by employing a dataset (on the Thermo Fisher Scientific CS17 column) consisting of larger molecular weight cations (molecular mass up to 507) of interest to the pharmaceutical industry, along with the anion datasets used above. The QSRR models for a- and b-values were developed by employing a genetic algorithm-partial least squares (GA-PLS) regression tool for descriptor selection and modelling. This approach was preferred to MLR in order to minimise the problems of chance correlation, over-fitting and multi-collinearity of descriptors associated with modelling datasets that have many more descriptors than samples. To build the local models, ions similar to each target ion were clustered into training sets prior to QSRR modelling using various similarity measures, such as the Tanimoto similarity (TS) index, the retention factor ratio (or k-ratio), or a combination of TS and k-ratio. First, the pair-wise TS score was utilised. The TS is a popular similarity measure based on a comparison of the 2-dimensional fingerprints of two molecules; the TS score varies between 0 and 1, with 1 representing 100% similarity. When a TS threshold of 0.6 was applied, predictive and reliable QSRR models were produced (\(Q^2_{ext(F2)}\) > 0.8 and RMSEP < 0.1), and hence accurate retention time predictions with an RMSEP of 0.44 min. Second, the k-ratio, proposed as a chromatographic similarity index, was investigated.
Ions eluted in the vicinity of a given target ion were assumed to be chromatographically similar to it, and such ions exhibit k-ratio values close to 1. In addition to successful QSRR modelling of inorganic and small organic anions on two columns (AS20 and AS19), the k-ratio-based ion clustering provided the most accurate QSRR models for cations on the CS17 column, with excellent predictive power for retention time prediction (\(Q^2_{ext(F2)}\) of 0.99 and RMSEP of 0.22 min). However, the k-ratio method is impractical for retention prediction of unknown ions because no retention information is available for those ions. To address this limitation, a dual filter-based localised QSRR modelling approach was developed, in which a k-ratio (< 1.2) index was combined with both a TS (> 0.5) and a ΔlogP (< 0.4) index. Predictive QSRR models for a-, b- and \(t_R\)-values were created successfully (\(Q^2_{ext(F2)}\) of 0.96, 0.95 and 0.96; RMSEP of 0.06, 0.02 and 0.38 min). Additionally, the dual filter-based clustering method allowed more ions (50 ions) to be eligible for QSRR modelling than the TS approach (22 ions).
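The similarity measures used for clustering can be sketched in a few lines. This is only an illustration under stated assumptions, not the implementation used in the study: fingerprints are represented as sets of "on" bit indices, the k-ratio is taken as the larger retention factor divided by the smaller (so that identical retention gives 1), and ΔlogP is taken as the absolute difference in logP. How the filters are chained together is not specified above, so the two filters are shown as separate hypothetical helper functions with the stated thresholds as defaults.

```python
def tanimoto(fp_a, fp_b):
    """Tanimoto similarity of two fingerprints given as sets of 'on' bit indices."""
    a, b = set(fp_a), set(fp_b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty fingerprints (assumption)
    return len(a & b) / len(union)


def k_ratio(k_a, k_b):
    """Ratio of two retention factors, larger over smaller (assumed definition)."""
    return max(k_a, k_b) / min(k_a, k_b)


def similar_by_structure(target, candidates, ts_min=0.5, dlogp_max=0.4):
    """Keep candidates with TS > ts_min and |logP difference| < dlogp_max."""
    return [
        c for c in candidates
        if tanimoto(target["fp"], c["fp"]) > ts_min
        and abs(target["logp"] - c["logp"]) < dlogp_max
    ]


def similar_by_retention(k_reference, candidates, k_ratio_max=1.2):
    """Keep candidates whose k-ratio against a reference retention factor is < k_ratio_max."""
    return [c for c in candidates if k_ratio(k_reference, c["k"]) < k_ratio_max]
```

Each ion is assumed to be a dictionary with hypothetical keys `fp`, `logp` and `k`. The structural filter needs no retention data for the target ion, which is what makes it usable for unknown ions, whereas the retention filter needs a reference retention factor from somewhere.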
Copyright 2017 the author
Chapter 3 appears to be the equivalent of a post-print version of an article published as: Park, S. H., Shellie, R. A., Dicinoski, G. W., Schuster, G., Talebi, M., Haddad, P. R., Szucs, R., Dolan, J. W., Pohl, C. A., 2016. Enhanced methodology for porting ion chromatography retention data, Journal of Chromatography A, 1436, 59-63
Chapter 4 appears to be the equivalent of a post-print version of an article published as: Park, S. H., Haddad, P. R., Talebi, M., Tyteca, E., Amos, R. I. J., Szucs, R., Dolan, J. W., Pohl, C. A., 2016. Retention prediction of low molecular weight anions in ion chromatography based on quantitative structure-retention relationships applied to the linear solvent strength model, Journal of Chromatography A, 1486, 68-75
Chapter 5 appears to be the equivalent of a post-print version of an article published as: Park, S. H., Talebi, M., Amos, R. I. J., Tyteca, E., Haddad, P. R., Szucs, R., Pohl, C. A., Dolan, J. W., 2016. Towards a chromatographic similarity index to establish localised quantitative structure-retention relationships for retention prediction. II Use of Tanimoto similarity index in ion chromatography, Journal of Chromatography A, 1523, 173-182
Chapter 6 appears to be the equivalent of a post-print version of an article published as: Park, S. H., Haddad, P. R., Amos, R. I. J., Talebi, M., Szucs, R., Pohl, C. A., Dolan, J. W., 2017. Towards a chromatographic similarity index to establish localised Quantitative Structure-Retention Relationships for retention prediction. III Combination of Tanimoto similarity index, logP, and retention factor ratio to identify optimal analyte training sets for ion chromatography, Journal of Chromatography A, 107-116