10.1. available at the R-project homepage [42]. Peak lists were aligned by
the msc.peaks.align command of caMassClass and transformed into a binary mass table where rows represented all unique masses of the aligned spectra set and every column represented the spectrum of one sample. The size of the mass ranges defining a unique peak in the alignment, designated as bin size, was restricted to a maximum of 2,000 ppm. Among other features, the algorithm of the msc.peaks.align command minimizes the bin size in the given range, maximizes the space between bins and ensures that no two peaks of the same spectrum are in the same bin. For the calculation of qualitative data, the presence of the respective mass in the spectrum of a sample was marked
with 1, absence with 0, i.e. all mass intensities were removed. These tables were the basis for the calculation of distances (R-routine ‘dist’, parameter ‘binary’ for the distance measure), which were used for the construction of cladograms, Sammon plots [43], and k-means cluster analysis using the R-routines ‘hclust’ (parameter ‘ward’ for the agglomeration method) [44], ‘sammon’ (used with default settings) and ‘kmeans’ (three initial cluster centers, maximum of 100 iterations, Hartigan-Wong algorithm [45]).

Statistical analysis with ClinProTools software

Raw spectra from the specimens in Table 3 were imported into ClinProTools 3.0 software for statistical analysis. Each species was represented by 20 to 24 spectra to cover measurement variability. The multiple spectra of each species were imported as one “class” for that species. ClinProTools performed a normalization and recalibration of the mass spectra before further analysis, thereby significantly reducing the effects of measurement variability. Peak picking was performed on the overall average spectrum over the whole mass range (signal-to-noise threshold of 5). Further spectra processing
parameters were: baseline correction (convex hull), resolution (300 ppm), and smoothing (Savitzky Golay, 5 cycles with 2 m/z width). Multivariate statistical analyses were performed using the four supervised algorithms and the PCA implemented in ClinProTools. For the Genetic Algorithm, models with a maximum of 5 peaks and 50 generations were calculated, and k-nearest neighbor (kNN) classification was performed with 5 neighbors. For the Support Vector Machine, the maximum number of peaks was likewise set to 5, and kNN classification was performed with 5 neighbors. The Supervised Neural Network was calculated with automated optimization of the peak number (maximum 25). For the Quick Classifier, a maximum of 25 differentiating peaks was allowed; peaks were selected by their ranking in the t-test. For PCA, “level” scaling was selected.

Acknowledgements

We are grateful to Gabi Echle, Katja Fischer, Michaela Ganss, and Robert Schneider for their excellent technical assistance. This work was supported by the EU, EAHC Agreement – No 2007 204.

References

1.