ER_antagonist_agonist.cnv
--------------------------------------------------------------------------------------
Illustration for classification problems. 

The ligand sets come from the DUD database and contain ER agonists and antagonists. Bayesian models were built for both descriptors and fingerprints. Clustering was based on the fingerprints. Try the different cluster levels and visualize the structures by hovering the mouse pointer over the [1]s or [2]s in the panel. The cholesterol scaffold is consistently sorted into one cluster.

One can also separate the compounds by running a SOM. Open SOM_all_descriptors, which is the SOM based on descriptors, and choose the cell population to see two hot spots. These two hot spots contain the agonists and the antagonists. Click on the cells around each hot spot and you will find that they also contain either agonists or antagonists. Choose Distance to selected cell to see how far apart the two hot spots are. Then pick sombits_Linear; this SOM is not as clean as the descriptor SOM. There are islands of hot spots. Click on those to see the different structures. I ran MCS for the three largest cells; in cells 2 and 3 in particular one can see the ER scaffold.

It is clear that for this dataset the descriptors are better at classifying the compounds, while the fingerprints are better at picking up the scaffolds. Since some scaffolds are shared between agonists and antagonists, the fingerprints are not good at classifying the compounds.
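
To make the last point concrete, here is a minimal sketch of a Bernoulli naive Bayes classifier on fingerprint bits. It is a generic textbook version, not the Bayesian model used in the project file. The fingerprints, bit meanings, and function names are all hypothetical toy values: bits 0-2 stand for a scaffold shared by both classes, bit 3 for a substituent seen only in agonists, and bit 4 for one seen only in antagonists.

```python
import math

def train_bernoulli_nb(fingerprints, labels, n_bits, alpha=1.0):
    """Fit class priors and per-bit probabilities with Laplace smoothing."""
    model = {}
    for cls in sorted(set(labels)):
        rows = [fp for fp, y in zip(fingerprints, labels) if y == cls]
        prior = len(rows) / len(labels)
        # P(bit i is on | class), smoothed so it is never exactly 0 or 1
        p_on = [
            (sum(1 for fp in rows if i in fp) + alpha) / (len(rows) + 2 * alpha)
            for i in range(n_bits)
        ]
        model[cls] = (prior, p_on)
    return model

def log_scores(model, fp, n_bits):
    """Return the unnormalized log posterior of each class for one fingerprint."""
    scores = {}
    for cls, (prior, p_on) in model.items():
        s = math.log(prior)
        for i in range(n_bits):
            s += math.log(p_on[i] if i in fp else 1.0 - p_on[i])
        scores[cls] = s
    return scores

def predict(model, fp, n_bits):
    scores = log_scores(model, fp, n_bits)
    return max(scores, key=scores.get)

# Hypothetical toy data: each fingerprint is the set of "on" bit indices.
N_BITS = 5
train_fps = [{0, 1, 2, 3}] * 3 + [{0, 1, 2, 4}] * 3
train_labels = ["agonist"] * 3 + ["antagonist"] * 3
model = train_bernoulli_nb(train_fps, train_labels, N_BITS)

# A compound with a class-specific bit is classified by that bit.
with_substituent = predict(model, {0, 1, 2, 3}, N_BITS)
# A compound showing only the shared scaffold gets equal scores for both classes.
scaffold_only = log_scores(model, {0, 1, 2}, N_BITS)
```

In this toy setup the shared scaffold bits contribute the same amount to both class scores, so only the class-specific bits decide the prediction. When most of a fingerprint describes a scaffold common to agonists and antagonists, there is little left to separate them.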



Models.cnv
------------------------------------------------------------------------------------------
Demo for prediction of a property. 

Two datasets from the validation collection were used here: one for Fxa and one for CDK2. QikProp properties were also calculated for these compounds, and only one protonation state was kept for each compound. The compounds are congeneric series. 

First of all, the compounds were separated into different views, which is a good way to group compounds. There is a model for each of the sets and for each of the models. Worth looking into: MCS, where I used different atom types for the CDK2 compounds (in mcs_cdk2-fn, for 66 compounds, one can see the hinge-binding motif); PCA, where there is one for all compounds, and if you pull up the plot you can see two clusters, which are the CDK2 and Fxa actives. 
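
For readers who want to see why two congeneric series tend to fall into two clusters on a PCA plot, here is a minimal pure-Python sketch that finds the first principal component by power iteration. It is not how the project file computes its PCA. The descriptor values, group names, and function names are hypothetical, and real descriptors would normally be scaled first.

```python
import math

def center(rows):
    """Subtract the column means from every row."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    return [[r[j] - means[j] for j in range(d)] for r in rows]

def covariance(centered):
    """Sample covariance matrix of already centered rows."""
    n, d = len(centered), len(centered[0])
    return [
        [sum(r[i] * r[j] for r in centered) / (n - 1) for j in range(d)]
        for i in range(d)
    ]

def first_component(cov, iterations=200):
    """Approximate the leading eigenvector of cov by power iteration."""
    d = len(cov)
    v = [1.0 / math.sqrt(d)] * d
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Hypothetical descriptor rows for two made-up series of three compounds each.
series_a = [[1.0, 2.0, 0.5], [1.2, 1.8, 0.4], [0.9, 2.1, 0.6]]
series_b = [[4.0, 5.0, 0.5], [4.2, 5.1, 0.6], [3.9, 4.8, 0.4]]

centered = center(series_a + series_b)
pc1 = first_component(covariance(centered))

# Project every compound onto the first principal component.
pc1_scores = [sum(x * w for x, w in zip(row, pc1)) for row in centered]
scores_a, scores_b = pc1_scores[:3], pc1_scores[3:]
```

With these toy values the two series land on opposite sides of zero along the first component, which is the one-dimensional version of the two clusters seen in the plot.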
