Ready to dive into Tox Prediction?
The team put together an example code file, which you can play with on Google Colab without logging in or setting anything up. Click the button.
The video below introduces the concepts in the code, and is useful for both scientists and data scientists. Enjoy!
What did we learn?
- We learned how to build classification and regression models for predicting hERG inhibition and mitochondrial toxicity, using features generated from chemical structures. We went through every step of the process, including data exploration, data cleaning, modelling and model evaluation. We also touched on how data cleaning is much more nuanced in practice, as it's best not to mix data from incompatible datasets.
- We learned how to evaluate machine learning models, and that metrics can be misleading, particularly with the imbalanced datasets common in drug discovery, so it's important to calculate a wide range of metrics to get a good idea of how your model is performing.
- We learned how to tune model parameters to improve performance using grid search and cross-validation, and heard about some other approaches that could make parameter tuning more efficient.
- We learned that model interpretation is an important part of AI for drug discovery, and generated some hypotheses about which compound properties drive hERG inhibition by looking at the two most important features, LogP and TPSA.
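To make the point about misleading metrics concrete, here is a minimal sketch in plain Python. It is not the code from the Colab notebook: the toy labels (95 inactive compounds, 5 active) and the helper names `confusion_counts` and `summarize` are assumptions made up for illustration. It scores a "model" that predicts every compound as inactive.

```python
import math


def confusion_counts(y_true, y_pred):
    """Count true/false positives and negatives for binary labels (1 = active)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def summarize(y_true, y_pred):
    """Return several metrics side by side so no single number is trusted alone."""
    tp, tn, fp, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0  # recall on actives
    specificity = tn / (tn + fp) if (tn + fp) else 0.0  # recall on inactives
    balanced_accuracy = (sensitivity + specificity) / 2
    # Matthews correlation coefficient; defined as 0 when the denominator is 0.
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {
        "accuracy": accuracy,
        "sensitivity": sensitivity,
        "specificity": specificity,
        "balanced_accuracy": balanced_accuracy,
        "mcc": mcc,
    }


# Hypothetical imbalanced dataset: 95 inactive (0) and 5 active (1) compounds.
y_true = [0] * 95 + [1] * 5
# A trivial "model" that calls every compound inactive.
always_inactive = [0] * 100

metrics = summarize(y_true, always_inactive)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

On this toy data, accuracy comes out high even though the model never finds a single active compound. Sensitivity, balanced accuracy and MCC expose the problem, which is why a range of metrics is worth calculating.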