Introduction
Basic Problem
Chemical toxicity can be associated with many hazardous biological effects such as gene damage, carcinogenicity, or induction of lethal human or animal diseases. Standard experimental protocols (so-called in vivo methods) have been established by the chemical industry, pharmaceutical companies, and government agencies to test chemicals for their toxic potential. More than 120,000 compounds have to be registered in the EU during the next 10 years to implement the new legislation concerning the registration, evaluation, authorization and restriction of chemicals (REACH). The German Federal Institute for Risk Assessment estimated that the legislation could lead to a demand for up to 45 million laboratory animals [1].
Computational predictions that start directly from the structures of molecules, so-called in silico predictions, can provide a sound alternative to animal testing, saving significant experimental costs as well as the lives of animals. The prediction of toxicity against animals, however, is a very complex task, and it will not be directly addressed by this challenge. As a first step we will consider the prediction of the environmental toxicity of chemicals against T. pyriformis. Growth inhibition of the ciliated protozoan T. pyriformis, expressed as log(IGC50^-1), is a commonly accepted toxicity-screening tool that has been under development and implementation by Prof. Schultz and co-workers for many years (see more information at http://www.vet.utk.edu/TETRATOX/).
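For orientation, the endpoint log(IGC50^-1) is the base-10 logarithm of the inverse of the 50% growth inhibitory concentration, so more toxic compounds (lower IGC50) get larger values. The sketch below shows the conversion only. The function name is hypothetical, and the concentration unit (mmol/L) is an assumption that should be checked against the dataset documentation.

```python
import math


def log_inverse_igc50(igc50_mmol_per_l):
    """Return log10(1 / IGC50) for a concentration given in mmol/L (assumed unit)."""
    if igc50_mmol_per_l <= 0:
        raise ValueError("IGC50 must be a positive concentration")
    # log10(1 / c) is the same as -log10(c).
    return -math.log10(igc50_mmol_per_l)


# A compound that halves growth at 0.1 mmol/L is more toxic, and scores higher,
# than one that needs 10 mmol/L.
potent = log_inverse_igc50(0.1)
weak = log_inverse_igc50(10.0)
print(potent, weak)
```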
Challenge
Why do we need experimental tests? Why can we not simply develop computational tools to predict the toxicity of chemicals? The reason is that the chemical space is enormous (its size is estimated at 10^80-10^100 molecules, i.e. more than the number of atoms in the Universe). Any training dataset that could be used for model development covers just a tiny piece of the whole chemical space. Thus, practically all approaches in this field work by extrapolation rather than by interpolation of data, and the assumption that test and training data are generated by the same distribution is generally not valid. This can lead to low prediction accuracy even for simple physico-chemical properties of molecules, as demonstrated and discussed elsewhere [2]. Therefore, in silico methods may not predict all molecules in the test set with the same accuracy.
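The difference between interpolation and extrapolation can be illustrated with a toy example. The sketch below is not a toxicity model: it assumes a single hypothetical descriptor x and a made-up nonlinear property x^2, fits a straight line on the range covered by the training data, and compares the error inside and outside that range. All names and values are illustrative assumptions.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = a * x + b; returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def rmse(model, xs, ys):
    """Root-mean-square error of the linear model on the given points."""
    a, b = model
    squared = [(a * x + b - y) ** 2 for x, y in zip(xs, ys)]
    return math.sqrt(sum(squared) / len(squared))


def true_property(x):
    # Hypothetical nonlinear structure-property relationship (toy stand-in).
    return x ** 2


# Training data cover only the descriptor range [0, 1].
train_x = [i / 20 for i in range(21)]
model = fit_line(train_x, [true_property(x) for x in train_x])

# Test points inside the training range (interpolation) ...
inside_x = [0.025 + i / 20 for i in range(20)]
# ... and far outside it (extrapolation), on the range [2, 3].
outside_x = [2 + i / 20 for i in range(21)]

rmse_inside = rmse(model, inside_x, [true_property(x) for x in inside_x])
rmse_outside = rmse(model, outside_x, [true_property(x) for x in outside_x])
print(f"RMSE inside training range:  {rmse_inside:.3f}")
print(f"RMSE outside training range: {rmse_outside:.3f}")
```

The same model that looks acceptable on molecules resembling the training set fails badly once the test points leave the covered region, which is why accuracy is expected to vary across the test set.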
[1] Hofer, T.; Gerner, I.; Gundert-Remy, U.; Liebsch, M.; Schulte, A.; Spielmann, H.; Vogel, R.; Wettig, K., Animal testing and alternative approaches for the human health risk assessment under the proposed new European chemicals regulation. Arch. Toxicol. 2004, 78 (10), 549-64.
[2] Tetko, I. V.; Poda, G. I.; Ostermann, C.; Mannhold, R., Accurate In Silico log P Predictions: One Can't Embrace the Unembraceable. QSAR Comb. Sci. 2009, in press.