Download data
Provided Data
The training and known test set data were used in our previous studies [1-2]. The new "blind" data set was kindly provided by Prof. Schultz. Thus, three datasets are available: the training set, the known test set, and the blind set.
The experimental data for the blind set have not been published previously and will be made available by Prof. Schultz only after September 1st. These data cover the structural domain defined by the training dataset. The environmental toxicity is measured as log(IGC50^-1), i.e. the logarithm of the reciprocal of IGC50 (see also Introduction).
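As a small illustration of the endpoint, reading it as the logarithm of the reciprocal of IGC50, the conversion can be sketched as below. The base-10 logarithm and the function name `igc50_to_log_inverse` are assumptions made for this example; the concentration units are not stated here and must follow the convention of the provided files.

```python
import math


def igc50_to_log_inverse(igc50: float) -> float:
    """Return log10(1 / IGC50) for a strictly positive IGC50 value.

    Assumption: the logarithm is base 10. The function is unit-agnostic,
    so the caller is responsible for using consistent concentration units.
    """
    if igc50 <= 0:
        raise ValueError("IGC50 must be strictly positive")
    return -math.log10(igc50)


# A lower IGC50 (a more toxic compound) gives a larger endpoint value.
print(igc50_to_log_inverse(0.1))
print(igc50_to_log_inverse(10.0))
```

With this definition, more toxic compounds have higher values, which is why the reciprocal form is used as the modelling target.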
Structural information for the molecules (SMILES) and several sets of molecular descriptors are available.
The descriptors can be used to develop models. Participants may also extend them with descriptors of their own. The structures of the molecules are provided as:
SMILES format -- Excel file also includes values for the BLIND set
MOL2 format -- 3d.tar file with 3D optimized structures
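For readers who want to check the 3D archive before modelling, the following is a minimal sketch that counts MOL2 molecule records in each file of a tar archive. The internal layout of 3d.tar is not described here, so the function makes no assumption about file names and simply scans every regular file; `count_mol2_molecules` is a hypothetical helper name.

```python
import tarfile


def count_mol2_molecules(tar_path: str) -> dict:
    """Map each regular file in the archive to its number of MOL2 records.

    Each molecule in a MOL2 file starts with an "@<TRIPOS>MOLECULE" record,
    so counting that marker gives the number of molecules per file.
    """
    counts = {}
    with tarfile.open(tar_path) as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories and links
            handle = archive.extractfile(member)
            text = handle.read().decode("utf-8", errors="replace")
            counts[member.name] = text.count("@<TRIPOS>MOLECULE")
    return counts


# Example call, assuming the archive was saved locally as "3d.tar":
# print(count_mol2_molecules("3d.tar"))
```

Comparing the total against the number of SMILES rows in the Excel file is a quick way to confirm that the download is complete.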
N.B.! Please note that the DRAGON descriptors were updated on Tuesday, July 28th.
E-state indices | DRAGON | SimulationsPlus | QuantumChemistry | MOE |
---|---|---|---|---|
Excel -- download | not provided [4] | Excel -- download | Excel -- download | Excel -- download |
ARFF -- Training; ARFF -- Known Test; ARFF -- Blind Test; TEXT -- Training | ARFF -- Training; ARFF -- Known Test; ARFF -- Blind Test; TEXT -- Training | ARFF -- Training; ARFF -- Known Test; ARFF -- Blind Test; TEXT -- Training | ARFF -- Training; ARFF -- Known Test; ARFF -- Blind Test; TEXT -- Training | ARFF -- Training; ARFF -- Known Test; ARFF -- Blind Test; TEXT -- Training |
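The ARFF files can be read with most machine-learning toolkits. As an illustration only, here is a minimal standard-library sketch of a reader for dense ARFF text. The sample content and the attribute names `desc1`, `desc2` and `activity` are hypothetical and are not taken from the challenge files. Sparse ARFF rows and quoted attribute names containing spaces are not handled.

```python
import csv


def _convert(field):
    """Turn '?' into None, numeric strings into floats, keep other strings."""
    if field == "?":
        return None
    try:
        return float(field)
    except ValueError:
        return field


def read_arff(text: str):
    """Parse dense ARFF text into (attribute_names, rows)."""
    names, data_lines, in_data = [], [], False
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("%"):
            continue  # skip blank lines and ARFF comments
        if in_data:
            data_lines.append(line)
            continue
        keyword = line.split(None, 1)[0].lower()
        if keyword == "@attribute":
            # The second whitespace-separated token is the attribute name.
            names.append(line.split(None, 2)[1])
        elif keyword == "@data":
            in_data = True

    rows = []
    for fields in csv.reader(data_lines, skipinitialspace=True):
        rows.append([_convert(f.strip()) for f in fields])
    return names, rows


# Hypothetical sample, not taken from the challenge files.
sample = """% illustrative example
@relation toxicity
@attribute desc1 numeric
@attribute desc2 numeric
@attribute activity numeric
@data
0.5,1.25,0.3
1.0,?,-0.7
"""

names, rows = read_arff(sample)
print(names)
print(rows)
```

For real work, an established reader such as `scipy.io.arff.loadarff` or Weka itself is likely a better choice; the sketch is only meant to show how the header and data sections are laid out.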
[1] Zhu, H.; Tropsha, A.; Fourches, D.; Varnek, A.; Papa, E.; Gramatica, P.; Oberg, T.; Dao, P.; Cherkasov, A.; Tetko, I. V. Combinatorial QSAR Modeling of Chemical Toxicants Tested against Tetrahymena pyriformis. J. Chem. Inf. Model. 2008, 48 (4), 766-784.
[2] Tetko, I. V.; Sushko, I.; Pandey, A. K.; Zhu, H.; Tropsha, A.; Papa, E.; Oberg, T.; Todeschini, R.; Fourches, D.; Varnek, A. Critical assessment of QSAR models of environmental toxicity against Tetrahymena pyriformis: focusing on applicability domain and overfitting by variable selection. J. Chem. Inf. Model. 2008, 48 (9), 1733-1746.
[3] MOE (The Molecular Operating Environment) Version 2008.10, software available from Chemical Computing Group Inc., 1010 Sherbrooke Street West, Suite 910, Montreal, Canada H3A 2R7. http://www.chemcomp.com
[4] Dragon descriptors are not provided in this format because of Excel's limit on the maximum number of columns.