zincidentifier

Category: Artificial Intelligence / Neural Networks / Deep Learning
Development tool: Perl
File size: 1519KB
Downloads: 4
Upload date: 2014-01-06 20:27:57
Uploader: 逗乐的东北姑娘
Description: Source code for predicting zinc-binding sites in proteins, suitable for bioinformatics beginners.

File list:
zincidentifier (0, 2012-08-31)
zincidentifier\.DS_Store (6148, 2012-08-31)
__MACOSX (0, 2012-08-31)
__MACOSX\zincidentifier (0, 2012-08-31)
__MACOSX\zincidentifier\._.DS_Store (225, 2012-08-31)
zincidentifier\download (0, 2012-08-29)
zincidentifier\download\apo_dataset.txt (203844, 2012-08-26)
__MACOSX\zincidentifier\download (0, 2012-08-31)
__MACOSX\zincidentifier\download\._apo_dataset.txt (225, 2012-08-26)
zincidentifier\download\benchmark_dataset.txt (914140, 2012-05-21)
__MACOSX\zincidentifier\download\._benchmark_dataset.txt (225, 2012-05-21)
zincidentifier\download\independent_test_dataset.txt (183126, 2012-05-21)
__MACOSX\zincidentifier\download\._independent_test_dataset.txt (225, 2012-05-21)
zincidentifier\download\input.txt (1541, 2012-05-21)
__MACOSX\zincidentifier\download\._input.txt (225, 2012-05-21)
zincidentifier\download\zinc_classifier.R (953, 2012-08-27)
__MACOSX\zincidentifier\download\._zinc_classifier.R (225, 2012-08-27)
zincidentifier\download\zincidentifier.pl (4466, 2012-08-27)
__MACOSX\zincidentifier\download\._zincidentifier.pl (225, 2012-08-27)
__MACOSX\zincidentifier\._download (225, 2012-08-29)
zincidentifier\feature_extraction_zinc.pl (7815, 2012-08-27)
__MACOSX\zincidentifier\._feature_extraction_zinc.pl (225, 2012-08-27)
zincidentifier\features (0, 2012-05-21)
zincidentifier\features\1A0B.CN (84325, 2011-11-26)
__MACOSX\zincidentifier\features (0, 2012-08-31)
__MACOSX\zincidentifier\features\._1A0B.CN (225, 2011-11-26)
zincidentifier\features\1A0B.hb2 (22540, 2011-10-26)
__MACOSX\zincidentifier\features\._1A0B.hb2 (225, 2011-10-26)
zincidentifier\features\1A0B.pdb (114372, 2011-09-07)
__MACOSX\zincidentifier\features\._1A0B.pdb (225, 2011-09-07)
zincidentifier\features\1A0B.rsa (9999, 2011-10-24)
__MACOSX\zincidentifier\features\._1A0B.rsa (225, 2011-10-24)
zincidentifier\features\1A0B_A.fasta.txt (159, 2012-05-21)
__MACOSX\zincidentifier\features\._1A0B_A.fasta.txt (225, 2012-05-21)
zincidentifier\features\net_param_1A0B.pdb_A_6.5_12.net (14070, 2011-10-17)
__MACOSX\zincidentifier\features\._net_param_1A0B.pdb_A_6.5_12.net (225, 2011-10-17)
zincidentifier\features\PSSM_1A0B_A.PSSM (20650, 2011-09-16)
__MACOSX\zincidentifier\features\._PSSM_1A0B_A.PSSM (225, 2011-09-16)
... ...

zincidentifier Usage Notes
===================

Program and documentation are Copyright (C) 2012 Cheng Zheng and Jiangning Song. All rights reserved.

THIS SOFTWARE MAY ONLY BE USED FOR NON-COMMERCIAL PURPOSES. PLEASE CONTACT THE AUTHOR IF YOU REQUIRE A LICENSE FOR COMMERCIAL USE.

I. Installation Requirement
=======
1. R (You can download R at http://cran.r-project.org/mirrors.html)
2. Perl (You can download Perl at http://www.perl.org/get.html)
3. randomForest package (it can be installed by running the R command install.packages("randomForest"))

II. Installation:
=======
Download the datasets and source code into the same directory; they can then be used directly.

III. Usage:
=======
training_set.txt is the training file and input.txt is the test file. zincidentifier.pl first normalizes the data in the test file and writes the result to input.txt.norm. The Random Forest-based classifier then makes the prediction and outputs a prediction score for each residue. The RF classifier is run 100 times to generate 100 output scores. zincidentifier.pl averages these 100 predictions and generates a final score between -1 and 1, where -1 denotes a non-zinc-binding residue and 1 denotes a zinc-binding residue.

For example, run zincidentifier using the following command:

    perl zincidentifier.pl input.txt

The result file will be generated in the current directory.

IV. Additional notice:
=======
The input file feature order is:
1. Sample ID
2. PSSM_5_H
3. Conservation_score_V5
4. All_polar_abs_V5
5. PSSM_5_C
6. ChainCHED_H_(P)
7. ChainCHED_D_(P)
8. Ex_CN_V5
9. Hbplus_V5
10. ChainCHED_E_(P)
11. Chain_E_sum
12. Chain_CHED_sum
13. Chain_residue_H_(P)
14. Chain_length
15. Net_Closen_Cent_V9

If you have any questions about the 14 selected features (columns 2 to 15), see the "Feature extraction" and "Feature importance and contribution" sections of our research paper for details.

V. Feature extraction
=======
All_polar_abs_V5
1. Download the software "naccess" from http://www.bioinf.manchester.ac.uk/naccess/ and sign the "Confidentiality Agreement".
2. Compile "naccess" and run a command like "naccess example.pdb". It will produce files such as "*.asa", "*.log" and "*.rsa".
3. Move the "*.rsa" file to the directory "features".

Ex_CN_V5
1. Download the software "biopython" from http://www.biopython.org and follow the biopython guidance to install it.
2. Find the script "hsexpo" in the directory "biopython/Script/Structure/hsexpo" and run a command like "hsexpo -t CN example.pdb -o example.CN".
3. Copy "example.CN" to the directory "features".

PSSM_5_C, PSSM_5_H
1. Download the software "BLAST" from ftp://ftp.ncbi.nih.gov/blast/ and the non-redundant database from ftp://ftp.ncbi.nih.gov/blast/.
2. Install "BLAST" and run a command like "blastpgp -d nr -i example.txt -j 3 -Q example.pssm".
3. Copy the resulting file to the directory "features" as "example.PSSM".
Notice: The file name must end with ".PSSM" for the perl script "feature_extraction_zinc.pl" to run.

Conservation_score_V5
This conservation score is derived directly from the PSSM generated by PSI-BLAST. You can find more detail in the perl script "feature_extraction_zinc.pl".

Hbplus_V5
1. To install HBPLUS, download the confidentiality agreement from http://www.biochem.ucl.ac.uk/~mcdonald/hbplus/, sign the agreement, and email it to the author (roman@ebi.ac.uk).
2. You will then be able to install "hbplus" and run a command like "hbplus -h 2.7 -d 3.35 example.pdb". This generates the output file example.hb2, which lists all the hydrogen bonds.
3. Move "example.hb2" to the directory "features".
Notice: As distributed, HBPLUS cannot process PDB files with a large number of residues. You can raise the value of MAXNRES in the file "hbplus.h". In my case, I have set it to "#define MAXNRES 12000", which seems to work fine on all current entries in the PDB.

Chain_length, Chain_E_sum, Chain_CHED_sum
Download the fasta file of the chain from http://www.pdb.org. Calculate Chain_length (the number of residues in the chain), Chain_E_sum (the number of Glu (E) residues in the chain) and Chain_CHED_sum (the number of Cys, His, Glu and Asp (C, H, E, D) residues in the chain) directly from the fasta file. You can refer to the perl script "feature_extraction_zinc.pl" for more detail.

ChainCHED_H_(P), ChainCHED_D_(P), ChainCHED_E_(P), Chain_residue_H_(P)
ChainCHED_H_(P) = Chain_H_sum / Chain_CHED_sum;
ChainCHED_D_(P) = Chain_D_sum / Chain_CHED_sum;
ChainCHED_E_(P) = Chain_E_sum / Chain_CHED_sum;
Chain_residue_H_(P) = Chain_H_sum / Chain_length;

Contact
=======
If you need assistance in getting zincidentifier working, have any comments or suggestions, or find any bugs, please contact the authors at the following address:

Cheng Zheng
Laboratory of Structural Bioinformatics and Integrative Systems Biology
Tianjin Institute of Industrial Biotechnology
Chinese Academy of Sciences
Tianjin 300308
CHINA
E-mail: zheng_c@tib.cas.cn
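
Illustration (not part of the original package): the chain-level features defined under "Feature extraction" (Chain_length, Chain_E_sum, Chain_CHED_sum and the four ratio features) follow directly from the formulas given there, so they can be sketched in a few lines. The Python below is a minimal sketch under stated assumptions, not the logic of "feature_extraction_zinc.pl". The function names are hypothetical, and returning 0.0 when a denominator is zero is an assumption that the README does not specify.

```python
def read_fasta_sequence(fasta_text):
    """Return the residue sequence from FASTA text, skipping '>' header lines."""
    residues = []
    for line in fasta_text.splitlines():
        line = line.strip()
        if line and not line.startswith(">"):
            residues.append(line.upper())
    return "".join(residues)


def chain_composition_features(sequence):
    """Compute the chain-level features from a one-letter residue sequence."""
    length = len(sequence)
    # Count Cys (C), His (H), Glu (E) and Asp (D) residues in the chain.
    counts = {aa: sequence.count(aa) for aa in "CHED"}
    ched_sum = sum(counts.values())

    def ratio(numerator, denominator):
        # Assumption: a zero denominator yields 0.0 instead of an error.
        return numerator / denominator if denominator else 0.0

    return {
        "Chain_length": length,
        "Chain_E_sum": counts["E"],
        "Chain_CHED_sum": ched_sum,
        "ChainCHED_H_(P)": ratio(counts["H"], ched_sum),
        "ChainCHED_D_(P)": ratio(counts["D"], ched_sum),
        "ChainCHED_E_(P)": ratio(counts["E"], ched_sum),
        "Chain_residue_H_(P)": ratio(counts["H"], length),
    }


if __name__ == "__main__":
    # Made-up example sequence, for illustration only.
    example = ">example_chain\nMCHEDAAHE\n"
    for name, value in chain_composition_features(read_fasta_sequence(example)).items():
        print(name, value)
```

For a real chain, pass the contents of the downloaded fasta file to read_fasta_sequence instead of the made-up example string.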
