3D Cluster Analysis
First of all, my apologies for the less than professional code in this release,
which reflects my limited programming skills. The program runs slowly but gets
the job done. The scripts are not necessarily user-friendly and may require a
bit of work on the input files or even the scripts themselves.
The following archives contain all necessary files:
3DCA.zip in pkzip format
or
3DCA.tar.gz as a gzipped tar archive
The basic reference for 3DCA is:
Landgraf R., Xenarios I., Eisenberg D.,
"Three-dimensional Cluster Analysis Identifies Interfaces and Functional Residue Clusters in Proteins", Journal of Molecular Biology, 307, 1487-1502 (2001)
Please cite this reference in publications.
Additional software needed:
· FASTA or similar software to identify homologous sequences (the input
format is that of FASTA)
· Alignment software (GCG pileup or ClustalW). You may want to use other
programs, but the input format for 3DCA has to match either GCG or ClustalW.
· Visualization: The scripts are written to use Rasmol scripts for easy
visualization (you can get the raw data without Rasmol or any visualization,
but visualizing the scores on the structure obviously helps a lot)
I added a PC version of Rasmol and ClustalW (DOS version) to this zip file. The
DOS version of ClustalW is the one that comes bundled with the Bioedit package,
which is available free of charge. Please check for any applicable copyright
regulations or usage restrictions for those programs.
Rasmol for different platforms is also available here:
RASMOL
Bioedit (including clustalw) is available here:
BIOEDIT
The files required for 3D cluster analysis (3DCA) are written in Perl in
such a way that they can be executed under DOS/Windows with Perl installed
or under a UNIX/Linux operating system. I have made an effort to change the
names of files and variables as much as possible to make the program
transparent to the user. I tested those "modifications", but these
retroactive changes may have introduced a bug or two.
The scripts were written to automate 3DCA as far as possible, but some
problems that require manual intervention keep occurring. There are plenty
of print commands throughout the scripts that I commented out; re-enabling
them should aid in troubleshooting these problems. In general, if you follow
the procedure outlined in the Readme file, you should be able to get the
scripts to run. Make sure to check the intermediate output/input files at
every stage to ensure that they have the right format. Almost all problems
in execution trace back to problems with input
files. Some are the results of undefined areas of a structure or
interruptions in the numbering scheme of residues. Sometimes, this requires
some "tweaking" of the PDB input files, which is different from case to
case. The most common problem with FASTA input is choosing the
correct search tag for sequences in the original FASTA output file.
Unfortunately, this differs between different FASTA implementations.
However, the input for 3DCA is a simple list of sequences in FASTA format,
which you can generate in any way you want. Many of the more "clumsy"
features of the program also reflect the fact that the scripts
were originally written for several different platforms.
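
Interruptions in the residue numbering mentioned above can be spotted before
running the scripts. The following is only a minimal sketch in Python, not
part of the 3DCA distribution; the function name find_numbering_gaps is made
up for illustration. It assumes standard fixed-column PDB ATOM records (chain
ID in column 22, residue number in columns 23-26) and ignores insertion codes
and HETATM records.

```python
def find_numbering_gaps(pdb_lines):
    """Return (chain, previous_residue, next_residue) for each numbering break.

    Hypothetical helper, not part of 3DCA. Assumes fixed-column PDB ATOM
    records; insertion codes are ignored.
    """
    gaps = []
    last_seen = {}  # chain ID -> last residue number seen on that chain
    for line in pdb_lines:
        if not line.startswith("ATOM"):
            continue  # skip HETATM, TER, REMARK and everything else
        chain = line[21]            # column 22: chain identifier
        resnum = int(line[22:26])   # columns 23-26: residue sequence number
        prev = last_seen.get(chain)
        # Same number = another atom of the same residue; prev + 1 = next residue.
        if prev is not None and resnum not in (prev, prev + 1):
            gaps.append((chain, prev, resnum))
        last_seen[chain] = resnum
    return gaps
```

Calling find_numbering_gaps(open("structure.pdb")) (the file name is a
placeholder) returns one tuple per break, which shows where a PDB input file
may need "tweaking" by hand.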
Ralf Landgraf
rlandgraf@mednet.ucla.edu
www.rlandgraf.med.ucla.edu