* Perl is a stable, cross-platform programming language.
* The name is often expanded as Practical Extraction and Report Language.
* It is used for mission-critical projects in the public and private
sectors.
* Perl is Open Source software, licensed under its Artistic License or the GNU General Public License (GPL).
* Perl was created by Larry Wall.
* Perl 1.0 was released to Usenet's alt.comp.sources in 1987.
* PC Magazine named Perl a finalist for its 1998 Technical
Excellence Award in the Development Tool category.
* Perl is listed in the Oxford English Dictionary.
Supported Operating Systems:
* Unix systems
* Macintosh (Mac OS 7-9 and OS X) - see The MacPerl Pages.
* Windows - see ActiveState Tools Corp.
* VMS
* And many more...
Best Features of Perl:
* Perl takes the best features from other languages, such as C, awk,
sed, sh, and BASIC, among others.
* Perl's database integration interface (DBI) supports third-party databases including Oracle, Sybase, PostgreSQL, MySQL and others.
* Perl works with HTML, XML, and other mark-up languages.
* Perl supports Unicode.
* Perl is Y2K compliant.
* Perl supports both procedural and object-oriented programming.
* Perl interfaces with external C/C++ libraries through XS or SWIG.
* Perl is extensible. There are over 500 third-party modules available
from the Comprehensive Perl Archive Network (CPAN).
* The Perl interpreter can be embedded into
other systems.
Perl and the Web
* Perl became one of the most popular web programming languages thanks to its text
manipulation capabilities and rapid development cycle.
* Perl is widely known as "the duct tape of the Internet."
* Perl's CGI.pm module makes handling HTML forms simple. It was part of Perl's
standard distribution until Perl 5.22 and is now installed from CPAN.
* Perl can handle encrypted Web data, including e-commerce transactions.
* Perl can be embedded into web servers to speed up processing by as
much as 2000%.
* mod_perl allows the Apache web server to embed a Perl interpreter.
* Perl's DBI package makes web-database integration easy.
Bioinformatics is a method of predicting biological outcomes from existing experimental results.
Bioinformatics = Biology + Informatics + Statistics + (Biochemistry + Biophysics).
Bioinformatics gives biologists a way to store and organize all their data.
Bioinformatics makes some lab experiments easier by predicting their outcome.
Sometimes bioinformatics shows, from existing results, where to start a lab experiment.
Bioinformatics helps researchers get an idea of an experiment's likely result before they start.
Gene prediction tools:
- GLIMMER - Identifies coding regions in microbial DNA.
- GENSCAN - Predicts complete gene structures, including exons, introns, promoters and polyadenylation signals, in genomic sequences.
- GeneMark - Finds genes in bacterial DNA sequences.
- WebGene - Web interface for several coding region recognition programs.
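GLIMMER, GENSCAN and GeneMark predict genes with statistical models, and the Perl sketch below does not reproduce any of them. As a hedged illustration of the underlying idea only, it performs a naive open reading frame (ORF) scan: in each of the three forward reading frames it looks for an ATG and the first in-frame stop codon (TAA, TAG or TGA) after it. The helper name find_orfs, the default minimum of 30 codons and the example sequence are assumptions; the reverse strand and alternative start codons are ignored.

```perl
use strict;
use warnings;

# Stop codons of the standard genetic code.
my %IS_STOP = map { $_ => 1 } qw(TAA TAG TGA);

# Naive forward-strand ORF scan (hypothetical helper, not GLIMMER/GeneMark).
# Returns [start, end, frame] triples: 0-based start of the ATG, end just
# past the stop codon, and the reading frame (0, 1 or 2).
sub find_orfs {
    my ($dna, $min_codons) = @_;
    $dna = uc $dna;
    $min_codons //= 30;    # assumed default minimum length, stop codon included
    my @orfs;
    for my $frame (0 .. 2) {
        my $start;         # position of the open ATG in this frame, if any
        for (my $i = $frame; $i + 3 <= length $dna; $i += 3) {
            my $codon = substr $dna, $i, 3;
            if (!defined $start) {
                $start = $i if $codon eq 'ATG';    # open a candidate ORF
            }
            elsif ($IS_STOP{$codon}) {
                my $end = $i + 3;
                push @orfs, [ $start, $end, $frame ]
                    if ($end - $start) / 3 >= $min_codons;
                undef $start;                      # look for the next ATG
            }
        }
    }
    return @orfs;
}

# Usage with a short made-up sequence and a low length threshold.
my $dna = 'CCATGGCTGAAATTTAGGATGCCCTGACC';
for my $orf (find_orfs($dna, 3)) {
    my ($start, $end, $frame) = @$orf;
    printf "frame %d: %d-%d %s\n", $frame, $start, $end,
        substr($dna, $start, $end - $start);
}
# prints:
# frame 0: 18-27 ATGCCCTGA
# frame 2: 2-17 ATGGCTGAAATTTAG
```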
A local (stand-alone) version of BLAST is worthwhile when you have a large set of sequences to search: it is not affected by Internet speed or traffic, and it can be automated.
The stand-alone BLAST package can be downloaded from the NCBI FTP site (linked from the toolbar at the bottom of the NCBI home page: FTP Site -> Blast -> executables -> Latest). Version 2.2.18, used here, is a legacy release; NCBI now distributes the BLAST+ suite instead.
Download the file in binary mode. Filenames take the form used below, e.g. blast-2.2.18-ia32-linux.tar.gz for version 2.2.18 on 32-bit Linux. To unpack it:
jk@jk:~/Desktop/blast-2.2.18/bin$ gunzip blast-2.2.18-ia32-linux.tar.gz #uncompress
jk@jk:~/Desktop/blast-2.2.18/bin$ tar -xpf blast-2.2.18-ia32-linux.tar #extract
For more information on the options, see man tar and man gunzip.
How to execute bl2seq (BLAST 2 Sequences):
The input files to any BLAST program should always be in FASTA format.
For example:
>gi|229673|pdb|1ALC| Alpha-Lactalbumin
KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSR
NICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE
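Because every BLAST input must be FASTA, a small parser is often one of the first Perl scripts a bioinformatician writes. The sketch below, assuming only the conventions in the example above (a header line starting with > followed by one or more sequence lines), reads FASTA text into header/sequence pairs. The name parse_fasta is hypothetical.

```perl
use strict;
use warnings;

# Parse FASTA text into a list of [header, sequence] pairs (hypothetical helper).
# A line starting with '>' begins a new record; following lines, up to the
# next '>', are joined into one sequence with whitespace removed.
sub parse_fasta {
    my ($text) = @_;
    my @records;
    for my $line (split /\n/, $text) {
        $line =~ s/\r$//;                  # tolerate Windows line endings
        next unless $line =~ /\S/;         # skip blank lines
        if ($line =~ /^>(.*)/) {
            push @records, [ $1, '' ];     # start a new record
        }
        elsif (@records) {
            (my $chunk = $line) =~ s/\s+//g;
            $records[-1][1] .= $chunk;     # extend the current sequence
        }
        else {
            die "Sequence data before the first '>' header\n";
        }
    }
    return @records;
}

# The alpha-lactalbumin record shown above.
my $example = <<'END_FASTA';
>gi|229673|pdb|1ALC| Alpha-Lactalbumin
KQFTKCELSQNLYDIDGYGRIALPELICTMFHTSGYDTQAIVENDESTEYGLFQISNALWCKSSQSPQSR
NICDITCDKFLDDDITDDIMCAKKILDIKGIDYWIAHKALCTEKLEQWLCEKE
END_FASTA

for my $record (parse_fasta($example)) {
    my ($header, $seq) = @$record;
    printf "%s: %d residues\n", $header, length $seq;
}
# prints: gi|229673|pdb|1ALC| Alpha-Lactalbumin: 123 residues
```

The same function handles files with many records, such as the multi-sequence files used with blastall.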
Syntax:
jk@jk:~/Desktop/blast-2.2.18/bin$ ./bl2seq - # Displays all options
jk@jk:~/Desktop/blast-2.2.18/bin$ ./bl2seq -p blastp -e 0.01 -i
-i First sequence [File In]
-j Second sequence [File In]
-p Program name: blastp, blastn, blastx, tblastn, tblastx. For blastx, the first sequence should be nucleotide; for tblastn, the second sequence should be nucleotide.
-e E-value cutoff (optional)
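Since a local BLAST installation can be automated, bl2seq is easy to drive from Perl. The sketch below only assembles the command line from the options listed above (-p, -i, -j and the optional -e) and rejects unknown program names. The helper bl2seq_command and the file names seq1.fasta and seq2.fasta are hypothetical, and actually running the command requires the bl2seq binary.

```perl
use strict;
use warnings;

my %PROGRAMS = map { $_ => 1 } qw(blastp blastn blastx tblastn tblastx);

# Build a bl2seq argument list from the options described above
# (-p, -i, -j and the optional -e). bl2seq_command is a hypothetical helper.
sub bl2seq_command {
    my (%opt) = @_;
    my $program = $opt{program} // '';
    die "Unknown program '$program'\n" unless $PROGRAMS{$program};
    for my $key (qw(first second)) {
        die "Missing '$key' sequence file\n" unless defined $opt{$key};
    }
    my @cmd = ('./bl2seq', '-p', $program,
               '-i', $opt{first}, '-j', $opt{second});
    push @cmd, '-e', $opt{evalue} if defined $opt{evalue};    # -e is optional
    return @cmd;
}

my @cmd = bl2seq_command(
    program => 'blastp',
    first   => 'seq1.fasta',    # hypothetical file names
    second  => 'seq2.fasta',
    evalue  => 0.01,
);
print "@cmd\n";    # ./bl2seq -p blastp -i seq1.fasta -j seq2.fasta -e 0.01

# Running it needs the bl2seq binary in the current directory, e.g.:
# system(@cmd) == 0 or die "bl2seq failed: $?\n";
```

Passing a list to system, rather than one string, avoids shell quoting problems with unusual file names.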
How to execute Blastall:
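Sequence files can be downloaded from NCBI: FTP site -> RefSeq -> H_sapiens -> H_sapiens -> chr22.
Note: a single FASTA file can hold many sequences, one record after another, for example: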
>gi|86438068|gb|AAI12638.1| HGD protein [Bos taurus]
MTELKYISGFGNECASEDPRCPGALPEGQNNPQVCPYNLYAEQLSGSAFTCPRSTNKRSWLYRILPSVSH
KPFEFIDQGHITHNWD
>gi|116283875|gb|AAH44758.1| Hgd protein [Mus musculus]
MSVLQRILAVQVPCPKDSWLYRILPSVSHKPFESIDQGHVTHNWDEVGPDPNQLRWKPFEIPKASEKKVD
FVSGLYTLCGAGDIKSNNGLAVHIFLCNSSMENRCFYNSDGDFLIVPQKGKLLIYTEFGKMSLQPNEICV
>gi|116283724|gb|AAH24369.1| Hgd protein [Mus musculus]
MSVLQRILAVQVPCPKDSWLYRILPSVSHKPFESIDQGHVTHNWDEVGPDPNQLRWKPFEIPKASEKKVD
1. Formatting the database with formatdb:
jk@jk:~/Desktop/blast-2.2.18/bin$ ./formatdb - # displays all options
jk@jk:~/Desktop/blast-2.2.18/bin$ ./formatdb -i
-i Input file(s) for formatting (this parameter must be set) [File In]
-p Type of file: T = protein, F = nucleotide (default = T)
-o Parse options: T = True, parse SeqId and create indexes; F = False, do not parse SeqId (default = F)
2. Executing Blastall:
jk@jk:~/Desktop/blast-2.2.18/bin$ ./blastall -i
-p Program Name [String] Input should be one of "blastp", "blastn", "blastx", "tblastn", or "tblastx".
-d Database [String] (default = nr). The database specified must first be formatted with formatdb.
-i Query File [File In]
-o BLAST report Output File [File Out]
The output file will contain the BLAST output for all the input query sequences.
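The two steps above can be scripted together. This Perl sketch builds the formatdb and blastall command lines, choosing formatdb's -p flag from the program (blastp and blastx search a protein database; blastn, tblastn and tblastx search a nucleotide one). It assumes formatdb's default of naming its output files after the input file, so the same name is passed to blastall's -d. The helper blast_pipeline and all file names are hypothetical.

```perl
use strict;
use warnings;

# Database type searched by each program, in formatdb's -p convention:
# T = protein database, F = nucleotide database.
my %DB_TYPE = (
    blastp  => 'T', blastx  => 'T',
    blastn  => 'F', tblastn => 'F', tblastx => 'F',
);

# Hypothetical helper: return the formatdb and blastall argument lists
# for one search, using only the options listed above.
sub blast_pipeline {
    my (%opt) = @_;
    my $type = $DB_TYPE{ $opt{program} // '' }
        or die "Unknown program\n";
    # -o T parses SeqIds and builds indexes. Assumes formatdb's default of
    # naming its output after the input file, so that name is passed to -d.
    my @formatdb = ('./formatdb', '-i', $opt{db}, '-p', $type, '-o', 'T');
    my @blastall = ('./blastall', '-p', $opt{program}, '-d', $opt{db},
                    '-i', $opt{query}, '-o', $opt{out});
    return (\@formatdb, \@blastall);
}

my ($format_cmd, $search_cmd) = blast_pipeline(
    program => 'blastp',
    db      => 'db.fasta',         # hypothetical file names
    query   => 'queries.fasta',
    out     => 'report.txt',
);
for my $cmd ($format_cmd, $search_cmd) {
    print "@$cmd\n";
    # To run for real (needs the legacy BLAST binaries):
    # system(@$cmd) == 0 or die "$cmd->[0] failed: $?\n";
}
```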
a. Perl::Tidy
When a Perl script is given as input to perltidy, it writes an indented, consistently structured copy of the script to a separate file with the same name plus a .tdy extension. Perltidy does not change the input script.
Steps to follow:
1. Install Perl::Tidy. It can be run on any system with perl 5.004 or later and used on Unix, Windows, VMS and MacPerl.
2. To execute perltidy:
$ perltidy [options] test_perl_script.pl
This creates the file test_perl_script.pl.tdy, which contains the reformatted script. Many options are available, for example to control indentation or to keep a backup (-b modifies the file in place and saves the original with a .bak extension). For more information on installation and execution see http://perltidy.sourceforge.net/tutorial.html
b. Perl::Critic
Perl::Critic analyses a Perl script and reports where it breaks various coding guidelines (called policies). Most of the guidelines are based on Damian Conway's book Perl Best Practices. Users can enable or disable policies, and can create and customize their own policy modules through the Perl::Critic interface.
The user can set the severity level. There are five levels: severity 5 is the least restrictive, i.e. Perl::Critic reports only violations of the most important policies, and severity 1 is the most restrictive. The five levels are gentle (5), stern (4), harsh (3), cruel (2) and brutal (1).
Perl::Critic requires several other modules to be installed before it can run. See http://search.cpan.org/~elliotjs/Perl-Critic-1.082/lib/Perl/Critic.pm
Steps to follow:
1. Install Perl::Critic.
2. Run perlcritic (here at the most restrictive level, brutal):
$ perlcritic -1 test_perl_script.pl
For more information see, http://search.cpan.org/dist/Perl-Critic/
2. File handling is easy in Perl.
3. Perl regular expressions are very flexible: they make it easy to match similar patterns, not just identical strings, for example a motif or a repeat in a sequence.
4. Perl has fewer strict rules than many other languages, which makes it easy for biologists to learn in a short period.
5. Perl scripts can be combined with shell scripts for text processing.
6. Using Perl CGI and HTML, one can develop web pages. A Perl CGI program is written much like any other Perl script.
7. CPAN contains hundreds of Perl modules specific to sequence analysis.
For example: FASTAParse, Peptide::Pubmed.
8. Perl can also be used for system administration.
9. The Perl Template Toolkit is another Perl product that can be used to develop advanced web pages.
10. Using Perl's DBI and DBIx modules, it is easy to pass MySQL data (back end) to a web page (front end).
11. Parsing an HTML file is easy with CPAN modules.
12. File type conversion is possible in Perl using CPAN modules, e.g. DOC to PDF, HTML to PDF, etc.
13. The PerlMagick module (Image::Magick) can be used for image processing.
14. The Perl::Critic module helps you write better Perl code by critiquing your code structure.
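Point 3 above can be made concrete with a short sketch. find_motif reports every match of a regular expression in a sequence, overlapping ones included, and find_runs reports non-overlapping runs such as tandem repeats. Both helper names and the example sequences are hypothetical; the motif is the PROSITE N-glycosylation pattern N-{P}-[ST]-{P} (PS00001) written as a Perl regex.

```perl
use strict;
use warnings;

# Every match of $pattern in $seq, overlapping ones included, as
# [1-based position, matched text]. Wrapping the pattern in a zero-width
# lookahead lets the scan restart one position after each match start.
sub find_motif {
    my ($seq, $pattern) = @_;
    my @hits;
    while ($seq =~ /(?=($pattern))/g) {
        push @hits, [ pos($seq) + 1, $1 ];
    }
    return @hits;
}

# Non-overlapping, leftmost greedy matches, e.g. tandem repeats.
sub find_runs {
    my ($seq, $pattern) = @_;
    my @hits;
    while ($seq =~ /($pattern)/g) {
        push @hits, [ $-[1] + 1, $1 ];    # $-[1] = 0-based offset of the match
    }
    return @hits;
}

# N-glycosylation site, PROSITE PS00001: N-{P}-[ST]-{P}
my $n_glyc = qr/N[^P][ST][^P]/;
# Three or more tandem copies of the trinucleotide CAG
my $cag_repeat = qr/(?:CAG){3,}/;

my $protein = 'FVSGLYTLCGAGDIKSNNGLAVHIFLCNSSMENRCFYNSDGDFLIVPQKGKLLIYTEFGKMSLQPNEICV';
printf "N-glycosylation site at %d: %s\n", @$_ for find_motif($protein, $n_glyc);

my $dna = 'TTGCAGCAGCAGCAGAATCAGCAG';
printf "CAG repeat at %d: %s\n", @$_ for find_runs($dna, $cag_repeat);
# prints: CAG repeat at 4: CAGCAGCAGCAG
```

The lookahead trick matters for motifs: a plain /g match resumes after the end of each hit, so overlapping sites such as NNTS and NTSA in "NNTSA" would otherwise be reported only once.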
Bioinformatics is a method to predict biological outcomes before anyone commits to full-fledged research. It is a method to compare biological data, e.g. in sequence analysis. It is a way to predict or solve protein structures.
It is the only way to achieve personalized medicine in this post-genomic era. It is the method for comparative genomics and for predicting homologs of human genes in other species.
It is the method used to annotate newly sequenced genomes.
How can biological outcomes be predicted?
We live in a world of computers. By analyzing existing biological data with information technology, we can predict biological outcomes.
What is the hottest branch of bioinformatics in this post-genomic era?
Personalized medicine is the hottest and fastest-growing field, and it can be achieved only through bioinformatics.