Bioinformatics
Introduction
History
Sequences
Goals
- Development and implementation of computer programs that enable efficient access to, management and use of, various types of information
- Development of new algorithms (mathematical formulas) and statistical measures that assess relationships among members of large data sets. For example, there are methods to locate a gene within a sequence, to predict protein structure and/or function, and to cluster protein sequences into families of related sequences.
Relation to other fields
Sequence analysis
DNA sequencing
Sequence assembly
Genome annotation
Computational evolutionary biology
- trace the evolution of a large number of organisms by measuring changes in their DNA, rather than through physical taxonomy or physiological observations alone,
- compare entire genomes, which permits the study of more complex evolutionary events, such as gene duplication, horizontal gene transfer, and the prediction of factors important in bacterial speciation,
- build complex computational population genetics models to predict the outcome of the system over time[20]
- track and share information on an increasingly large number of species and organisms
Comparative genomics
Pan genomics
Genetics of disease
Analysis of mutations in cancer
Gene and protein expression
Analysis of gene expression
Analysis of protein expression
Analysis of regulation
Analysis of cellular organization
Microscopy and image analysis
Protein localization
Nuclear organization of chromatin
Structural bioinformatics
Network and systems biology
Molecular interaction networks
Others
Literature analysis
- Abbreviation recognition – identify the long-form and abbreviation of biological terms
- Named entity recognition – recognizing biological terms such as gene names
- Protein–protein interaction – identify which proteins interact with which proteins from text
High-throughput image analysis
- high-throughput and high-fidelity quantification and sub-cellular localization (high-content screening, cytohistopathology, Bioimage informatics)
- morphometrics
- clinical image analysis and visualization
- determining the real-time air-flow patterns in breathing lungs of living animals
- quantifying occlusion size in real-time imagery from the development of and recovery during arterial injury
- making behavioral observations from extended video recordings of laboratory animals
- infrared measurements for metabolic activity determination
- inferring clone overlaps in DNA mapping, e.g. the Sulston score
High-throughput single cell data analysis
Biodiversity informatics
Ontologies and data integration
Databases
- Used in biological sequence analysis: Genbank, UniProt
- Used in structure analysis: Protein Data Bank (PDB)
- Used in finding Protein Families and Motif Finding: InterPro, Pfam
- Used for Next Generation Sequencing: Sequence Read Archive
- Used in Network Analysis: Metabolic Pathway Databases (KEGG, BioCyc), Interaction Analysis Databases, Functional Networks
- Used in design of synthetic genetic circuits: GenoCAD
Software and tools
Open-source bioinformatics software
Web services in bioinformatics
Bioinformatics workflow management systems
- provide an easy-to-use environment for individual application scientists themselves to create their own workflows,
- provide interactive tools for the scientists enabling them to execute their workflows and view their results in real-time,
- simplify the process of sharing and reusing workflows between the scientists, and
- enable scientists to track the provenance of the workflow execution results and the workflow creation steps.
BioCompute and BioCompute Objects
Education platforms
Conferences
Bioinformatics is the use of mathematical, statistical and computer methods to analyze biological, biochemical, and biophysical data. Because bioinformatics is a young, rapidly evolving field, however, it also has a number of other credible definitions. It can also be defined as the science and technology of learning, managing, and processing biological information. Bioinformatics is often focused on obtaining biologically oriented data, organizing this information into databases, developing methods to get useful information from such databases, and devising methods to integrate related data from disparate sources. The computer databases and algorithms are developed to speed up and enhance biological research.
Bioinformatics can help answer such questions as whether a newly analyzed gene is similar to any previously known gene, whether a protein's sequence can suggest how the protein functions, and whether the genes turned on in a cancer cell are different from those turned on in a healthy cell.
Databases And Analysis Programs
A good deal of the early work in bioinformatics focused on processing and analyzing gene and protein sequences catalogued in databases such as GenBank, EMBL, and SWISS-PROT. Such databases were developed in academia or by government-sponsored groups and served as repositories where scientists could store and share their sequence data with other researchers. With the start of the Human Genome Project in 1990, efforts in bioinformatics intensified, rising to the challenge of handling the large amounts of DNA sequence data being generated at an unprecedented rate. By the mid- to late 1990s, much of the effort in bioinformatics centered on genomic data, generated by the Human Genome Project and by private companies, and on proteomic data.
Early analysis of sequence information focused on looking for similarities between genes and between proteins. Algorithms were developed to help researchers rapidly identify similar gene or protein sequences. Such tools were extremely useful for determining whether a newly sequenced piece of DNA was at all similar to sequences already entered in a database. To determine how multiple sequences align and to view their similarities, multiple-alignment programs were developed. Such programs helped scientists compare the sequences of closely related genes or compare the sequence of a particular gene or protein as it appears in several species.
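To make the idea of sequence comparison concrete, the following is a minimal sketch of pairwise global alignment in the Needleman-Wunsch style. It is an illustration only, not the method used by any particular database search tool; the function name `global_align` and the scoring values (match +1, mismatch -1, gap -1) are assumptions chosen for simplicity.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-1):
    """Needleman-Wunsch global alignment.

    Returns (score, aligned_a, aligned_b). Scoring values are
    illustrative assumptions, not those of any real tool.
    """
    n, m = len(a), len(b)
    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    def sub(i, j):
        # Substitution score for a[i-1] against b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    s, x, y = global_align("ACGT", "ACT")
    print(s)
    print(x)
    print(y)
```

Real similarity-search programs rely on substitution matrices, affine gap penalties, and heuristics for speed; the dynamic-programming table above only shows the underlying principle.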
To better understand the functional roles of new nucleotide and amino acid sequences, researchers developed algorithms to look for particular sequence "domains." Domains are regions where a particular sequence of nucleotides or amino acids is indicative of function in the protein. For example, a protein may have a domain that binds to ATP or GTP, two important protein regulators.
In addition, these algorithms can detect sequences that denote a region involved in particular types of post-translational modifications, such as tyrosine phosphorylation. Tools such as PROSITE, BLOCKS, PRINTS, and Pfam can be used to detect and predict such protein domains in sequence data.
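As a rough illustration of pattern-based domain detection, the sketch below translates a simplified PROSITE-style pattern into a regular expression and scans a protein sequence with it. The example pattern `[AG]-x(4)-G-K-[ST]` is the well-known ATP/GTP-binding P-loop motif; the helper names `prosite_to_regex` and `find_motif` and the test sequence are hypothetical, and the converter handles only a subset of the pattern syntax (no anchors and no repeated character classes beyond a simple count).

```python
import re


def prosite_to_regex(pattern):
    """Translate a simplified PROSITE-style pattern into a regex string.

    Supported elements: single residues, x (any residue), [ABC] (one of),
    {ABC} (none of), and repeat counts such as x(4) or x(2,4).
    """
    parts = []
    for element in pattern.rstrip(".").split("-"):
        # Separate the element from an optional repeat count in parentheses.
        m = re.fullmatch(r"(.+?)(?:\((\d+(?:,\d+)?)\))?", element)
        core, repeat = m.group(1), m.group(2)
        if core == "x":
            core = "."
        elif core.startswith("{") and core.endswith("}"):
            core = "[^" + core[1:-1] + "]"
        parts.append(core + ("{" + repeat + "}" if repeat else ""))
    return "".join(parts)


def find_motif(pattern, sequence):
    """Return (start, matched_text) for each non-overlapping match."""
    regex = re.compile(prosite_to_regex(pattern))
    return [(m.start(), m.group()) for m in regex.finditer(sequence)]


if __name__ == "__main__":
    p_loop = "[AG]-x(4)-G-K-[ST]"
    print(prosite_to_regex(p_loop))
    print(find_motif(p_loop, "MAGPSGSGKSTLL"))  # made-up test sequence
```

Profile- and HMM-based resources such as Pfam use statistical models rather than fixed patterns, so this sketch covers only the simplest kind of signature.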
Structure is central to protein function, and another set of tools, including SWISS-MODEL, allows researchers to use gene and protein sequence data to predict a protein's three-dimensional structure. Such tools can help predict how mutations in a gene sequence could alter the three-dimensional structure of the corresponding protein. They accomplish such molecular modeling by comparing a novel sequence to the sequences of genes whose protein structures are known.
The majority of tools were developed as academic freeware distributed on the Internet. In the early to mid-1990s, commercial companies began to develop their own proprietary algorithms and tools, as well as their own proprietary databases. Those databases were then marketed to pharmaceutical and biotech companies as well as to academic research groups. In the mid- to late 1990s, the most commercially viable and profitable businesses focused on the production and sale of proprietary DNA- and gene-sequence databases. These databases primarily contained genetic information that was not in public-domain databases such as GenBank, and they thus offered potential competitive advantages to the drug discovery groups of large pharmaceutical and biotech companies.
Applications Of Bioinformatics To Drug Discovery
The application of bioinformatics to genomics data could be a major boon for the discovery of new drugs. During the 1990s many pharmaceutical and biotech companies became convinced that they could speed up their drug-discovery pipelines by taking advantage of the data from the Human Genome Project, by funding their own internal genomics programs, and by collaborating with third-party genomics companies.
The goal in such practical applications is to use such data as DNA sequence information and gene expression levels to help discover new drug targets. The vast majority of drugs target proteins, but there are a handful of drugs, such as some chemotherapeutic agents, that bind to DNA. In cases where the target is a protein, the drugs themselves are primarily small chemical molecules or, in some cases, small proteins, such as hormones, that bind to a larger protein in the body. Some drugs are therapeutic proteins delivered to the site of the disease.
The extent to which genomics will actually be able to help identify validated drug targets is uncertain. Genomics and bioinformatics are still young areas, and the drug development cycle can take up to ten years. As of 2001 relatively few of the drugs on the market or in the late stages of clinical trials were discovered via genomics or bioinformatics programs.
Specialists
Bioinformatics is applied to at least five major types of activities: data acquisition, database development, data analysis, data integration, and analysis of integrated data.
Data Acquisition.
Data acquisition is primarily concerned with accessing and storing data generated directly off of laboratory instruments. Many of these instruments are either automated or semi-automated high-throughput instruments that generate large volumes of data. The Human Genome Project utilized hundreds of DNA sequencers, producing enormous amounts of data. The data had to be captured in the appropriate format, and it had to be capable of being linked to all the information related to the DNA samples, such as the species, tissue type, and quality parameters used in the experiments. This area of bioinformatics primarily relates to the use of "laboratory information management systems," which are the computer systems used to manage the information needs of a particular laboratory.
Database Development.
Many laboratories generate large volumes of such data as DNA sequences, gene expression information, three-dimensional molecular structure, and high-throughput screening. Consequently, they must develop effective databases for storing and quickly accessing data. For each type of data, it is likely that a different database organization must be used. A database must be designed to allow efficient storage, search, and analysis of the data it contains. Designing a high-quality database is complicated by the fact that there are several formats for many types of data and a wide variety of ways in which scientists may want to use the data. Many of these databases are best built using a relational database architecture, often based on Oracle or Sybase.
A strong background in relational databases is a fundamental requirement for working in database development. Having some background in the molecular biology techniques used to generate the data is also important. Most critical for the bioinformatics specialist is to have a strong working relationship with the researchers who will be using the database and the ability to understand and interpret their needs into functional database capabilities.
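A minimal sketch of the relational approach is shown below, using Python's built-in `sqlite3` module rather than Oracle or Sybase. The table names, column names, and accession values are hypothetical; the point is only that sample metadata and sequence records live in separate, linked tables so they can be stored once and queried together.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with a hypothetical two-table schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(
        """
        CREATE TABLE sample (
            sample_id INTEGER PRIMARY KEY,
            species   TEXT NOT NULL,
            tissue    TEXT NOT NULL
        );
        CREATE TABLE sequence (
            seq_id    INTEGER PRIMARY KEY,
            sample_id INTEGER NOT NULL REFERENCES sample(sample_id),
            accession TEXT UNIQUE NOT NULL,
            residues  TEXT NOT NULL
        );
        -- Index the foreign key so lookups by sample stay fast.
        CREATE INDEX idx_sequence_sample ON sequence(sample_id);
        """
    )
    conn.execute("INSERT INTO sample VALUES (1, 'Homo sapiens', 'lung')")
    conn.executemany(
        "INSERT INTO sequence (sample_id, accession, residues) VALUES (?, ?, ?)",
        [(1, "DEMO0001", "ATGGCC"), (1, "DEMO0002", "ATGTTT")],  # made-up rows
    )
    conn.commit()
    return conn


def sequences_for_tissue(conn, tissue):
    """Return (accession, residues) for all sequences from a given tissue."""
    rows = conn.execute(
        """
        SELECT q.accession, q.residues
        FROM sequence AS q
        JOIN sample AS s ON s.sample_id = q.sample_id
        WHERE s.tissue = ?
        ORDER BY q.accession
        """,
        (tissue,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    print(sequences_for_tissue(db, "lung"))
```

A production schema would add many more tables (experiments, quality scores, annotations), but the same join pattern applies.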
Data Analysis.
Being able to analyze data efficiently requires having a good database design, allowing researchers to query the database effectively and letting them quickly obtain the types of information they need to begin their data analysis. If queries cannot be performed, or if performance is tediously slow, the whole system breaks down, since scientists will not be inclined to use the database. Once data is obtained from the database, the user must be able to easily transform it into the format appropriate for the desired analysis tools.
This can be challenging, since researchers often use a combination of publicly available tools, tools developed in-house, and third-party commercial tools. Each tool may have different input and output formats. Starting in the late 1990s, there have been both commercial and in-house efforts at pharmaceutical and biotech companies to reduce the formatting complexities. Such simplification efforts focus on building analysis systems with a number of tools integrated within them such that the transfer of data between tools appears seamless to the end user.
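The format problem can be illustrated with a small converter. The sketch below reads FASTA text, a widely used plain-text sequence format, and rewrites it as tab-separated values that a spreadsheet or statistics package could load. The function names and the choice of output columns are assumptions made for this example.

```python
def parse_fasta(text):
    """Parse FASTA-formatted text into a list of (header, sequence) tuples."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def fasta_to_tsv(text):
    """Convert FASTA text to TSV with columns: id, length, sequence."""
    lines = ["id\tlength\tsequence"]
    for header, seq in parse_fasta(text):
        # Use the first whitespace-delimited token of the header as the ID.
        seq_id = header.split()[0] if header else ""
        lines.append(f"{seq_id}\t{len(seq)}\t{seq}")
    return "\n".join(lines)


if __name__ == "__main__":
    demo = ">seq1 demo record\nACGT\nAC\n>seq2\nGG\n"
    print(fasta_to_tsv(demo))
```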
Bioinformatics analysts have a broad range of opportunities. They may write specific algorithms to analyze data, or they may be expert users of analysis tools, helping scientists understand how the tools analyze the data and how to interpret results. A knowledge of various programming languages, such as Java, Perl, C, C++, and Visual Basic, is very useful, if not required, for those working in this area.
Data Integration.
Once information has been analyzed, a researcher often needs to associate or integrate it with related data from other databases. For example, a scientist may run a series of gene expression analysis experiments and observe that a particular set of 100 genes is more highly expressed in cancerous lung tissue than in normal lung tissue. The scientist might wonder which of the genes is most likely to be truly related to the disease. To answer the question, the researcher might try to find out more information about those 100 genes, including any associated gene sequence, protein, enzyme, disease, metabolic pathways, or signal transduction pathway data.
Such information will help the researcher narrow the list down to a smaller set of genes. Finding this information, however, requires connections or links between the different databases and a good way to present and store the information. An understanding of database architectures and the relationship between the various biological concepts in the databases is key to doing effective data integration.
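In its simplest form, the integration step described above amounts to joining a gene list against several annotation sources keyed by a shared identifier. The sketch below shows this with plain dictionaries; the gene identifiers, annotation values, and function names are all hypothetical placeholders, and real sources rarely share identifiers this cleanly.

```python
def integrate_annotations(genes, sources):
    """Left-join a gene list against named annotation tables.

    `sources` is a list of (name, table) pairs, where each table maps a
    gene identifier to an annotation. Missing annotations become None.
    """
    merged = {}
    for gene in genes:
        merged[gene] = {name: table.get(gene) for name, table in sources}
    return merged


def shortlist(merged, required):
    """Keep genes that have a non-empty annotation in every required source."""
    return [
        gene
        for gene, record in merged.items()
        if all(record.get(name) for name in required)
    ]


if __name__ == "__main__":
    # Made-up identifiers and annotations, for illustration only.
    expressed = ["GENE_A", "GENE_B", "GENE_C"]
    pathways = {"GENE_A": "cell cycle", "GENE_C": "apoptosis"}
    diseases = {"GENE_A": "lung cancer"}
    merged = integrate_annotations(
        expressed, [("pathway", pathways), ("disease", diseases)]
    )
    print(shortlist(merged, ["pathway", "disease"]))
```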
Analysis of Integrated Data.
Once various types of data are integrated, users need a good way to present these various pieces of data so they can be interpreted and analyzed. The information should be capable of being stored and retrieved so that, over time, various pieces of information can be combined to form a "knowledge base" that can be extended as more experiments are run and additional data are integrated from other sources. This type of work requires skills related to database design and architecture. It also requires specific programming skills in various computer languages, as well as expertise in developing interfaces between a computer and its user.
See also Combinatorial Chemistry; Computational Biologist; Evolution of Genes; Genomics; Genomics Industry; High-Throughput Screening; Human Genome Project; Pharmacogenetics and Pharmacogenomics; Proteins; Proteomics; Sequencing DNA.
Bioinformatics
Bioinformatics is a new field that centers on the development and application of computational methods to organize, integrate, and analyze gene-related data. The Human Genome Project (HGP) was an international effort to determine the deoxyribonucleic acid (DNA) base sequence of the entire human genome, which includes about thirty thousand protein-encoding genes, their regulatory elements, and many highly repeated noncoding sections. In 1985, a group of visionary scientists led by Charles DeLisi, who was then the director of the office of health and environmental research at the U.S. Department of Energy (DOE), realized that having the entire human genome in hand would provide the foundation for a revolution in biology and medicine. As a result, the 1988 presidential budget submission to U.S. Congress requested funds to start the HGP. Momentum built quickly and by 1990, DOE and the U.S. National Institutes of Health had laid out plans for a fifteen-year project. An international public consortium and a private company announced completion of a rough draft of the human genome sequence on June 26, 2000, with papers describing the data published eight months later. This is the first generation bestowed with the "parts list" of life, as well as the daunting task of making sense out of it.
Data Management
The Human Genome Project and other genome projects have generated massive data on genome sequences, disease-causing gene variants, protein three-dimensional structures and functions, protein-protein interactions, and gene regulation. Bioinformatics is closely tied to two other new fields: genomics (identification and functional characterization of genes in a massively parallel and high-throughput fashion) and proteomics (analysis of the biological functions of proteins and their interactions), which have also resulted from the genome projects. The fruits of the HGP will have major impacts on understanding evolution and developmental biology, and on scientists' ability to diagnose and treat diseases. Areas outside of traditional biology, such as anthropology and forensic medicine, are also embracing genome information.
Knowing the sequence of the billions of bases in the human genome does not tell scientists where the genes are (about 1.5 percent of the human genome encodes protein). Nor does it tell scientists what the genes do, how genes are regulated, how gene products form a cell, how cells form organs, which mutations underlie genetic diseases, why humans age, and how to develop drugs. Bioinformatics, genomics, and proteomics try to answer these questions using technologies that take advantage of as much gene sequence information as possible. In particular, bioinformatics focuses on computational approaches.
Bioinformatics includes development of databases and computational algorithms to store, disseminate, and rapidly retrieve genomic data. Biological data are complex and abundant. For example, the U.S. National Center for Biotechnology Information (NCBI), a division of the National Institutes of Health, houses central databases for gene sequences (GenBank), disease associations (OMIM), and protein structure (MMDB), and publishes biomedical articles (PubMed). The best way to get a feeling for the magnitude and variety of the data is to access the homepage of NCBI via the World Wide Web (http://ncbi.nlm.nih.gov). A bioinformatics team at NCBI works on the design of the databases and the development of efficient algorithms for retrieving data and comparing DNA sequences.
Applications
Bioinformatics also covers the design of genomics and proteomics experiments and subsequent analysis of the results. For instance, disease tissues (such as those from cancer patients) express different sets of proteins than their normal counterparts. Therefore protein abundance can be used to diagnose diseases. Moreover, proteins that are highly (or uniquely) expressed in disease tissues may be potential drug targets.
Genomics and proteomics generate protein abundance data using different approaches. Genomics determines gene abundance (which is a good indicator of protein abundance) using DNA microarrays, also known as DNA chips, which are high-density arrays of short DNA sequences, each recognizing a particular gene. By hybridizing a tissue sample to a DNA chip, one can determine the activities of many genes in a single experiment. The design of DNA chips—that is, which gene fragments to use in order to achieve maximum sensitivity and specificity, as well as how to interpret the results of DNA chip experiments—are difficult problems in bioinformatics.
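One common first step in interpreting expression measurements is to compare each gene's signal in disease tissue against normal tissue as a log2 fold change. The sketch below assumes the intensities have already been normalized; the function names, the pseudocount, and the threshold of 1.0 (a twofold increase) are illustrative assumptions, and a real analysis would also use replicates and statistical testing.

```python
import math


def log2_fold_changes(disease, normal, pseudocount=1.0):
    """Return log2(disease / normal) per gene present in both samples.

    The pseudocount avoids division by zero for unexpressed genes.
    """
    return {
        gene: math.log2((disease[gene] + pseudocount) / (normal[gene] + pseudocount))
        for gene in disease
        if gene in normal
    }


def upregulated(disease, normal, threshold=1.0):
    """Genes whose log2 fold change meets the threshold, sorted by name."""
    changes = log2_fold_changes(disease, normal)
    return sorted(gene for gene, lfc in changes.items() if lfc >= threshold)


if __name__ == "__main__":
    # Made-up intensities for two hypothetical genes.
    disease = {"g1": 7.0, "g2": 1.0}
    normal = {"g1": 1.0, "g2": 1.0}
    print(log2_fold_changes(disease, normal))
    print(upregulated(disease, normal))
```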
Proteomics measures protein abundance directly using mass spectrometry, which is a way to measure the mass of a protein. Since mass is not unique enough for identifying a protein, one usually cuts the protein with enzymes (that cut at specific places according to the protein sequence) and measures the masses of the resulting fragments using mass spectrometry. Such "mass distributions" for all proteins with known sequences can be generated using computers and stored. By comparing the mass distribution of an unknown protein sample to those of known proteins, one can identify the sample. Such comparisons require complex computational algorithms, especially when the sample is a mixture of proteins. Although not as efficient as DNA chips, mass spectrometry can directly measure protein abundance. In fact, spectrometric identification of proteins has been one of the most significant advances in proteomics.
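The fragment-mass comparison described above can be sketched as follows. The code assumes trypsin as the enzyme (cutting after lysine or arginine unless the next residue is proline), uses standard monoisotopic residue masses, and scores each candidate protein by how many observed masses fall within a tolerance of its predicted peptide masses. The function names, the tolerance, and the candidate sequences are hypothetical, and real search engines handle missed cleavages, modifications, and statistical scoring that are omitted here.

```python
# Monoisotopic residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056  # added once per peptide for the free termini


def tryptic_digest(protein):
    """Cut after K or R, except when the next residue is P."""
    peptides, start = [], 0
    for i, residue in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if residue in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mass(peptide):
    """Monoisotopic mass of an unmodified peptide."""
    return sum(RESIDUE_MASS[r] for r in peptide) + WATER


def best_match(observed, candidates, tol=0.01):
    """Return the candidate name whose predicted masses match the most peaks."""
    def hits(sequence):
        predicted = [peptide_mass(p) for p in tryptic_digest(sequence)]
        return sum(any(abs(o - m) <= tol for m in predicted) for o in observed)
    return max(candidates, key=lambda name: hits(candidates[name]))


if __name__ == "__main__":
    # Made-up candidate sequences, for illustration only.
    candidates = {"protA": "AKRPGKC", "protB": "GGGKAAAR"}
    observed = [peptide_mass(p) for p in tryptic_digest(candidates["protB"])]
    print(tryptic_digest("AKRPGKC"))
    print(best_match(observed, candidates))
```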
Bioinformatics can lead to discovery of new proteins. When the cystic fibrosis gene (CF) was first identified in 1989, for example, researchers compared its DNA sequence computationally to all sequences known at that time. The comparison revealed striking homology (sequence similarity) to a large family of proteins involved in active transport across cell membranes. Indeed, the CF gene encodes a membrane-spanning chloride ion channel, called the cystic fibrosis transmembrane conductance regulator, or CFTR. The identification of gene function by searching for sequence homology is a widely used bioinformatics method. When no homology is found, one may still be able to tell if a gene codes for membrane-spanning channels using computational tools. Membranes are bilayers of lipid molecules, which are water insoluble. An ion channel typically has regions outside the membrane (water soluble) and regions inside the membrane (water insoluble) arranged in a certain pattern. Computer algorithms have been developed to capture such patterns in a gene sequence.
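One classic way to capture such patterns is a sliding-window hydropathy profile. The sketch below uses the Kyte-Doolittle hydropathy scale and flags windows whose average exceeds a cutoff; the 19-residue window and the 1.6 threshold follow a commonly cited heuristic for membrane-spanning segments, while the function names and the artificial test sequence are assumptions made for this example. It is not the method of any specific prediction program.

```python
# Kyte-Doolittle hydropathy values; positive means hydrophobic.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(sequence, window=19):
    """Average hydropathy for every window of the given length."""
    if len(sequence) < window:
        return []
    values = [KYTE_DOOLITTLE[r] for r in sequence]
    total = sum(values[:window])
    profile = [total / window]
    # Slide the window one residue at a time, updating the running sum.
    for i in range(window, len(values)):
        total += values[i] - values[i - window]
        profile.append(total / window)
    return profile


def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Start positions of windows hydrophobic enough to span a membrane."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, h in enumerate(profile) if h >= threshold]


if __name__ == "__main__":
    # Artificial sequence: a leucine stretch flanked by charged residues.
    demo = "D" * 10 + "L" * 19 + "D" * 10
    print(candidate_tm_windows(demo))
```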
By thinking boldly and by setting ambitious goals, the Human Genome Project has brought about a new era in biological and biomedical research. Many revolutionary new technologies are being developed, most of which have significant computational components. The avalanche of genomic data also enables model-based reasoning. The bright future of bioinformatics calls for individuals who can think quantitatively and at the same time love biology, an unusual combination.
bioinformatics The collection, storage, and analysis of DNA- and protein-sequence data using computerized systems. Much of the data generated by genome sequencing projects and protein studies is held in various databanks and made available to researchers throughout the world via the Internet. Many computer programs have been developed to analyse sequence data, enabling the user to identify similarities between newly sequenced material and existing sequences. This allows, for example, predictions about the structure and function of a protein from its amino-acid sequence data or from the nucleotide sequence of its gene. Genome-wide sequence analysis of newly discovered organisms, especially bacteria or protoctists, indicates the array of proteins they are likely to manufacture, and therefore the kind of lifestyle they are likely to lead. Also, comparisons between genomes of different species provide information about their possible evolutionary relationships.
Bioinformatics and computational biology
Bioinformatics, or computational biology, refers to the development of new database methods to store genomic information, computational software programs, and methods to extract, process and evaluate this information, and the refinement of existing techniques to acquire the genomic data. Finding genes and determining their function, predicting the structure of proteins and RNA (ribonucleic acid) sequences from the available DNA (deoxyribonucleic acid) sequence, and determining the evolutionary relationship of proteins and DNA sequences are also part of bioinformatics.
The genome sequences of some bacteria, yeast, a nematode, the fruit fly Drosophila, and several plants have been obtained in the recent past, with many more sequences having been completed or nearing completion. Although work continues to refine the data, the initial sequencing (a rough draft) of the human genome was completed in 2000. In April 2003 the genome sequence was announced as complete, and in May 2006 the sequence of the last chromosome was published in the journal Nature. Although the Human Genome Project has been publicly declared complete, work continues. As of 2005, the number of genes in the human genome was restated as 20,000 to 25,000, down from the earlier estimate of 30,000 to 40,000. Experts predict that it will take geneticists several more years before a precise number can be given.
In addition to this accumulation of nucleotide sequence data, elucidation of the three-dimensional structure of proteins coded for by the genes has been accelerating. The result is a vast and ever-increasing number of databases and volume of genetic information. The efficient and productive use of this information requires specialized computational techniques and software. Bioinformatics has developed and grown from the need to extract and analyze the reams of genomic information, such as nucleotide sequences and protein structures.
Bioinformatics utilizes statistical analysis, step-wise computational analysis and database management tools in order to search databases of DNA or protein sequences, to filter out background from useful data, and to enable comparison of data from diverse databases. This sort of analysis is ongoing. The exploding number of databases, and the various experimental methods used to acquire the data, can make comparisons tedious to achieve. However, the benefits can be enormous. The immense size and network of biological databases provide a resource to answer biological questions about mapping, gene expression patterns, molecular modeling, and molecular evolution, and to assist in the structure-based design of therapeutic drugs.
Obtaining information is a multi-step process. Databases are examined, or browsed, by posing complex computational questions. Researchers who have derived a DNA or protein sequence can submit the sequence to public repositories of such information to see if there is a match or similarity with their sequence. If so, further analysis may reveal a putative structure for the protein coded for by the sequence as well as a putative function for that protein. Four primary databases, those containing one type of information (only DNA sequence data or only protein sequence data), currently available for these purposes are the European Molecular Biology DNA Sequence Database (EMBL), GenBank, SwissProt and the Protein Identification Resource (PIR). Secondary databases contain information derived from other databases. Specialist databases, or knowledge databases, are collections of sequence information, expert commentary, and reference literature. Finally, integrated databases are collections (amalgamations) of primary and secondary databases.
The area of bioinformatics concerned with the derivation of protein sequences makes it conceivable to predict three-dimensional structures of the protein molecules, by use of computer graphics and by comparison with similar proteins, which have been obtained as a crystal. Knowledge of structure allows the site(s) critical for the function of the protein to be determined. Subsequently, drugs active against the site can be designed, or the protein can be utilized to enhance commercial production processes, such as in pharmaceutical bioinformatics.
Bioinformatics also encompasses the field of comparative genomics. This is the comparison of functionally equivalent genes across species. A yeast gene is likely to have the same function as a worm gene whose protein has a similar amino acid sequence. Alternatively, genes having similar sequences may have divergent functions. Such similarities and differences will be revealed by the sequence information. Practically, such knowledge aids in the selection and design of genes to instill a specific function in a product to enhance its commercial appeal.
The most widely known example of a bioinformatics-driven endeavor is the Human Genome Project (HGP). Charles DeLisi, who at the time was Director of the Health and Environmental Research Programs, under the U.S. Department of Energy (DOE), began the HGP in 1986. The project was formally established in the United States in 1990 as a joint project of the DOE and the U.S. National Institutes of Health. International cooperation occurred among geneticists from the United States, Japan, Germany, France, and the United Kingdom. Work related to the Human Genome Project has allowed dramatic improvements worldwide in molecular biological techniques and improved computational tools for studying genomic function.
See also Chromosome mapping; Deoxyribonucleic acid (DNA); Genetic engineering; Genetic testing; Genome; Molecular biology; Proteomics; Ribonucleic acid (RNA).
Bioinformatics: History, Coverage, Components and Applications
Read this article to learn about the history, coverage, components and applications of bioinformatics.
Bioinformatics covers many specialized and advanced areas of biology. Such areas are: (1) Functional Genomics (2) Structural Genomics (3) Comparative Genomics (4) DNA Microarrays and (5) Medical Informatics.
Bioinformatics is the combination (or marriage) of biology and information technology. Basically, bioinformatics is a recently developed science that uses information to understand biological phenomena. It broadly involves the computational tools and methods used to manage, analyse and manipulate large volumes of biological data.
Bioinformatics may also be regarded as a part of computational biology. The latter is concerned with the application of quantitative analytical techniques in modeling and solving problems in biological systems. Bioinformatics is an interdisciplinary approach requiring advanced knowledge of computer science, mathematics and statistical methods for the understanding of biological phenomena at the molecular level.
History and Relevance of Bioinformatics:
The term bioinformatics came into widespread use in the 1990s. Originally, it dealt with the management and analysis of data pertaining to DNA, RNA and protein sequences. As biological data are being produced at an unprecedented rate, their management and interpretation invariably require bioinformatics. Thus, bioinformatics now includes many other types of biological data.
Some of the most important ones are listed below:
i. Gene expression profiles
ii. Protein structure
iii. Protein interactions
iv. Microarrays (DNA chips)
v. Functional analysis of biomolecules
vi. Drug designing.
Bioinformatics is largely (though not exclusively) a computer-based discipline. Computers are essential for handling, storing and retrieving large volumes of biological data. At the same time, no computer on earth, however advanced, can store information and perform functions the way a living cell does. In that sense, a highly complex information technology lies within the cells of an organism. This primarily comprises the organism’s genes and the instructions they carry for the organism’s biological processes and behaviour.
Broad Coverage of Bioinformatics:
Bioinformatics covers many specialized and advanced areas of biology.
Functional genomics:
Identification of genes and their respective functions.
Structural genomics:
Prediction of protein structures and, from them, of protein functions.
Comparative genomics:
For understanding the genomes of different species of organisms.
DNA microarrays:
These are designed to measure the levels of gene expression in different tissues, various stages of development and in different diseases.
Medical informatics:
This involves the management of biomedical data with special reference to biomolecules, in vitro assays and clinical trials.
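As a rough illustration of the DNA microarray entry above, a difference in expression between two conditions is often summarized as a log2 ratio. The sketch below assumes two already-normalized intensity values for one gene; the function name and the pseudocount default are hypothetical choices for illustration, not part of any particular microarray package.

```python
import math


def log2_fold_change(treated: float, control: float, pseudocount: float = 1.0) -> float:
    """Return log2((treated + pseudocount) / (control + pseudocount)).

    The pseudocount (an illustrative assumption) avoids division by zero
    when a gene shows no signal in one of the conditions.
    """
    if treated < 0 or control < 0:
        raise ValueError("intensities must be non-negative")
    return math.log2((treated + pseudocount) / (control + pseudocount))


# Positive values mean higher expression in the treated sample,
# negative values mean lower expression, and 0 means no change.
change = log2_fold_change(treated=7.0, control=1.0)
```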
Components of Bioinformatics:
Bioinformatics comprises three components:
1. Creation of databases:
This involves the organization, storage and management of biological data sets. The databases are accessible to researchers, who can consult the existing information and submit new entries, e.g. a protein sequence data bank or a data bank of molecular structures. Databases are of no use until their contents are analysed.
2. Development of algorithms and statistics:
This involves the development of tools and resources to determine the relationships among the members of large data sets, e.g. comparison of new protein sequence data with already existing protein sequences.
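One of the simplest statistical measures for comparing a protein sequence with an existing one is percent identity. The sketch below assumes the two sequences have already been aligned to the same length, with `-` marking gaps; the function name is hypothetical, and real tools compute the alignment itself before scoring it.

```python
def percent_identity(seq_a: str, seq_b: str) -> float:
    """Percentage of aligned columns in which both sequences carry the same residue.

    Columns that are gaps ('-') in both sequences are ignored; a gap
    opposite a residue counts as a mismatch.
    """
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be pre-aligned to the same length")
    columns = [
        (a, b)
        for a, b in zip(seq_a.upper(), seq_b.upper())
        if not (a == "-" and b == "-")
    ]
    if not columns:
        return 0.0
    matches = sum(1 for a, b in columns if a == b)
    return 100.0 * matches / len(columns)


# Two four-residue fragments that differ at the last position.
identity = percent_identity("MKTA", "MKTG")
```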
3. Analysis of data and interpretation:
This involves the appropriate use of components 1 and 2 (given above) to analyse the data and interpret the results in a biologically meaningful manner. The data in question include DNA, RNA and protein sequences, protein structures, gene expression profiles and biochemical pathways.
Bioinformatics and the Internet:
The internet is an international computer network. A computer network is a group of computers that can communicate (usually over a telephone system) and exchange data between users. The internet protocol (IP) determines how packets of information are addressed and routed over the network. To access the internet, a computer must have the correct hardware (modem or network card), appropriate software and permission to access the network. For this purpose, one has to subscribe to an internet service provider (ISP).
World Wide Web (WWW):
The WWW involves the exchange of information over the internet using a program called a browser. Widely used browsers have included Internet Explorer and Netscape Navigator.
Applications of Bioinformatics:
The advent of bioinformatics has revolutionized biological science, and biotechnology has benefited greatly from it. The best example is the sequencing of the human genome in record time, which would not have been possible without bioinformatics.
A selected list of applications of bioinformatics is given below:
i. Sequence mapping of biomolecules (DNA, RNA, proteins).
ii. Identification of nucleotide sequences of functional genes.
iii. Finding of sites that can be cut by restriction enzymes.
iv. Designing of primer sequences for the polymerase chain reaction.
v. Prediction of functional gene products.
vi. Tracing of the evolutionary trees of genes.
vii. Prediction of the 3-dimensional structure of proteins.
viii. Molecular modelling of biomolecules.
ix. Designing of drugs for medical treatment.
x. Handling of vast biological data, which otherwise would not be possible.
xi. Development of models for the functioning of various cells, tissues and organs.
The above list of applications, however, should be treated as incomplete, since at present there is no field in the biological sciences that does not involve bioinformatics.
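Item iii in the list above, finding sites that can be cut by restriction enzymes, reduces in its simplest form to searching a DNA string for a recognition sequence. The following is a minimal sketch, assuming a single strand written 5' to 3' and an exact, unambiguous recognition site; the function name is hypothetical, and the default site GAATTC is the recognition sequence of EcoRI.

```python
def find_sites(dna: str, site: str = "GAATTC") -> list[int]:
    """Return the 0-based start position of every occurrence of `site` in `dna`.

    Overlapping occurrences are reported too. Matching is case-insensitive.
    """
    if not site:
        raise ValueError("recognition site must not be empty")
    dna = dna.upper()
    site = site.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        # Resume one base further on so overlapping sites are not skipped.
        start = dna.find(site, start + 1)
    return positions


# A short made-up sequence containing two EcoRI sites.
sites = find_sites("AAGAATTCGGGAATTCTT")
```

Because GAATTC is palindromic (it reads the same on the complementary strand), scanning one strand is enough in this case; a non-palindromic site would also need a search of the reverse complement.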
Margaret Oakley Dayhoff has been called the “mother and father of bioinformatics” as she was a pioneer of applying mathematics and computational methods to biochemistry.
Educated as a physical chemist, she used punched-card machines for the data analysis in her theoretical chemistry work, calculating the resonance energies of several polycyclic organic molecules. She completed this work as part of her doctoral thesis at Columbia University and followed it with post-doctoral research at the Rockefeller Institute and the University of Maryland.
In 1959 she joined the National Biomedical Research Foundation and shortly thereafter began developing tools to help protein chemists determine amino acid sequences by automatically overlapping the sequences of peptides. Seeing the need for a sequence database, she began collecting protein sequences in the Atlas of Protein Sequence and Structure, first published as a book in 1965 and followed by several later editions.
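The idea of overlapping peptide sequences can be sketched as follows: if the end of one fragment matches the start of another, the two can be merged into a longer stretch of sequence. This is only an illustration of the principle, not a reconstruction of Dayhoff's programs; the function names and the `min_overlap` default are hypothetical.

```python
from typing import Optional


def overlap_length(left: str, right: str, min_overlap: int = 3) -> int:
    """Length of the longest suffix of `left` that is also a prefix of `right`.

    Returns 0 when no overlap of at least `min_overlap` residues exists.
    """
    longest_possible = min(len(left), len(right))
    for size in range(longest_possible, min_overlap - 1, -1):
        if left.endswith(right[:size]):
            return size
    return 0


def merge_peptides(left: str, right: str, min_overlap: int = 3) -> Optional[str]:
    """Merge two peptide fragments on their overlap, or return None if they do not overlap."""
    size = overlap_length(left, right, min_overlap)
    if size == 0:
        return None
    return left + right[size:]


# Two made-up fragments that share the residues "YIA".
merged = merge_peptides("MKTAYIA", "YIAKQRQ")
```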
The Atlas was organized by gene families, and Dayhoff is considered a pioneer in recognizing them. In 1980 she also developed the first online database system, a sequence database that remote computers could access by telephone line. To reduce the size of the data files used in sequencing, she developed a one-letter code for amino acids that was accepted by the International Union of Pure and Applied Chemistry.
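The space saving of the one-letter code is easy to see in a small example. The mapping below covers the 20 standard amino acids; the helper function and its hyphen-separated input format are assumptions made for illustration.

```python
# Three-letter to one-letter codes for the 20 standard amino acids.
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(peptide: str, sep: str = "-") -> str:
    """Convert a peptide such as 'Met-Lys-Thr' to its one-letter form.

    Raises KeyError for residues outside the 20 standard amino acids.
    """
    return "".join(THREE_TO_ONE[residue.capitalize()] for residue in peptide.split(sep))


three_letter = "Met-Lys-Thr-Ala-Tyr"
one_letter = to_one_letter(three_letter)
```

Here the hyphenated three-letter form takes 19 characters while the one-letter form takes 5.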
As part of her work with amino acids she originated one of the first substitution matrices, the Point Accepted Mutation (PAM) matrix. A point accepted mutation is the replacement of a single amino acid in the primary structure of a protein with another single amino acid, accepted by the process of natural selection.
As an active member of the Biophysical Society she served as the first female officer of the group, including as president. Because of her work in bioinformatics, as well as her mentoring of women working in scientific areas, the Biophysical Society created an award in her honor that is given to a promising young woman to encourage her to enter a career in scientific research.
Bioinformatics is the application and development of methods from computer science to solve challenges within molecular biology and medicine. Modern molecular biology generates large amounts of data and is therefore highly dependent on advanced computer science. For example, today it is possible to map the genome (the combined genetic material) of a given individual. Ever better bioinformatics approaches have to be developed to organize and analyze such data, and to find connections between genes, lifestyle and diseases. Bioinformatics is also central to the study of how proteins function in an organism and to the development of new medicines, for example through the discovery and optimization of new enzymes.
What Is Bioinformatics?
Bioinformatics is the application of information technology to the study of living things, usually at the molecular level. Bioinformatics involves the use of computers to collect, organize and use biological information to answer questions in fields like evolutionary biology.
Bioinformatics Overview
Over the past decades, the quantity and quality of biological information have skyrocketed, largely because of advances in molecular biology and genomic technology. The Bioinformatics Organization reports that bioinformatics is used to develop databases, such as those arising from the Human Genome Project, that store, organize and index biological information for analysis.
The value of bioinformatics goes beyond the scientific community. This field allows scientists to create comprehensive databases of biological and health information that can be used to test theories and generate solutions to medical problems that affect us all. The National Center for Biotechnology Information reports that there are three main scientific applications of bioinformatics. These are described below.
Evolutionary Biology
Evolutionary biology looks at the molecules of different organisms and determines whether they share a common evolutionary history. This process has the potential to uncover relationships between life forms never considered before. By using bioinformatics to track this data, evolutionary biologists can gain new insights into the causes of and cures for various diseases.
Protein Modeling
Proteins have specific functions in our bodies determined by DNA sequences. Using bioinformatic techniques, scientists can test theories about how various proteins interact. These tests may help scientists understand how diseases develop in living organisms.
Genome Mapping
Genome mapping is another bioinformatic technique used for scientific research. Computerized genomic maps make it easier for scientists to locate genes, and this increased efficiency results in higher productivity and greater scientific advancements. Due to this development in bioinformatics, scientists can spend less time on the painstaking mapping process and more time testing their hypotheses.
How to Study Bioinformatics
Bioinformatics is typically studied at the graduate level. However, some bachelor's degree programs in relevant fields, like bioengineering, computer science, biology and chemistry, offer a specialization in bioinformatics. Master's degree programs in bioinformatics can prepare graduates for applied research or consulting jobs, and PhD graduates can seek a range of research jobs, as well as university-level teaching positions.
Core bioinformatics courses may include molecular biology, probability, statistics, computing and informatics, while advanced courses may cover population genetics, molecular genomic and epigenomic data analysis, biological mathematical modeling, biostatistics, sustainability mathematics and computational neuroscience. Graduate degree programs typically require laboratory work, research, an internship and a thesis or dissertation.