Artificial Intelligence techniques in Bioinformatics


It is widely recognized that biology is in the midst of a “data explosion”. In recent years, the discipline of bioinformatics has allowed biologists to make full use of advances in computer science and computational statistics to analyse these data. However, as the volume of data grows, the techniques used must become more sophisticated to cope with large-scale, noisy data. Some problems in bioinformatics, as in many other sciences, cannot be solved satisfactorily even with the fastest computers, so a more “intelligent” approach is required to tackle them. Artificial intelligence methods are often based on the ways in which humans solve search and optimization problems, or on how nature has solved its own problems, for example by using the principle of “survival of the fittest” in evolutionary computation.

What is Bioinformatics?

The National Center for Biotechnology Information defines bioinformatics as follows: “Bioinformatics is the field of science in which biology, computer science, and information technology merge into a single discipline. There are three important sub-disciplines within bioinformatics: the development of new algorithms and statistics with which to assess relationships among members of large data sets; the analysis and interpretation of various types of data including nucleotide and amino acid sequences, protein domains, and protein structures; and the development and implementation of tools that enable efficient access and management of different types of information.”

Artificial Intelligence and Computer Science in Bioinformatics

One of the most fundamental tasks in computer science is search. Many problems can be recast as search problems, even one as simple as adding two numbers such as 2 + 2: in its search representation, the question is whether some number (in this case, 4) can be reached from the original statement of the problem. Determining an alignment between two DNA sequences can also be regarded as a search problem: starting from the two sequences, find the alignment that minimizes the differences between them. The development of search techniques received a major boost from the formalization of graph theory, in which graphs are defined precisely in terms of nodes and the arcs that connect them.
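To make the alignment example concrete, here is a minimal Python sketch that treats alignment as a cheapest-path search over a grid graph: each node (i, j) pairs a prefix of one sequence with a prefix of the other, and each arc is a match, a mismatch, or a gap. It assumes a unit cost for every mismatch and every gap, so the result is the edit distance; practical aligners (for example Needleman-Wunsch implementations) use substitution scores and gap penalties instead. The function name `align` is hypothetical.

```python
def align(a, b):
    """Globally align sequences a and b using unit costs (edit distance).

    Node (i, j) of the implicit grid graph pairs prefix a[:i] with prefix b[:j].
    Arcs are a diagonal step (match or mismatch), a down step (gap in b) or a
    right step (gap in a); cost[i][j] is the cheapest path from (0, 0) to (i, j).
    """
    n, m = len(a), len(b)
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        cost[0][j] = j  # b[:j] aligned entirely against gaps
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            mismatch = 0 if a[i - 1] == b[j - 1] else 1
            cost[i][j] = min(
                cost[i - 1][j - 1] + mismatch,  # match or substitution
                cost[i - 1][j] + 1,             # gap in b
                cost[i][j - 1] + 1,             # gap in a
            )

    # Walk back from (n, m) to (0, 0) to recover one optimal alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        diag_cost = 0 if i > 0 and j > 0 and a[i - 1] == b[j - 1] else 1
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + diag_cost:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return cost[n][m], "".join(reversed(top)), "".join(reversed(bottom))


distance, top, bottom = align("ACGT", "AGT")
print(distance)
print(top)
print(bottom)
```

For example, `align("ACGT", "AGT")` returns `(1, "ACGT", "A-GT")`: a single gap is enough to make the two sequences agree.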

Hidden Markov Models

A Hidden Markov Model (HMM) can be described formally as a discrete dynamical system governed by a Markov chain whose states are hidden and which emits a sequence of observable outputs. This makes HMMs well suited to modelling sequences, such as nucleotide or amino acid sequences.
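As an illustration of how an HMM can be applied to a sequence, the sketch below implements the Viterbi algorithm, which recovers the most probable sequence of hidden states for a given output sequence. The two-state model (hypothetical “GC-rich” and “AT-rich” regions of DNA) and all of its probabilities are invented for illustration only; log probabilities are used to avoid numerical underflow on long sequences.

```python
import math


def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden-state path for obs and its log-probability."""
    # best[t][s] = log-probability of the best path that ends in state s at time t
    best = [{s: math.log(start_p[s]) + math.log(emit_p[s][obs[0]]) for s in states}]
    back = [{}]  # back[t][s] = predecessor of s on that best path
    for t in range(1, len(obs)):
        best.append({})
        back.append({})
        for s in states:
            prev, score = max(
                ((r, best[t - 1][r] + math.log(trans_p[r][s])) for r in states),
                key=lambda pair: pair[1],
            )
            best[t][s] = score + math.log(emit_p[s][obs[t]])
            back[t][s] = prev
    # Pick the best final state, then follow the back-pointers to the start.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(back[t][path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical two-state model of DNA composition (illustrative numbers).
STATES = ("GC", "AT")
START = {"GC": 0.5, "AT": 0.5}
TRANS = {"GC": {"GC": 0.9, "AT": 0.1}, "AT": {"GC": 0.1, "AT": 0.9}}
EMIT = {
    "GC": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},
    "AT": {"A": 0.4, "C": 0.1, "G": 0.1, "T": 0.4},
}

path, log_prob = viterbi("GCGCGCATATAT", STATES, START, TRANS, EMIT)
print(path)
```

With these parameters, decoding `"GCGCGCATATAT"` labels the first six positions `"GC"` and the last six `"AT"`: a single state switch is cheaper than the repeated emission penalties of staying in the wrong state.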

Decision Trees

Decision trees are probably among the most widely used machine-learning techniques. They have been applied in a huge variety of settings in commerce and academia, ranging from the sciences and engineering to financial, commercial, and risk-based applications. As with many techniques, the success of the decision tree approach is due partly to its simplicity and efficiency: decision trees can be applied to very large data sets that some other algorithms cannot process. This matters in bioinformatics, where data such as gene expression data often contain a vast number of variables (genes), so algorithms must remain efficient at this scale. For example, decision trees can be used to classify cancers from diagnostic data.
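The text does not name a particular tree-building algorithm. The following is a minimal sketch of one common approach: choosing each split by information gain, as in ID3 and C4.5, with numeric thresholds on each variable. The function names and the two-gene expression data with “tumour”/“normal” labels are hypothetical, invented purely for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def best_split(rows, labels):
    """Return (gain, feature, threshold) for the split with the highest information gain."""
    base = entropy(labels)
    best = None
    for f in range(len(rows[0])):
        for threshold in sorted({row[f] for row in rows}):
            left = [y for row, y in zip(rows, labels) if row[f] <= threshold]
            right = [y for row, y in zip(rows, labels) if row[f] > threshold]
            if not left or not right:
                continue  # a split must send samples both ways
            remainder = (len(left) * entropy(left) + len(right) * entropy(right)) / len(labels)
            gain = base - remainder
            if best is None or gain > best[0]:
                best = (gain, f, threshold)
    return best


def build_tree(rows, labels, max_depth=3):
    """Recursively grow a tree; internal nodes are dicts, leaves are class labels."""
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    split = best_split(rows, labels)
    if split is None or split[0] <= 0:
        return majority
    _, f, threshold = split
    left = [(r, y) for r, y in zip(rows, labels) if r[f] <= threshold]
    right = [(r, y) for r, y in zip(rows, labels) if r[f] > threshold]
    return {
        "feature": f,
        "threshold": threshold,
        "left": build_tree([r for r, _ in left], [y for _, y in left], max_depth - 1),
        "right": build_tree([r for r, _ in right], [y for _, y in right], max_depth - 1),
    }


def classify(tree, row):
    """Follow the splits from the root until a leaf label is reached."""
    while isinstance(tree, dict):
        tree = tree["left"] if row[tree["feature"]] <= tree["threshold"] else tree["right"]
    return tree


# Hypothetical expression levels of two genes for six samples.
rows = [[2.1, 0.5], [1.9, 1.2], [0.3, 0.8], [0.4, 1.5], [2.5, 0.9], [0.2, 0.4]]
labels = ["tumour", "tumour", "normal", "normal", "tumour", "normal"]
tree = build_tree(rows, labels)
print(tree)
```

On this toy data the tree splits once, on gene 0 at a threshold of 0.4, because that split separates the two classes perfectly (an information gain of 1 bit). Libraries such as scikit-learn provide optimized implementations of the same idea.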

Neural Networks

Neural networks were originally conceived as computational models of the way in which the human brain works. Like the brain, they consist of many units connected to each other by links of variable strength (analogous to the synapses that connect neurons). The attraction of neural networks is that they can “learn” relationships between sets of variables taken from a system. Once trained, a network can be shown new examples and asked to predict their outcome based on the examples it has already learnt. Neural networks can also be trained in two ways: by supervised learning, which is typically used when the required output is known, and by unsupervised learning, which is used when this is not possible or desirable. An example of the use of neural networks in bioinformatics can be seen in the paper “Single-layer artificial neural networks for gene expression analysis”.
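The supervised case can be illustrated with the simplest single-layer network, a perceptron trained with the classic perceptron learning rule. This is a minimal sketch, not the method of the paper cited above: the function names, the learning rate, and the two-gene expression values and class labels are all hypothetical.

```python
def train_perceptron(samples, labels, epochs=100, rate=1.0):
    """Supervised training of a single-layer perceptron.

    samples: list of feature vectors (e.g. expression levels of a few genes)
    labels:  list of +1 / -1 class labels (the known required outputs)
    """
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        errors = 0
        for x, y in zip(samples, labels):
            activation = sum(w * xi for w, xi in zip(weights, x)) + bias
            prediction = 1 if activation >= 0 else -1
            if prediction != y:
                # Shift the decision boundary towards the misclassified sample.
                weights = [w + rate * y * xi for w, xi in zip(weights, x)]
                bias += rate * y
                errors += 1
        if errors == 0:  # every training sample is now classified correctly
            break
    return weights, bias


def perceptron_predict(weights, bias, x):
    """Classify a new sample with the trained weights."""
    return 1 if sum(w * xi for w, xi in zip(weights, x)) + bias >= 0 else -1


# Hypothetical expression levels of two genes; +1 and -1 are two sample classes.
samples = [[3, 1], [4, 2], [5, 1], [1, 3], [2, 4], [1, 5]]
labels = [1, 1, 1, -1, -1, -1]
weights, bias = train_perceptron(samples, labels)
print(weights, bias)
print(perceptron_predict(weights, bias, [6, 2]))
```

On this hypothetical data the weights settle at `[2.0, -2.0]` with a bias of `0.0` on the third pass, so the model predicts class +1 whenever gene 0 is more highly expressed than gene 1. The perceptron rule is only guaranteed to converge when the classes are linearly separable; otherwise training simply stops after `epochs` passes.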


This text shows that artificial intelligence techniques can be applied to many fields, and that in these fields it is often convenient to use better search techniques, such as decision trees, or learning techniques, such as neural networks. Here I wanted to show the use of artificial intelligence in bioinformatics, because I think it is a very interesting topic. It should be emphasized that the purpose of the text was to show that these techniques can be used in other fields, not to show in detail how they are used. The text shows that techniques such as Markov chains and neural networks are used in bioinformatics, but nothing more. Readers interested in how to apply these methods should consult the bibliography or explore the vast resources available on the internet.

Unless otherwise stated, the content of this page is licensed under Creative Commons Attribution-ShareAlike 3.0 License