Home Life depends on Information Technology

 

 

This image illustrates four things:

  • The alphabet of the DNA code: ATCG
  • The ingenious splitting of one DNA helix into two identical double strands, before the host cell splits into two
  • The wet separation technique used for identification of the DNA code stream
  • Nitrogen isotopes used to mark the new half-strands of new generations of DNA

Image derived from DNA From the Beginning 

The key to Life is DNA, although DNA is not the only element of Life. DNA stores the software running the life processes of an organism. The hosting hardware is a biochemical process. Actually, for an organism to be autonomous, its life process must run in a bounded host processor. For more than 3/4 of the history of Life, bounded meant single-cellular. Although multi-cellular organisms have been very successful during the last billion years, single-cellular organisms appear in enormous numbers, even if counted in terms of species.

Life processes are controlled by the software package and data streams residing in the DNA of the organism. This software package of DNA-encoded information is called the genome. It constitutes the digital "product definition" for the organism. DNA is a data medium that has been maintained and has evolved in a continuous biochemical process through at least 3 800 000 000 years. The evolution path for major variants of organisms constitutes the Tree of Life, in the sense of a binary tree of species, most of which are now extinct. 

Although biologists follow the Tree of Life backwards in time, they have not yet reached the root. Neither has anyone explained what seed made the Tree of Life grow. This situation is no more disturbing than the fact that we do not know what caused the Big Bang. For the time being, there is no reason to believe that the Tree of Life was planted on Earth. Our solar system was formed from dust and gases about 4 500 000 000 years ago. During the first billion years, there was heavy bombardment by rocks and comets, some the size of our Moon. Water may have been moved between Mars and Earth several times. Probably the first organisms were violently sifted in a process that was already Darwinian. If there ever was a branch of Life with left-handed DNA, it was obviously extinguished during that process. On this time scale, we humans and our payload of bacteria are close relatives, sharing not only the basic architecture of DNA but also the basic process architecture. Organisms living 1000 metres down in the bedrock are specified in right-handed DNA but have very different metabolism.

The digital encoding of DNA is both amazingly simple and enormously complex. Each chromosome and each DNA molecule is effectively digital information, much like a data diskette. The multiplication of DNA is a chemically implemented digital copying process. Thus, Life is carried on by a digital information medium. This has been true for the last 3 500 000 000 years, while multi-cellular organisms have been around "only" for 1 000 000 000 years. There are single-cell organisms, the "product model" of which has been almost unchanged during the last 2 000 000 000 years, and there are species as young as 100 000 years. Modern Man is definitely at the recent end. The species Homo sapiens is about 200 000 years old.

DNA is a highly stable medium for data storage, although radiation and chemical agents may cause mutations. Actually, once Earth was shielded by an atmosphere, Nature had to invent new mechanisms to avoid stagnation, balancing the perfection of the digital product definition and the digital copying of DNA strands. In all respects, the sexual recombination of DNA seems to be the most interesting mechanism for variation. All mammals, and most other "higher" organisms like plants and animals, inherit half their genome from the mother and half from the father. (Mitochondria are embedded organelles - energy converters - inherited only via the egg cell. The chlorophyll organelles of plants, the chloroplasts, are also generally inherited from the mother. Both kinds of organelles contain a small amount of DNA, which is highly stable over generations.)

The geological processes on Earth have called for adaptation of species to new environments. Even more, competition between species has called for reasonably fast evolution of new variants and new solutions. Fast means very different speeds for different organisms. The speed of evolution is better expressed in generations needed than in years needed. A new species can typically develop in 1000 generations, faster in a restricted area with a small population, or under heavy pressure for change.

If we exclude the most primitive viruses, all biological organisms carry their own "product definition" in the DNA of each cell of their body. Of course, a great number of organisms have single-cell bodies, but the previous statement is still true, from bacteria to the largest whales. Among other things, this distributed "product definition" accounts for the self-repair abilities of biological organisms. Artificial products generally have no ability for self-repair, although the product definition may be completely digital.

In one human body, the number of cells is about 100 000 000 000 000 (10^14), roughly 16 000 for every other person on Earth. Each cell carries a set of data volumes called chromosomes. A chromosome is logically much like a diskette, but the average data volume is 15 Mbytes. In comparison with DNA, artificial data media are incredibly bulky. This is true even when DNA is compared with media like CD-ROM and DVD. In its 46 chromosomes, each of the 10^14 cells in a human body carries a data volume of more than 700 Mbytes, that is, more than a CD holds. Thus, one person carries more digital data than all the disk farms and all the hard disks of all computers in the world.
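The figures in the paragraph above can be checked with a few lines of arithmetic. The sketch below is only a back-of-the-envelope calculation: it assumes 2 bits per DNA base and 1 Mbyte = 10^6 bytes (neither is stated in the text), and the function names are made up for illustration.

```python
# Back-of-the-envelope check of the storage figures quoted in the text.
# Assumptions (not from the text): 2 bits per base, 1 Mbyte = 10**6 bytes.

CELLS_PER_BODY = 10**14        # from the text
GENOME_MBYTES_PER_CELL = 700   # from the text ("more than 700 Mbytes")
CHROMOSOMES_PER_CELL = 46      # from the text


def mbytes_per_chromosome(genome_mbytes=GENOME_MBYTES_PER_CELL,
                          chromosomes=CHROMOSOMES_PER_CELL):
    """Average data volume of one chromosome, in Mbytes."""
    return genome_mbytes / chromosomes


def bases_for_mbytes(mbytes, bits_per_base=2):
    """Number of DNA bases that fit in the given number of Mbytes."""
    return mbytes * 10**6 * 8 // bits_per_base


def total_bytes_per_body(cells=CELLS_PER_BODY,
                         genome_mbytes=GENOME_MBYTES_PER_CELL):
    """Total bytes carried by all cells, counting every copy separately."""
    return cells * genome_mbytes * 10**6


if __name__ == "__main__":
    print(f"{mbytes_per_chromosome():.1f} Mbytes per chromosome on average")
    print(f"{bases_for_mbytes(GENOME_MBYTES_PER_CELL):,} bases per cell")
    print(f"{total_bytes_per_body():.1e} bytes per body")
```

Note that the body total counts the same genome once per cell, so it measures carried copies rather than distinct information.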

In each chromosome, data is stored in a strand of DNA, a very large linear molecule. Linearly addressable, that is - the DNA helix is physically coiled to reduce its length by a factor of 10 000. Much like data on a rotating computer medium, the coiled but linearly addressable sequence of DNA is structured into logical modules. On a computer medium, modules would be called directories and files. Sequential files would be structured into segments by unique stop and start marks. A computer data medium includes unused areas - never used or no longer used. On the biological DNA medium, the effective areas are called genes. Apparently the DNA medium also contains unused areas - never used or no longer used.

DNA is coded in base 4 with states ACGT equivalent to 0-3. For genes that control the synthesis of proteins, 20 different amino acids are identified with a triplet of base 4 numbers, equivalent to numbers 00-63, much like quartets of bits are used to store numbers 0-9 in binary coded decimal (BCD). Since 64 triplets are available for only 20 amino acids, most amino acids are specified by more than one triplet. Correspondingly, a few triplets are used as markers such as stop signals, much like BCD codes 10-15 are used for signs etc.
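The base 4 reading of a triplet can be sketched in a few lines. The digit assignment A=0, C=1, G=2, T=3 follows the ACGT order given above; the function names are illustrative only, and the three stop triplets listed are those of the standard genetic code, which the text does not spell out.

```python
# Treat a DNA triplet (codon) as a three-digit base 4 number, 0..63.
BASE_VALUE = {"A": 0, "C": 1, "G": 2, "T": 3}  # ACGT equivalent to 0-3
BASES = "ACGT"

# Stop signals of the standard genetic code (not listed in the text).
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codon_to_number(codon):
    """Convert a triplet such as 'ATG' to its base 4 value."""
    if len(codon) != 3:
        raise ValueError("a codon has exactly three bases")
    value = 0
    for base in codon.upper():
        value = value * 4 + BASE_VALUE[base]
    return value


def number_to_codon(number):
    """Convert a value 0..63 back to its triplet."""
    if not 0 <= number < 64:
        raise ValueError("codon numbers run from 0 to 63")
    digits = []
    for _ in range(3):
        number, remainder = divmod(number, 4)
        digits.append(BASES[remainder])
    return "".join(reversed(digits))


if __name__ == "__main__":
    for codon in ("AAA", "ATG", "TTT", *sorted(STOP_CODONS)):
        print(codon, codon_to_number(codon))
```

With this assignment, AAA maps to 0 and TTT to 63, so the 64 possible triplets exactly fill the range 00-63 mentioned above.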

The complete information carried by the DNA of an individual is called the genome of that individual. The genome nominally defines an individual organism, but the actual outcome is also determined by the [quality of the] processes developing that individual. The size of the human genome is not the largest known, although there is of course a rough correlation between the volume of the genome and the complexity of an organism. Some plants have many more chromosomes than the 46 chromosomes (23 pairs) defining a person.

The DNA molecule is a double helix, with two complementary half-strands. This is essential for the robustness of the copying process, when one cell divides into two. One half-strand is passed on to each of the two new cells. Each new cell gets a newly built complementary strand as controlled by the inherited half-strand.
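The copying scheme described above can be illustrated with a minimal sketch. It is a deliberate simplification: both half-strands are written in the same left-to-right order and paired position by position (A with T, C with G), ignoring the opposite chemical direction of the two strands. The function names are invented for this example.

```python
# Pairing rule between the two half-strands of the double helix.
COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complement(strand):
    """Build the complementary half-strand, base by base."""
    return "".join(COMPLEMENT[base] for base in strand)


def replicate(double_strand):
    """Split a double strand and rebuild the missing half for each copy.

    Each daughter keeps one inherited half-strand and receives one
    newly built complementary half-strand.
    """
    old_a, old_b = double_strand
    daughter_1 = (old_a, complement(old_a))   # new second half
    daughter_2 = (complement(old_b), old_b)   # new first half
    return daughter_1, daughter_2


if __name__ == "__main__":
    helix = ("ATCG", "TAGC")
    first, second = replicate(helix)
    print(first, second)
```

As long as the pairing rule is followed without error, both daughters are identical to the parent double strand.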

The stability of the digital copying process is extremely high. Still, one can wonder how 10^14 good copies can be made. The reason is that the number of generations needed is much lower than that figure. With ideal exponential growth of identical cells, fewer than 50 generations would create 10^14 cells. With specialized organs, the growth is slower, and cells are replaced during the 100 years a person survives. So, more realistically, the number of copy generations counts in hundreds and thousands. Still, it is not altogether surprising that the quality of the later copies is less secure.
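The generation count for ideal growth can be verified directly. The sketch below assumes perfect doubling from a single cell, with no cell death or replacement; the function name is illustrative.

```python
def doublings_needed(target_cells):
    """Count the doublings required to grow from 1 cell to the target."""
    generations = 0
    cells = 1
    while cells < target_cells:
        cells *= 2
        generations += 1
    return generations


if __name__ == "__main__":
    print(doublings_needed(10**14))
```

Under this idealized assumption the answer is 47 generations, since 2^46 is still below 10^14 while 2^47 exceeds it.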

The present technology for identification of DNA sequences is actually a modulation-demodulation process, with chemical and optical conversion in a wet digital-analog-digital "modem". The speed of this "modem" is below 10 bits per minute, among other reasons because of the wet separation in a gel. We will certainly see development in the art of decoding DNA. Remember that for transmission of computer data, the modem speed using a regular phone line has increased by a factor of 200 over 30 years.
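To put these speeds in perspective, the following sketch works out two numbers under loudly stated assumptions: a single "modem" channel running at the full 10 bits per minute, a 700 Mbyte genome with 1 Mbyte = 10^6 bytes, and steady exponential growth for the phone-line modem. The function names are hypothetical and the results are rough orders of magnitude only.

```python
import math

MINUTES_PER_YEAR = 60 * 24 * 365


def years_to_read(mbytes, bits_per_minute=10):
    """Years needed to read the given data volume over one channel."""
    total_bits = mbytes * 10**6 * 8
    return total_bits / bits_per_minute / MINUTES_PER_YEAR


def doubling_time_years(growth_factor, period_years):
    """Doubling time implied by an overall growth factor over a period."""
    return period_years / math.log2(growth_factor)


if __name__ == "__main__":
    print(f"{years_to_read(700):.0f} years to read 700 Mbytes")
    print(f"{doubling_time_years(200, 30):.1f} years per doubling")
```

At that rate a single channel would need on the order of a thousand years for one genome, while a factor of 200 over 30 years corresponds to a doubling of speed roughly every four years.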

Finally, it must be stated that although the encoding of DNA is essentially understood, we do not understand how the first DNA with replication ability came about. Also, we do not fully understand the "bootstrap" process that starts the growth of a multi-cellular organism from the first cell. There are indications, however, that the principles of nested processes resemble those of the operating system of a computer. Cloning of a new animal from a "parent" stem cell requires a "master reset" much like the hardware reset on computers. The biological reset is done by an enzyme.


For further basic information about genetics and DNA, please refer also to the site DNA From the Beginning.