Distributed public domain databases (DPDD) of biological information on Internet:
An Introduction of a Color Image Database for Japanese Ants

2. Life sciences and computer networks

In the life sciences, where inductive reasoning is used, a hypothesis gains acceptance only when many examples of a given phenomenon (research observations) support it. For example, even an apparently simple hypothesis, such as the theory that DNA is divided into genes and operons, does not acquire scientific significance by itself. It becomes accepted only when individual examples (observations related to variations in a given species) to which the hypothesis (a proposition, or law of cause and effect) applies are known [1].
For this reason, the amount of information that researchers need to understand tends to increase as research in the life sciences advances. If this volume continues to grow, it will eventually exceed what any one individual can handle. Fortunately or unfortunately, the number of biological researchers was not large in the past, and the volume of biological information that accumulated was not large enough to cause serious problems.
In the current age, the so-called "bio-age", however, researchers produce vast amounts of information daily, and this is gradually becoming a problem. In recent years the flow of information has grown beyond what even specialists in a particular field can adequately handle. Ayala et al. [2] predicted that it would be difficult to write a single comprehensive textbook dealing with all aspects of evolution, and Watson et al. [3] explained that the fourth edition of Molecular Biology of the Gene was written by several authors, rather than by a single author as in previous editions, because the volume of information it needed to cover had become too large for one author to handle.
In the past, researchers were required to carry out experiments or make observations, collect data, summarize the data, and publish it in the form of scientific papers. In recent years, however, these are no longer the only requirements. It is now also essential for researchers to make efficient and effective use of the vast amounts of research information (not only papers but also the underlying raw data) that are being accumulated. To this end, it is important to prepare databases that make this information more manageable. Databases will become a key factor in the future progress of the life sciences.

Are computer networks the researchers' helping hand?
In parallel with the changes mentioned above, computer technology has advanced remarkably in recent years. Advances are particularly marked in information exchange via computer networks such as the Internet [4]. The Internet was initially designed for national security, but later developed into a network for general scientific research (URL:gopher://akasha.tic.com:70/11/matrix/growth/internet). The number of Internet users has increased sharply in recent years, and researchers in various fields have begun to use the Internet in a number of different ways.
The services available on the Internet include electronic mail, news groups, mailing lists, newsletters, electronic journals and electronic libraries. These services can be divided into two types. The first type basically supplements or reinforces existing media (e.g., letters, telephone, journals). Among services of this type, electronic journals and the exchange of preprints are expected not only to supplement existing media but eventually to replace such things as printed academic journals [5].
The second type consists of services that existing media cannot provide. This type includes wide-area databases (WAD), which disseminate research information on computer networks so that many people can share it. WAD are characterized by their broad coverage: they include not only information that existing media (journals) can handle, but also other kinds of research information, such as detailed experimental or observational data that are difficult to report in papers, and other information that existing media do not deal with at all. The aim of WAD is to make all of this information available as databases on computer networks.
The trend toward building WAD described above is particularly marked in the life sciences, where the accumulation of information is indispensable. A well-known example is the genetic databases, which collect and disseminate information about the basic elements of life (e.g., the base sequences of DNA and the amino acid sequences of proteins) to help facilitate studies in this field. Because genetic information consists of sequences of letters (e.g., A, C, G and T for DNA bases), it is easy to process on computers, and databases of genetic information are therefore easy to establish. Following recent advances in the data processing ability of computers, however, WAD have also been established for types of information that had previously been difficult to handle electronically (e.g., pictures of chromosomes or other specimens, and pictures of electrophoretic separation of DNA). Several gopher servers that summarize the available WAD are listed below:

1. Summary of gopher sites handling graphic information
Host = gopher.dana.affrc.go.jp
Path = /Other Gopher/ Images from Gopher
2. Summary of gopher sites for research materials
Host = ftp.bio.indiana.edu
Path = /Species
3. Summary of database search services
Host = merlot.welch.jhu.edu
Path = /Database-Searches
4. Summary of electronic journals
Host = gopher.cic.net
Path = /e-serials
5. Summary of network news groups related to biology
Host = net.bio.net
Path = /
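The servers listed above are reached with the Gopher protocol (RFC 1436): a client opens a TCP connection (port 70 by default), sends a selector string followed by CR LF, and receives either a document or a menu in which each line holds an item type character, a display title, a selector, a host and a port, separated by tabs, with a line containing a single "." marking the end. The following is a minimal sketch of that exchange in Python, offered only as an illustration. The function and class names are hypothetical, the sample menu entries use placeholder hosts, and the "Path" values in the list above may describe menu navigation rather than the literal selector strings a server expects. The hosts listed above may also no longer be reachable.

```python
import socket
from dataclasses import dataclass
from typing import List


@dataclass
class MenuItem:
    """One entry of a Gopher menu (RFC 1436)."""
    item_type: str  # one character, e.g. '0' text file, '1' menu, 'I' image
    display: str    # human-readable title shown to the user
    selector: str   # string the client sends back to fetch this item
    host: str
    port: int


def build_request(selector: str) -> bytes:
    """A Gopher request is simply the selector followed by CR LF."""
    return selector.encode("ascii") + b"\r\n"


def fetch(host: str, selector: str, port: int = 70, timeout: float = 10.0) -> str:
    """Send one request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(build_request(selector))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break  # Gopher servers close the connection after replying
            chunks.append(data)
    return b"".join(chunks).decode("latin-1")


def parse_menu(text: str) -> List[MenuItem]:
    """Parse a menu reply of lines <type><display>TAB<selector>TAB<host>TAB<port>."""
    items = []
    for line in text.splitlines():
        if line == ".":
            break  # end-of-menu marker
        if not line:
            continue
        fields = line[1:].split("\t")
        if len(fields) < 4 or not fields[3].strip().isdigit():
            continue  # skip malformed lines instead of failing
        display, selector, host, port = fields[:4]
        items.append(MenuItem(line[0], display, selector, host, int(port)))
    return items
```

For example, `parse_menu(fetch("gopher.example.org", ""))` would request the root menu of a (hypothetical) server and return its entries; items of type `1` are submenus that can be fetched in turn with their own selector and host, which is how a client walks down a path such as those listed above.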

Japanese Journal of Computer Science Vol.2, No.1: pp.5-13
Copyright 1995 by The Myrmecological Society of Japan (for English version) and The Japanese Association of Computer Science (for Japanese version).