Introduction
Information systems
Data Base Management Systems (DBMS) and models
Databases, descriptors and descriptor states
Standards for data preparation
Documentation: PGR national efforts
Future considerations
Summary
References
Appendix I. List of catalogues published by NBPGR
Plant genetic resources provide base material to plant breeders for the development of new and superior crop varieties. During the last three decades, there has been a growing awareness of the need to collect and conserve these fast depleting, irreplaceable resources for the good of present and future generations. At the same time, it has been accepted that the success of all genetic resources activities depends upon the descriptive information about the conserved material, which enables plant breeders to decide which material to use in breeding programmes. This dependence on information grows exponentially with the size of the collections.
Plant genetic resources management broadly involves five stages, viz. exploration and collection, characterization and evaluation, conservation, exchange and utilisation, and documentation. It is also concerned, directly or indirectly, with plant quarantine. At each stage, information about plant material is used for communication and decision making. It is estimated that scientists and technicians spend at least 30 percent of their time handling the data generated at the various stages (Rogers et al., 1975). Documentation is, therefore, one of the most critical functions concerned with genetic resources.
In recent years, documentation has come to be known, more appropriately, as an information system. An information system is much more than simply documenting information: it has to be dynamic, vital and flexible, ensure the reliability and integrity of the data, and provide effective methods for handling them. In India, in view of the size of the country and the existing agricultural research system, there is a need to develop an information system based on a network approach that includes all the institutions throughout the country involved in germplasm collection, conservation and utilisation. This chapter discusses information handling in general, the efforts made by the NBPGR, and its future plans in this endeavour.
Hersh and Rogers (1975) discussed the information requirements for genetic resources applications from the point of view of a systems approach, in which there is a dynamic dialogue between the design of the system under consideration and the user. They indicated how to establish such an approach to documentation for a global network of genetic resources centres. Some progress was made towards this aim during the 1970s, when TAXIR (Taxonomic Information Retrieval), a general purpose, computer assisted information system, was developed at the Taximetrics Laboratory of the University of Colorado, USA. Later, the EXIR (Executive Information Retrieval) system was developed at the same University to meet the specific needs of scientists involved in the data management of genebanks. These systems were used in only a few genebanks in various places (Izmir, Turkey; Bari, Italy) because of difficulties in portability, especially the large computers required to run the programme, and have since been overtaken by fully supported commercial Database Management Systems (DBMS) available for a wide range of computers. The fast development of small, low priced personal computers (PCs) has supported or even enabled this development and revolutionised information handling. Nowadays, it has become common practice to use computers in the management of information in genebanks. Because of the wide range of options in software and hardware technology, it is no longer necessary for everybody to use the same database management system to handle genebank data, as was the case some 15 years ago.
At present, a good number of genebanks are operating around the world. Some of them have developed their own information systems to fit their requirements, based on the computer systems available to them and on tailored Database Management System (DBMS) software used for other purposes. One of the front runners in the management of genetic resources data is certainly the Nordic Genebank at Weibullsholm Plant Breeding Institute in Sweden. The GRIN (Germplasm Resources Information Network) system developed in the USA is quite capable of monitoring information on the world's largest collection at the National Seed Storage Laboratory (NSSL), Fort Collins, and at the cooperating institutions within the USDA research system. Similarly, IRRI, Philippines; CIMMYT, Mexico; and several other International Agricultural Research Institutes have developed their own information systems for handling the germplasm of their respective mandate crops. The same holds for several national plant genetic resources programmes in addition to the ones already mentioned.
Yndgaard (1982) has suggested the following guidelines for the development of a genebank information system:
1. Ease of data input (registration) into a storage medium. In addition to this, the system should be simple and user friendly, thereby permitting non-computer scientists to work with it. The system should also be economical and adaptable to the organisational environment.
2. Data validation during input phase.
3. Flexible data storage and retrieval procedures.
4. Availability of data for multiple analysis and use.
5. Exchange of information with other genebanks.
6. Basing the system and its terminology on genetic and biological principles.
One major factor affecting genetic resources information handling and exchange is the voluminous data associated with germplasm collections. For instance, the National Plant Germplasm System (NPGS) in the USA maintains over 400,000 accessions of germplasm in the form of seed and vegetatively propagated material. These accessions are primarily landraces and unimproved material from foreign sources. New accessions are added to the NPGS at the rate of 7,000 to 15,000 per year. IRRI, in the Philippines, holds over 86,000 samples of rice. Data recorded on over 75 traits alone generate around 6.4 million pieces of information to manage (roughly 86,000 samples × 75 traits). The immense size of germplasm holdings thus becomes a challenge for information management. Modern computers, with their ability to store vast amounts of information and to retrieve it with great speed and accuracy, have simplified the whole concept of database management. A DBMS is a software package that handles the difficult tasks associated with creating, accessing and maintaining database records. The programmes in a DBMS package establish an interface between the database itself and its users.
Two basic types of database management systems can be identified, namely, hierarchical and relational. A hierarchical system tends to be extremely complex because of superior-subordinate type of relationship between data elements in a hierarchical (tree) structure. In comparison to hierarchical structuring technique, the relational technique is much simpler. Data are represented in the form of two dimensional tables and the relationships can be established between these tables. Information contained in any two or more separate files (for example, one for the passport data and the other for the evaluation data) can be related or linked if there are common fields or descriptors in these tables or files. For instance, a unique identifier for each and every accession in germplasm collection e.g. the accession number, is such a common field. As a genebank preserves and provides genetic material for a multitude of purposes, the relational DBMS will be more appropriate in a genetic resources environment as it easily permits the establishment of linkages between different files and change of relationships at any moment.
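The linkage described above can be sketched with a small relational example. The following is only an illustration, written in Python with its built-in sqlite3 module rather than in any of the packages discussed in this chapter; the table names, column names, accession numbers and values are all hypothetical. It shows a passport table and an evaluation table related through the common accession number field.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with two tables linked by accession number."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE passport (
            accession_no      TEXT PRIMARY KEY,  -- unique identifier (common field)
            crop              TEXT NOT NULL,
            country_of_origin TEXT
        );
        CREATE TABLE evaluation (
            accession_no TEXT NOT NULL REFERENCES passport(accession_no),
            descriptor   TEXT NOT NULL,
            state        TEXT NOT NULL
        );
        """
    )
    # Hypothetical records, for illustration only.
    conn.executemany(
        "INSERT INTO passport VALUES (?, ?, ?)",
        [("ACC-001", "Guar", "India"), ("ACC-002", "Okra", "India")],
    )
    conn.executemany(
        "INSERT INTO evaluation VALUES (?, ?, ?)",
        [
            ("ACC-001", "plant_height_cm", "95"),
            ("ACC-001", "days_to_flowering", "42"),
            ("ACC-002", "plant_height_cm", "120"),
        ],
    )
    conn.commit()
    return conn


def evaluation_with_passport(conn: sqlite3.Connection, descriptor: str):
    """Join evaluation data to passport data through the accession number."""
    cursor = conn.execute(
        """
        SELECT p.accession_no, p.crop, p.country_of_origin, e.state
        FROM passport AS p
        JOIN evaluation AS e ON e.accession_no = p.accession_no
        WHERE e.descriptor = ?
        ORDER BY p.accession_no
        """,
        (descriptor,),
    )
    return cursor.fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in evaluation_with_passport(db, "plant_height_cm"):
        print(row)
```

Because the two tables share only the accession number, a further file (for example, one for conservation management data) could be linked in the same way without restructuring the existing ones.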
There are probably a dozen DBMS packages (dBASE III PLUS, dBASE IV, FOXBASE, FOCUS, ORACLE, UNIFY, INGRES, SYBASE, QBE, QUEL, RELATE/3000, etc.) currently on the market. Most of these packages have similar basic capabilities, but some have additional features that make the software more attractive. However, a relational database management system could be selected that is machine independent, globally accepted and easily maintained by data processing staff, has a good performance record, and has security measures for protecting data against hardware failure or abuse by unauthorized personnel. dBASE III PLUS or dBASE IV is an appropriate choice for small to medium sized databases and is used widely throughout the world. The Oracle DBMS is certainly a powerful package in relational technology and will be an appropriate choice for handling large databases.
Information types
A database is a combination of related records, and a record is a combination of fields or what we call 'descriptors' in genetic resources environment. The 'descriptor' is now widely accepted as the computer term for the character of a plant as well as for other units of information such as country of origin or the date of collecting. The 'descriptor state' is then the quantity or the quality of the plant character, or any country name or abbreviations or the actual day, month and year of collecting. These descriptors can be further distinguished from the point of view of use and content.
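The relationship between database, record, descriptor and descriptor state can be pictured with a minimal sketch. The class names, descriptor names and values below are hypothetical and serve only to illustrate the terminology; they do not correspond to any particular genebank system.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass(frozen=True)
class Descriptor:
    """A field of a record, e.g. a plant character or the country of origin."""
    name: str
    definition: str


@dataclass
class Record:
    """One accession: a set of descriptor states keyed by descriptor name."""
    accession_no: str
    states: Dict[str, str] = field(default_factory=dict)

    def set_state(self, descriptor: Descriptor, state: str) -> None:
        """Store the descriptor state observed for this accession."""
        self.states[descriptor.name] = state

    def get_state(self, descriptor: Descriptor) -> Optional[str]:
        """Return the recorded state, or None if nothing was recorded."""
        return self.states.get(descriptor.name)


# Hypothetical descriptors (names and definitions are illustrative only).
COUNTRY = Descriptor("country_of_origin", "Country where the sample was collected")
COLLECTING_DATE = Descriptor("collecting_date", "Day, month and year of collecting")
SEED_COLOUR = Descriptor("seed_colour", "Colour of the mature seed")

# A database is then a collection of related records.
record = Record("ACC-001")
record.set_state(COUNTRY, "India")
record.set_state(COLLECTING_DATE, "1986-10-15")
database: List[Record] = [record]
```

In this sketch "country_of_origin" is a descriptor and "India" is its descriptor state for accession ACC-001; a descriptor with no recorded state, such as the seed colour here, simply returns nothing.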
A genetic resources centre generally handles germplasm samples with all information associated with it and this information can be broadly classified into four major categories depending upon their use.
Passport data
This category includes information on the site where the sample has been collected, including ecological and habitat data, altitude and climate, etc.
Characterization, preliminary and further evaluation data
This category includes morphological and evaluation data on various collections, such as the extent of the variability observed in the field, agro-botanical and economic characters, quality traits, and reaction to various diseases and pests, etc.
Conservation management data
This includes details of each sample stored in the genebank, quantity, its placement in the genebank, germination and viability percentage when stored, period of storage, to whom the parts of the samples were supplied in the past, rejuvenation date and next probable date for further replenishment of seed stocks.
Exchange data
This includes information related to import and export of germplasm for inventory control.
In order to make an information system meaningful and more generally applicable, the data need to be standardized in terms of terminology and measurement. All those involved in plant genetic resources work have recognised the need for an internationally accepted system to record, classify, communicate, correct or update information about accessions. In order to facilitate the description and evaluation of accessions and to improve information and communication among scientists, there has been a continuing effort in several countries, especially Germany, to provide documentation standards acceptable to all interested individuals. The Thesaurus of terms provides an exact (or nearly exact) equivalent for each term in some of the major languages, i.e., German, French, Spanish and Russian. By using these terms, workers on various crops and in various parts of the world can accurately describe their collections and the results of their observations and tests, and be certain that the terminology will be understood by others.
The International Board for Plant Genetic Resources (IBPGR) has recognised the need for determining standards for data recording. Data standardization has made the communication of information easier and more effective. The IBPGR crop advisory committees have been developing minimum descriptor lists accompanied by standards for measurement techniques, units of measurement, and data recording and encoding methods. A minimum descriptor list defines the minimum amount of information required to describe an accession. A minimum list of management descriptors for genebank management has been suggested as well (Konopka and Hanson, 1985).
The procedures for data preparation need careful consideration, as the future universal use of existing data management systems will depend upon them. Each descriptor must have a clear definition to facilitate the meaningful exchange of information among cooperating scientists. Before the data are recorded, a code dictionary should be prepared giving the detailed information needed to interpret the coded data. Automatic data validation during the input phase helps to ensure the validity of the data, both in terms of the permissible limits for each data item and in terms of whether the item is of an alphabetic or numeric type.
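Validation against a code dictionary during input might look like the sketch below. It is an illustration under stated assumptions, not a description of any existing system: the descriptor names, permissible limits and codes are hypothetical, and the dictionary layout is only one of several possible designs.

```python
# Hypothetical code dictionary: one entry per descriptor, giving its type and,
# where relevant, its permissible limits or its list of valid codes.
CODE_DICTIONARY = {
    "germination_pct": {"type": "numeric", "min": 0, "max": 100},
    "flower_colour": {
        "type": "coded",
        "codes": {"1": "white", "2": "yellow", "3": "purple"},
    },
    "collector_name": {"type": "alphabetic"},
}


def validate(descriptor: str, raw_value: str) -> list:
    """Return a list of error messages; an empty list means the value is accepted."""
    rule = CODE_DICTIONARY.get(descriptor)
    if rule is None:
        return [f"unknown descriptor: {descriptor}"]

    value = raw_value.strip()
    if rule["type"] == "numeric":
        try:
            number = float(value)
        except ValueError:
            return [f"{descriptor}: '{value}' is not numeric"]
        # Check the permissible limits for this data item.
        if not rule["min"] <= number <= rule["max"]:
            return [
                f"{descriptor}: {value} is outside the permissible limits "
                f"{rule['min']} to {rule['max']}"
            ]
    elif rule["type"] == "alphabetic":
        # Spaces and full stops are tolerated so that initials can be entered.
        if not value.replace(" ", "").replace(".", "").isalpha():
            return [f"{descriptor}: '{value}' is not alphabetic"]
    elif rule["type"] == "coded":
        if value not in rule["codes"]:
            return [f"{descriptor}: '{value}' is not a code in the dictionary"]
    return []


if __name__ == "__main__":
    for name, entered in [("germination_pct", "120"), ("flower_colour", "2")]:
        print(name, entered, validate(name, entered) or "accepted")
```

Keeping the limits and codes in one dictionary, separate from the checking logic, means that the same definitions can be used both to validate input and to interpret the coded data later.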
Plant introduction reporters and crop inventories
Quarantine information, check lists, etc.
Passport information
Herbarium information
Field evaluation and cataloguing
Genebank information
Other published information
Trainings on documentation and information management
Since its inception in 1976, the NBPGR has been catering to the needs of the plant genetic resources community in the country as a national nodal service for all activities related to plant exploration and collection, evaluation, and proper conservation for present and future use (see Chapter 14 for details). In addition, the Bureau has the important function of collecting all available information on genetic diversity and ensuring that it is properly documented and disseminated to all concerned breeders/curators within and outside the country. As part of these activities, the Institute has generated a large amount of information and developed a number of Crop Catalogues, Plant Introduction Reporters, Crop Inventories, Information Bulletins and Scientific Monographs for the benefit of the PGR community.
Since 1980, considerable progress has been made as a follow up to the recommendations of the First National Workshop on Documentation of Plant Genetic Resources convened in that year. The manual system of data recording/processing is being gradually replaced by a computer based system. The Institute initiated a project, the 'Genetic Resources Information Programme (GRIP)', in 1986, aimed at the creation and development of a computer based information system for the efficient management of the national plant genetic wealth. Some of the documentation activities undertaken by the Institute and its future plans in this direction are given below.
The introduction of germplasm from exotic sources for use by breeders in India is one of the regular activities of the Bureau (see Chapter 4 for details). Since 1940, when the first accession was registered in the National Accession Register, NBPGR has introduced over 900,000 samples of germplasm (including trial material). Each accession is given an EC (Exotic Collection) number at the time of its entry, and the other details accompanying the accession, viz. botanical name, original identification number/name, source country and address, recipient name and address, number of samples, and special notes, if any, are recorded in the National Register. The entire information is compiled and a Plant Introduction Reporter (PIR) is published on a quarterly basis for circulation to Indian scientists and cooperating agencies. This compilation is now partially computerised.
Information on the germplasm of different crops introduced into India is compiled from time to time and published as inventories.
All plant introductions, when received, pass through plant quarantine and are assigned an Import Quarantine (IQ) number. Country name, type of material, case number, consignee, quarantine history, etc. are also recorded in the import quarantine register along with information on clearance/rejection, interceptions, salvaging techniques, post-quarantine treatment, etc. These records are maintained manually in the import register. Similar records are also maintained for material under export.
To know beforehand the risks of introducing new pests and pathogens into the country when importing germplasm from exotic sources, check lists are prepared with the help of available literature. A number of such lists have been compiled by the Division of Plant Quarantine, mainly for internal use. The procedure is completely manual.
Site data sheets are used for recording information on a set of passport descriptors, viz. collector name and number, Latin name of the crop, common/cultivar name, provenance data including latitude, longitude and altitude structure, habitat, information on pests and pathogens, soil colour and texture, and other special attributes, if any (see Chapter 3).
Each accession is assigned an IC (Indigenous Collection) number before it is passed on to the Evaluation Division. In previous years, the passport data information was documented manually in the form of Plant Collection Reporter and disseminated to user agencies. Recently, some progress has also been made to computerise such information.
Variability in crop plants, their wild relatives and other economically important species is represented by dried plant specimens and seed samples in the National Herbarium of Cultivated Plants at the Bureau. Current holdings represent 2,200 species covering 950 genera and 180 families. Herbarium information is recorded for a set of descriptors, e.g. collector number and name, botanical name, name of identifier, etc. The development of a computer based herbarium information system is in progress.
The Germplasm Evaluation Division is entrusted with preliminary evaluation and seed increase, characterization, documentation, preparation of catalogues, etc. For further evaluation, a limited number of characters related to agronomic and production traits and to resistance to diseases and pests were selected by different plant breeders in the past. More recently, due emphasis has been given to following the IBPGR descriptor lists as far as possible when recording the data. Such evaluation data are available with various crop based institutes/coordinated projects and agricultural universities. Besides, NBPGR has generated considerable evaluation data on different crops, and the information has been compiled and documented in the form of 48 crop catalogues (Appendix I). Some of these catalogues give the complete listing of evaluation data along with the available passport information, estimates of variability, correlation and regression parameters, frequency distributions for quantitative as well as qualitative traits, and answers to certain simple and complex queries on the database regarding useful traits or combinations of traits.
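The kinds of summaries mentioned for the catalogues (estimates of variability, correlation and regression parameters, and frequency distributions) can be sketched as follows. This is a minimal illustration in plain Python, not the procedure actually used for the catalogues; the function names and all the data values are hypothetical.

```python
import math
from collections import Counter


def variability(values):
    """Range, mean, standard deviation and coefficient of variation of a trait."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return {
        "n": n,
        "min": min(values),
        "max": max(values),
        "mean": mean,
        "sd": sd,
        "cv_pct": 100 * sd / mean,
    }


def correlation_and_regression(x, y):
    """Pearson correlation of x and y, and the regression line of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    slope = sxy / sxx
    return {
        "r": sxy / math.sqrt(sxx * syy),
        "slope": slope,
        "intercept": mean_y - slope * mean_x,
    }


def frequency_distribution(states):
    """Number of accessions showing each state of a qualitative descriptor."""
    return dict(Counter(states))


if __name__ == "__main__":
    # Hypothetical evaluation data for five accessions.
    plant_height_cm = [82.0, 95.0, 101.0, 88.0, 110.0]
    pods_per_plant = [24.0, 31.0, 35.0, 27.0, 40.0]
    seed_colour = ["white", "brown", "white", "black", "white"]

    print(variability(plant_height_cm))
    print(correlation_and_regression(plant_height_cm, pods_per_plant))
    print(frequency_distribution(seed_colour))
```

The standard deviation here uses the n - 1 divisor, which is an assumption about the estimator; a catalogue may equally well report the population form.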
From 1986 onward, the Bureau started computer processing of the evaluation data. A programme called GEIS (Germplasm Evaluation Information System) was developed for handling the information on such data. For better management of the data files, 8 major groups of crops, viz. grain legumes, cereals and pseudo-cereals, oilseeds, millets and minor millets, vegetables, horticultural crops/plants, medicinal and aromatic plants, and miscellaneous crops have been formed. Programmes have been developed in dBASE III PLUS. Recently, 9 crop catalogues, namely, on indigenous and exotic maize, foxtail millet, kodo millet, rice, guar, oats, okra and forage sorghum have been compiled and published. The processing of the data for these catalogues was completely computer based.
NBPGR presently holds over 135,000 accessions of various crops which have been stored in the National Repository for long-term conservation. Data is maintained on some of the important descriptors, viz. crop name, genus and species, identification number, germination percentage, moisture content, month and year of storage, etc. The database is being developed for these management descriptors and the information is monitored periodically. Genebank labels are also printed using the database. Information on samples stored under cryopreservation is also maintained and monitored.
The NBPGR regularly brings out Newsletters, Research Highlights and Annual Reports. These are a source of useful information on all genetic resources activities undertaken and coordinated by the Bureau in association with its regional stations and base centres.
Under the Genetic Resources Information Programme (GRIP) at NBPGR, the Institute has organised four computer appreciation courses in relation to database management of genetic resources since 1986 to train the scientists of this Institute as well as of the identified cooperating centres.
Database management activities are being further strengthened at the NBPGR under the USAID-PGR database management project. The project has been given high priority and sufficient funds have been allocated. It has been planned to equip all the 30 cooperating centres/data generating sites (see Chapter 14) with micro computers and necessary software. A mini computer system with a capacity of supporting more than 80 terminals will be located at the Bureau's new headquarters. The provision for the other infrastructure in terms of staff and other facilities, and training of the staff and the required consultancy has been made in the project. The development of hardware and software facilities at the headquarters and at the cooperating centres has begun and five of these centres have been equipped with Intel 80386 based micro computer systems supporting UNIX operating system. It is expected that the work regarding the creation of requisite data processing facilities will take 2 to 3 years.
As mentioned earlier, the NBPGR plans to develop a PGR National Information System in the country. Since the National Information System has to cater to the needs of crop based institutes, coordinated projects, agricultural universities and other user agencies in India as well as abroad, a thorough system analysis is needed, covering the identification of information inputs and outputs and their generation and standardisation, information flow, security/protection against misuse, hardware and software, manpower and expertise, communication facilities, etc. Developing an on-line information network is a costly affair and requires considerable resources. However, the project can be initiated in a phased manner by first adopting an off-line information network in the country and later changing to on-line operation when all the cooperating institutions have developed their data processing and communication facilities. In India, there is a specific need to develop a coordinated approach to this function to facilitate the smooth flow and effective dissemination of information.
The importance of a proper documentation system for the management of plant genetic resources has been realised for inventory purposes as well as for the utilisation of germplasm in breeding programmes. The chapter briefly describes the earlier systems vis-a-vis present database management systems, descriptors and descriptor states, types of information, and the need for standards for data preparation. The appropriateness of relational structuring techniques in database management programmes has been stressed for the development of information systems in a genetic resources environment, owing to their ease and greater flexibility in establishing linkages among tables/files at any time. Data standardisation must be treated as a prerequisite for the development of an effective information system. The significant achievements from the Indian perspective, especially the efforts made by the NBPGR, have been highlighted. The need to develop a national information system based on a network approach, including all the institutions throughout the country which are involved in germplasm collection, conservation and utilisation, has been emphasized.
Hersh, G.N. and D.J. Rogers. 1975. Documentation and information requirements for genetic resources application, pp. 407-446. In Crop genetic resources for today and tomorrow (Eds., O.H. Frankel and J.G. Hawkes). IBP 2, Cambridge University Press, Cambridge.
Konopka, J. and J. Hanson. 1985. Management data in a genebank, pp. 21-28. In Documentation of genetic resources information handling system for genebank management, IBPGR, Rome.
Rogers, D.J., B. Snoad and L. Seidewitz. 1975. Documentation and information requirements for genetic resources application, pp. 399-405. In Crop genetic resources for today and tomorrow (Eds., O.H. Frankel and J.G. Hawkes). IBP 2, Cambridge University Press, Cambridge.
Yndgaard, F. 1982. A documentation system for Nordic Genebank. IBPGR Plant Genetic Resources Newsletter 49: 34-36.
| Sl. No. | Crop name | Year of publication | No. of accessions | No. of descriptors | Remarks |
|---|---|---|---|---|---|
| 1. | Amaranth (Amaranthus spp.) | 1981 | 400 | 31 | |
| 2. | Barley (Hordeum vulgare) | 1983 | 259 | 35 | |
| 3. | Barley (Hordeum vulgare) | 1983 | 1155 | 27 | CIMMYT material |
| 4. | Barley (Hordeum vulgare) | 1984 | 742 | 15 | CIMMYT material |
| 5. | Barley (Hordeum vulgare) | 1985 | 217 | 15 | CIMMYT/ICARDA material |
| 6. | Barley (Hordeum vulgare) | 1986 | 280 | 8 | CIMMYT material |
| 7. | Cowpea (Vigna unguiculata) | 1981 | 707 | 34 | |
| 8. | Cowpea (Vigna unguiculata) | 1982 | 683 | 24 | |
| 9. | Foxtail Millet (Setaria italica) | 1987 | 736 | 52 | |
| 10. | French bean (Phaseolus vulgaris) | 1981 | 1773 | 16 | |
| 11. | Guar (Cyamopsis tetragonoloba) | 1981 | 1150 | 22 | |
| 12. | Guar (Cyamopsis tetragonoloba) | 1983 | 830 | 26 | |
| 13. | Guar (Cyamopsis tetragonoloba) | 1985 | 1540 | 24 | |
| 14. | Guar (Cyamopsis tetragonoloba) | 1989 | 1578 | 21 | Delhi evaluation |
| | | 1989 | 984 | 18 | Jodhpur evaluation |
| 15. | Kodo millet (Paspalum scrobiculatum) | 1987 | 206 | 33 | Shimla evaluation |
| | Kodo millet (Paspalum scrobiculatum) | 1987 | 186 | 33 | Akola evaluation |
| 16. | Lentil (Lens culinaris) | 1982-83 | 240 | 14 | |
| 17. | Maize (Zea mays) | 1984 | 380 | 25 | CIMMYT material |
| 18. | Maize (Zea mays) | 1985 | 768 | 12 | CIMMYT material |
| 19. | Maize (Zea mays) | 1986-87 | 462 | 21 | Delhi evaluation (1986) |
| | Maize (Zea mays) | 1986-87 | 144 | 19 | Bhowali evaluation (1986) |
| 20. | Maize (Zea mays) | 1991 | 635 | 26 | Delhi evaluation (1988) |
| | Maize (Zea mays) | 1991 | 304 | 19 | Bhowali evaluation (1987) |
| | Maize (Zea mays) | 1991 | 581 | 19 | Bhowali evaluation (1987) |
| | Maize (Zea mays) | 1991 | 230 | 19 | Shillong evaluation (1989) |
| 21. | Moth bean (Vigna aconitifolia) | 1980 | 285 | 17 | |
| 22. | Moth bean (Vigna aconitifolia) | 1981 | 848 | 17 | |
| 23. | Moth bean (Vigna aconitifolia) | 1983 | 829 | 20 | |
| 24. | Mung bean (Vigna radiata) | 1983 | 302 | 19 | |
| 25. | Oats (Avena spp.) | 1990 | 1000 | 81 | |
| 26. | Okra (Abelmoschus esculentus) | 1990 | 558 | 45 | |
| 27. | Oleiferous Brassicae (Brassicae spp.) | 1986 | 555 | 7 | |
| 28. | Opium Poppy (Papaver somniferum) | 1980 | 145 | 19 | |
| 29. | Rice (Oryza sativa) | 1988 | 102 | 56 | |
| 30. | Safflower (Carthamus tinctorius) | 1982 | 481 | 31 | |
| 31. | Sesame (Sesamum indicum) | 1982 | 297 | 22 | |
| 32. | Sesame (Sesamum indicum) | 1983 | 1393 | 29 | |
| 33. | Sesbania (Sesbania spp.) | 1982 | 54 | 31 | |
| 34. | Soybean (Glycine max) | 1983 | 2009 | 18 | |
| 35. | Sunflower (Helianthus annuus) | 1982 | 352 | 13 | |
| 36. | Tomato (Lycopersicum esculentum) | 1982 | 80 | 21 | |
| 37. | Trigonella (Trigonella spp.) | 1980 | 171 | 27 | |
| 38. | Wheat and Triticale (Triticum aestivum, Triticale) | 1982-83 | 1718 | 25 | ICARDA material |
| 39. | Wheat and Triticale (Triticum aestivum, Triticale) | 1984 | 1979 | 14 | ICARDA material |
| 40. | Wheat and Triticale (Triticum aestivum, Triticale) | 1984 | 2143 | 14 | ICARDA & USA material |
| 41. | Wheat and Triticale (Triticum aestivum, Triticale) | 1986 | 1529 | 8 | CIMMYT & USA material |
| 42. | Wheat and Triticale (Triticum aestivum, Triticale) | 1986-87 | 1718 | 8 | CIMMYT & ICARDA material |
| 43. | Wheat and Triticale (Triticum aestivum, Triticale) | 1987-88 | 2797 | 8 | CIMMYT & ICARDA material |
| 44. | Wheat and Triticale (Triticum aestivum, Triticale) | 1988-89 | 3592 | 8 | CIMMYT & ICARDA material |
| 45. | Wheat and Triticale (Triticum aestivum, Triticale) | 1989-90 | 3339 | 8 | CIMMYT & ICARDA material |
| 46. | Wheat (Triticum spp.) | 1983 | 568 | 25 | |
| 47. | Winged bean (Psophocarpus tetragonolobus) | 1983 | 1439 | 31 | |
| 48. | Winged bean (Psophocarpus tetragonolobus) | 1984 | 88 | 31 | |
| | Cowpea (Vigna unguiculata) | | 259 | 23 | |
| | Redgram (Cajanus cajan) | | 399 | 14 | |
| | Horsegram (Macrotyloma uniflorum) | | 403 | 12 | |
| | Chillies (Capsicum spp.) | | 102 | 9 | |
| | Turmeric (Curcuma spp.) | | 113 | 22 | |
| | Yam (Dioscorea spp.) | | 110 | 34 | |

* A catalogue on Crop Genetic Resources.