

13. Documentation of Plant Genetic Resources - R.L. Sapra


Introduction
Information systems
Data Base Management Systems (DBMS) and models
Databases, descriptors and descriptor states
Standards for data preparation
Documentation: PGR national efforts
Future considerations
Summary
References
Appendix I. List of catalogues published by NBPGR

Introduction

Plant genetic resources provide the base material from which plant breeders develop new and superior crop varieties. Over the last three decades, there has been growing awareness of the need to collect and conserve these fast-depleting, irreplaceable resources for the good of present and future generations. At the same time, it has been accepted that the success of all genetic resources activities depends on the descriptive information held about the conserved material, which enables plant breeders to decide which material to use in breeding programmes. This dependence on information increases sharply with the size of the collections.

Plant genetic resources management broadly involves five stages, viz. exploration and collection, characterization and evaluation, conservation, exchange and utilisation, and documentation. It is also concerned, directly or indirectly, with plant quarantine. At each stage, information about plant material is used for communication and decision making. It is estimated that scientists and technicians spend at least 30 percent of their time handling the data generated at the various stages (Rogers et al., 1975). Documentation is, therefore, one of the most critical functions in genetic resources work.

In recent years, documentation has come to be known, more appropriately, as an information system. An information system is much more than the simple recording of information: it has to be dynamic, vital and flexible, ensure the reliability and integrity of the data, and provide effective methods for handling them. In India, given the size of the country and the structure of the existing agricultural research system, there is a need to develop an information system based on a network approach that includes all the institutions throughout the country involved in germplasm collection, conservation and utilisation. This chapter discusses information handling in general, the efforts made by the NBPGR, and future plans in this endeavour.

Information systems

Hersh and Rogers (1975) discussed the information requirements of genetic resources work from the point of view of a systems approach, in which there is a dynamic dialogue between the design of the system under consideration and the user. They indicated how such an approach could be established for documentation in a global network of genetic resources centres. Some progress was made towards this aim during the 1970s, when TAXIR (Taxonomic Information Retrieval), a general-purpose, computer-assisted information system, was developed at the Taximetrics Laboratory of the University of Colorado, USA. Later, the EXIR (Executive Information Retrieval) system was developed at the same university to meet the specific needs of scientists involved in the data management of genebanks. These systems were used in only a few genebanks in various places (Izmir, Turkey; Bari, Italy) because of difficulties in portability, especially the large computers required to run the programmes, and they have since been overtaken by fully supported commercial Database Management Systems (DBMS) available for a wide range of computers. The rapid development of small, low-priced personal computers (PCs) has supported, or even enabled, this shift and has revolutionised information handling. Nowadays, it is common practice to use computers to manage information in genebanks. Because of the wide range of options in software and hardware, it is no longer necessary for everybody to use the same database management system to handle genebank data, as was the case some 15 years ago.

At present, a good number of genebanks are operating around the world. Some of them have developed their own information systems, fitted to their requirements and to the computer systems available to them, using Database Management System (DBMS) software that was originally used for other purposes and has been tailored to the task. One of the front runners in the management of genetic resources data is certainly the Nordic Genebank at the Weibullsholm Plant Breeding Institute in Sweden. The GRIN (Germplasm Resources Information Network) system developed in the USA is capable of monitoring information on the world's largest collection, held at the National Seed Storage Laboratory (NSSL), Fort Collins, and at the cooperating institutions within the USDA research system. Similarly, IRRI (Philippines), CIMMYT (Mexico) and several other international agricultural research institutes have developed their own information systems for handling the germplasm of their respective mandate crops. The same holds for several national plant genetic resources programmes in addition to those already mentioned.

Yndgaard (1982) has suggested the following guidelines for the development of a genebank information system:

1. Ease of data input (registration) into a storage medium.
2. Data validation during input phase.
3. Flexible data storage and retrieval procedures.
4. Availability of data for multiple analysis and use.
5. Exchange of information with other genebanks.
6. Basing the system and its terminology on genetic and biological principles.

In addition, the system should be simple and user friendly, so that people who are not computer scientists can work with it. It should also be economical and adaptable to the organisational environment.

Data Base Management Systems (DBMS) and models

One major factor affecting the handling and exchange of genetic resources information is the sheer volume of data associated with germplasm collections. For instance, the National Plant Germplasm System (NPGS) in the USA maintains over 400,000 accessions of germplasm in the form of seed and vegetatively propagated material. These accessions are primarily landraces and unimproved material from foreign sources. New accessions are added to the NPGS at the rate of 7,000 to 15,000 per year. IRRI, in the Philippines, holds over 86,000 samples of rice. The data recorded on over 75 traits alone generate around 6.4 million pieces of information to manage (86,000 samples × 75 traits ≈ 6.45 million). The immense size of germplasm holdings is thus a challenge for information management. Modern computers, with their ability to store vast amounts of information and to retrieve it with great speed and accuracy, have simplified the whole task of database management. A DBMS is a software package that handles the difficult tasks associated with creating, accessing and maintaining database records. The programmes in a DBMS package establish an interface between the database itself and its users.

Two basic types of database management system can be identified: hierarchical and relational. A hierarchical system tends to be extremely complex because of the superior-subordinate relationships between data elements in a hierarchical (tree) structure. The relational technique is much simpler. Data are represented as two-dimensional tables, and relationships can be established between these tables. Information contained in two or more separate files (for example, one for passport data and another for evaluation data) can be related or linked if the tables or files have fields or descriptors in common. A unique identifier for every accession in a germplasm collection, e.g. the accession number, is such a common field. As a genebank preserves and provides genetic material for a multitude of purposes, a relational DBMS is the more appropriate choice in a genetic resources environment, since it readily permits linkages between different files to be established and relationships to be changed at any time.
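
The linkage of a passport file and an evaluation file through the accession number can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module only as a convenient stand-in for a relational DBMS; the table names, column names and data values are hypothetical and are not taken from any genebank system described in this chapter.

```python
import sqlite3

# Hypothetical schema and data: all names and values are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE passport (
        accession_no TEXT PRIMARY KEY,   -- unique identifier shared by all tables
        crop         TEXT,
        origin       TEXT
    );
    CREATE TABLE evaluation (
        accession_no   TEXT REFERENCES passport(accession_no),
        days_to_flower INTEGER,
        seed_weight_g  REAL
    );
""")
conn.executemany("INSERT INTO passport VALUES (?, ?, ?)", [
    ("IC0001", "Guar", "India"),
    ("EC0002", "Maize", "Mexico"),
])
conn.executemany("INSERT INTO evaluation VALUES (?, ?, ?)", [
    ("IC0001", 45, 3.1),
    ("EC0002", 62, 28.4),
])


def linked_records(connection):
    """Join passport and evaluation rows on the common accession number."""
    query = """
        SELECT p.accession_no, p.crop, p.origin, e.days_to_flower, e.seed_weight_g
        FROM passport AS p
        JOIN evaluation AS e ON e.accession_no = p.accession_no
        ORDER BY p.accession_no
    """
    return connection.execute(query).fetchall()


rows = linked_records(conn)
for row in rows:
    print(row)
```

Because the two tables share only the accession number, a further table (for example, for conservation management data) could be linked in the same way without restructuring the existing ones.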

There are probably a dozen DBMS packages (dBASE III PLUS, dBASE IV, FOXBASE, FOCUS, ORACLE, UNIFY, INGRES, SYBASE, QBE, QUEL, RELATE/3000, etc.) currently on the market. Most of these packages have similar basic capabilities, but some have additional features that make them more attractive. The system selected should be a relational database management system that is machine independent, globally accepted and easily maintained by data processing staff, that has a good performance record, and that has security measures for protecting data against hardware failure or abuse by unauthorized personnel. dBASE III PLUS or dBASE IV is an appropriate choice for small to medium-sized databases and is used widely throughout the world. The Oracle DBMS is certainly a powerful relational package and is an appropriate choice for handling large databases.

Databases, descriptors and descriptor states


A database is a collection of related records, and a record is a collection of fields, which are called 'descriptors' in a genetic resources environment. 'Descriptor' is now widely accepted as the computer term for a character of a plant as well as for other units of information, such as the country of origin or the date of collecting. The 'descriptor state' is then the quantity or quality of the plant character, the name or abbreviation of the country, or the actual day, month and year of collecting. Descriptors can be further distinguished by their use and content.
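
The relationship between database, record, descriptor and descriptor state can be pictured with a small sketch in Python. The descriptor names and states shown are hypothetical examples, not an official descriptor list.

```python
# Hypothetical record: descriptor names and states are illustrative only.
record = {
    "accession_no": "IC0001",          # descriptor -> descriptor state
    "country_of_origin": "IND",        # a country abbreviation as the state
    "collecting_date": "1987-10-14",   # year, month and day of collecting
    "plant_height_cm": 92.5,           # a quantity as the state
}

database = [record]  # a database is a collection of related records


def descriptor_states(db, descriptor):
    """Return the state of one descriptor for every record in the database."""
    return [rec.get(descriptor) for rec in db]


print(descriptor_states(database, "country_of_origin"))
```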

Information types

A genetic resources centre generally handles germplasm samples together with all the information associated with them. This information can be broadly classified into four major categories depending on its use.

Passport data

This category includes information on the site where the sample was collected, including ecological and habitat data, altitude, climate, etc.

Characterization, preliminary and further evaluation data

This category covers morphological and evaluation data on the various collections, such as the extent of variability observed in the field, agro-botanical and economic characters, quality traits, and reactions to various diseases and pests.

Conservation management data

This includes details of each sample stored in the genebank: its quantity, its location in the genebank, its germination and viability percentages at the time of storage, the period of storage, the recipients of any parts of the sample supplied in the past, the date of rejuvenation and the probable date of the next replenishment of seed stocks.

Exchange data

This includes information related to import and export of germplasm for inventory control.

Standards for data preparation

To make an information system meaningful and more generally applicable, the data need to be standardized in terms of terminology and measurement. All those involved in plant genetic resources work have recognised the need for an internationally accepted system to record, classify, communicate, correct and update information about accessions. To facilitate the description and evaluation of accessions, and to improve communication among scientists, there has been a continuing effort in several countries, especially Germany, to provide documentation standards acceptable to all interested individuals. The resulting thesaurus of terms gives an exact (or nearly exact) equivalent for each term in some of the major languages, i.e. German, French, Spanish and Russian. By using these terms, workers on various crops in various parts of the world can accurately describe their collections and the results of their observations and tests, and be certain that the terminology will be understood by others.

The International Board for Plant Genetic Resources (IBPGR) has recognised the need for standards for data recording. Data standardization makes the communication of information easier and more effective. The IBPGR crop advisory committees have been developing minimum descriptor lists, accompanied by standards for measurement techniques, units of measurement, and data recording and encoding methods. A minimum descriptor list defines the minimum amount of information required to describe an accession. A minimum list of descriptors for genebank management has been suggested as well (Konopka and Hanson, 1985).

The procedures for data preparation need careful consideration, as the future universal use of existing data management systems will depend upon them. Each descriptor must have a clear definition to permit the meaningful exchange of information among cooperating scientists. Before the data are recorded, a code dictionary should be prepared giving the detailed information needed to interpret the coded data. Automatic validation during the input phase helps to ensure that each data item falls within its permissible limits and is of the correct type, alphabetical or numerical.
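
A code dictionary and input-phase validation of the kind described above might be sketched as follows in Python. The descriptor names, codes and permissible limits are assumptions made purely for illustration; real values would come from the agreed descriptor list for the crop concerned.

```python
# Hypothetical code dictionary: descriptor names, codes and limits are assumptions.
CODE_DICTIONARY = {
    "flower_colour": {"type": "coded", "codes": {1: "white", 2: "yellow", 3: "purple"}},
    "plant_height_cm": {"type": "numeric", "min": 5.0, "max": 400.0},
    "cultivar_name": {"type": "alpha"},
}


def validate(descriptor, value):
    """Return a list of error messages for one data item (empty if valid)."""
    rule = CODE_DICTIONARY.get(descriptor)
    if rule is None:
        return [f"unknown descriptor: {descriptor}"]
    kind = rule["type"]
    if kind == "coded":
        # The value must be one of the codes defined in the dictionary.
        if value not in rule["codes"]:
            return [f"{descriptor}: code {value!r} not in code dictionary"]
    elif kind == "numeric":
        # The value must be a number within the permissible limits.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            return [f"{descriptor}: expected a number, got {value!r}"]
        if not rule["min"] <= value <= rule["max"]:
            return [f"{descriptor}: {value} outside permissible limits"]
    elif kind == "alpha":
        # The value must be alphabetic text (spaces allowed).
        if not (isinstance(value, str) and value.replace(" ", "").isalpha()):
            return [f"{descriptor}: expected alphabetic text, got {value!r}"]
    return []


print(validate("plant_height_cm", 900))
print(validate("flower_colour", 2))
```

Rejecting an item at entry, with a message naming the descriptor, is cheaper than tracing a faulty value after it has been merged into the database.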

Documentation: PGR national efforts


Plant introduction reporters and crop inventories
Quarantine information, check lists, etc.
Passport information
Herbarium information
Field evaluation and cataloguing
Genebank information
Other published information
Trainings on documentation and information management

Since its inception in 1976, the NBPGR has catered to the needs of the plant genetic resources community in the country as the national nodal service for all activities related to plant exploration and collection, evaluation and conservation for present and future use (see Chapter 14 for details). In addition, the Bureau has the important function of collecting all available information on genetic diversity and ensuring that it is properly documented and disseminated to all concerned breeders and curators within and outside the country. As part of these activities, the Institute has generated a large amount of information and produced a number of Crop Catalogues, Plant Introduction Reporters, Crop Inventories, Information Bulletins and Scientific Monographs for the benefit of the PGR community.

Since 1980, considerable progress has been made in following up the recommendations of the First National Workshop on Documentation of Plant Genetic Resources, convened in that year. The manual system of data recording and processing is gradually being replaced by a computer-based system. In 1986, the Institute initiated a project, the 'Genetic Resources Information Programme (GRIP)', aimed at creating and developing a computer-based information system for the efficient management of the national plant genetic wealth. Some of the documentation activities undertaken by the Institute, and its future plans in this direction, are described below.

Plant introduction reporters and crop inventories

The introduction of germplasm from exotic sources for use by breeders in India is one of the regular activities of the Bureau (see Chapter 4 for details). Since 1940, when the first accession was registered in the National Accession Register, NBPGR has introduced over 900,000 samples of germplasm (including trial material). Each accession is given an EC (Exotic Collection) number at the time of entry, and the other information accompanying it, viz. botanical name, original identification number/name, source country and address, recipient name and address, number of samples, and special notes, if any, is recorded in the National Register. This information is compiled and published on a quarterly basis as the Plant Introduction Reporter (PIR) for circulation to Indian scientists and cooperating agencies. The compilation is now partially computerised.

Information on the germplasm of different crops introduced into India is compiled from time to time and published as inventories.

Quarantine information, check lists, etc.

All plant introductions pass through plant quarantine on receipt and are assigned an Import Quarantine (IQ) number. The country name, type of material, case number, consignee, quarantine history, etc. are also recorded in the import quarantine register, along with information on clearance or rejection, interceptions, salvaging techniques, post-quarantine treatment, etc. These records are maintained manually in the import register. Similar records are maintained for material being exported.

To anticipate the risk of introducing new pests and pathogens into the country when importing germplasm from exotic sources, check lists are prepared from the available literature. A number of such lists have been compiled by the Division of Plant Quarantine, mainly for internal use. The procedure is completely manual.

Passport information

Site data sheets are used for recording information on a set of passport descriptors, viz. collector name and number, Latin name of the crop, common/cultivar name, provenance data including latitude, longitude and altitude structure, habitat, information on pests and pathogens, soil colour and texture, and other special attributes, if any (see Chapter 3).

Each accession is assigned an IC (Indigenous Collection) number before it is passed on to the Evaluation Division. In previous years, passport information was documented manually in the form of the Plant Collection Reporter and disseminated to user agencies. Recently, some progress has also been made in computerising this information.

Herbarium information

Variability in crop plants, their wild relatives and other economically important species is represented by dried plant specimens and seed samples in the National Herbarium of Cultivated Plants at the Bureau. Current holdings represent 2,200 species covering 950 genera and 180 families. Herbarium information is recorded for a set of descriptors, e.g. collector number and name, botanical name, name of identifier, etc. A computer-based herbarium information system is being developed.

Field evaluation and cataloguing

The Germplasm Evaluation Division is entrusted with preliminary evaluation and seed increase, characterization, documentation, preparation of catalogues, etc. For further evaluation, plant breeders in the past selected a limited number of characters related to agronomic and production traits and to resistance to diseases and pests. More recently, emphasis has been placed on following the IBPGR lists of descriptors as far as possible when recording data. Such evaluation data are available with various crop-based institutes, coordinated projects and agricultural universities. In addition, NBPGR has generated considerable evaluation data on different crops, and the information has been compiled and documented in the form of 48 crop catalogues (Appendix I). Some of these catalogues give a complete listing of the evaluation data along with the available passport information, estimates of variability and of correlation and regression parameters, frequency distributions for quantitative as well as qualitative traits, and answers to certain simple and complex queries on the database regarding useful traits or combinations of traits.
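
The kinds of summary a catalogue may contain (an estimate of variability, a correlation between two quantitative traits, a frequency distribution for a qualitative trait, and a query on a combination of traits) can be sketched in Python as below. The data values, trait names and selection thresholds are invented for illustration and do not come from any NBPGR catalogue; regression parameters are omitted for brevity.

```python
import math
from collections import Counter
from statistics import mean, stdev

# Hypothetical evaluation data for five accessions (all values are made up).
plant_height_cm = [82.0, 95.0, 101.0, 77.0, 90.0]
seed_yield_g = [12.1, 15.4, 16.0, 10.8, 14.2]
flower_colour = ["white", "purple", "white", "yellow", "white"]


def coefficient_of_variation(values):
    """Sample standard deviation expressed as a percentage of the mean."""
    return 100.0 * stdev(values) / mean(values)


def pearson_r(x, y):
    """Pearson correlation coefficient between two quantitative traits."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Frequency distribution of a qualitative trait.
frequency = Counter(flower_colour)

# A query on a combination of traits: positions of tall, high-yielding accessions.
selected = [i for i, (h, y) in enumerate(zip(plant_height_cm, seed_yield_g))
            if h >= 90.0 and y >= 14.0]

print(round(coefficient_of_variation(plant_height_cm), 1))
print(round(pearson_r(plant_height_cm, seed_yield_g), 2))
print(frequency.most_common())
print(selected)
```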

From 1986 onward, the Bureau started computer processing of the evaluation data. A programme called GEIS (Germplasm Evaluation Information System) was developed for handling this information. For better management of the data files, crops have been placed in 8 major groups, viz. grain legumes, cereals and pseudo-cereals, oilseeds, millets and minor millets, vegetables, horticultural crops/plants, medicinal and aromatic plants, and miscellaneous crops. The programmes have been developed in dBASE III PLUS. Recently, 9 crop catalogues, namely on indigenous and exotic maize, foxtail millet, kodo millet, rice, guar, oats, okra and forage sorghum, have been compiled and published. The processing of the data for these catalogues was completely computer based.

Genebank information

NBPGR presently holds over 135,000 accessions of various crops, which are stored in the National Repository for long-term conservation. Data are maintained on some of the important descriptors, viz. crop name, genus and species, identification number, germination percentage, moisture content, month and year of storage, etc. A database is being developed for these management descriptors, and the information is monitored periodically. Genebank labels are also printed from the database. Information on samples stored under cryopreservation is likewise maintained and monitored.

Other published information

The NBPGR regularly brings out Newsletters, Research Highlights and Annual Reports. These are a source of useful information on all genetic resources activities undertaken and coordinated by the Bureau in association with its regional stations and base centres.

Trainings on documentation and information management

Under the Genetic Resources Information Programme (GRIP) at NBPGR, the Institute has organised four computer appreciation courses on database management of genetic resources since 1986 to train scientists of the Institute as well as of the identified cooperating centres.

Future considerations

Database management activities are being further strengthened at the NBPGR under the USAID-PGR database management project. The project has been given high priority and sufficient funds have been allocated. It is planned to equip all 30 cooperating centres/data-generating sites (see Chapter 14) with microcomputers and the necessary software. A minicomputer system capable of supporting more than 80 terminals will be located at the Bureau's new headquarters. The project also provides for other infrastructure in terms of staff and facilities, for staff training and for the required consultancy. The development of hardware and software facilities at the headquarters and at the cooperating centres has begun, and five of these centres have been equipped with Intel 80386-based microcomputer systems running the UNIX operating system. Creating the requisite data processing facilities is expected to take 2 to 3 years.

As mentioned earlier, the NBPGR plans to develop a PGR National Information System for the country. Since the National Information System has to cater to the needs of crop-based institutes, coordinated projects, agricultural universities and other user agencies in India as well as abroad, a thorough system analysis is needed. This should cover the identification of information inputs and outputs, their generation and standardisation, information flow, security and protection against misuse, hardware and software, manpower and expertise, and communication facilities. Developing an on-line information network is a costly affair and requires considerable resources. However, the project can be initiated in phases, starting with an off-line information network and changing to an on-line network once all the cooperating institutions have developed their data processing and communication facilities. In India, there is a specific need for a coordinated approach to this function in order to facilitate the smooth flow and effective dissemination of information.

Summary

The importance of a proper documentation system for the management of plant genetic resources has been recognised, both for inventory purposes and for the utilisation of germplasm in breeding programmes. This chapter briefly describes earlier systems vis-a-vis present database management systems, descriptors and descriptor states, the types of information handled and the need for standards for data preparation. The appropriateness of relational database management software has been stressed for developing information systems in a genetic resources environment, owing to the ease and flexibility with which linkages among tables/files can be established at any time. Data standardisation must be treated as a prerequisite for the development of an effective information system. The significant achievements in India, especially the efforts made by the NBPGR, have been highlighted. The need to develop a national information system based on a network approach, including all the institutions throughout the country involved in germplasm collection, conservation and utilisation, has been emphasized.

References

Hersh, G.N. and D.J. Rogers. 1975. Documentation and information requirements for genetic resources application, pp. 407-446. In Crop genetic resources for today and tomorrow (Eds., Frankel, O.H. and J.G. Hawkes). IBP 2, Cambridge University Press, Cambridge.

Konopka, J. and J. Hanson. 1985. Management data in a genebank, pp. 21-28. In Documentation of genetic resources information handling system for genebank management, IBPGR, Rome.

Rogers, D.J., B. Snoad and L. Seidewitz. 1975. Documentation and information requirements for genetic resources application, pp. 399-405. In Crop genetic resources for today and tomorrow (Eds., Frankel, O.H. and J.G. Hawkes). IBP 2, Cambridge University Press, Cambridge.

Yndgaard, F. 1982. A documentation system for Nordic Genebank. IBPGR Plant Genetic Resources Newsletter. 49: 34-36.

Appendix I. List of catalogues published by NBPGR

Sl. No.  Crop name                                           Year of publication  No. of accessions  No. of descriptors  Remarks
1.       Amaranth (Amaranthus spp.)                          1981                 400                31
2.       Barley (Hordeum vulgare)                            1983                 259                35
3.       Barley (Hordeum vulgare)                            1983                 1155               27                  CIMMYT material
4.       Barley (Hordeum vulgare)                            1984                 742                15                  CIMMYT material
5.       Barley (Hordeum vulgare)                            1985                 217                15                  CIMMYT/ICARDA material
6.       Barley (Hordeum vulgare)                            1986                 280                8                   CIMMYT material
7.       Cowpea (Vigna unguiculata)                          1981                 707                34
8.       Cowpea (Vigna unguiculata)                          1982                 683                24
9.       Foxtail Millet (Setaria italica)                    1987                 736                52
10.      French bean (Phaseolus vulgaris)                    1981                 1773               16
11.      Guar (Cyamopsis tetragonoloba)                      1981                 1150               22
12.      Guar (Cyamopsis tetragonoloba)                      1983                 830                26
13.      Guar (Cyamopsis tetragonoloba)                      1985                 1540               24
14.      Guar (Cyamopsis tetragonoloba)                      1989                 1578               21                  Delhi evaluation
         Guar (Cyamopsis tetragonoloba)                      1989                 984                18                  Jodhpur evaluation
15.      Kodo millet (Paspalum scrobiculatum)                1987                 206                33                  Shimla evaluation
         Kodo millet (Paspalum scrobiculatum)                1987                 186                33                  Akola evaluation
16.      Lentil (Lens culinaris)                             1982-83              240                14
17.      Maize (Zea mays)                                    1984                 380                25                  CIMMYT material
18.      Maize (Zea mays)                                    1985                 768                12                  CIMMYT material
19.      Maize (Zea mays)                                    1986-87              462                21                  Delhi evaluation (1986)
         Maize (Zea mays)                                    1986-87              144                19                  Bhowali evaluation (1986)
20.      Maize (Zea mays)                                    1991                 635                26                  Delhi evaluation (1988)
         Maize (Zea mays)                                    1991                 304                19                  Bhowali evaluation (1987)
         Maize (Zea mays)                                    1991                 581                19                  Bhowali evaluation (1987)
         Maize (Zea mays)                                    1991                 230                19                  Shillong evaluation (1989)
21.      Moth bean (Vigna aconitifolia)                      1980                 285                17
22.      Moth bean (Vigna aconitifolia)                      1981                 848                17
23.      Moth bean (Vigna aconitifolia)                      1983                 829                20
24.      Mung bean (Vigna radiata)                           1983                 302                19
25.      Oats (Avena spp.)                                   1990                 1000               81
26.      Okra (Abelmoschus esculentus)                       1990                 558                45
27.      Oleiferous Brassicae (Brassica spp.)                1986                 555                7
28.      Opium Poppy (Papaver somniferum)                    1980                 145                19
29.      Rice (Oryza sativa)                                 1988                 102                56
30.      Safflower (Carthamus tinctorius)                    1982                 481                31
31.      Sesame (Sesamum indicum)                            1982                 297                22
32.      Sesame (Sesamum indicum)                            1983                 1393               29
33.      Sesbania (Sesbania spp.)                            1982                 54                 31
34.      Soybean (Glycine max)                               1983                 2009               18
35.      Sunflower (Helianthus annuus)                       1982                 352                13
36.      Tomato (Lycopersicum esculentum)                    1982                 80                 21
37.      Trigonella (Trigonella spp.)                        1980                 171                27
38.      Wheat and Triticale (Triticum aestivum, Triticale)  1982-83              1718               25                  ICARDA material
39.      Wheat and Triticale (Triticum aestivum, Triticale)  1984                 1979               14                  ICARDA material
40.      Wheat and Triticale (Triticum aestivum, Triticale)  1984                 2143               14                  ICARDA & USA material
41.      Wheat and Triticale (Triticum aestivum, Triticale)  1986                 1529               8                   CIMMYT & USA material
42.      Wheat and Triticale (Triticum aestivum, Triticale)  1986-87              1718               8                   CIMMYT & ICARDA material
43.      Wheat and Triticale (Triticum aestivum, Triticale)  1987-88              2797               8                   CIMMYT & ICARDA material
44.      Wheat and Triticale (Triticum aestivum, Triticale)  1988-89              3592               8                   CIMMYT & ICARDA material
45.      Wheat and Triticale (Triticum aestivum, Triticale)  1989-90              3339               8                   CIMMYT & ICARDA material
46.      Wheat (Triticum spp.)                               1983                 568                25
47.      Winged bean (Psophocarpus tetragonolobus)           1983                 1439               31
48.      Winged bean (Psophocarpus tetragonolobus)           1984                 88                 31
         Cowpea (Vigna unguiculata)                                               259                23
         Redgram (Cajanus cajan)                                                  399                14
         Horsegram (Macrotyloma uniflorum)                                        403                12
         Chillies (Capsicum spp.)                                                 102                9
         Turmeric (Curcuma spp.)                                                  113                22
         Yam (Dioscorea spp.)                                                     110                34

* A catalogue on Crop Genetic Resources.

