Needles in Haystacks: Getting to the Point of Federal Records with Document Metadata and Electronic Document Management Systems

Owen Ambur, University of Maryland University College

November 23, 1996

Executive Summary

The right of the public to obtain information held by the Federal Government is set forth in the Freedom of Information Act, and the Electronic Freedom of Information Act Amendments of 1996 require that certain information be made available in electronic form. The Privacy Act protects personal information from inappropriate release. The Paperwork Reduction Act contains various provisions to facilitate access to information, including authorization of the Government Information Locator Service (GILS). Under the Federal Records Act, agencies must create and maintain proper documentation of their activities. The Information Technology Management Reform Act mandates interagency planning and the establishment of an on-line index of public information in electronic form, and the Government Performance and Results Act requires agencies to document program outcomes and link them to budgetary inputs.

The Black Forest Group has delineated the requirements for an enterprisewide electronic document management system (EDMS), and the Document Management Alliance (DMA) is developing standards for interoperability among open-systems EDMSs. A subgroup of the DMA has issued the Open Document Management Application Program Interface (ODMA) standard, providing for interoperability between EDMS software and application programs such as word processors, spreadsheets, and databases.

In addition to technical standards, metadata standards are needed to provide ready access to information. Such standards include Machine-Readable Cataloging (MARC), GILS, the X.500 Green Pages, and HTML and SGML metatags. Efforts have been initiated to reconcile the Z39.50 and GILS metadata standards in a core set of common elements, and to provide technical interoperability between the Z39.50 protocol and the X.500 directory. Once the technical interoperability issues have been addressed and a core set of metadata is widely used, the stage will be set for Federal agencies to provide ready access to their public information. The rise of Web-based search engines has led some to question the need for other means of locating information, but regardless of the protocol used, metadata is still needed to refine searches and improve their precision. Thus, these technologies are complementary rather than mutually exclusive.
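The complementary role of metadata can be sketched in a few lines of Python. In this minimal example, which assumes a hypothetical set of core elements (title, originator, date issued, subjects) rather than any official GILS, MARC, or Z39.50 element set, a keyword search casts a wide net and the metadata then narrows the results.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class Record:
    # Hypothetical core metadata elements, not an official GILS, MARC, or Z39.50 set
    title: str
    originator: str
    issued: date
    subjects: set[str] = field(default_factory=set)
    text: str = ""


def full_text_search(records: list[Record], term: str) -> list[Record]:
    """Naive keyword match over document text, similar in spirit to a Web search engine."""
    term = term.lower()
    return [r for r in records if term in r.text.lower()]


def refine(hits: list[Record], originator: str | None = None,
           subject: str | None = None, since: date | None = None) -> list[Record]:
    """Narrow keyword hits using structured metadata to improve precision."""
    return [
        r for r in hits
        if (originator is None or r.originator == originator)
        and (subject is None or subject in r.subjects)
        and (since is None or r.issued >= since)
    ]


# Hypothetical records, for illustration only
docs = [
    Record("Wetlands permit guidance", "Agency A", date(1996, 3, 1),
           {"wetlands"}, "Guidance on permit applications"),
    Record("Permit backlog memo", "Agency B", date(1995, 6, 1),
           {"staffing"}, "The permit backlog grew"),
]
hits = full_text_search(docs, "permit")   # the keyword alone matches both documents
precise = refine(hits, subject="wetlands", since=date(1996, 1, 1))
print([r.title for r in precise])         # ['Wetlands permit guidance']
```

Neither step replaces the other: without the keyword search the metadata filter has nothing to narrow, and without the metadata the keyword search cannot distinguish the relevant record from the merely similar one.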

However, if agencies are to uphold their legal mandates to provide public access to their information, they must use EDMS software to manage their documents internally. When they do, they will not only enhance public access but also find that improvements in their internal work processes are an even greater benefit.

Freedom of Information and Right to Privacy

The right of the public to obtain information held by the Federal Government is summarized in a report published by the U.S. House of Representatives, entitled "A Citizen's Guide on Using the Freedom of Information Act and the Privacy Act of 1974 to Request Government Records" (H.Rpt. 102-146), as follows:

The Freedom of Information Act (FOIA) establishes a presumption that records in the possession of agencies and departments of the Executive Branch of the United States government are accessible to the people. This was not always the approach to federal information disclosure policy. Before enactment of the FOIA in 1966, the burden was on the individual to establish a right to examine these government records. There were no statutory guidelines or procedures to help a person seeking information. There were no judicial remedies for those denied access. With the passage of the FOIA, the burden of proof shifted from the individual to the government. Those seeking information are no longer required to show a need for information. Instead, the "need to know" standard has been replaced by a "right to know" doctrine. The government now has to justify the need for secrecy.

More than secrecy, what has hampered public access to public information has been the sheer volume of it and thus the logistical difficulties of managing, sorting, cataloging, and retrieving it when and where needed. As further explained in the House report:

The FOIA sets standards for determining which records must be disclosed and which records can be withheld. The law also provides administrative and judicial remedies for those denied access to records. Above all, the statute requires federal agencies to provide the fullest possible disclosure of information to the public.

While logistical difficulties should not be taken as an excuse for agencies to refuse or fail to make information readily available, a legitimate concern is citizens' right to privacy. The report continues:

The Privacy Act of 1974 is a companion to the FOIA. The Privacy Act regulates federal government agency record keeping and disclosure practices. The Act allows most individuals to seek access to federal agency records about themselves. The Act requires that personal information in agency files be accurate, complete, relevant, and timely... [It] also restricts the disclosure of personally identifiable information by federal agencies. Together with the FOIA, the Privacy Act permits disclosure of most personal files to the individual who is the subject of the files. The two laws restrict disclosure of personal information to others when disclosure would violate privacy interests.

The House report sums up the effect of the laws as follows:

The essential feature of both laws is that they make federal agencies accountable for information disclosure policies and practices. While neither law grants an absolute right to examine government documents, both laws establish the right to request records and to receive a response to the request. If a record cannot be released, the requester is entitled to be told the reason for the denial. The requester also has a right to appeal the denial and, if necessary, to challenge it in court. These procedural rights granted by the FOIA and the Privacy Act make the laws valuable and workable. As a result, the disclosure of federal government information cannot be controlled by arbitrary or unreviewable actions.

In the interest of full disclosure of public information and in recognition that information delayed is information denied, Congress recently passed and the President signed into law the Electronic Freedom of Information Act Amendments of 1996 (E-FOIA, P.L. 104-231). When E-FOIA was passed by the House, Representative Maloney (Congressional Record, September 17, 1996, page H10451) asserted:

... the bill ... forces agencies to exercise foresight when installing computer systems which must help expedite agency FOIA requests and operations, rather than impeding them... it would encourage agencies to offer online access to Government information, effectively transforming an individual's home computer into a Government agency's reading room.

Among the findings set forth in the E-FOIA is the assertion that "Government agencies should use new technology to enhance public access to agency records and information." Included among the purposes set forth in the bill are:

... ensuring public access to agency records and information;

improv[ing] public access to agency records and information; and

maximiz[ing] the usefulness of agency records and information collected, maintained, used, retained, and disseminated by the Federal Government.

The E-FOIA mandates that agencies make certain records available by electronic means no later than November 1, 1997. Records that must be made available by that date include those created on or after November 1, 1996, that are not both "promptly published" and "offered for sale," and that fall into the following categories:

final opinions, including concurring and dissenting opinions, as well as orders, made in the adjudication of cases [5 USC 552(a)(2)(A)]
statements of policy and interpretations which have been adopted by the agency and are not published in the Federal Register [5 USC 552(a)(2)(B)]
administrative staff manuals and instructions to staff that affect a member of the public [5 USC 552(a)(2)(C)]
all records, regardless of form or format, which have been released to any person [under a FOIA request] and ... are likely to become the subject of subsequent requests for substantially the same records [5 USC 552(a)(2)(D)]
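The coverage rule described above combines a date test, a category test, and a publication-and-sale exception, which makes it a natural candidate for automation within an EDMS. The following Python sketch encodes it under simplifying assumptions: the category names are hypothetical labels for the four categories listed, each record is assumed to carry those attributes already, and FOIA exemptions (for example, personal privacy) are ignored.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

# Hypothetical labels paraphrasing the four categories listed above, not statutory text.
ELECTRONIC_CATEGORIES = {
    "final_opinion_or_order",
    "policy_statement_or_interpretation",
    "staff_manual_or_instruction",
    "frequently_requested_release",
}

# Records created on or after this date were to be available electronically
# by November 1, 1997.
COVERED_FROM = date(1996, 11, 1)


@dataclass
class AgencyRecord:
    created: date
    category: Optional[str]
    promptly_published: bool = False
    offered_for_sale: bool = False


def must_post_electronically(rec: AgencyRecord) -> bool:
    """Simplified screen; ignores FOIA exemptions such as personal privacy."""
    if rec.created < COVERED_FROM:
        return False
    if rec.category not in ELECTRONIC_CATEGORIES:
        return False
    # Excused only if the material is both promptly published and offered for sale.
    return not (rec.promptly_published and rec.offered_for_sale)


manual = AgencyRecord(date(1997, 2, 14), "staff_manual_or_instruction")
older = AgencyRecord(date(1996, 5, 1), "final_opinion_or_order")
sold = AgencyRecord(date(1997, 3, 1), "policy_statement_or_interpretation",
                    promptly_published=True, offered_for_sale=True)
print([must_post_electronically(r) for r in (manual, older, sold)])  # [True, False, False]
```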

With respect to the latter category, E-FOIA also requires agencies to make an on-line index available by December 31, 1999, a mandate that relates closely to the Government Information Locator Service (GILS) and the Green Pages of the X.500 directory.

Paperwork Reduction Act

Statutory authority for the establishment of GILS is provided in section 3511 of the Paperwork Reduction Act of 1995 (PRA, P.L. 104-13). Initially, the goal was to establish a "...distributed agency-based electronic [service to] identify the major information systems, holdings, and dissemination products of each agency." However, the House committee report (H.Rpt. 104-37, p. 28) accompanying the bill asserted: "Ultimately, this system should become a path to the holdings themselves." The Senate committee report (S.Rpt. 104-8, pp. 25-26) contained identical wording and went on to say:

As better standards for organizing and accessing databases are developed, agencies need to work toward common protocols that will make direct public access a practical reality. The goal of creating a means for agencies and the public to obtain, not merely locate, government-held information should guide the development of GILS. (emphasis added)

The committee's reference to the ability of agencies to "obtain" government-held information is insightful, for if they cannot retrieve it efficiently themselves, they cannot possibly make it readily available to the public! The wording of the law itself [Sec. 3506(b)(1)(C)] is equally explicit in requiring agencies to:

... manage information resources to ... improve the integrity, quality, and utility of information to all users within and outside the agency ... (emphasis added)

By this wording, Congress has debunked the myth that agencies can somehow uphold their obligation to the public without first managing information effectively internally. Nor can information be managed effectively if agencies cannot determine, without great effort, what information resources they hold. Under paragraph 3506(b)(4), agencies are required:

... in consultation with [GSA and NARA to] maintain a current and complete inventory of the agency's information resources, including [GILS] directories ... (emphasis added)

Although many have chosen to ignore the guidance in the committee reports and interpret section 3511 merely to require that an inventory of information systems be made available via GILS, the intent of paragraph 3506(b)(4) is clear on its face. It is not just an inventory of systems that is required under the law, but a complete inventory of "information resources". Lest there be any doubt about the meaning of that term, it is defined in the law [section 3502(6)] as follows: "the term 'information resources' means information and related resources, such as personnel, equipment, funds, and information technology..." (emphasis added) Clearly, it is the information itself that is the focal point, rather than the systems by which it is processed and maintained. Moreover, subparagraph 3506(d)(1)(B) requires agencies to:

... ensure that the public has timely and equitable access to the agency's public information [including] information maintained in electronic format ...

It is noteworthy that enactment of the E-FOIA came little more than a year after the PRA became law. Indeed, both were passed by the 104th Congress. In the committee report (S.Rpt. 104-272) accompanying the bill, Senator Leahy explained why Congress saw the need for further action:

The efficient operation of the Freedom of Information Act has been hindered by 5 years of foot-dragging by the Federal bureaucracy... Curiously, it was often argued that the FOIA was not a primary program of the departments and agencies, a contention that sadly ignored the importance of Government information accessibility for the citizens of a democracy.

In remarks on the Senate floor (Congressional Record, September 17, 1996, p. S10715), Leahy added:

... failure to comply with the statutory time limits [for FOIA responses] ... breeds contempt by citizens who expect government officials to abide by, not routinely break, the law.

On the other hand, in the Senate report, Leahy approvingly noted the Committee's reference to GILS, calling it a "helpful tool" and noting its relationship to the Internet:

Significantly, many Federal agencies are also establishing sites on the World Wide Web to educate the public about their mission and facilitate access to information about the agency. Agencies should be encouraged to establish a FOIA requester section on the Web site homepage to facilitate on-line access ...

Agency Web sites are just one of many potential sources of Government information. In a memorandum dated September 29, 1995, Director Rivlin of the Office of Management and Budget set forth guidance for implementation of the information dissemination provisions of the PRA, including the following:

... the PRA makes agencies responsible for carrying out sound information dissemination practices ... One of the major goals of the Act is to encourage a diversity of sources for information based on government public information. It recognizes that State and local governmental entities, the information industry, libraries and educational institutions, and other entities are partners in promoting the use of government information for the maximum benefit of society.

Indeed, paragraph 3506(d)(4) of the PRA prohibits agencies from establishing any "... exclusive, restricted, or other distribution arrangement that interferes with the timely and equitable availability of public information." In addition, subparagraph 3506(d)(1)(B) explicitly provides:

... in cases in which the agency provides public information maintained in electronic format, [it shall provide] timely and equitable access to the underlying data...

Subsection 3506(f) specifies:

With respect to records management, each agency shall implement ... procedures ... for archiving information maintained in electronic format ...

Paragraph 3506(h)(3) requires each agency to:

... promote the use of information technology ... to improve the productivity, efficiency, and effectiveness of agency programs ...

In each case, it should be noted that these provisions do not merely suggest discretionary actions; they are legal mandates. As Senator Leahy noted, the issue is whether agencies will uphold their obligations to the public or whether they will continue to drag their feet and flout the law.

Federal Records Act

The Federal Records Act (FRA) governs the creation, management and disposal of Federal agency records. Among other things, the FRA requires that each agency shall:

make and preserve records containing adequate and proper documentation of the organization, functions, policies, decisions, procedures and essential transactions of the agency and designed to furnish the information necessary to protect the legal and financial rights of the Government and of persons directly affected by the agency's activities.

The FRA also prescribes the only manner in which Federal records may be disposed of, and its definition of records expressly includes "machine readable" materials:

made or received by an agency of the United States Government under Federal law or in connection with the transaction of public business and preserved or appropriate for preservation by that agency ... as evidence of the organization, functions, policies, decisions, procedures, operations, or other activities of the Government or because of the informational value of data in them.

Records which fall under the province of this statute may be removed by the agency from its computers only with the approval of the National Archivist (Loundy, 1995).

Information Technology Management Reform Act

Several provisions of the Information Technology Management Reform Act (ITMRA, P.L. 104-106, Titles 50-57) are pertinent to the storage and accessibility of public information. For example, ITMRA requires:

The Office of Management and Budget (OMB) to establish an effective and efficient planning process, including consideration of common needs that should be served by interagency or Governmentwide systems [Sec. 5113(b)(3)].

Each Federal agency to identify investments that would provide shared benefits or costs for other Federal, State, or local governmental agencies [Sec. 5122].

The National Institute of Standards and Technology (NIST) to promulgate standards and guidelines, making the standards compulsory to the extent necessary to improve efficiency, security, or privacy [Sec. 5131(a)(1)].

For any system from which information is disseminated to the public, ITMRA [Sec. 5403] states that an index should be included in the on-line directory maintained by the Government Printing Office (GPO) pursuant to 44 USC 4101. That section of the "Public Printing and Documents" title of the United States Code requires GPO to:

maintain an electronic directory of Federal electronic information;

provide on-line access to the Congressional Record, Federal Register, and other documents, as appropriate; and

operate an electronic storage facility for Federal electronic information.

To the extent practicable, GPO is required by 44 USC 4101 to accommodate any request by heads of departments and agencies to include information in the system. Building upon that mandate, the effect of ITMRA is to require each Federal agency to make available in GPO's system at least an index to information that it disseminates to the public, if not necessarily the information itself. Bearing in mind that E-FOIA now requires agencies to make available by electronic means any document that is released to anyone and is likely to be of interest to others, the scope of this requirement becomes virtually all-encompassing ... unless of course the assumption is made that Federal employees are producing documentation of interest to no one but themselves. And Congress has addressed that issue too -- through passage of the Government Performance and Results Act (GPRA). No longer is activity for its own sake sufficient to warrant the expectation of continued funding and support.

Government Performance and Results Act

As summarized by the Alliance for Redesigning Government (ARG), the intent of GPRA is to shift the focus of Government officials and managers from program inputs toward program execution. Officials are expected to attend to the results (outcomes and outputs) being achieved, and to assess how well programs are meeting intended objectives. To bring about this shift, the GPRA sets out requirements for: 1) defining long-term general goals, 2) setting specific annual performance goals (targets) that are derived from the general goals, and 3) annual reporting of actual performance compared to the targets. As Federal managers are held more accountable for achieving results, they also are given more flexibility and discretion in how they manage programs. Finally, the legislation provides for tests of various performance budgeting concepts.
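The GPRA cycle of general goals, annual targets, and annual reporting of actual performance against those targets can be sketched in Python. The goal names and figures below are hypothetical, and the structure only illustrates the target-versus-actual comparison GPRA calls for; it is not a prescribed reporting format.

```python
from dataclasses import dataclass


@dataclass
class PerformanceGoal:
    general_goal: str              # long-term general goal from the strategic plan
    measure: str                   # annual performance indicator
    target: float                  # annual target derived from the general goal
    actual: float                  # result actually achieved during the year
    higher_is_better: bool = True  # False for measures such as processing time

    def met(self) -> bool:
        if self.higher_is_better:
            return self.actual >= self.target
        return self.actual <= self.target


def annual_report(goals: list) -> list:
    """One line per measure: target, actual, and whether the target was met."""
    return [
        f"{g.measure}: target {g.target:g}, actual {g.actual:g} "
        f"({'met' if g.met() else 'not met'})"
        for g in goals
    ]


# Hypothetical goals and figures, for illustration only
goals = [
    PerformanceGoal("Timely public access", "Requests answered on time (%)", 90, 84),
    PerformanceGoal("Timely public access", "Median days to respond", 10, 8,
                    higher_is_better=False),
]
for line in annual_report(goals):
    print(line)
```

Each line of the report depends on the target and the actual result having been documented in the first place, which is the point the following discussion makes about GPRA and records.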

Pilot projects are required over the next several years to test and demonstrate annual performance plans, program performance reports, and managerial accountability and flexibility. Additional pilot projects on performance budgeting are to be conducted during fiscal years 1998 and 1999. Full-scale, government-wide implementation of strategic planning, annual program goal-setting, and annual program performance reporting of expenditures in the Federal budget begins in 1997.

ARG suggests that implementation of GPRA will be characterized, among other things, by:

Recognizing that defining an agency's mission and setting general goals and objectives are inherently budget and policy issues, and involving a broad group of agency, Congressional, and public stakeholders in this process.
Intrinsically linking the annual performance plan to the President's budget, and having performance goals correspond to program resources requested...
Agencies having substantial discretion in defining annual goals and measures.
... early preparation of mission statements, and identification and development of performance measures so that performance goals (that will be based on trend data) can be properly set.
... preparation of annual financial statements under the Chief Financial Officers Act to build a foundation for and experience in performance measurement in the departments and major agencies.

To use the phrase so oft repeated by Vice President Gore, each of these actions should contribute toward realization of a Government that "costs less and works better." However, it should be noted that all of them depend upon the efficient and effective documentation and management of information. Indeed, it is no exaggeration to suggest that documenting linkages between inputs and outcomes is the very essence of GPRA. A commonly understood truism is "you can't manage it if you can't measure it." With respect to documentation, a corollary is that "you can't manage it if you don't manage it." Explicit documentation of Government inputs and outputs clearly constitutes the kind of records to which the public is entitled. Providing quick and easy access to such records in readily comprehensible form is key to demonstrating what the taxpayers are getting for their money. As Senator Leahy says (S.Rpt. 104-272):

The American taxpayer has paid for the collection and maintenance of ... records and should get prompt access ... upon request. That is what the law requires and that is the standard of service Government agencies should meet. Long delays in access can mean no access at all.

EDMS, DMA and ODMA

While there is room for debate over what constitutes unreasonable delay, by the standards of cyberspace and the emerging information age, any delay exceeding a few seconds is questionable. The "just-in-time" principle of Total Quality Management (TQM) must be viewed in a whole new light: the expectation is that information will be delivered instantaneously upon request, or even automatically, without request, as needed. The utility principle of marketing has long been understood to mean that products and services are valueless unless they can be delivered where and when the customer needs them. Now, however, the expectation is growing that information should be delivered anywhere, anytime, in a matter of minutes if not seconds, rather than hours, days, weeks, or months.

If Government agencies are to meet their requirements under the law, much less meet or exceed the expectations of their customers, they must first capture, process, and manage their documentation by electronic means. It should come as no surprise that the tool required to manage electronic documents is called electronic document management system (EDMS) software. The requirements for enterprisewide EDMSs have been set forth by a group of academicians and business representatives (Black Forest Group, 1995) under the auspices of the Association for Information and Image Management (AIIM). In addition, all of the major vendors of commercial, off-the-shelf open-systems EDMS software have joined together in the Document Management Alliance (DMA) to develop and foster standards for interoperability.⁽¹⁾ The Department of Justice represents the Federal Government as a member of the DMA.

Central to the efficient and effective management of documents is capturing them from the instant of their creation and tracking action on them throughout their life cycles. Virtually all documents are created on computers and filed on PC and/or network file server hard drives. The problem is that they are not well ordered and managed as organizational assets, much less as assets to which the public is fully entitled. Indeed, in many cases documents are not even readily available to their own authors after a period of time, as hundreds or thousands accumulate with little or no metadata beyond cryptic file names. To address the technical side of this problem, the Open Document Management Application Program Interface (ODMA) group, a subgroup of the DMA, has agreed upon and published standards to ensure interoperability between open-systems EDMSs and the word processing, spreadsheet, and database applications in which documents are created (e.g., WordPerfect, Word, Lotus 1-2-3, Paradox).

To more fully understand the benefits of EDMS software, it is instructive to consider several definitions of the term "document":

An organized view of information (PC DOCS spokesperson, 1995)
The unit of work within a defined business process (Black Forest Group)
A collection of related material, regardless of media, that conveys information. Documents can include paper, microfilm/fiche, wordprocessing documents, spreadsheets, electronic mail, digitized images, videos, voice mail and so on. (Cronin)

Yet another definition suggested by an unknown author is "data in context", i.e., information that has been selected, organized, and presented in a way intended to be meaningful to someone. With respect to the application of EDMS software, the term "document" is defined to include virtually anything that can be given a file name and stored on electronic media. Relative to the operations of the Federal Government, the term "document" might be considered synonymous with the term "record" under the FRA, and the provisions of the E-FOIA might be interpreted to mean that all Federal documents must be managed with EDMS software. Indeed, in Armstrong v. Executive Office of the President, the court ruled that even E-mail messages must be considered for record-keeping purposes, and it would be illogical to conclude that electronic "documents" are somehow excluded while less formal "messages" are covered. Clearly, as suggested by Senator Leahy, an attitude adjustment is needed by those on the public payroll who choose to ignore or deny their responsibility to the public.

Equally important are the tools that enable public employees to carry out their responsibilities efficiently and effectively, with little or no effort beyond that actually needed to process the information with which they are entrusted. Record-keeping is not, nor should it be treated as, a make-work, after-the-fact process requiring additional systems or activities. Rather, it should be accomplished as a natural adjunct within the very systems by which work is done throughout Federal departments and agencies. At the same time, previous assumptions regarding record-keeping should be reexamined in the context of EDMS capabilities. In particular, little, if any, training should be required for employees to carry out their record-keeping responsibilities, and any perceived need for such training should be understood instead as an indicator that the systems within which the work is done need improvement. For example, employees should not be expected to commit to memory any, much less all, of the pertinent provisions of the Privacy Act or the Federal Records Act, or any records schedules. Instead, all such requirements should be addressed in a knowledgebase that the EDMS invokes automatically at appropriate times. Moreover, for many classes of documents that do not involve privacy or valid secrecy concerns, the entire process of making them available to the public at the appropriate time can and should be fully automated.
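The kind of knowledgebase described above might, in its simplest form, be a lookup table that the EDMS consults whenever a document is saved. The Python sketch below assumes hypothetical document classes, schedule items, and release delays, and assumes that whether a document contains personal data is determined by some other means (for example, a profile field); real entries would come from an agency's approved records schedules and Privacy Act determinations.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import Optional


@dataclass
class Rule:
    schedule: str                       # hypothetical records schedule item
    release_after_days: Optional[int]   # None means never released automatically


# Hypothetical knowledge base, for illustration only.
KNOWLEDGE_BASE = {
    "policy_statement": Rule("Item 1: permanent", 0),
    "staff_manual": Rule("Item 2: destroy when superseded", 0),
    "personnel_action": Rule("Item 3: temporary", None),
}


def on_save(doc_class: str, contains_personal_data: bool, saved: date) -> dict:
    """Called by the EDMS at save time, so the author need not know the rules."""
    rule = KNOWLEDGE_BASE.get(doc_class)
    if rule is None:
        return {"schedule": None, "action": "refer to records officer"}
    if contains_personal_data:
        return {"schedule": rule.schedule, "action": "withhold pending privacy review"}
    if rule.release_after_days is None:
        return {"schedule": rule.schedule, "action": "retain; no automatic release"}
    return {
        "schedule": rule.schedule,
        "action": "publish",
        "release_date": saved + timedelta(days=rule.release_after_days),
    }


print(on_save("staff_manual", False, date(1997, 1, 6))["action"])     # publish
print(on_save("personnel_action", True, date(1997, 1, 6))["action"])  # withhold pending privacy review
```

Because the rules are data rather than training material, a records officer can update a schedule entry once and every subsequent save applies it.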

The first step is for departments and agencies to gain management control of their own documents within their own organizational structure, through the use of EDMS software that complies with the DMA open-systems standard. As explained on its homepage on the World Wide Web, the DMA is a task force of AIIM dedicated to the realization of a uniform approach for creation and operation of enterprise-wide document-management systems. The primary product of DMA is a specification for an integration model and the interfaces by which applications and services from a myriad of sources can be integrated into a document-management solution. The DMA specifications are a progressive series of agreements that support a broad, practical vision of electronic documents serving enterprises in increasingly powerful and flexible ways.

The DMA Task Force and the DMA architecture exist because of a shared vision among users and vendors of document management systems. The DMA vision can be described as "Uniform Access to Documents Anywhere." With DMA-enabled systems, AIIM says organizations can:

Locate and use electronic documents, wherever they are and in whatever form they exist.
Enable workers in departments and organizations to operate document-management systems designed around specific areas of practice and strategic business applications. The local document-management environment can be accessed by other organizations for appropriate shared-use of important reusable materials.
Enable workers in different organizations to engage in collaborative activities involving shared use of some or all documents from their respective areas.
Preserve the document legacy of an enterprise over time and space. Documents remain accessible and usable in the face of technology substitutions, organizational and application growth, and changes of scale, technology, and distribution of document management in the enterprise.
Build document management systems that provide uniformity of access and integration with other systems over a wide range of scales, from desktop to department to enterprise and beyond to federations of systems among strategically linked organizations.
Create an energetic market in which innovations in document-centered applications, in document management systems and in document services are rapidly deployed and integrated into enterprise information systems. New capabilities are smoothly introduced and quickly effective.
Create a market wherein document management is as common and pervasive as electronic mail.

The focus of the X.500 directory on E-mail is a testament to E-mail's growing ubiquity, but mob rule seldom leads to rational decision-making. The emphasis on E-mail merely reflects the current state of common understanding. However, with respect to the relative importance of various classes of information and the tools best designed to process and manage them, this certainly seems to be a case of the tail wagging the dog. E-mail is best used for informal communications, whereas all important information should be documented and processed by more formal means. Given sufficient resources, it is possible to "grow bananas on Pike's Peak," but that does not mean anyone should do it, particularly with other people's money, e.g., tax revenues. Likewise, E-mail systems can be tweaked and twisted to serve more formal purposes, albeit at a cost in lost utility for the purposes to which they are best suited. In publishing rules on record-keeping requirements for E-mail, the National Archives and Records Administration (NARA) noted one aspect of this problem:

If agencies fail to create and maintain on another format full documentation of their policies and activities under clear and specific recordkeeping requirements, e-mail could assume inflated importance. Agencies have the opportunity and responsibility to put e-mail in its proper context by issuing, where they are lacking, recordkeeping requirements that clearly state what records are to be created and maintained and on what medium.

Hope springs eternal. If a bastion of paper like NARA can make a simple yet astute observation like this, can others be far behind? Indeed, at least one agency, the Immigration and Naturalization Service, has issued a policy prohibiting the use of E-mail for "official business" (Sikorovsky and Holmes). On the ODMA homepage, AIIM highlights the fact that document management has gained increasing momentum over the past five years. AIIM asserts that document management, once considered a utility program, has become a strategic technology. Today, in thousands of installations, EDMSs manage files created from an array of applications, allowing users to search and manage files with ease, speed, and flexibility.

As EDMSs continue to grow, the ability to integrate these systems with an increasing number of desktop applications is critical. EDMSs are most effective when integrated seamlessly with the applications used by organizations. These applications may be as common as an office automation software suite or as esoteric as a laboratory information management system. With all of these products, the primary job of the EDMS is to intercept all activity on any file, including opening, editing, viewing, versioning, checking out, saving, printing, and E-mailing. There is no need for any human being to be forced to carry out such a menial task as logging such information when computers are so very good at it. In the cyber age, such activities are the functional equivalent of sweatshops. They should be legally prohibited, and a case can be made that the E-FOIA takes a long stride in that direction.
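To make the point about automatic logging concrete, here is a minimal sketch, assuming a hypothetical in-memory repository rather than any vendor's product. The names `audited` and `AuditedRepository` are illustrative inventions. Every operation passes through a wrapper that records who did what, to which document, and when, so no employee ever has to keep that log by hand.

```python
import functools
from datetime import datetime, timezone


def audited(action):
    """Wrap a repository method so each call is logged before it runs.

    Because the entry is written first, failed attempts are recorded too.
    """
    def decorator(method):
        @functools.wraps(method)
        def wrapper(self, user, doc_id, *args, **kwargs):
            self.audit_log.append(
                (datetime.now(timezone.utc), user, action, doc_id))
            return method(self, user, doc_id, *args, **kwargs)
        return wrapper
    return decorator


class AuditedRepository:
    """Hypothetical document store; the system, not the user, keeps the trail."""

    def __init__(self):
        self.docs = {}        # doc_id -> current text
        self.audit_log = []   # (timestamp, user, action, doc_id) tuples

    @audited("save")
    def save(self, user, doc_id, text):
        self.docs[doc_id] = text

    @audited("open")
    def open(self, user, doc_id):
        return self.docs[doc_id]

    @audited("print")
    def print_doc(self, user, doc_id):
        return self.docs[doc_id]
```

For example, after `repo.save("jdoe", "memo-1", "Draft policy")` and `repo.open("asmith", "memo-1")`, the audit log holds two entries, one per action, without either user doing anything extra.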

If records are created in electronic form and the E-FOIA now requires that they be made available to the public in electronic form, why would any self-respecting agency choose to degrade them to paper, thereby forcing its employees to jump through myriad, needless logistical hoops for paper-based record-keeping purposes -- only to be required by law to turn around and re-create those records in electronic form? Such a Rube Goldberg process gives a whole new meaning to the concept of waste, fraud, and abuse.⁽²⁾

Of course, if anyone chooses to obtain public information in paper form, they should be entitled, within reason, to receive it that way. Paper is often a good medium for the presentation of documents and the display of information. However, it is a lousy medium for the general storage, retrieval, and management of documents and information. In many cases, paper may be a better alternative than a computer monitor, but seldom if ever is it a good substitute for a computer hard drive or an optical storage device -- especially if public information is involved. EDMS software is the key to enabling departments and agencies to effectively manage public documents and to make them readily available when, where, and in whatever format the public chooses. And it is not just the documents themselves that are subject to record-keeping. Equally important is the evidence of who did what with them and when -- the very kind of information that is so difficult and needless for humans to accumulate and manage but so readily accommodated by computers.

The EDMS vendors have provided seamless integration for a number of popular products such as word processors and spreadsheets, but they found it increasingly difficult to maintain customized integrations as well as provide integrations for new products. EDMS and application vendors discovered that developing and supporting these integrations was time-consuming and resource-draining. As a result, not enough resources were devoted to their primary mission, enhancing their own products. Vendors needed a better method of linking applications to EDMSs: a standard that would allow any document management system to integrate seamlessly with any application. By adhering to a standard convention, application vendors could develop products that would easily integrate with EDMSs, and document management vendors would no longer be required to write a customized integration for each application. Such a standard must be simple to implement for both application vendors and document management vendors, and that is the aim and the effect of the ODMA standard. Any applications software acquired by Federal departments and agencies for the purpose of creating and/or processing public information should be required to comply with the ODMA standard.⁽³⁾
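The economics of a shared standard can be sketched in a few lines. This is a conceptual illustration only. It is not the actual ODMA API, which is a C-language interface, and the class and function names below are hypothetical. Applications are written once against a common interface, and each document management system implements that interface once. As a result, the integration burden grows with the sum of products rather than their product.

```python
from abc import ABC, abstractmethod


class DocumentManager(ABC):
    """Hypothetical standard interface that every DMS agrees to implement."""

    @abstractmethod
    def open_doc(self, doc_id: str) -> str: ...

    @abstractmethod
    def save_doc(self, doc_id: str, text: str) -> None: ...


class SimpleDMS(DocumentManager):
    """One hypothetical vendor's implementation of the interface."""

    def __init__(self):
        self.store = {}

    def open_doc(self, doc_id):
        return self.store.get(doc_id, "")

    def save_doc(self, doc_id, text):
        self.store[doc_id] = text


def word_processor_append(dms: DocumentManager, doc_id: str, line: str) -> None:
    """A hypothetical application, written once against the interface."""
    text = dms.open_doc(doc_id)
    dms.save_doc(doc_id, text + line + "\n")


def integrations_needed(apps: int, systems: int, standard: bool) -> int:
    # Without a standard, every application/DMS pair needs custom glue code.
    # With one, each product implements the standard exactly once.
    return apps + systems if standard else apps * systems
```

With 20 applications and 10 document management systems, custom integration implies 200 pairings, whereas a common standard implies 30 implementations.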

In addition to technical compatibility standards, metadata standards are needed to foster practical access to information. Metadata is data about data. In layman's terms, it is the information that would be printed on a library index card to help people find the documents they need. With EDMS software, this functionality is vastly enhanced. Not only can searches be conducted by author and title, for example, but also by any other field on the electronic index "card," which is commonly called the document "profile" or the "properties" or "attributes" of the document. In addition to such "fielded" searches, the full text of the documents themselves can be indexed and searched, and the metadata for each document can be displayed to help users select the documents they need as they scroll through the list of search results. Moreover, metadata (i.e., data about "outcomes") is key to the vision of GPRA, and the best way to capture outcome data is to capture metadata associated with documents -- documentation that is already being created in the normal course of business.
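The two kinds of searching described above can be sketched as follows. The document ids, profile fields, and text are invented for illustration. Fielded search matches values on each document's profile, while full-text search uses an inverted index that maps each word to the documents containing it.

```python
import re
from collections import defaultdict

# Hypothetical document profiles: the electronic "index card" for each document.
profiles = {
    "doc1": {"author": "Smith", "title": "Records Schedule", "office": "OCIO"},
    "doc2": {"author": "Jones", "title": "FOIA Annual Report", "office": "Legal"},
}
full_text = {
    "doc1": "Retention periods for electronic records.",
    "doc2": "Requests received under the Freedom of Information Act.",
}


def fielded_search(**criteria):
    """Return ids whose profile matches every field=value criterion exactly."""
    return sorted(doc_id for doc_id, p in profiles.items()
                  if all(p.get(f) == v for f, v in criteria.items()))


def build_index(texts):
    """Build an inverted index mapping each lower-cased word to document ids."""
    index = defaultdict(set)
    for doc_id, text in texts.items():
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(doc_id)
    return index


def text_search(index, *words):
    """Return ids of documents containing all of the given words."""
    sets = [index.get(w.lower(), set()) for w in words]
    return sorted(set.intersection(*sets)) if sets else []
```

A results list could then show `profiles[doc_id]` next to each hit, which corresponds to displaying each document's metadata as users scroll through the results.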

Various efforts are underway to establish metadata standards, including GILS, MARC, Z39.50, and X.500 Green Pages, as well as efforts related to HTML and SGML tagging. The appropriate metadata elements should be formalized and adopted as mandatory Governmentwide standards for efficient access to public information, as suggested by ITMRA section 5131(a)(1). The initial, compulsory set of Governmentwide metadata elements should be small, but it should be extended by consensus (or, if necessary, by edict) among affinity groups for specific types of documents and processes. Much of the necessary planning has already been done for static documents. However, the metadata requirements for the management of documents in process should also be considered and implemented in agencywide EDMSs. The Governmentwide metadata dictionary should then be continuously improved on a well-coordinated basis.⁽⁴⁾
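The layered approach proposed above pairs a small mandatory core with extensions agreed by affinity groups. It could be enforced by a check like the one below. All element names and document types here are hypothetical placeholders. The actual core set would be fixed by the consensus process the paragraph describes.

```python
# Hypothetical compulsory core; the real set would be adopted by consensus.
CORE_ELEMENTS = {"title", "originator", "date", "identifier"}

# Hypothetical affinity-group extensions, keyed by document type.
EXTENSIONS = {
    "regulation": {"docket_number", "cfr_citation"},
    "contract": {"contract_number", "vendor"},
}


def validate(record, doc_type=None):
    """Return a list of problems; an empty list means the record conforms.

    Every record must carry the core elements; extra elements are allowed
    only if the affinity group for its document type has defined them.
    """
    present = set(record)
    problems = [f"missing core element: {e}"
                for e in sorted(CORE_ELEMENTS - present)]
    allowed = CORE_ELEMENTS | EXTENSIONS.get(doc_type, set())
    problems += [f"unknown element: {e}" for e in sorted(present - allowed)]
    return problems
```

Keeping the extensions in a separate table means an affinity group can add elements for its document type without changing the Governmentwide core.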

GILS, MARC and Dublin Core

The GILS Core Elements are set forth and summarized in Appendix B. The Machine-Readable Cataloging (MARC) formats are standards for the representation and communication of bibliographic and related information in machine-readable form. Appendix C matches the GILS Core to the corresponding USMARC tags. The Dublin metadata workshop of March 1995 and the Warwick metadata workshop of April 1996 were convened to foster consensus on the description of network resources across a broad spectrum of stakeholders, including the computer science, text markup, and library communities.

The result of the first workshop -- the Dublin Core Metadata Element Set (Dublin Core) -- represents a simple resource description record that has the potential to provide a foundation for electronic bibliographic description to improve structured access to information on the Internet and promote interoperability among disparate description models. Beyond the elements themselves, what is significant is the consensus achieved across the many disciplines represented at the workshop.

The Warwick workshop was a follow-on activity, intended to broaden the international scope of consensus and to identify impediments to deployment of the Dublin Core. The results of this workshop include a proposed syntax for the Dublin Core, guidelines for application of the Core, and a framework (the Warwick Framework) for metadata that will promote modular, separately accessible, maintainable, and encryptable packages of metadata. Thus, the Dublin Core might be one of a number of sets of metadata, including sets for terms and conditions, archiving and preservation, content ratings, and others.
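The container-and-package idea behind the Warwick Framework can be sketched as a data structure. The package type labels used here ("dublin-core", "terms-and-conditions") are assumptions chosen for readability, not syntax defined by the framework. A container simply bundles independent packages, and a client retrieves the ones it understands and ignores the rest.

```python
from dataclasses import dataclass, field


@dataclass
class Package:
    """One separately maintained set of metadata, labeled with its type."""
    package_type: str   # hypothetical label, e.g. "dublin-core"
    content: dict


@dataclass
class Container:
    """A Warwick-style bundle of independent metadata packages."""
    packages: list = field(default_factory=list)

    def add(self, package: Package) -> None:
        self.packages.append(package)

    def get(self, package_type: str) -> list:
        """Return packages of one type; other types are simply skipped."""
        return [p for p in self.packages if p.package_type == package_type]
```

Each package can be maintained or restricted on its own, which is the modularity the framework aims for. For example, a terms-and-conditions package can change without touching the Dublin Core description.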

The Dublin Core comprises thirteen metadata elements, proposed as the minimum number required to facilitate the discovery of document-like objects in a networked environment such as the Internet. The syntax was deliberately left unspecified as an implementation detail. The semantics of these elements were intended to be clear enough to be understood by a wide range of users:

Subject: Topic addressed by the work
Title: Name of the object
Author: Person(s) primarily responsible for the intellectual content of the object⁽⁵⁾
Publisher: Agent or agency responsible for making the object available⁽⁶⁾
Other Agent: Person(s), such as editors and transcribers, who have made other significant intellectual contributions to the work⁽⁷⁾
Date: Date of publication⁽⁸⁾
Object Type: Genre of the object, such as novel, poem, or dictionary⁽⁹⁾
Form: Data representation of the object, e.g., PostScript file or Windows executable file⁽¹⁰⁾
Identifier: String or number used to uniquely identify the object⁽¹¹⁾
Relation: Relationship to other objects
Source: Objects, either print or electronic, from which this object is derived, if applicable
Language: Language of the intellectual content
Coverage: Spatial locations and temporal durations characteristic of the object
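Because the syntax was left open, one possible encoding is HTML `<meta>` tags. The sketch below assumes a "DC." name prefix and run-together element names such as "OtherAgent" and "ObjectType". Both are illustrative conventions, not part of the thirteen-element definition above. The check rejects anything outside the thirteen elements.

```python
from html import escape

# The thirteen elements listed above (spellings without spaces are assumed).
DC_ELEMENTS = (
    "Subject", "Title", "Author", "Publisher", "OtherAgent", "Date",
    "ObjectType", "Form", "Identifier", "Relation", "Source",
    "Language", "Coverage",
)


def to_meta_tags(record):
    """Render a record as HTML <meta> tags using an assumed "DC." prefix."""
    unknown = set(record) - set(DC_ELEMENTS)
    if unknown:
        raise ValueError(f"not Dublin Core elements: {sorted(unknown)}")
    lines = []
    for element in DC_ELEMENTS:  # emit tags in the order of the list above
        if element in record:
            value = escape(record[element])  # protect &, <, > and quotes
            lines.append(f'<meta name="DC.{element}" content="{value}">')
    return "\n".join(lines)
```

For example, a record with only a Title and an Author yields two tags, Title first, because output follows the element order of the list rather than the order of the input.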

Although the goal of the Dublin workshop was to develop a simple set of metadata elements, the elements also had to be defined in such a way that they could be mapped into more complex and highly controlled systems such as USMARC. These conflicting demands were reconciled in two ways: First, a set of metadata elements was created with definitions that could be understood easily without the need for user training, along with an approach to modifying the core set to meet the needs of specialized communities for more precise descriptions. Second, mechanisms were suggested for extending the core element set to describe items other than document-like objects.
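Mapping a simple element set into a richer format such as USMARC is usually done with a crosswalk table. The tag and subfield pairs below are illustrative assumptions drawn from familiar MARC fields (245 title, 260 publication, 856 electronic location). They are not the mapping in Appendix C or any official crosswalk. The function reports unmapped elements so that nothing is silently dropped.

```python
# Illustrative crosswalk to USMARC (tag, subfield) pairs; assumed, not official.
CROSSWALK = {
    "Title": ("245", "a"),
    "Author": ("100", "a"),
    "Publisher": ("260", "b"),
    "Date": ("260", "c"),
    "Subject": ("653", "a"),
    "Identifier": ("856", "u"),
}


def to_marc_fields(record):
    """Group simple elements into MARC-style fields; report unmapped ones."""
    fields, unmapped = {}, []
    for element, value in record.items():
        if element not in CROSSWALK:
            unmapped.append(element)
            continue
        tag, subfield = CROSSWALK[element]
        # Several simple elements may land in subfields of one MARC field.
        fields.setdefault(tag, []).append((subfield, value))
    return fields, unmapped
```

The grouping under tag 260 shows the mapping in miniature: two simple elements, Publisher and Date, become subfields of a single, more highly structured field.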

The consensus reached at the Dublin and Warwick workshops would seem to provide a strong basis for progress toward full implementation of GILS. However, the foot-dragging of which Senator Leahy spoke is readily evident in the conclusions of a soon-to-be-released evaluation conducted by OMB Watch (1996):

Very few agencies have taken the important step of providing access to real information and databases. Although some useful information may be available on their Web sites, few agencies have made the connection between GILS and these services. Meaningful access, being able to access useful information, is still lacking among the vast majority of agency GILS.

Even fewer agencies provide access to their GILS records through technology other than the World Wide Web.

The GILS will remain an agency nuisance that is only minimally useful to others if it is not integrated into existing information collections, dissemination and archival plans.

X.500 Green Pages, Z39.50 and WWW

In addition to the general statutory mandates regarding GILS contained in the PRA, the E-FOIA explicitly requires Federal agencies to establish an on-line index of records by the end of the century, the fast-approaching date of December 31, 1999. While no direct reference to the X.500 directory is readily apparent in the legislative history of this provision, on its face the wording is tantamount to a description of the Green Pages. A summary of the X.500 directory, including the CCITT's definition, is provided by the Networks & Telecommunications Research Group, Computer Science Department, Trinity College Dublin (Trinity), as follows:

[the X.500 directory is] a collection of open systems which co-operate to hold a logical data base of information about the set of objects in the real world. The users of the Directory, including people and computer programs, can read or modify the information, or parts of it, subject to having permission to do so.

Trinity points out that the CCITT named the Directory in the singular to reflect the fact that it constitutes a single logical directory. Basically, X.500 defines: 1) the structure, model, and addressing syntax of the Directory; 2) a set of Directory services to its users; and 3) the Directory protocols. X.500 defines neither the user interfaces nor the implementation technology. In practical terms, X.500 offers access to a distributed, open, online directory. Thus, while the initial drive for X.500 is to provide directory support for X.400 message handling system (MHS) services, it also offers an open standard for a generalized directory capability, including support for: the Open Systems Interconnection (OSI) File Transfer, Access and Management (FTAM) protocol; electronic data interchange (EDI); general White and Yellow Pages for a corporate directory or a national or international directory service, especially in telephony; and other business applications such as directories for inventories and authentication services.
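The single logical Directory is organized as a tree of entries, each named by a sequence of relative distinguished names (RDNs) such as country (c), organization (o), organizational unit (ou), and common name (cn). The toy sketch below is a simplification, not an X.500 implementation. It writes names root-first, as X.500 orders them (LDAP's string form lists the leaf first), and it assumes no escaped commas or multi-valued RDNs. The sample entries and phone number are hypothetical.

```python
def parse_dn(dn):
    """Split a root-first name like "c=US, o=GSA, cn=Jane Doe" into RDNs.

    Simplified: no escaped commas and no multi-valued RDNs.
    """
    rdns = []
    for part in dn.split(","):
        attr, _, value = part.strip().partition("=")
        rdns.append((attr.strip().lower(), value.strip()))
    return tuple(rdns)


class DirectoryTree:
    """Toy Directory Information Tree keyed by each entry's full name."""

    def __init__(self):
        self.entries = {}

    def add(self, dn, **attributes):
        self.entries[parse_dn(dn)] = attributes

    def read(self, dn):
        """Return an entry's attributes, or None if it does not exist."""
        return self.entries.get(parse_dn(dn))

    def children(self, dn):
        """List the names of entries exactly one level below dn."""
        base = parse_dn(dn)
        return [name for name in self.entries
                if len(name) == len(base) + 1 and name[:len(base)] == base]
```

Here attribute types are matched case-insensitively, so "C=US,O=GSA" and "c=US, o=GSA" name the same entry, while attribute values are kept as written.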

While the X.500 directory is similar in many ways to a general-purpose database management system, it is not designed to be one, although the X.500 directory may itself be built on top of such a system. Although the X.500 standard is best known for its aim to provide universal directory services for business-quality E-mail, as envisioned in the M.U.S.E. report (1993) and by the General Services Administration (GSA) for implementation by the Federal Government, its scope is even broader than Trinity suggests. For U.S. Federal implementation, the X.500 directory would comprise four segments: 1) White Pages for information related to people, 2) Blue Pages for office information, 3) Yellow Pages for services, and 4) Green Pages for documents (Zurier, 1996). Of the Green Pages, the M.U.S.E. report said:

If the M.U.S.E. directory is going to help people find Federal offices according to their mission/program activities [i.e., the Blue Pages], why should those people have to go somewhere else to find Federal information? ... Almost everyone who has been exposed to the educational process in the United States has learned to use a library catalog. In it, the library holdings are listed three different ways: by author, by title, and by subject. The catalog is a simple listing that almost everyone can understand and use. It is an obvious model for the Green Pages ...

More recently, the X.500 plan (Booz, Allen & Hamilton, 1996) issued by GSA stated:

The Green Pages directory service provides a lookup facility for accessing bibliographic retrieval information. The purpose of the Green Pages directory is to provide directory users with retrieval information sufficient to enable them to identify an electronic document, determine the retrieval requirements, and retrieve it through an automated tool. The directory will support the ANSI Z39.50-to-X.500 Gateway protocol and the Government Information Locator Service (GILS) for integrated access to online Government document repositories.

The schema for the X.500 directory is set forth in Appendix D of the X.500 Detailed Design document (Booz, Allen & Hamilton, p. 1-12). It comprises more than a hundred object classes, with anywhere from one to dozens of attributes per class. Among the object classes are two called "Document" and "Document Series." Seven attributes are specified for the Document Series object class:

Common Name
Description
Locality Name
Organization Name
Organization Unit Name
See Also
Telephone Number

Twenty attributes are identified for the Document object class:

Keyword
Audio
Document Author
Document Location
Document Publisher
Document Title
Document Version
Information
Last Modified Time
Last Modified By
Photo
Manager
Unique Identifier
Common Name
Description
Document Identifier
Locality Name
Organization Name
Organization Unit Name
See Also
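Using a few of the Document attributes just listed, a Green Pages lookup can mirror the library catalog's three access points (author, title, and subject) described in the M.U.S.E. report. The entries and URLs below are hypothetical, and matching is done in memory rather than through the X.500 protocol.

```python
# Hypothetical Green Pages entries using some Document attributes above.
documents = [
    {"Document Title": "Records Management Handbook",
     "Document Author": "Doe, Jane", "Keyword": ["records", "retention"],
     "Document Location": "http://www.example.org/rm-handbook"},
    {"Document Title": "GILS Implementation Guide",
     "Document Author": "Roe, Richard", "Keyword": ["GILS", "locator"],
     "Document Location": "http://www.example.org/gils-guide"},
]


def green_pages_lookup(author=None, title_word=None, keyword=None):
    """Return locations of documents matching every supplied criterion.

    Author must match exactly; title words and keywords ignore case.
    """
    hits = []
    for doc in documents:
        if author and doc["Document Author"] != author:
            continue
        title_words = doc["Document Title"].lower().split()
        if title_word and title_word.lower() not in title_words:
            continue
        if keyword and keyword.lower() not in (k.lower() for k in doc["Keyword"]):
            continue
        hits.append(doc["Document Location"])
    return hits
```

The function returns the Document Location rather than the document itself. This reflects the directory's role of supplying enough retrieval information for an automated tool to fetch the document.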

A list of X.500 implementors and their implementations can be found on the InterNIC Web site (Internic). With respect to the focus of the X.500 directory on E-mail, E-mail is often merely an inefficient way to find the information (documentation, i.e., "documents") that people need.⁽¹²⁾ In many cases, what is needed is not more E-mail but a way to locate, retrieve, and share needed information while exchanging fewer "messages."

The same is true of telephone calls and voice mail. In many cases, callers don't really need to talk with anyone, nor can organizations any longer afford to have live people answering the phones. Often, callers just want information that already is, or should be, documented. The Green Pages should help to cut down on needless E-mail and voice mail, as should Web pages. However, until the information served on the Web and/or via the Green Pages is comprehensive, people will still feel the need to call and send E-mail to find what may be missing. That is why the new requirements of the E-FOIA are so important. If Federal agencies make available by electronic means all of their information holdings that are of significant public interest, as required by E-FOIA, they will be in a position to simply point callers and E-mailers to such information via the Web, Z39.50, and/or Green Pages indices.⁽¹³⁾ Then E-mail and telephone communications can focus on the higher-value function of identifying additional information needs, rather than on chasing down information that has already been documented.

An excellent overview of the Z39.50 Information Retrieval Standard is provided on a Web site maintained by the National Library of Canada (Turner):

Z39.50 is an American national standard for information retrieval. It is formally known as ANSI/NISO Z39.50-1995 - Information Retrieval (Z39.50): Application Service Definition and Protocol Specification. This document specifies a set of rules and procedures for the behaviour of two systems communicating for the purposes of database searching and information retrieval. As a network application standard, Z39.50 is an open standard that enables communication between systems that run on different hardware and use different software.

The Z39.50 standard was developed to overcome the problems associated with multiple database searching such as having to know the unique menus, command language, and search procedures of each system accessed. Z39.50 simplifies the search process by making it possible for a searcher to use the familiar user interface of the local system to search both the local library catalogue as well as any remote database system that support the standard.

The latest edition of Z39.50 was approved in 1995 by the National Information Standards Organization (NISO), the only organization accredited by the American National Standards Institute (ANSI) to approve and maintain standards for information services, libraries and publishers.

An international version of Z39.50, called the Search and Retrieve (SR) Standard, was approved by the International Organization for Standardization (ISO) in 1991. SR is a compatible sub-set of the Z39.50 standard and will interwork with systems based on Z39.50. However, Z39.50-1995 will shortly replace SR as the international information retrieval standard.
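A Z39.50 search request carries a structured query whose terms are qualified by attributes. In the widely used Bib-1 attribute set, "Use" attribute 4 means Title, 1003 means Author, and 21 means Subject heading. On the wire the standard encodes queries in ASN.1, so the sketch below instead builds the human-readable Prefix Query Format (PQF) notation popularized by the YAZ toolkit. The helper names are hypothetical, and the sketch only shows how fielded searches become attribute-qualified terms.

```python
# Bib-1 "Use" attribute values for three common access points.
BIB1_USE = {"title": 4, "author": 1003, "subject": 21}


def term(field, value):
    """One attribute-qualified operand in PQF notation."""
    quoted = f'"{value}"' if " " in value else value
    return f"@attr 1={BIB1_USE[field]} {quoted}"


def pqf_and(**criteria):
    """AND the criteria together using PQF's prefix @and operator."""
    operands = [term(f, v) for f, v in criteria.items()]
    if not operands:
        raise ValueError("at least one criterion is required")
    query = operands[-1]
    for operand in reversed(operands[:-1]):
        query = f"@and {operand} {query}"  # prefix form: operator first
    return query
```

For example, `pqf_and(title="needles in haystacks", author="ambur")` yields `@and @attr 1=4 "needles in haystacks" @attr 1=1003 ambur`. Any conforming server can interpret that query against its own database, whatever its local search commands.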

Additional information is provided on the Library of Congress' Web page for the Z39.50 Maintenance Agency, including documentation and information related to the development and ongoing maintenance of the standard (ANSI/NISO Z39.50-1992, ANSI/NISO Z39.50-1995, and development of future versions of Z39.50) along with information on the implementation and use of the Z39.50 protocol.

Hammer and Favaro address the interrelationship between Z39.50 and the World Wide Web, pointing out that the success of the Web and the increasing use of Web front-ends to library catalogs and other information systems have caused decision-makers to question whether the investments required to establish additional Z39.50 services are still warranted. They also suggest that the increased versatility of the 1995 version of the Z39.50 protocol, which enables it to provide powerful services outside the strictly bibliographic application domain, leads information specialists to wonder where the WWW and Z39.50 fit together in the evolving information infrastructure. To resolve these issues, Hammer and Favaro discuss the strengths and weaknesses of the Web as a "simple networked access system":

Using the forms-based interface of the World Wide Web (WWW) in conjunction with graphical clients or browsers such as Mosaic or Netscape has become an inexpensive and popular method of providing user-friendly access to on-line catalog systems. The tools required to publish information on the WWW in this fashion are inexpensive or even free, and are generally straightforward to use. The results are rewarding: It is a remarkably simple task to produce attractive, graphical interfaces which have similar appearances across many different desktop platforms. No specialized software beyond a normal WWW "browser" is required on the client side, and facilities for File Transfer (FTP) and simple searching (WAIS, etc.) are well-integrated into the WWW suite of protocols.

However, there are serious tradeoffs involved when using this approach. The individual WWW client has no knowledge of the application domain in which it operates. It receives a stream of graphical user interface primitives (such as buttons, text-entry fields, and formatted response data) from the server, and naively displays these to the user. The WWW inherits a problem that has haunted users since the first information systems went on-line using simple character terminals: No two information systems share the same interface characteristics. Each new system requires the user to master a new interface structure, and, with the advent of graphical interfaces such as the WWW, a new set of custom-designed icons and symbols.

Information systems often support the notion of a search "session", in which the results of previous queries can be re-used or refined. The HTTP (HyperText Transfer Protocol), which is at the core of the WWW, is inherently stateless: Numerous problems arise when the interface is adapted to host systems that have a notion of a continuous session between the client (user) and the server. There are currently efforts underway to add state-managing mechanisms to the underlying protocols, but the basic paradigm remains essentially a stateless one, which fits poorly with the session-oriented interfaces to most on-line information systems.
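The contrast between a session-oriented search and a stateless one can be sketched as below. The record texts and result-set names are hypothetical. In the session model the server remembers named result sets, so a later query can refine an earlier one. In the stateless model every request must restate the entire query.

```python
class SearchSession:
    """Toy stateful session: the server remembers each named result set."""

    def __init__(self, records):
        self.records = records     # record id -> text
        self.result_sets = {}      # result-set name -> set of record ids

    def search(self, word, name, within=None):
        """Find records containing word, optionally inside an earlier set.

        Returns the hit count; the hits themselves stay on the server.
        """
        candidates = self.result_sets[within] if within else self.records.keys()
        hits = {rid for rid in candidates if word in self.records[rid].lower()}
        self.result_sets[name] = hits
        return len(hits)


def stateless_search(records, words):
    """Stateless style: each request carries the whole query again."""
    return {rid for rid, text in records.items()
            if all(w in text.lower() for w in words)}
```

Returning only a count from each session search mirrors the pattern of searching first and retrieving records later. The stateless function returns the hits directly because nothing persists between requests.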

And they analyze the role of Z39.50 for searches on the Internet:

The Web is an ideal vehicle for organizations that are "vertically integrated," that is, which are owners of content that they can present to the user in a structure of their own choosing. That is why many media and entertainment companies are showing a great interest in the Web today. But when users must actively search the Web for information across organizations, they encounter a sea of largely unstructured data.

The library community has much to offer in the way of providing structure to information resources on the Internet. The Z39.50 standard is a concrete representation of this fact. Currently the search engines and indices of Web resources suffer from the same weaknesses as the interfaces to library systems. No two are alike, and there is no way to make structured use of the data that they return. With the current growth of the Web, the search engines are becoming increasingly important -- a significant portion of the Web community now spends more time looking at search engine output than on any other type of Web page. However, it may eventually become impossible for any one organization to index it all in a useful way. We will need more well-structured access methods to allow searching across multiple indices. Here the power of Z39.50 as a true, mature information retrieval protocol becomes evident.

The Z39.50 standard specifies an abstract information system with a rich set of facilities for searching, retrieving records, browsing term lists, etc. At the server side, this abstract system is mapped onto the interface of whatever specific database management system is being used. The communication taking place between the server and the client application is precisely defined. The client application is unaware of the implementation details of the software hiding behind the network interface, and it can access any type of database through the same, well-defined network protocol. On the client side, the abstract information system is mapped back onto an interface which can be tailored to the unique requirements of each user: a high-school student may require a simple, graphical interface with limited functionality, while an information specialist may need a complex, highly configurable information retrieval engine. Finally, casual users may prefer an interface which blends in smoothly with their word processor, database software, or, indeed, WWW browser.
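The division of labor just described can be sketched with two hypothetical servers that store records in entirely different layouts. Each server maps the same abstract request (an access point plus a term) onto its own internals, and the client never sees the difference. The class names and sample records are illustrative only.

```python
from typing import Dict, List, Protocol


class Backend(Protocol):
    """The abstract search service every server exposes."""
    def search(self, access_point: str, term: str) -> List[str]: ...


class DictBackend:
    """Hypothetical database storing each record as a dict of fields."""

    def __init__(self, rows):
        self.rows = rows

    def search(self, access_point, term):
        return [r["id"] for r in self.rows
                if term.lower() in r.get(access_point, "").lower()]


class TupleBackend:
    """Hypothetical database storing records as fixed-position tuples."""
    COLUMNS = ("id", "title", "author")

    def __init__(self, rows):
        self.rows = rows

    def search(self, access_point, term):
        if access_point not in self.COLUMNS:
            return []  # unsupported access point: no hits
        i = self.COLUMNS.index(access_point)
        return [r[0] for r in self.rows if term.lower() in r[i].lower()]


def client_search(servers: Dict[str, Backend], access_point: str, term: str):
    """The client issues one abstract query; each server maps it internally."""
    return {name: b.search(access_point, term) for name, b in servers.items()}
```

A search for titles containing "act" could return one hit from each server, even though one server indexes dictionaries and the other indexes tuples.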

Hammer and Favaro sum up the essential power of Z39.50 as allowing diverse information resources to look and act the same to the individual user, while at the same time allowing each information system to present a different interface to every user, perfectly suited to his or her particular needs. Hammer and Favaro conclude with a discussion of the strength of the Web for navigation between resources:

Z39.50 was born as a point-to-point, client/server mechanism. It provides very powerful means of locating records within one or more databases on a single server. The problem that remains is that of navigation between servers or information resources:

How do we find the server and the database that has the information we are looking for?

How do we learn about the contents of a server?

For learning about new servers or information providers, the Explain facility of Z39.50 is an important resource. Explain provides a structured mechanism for the information provider to publish information about the capabilities of the server software, and about the characteristics of the information stored in each database on the server. The rich set of information elements defined by the Explain facility includes contact information for the host institution, as well as specifications of the available access points (indices) for searching. The rigid structuring of the information allows the client software to automatically configure itself and adapt to each server system, while the uniform interface to the descriptive information about the database helps the user quickly orient himself to the contents of a new information resource.
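The self-configuring client described in the quotation can be sketched as follows. The server names, contact addresses, and dictionary layout are hypothetical and do not follow the actual Explain record structure. The sketch shows only the principle: the client reads what each server publishes about its access points and offers the user only the search fields that server supports.

```python
# Hypothetical Explain-style descriptions published by two servers.
EXPLAIN = {
    "catalog.example.org": {"contact": "librarian@example.org",
                            "access_points": {"title", "author", "subject"}},
    "regs.example.org": {"contact": "webmaster@example.org",
                         "access_points": {"title", "docket"}},
}


def configure_form(server, wanted=("title", "author", "subject", "docket")):
    """Offer only the search fields this server says it supports, in order."""
    supported = EXPLAIN[server]["access_points"]
    return [field for field in wanted if field in supported]
```

Because the description is structured rather than free text, no human needs to read a server's documentation before the client can present a sensible search form.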

The truly difficult issue, however, is establishing an infrastructure between servers that allow users to locate the right servers for their purpose in an easy way. The Z39.50 URLs are useful in this respect, because they make Z39.50 servers appear to be "just another kind of document" in the Internet space. People can collect and categorize collections of servers the same way they do other kinds of documents or information resources. WWW search engines can even be used to discover new Z39.50 servers.
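Z39.50 URLs, defined in RFC 2056 as the z39.50s (session) and z39.50r (retrieval) schemes, let a server be cited like any other Web resource. The parser below is a simplified sketch that loosely follows that form. It extracts host, port (defaulting to 210, the registered Z39.50 port), database, and the part after "?". It does not handle multiple databases or the element-set and record-syntax parameters. The example host names are hypothetical.

```python
import re

# Simplified: scheme letter, host, optional port, optional database,
# optional part after "?"; ";" parameters are ignored.
_Z3950_URL = re.compile(
    r"^z39\.50([sr])://([^/:?;]+)(?::(\d+))?/([^?;]*)(?:\?([^;]*))?",
    re.IGNORECASE,
)


def parse_z3950_url(url):
    """Split a z39.50s:// or z39.50r:// URL into its main components."""
    match = _Z3950_URL.match(url)
    if match is None:
        raise ValueError(f"not a Z39.50 URL: {url}")
    kind, host, port, database, docid = match.groups()
    return {
        "kind": "session" if kind.lower() == "s" else "retrieval",
        "host": host,
        "port": int(port) if port else 210,
        "database": database or None,
        "docid": docid,
    }
```

Because such URLs are plain strings, they can be listed in Web pages and collected by Web search engines, as the quotation notes. A Z39.50-capable browser could then hand the parsed parts to its search client.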

Our preferred approach would be to use Z39.50 itself to find Z39.50 servers. That is what locator services can do. GILS defines an application profile for Z39.50 that is useful for locating information resources (although admittedly, GILS is optimized for US government documents, and as such it is probably less than ideal for some other purposes). These documents can be anything -- from books to reports to archives of photographs to on-line databases to WWW-documents (and since a WWW-document can be a Z39.50 server, the locator service can be used for exactly the purpose we have in mind).

With a slightly simpler and more general profile than GILS, Z39.50 could become a very powerful tool for accessing indices of information resources. In effect, we are postulating that we replace or supplement all of the existing WWW-crawlers with Z39.50 servers. In that way, we would be able to access all of the different indices with a uniform interface, and because the access structure is fully standardized, it would be simple to gateway or replicate information between servers -- we would potentially only need a single starting point to search for any kind of information anywhere in the world. Indeed, this is an important part of the vision behind the Global Information Locator Service currently being investigated by the G-7 Group of industrial nations.
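The single starting point envisioned here amounts to sending one query to many indices through a uniform interface and merging the answers. In the sketch below each index is represented by a hypothetical callable that takes a search term and returns document identifiers. In practice each callable would wrap a Z39.50 connection. The queries run concurrently, and duplicates are removed while the first-seen order is kept.

```python
from concurrent.futures import ThreadPoolExecutor


def federated_search(indices, term):
    """Send the same query to every index and merge de-duplicated hits.

    indices maps an index name to a callable: term -> list of document ids.
    """
    with ThreadPoolExecutor() as pool:
        # pool.map returns results in the same order as indices.values().
        results = list(pool.map(lambda search: search(term), indices.values()))
    merged = []
    for hits in results:
        for hit in hits:
            if hit not in merged:
                merged.append(hit)
    return merged
```

Because every index answers the same standardized request, the merging step stays trivial. Without a shared protocol, each index would need its own adapter, which is the situation Hammer and Favaro describe for today's search engines.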

Again, static documents containing Z39.50 URLs will provide an increasingly important means of discovering and accessing information resources, as WWW-browsers with Z39.50 client-capabilities become commonplace. When these documents are, themselves, served or located by Z39.50-aware systems, the circle is complete.

Hammer and Favaro believe there is strong potential for a profitable and synergetic relationship between the WWW and Z39.50. They see the two worlds merging together, with each one growing stronger by using the best elements of the other: hyperlinks between systems and document types from the WWW, and structured searching and document discovery from Z39.50.

Christian (February 6, 1996) has also addressed this issue in some detail. His proposal for "fitting Z39.50 to the Web" is contained in Appendix D. Perhaps it is no coincidence that the acronym for the Global Information Locator Service contemplated by the G-7 is the same as that of the Government Information Locator Service mandated by U.S. law. However, regardless of what it is called, the concept is the same -- and its time is nearing.

Conclusion

Hammer and Favaro suggest that a simplified version of the GILS profile be used with the Z39.50 protocol for accessing documents on the Web, and the need for a simplified profile has become evident to others as well (see, for example, Foresman, Porter, and Wiggins). Efforts have been initiated to bring the Z39.50 and GILS profiles into accord with each other (Christian, October 12, 1996) and to provide for an interface between the X.500 Green Pages and Z39.50 directories (Finley, 1996). If and when the document metadata elements associated with these initiatives are consolidated and implemented in EDMS software by Federal agencies, the stage will be set for universal, worldwide access to documents created and managed by the employees of the U.S. taxpaying public. In short, these efforts will facilitate realization of the promise and the mandates of the Paperwork Reduction Act and the Freedom of Information Act.

While the EDMS technical interoperability issues must be addressed, getting Federal agencies to agree upon and implement document metadata standards, much less having Federal employees actually adhere to them, will be the real challenge. However, the disciplined use of such standards is an essential requirement in order to enable the American public to obtain the information they need or desire -- information for which they have paid! With the technology now available, it is simply unacceptable to expect people to have to sift through haystacks to get to the point of the information they need. Nor is it acceptable to think they should be required to "mine" for nuggets of truth that should be at their fingertips.⁽¹⁴⁾ The time is clearly approaching for the universal use of EDMS software and a standard core set of metadata for all documents created and managed by Federal employees. It is the least the taxpayers have a right to expect of those of us privileged to be in public service.

Moreover, through the laws it has passed, Congress could scarcely have made its intentions clearer. The only question is whether Federal agencies will uphold their obligations to the American taxpaying public or, as Senator Leahy laments, continue to flout the law. While the mistakes of the past cannot be corrected, nor EDMS technology implemented overnight, there is no excuse for Federal agencies to plead ignorance or poverty while shedding crocodile tears. The lack of a perfect and complete solution is no defense for failure to take measured steps in the right direction with available resources. It is a little like the desire to go to heaven. We neither expect nor necessarily want to get there right away, but in everything we do, we should keep our eyes on the goal. Each and every agency is making investments in information technology each and every year. They are simply failing to take their obligation to the public into account. As required by FOIA, FRA, ITMRA, GPRA, PRA and, most recently, E-FOIA, provisions for public access should be an inherent consideration in each and every acquisition of technology with which public information will be created and processed.

Armstrong (November 14, 1996) has made the common-sense observation that there is no need for agencies to incur the expense of additional systems to provide public access to public information.⁽¹⁵⁾ They need only give the public access to the systems they are already using to process such information.⁽¹⁶⁾ If they do, they will inevitably need to use EDMS software to manage their documents and control access to them, and when they do so in a well-coordinated way, they will find that the improvements in their internal work processes yield benefits equal to or greater than those of enhanced public access to their information.

It should not be necessary to haul agencies into court to force them to uphold their obligations to the public, much less to operate efficiently. However, should that be necessary, there can be little doubt of the outcome on the basis of law. Under E-FOIA, Congress has established November 1, 1996, as information access V-day for the American public.

End Notes

1. A tabulation of the market-leading EDMS vendors is contained in Appendix A, and a list of the members of the DMA may be found at: http://www.aiim.org/dma/industry/mem/

2. Is "fraud" too strong a term to apply? According to Webster's New Collegiate Dictionary (1974), the word may be defined as: "DECEIT, TRICKERY ... intentional perversion of the truth in order to induce another to part with something of value or to surrender a legal right" or "an act of deceiving or misrepresenting : TRICK ..." While those who purport to use paper-based systems for record-keeping may argue they have no intent to defraud the public of its right to access to public information, they are certainly fools if they really believe that any system that relies upon paper can truly afford the kind of access to which the public is entitled by law. Not only is it a practical impossibility, but it is now also contrary to the dictates of the E-FOIA. Thus, "foolishness" or "fraud" are the choices to describe the actions of those who choose not to use EDMS software.

3. Paragraph 5131(a)(1) of ITMRA requires NIST to promulgate standards and "such standards shall be compulsory to the extent necessary to improve efficiency, security, or privacy." (emphasis added)

4. For information on national and international metadata standardization efforts, see Newton, 1996.

5. In the context of Government service, the term "Lead" might be a more appropriate descriptor for this element since we Government employees do not work for ourselves nor do we "own" our output. Instead, we are either leaders or members of teams employed by the taxpayers.

6. For Government documents that are made available by electronic means, a more appropriate descriptor than "publisher" would be "agency". This element should be brought into conformance with the organizational portion of the X.500 directory.

7. "Other agents" are the members of the "Team" that worked on the document.

8. With electronic documents, the term "last modified" is more appropriate than "date of publication". Indeed, if different versions of the same document are made available under the same profile, each will have a different date of "publication" or "last modification".

9. In EDMS terms, "document type" is equivalent to the "object type" element.

10. The "form" element is equivalent to the "file type" element in EDMSs, and through the ODMA standard, EDMS software will automatically and seamlessly invoke the necessary applications software to view and/or edit the file.

11. EDMS software automatically assigns a unique identifier to each document.

12. Likewise, the use of an X.500 directory to provide "strong authentication" of E-mail "messages" for EC (electronic commerce) is like signing the envelope rather than its contents, i.e., an electronic form, contract, or other document. It would be more efficient to focus authentication and encryption activities on electronic forms automation (E-forms) and documents in EDMSs, rather than on E-mail messages.

13. The success of GILS, as an information locator, will be measured by the degree to which it provides direct access to information and becomes "invisible" as an "inventory" of "systems". Whether GILS lives on as a systems inventory and evolves to serve the agencywide and multiagency planning requirements of ITMRA for systems management purposes is another matter. In fact, that is one of the ways that the Department of the Treasury plans to use its GILS records. (Myatt)

14. "Data mining" is becoming a term of art, generally meaning that automated tools are used to sift through volumes of information in order to glean the elements that are meaningful among an undifferentiated mass. While the concept is appropriately applied to data that is compiled in vast quantities by automated means, it makes little sense to apply it to documents that are created and processed one at a time by Federal employees. Failing to capture the appropriate metadata associated with documents as they are created and processed is functionally equivalent to taking a gold nugget and burying it deeply in the earth so that anyone who seeks to find it later must use heroic means in order to do so. While mining is certainly appropriate and necessary for raw materials, it has no place when it comes to locating and using highly processed products like documents produced with tax revenues paid by the American public.

15. Scott Armstrong is the investigative reporter who filed and won the famous Armstrong v. Executive Office of the President lawsuit requiring that Ollie North's E-mail messages be released and that E-mail generally must be considered for record-keeping purposes.

16. The classic way to perpetrate fraud is to maintain two sets of "books".

References

Alliance for Redesigning Government, Summary of Implementation of the Government Performance and Results Act of 1993 (GPRA), National Academy of Public Administration. Available at: http://www.clearlake.ibm.com/Alliance/clusters/op/gprabref.html

Armstrong, S., Information Trust, November 14, 1996. In a panel discussion on "Users' Experiences" at the GILS Conference conducted at NARA II, College Park, Maryland, a member of the audience argued that agencies cannot make information more readily available to the public without additional funding. In response, Mr. Armstrong suggested that no additional systems or funding should be required. Instead, agencies should simply make their existing systems (for which the taxpayers have paid) available to the public.

Black Forest Group, Requirements for an Enterprise Document Management System, Association for Information and Image Management, January 1995.

Booz, Allen & Hamilton, [X.500 planning document, title unknown], Center for Electronic Messaging, General Services Administration, July 30, 1996, pp. 4-3 and 4-4. Available at: ftp://ftp.fed.gov/emailpmo/X500/guidance

Booz, Allen & Hamilton, Detailed Design For A Government Electronic Directory, Center for Electronic Messaging, General Services Administration, July 30, 1996. Available at: ftp://ftp.fed.gov/emailpmo/X500/design

Boeyen, S., X.500 Services for Integrated Applications, Messaging Magazine, Bell-Northern Research. Available at: http://www.ema.org/html/pubs/mmv1n2/x5serv.htm

Christian, E., Proposal for Fitting Z39.50 to the Web, February 6, 1996. Available at: http://library.adelaide.edu.au/m/syslibs/1996/0006.html

Christian, E., U.S. Geological Survey, personal communication via E-mail with Margaret St. Pierre and Owen Ambur, October 12, 1996. In this exchange, Christian suggested that version 3 of the GILS profile be brought into conformance with version 3 of the Z39.50 profile in order to foster interoperability. St. Pierre enthusiastically concurred.

Cronin, J.L., An Introduction to Electronic Document Management, Wang Federal, Inc., May 1, 1995.

Document Management Alliance (DMA), Homepage on the Web, Association for Information and Image Management (AIIM). Available at: http://www.aiim.org/dma/ Member list available at: http://www.aiim.org/dma/industry/mem/

Dublin Core Elements. Information available at: http://www.oclc.org:5046/oclc/research/conferences/metadata/dublin_core_report.html http://www.oclc.org:5046/research/dublin_core/ http://purl.org/metadata/dublin_core_elements

Federal Register, Electronic Mail Systems (final rule), National Archives and Records Administration, August 28, 1995, p. 44636.

Finley, J., General Services Administration, personal communications with Eliot Christian and Owen Ambur, September 25, 1996. In this exchange, Finley indicated that Booz-Allen (GSA's contractor for planning Governmentwide implementation of X.500 Directory) has prepared a report on the feasibility of gateway services between the X.500 directory and other known directories, e.g., Z39.50.

Foresman, T., Porter, D.L., and Wiggins, H.V., Metadata Myth: Misunderstanding the Implications of Federal Metadata Standards, Department of Geography, University of Maryland Baltimore County, April 1996. Available at: http://www.nml.org/resources/misc/metadata/proceedings/author_index.html

FTAM, definition available at: http://alpha.biophys.uni-duesseldorf.de/htbin/helpgate/HELP/DECNET_OSI/FTAM http://www-dept.cs.ucl.ac.uk/teaching/dcnds/d52/notes3/node54.html

GILS Core Elements and Corresponding USMARC Tags. Available at: http://www.dtic.dla.mil/gils/documents/naradoc/appendc.html

Government Information Locator Service (GILS) Core Elements, May 2, 1994. Available at: http://www.usgs.gov/gils/gilsappa.html

Hammer, S. and Favaro, J., Z39.50 and the World Wide Web. Available at: http://130.228.5.168/webz.html

H.Rpt. 102-146, A Citizen's Guide on Using the Freedom of Information Act and the Privacy Act of 1974 to Request Government Records, U.S. House of Representatives, July 10, 1991. Available at: http://hi-tec.twc.state.tx.us/general/foia.htm

IETF information is available at: http://www.apps.ietf.org/apps/workareas.html http://www.apps.ietf.org/apps/procedures.html http://www.apps.ietf.org/~hta/ietf/apps/ http://sun1000e.pku.edu.cn/on_line/misc/html/HTML-WG/Charter-draft.html http://services.bunyip.com:8000/services/ietf.html

Internic, List of X.500 Implementors and Implementations. Available at: http://www.internic.net/projects/x500catalog/impselect.html

Loundy, D.J., E-Law 3.0.1: Computer Information Systems Law and System Operator Liability in 1995. Information on the Federal Records Act. Available at: http://www.leepfrog.com/E-Law/E-Law/Part_VIII.html

MARC information is available at: http://lcweb.loc.gov/marc/ http://portico.bl.uk/nbs/marc/commarcm.html http://www.tkm.mb.ca/microcat-manual/c_records.html http://www.fsc.follett.com/data/compare.html http://infoshare1.princeton.edu/katmandu/marc/guidtoc.html

M.U.S.E. Report: A Unified Federal Government Electronic Mail Users' Support Environment, 1993. Available at: ftp://ftp.fed.gov/emailpmo/muse

Myatt, G., Department of the Treasury, panel discussion on "Managerial Issues" at the GILS Conference held at NARA on November 13, 1996. In her comments, Ms. Myatt reported that Treasury will use its GILS records for information systems management and planning purposes, as well as for records management. If GILS records are limited to providing an inventory of information systems, instead of pointing directly to the underlying information itself, they will be more useful for systems management purposes than as an information locator. While few agencies currently plan to use GILS records in that way, it is encouraging that Treasury does. Moreover, other agencies may follow, since section 5402 of ITMRA requires an inventory of all computer equipment in each agency and it will be difficult, if not impossible, to carry out the capital planning and investment requirements of section 5122 of ITMRA without such an inventory.

Newton, J., Application of Metadata Standards, Information Systems Architecture Division, Computer Systems Laboratory, National Institute of Standards and Technology, 1996. Available at: http://www.nml.org/resources/misc/metadata/proceedings/newton/paper.html

OMB Watch, Report on the Implementation of the GILS (Draft), October 1996.

Open Document Management API (ODMA) Homepage. Available at: http://www.aiim.org/odma/odma.htm

Rivlin, A.M., Director, Office of Management and Budget, Implementing the Information Dissemination Provisions of the Paperwork Reduction Act of 1995, memorandum, September 29, 1995. Available at: http://www.law.vill.edu/chron/articles/ombdon.htm#intro

Sikorovsky, E. and Holmes, A., Archives decides when e-mail must be retained, Federal Computer Week, August 28, 1995, p. 1.

Smallwood, B., Document Management: Understanding the Document Management Marketplace -- A Segmented Approach, Document Management, September/October 1996, pp. A1-A6.

S.Rpt. 104-272, Electronic Freedom of Information Improvement Act of 1995, Committee on the Judiciary, United States Senate, May 15, 1996.

Trinity College Dublin, Definition of X.500. Available at: http://ganges.cs.tcd.ie/4ba2/x500/katie/x5model_p1.html

Turner, F., Overview of the Z39.50 Information Retrieval Standard, National Library of Canada, June 1996. Available at: http://www.nlc-bnc.ca/ifla/VI/5/op/udtop3.htm

X.500 information available at: http://www.ema.org/html/pubs/mmv1n2/x5serv.htm http://www.internic.net/projects/x500catalog/impselect.html http://bbs.itsi.disa.mil:5580/E4189T2837 http://ganges.cs.tcd.ie/4ba2/x500/katie/x5model_p1.html http://ganges.cs.tcd.ie/4ba2/x500/katie/x5model_func.html

Z39.50 information available at: http://lcweb.loc.gov/z3950/agency/ http://www.nlc-bnc.ca/ifla/VI/5/op/udtop3.htm http://130.228.5.168/webz.html http://library.adelaide.edu.au/m/syslibs/1996/0006.html http://www.uregina.ca/~library/z39.html http://ulmo.stud.slu.se:8001/kurser/it-kurs-vt96/z3950.html

Zurier, S., Governmentwide e-mail: Agencies perform feats of integration with disparate systems and protocols, Government Computer News, August 26, 1996, pp. 67-74.

Appendix A

Market-Leading EDMS Vendors

The following table shows the market-leading EDMS vendors by number of separate installations and "seats" sold, according to Document Management magazine. (Smallwood, 1996)

Company | Vertical Market Strength | RDBMS | Server/Client Support | Inter/IntraNet | Total Installs | Seats
PC DOCS | Legal, Gov't | Oracle, Sybase, SQL Server | Motif, Mac, NT, NetWare, Vines; 3.1/95/Mac | Beta | 3,000+ | 400,000+
IDI | Gov't, Legal, Mfg | Proprietary | Unix (Sun, H-P, IBM, DEC), VMS, MVS | Yes | 2,200 | 1,000,000
Saros | Oil & Gas, Financial Services, Utilities, Telecomm. | Oracle, Sybase, SQL Server, DB2 | Unix, NT, NetWare, Vines, IBM & MS LAN Manager; 3.1/95/Motif/Mac | Yes (50 installs) | 500 | 500,000 (sold)
SoftSolutions | Legal | Proprietary | NetWare, Unix (Sun, H-P, IBM) | No | N/A | 250,000
Interleaf | Mfg. | Oracle | Unix (Sun, H-P, IBM); NT | Yes | 400+ | 20,000
Sherpa | Aerospace, Defense | Oracle, Ingres | Unix (Sun, H-P, SGI); 3.1/95/NT, SunOS and Unix | Yes | 250+ | 50,000
Formtek | Aerospace, Defense | Oracle | Unix (Sun, H-P, IBM, SGI); 3.1/95 | Yes (Orian) | 150 | 50,000
Documentum | Pharmaceutical, Hi-Tech Mfg., Aerospace | Oracle, Sybase, SQL Server | Unix (Sun, H-P, IBM); NT; 3.1/95 | Yes (10 installs) | 150 | N/A
NovaSoft | Mfg. | Oracle | Unix (Sun, H-P, IBM, DEC); VMS, VMS Open, NT; 3.1/95/NT | Yes | 100+ | 15,000
Intergraph | Gov't, Mfg., Utilities | Oracle, Sybase, Informix | NT, Unix (H-P, IBM, Sun) | Yes | 100 | 4,000

Appendix B

GILS Core Elements

GILS Core Locator Records consist of a number of GILS Core Elements that contain information to identify and describe Federal information resources. The term "mandatory" as used in this Profile applies to administration of the subset of GILS Locator Records that have been identified by the record source as participating in the GILS Core. GILS servers are not required to distinguish "mandatory" from other elements.

TITLE (Mandatory, Not Repeatable): This element conveys the most significant aspects of the referenced resource and is intended for initial presentation to users independently of other elements. It should provide sufficient information to allow users to make an initial decision on likely relevance. It should convey the most significant information available, including the general topic area, as well as a specific reference to the subject.

CONTROL IDENTIFIER (Mandatory, Not Repeatable): This element is defined by the information provider and is used to distinguish this locator record from all other GILS Core locator records. The control identifier should be distinguished with the record source agency acronym as provided in the U.S. Government Manual.

ABSTRACT (Mandatory, Not Repeatable): This element presents a narrative description of the information resource. This narrative should provide enough general information to allow the user to determine if the information resource has sufficient potential to warrant contacting the provider for further information. The abstract should not exceed 500 words in length.

PURPOSE (Mandatory, Not Repeatable): This element describes why the information resource is offered and identifies other programs, projects, and legislative actions wholly or partially responsible for the establishment or continued delivery of this information resource. It may include the origin and lineage of the information resource, and related information resources.

ORIGINATOR (Mandatory, Not Repeatable): This element identifies the information resource originator, named as in the U.S. Government Manual where applicable.

ACCESS CONSTRAINTS (Mandatory, Not Repeatable): This element is a grouping of subelements that together describe any constraints or legal prerequisites for accessing the information resource or its component products or services.

GENERAL ACCESS CONSTRAINTS (Mandatory, Not Repeatable): This subelement includes any access constraints or legal prerequisites applied to assure the protection of privacy, and any other special restrictions or limitations on obtaining the information resource. Guidance on obtaining any users' manuals or other aids needed for the public to reasonably access the information must also be included here. This element in some cases may contain the value "None."

ORIGINATOR DISSEMINATOR CONTROL (Optional, Not Repeatable): This subelement contains specifics determined by the originator of the information resource pertaining to the control of access to or dissemination of this resource.

SECURITY CLASSIFICATION CONTROL (Optional, Not Repeatable): This subelement contains specifics pertaining to the security classification associated with the information resource.

USE CONSTRAINTS (Mandatory, Not Repeatable): This element in some cases may contain the value "None." It describes any constraints or legal prerequisites for using the information resource or its component products or services. This includes any use constraints applied to assure the protection of privacy or intellectual property and any other special restrictions or limitations on using the information resource.

AVAILABILITY (Mandatory, Repeatable): This element is a grouping of subelements that together describe how the information resource is made available.

DISTRIBUTOR (Mandatory, Not Repeatable): This subelement consists of the following subordinate fields that provide information about the distributor:

DISTRIBUTOR NAME

DISTRIBUTOR ORGANIZATION

DISTRIBUTOR STREET ADDRESS

DISTRIBUTOR CITY

DISTRIBUTOR STATE

DISTRIBUTOR ZIP CODE

DISTRIBUTOR COUNTRY

DISTRIBUTOR NETWORK ADDRESS

DISTRIBUTOR HOURS OF SERVICE

DISTRIBUTOR TELEPHONE

DISTRIBUTOR FAX

RESOURCE DESCRIPTION (Optional, Not Repeatable): This subelement identifies the resource as it is known to the distributor.

ORDER PROCESS (Mandatory, Not Repeatable): This subelement is a grouping of the following subordinate fields that provide information on how to obtain the information resource from this distributor.

ORDER INFORMATION (Mandatory, Not Repeatable): This subelement provides information on how to obtain the information resource from this distributor, including any fees associated with acquisition of the product or use of the service, order options (e.g., available in print or digital forms, PC or Macintosh versions), order methods, payment alternatives, and delivery methods.

COST (Optional, Not Repeatable): This subelement indicates whether or not there is a cost associated with this resource.

COST INFORMATION (Optional, Not Repeatable): This subelement contains textual information about the cost associated with this resource.

TECHNICAL PREREQUISITES (Optional, Not Repeatable): This subelement describes any technical prerequisites for use of the information resource as made available by this distributor.

AVAILABLE TIME PERIOD (Optional, Repeatable): This subelement provides the time period reference for the information resource as made available by this distributor, in one of two forms:

TIME PERIOD -- STRUCTURED: Time described using the USMARC prescribed structure.

TIME PERIOD -- TEXTUAL: Time described textually.

AVAILABLE LINKAGE (Optional, Repeatable): This subelement provides the information needed to contact an automated system made available by this distributor, expressed in a form that can be interpreted by a computer (i.e., URI). Available linkages are appropriate to reference other locators, facilitate electronic delivery of off-the-shelf information products, or guide the user to data systems that support analysis and synthesis of information.

AVAILABLE LINKAGE TYPE (Optional, Repeatable): This subelement occurs if there is an Available Linkage described. It provides the data content type (i.e., MIME) of the object identified in the referenced URI to give the user an indication of what is being connected to (e.g., document, image).
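Because Available Linkage Type is meant to carry the MIME type of the object at the Available Linkage URI, a record-building tool could propose a value automatically. The Python sketch below is only an illustration under stated assumptions: it guesses the type from the URI's file extension with the standard library's mimetypes module, which is a heuristic (the Content-Type a server actually returns is authoritative), and the function name and dictionary keys are hypothetical, simply reusing the element names above rather than any official GILS encoding.

```python
import mimetypes


def availability_linkage(uri: str) -> dict[str, str]:
    """Build hypothetical AVAILABLE LINKAGE / AVAILABLE LINKAGE TYPE entries for one URI."""
    entry = {"AVAILABLE LINKAGE": uri}
    # Guess the MIME type from the URI's extension; (None, None) means no guess.
    mime_type, _encoding = mimetypes.guess_type(uri)
    # Omit the type when no guess is possible rather than record a wrong value.
    if mime_type is not None:
        entry["AVAILABLE LINKAGE TYPE"] = mime_type
    return entry


print(availability_linkage("http://www.usgs.gov/gils/gilsappa.html"))
print(availability_linkage("ftp://ftp.fed.gov/emailpmo/muse"))
```

Leaving the type out when the extension is unknown mirrors the profile's wording that Available Linkage Type "occurs if" a linkage is described; it is never required to be guessed.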

POINT OF CONTACT FOR FURTHER INFORMATION (Mandatory, Not Repeatable): This element identifies an organization, and a person where appropriate, serving as the point of contact plus methods that may be used to make contact. This element consists of the following subelements:

CONTACT NAME

CONTACT ORGANIZATION

CONTACT STREET ADDRESS

CONTACT CITY

CONTACT STATE

CONTACT ZIP CODE

CONTACT COUNTRY

CONTACT NETWORK ADDRESS

CONTACT HOURS OF SERVICE

CONTACT TELEPHONE

CONTACT FAX

RECORD SOURCE (Mandatory, Not Repeatable): This element identifies the organization, as named in the U.S. Government Manual, that created or last modified this locator record.

DATE OF LAST MODIFICATION (Mandatory, Not Repeatable): This element identifies the latest date on which this locator record was created or modified.

RECORD REVIEW DATE (Optional, Not Repeatable): This element identifies a date assigned by the Record Source for review of this GILS Record.

AGENCY PROGRAM (*, Not Repeatable): This element identifies the major agency program or mission supported by the system and should include a citation for any specific legislative authorities associated with this information resource.

* This element is mandatory if the resource referenced by this GILS Core locator record is a Federal information system.

SOURCES OF DATA (*, Not Repeatable): This element identifies the primary sources or providers of data to the system, whether within or outside the agency.

* This element is mandatory if the resource referenced by this GILS Core locator record is a Federal information system.

SCHEDULE NUMBER (*, Not Repeatable): This element is used to record the identifier associated with the information resource for records management purposes.

* This element is mandatory when the GILS Core entry is intended to meet the obligation of Federal agencies to inventory automated information systems or other records series for records management purposes.

CONTROLLED VOCABULARY (Optional, Repeatable): This element is a grouping of subelements that together provide any controlled vocabulary used to describe the resource and the source of that controlled vocabulary:

INDEX TERMS -- CONTROLLED (Optional, Not Repeatable): This subelement is a grouping of descriptive terms drawn from a controlled vocabulary source to aid users in locating entries of potential interest. Each term is provided in the subordinate repeating field:

CONTROLLED TERM

THESAURUS (Optional, Not Repeatable): This subelement provides the reference to a formally registered thesaurus or similar authoritative source of the controlled index terms. Notes on how to obtain electronic access to (e.g., a URI) or copies of the referenced source should be provided, possibly through a Cross Reference to another locator record that more fully describes the standard and its potential application to locating GILS information.

LOCAL SUBJECT INDEX (Optional, Not Repeatable): This element is a grouping of descriptive terms to aid users in locating resources of potential interest, but the terms are not drawn from a formally registered controlled vocabulary source. Each term is provided in the repeating subelement:

LOCAL SUBJECT TERM

METHODOLOGY (Optional, Not Repeatable): This element identifies any specialized tools, techniques, or methodology used to produce this information resource. The validity, degree of reliability, and any known possibility of errors should also be described.

SPATIAL DOMAIN (Optional, Not Repeatable): This element is a grouping of subelements that together provide the geographic areal domain of the data set or information resource. Geographic names and coordinates can be used to define the bounds of coverage. Although described here informally, the spatial object constructs should be as defined in FIPS 173, "Spatial Data Transfer Standard."

BOUNDING COORDINATES (Optional, Not Repeatable): This subelement limits the coverage of a data set expressed by latitude and longitude values in the order western-most, eastern-most, northern-most, and southern-most. For data sets that include a complete band of latitude around the earth, the West Bounding Coordinate shall be assigned the value: -180.0, and the East Bounding Coordinate shall be assigned the value: 180.0. The following subelements comprise the Bounding Coordinates:

WEST BOUNDING COORDINATE: Western-most coordinate of the limit of the coverage expressed in longitude. Domain: -180.0 <= West Bounding Coordinate <= 180.0

EAST BOUNDING COORDINATE: Eastern-most coordinate of the limit of coverage expressed in longitude. Domain: -180.0 <= East Bounding Coordinate <= 180.0

NORTH BOUNDING COORDINATE: Northern-most coordinate of the limit of coverage expressed in latitude. Domain: -90.0 <= North Bounding Coordinate <= 90.0; North Bounding Coordinate >= South Bounding Coordinate

SOUTH BOUNDING COORDINATE: Southern-most coordinate of the limit of coverage expressed in latitude. Domain: -90.0 <= South Bounding Coordinate <= 90.0; South Bounding Coordinate <= North Bounding Coordinate

PLACE (Optional, Repeatable): This subelement identifies geographic locations characterized by the data set or information resource through two associated constructs:

PLACE KEYWORD: The geographic name of a location covered by a data set or information resource.

PLACE KEYWORD THESAURUS: The name of a formally registered thesaurus or similar authoritative source of Place Keywords.
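The domain rules for the Bounding Coordinates are precise enough to check mechanically before a locator record is released. The Python sketch below applies exactly the constraints listed above and nothing more; the class and function names are illustrative, not part of the GILS profile. Because the stated domains say nothing about the order of West and East, the sketch does not compare them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BoundingCoordinates:
    """GILS Bounding Coordinates in decimal degrees (illustrative container)."""

    west: float
    east: float
    north: float
    south: float


def domain_errors(box: BoundingCoordinates) -> list[str]:
    """Return every violation of the stated domains; an empty list means the box is valid."""
    errors: list[str] = []
    # West and East are longitudes: -180.0 <= value <= 180.0.
    for name, value in (("West", box.west), ("East", box.east)):
        if not -180.0 <= value <= 180.0:
            errors.append(f"{name} Bounding Coordinate {value} is outside [-180.0, 180.0]")
    # North and South are latitudes: -90.0 <= value <= 90.0.
    for name, value in (("North", box.north), ("South", box.south)):
        if not -90.0 <= value <= 90.0:
            errors.append(f"{name} Bounding Coordinate {value} is outside [-90.0, 90.0]")
    # North Bounding Coordinate >= South Bounding Coordinate.
    if box.north < box.south:
        errors.append("North Bounding Coordinate is less than South Bounding Coordinate")
    # No West/East ordering is stated in the profile, so none is enforced here.
    return errors


# A complete band of latitude around the earth: West = -180.0, East = 180.0.
band = BoundingCoordinates(west=-180.0, east=180.0, north=10.0, south=-10.0)
print(domain_errors(band))  # []
```

Returning all violations at once, rather than stopping at the first, lets a record source correct a record in a single pass.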

TIME PERIOD OF CONTENT (Optional, Repeatable): This element provides time frames associated with the information resource, in one of two forms:

TIME PERIOD -- STRUCTURED: Time described using the USMARC structure.

TIME PERIOD -- TEXTUAL: Time described textually.

CROSS REFERENCE (Optional, Repeatable): This element is a grouping of subelements that together identify another locator record likely to be of interest.

CROSS REFERENCE TITLE (Optional, Not Repeatable): This subelement provides a human readable textual description of the cross reference.

CROSS REFERENCE LINKAGE (Optional, Repeatable): This subelement provides the machine readable information needed to perform the access (i.e., URI).

CROSS REFERENCE TYPE (Optional, Repeatable): This subelement occurs if there is a CROSS REFERENCE LINKAGE and provides the data content type (i.e., MIME) of the object identified in the referenced URI to give the user an indication of what is being connected to (e.g., document, image).

ORIGINAL CONTROL IDENTIFIER (Optional, Not Repeatable): This element is used by the record source to refer to another GILS locator record from which this locator record was derived.

SUPPLEMENTAL INFORMATION (Optional, Not Repeatable): Through this element, the record source may associate other descriptive information with the GILS Core locator record.
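An EDMS or GILS authoring tool could enforce the Mandatory and conditionally mandatory (*) designations above before a locator record is released. The Python sketch below assumes, for illustration only, that a record is held as a dictionary keyed by the element names used in this appendix; the function name and flags are hypothetical. It checks top-level elements and the 500-word limit on ABSTRACT, but does not validate subelements or repeatability.

```python
# Top-level GILS Core elements designated Mandatory in this appendix.
MANDATORY = (
    "TITLE",
    "CONTROL IDENTIFIER",
    "ABSTRACT",
    "PURPOSE",
    "ORIGINATOR",
    "ACCESS CONSTRAINTS",
    "USE CONSTRAINTS",
    "AVAILABILITY",
    "POINT OF CONTACT FOR FURTHER INFORMATION",
    "RECORD SOURCE",
    "DATE OF LAST MODIFICATION",
)
# Mandatory only when the resource is a Federal information system.
SYSTEM_MANDATORY = ("AGENCY PROGRAM", "SOURCES OF DATA")


def check_core_record(
    record: dict,
    *,
    is_federal_system: bool = False,
    inventories_records: bool = False,
) -> list[str]:
    """Return a list of problems; an empty list means the checks passed."""
    required = list(MANDATORY)
    if is_federal_system:
        required.extend(SYSTEM_MANDATORY)
    # SCHEDULE NUMBER is mandatory when the entry serves the records-inventory obligation.
    if inventories_records:
        required.append("SCHEDULE NUMBER")
    # An absent or empty element counts as missing; the literal value "None" is allowed.
    problems = [f"missing {name}" for name in required if not record.get(name)]
    abstract = record.get("ABSTRACT") or ""
    if len(abstract.split()) > 500:
        problems.append("ABSTRACT exceeds 500 words")
    return problems


# A hypothetical record with every Mandatory element filled in except PURPOSE.
record = {name: "placeholder" for name in MANDATORY if name != "PURPOSE"}
record["ACCESS CONSTRAINTS"] = "None"
print(check_core_record(record))  # ['missing PURPOSE']
print(check_core_record(record, is_federal_system=True))
```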

Appendix C

GILS Core Elements and Corresponding USMARC Tags

(mandatory GILS elements appear in bold)

GILS USMARC Tag

1. Title 245$a

2. Originator 710$a

3. Controlled Vocabulary (see subelements below)

Index Terms-Controlled 650

Thesaurus 650 1st indicator; 650$

4. Local Subject Index 653$a

5. Abstract 520

6. Purpose 500

7. Agency Program 500

8. Spatial Reference (see subelements below)

Bounding Rectangle 255$c

Western-most 034$d

Eastern-most 034$e

Northern-most 034$f

Southern-most 034$g

Geographic Name (see subelements below)

Geographic Keyword Name 651

Geographic Keyword Type 655

9. Time Period of Content (see subelements below)

Time Period-Structured 045$c

Time Period-Textual 513

10. Availability

Distributor

Distributor Name 270$p

Distributor Organization 270$p

Distributor Street Address 270$a

Distributor City 270$b

Distributor State 270$c

Distributor Zip Code 270$e

Distributor Country 270$d

Distributor Network Address 270$m

Distributor Hours of Service 301$a

Distributor Telephone 270$k

Distributor Fax 270$l

Resource Description 037$f

Order Process 037$c

Technical Prerequisites 538

Available Time Period-Structured 045$c

Available Time Period-Textual 037$n for non-electronic resource; 856$z for electronic resource

Available Linkage 856$u

Available Linkage Type 856 1st indicator; 856$2

11. Sources of Data 500

12. Methodology 567

13. Access Constraints 506

14. Use Constraints 540

15. Point of Contact 856$m for electronic resources

Contact Name 270$p

Contact Organization 270$p

Contact Street Address 270$a

Contact City 270$b

Contact State 270$c

Contact Zip Code 270$e

Contact Country 270$d

Contact Network Address 270$m

Contact Hours of Service 301$a

Contact Telephone 270$k

Contact Fax 270$l

16. Supplemental Information 500

17. Cross Reference (see subelements below)

Cross Reference Title 787$t

Cross Reference Linkage 787$w

Cross Reference Type 856 1st indicator; 856$2

18. Schedule Number 583$a,$b

19. Control Identifier 001

20. Record Source 040

21. Original Control Identifier 035

22. Date of Last Modification 005
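The crosswalk above lends itself to a simple lookup table. The Python sketch below encodes only a handful of its rows (those where the table gives a single tag) and renders a GILS record as tagged display lines. It is an illustration, not a MARC encoder: a real USMARC record also needs a leader, a directory, and indicator values, none of which are modeled here. The function name and the sample values, including the control identifier "EXAMPLE-0001", are hypothetical.

```python
# A few rows of the crosswalk above: GILS element -> (USMARC tag, subfield code or None).
GILS_TO_USMARC = {
    "Title": ("245", "a"),
    "Originator": ("710", "a"),
    "Local Subject Index": ("653", "a"),
    "Abstract": ("520", None),
    "Purpose": ("500", None),
    "Access Constraints": ("506", None),
    "Use Constraints": ("540", None),
    "Available Linkage": ("856", "u"),
    "Control Identifier": ("001", None),
    "Record Source": ("040", None),
    "Date of Last Modification": ("005", None),
}


def to_tagged_lines(gils_record: dict[str, str]) -> list[str]:
    """Render GILS element values as 'TAG $subfield value' display lines, ordered by tag."""
    lines = []
    for element, value in gils_record.items():
        if element not in GILS_TO_USMARC:
            continue  # not covered by this partial crosswalk
        tag, subfield = GILS_TO_USMARC[element]
        prefix = f"{tag} ${subfield} " if subfield else f"{tag} "
        lines.append(prefix + value)
    # Sort on the three-digit tag; the sort is stable, so repeated tags keep input order.
    return sorted(lines, key=lambda line: line[:3])


record = {
    "Title": "Needles in Haystacks",
    "Purpose": "Illustrative record only",
    "Control Identifier": "EXAMPLE-0001",
}
for line in to_tagged_lines(record):
    print(line)
# 001 EXAMPLE-0001
# 245 $a Needles in Haystacks
# 500 Illustrative record only
```

Several GILS elements (Purpose, Agency Program, Sources of Data, Supplemental Information) share tag 500, which is why the sketch sorts stably instead of keying the output by tag.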

Appendix D

Proposal for Fitting Z39.50 into the Web

February 6, 1996 Draft

Eliot Christian <echristi@usgs.gov>

US Geological Survey, 802 National Center, Reston VA 22092

Most activity in the World Wide Web today is centered on Web browsers gaining access to information resources on servers through the Hypertext Transfer Protocol (HTTP). Just as the same resource is often made available at the server through multiple protocols such as HTTP, gopher and FTP, this proposal is to make the resource searchable at the server end by adding support for the Z39.50 protocol. (More ubiquitous Z39.50 client software for agents and end users, as through Java or other mechanisms, is addressed separately.)

In essence, Z39.50 provides a common computer-to-computer search protocol between diverse information resources and diverse information access mechanisms. A range of software to implement Z39.50 in this way is available, from freeware to various commercial offerings worldwide.

Because Z39.50 does not dictate the way information is managed at the server end, providers can support various data and information management approaches yet make all the information commonly searchable. Because Z39.50 does not dictate how information is presented at the client end, intelligent software agents are enabled and user interfaces can be customized (in hardware, software, language, sophistication, graphical design, etc.) for each particular market.

In developing a new collection of information for a particular market, a provider can search the contents of other resources via Z39.50 and create pointers to just those portions most relevant to their specific market. If the provider also adds Z39.50 support onto the new collection, the resource gains exposure to seekers of information outside of the targeted audience.
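The broker role in this paragraph, searching other collections and keeping only pointers to the relevant portions, might look like the Python sketch below. The source names, the keyword-overlap relevance test, and the `Pointer` record are all hypothetical; a real provider would obtain the results through Z39.50 searches and could apply any relevance criterion.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pointer:
    """A reference to a record held elsewhere; the content is not copied."""
    source: str
    record_id: str


def build_collection(
    results: dict[str, list[tuple[str, set[str]]]],
    market_terms: set[str],
    min_overlap: int = 1,
) -> list[Pointer]:
    """Keep pointers to records sharing at least min_overlap market terms.

    results maps each source name to (record_id, subject keywords) pairs,
    standing in for what searches of the remote sources returned.
    """
    collection = []
    for source, records in results.items():
        for record_id, keywords in records:
            if len(keywords & market_terms) >= min_overlap:
                collection.append(Pointer(source, record_id))
    return collection
```

Because the new collection holds only pointers, the underlying records stay with their owners and remain searchable by everyone else.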

How does Z39.50 improve the Web?

1. Different players have a common problem

Content Seekers sometimes want to include many disparate sources of information in their searching -- not just Web pages, not just the resources of one provider, not just things in the English language, and not just snippets of ASCII text. Better search mechanisms are desperately needed due to the sheer size and diversity of information that people would like to take into account. The Internet has huge amounts of content itself and increasingly acts as a pointer mechanism to the vast information stores of off-line media. However, just as in libraries centuries ago, the Internet has incredible diversity of content but lacks basic agreements on how to tag information objects so they can be found.

Content Owners want their products to be found by all potentially interested seekers. Today, the only recourse is to somehow acquire advertising space from all of the intermediaries (e.g., "I'll pay you to point to my page from your page").

Intermediaries must support non-exclusive distribution arrangements and are finding new roles as brokers connecting particular groups of seekers to the best sources for their needs.

Research and development efforts in advanced information discovery need a common protocol for interoperability to deploy next generation solutions.

2. The client-server model is crucial for progress

Server-based searching is inherently limited. If searching is done at the server, the server designer must package the search for the particular target audience (e.g., what information is included, what language(s) does the user know, is the search simplistic or robust). Particular servers can only be comprehensive for their narrowly defined target audience, because they only provide a "packaged view" of the content. So, to reach seekers outside of the narrow-cast, the content must be exposed to unanticipated searching.

Intelligent software agents will become increasingly important as gatherers of information tailored to very specific interests. Designers of software agents, such as Web crawlers, are frustrated by presentation protocols because the agent has no human driver to interpret the wide variations among packaged information. Consequently, Web crawlers can deal only with the bits and pieces of Internet content that happen to be in text form, and they cannot reach content behind interface programs (e.g., CGI scripts, Java applets, database access or search forms). Lacking distributed search mechanisms, a crawler is also constrained to find only those pages that happen to have an unbroken trail of links back to its starting points.
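The last limitation, that a crawler finds only pages with an unbroken trail of links back to its starting points, follows directly from graph traversal. The minimal Python sketch below runs a breadth-first crawl over a made-up link graph (all page names are hypothetical) and misses both a page that nothing links to and a page that appears only after a search form is submitted.

```python
from collections import deque


def crawl(links: dict[str, list[str]], start: str) -> set[str]:
    """Breadth-first crawl: visit every page reachable by following links."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen


# Hypothetical site: "orphan" exists but no page links to it, and
# "report" is reachable only by submitting a query through "search-form".
site = {
    "home": ["about", "data"],
    "about": ["home"],
    "data": ["search-form"],
    "search-form": [],  # no static links; results need a submitted query
    "orphan": ["home"],
    "report": [],
}
found = crawl(site, "home")
```

Here `found` contains only home, about, data, and search-form; a search protocol addressed to the server could reach "orphan" and "report" without any link trail.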

Support of a search protocol with client software allows for next generation software agents. These intelligent agents will characterize the content of information sources and perform distributed searches for those who need periodic updating of volatile information.
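The agent behavior described here, re-running a stored search against several sources and reporting only what has changed, can be sketched in Python as follows. `SavedQueryAgent` and the convention of modeling each source as a function are hypothetical stand-ins for agents that would speak a search protocol such as Z39.50; how often `poll` runs is left to the caller.

```python
from typing import Callable

# A source is modeled as a function from a query string to record identifiers.
Source = Callable[[str], set[str]]


class SavedQueryAgent:
    """Re-runs one query across sources and reports records not seen before."""

    def __init__(self, query: str, sources: dict[str, Source]):
        self.query = query
        self.sources = sources
        self.seen: dict[str, set[str]] = {name: set() for name in sources}

    def poll(self) -> dict[str, set[str]]:
        """Run the query everywhere; return only new hits per source."""
        new_hits = {}
        for name, source in self.sources.items():
            hits = source(self.query)
            fresh = hits - self.seen[name]
            if fresh:
                new_hits[name] = fresh
            self.seen[name] |= hits
        return new_hits
```

Calling `poll` on a timer yields just the updates, which is the periodic refresh of volatile information the paragraph describes.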

3. Z39.50 is the strategic choice for client-server search

Z39.50 is already adopted widely to provide access to important classes of information, including: existing bibliographic catalogs for libraries, museums, and archives worldwide; government information at the national level in several countries and increasingly at the state and other government levels; environmental information at all levels in the U.S. and internationally; all kinds of geo-referenced (map) data and information.

Hundreds of resources representing information valued in the tens of billions of dollars are already freely accessible through Z39.50 -- more is available on a fee basis. There are also hundreds of Z39.50 WAIS databases available, and thousands more WAIS databases are maintained behind HTTP servers. (Unfortunately, most Web browsers are not enabled to handle the WAIS Z39.50 protocol directly as search clients.)

Z39.50 incorporates the agreed international standards for multi-language support, which are increasingly important for addressing global markets. Z39.50 can also be expected to provide a path toward handling information search at the semantic level, finally fulfilling the goal of finding data and information based on what its content actually means rather than just the text in which it is represented.

The Z39.50 protocol has also demonstrated extensibility to support search based on generalized pattern-matching techniques. These techniques will be increasingly important for finding abstract information such as chemical configurations, gene sequences, fingerprints, faces, video imagery, and numeric trend data.

The Z39.50 protocol is implemented on OSI networks as well as TCP/IP, and it is defined using Abstract Syntax Notation One (ASN.1) to enhance interoperability. As a binary protocol exchanging data structures rather than merely passing commands, Z39.50 is relatively more secure than other Internet protocols.
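To make "binary protocol exchanging data structures" concrete: Z39.50 messages are defined in ASN.1 and transmitted using the Basic Encoding Rules (BER), in which every value is sent as a tag, a length, and the value's bytes (TLV). The Python sketch below encodes two universal ASN.1 types, INTEGER and OCTET STRING, with definite-form lengths. It is illustrative only; actual Z39.50 protocol data units use context-specific tags and nested constructed types, which this omits.

```python
def encode_length(n: int) -> bytes:
    """BER definite-form length: short form below 128, else long form."""
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body


def encode_integer(value: int) -> bytes:
    """Universal tag 2 (INTEGER) with minimal two's-complement content."""
    # For negatives, ~value gives the magnitude whose bit length sets the size.
    magnitude_bits = (value if value >= 0 else ~value).bit_length()
    content = value.to_bytes(magnitude_bits // 8 + 1, "big", signed=True)
    return b"\x02" + encode_length(len(content)) + content


def encode_octet_string(data: bytes) -> bytes:
    """Universal tag 4 (OCTET STRING), primitive encoding."""
    return b"\x04" + encode_length(len(data)) + data
```

For instance, `encode_octet_string(b"water")` produces tag 0x04, length 0x05, then the five bytes of the term, so a receiver parses structure from the bytes themselves rather than interpreting a command string.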

In addition to free software for Z39.50 servers, there are freeware and commercial implementations of gateways to resources such as X.500 and SQL databases, as well as to HTTP.

The Z39.50 standard is extensive in specifying how optional features can be implemented, though it also allows for quite simplistic implementations. By requiring a subset of features in specific implementation contexts, the Z39.50 Profiles greatly improve interoperability and simplify server implementation. Clients can be optimized for access to Z39.50 servers supporting a specific profile yet still enjoy basic search capability on all other Z39.50 servers.
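How a profile-aware client still gets basic service from any server can be sketched as simple feature negotiation. In Z39.50 the client and server exchange option flags during initialization and proceed with what both support; the Python below models that only loosely. The feature names, the `BASIC` core set, and the example profile are hypothetical.

```python
# Hypothetical feature names; "search" and "present" stand in for the
# core services every server is assumed to offer.
BASIC = frozenset({"search", "present"})


def negotiate(client_wants: set[str], server_offers: set[str]) -> set[str]:
    """Agree on the features both sides support.

    Raises ValueError if the server lacks even the basic core.
    """
    if not BASIC <= server_offers:
        raise ValueError("server does not support basic search")
    return (client_wants | BASIC) & server_offers


def choose_mode(agreed: set[str], profile: set[str]) -> str:
    """Use the full profile if every required feature was agreed, else basic."""
    return "profile" if profile <= agreed else "basic"
```

A client built for a profile requiring, say, scan and sort runs in "profile" mode against a server offering both, and falls back to "basic" mode against a server offering only the core, rather than failing outright.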

Though already quite sophisticated, the base Z39.50 standard and its focused profiles continue to gain power through an effective international standards process that fully involves dozens of major corporate implementors, is tied to ISO and the IETF, and is connected with very active research at dozens of major universities and with programs of national governments worldwide.

Company	Vertical Market Strength	RDBMS	Server/Client Support	Internet/Intranet	Total Installs	Seats
PC DOCS	Legal, Gov't	Oracle, Sybase, SQL Server	Motif, Mac, NT, NetWare, Vines 3.1/95/Mac	Beta	3,000+	400,000+
IDI	Gov't, Legal, Mfg	Proprietary	Unix (Sun, H-P, IBM, DEC), VMS, MVS	Yes	2,200	1,000,000
Saros	Oil & Gas, Financial Services, Utilities, Telecomm.	Oracle, Sybase, SQL Server, DB2	Unix, NT, NetWare, Vines, IBM & MS LAN Manager 3.1/95/Motif/Mac	Yes 50 installs	500	500,000 (sold)
SoftSolutions	Legal	Proprietary	NetWare, Unix (Sun, H-P, IBM)	No	N/A	250,000
Interleaf	Mfg.	Oracle	Unix (Sun, H-P, IBM) NT	Yes	400+	20,000
Sherpa	Aerospace, Defense	Oracle, Ingres	Unix (Sun, H-P, SGI) 3.1/95/NT, SunOS and Unix	Yes	250+	50,000
Formtek	Aerospace, Defense	Oracle	Unix (Sun, H-P, IBM, SGI) 3.1/95	Yes Orian	150	50,000
Documentum	Pharmaceutical, Hi-Tech Mfg., Aerospace	Oracle, Sybase, SQL Server	Unix (Sun, H-P, IBM); NT 3.1/95	Yes 10 installs	150	N/A
NovaSoft	Mfg.	Oracle	Unix (Sun, H-P, IBM, DEC); VMS, VMS Open, NT 3.1/95/NT	Yes	100+	15,000
Intergraph	Gov't, Mfg., Utilities	Oracle, Sybase, Informix	NT, Unix, (H-P, IBM, Sun)	Yes	100	4,000