Memorandum

August 7, 2001

To:         Faye Lee, Office of the CIO, Department of the Interior

From:     Owen Ambur, Co-Chair, XML Working Group

Subject:  OMB Information Quality Guidelines

Faye, per Jim Tate's request in my conversation with him yesterday morning, here are some comments on the draft guidelines from the perspective of the CIO Council's XML Working Group. I am also sharing them with Nancy Kaplan, who staffs the CIO Council's Interoperability Committee, to which the XML Working Group reports.

Summary

The gist of my comments can be summed up as: management = metadata. Within the context of electronic information systems, the amount of metadata associated with each record indicates how much management has been applied to it. In turn, the amount of management applied to each record affects its quality. The question is how much management (i.e., metadata) each class (series) of records warrants prior to passing the point of diminishing returns.(1) Regardless of the amount of management applied to any particular class of records, Extensible Markup Language (XML) is a powerful new enabler of the efficient and effective use of metadata on the Internet.

General Comments

Implicit in the need to ensure the quality of information "disseminated" is the need to "manage" such information more effectively. With hard-copy media, management entails various physical processes. With electronic records, however, most of the manual functions can be automated. That is, they can be carried out by computer software and hardware - based upon the metadata associated with each record.(2) Moreover, metadata also provides the record of the mental and social processes involved in managing information. Without metadata, there is no way of knowing, much less assuring, the quality of any particular record. Most, if not all, of the necessary metadata is known during the routine processes by which each record is created, reviewed, edited, and otherwise manipulated. Unfortunately, current systems and procedures commonly fail to capture that metadata and preserve it with the record.

DoD Standard 5015.2 embodies legal, logical, and technical requirements that are applicable to all U.S. federal agencies with respect to the management of records. Included in the standard is a basic set of metadata. (See listing at http://mysite.verizon.net/ambur/RMmetadata.htm#5015.2) The standard also provides for "user definable fields" (i.e., metadata elements defined by the user organization). The 5015.2 standard does not yet provide for interoperability among records management systems. However, XML provides a powerful new opportunity to facilitate interoperability - specifically, by capturing and managing E-records metadata in XML metatags embedded within each record.
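To illustrate the idea of metadata carried in XML metatags within the record itself, a minimal Python sketch follows. The element names (record, metadata, author, dateCreated, content) are hypothetical and are not the 5015.2 metadata set; an interoperable version would depend on element names agreed upon across agencies.

```python
import xml.etree.ElementTree as ET


def wrap_record(content: str, metadata: dict[str, str]) -> str:
    """Embed records-management metadata as XML elements alongside the content.

    Element names are hypothetical, not the DoD 5015.2 metadata set.
    """
    record = ET.Element("record")
    meta = ET.SubElement(record, "metadata")
    for name, value in metadata.items():
        ET.SubElement(meta, name).text = value
    ET.SubElement(record, "content").text = content
    return ET.tostring(record, encoding="unicode")


def read_metadata(xml_text: str) -> dict[str, str]:
    """Recover the metadata from a record produced by wrap_record, as any
    other system reading the same XML could."""
    meta = ET.fromstring(xml_text).find("metadata")
    return {child.tag: child.text or "" for child in meta}


xml_text = wrap_record("Text of the record.",
                       {"author": "Jane Doe", "dateCreated": "2001-08-07"})
print(xml_text)
print(read_metadata(xml_text))
```

Because the metadata travels inside the record, a second system needs only an XML parser, not access to the originating system's database, to read it.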

Paralleling DoD Std. 5015.2 at the international level is ISO 15489, Information and documentation - Records management. Also worthy of note is the ISO 9000 series, which defines international standards for Quality Management Systems (QMS). (For more information on ISO 9000, see http://mysite.verizon.net/ambur/ISO9000.htm.) Most pertinent, however, are the four attributes of a record set forth in ISO 15489 (Authenticity, Reliability, Integrity, and Usability), as compared to the four statutory terms the guidelines specify for the "quality" of information dissemination products (Quality, Utility, Objectivity, and Integrity). The following table relates the attributes of a "record" to the terms the draft guidelines specify for the "quality" of information dissemination products:
 
Record | Dissemination Product Quality | Comment
Integrity | Integrity | Same. Means complete and unaltered.
Usability | Utility | Appear to be the same. Focus is external as well as internal.
Authenticity | (no counterpart) | Can be proven to be what it purports to be. Implicit in the concept of "quality". Can be made explicit by digital signatures.
Reliability | (no counterpart) | Trustworthy as a full and accurate representation of the transactions, activities, or facts to which they attest.
(no counterpart) | Quality | Concept is unclear without reference to the more basic terms defining the attributes of a record. Using it to define "product dissemination quality" is circular logic.
(no counterpart) | Objectivity | OMB suggests the "sources" of information may call into question its "objectivity". However, the real issue is whether the results are "reproducible upon independent analysis," as OMB also notes. Reproducible results are dependent upon documentation having the characteristics of a record. With limited exceptions, the Government should not put itself in the position of censoring which records citizens should be allowed to see. In particular, a process that discriminates on the basis of authorship is inherently subject to manipulation and calls into question its own objectivity.

OMB may wish to consider adopting the definitions in ISO 15489 and the relevant procedures specified in the ISO 9000 series. While ISO 9000 has been criticized as too complex for many organizations, many others have been certified under it. As such, it represents a set of best practices toward which Government agencies should be striving. Why would we reinvent a Government-unique standard for the management of quality when an international standard already exists? Doing so would seem contrary to the thrust of OMB Circulars A-76 and A-119. A related question for agencies choosing not to use DoD-certified E-records management systems is how they justify ignoring the legal requirements those systems embody and/or how they justify spending the taxpayers' money reinventing those systems.

Also relevant is the definition of an "information dissemination product" in the guidelines, as compared to the definition of "records" in the Federal Records Act [44 USC 3301]:
 
Records (Federal Records Act, 44 USC 3301):
"... all books, papers, maps, photographs, machine readable materials, or other documentary materials, regardless of physical form or characteristics, made or received by an agency of the United States Government under Federal law or in connection with the transaction of public business and preserved or appropriate for preservation by that agency or its legitimate successor as evidence of the organization, functions, policies, decisions, procedures, operations, or other activities of the Government or because of the informational value of data in them..." (emphasis added)

Information Dissemination Product (draft guidelines):
"... any book, paper, map, machine-readable material, audiovisual production, or other documentary material, regardless of physical form or characteristic, an agency disseminates to the public. This definition includes any electronic document, CD-ROM, or web page." (emphasis added)

The critical distinctions between the two terms are whether the "documentary materials" in question are "preserved" and/or "disseminated" to the public. Many records are not appropriate for dissemination. Conversely, it is possible that some dissemination products are not worthy of preservation. However, it is difficult to imagine how information may have qualities warranting dissemination while lacking those warranting preservation for some period of time. (The notion conjures up thoughts not only of the waste of the taxpayers' money but also of rumors, innuendo, and other forms of "tacit" knowledge.) Thus, it appears logical to conclude that "dissemination products" are a subset of all records "made or received by an agency of the United States Government ... in connection with the transaction of public business ..." The logical options are:
 

Records | Information Dissemination Product
Preserve but Don't Disseminate | Disseminate but Don't Preserve
Preserve AND Disseminate | Disseminate AND Preserve

The second alternatives in each column are functionally equivalent, and the first alternative in the second column is of dubious wisdom. Consequently, regardless of the definition of "documentary material," there are only two logical alternatives for its treatment:

a) Preserve but don't disseminate, or

b) Preserve AND disseminate.

The desired actions - dissemination and retention for an appropriate period of time - should be determined by the attributes (qualities) of the "materials." Those attributes should be captured as metadata, which may be represented in XML metatags. More specifically, no record should be "disseminated" unless the metadata associated with it embodies the qualities warranting dissemination. For example, two such qualities might be: a) the name of the author and/or person who disseminated the material, and b) the period of time for which the record will be available. Moreover, no dissemination product should be destroyed (or "lost" or otherwise mismanaged) except in accordance with a records disposition schedule approved by NARA.(3)
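The two rules in the paragraph above can be expressed as simple checks on a record's metadata. In this minimal Python sketch, the element names (author, availableUntil, dispositionSchedule, retainUntil) are hypothetical, and whether a disposition schedule has actually been approved by NARA is not verified; the sketch assumes only approved schedule identifiers are ever recorded.

```python
from datetime import date

# Hypothetical metadata element names for the two example qualities in the
# text: who is responsible for the material and how long it will be available.
REQUIRED_FOR_DISSEMINATION = ("author", "availableUntil")


def may_disseminate(metadata: dict[str, str], today: date) -> bool:
    """Release a record only if its metadata embodies the required qualities
    and its stated availability period has not already ended."""
    if any(not metadata.get(key) for key in REQUIRED_FOR_DISSEMINATION):
        return False
    return date.fromisoformat(metadata["availableUntil"]) >= today


def may_destroy(metadata: dict[str, str], today: date) -> bool:
    """Allow destruction only when the record cites a disposition schedule
    (assumed to be NARA-approved) and its retention period has passed."""
    schedule = metadata.get("dispositionSchedule")
    retain_until = metadata.get("retainUntil")
    if not schedule or not retain_until:
        return False
    return date.fromisoformat(retain_until) < today


today = date(2001, 8, 7)
print(may_disseminate({"author": "Jane Doe", "availableUntil": "2002-12-31"}, today))
print(may_destroy({"retainUntil": "2000-01-01"}, today))
```

The default in both checks is the conservative one: a record lacking the necessary metadata is neither released nor destroyed.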

The issue of how much metadata (i.e., how much "management") is enough should be addressed not only in agency IT architecture and capital planning processes but also in the context of obtaining NARA approval of records disposition schedules. Indeed, with reference to GPRA as well as the Clinger-Cohen Act, the processes by which agencies select their IT systems should be inherently aligned with how they classify their records, which in turn should stem directly from the objectives specified in their strategic plans.(4) In short, the chain runs from strategic objectives, to the classification of records, to the selection of IT systems.

The management of business information as records is often left out of the equation. Another common problem is that IT systems are developed as "stovepipes" or "information silos," which is to say they cannot share electronic records easily with each other and, as a result, may contain needlessly redundant records.

It should also be noted that the term "disseminate" is somewhat outmoded in the Internet age. For example, is making information available on a Web site a form of "dissemination"? According to the definition included in the draft guidelines, it is. However, there is much to be said for the use of plain language, and OMB's definition seems to be at odds with the common meaning of disseminate: "to scatter or spread widely; promulgate extensively". (Ideally, each unique record would exist in only one place. The Internet makes the ideal a practical possibility.) A more appropriate way to define the concept implicit in OMB's definition of "dissemination" would be to identify the metadata (attributes) associated with the target audience. For example, if the audience is blind, braille or an audio interface may be the most appropriate format for dissemination of the record. Or if a group of individuals has indicated, by "subscribing" to a Web site or page, that they would like to be informed of any additions or changes to it, they may prefer to be notified by E-mail.
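As a minimal sketch of that audience-based definition, the Python function below derives delivery formats from metadata describing the audience rather than from an agency's own schedule. The attribute names ("blind", "subscribed") and the format labels are hypothetical.

```python
def delivery_options(audience: dict[str, bool]) -> list[str]:
    """Choose formats and channels for one record from audience metadata.

    Attribute names are hypothetical. The record itself is assumed to live
    in one place on the Web; the other options are renderings or notices.
    """
    options = ["web"]
    if audience.get("blind"):
        options += ["braille", "audio"]
    if audience.get("subscribed"):
        options.append("email-notification")
    return options


print(delivery_options({"blind": True}))
print(delivery_options({"subscribed": True}))
```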

Indeed, the most basic issue with respect to dissemination is whether citizens really want what their Government is imposing upon them and whether the best time to deliver it is on the Government's schedule, rather than one that citizens establish for themselves. In the information age, being inundated with too much information is often a bigger problem than having too little.

In any event, the metadata associated with the target audience(s) should define the means of dissemination as well as the quality of the records to be made available. In some instances, some recipients may not wish to see anything that has not been personally approved by the agency head. However, in most instances, citizens probably will want the best information currently available on the topics of interest to them. Moreover, they probably would prefer to specify their own interests as well as the time and place to receive information relevant to those interests - as opposed to having a "bureaucrat" determine what will be delivered, along with where and when.

In the case of Internet-enabled users, "dissemination" may be nothing more than providing a hypertext link (URL) to a record on the agency's Web site, which could be done automatically, without any human intervention other than the user's own action in subscribing to the pertinent record or series. Moreover, many citizens may prefer to minimize the amount of information "pushed" to them via E-mail or any other means, in favor of periodically searching and retrieving the specific records of interest to them at any particular point in time. In either the push or the pull paradigm, XML enables reuse of the very same records in whatever format citizens need or desire, regardless of the modes of their abilities. (Disability is a relative term. We are all disabled in our own ways.) In particular, the use of XML metatags can vastly improve the precision of Internet search services such as FirstGov. At the same time, it also provides for distribution by other means, including print publications and audio interfaces.
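The push case described above can be sketched minimally in Python: subscribers to a records series receive only a hyperlink, while the record itself stays in one place on the agency's site. The class name, series name, addresses, and URLs are hypothetical, and actual E-mail delivery is left out.

```python
class SeriesSubscriptions:
    """Push sketch: subscribers to a records series are sent only a hyperlink
    when a record is added, so no copy of the record is scattered about."""

    def __init__(self) -> None:
        self._subscribers: dict[str, set[str]] = {}

    def subscribe(self, series: str, address: str) -> None:
        """The citizen's own act of subscribing; no agency action needed."""
        self._subscribers.setdefault(series, set()).add(address)

    def publish(self, series: str, url: str) -> list[tuple[str, str]]:
        """Return (recipient, url) notices for a newly added record.
        Handing them to a mail service is outside this sketch."""
        recipients = sorted(self._subscribers.get(series, set()))
        return [(address, url) for address in recipients]


subs = SeriesSubscriptions()
subs.subscribe("press-releases", "citizen@example.org")
print(subs.publish("press-releases", "https://www.example.gov/records/123"))
```

The pull case needs no subscription at all; it reduces to a search over the same metadata.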

Such a metadata-driven, user-focused definition of dissemination would be very much in line with the Administration's "citizen-centered E-government action plan". It would also comport with GPRA, which requires agencies to consult with their stakeholders and, therefore, first to identify who their stakeholders are. Under such a records management regime, not only would citizens be at the center of each and every act of dissemination but, whenever possible, they would also be empowered to define their own interests - by subscribing to records and records series having the qualities they need or desire ... qualities that are specified in terms of metadata.

Specific Comments

In the summary of the proposed guidelines, OMB notes that agencies would be directed to develop procedures for reviewing and documenting the quality of information that is disseminated. While policies and procedures are important, they are meaningless and doomed to failure unless they are implemented in working systems, i.e., the means by which the work is actually done. Of course, that does not mean systems should be implemented willy-nilly, without regard to policies and procedures embracing the requirements of the business processes in question. However, policies are often issued without regard to what can actually be accomplished, including consideration of the means and resources required. Meanwhile, business goes on "as usual". Indeed, OMB acknowledges the need for flexibility, common sense, and workability.

Toward that end, XML and DoD-certified E-records management systems are key enablers of the policies set forth in the draft. The real issue may be whether sufficient direction and resources will be provided to ensure effective implementation of those means. Otherwise the proposed guidelines may turn out to be only so many words on paper, as has been common with many policy statements in the past. Flexibility is no substitute for clarity about what needs to be done. At some point, it becomes yet another excuse for failure to do that which is clearly needed.

OMB also notes, in the background section of the draft guidelines, that the CIO must certify to OMB that the agency is using information technology to the maximum extent practicable to reduce the information collection burden and improve data quality. While accountability is good, scapegoating is not. Notwithstanding their titles, the authority and ability of CIOs to control what occurs within their agencies are often limited, as they should be in a government supporting a free and democratic citizenry. The objective should not be to place the CIOs at the top of a steep hierarchy of the sort that current management practice calls for eliminating. To do so would be to doom the CIOs to failure and, thus, to set them up as scapegoats. Instead, the aim should be to implement DoD-certified E-records management systems that:

a) enable everyone to do their jobs efficiently and effectively, while

b) capturing and associating with each record the appropriate elements of metadata required to ensure the quality of the process (i.e., accountability).

Each and every one of us should be accountable for our own actions - within a system that meets the underlying requirements for the business-quality treatment of the records we are creating in the normal course of our business processes. As OMB Deputy Director O'Keefe has suggested, establishing an IT Czar may have the unintended consequence of absolving others of responsibility. Likewise, suggesting that CIOs should act as the "mother hen" to everyone who gathers and uses information within their agency may add little real value to the process. A more appropriate role for the CIOs and for OMB is to see that adequate direction and resources are provided for the prompt and effective implementation of records management systems that have been certified to meet the underlying requirements.

OMB specifically requests comments on how to make clearer and less ambiguous the four terms used to define the qualities of information to be disseminated. Having already addressed those terms, I will merely reiterate my suggestions that:

a) OMB consider using the terms specified for the attributes of a record in ISO 15489, and

b) "information dissemination" and "quality" be defined in operational terms, based upon the metadata associated with each record and with its stakeholders.

The draft guidelines call for agencies to report annually on the number, nature, and resolution of complaints regarding perceived and confirmed failures to comply. In the information age, annual reporting is unacceptable. Stakeholders, including OMB and Congress, should be able to retrieve and analyze the necessary data whenever they choose. Again, XML is an enabler, and extensions like eXtensible Business Reporting Language (XBRL) are demonstrating the practical reality of near real-time reporting on the Internet. Moreover, reporting perceived or confirmed failures is no substitute for resolving them as expeditiously as possible. The aim should be to design and implement systems that eliminate the causes of failure - not to establish procedures simply to record them.

OMB asks whether it should devote particular attention to the types of information to be disseminated. The answer is no. Just as agency CIOs are not mother hens, neither is OMB. Instead, what OMB should do is ensure that agencies have explicit direction and adequate resources to implement records management systems that appropriately classify, capture, and manage all of the records they are creating in the course of their business processes. As previously stated, classification and management require metadata, and DoD-certified E-records management systems provide for the association of metadata with records. Rather than being distracted by attempts to micromanage agency business processes and thus their records, OMB should focus its attention on ensuring that all agencies are using records management systems that meet the underlying requirements, as specified in DoD Std. 5015.2. By definition, any other approach fails to minimize the risks associated with using systems that have not been proven to meet the requirements for the business-quality management of records.

OMB asks whether it should develop specific guidelines to address information "disseminated" on agency Web sites. Again, the answer is no. OMB should not attempt to devise and impose from the top down the "mother of all records schedules" for the entire U.S. federal government, or even for the myriad Web sites hosted by agencies. However, OMB should insist that agencies use systems that meet the requirements for managing not just the records on their Web sites but all of the records they create, receive, and process - regardless of whether they are made available to the public. Such systems may incorporate different workflow management requirements for different classes of records, including those warranting "publication" and "dissemination" by any means. For example, some records may require digital signatures prior to release, whereas others may be distributed freely, with relatively little authoritative metadata.

If and when a registry of "inherently governmental" XML data elements, DTDs, and schemas is operational at xml.gov, OMB should work with NARA to incorporate it into the records scheduling process. Specifically, the registry should be used by agencies not only to indicate the retention periods associated with each of their records series, but also to specify the metadata elements by which each series will be classified and, thus, may be queried and retrieved. The Freedom of Information Act requires requesters to "reasonably describe" the records they seek. [5 USC 552(a)(3)(A)(i)] Metadata describe records. Conversely, a description of the records sought is a set of metadata. For example, a citizen may wish to receive only those documents authored or signed by a particular official concerning a particular subject within a specified range of dates. Each of those parameters is an element of metadata. In a citizen-centered Government, agencies would specify in advance the metadata elements by which their records can be searched and retrieved on the Internet, thereby relieving citizens of the burden of having to guess or speculate about the inner organizational mysteries of myriad governmental bureaucracies.
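Because a "reasonable description" is itself a set of metadata, a request of the kind just described can be answered by filtering on those elements. The minimal Python sketch below assumes a hypothetical catalog in which each record carries author, subject, and creation-date metadata; the field names, officials, subjects, and URLs are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class RecordMetadata:
    """Hypothetical metadata elements an agency might publish for a series."""
    url: str
    author: str
    subject: str
    created: date


def find_records(records: list[RecordMetadata], author: str, subject: str,
                 start: date, end: date) -> list[str]:
    """Treat the request as a set of metadata values and return the URLs of
    records matching all of them (date range inclusive)."""
    return [r.url for r in records
            if r.author == author and r.subject == subject
            and start <= r.created <= end]


catalog = [
    RecordMetadata("https://www.example.gov/r/1", "J. Smith", "grazing permits", date(2000, 3, 1)),
    RecordMetadata("https://www.example.gov/r/2", "J. Smith", "grazing permits", date(2001, 6, 15)),
    RecordMetadata("https://www.example.gov/r/3", "A. Jones", "grazing permits", date(2001, 6, 20)),
]
print(find_records(catalog, "J. Smith", "grazing permits",
                   date(2001, 1, 1), date(2001, 12, 31)))
```

The requester never needs to know which office holds the records; only the published metadata elements matter.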

In addition, a citizen-centered eGovernment would:

a) identify the generic characteristics of its myriad stakeholder groups,

b) render those characteristics in XML metatags, and

c) associate those metatags with each of its records series - thereby relieving individual citizens of the burden of having to guess or being expected to know which of those records may be of interest to them.
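Steps a) through c) might look like this in a minimal Python sketch. The series IDs and audience values ("small-business", "nonprofit", "visitor") are invented; the function returns the records series whose audience metatags overlap the characteristics of a given stakeholder group.

```python
import xml.etree.ElementTree as ET

# Steps a) and b): generic stakeholder characteristics rendered as XML metatags.
# Step c): those metatags associated with each records series.
# All series IDs and audience values are hypothetical.
SERIES_XML = """
<recordsSeries>
  <series id="grant-announcements">
    <audience>small-business</audience>
    <audience>nonprofit</audience>
  </series>
  <series id="park-alerts">
    <audience>visitor</audience>
  </series>
</recordsSeries>
"""


def series_for(stakeholder_tags: set[str], xml_text: str = SERIES_XML) -> list[str]:
    """Return IDs of records series whose audience metatags match any of the
    stakeholder's characteristics, so the citizen need not guess."""
    root = ET.fromstring(xml_text.strip())
    matches = []
    for series in root.iter("series"):
        audiences = {tag.text for tag in series.findall("audience")}
        if audiences & stakeholder_tags:
            matches.append(series.get("id"))
    return matches


print(series_for({"nonprofit"}))
```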

Functionally speaking, that is what agencies already do when they "disseminate" information. They simply don't do it very well, which is to say they don't do it efficiently, effectively, or in a well-focused manner. They do it with their own interests and perspectives foremost in mind, and they fail to capture the essence of their stakeholders' interests in terms of metadata.

Within a truly citizen-centered eGovernment, agencies would not receive and report annually on "complaints". Instead, they would immediately correct any factual errors. They would also act as expeditiously as possible to accommodate "information enhancement requests" and they would report promptly any such requests they deem to be worthy of action but lacking necessary resources. Parameters for prioritizing such unmet needs would be established - in terms of records management metadata - and such needs would be addressed in the annual budget process. As has been oft stated, "justice delayed is justice denied." A corollary truism is that a complaint divorced from the process by which resources are allocated might as well be a whistle in the wind.

Conclusion

DoD Std. 5015.2 does not currently provide for interoperability among E-records management systems, nor does it address the requirements of the Electronic Freedom of Information Act (E-FOIA) or for security, privacy, or the dissemination of information to stakeholders who may need or desire it. However, XML holds the potential to address each of those requirements. How XML can be used to enhance the 5015.2 standard along those lines will be the focus of the September 19, 2001, meeting of the XML Working Group.

It would be good for OMB and NARA to join DoD in issuing clear guidance on the need to use systems that have been certified to meet the requirements for records management, and also to join in the pursuit of XML-related enhancements to the 5015.2 standard. It would also be good for OMB to ensure that GSA and NIST have adequate resources to acquire and maintain the XML registry/repository, that agencies are directed to use it, and that NARA provides explicit guidance on its use for records classification and disposition scheduling. However, by whatever means metadata are captured, represented, and associated with records within any agency or office of Government, they are the key to the quality of "information dissemination products."



1. Metadata is data about data. For example, the information in a library card catalog is metadata about the books on the shelves.

2. The elimination of needless steps in the supply chain is the essence of improving the efficiency of processes, including those deemed to be "inherently governmental" in nature.

3. The Federal Records Act does not require that records be preserved in perpetuity, merely that agencies think through the logic of how long each series should be maintained based upon the business requirements and risks associated with it. Agencies commonly fail to do so, particularly with respect to records created in electronic systems whose life cycles may be much shorter than the retention periods of the records created, received, and processed in them.

4. The term "classify" is another synonym for "metadata" that is placed in the larger context of a taxonomy. A taxonomy is a logical organizational structure. To classify something is to assign it to a category within a taxonomy. The names assigned to each category are metadata associated with each record within that category. While taxonomies are commonly hierarchical in nature, information technology - particularly relational database technology - enables other types of organizational structures to be applied. Current management trends call for the elimination of needless and outmoded hierarchies. By virtue of XML's "extensibility," XML metatags enable individuals and organizations to avoid making needless either/or choices among taxonomic structures. Both hierarchy, where appropriate, as well as myriad other relationships can readily be represented in XML, thereby enabling each organization as well as each individual to "have it your way."
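The point of footnote 4 can be made concrete with a minimal Python sketch. The categories and the semicolon-separated "related" attribute are hypothetical conventions: nesting expresses hierarchy, the attribute carries a non-hierarchical link, and the category names found become the classification metadata for a record assigned to that category.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Hypothetical taxonomy: nesting is hierarchy; "related" holds cross-links.
TAXONOMY = """<taxonomy>
  <category name="Natural Resources">
    <category name="Water" related="Public Health"/>
    <category name="Forests"/>
  </category>
  <category name="Public Health"/>
</taxonomy>"""


def classify(category_name: str, xml_text: str = TAXONOMY) -> Optional[dict]:
    """Return the metadata a record assigned to category_name would carry:
    its hierarchical path plus any non-hierarchical related categories."""

    def walk(node: ET.Element, path: list[str]) -> Optional[dict]:
        for child in node.findall("category"):
            child_path = path + [child.get("name")]
            if child.get("name") == category_name:
                related = child.get("related", "")
                return {"path": child_path,
                        "related": [r for r in related.split(";") if r]}
            found = walk(child, child_path)
            if found is not None:
                return found
        return None

    return walk(ET.fromstring(xml_text), [])


print(classify("Water"))
```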