Document Management System Interoperability - The Need, The Answer
A White Paper for Federal Agency CIOs and IT Architects
February 1998

This paper was authored by Dan Schneider, U.S. Department of Justice, 202-514-4318, schneidd@justice.usdoj.gov. The Department of Justice is a DMA user organization member. Mr. Schneider welcomes reader questions or comments.


Document repositories - their design and operation - will become a CIO Critical Success Factor over the next several years. Their importance in information technology (IT) architectures will be enormous. While the looming Year 2000 demands may overshadow immediate ventures into this relatively new IT area, agency planners who wait until after the Year 2000 to give careful attention to them do so at great risk. This paper was written to help agency officials and planners prepare, regardless of when they start writing system specifications.

The immediate impetus for this paper was the recent adoption by the industry-wide Document Management Alliance of a "DMA 1.0 Specification," the industry's first standard enabling document management systems from different vendors to interoperate. The paper covers document management systems generally and their role in architecture, and why the DMA accomplishment is so important.

What are (Electronic) Document Management Systems?

EDMSs, or Document Management Systems (DMSs) as they're also called (the terms are used interchangeably), are commercial-off-the-shelf (COTS) software package cousins to DBMSs. Whereas DBMSs are designed to store structured data - principally data elements and data records - relieving application programmers of data storage and retrieval tasks, DMSs store, retrieve and manage unstructured information objects - files, text, spreadsheets, images, sound clips, multi-media, and compound documents - giving non-IT end-users rich storage and retrieval of all of them without anyone in the end-user's organization having to do any programming. An IT staff's involvement is to configure and install the package, train the users, provide help-desk support, and keep the servers running smoothly. In this sense, DMSs are more like word processing and email packages.

What are some of these DMSs?

Like their DBMS cousins that range from Microsoft's Access to Oracle's mainframe products, DMSs range widely in richness and scalability. They're all proprietary and don't interoperate. Some of the current players are Documentum, FileNet, Altris, Lotus, Interleaf, NovaSoft, Open Text, and PC Docs. It's a very competitive marketplace, with new entries, mergers and acquisitions every year, and it's growing by leaps and bounds as business processes unshackle from paper.

Who's using them, and why haven't I heard more about them?

They're already well-installed outside the Beltway across the manufacturing, commercial, and financial spectrum in the pharmaceutical, chemical, automotive, financial, insurance, and aerospace industries, plus many other private sector communities. Fortune 500 enterprises are recognizing that DMSs have an important role to play in IT architectures. Their use in the public sector has progressed more slowly.

Why has public sector adoption been slower?

Three reasons. First, one of the big drivers in the private sector is reducing time-to-market and development costs for new products and services. The entire product life-cycle, and especially its creation and introduction processes, is very document/object intensive, involving written proposals and communications, drawings and specifications, laboratory and prototyping reports, market surveys, financial analyses, problem analyses, performance evaluations, etc. Effective management of all of these in electronic form rather than on paper helps compress the product development cycle and lower all product life-cycle costs. Until recently, the public sector hasn't shared these pressures.

Second, because they replace the current way end-users are accustomed to keeping and managing their computer files, DMS installations mean a big change for end-users. Using a DMS involves at least some business process reengineering together with lots of initial training and hand-holding while the users get accustomed to the new way that their files are stored and retrieved. DMSs are a major cultural change in how information is handled, and have a big impact on worker training and adjustment.

Third, the impending Year 2000 wave doesn't favor rocking the applications boat for either public or private sectors, but competitive pressures don't give the private sector breathing space for slowing its drive away from paper. DMS implementation in the public sector will probably continue to go slowly until the Year 2000 storm has been safely weathered.

Why are DMSs important to me?

They mean big returns on investment and big gains in service to citizens and customers because they're the key to changing most of today's paper-based processes to electronic. The dollar and performance improvement benefits can be huge. It's often said that around 90 percent of what the government does is on paper. However, when the paper gives way to electronics, as it is doing with e-mail, voice-mail, maps and drawings, instruction manuals, policy formulations, applications for benefits, and regulatory filings, to mention only some, those electronic objects must be stored and retrieved in a well-managed way. If not, the CIO, as accountable official, may personally face a very uncomfortable situation. Already, we're seeing the tip of the iceberg in the recent court cases on e-mail and word processing files. DMSs are the only way to set up and operate well-managed electronic filing cabinets that meet legal requirements.

Does that mean that these capabilities will help solve my recordkeeping and FOIA problems?

Absolutely! In the final analysis, they are the only long-term answer for electronic recordkeeping. Everything else is a temporary fix. There are several relatively inexpensive COTS software packages designed specifically for records management and archiving, and to support FOIA requests to their contents. However, none provides a way of getting the records into its data base. That's where everything breaks down. Therefore, the vendors of these packages have done the only thing possible: they have tied their packages to one or more DMS products.

By tying records management and archiving to DMSs, an organization takes care of one end - the back end - of the records "life cycle." What remains is the other end, the front end, and the process of getting into the DMS what will later need to be preserved. That includes e-mail messages, word processing documents, and even voice-mail messages. A complete case file needs to include all these, and more, so the commercial DMS products are concentrating on the front-end integrations to make these captures seamless and transparent. The front-end marriages with office suites, e-mail systems, etc., are one of the characteristics that vary among the DMS products. In fact, for many organizations, the front-end integration is the most important consideration in DMS product selection.

In connection with recordkeeping, we find a mundane but surprisingly large nugget of gold - big enough in some cases to pay for much of the costs of going into DMSs and records management products. That nugget is the dollars involved in getting rid of file cabinets, file rooms, their associated floor space and their associated clerical staffing. The dollar savings may even extend to off-site storage of records that are retired but not yet destroyed.

Another big nugget will come in faster, more reliable processing of FOIA requests, and in something that isn't obvious because we're so accustomed to paper, namely the ability of many people in many locations to be looking at the same file and records, at the same time. When active records are in DMSs and preserved records are in the DMS-associated recordkeeping systems, all the contents of both are available to all authorized parties 24 hours a day, seven days a week, every week of the year.

There is a caveat, however. Unless an agency activity goes 100 percent electronic it must continue to deal with incoming papers that will stay papers. This leads to mixed-media files, and they pose their own records management challenges - challenges that DMSs can help to meet, particularly when used in combination with records management products.

Can DMSs help my document exchange problems?

Document exchange has been largely a "push" activity, in the form of e-mail "attachments" or computer fax. This has been a terrific headache for many people trying to exchange word processing documents across agencies, so that the documents can be read and also entered into receivers' word processing systems for further use or revision.

DMSs enable the exchange of documents with a "pull" approach. In this approach, documents are put into the originator's DMS in multiple forms (called "renditions") and/or an Internet-standard-tagged form (probably the new XML), and the intended receivers or users are notified of the document availability, probably by a simple e-mail message saying, "Come and get it." Each user can then select the rendition that works for that user. Final documents might be in a rendition that preserves page appearance, images might be in one or more standard encodings, and textual documents undergoing revision might be in both a proprietary word processing package rendition and one that is a non-proprietary standard. Compound documents might include all of these, plus embedded spreadsheets, etc.
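The rendition idea can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the class, the function, and the format labels ("page-image", "wp-proprietary", "xml", "other-wp") are hypothetical names invented here, not part of any DMS product or specification.

```python
from dataclasses import dataclass, field


@dataclass
class Document:
    """A document held in the originator's DMS in several renditions."""
    title: str
    # Maps a format label (hypothetical names) to the stored content.
    renditions: dict = field(default_factory=dict)

    def add_rendition(self, fmt: str, content: bytes) -> None:
        self.renditions[fmt] = content


def pick_rendition(doc: Document, preferences: list) -> str:
    """Return the first format in the receiver's preference list that the document offers."""
    for fmt in preferences:
        if fmt in doc.renditions:
            return fmt
    raise LookupError(f"no usable rendition of {doc.title!r}")


report = Document("Quarterly report")
report.add_rendition("page-image", b"...")      # preserves page appearance
report.add_rendition("wp-proprietary", b"...")  # originator's word processor format
report.add_rendition("xml", b"<report/>")       # standard tagged form

# A reviewer without the originator's word processor falls back to the tagged form.
reviewer_choice = pick_rendition(report, ["other-wp", "xml", "page-image"])
# A reader who only needs the final appearance asks for the page image.
reader_choice = pick_rendition(report, ["page-image"])
```

The originator stores the document once, in several forms; each receiver states what it can use and pulls the best match, rather than the sender guessing a format and pushing it.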

Certainly, there are many aspects to the operational use of this pull approach, especially dealing with communications and access. It may sound a bit unreal to business-people not accustomed to heavy Internet use, but it's old hat to scientists, engineers, and others who have already been using the "ftp" (file transfer protocol) Internet capability for at least a decade. As those users know, a file is put into an FTP Server and its specific address is given to the intended users who then can download it whenever they want. Each version of the file can have its own unique address.

The DMS-based pull approach will probably find its initial use in groupware environments, intranets, and extranets. Its broader use will be accelerated by the DMA interoperability standard which envisions being able to directly access a specific document with its unique address, in a very similar way to the World Wide Web's manner of accessing individual documents on Web servers.

Microsoft's product strategy includes building DMS features into coming NT releases. Will that do the job?

Microsoft targets the mass market, and we're seeing initial client-side appearances in the "Outlook" product, and also some document indexing and searching services in Web site products. What's anticipated in NT are modest server-side ("BackOffice") DMS functions at a modest price increment, suitable for a global server market that includes small businesses. Don't be surprised to see these functions tied to Microsoft's messaging, groupware and workflow capabilities. Large organizations will want much richer DMS functionality, flexibility, scalability, etc., for organization-wide use, and will be willing to pay the incremental cost. That's where the other COTS DMS products will find their niche for many years to come.

What's an example of where I'd want the richer DMS capabilities?

Branch versioning of compound documents is one easily understood through an example. An agency's investigator at Headquarters in Washington is leading a case effort with an investigator in the Chicago Regional office, in cooperation with an Illinois state government investigator. As part of the wrap-up, the Headquarters lead investigator drafts a final summary and report. It contains text, of course, plus photographs, drawings, images of bank checks, data tables, and embedded URLs to interviews and intercept transcripts that are stored on a secure intranet host in the agency. It's a "compound" document because the embedded objects - images, spreadsheets, etc. - are themselves separate documents (objects) in the DMS. The draft report and all of its contents are confidential, highly sensitive, and will be subjected to challenge and cross-examination at trial.

The lead investigator sends the draft final report simultaneously to the regional and state investigators and invites their corrections. (That's the branching.) They review it independently and concurrently, with the regional investigator adding another embedded object while the state investigator notes a correction to a different embedded object. They send their recommended revisions back to the Headquarters lead investigator, who first "checks them in" (records them in the DMS) on their separate branches, and then melds them into a final consolidated version. The Headquarters DMS not only keeps track of all this, including the security aspects, but also provides the record-keeping environment needed in the legal arena by keeping the various versions, by recording the case audit trail, and by guaranteeing archival integrity.
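The branch-and-meld sequence in this example can be modeled roughly as follows. This is a toy sketch, not how any particular DMS implements versioning: the class names, version labels, and object names are all hypothetical, each embedded object is reduced to a name and a version number, and deletions and security are ignored.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Version:
    label: str    # e.g. "1.0" or "1.0-chicago" (hypothetical labels)
    parent: str   # label of the version this one was derived from ("" for the root)
    parts: tuple  # sorted (object_name, object_version) pairs of the compound document


class VersionStore:
    """Keeps every checked-in version plus an audit trail (toy model)."""

    def __init__(self):
        self.versions = {}
        self.audit = []

    def check_in(self, label, parent, parts, who):
        version = Version(label, parent, tuple(sorted(parts.items())))
        self.versions[label] = version
        self.audit.append((who, "check-in", label))
        return version

    def merge(self, base_label, branch_labels, new_label, who):
        """Meld branches by applying each branch's changes relative to the base."""
        base = dict(self.versions[base_label].parts)
        merged = dict(base)
        for label in branch_labels:
            for name, ver in self.versions[label].parts:
                if base.get(name) == ver:
                    continue  # this branch left the object untouched
                already_changed = name in merged and merged[name] != base.get(name)
                if already_changed and merged[name] != ver:
                    raise ValueError(f"conflicting edits to {name}")
                merged[name] = ver
        return self.check_in(new_label, base_label, merged, who)


store = VersionStore()
# Headquarters checks in the draft and sends it out on two branches.
store.check_in("1.0", "", {"text": 1, "check-image": 1}, "hq-lead")
# The regional investigator adds a new embedded object.
store.check_in("1.0-chicago", "1.0", {"text": 1, "check-image": 1, "photo": 1}, "chicago")
# The state investigator corrects a different embedded object.
store.check_in("1.0-illinois", "1.0", {"text": 1, "check-image": 2}, "illinois")
# Headquarters melds both branches into the consolidated version.
final = store.merge("1.0", ["1.0-chicago", "1.0-illinois"], "2.0", "hq-lead")
```

Because the two reviewers touched different embedded objects, the merge succeeds without conflict, and every earlier version and check-in remains available in `store.versions` and `store.audit`.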

What are the interoperability considerations here?

Headquarters, in this example, is where the final case files are brought together and maintained. It needs lots of sophistication and scalability. The Chicago regional office has only a LAN with a couple of modest servers, and doesn't need all the bells, whistles and horsepower of the headquarters system. The Illinois environment is the responsibility of that state's IRM executives, independent of Washington. It's integrated with the overall IT architecture of the state, which is affected by such applications as taxes, driving licenses, roads, and health care reimbursements. Thus all three repositories in this example could be on different platforms, and all three could be using different DMSs, from different vendors. Yet, the cooperative mission activities require their interoperability. The interoperability needed is at least two-way between headquarters and region, and probably three-way to include the state case file. Each is a client to the others and each one's repository is a server to the others.

So what's the answer?

The DMS vendor community knows that interoperability is likely to affect the future success of many products, including such related ones as imaging, e-mail, voice-mail, groupware, workflow, and even printing. Industry knows that repositories - where and how things are stored and managed - will be the nexus of all of these. All know that interoperability comes only with standards.

ODMA

A few years ago, document management standards efforts were started at two levels. One was focused on a simple application programming interface (API) to let any kind of client interact with a DMS that also implemented the API, for the purpose of storing and retrieving files. Desktop applications like word processing and spreadsheet packages are on those clients and must interact with the DMS to store and retrieve the files created with those packages. In that sense, the DMS replaces the Windows file system/directory. With this standard, the client must know the specific design, construction, capabilities, etc., of the DMS in order to use it, including its proprietary document structuring, indexing, and query facility. Because all this knowledge is inside the client, the API itself is simple and inexpensive, yet so valuable because it makes the power of a proprietary DMS available to a wide range of desktop applications.

This API standard, called ODMA (for Open Document Management API) has been built into many different kinds of clients, and is used widely today. It can be viewed as a many-to-one standard, for many different clients to interact with each proprietary DMS in each DMS's own proprietary way. Because each client must be intimately knowledgeable in advance of each DMS with which it will interact, it does a portion of the interoperability job needed by our example, but falls far short of the whole job.

DMA

In parallel, a second, more ambitious standards effort was launched to create interoperability across different proprietary DMSs regardless of the platforms on which they reside and regardless of the networks in which they exist, and without requiring clients to have advance intimate DMS knowledge. The goal is to have uniform access to any document stored in any format, anywhere, at any time. This standard can be combined with the ODMA standard for inexpensive universal client access, and adds what's needed for completely vendor-independent cross-repository interoperability. It's called DMA, for the Document Management Alliance that's creating it, and is a middleware specification for what is truly many-to-many interoperability. That's many clients to many DMSs, regardless of platforms and networks. Because it accommodates international multi-language conventions, it's even language-independent.

Needless to say, the DMA effort is ambitious and sophisticated, because it means that any conforming client, including Web clients, can interact with any conforming DMS without having to know in advance the specific commands and characteristics of each DMS. It enables a client to use its own user interface and command set (look and feel) to store and retrieve objects from different-vendor DMSs, and to discover DMS characteristics when a request is first sent. There are specifications for objects, querying, versioning, containment, check-in/check-out, compound document support, content-based searching, and other aspects of repository management. Most of these are in the DMA 1.0 specification which was formally approved in December 1997. (None of these are included in the ODMA specification.) The priorities for the DMA 2.0 and later levels of the specification will be determined by ongoing user feedback.
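The difference between knowing each DMS in advance and discovering its characteristics at request time can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `Repository` interface, the capability names, and the two vendor classes are invented and do not reproduce the actual DMA 1.0 interfaces.

```python
from abc import ABC, abstractmethod


class Repository(ABC):
    """Uniform interface a conforming DMS might expose (hypothetical, not the DMA API)."""

    @abstractmethod
    def capabilities(self) -> set:
        """Let a client discover what this repository supports."""

    @abstractmethod
    def fetch(self, doc_id: str) -> str:
        """Return the content of one document."""


class VendorA(Repository):
    def __init__(self):
        self._docs = {"case-17": "summary report"}

    def capabilities(self):
        return {"fetch", "versioning", "content-search"}

    def fetch(self, doc_id):
        return self._docs[doc_id]


class VendorB(Repository):
    def __init__(self):
        self._files = {"CASE-17": "state case notes"}

    def capabilities(self):
        return {"fetch"}

    def fetch(self, doc_id):
        # Vendor-specific storage details stay hidden behind the shared interface.
        return self._files[doc_id.upper()]


def retrieve_everywhere(repos: dict, doc_id: str) -> dict:
    """One client routine works against every repository that reports 'fetch'."""
    return {
        name: repo.fetch(doc_id)
        for name, repo in repos.items()
        if "fetch" in repo.capabilities()
    }


def supporting(repos: dict, feature: str) -> list:
    """Discover at run time which repositories offer a given feature."""
    return sorted(name for name, repo in repos.items() if feature in repo.capabilities())


repos = {"hq": VendorA(), "state": VendorB()}
found = retrieve_everywhere(repos, "case-17")
versioned = supporting(repos, "versioning")
```

The client code contains no vendor-specific branches. It asks each repository what it supports and then uses the shared operations, which is the many-to-many property described above in miniature.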

Can you give me an analogy to clarify this?

Think of people, and today's file rooms or records centers that store lots of cabinets containing lots of different files holding tons of paper. The people coming into the rooms, either to store or retrieve, are the clients. The ODMA specification creates a big, well-lit door and entranceway through which anyone can enter, whether on foot, crutches or in a wheelchair, and regardless of gender, race, or nationality. However, to use the file room, ODMA expects that each will know before-hand the file scheme, the rules of the file room, and be able to read and understand the cabinet and file labeling. Each file room is unique, and ODMA expects each user to understand its uniqueness. Because it's an entranceway specification, the ODMA spec is simple and basic.

The DMA specification lets all those unique file rooms be used without requiring advance knowledge of each room, or even the ability to understand the language in which the labels are written. If three different state agencies were to give access to a Federal investigator, the DMA specification lets them say, "We three State agents are going to let a particular Federal agent (to whom we've given permission) use information in our three different States' file rooms without the agent having to know in advance how the contents are organized or labeled in the different file rooms, or even the procedures set up in the rooms." The Federal agent can use a single client computer program - either the Federal agency's proprietary client or an Internet browser - to use all the file rooms simultaneously, despite the differences among the rooms in their organization, labeling, and procedures. In effect, DMA lets the different file rooms look and operate the same way to the Federal agent despite their underlying differences.

Now that's interoperability! One can imagine its power in regulatory activities wherein a government regulatory agency would be able to access regulatees' documents while letting each have the freedom to architect its own document management environment. (Speaking of architecture, all the Federal Government agency IT architecture documents could be made directly accessible to all Federal agency CIOs despite the agencies operating on different platforms, with different office application packages, and using different DMS COTS products.) That's the promise of DMA.

Where do the Internet protocols and technologies play in all this?

The Internet explosion has contributed significantly to the rapid deployment of DMSs in the private sector. First, Internet technology has reduced the cost of deploying DMSs by enabling the use of Internet browsers instead of proprietary client software for end-user desktops, for organizations whose needs can be met in this way. Today, many DMS vendors offer support for both proprietary clients on private networks and also Internet connectivity through Web gateways.

Second, as organizations put documents on their intranets, it spotlights the need for internal control processes for managing the change status, recordkeeping, obsolescence, and disposal of the documents. Putting documents on the Internet's Web magnifies these concerns even more. DMSs are the way to manage document life-cycles, with or without internets, and the introduction of enterprise intranets spurs the need for the interoperability standards to bridge the systems with the intranet sites as well as with one another. Fortunately, the DMA specification meets this need for intranet/Internet host-DMS bridging.

In a related vein, the Internet engineering community is pursuing a method (WebDAV) by which Web pages created by one person in one location with one authoring tool could be revised by different persons in different locations using different authoring tools. Because the method includes "checking out" a Web page and tracking versions of Web pages, some have wondered whether this competes or conflicts with the DMA specification. Actually, the two are extremely complementary, and a collaboration has been established between the two communities of engineers to ensure that the standards align with one another so that users can benefit from them both, together.

Why be concerned with the DMA effort now, when I won't be buying products until I'm over the Year 2000 hurdle?

A very big reason for endorsing the DMA effort now is to accelerate benefits in the related application areas of groupware and workflow. Both of these are strong, getting stronger, and expected to play major roles in IT architectures. The Workflow Management Coalition of vendors and users has developed a standard reference model to be fleshed out, and along the way it will need to address the repository aspects of where workflow objects and processes are stored and retrieved. There will be powerful architectural benefits to user enterprises if those repositories conform to the DMA specification. Similarly, the adoption of the DMA interoperability specification will foster interoperability with groupware products. And as noted above, it also can foster interoperability with records management and archiving products. For all of these, the DMA agency architectural benefits will be powerful.

So the bottom line isn't just interoperability among DMSs, but also interoperability with groupware, workflow, imaging, printing, messaging, records management, and many other information handling products. If it's going to really work, that interoperability must begin where the information objects are stored and managed. The DMA 1.0 Specification is that beginning. Users can support it now in new system and architecture plans, to be reflected subsequently in RFP specifications.

How can I be sure that conforming products will be available to me when I want them?

Because vendors will only build what the market wants, if government CIOs want it in 2001 they must start now to send the message to the vendor community. The message has to be that their agencies will be requiring the specification in procurement actions, for product deliveries beginning in 1999, and that they anticipate communicating with the DMA to prioritize expanded functions for the next levels of the DMA specification.

Agencies that have established relationships with affected vendors can do as several private sector users are doing, namely informing their vendors that implementation of the specification will be required in future versions, releases, upgrades, etc., of their products. Agencies that are developing architectures probably will be identifying several interoperability specifications they expect to be seeking in their future product acquisitions, and the DMA specification can appear on their lists. Agencies that are conducting research can ask about availability of the DMA specification in any Requests for Information that they issue.

Regardless of where an agency stands for procurement of conforming products, it can express its needs and desires within the DMA as a user member. Participation in the DMA can be a way to influence both the delivery of conforming products and the enhancements made in future releases of the specification. The DMA vendors have already identified several candidate additions and improvements but cannot do them all at once. Users and marketplace feedback will set the priorities.



For more information about the DMA, including membership and technical materials on the 1.0 specification, see <http://www.aiim.org/dma>.