Printed documents can be saved in various digital formats. The most popular is PDF (Portable Document Format), followed by OpenXPS (Open XML Paper Specification). Both are standardized, can store any static content, offer basic interactivity and support digital signatures. So what is the difference between them, and why use one over the other?

This article explains the differences between the two standards from a technical point of view. For more general information, please see the relevant Wikipedia articles.

PDF

PDF was first introduced in 1993 by Adobe Systems as a way to share documents among users working on different platforms with mutually incompatible application software. Since 1994 Adobe has offered two applications for working with PDFs: Adobe Acrobat, which reads and creates documents, and Adobe Reader, which only reads them. The former is paid, the latter free. In 2008, PDF version 1.7 became the ISO standard ISO 32000-1:2008, and since then further development has been carried out by an ISO working group with Adobe Systems’ participation.

PDF is the format most widely used and accepted by governments, enterprises and consumers. It offers more dynamic functionality than a purely static document description, and it is readable on almost any platform using free, official reader applications.

Technically, PDF files are text files interleaved with binary parts. They describe a collection of objects organized into a tree structure in which each node is an object. There are eight classes of objects: boolean values, numbers, strings, name objects, arrays, dictionaries, streams and the null object. These objects contain the metadata and data that describe the document. PDF files contain four sections: header, body, cross-reference table and trailer. We’ll explain their contents using our Hello World.pdf example:

The header section contains only simple version information:

%PDF-1.4

The body section contains the objects that describe the document. An object can be defined either at the global level or inside another object’s definition. If defined globally, its description begins with the object’s ID, its version number (called the generation number in the PDF specification) and the obj keyword. The object type is determined by syntax. For example, <<key value key value>> represents a dictionary (key-value pairs), /name is a name object, [value value value] is an array, and an ID VersionNumber R block is a reference to another object. For more information about PDF file syntax, I recommend the series of articles by IDR solutions, from which I borrowed this example.

First we define a catalog (object 1) containing a page tree (object 2), containing a single page (object 3). The defined page (object 3) is set to be of size 500×800, to contain resources (object 4) and contents (object 6).

1 0 obj <</Type /Catalog /Pages 2 0 R>>
endobj
2 0 obj <</Type /Pages /Kids [3 0 R] /Count 1>>
endobj
3 0 obj<</Type /Page /Parent 2 0 R /Resources 4 0 R /MediaBox [0 0 500 800] /Contents 6 0 R>>
endobj

Now we have to specify the resources (object 4). We add a single font resource (object 5) identified as “F1”. The font is only referenced in this case, not embedded. For an arbitrary font this would cause problems on a computer that doesn’t have it installed; Helvetica, however, is one of the 14 standard fonts that PDF readers are expected to supply or substitute.

4 0 obj<</Font <</F1 5 0 R>>>>
endobj
5 0 obj<</Type /Font /Subtype /Type1 /BaseFont /Helvetica>>
endobj

Finally, we add the “Hello World!” string to the page (object 3) in its content stream (object 6), using font F1 (object 5) at size 24 and position 175,720.

6 0 obj
<</Length 44>>
stream
BT /F1 24 Tf 175 720 Td (Hello World!)Tj ET
endstream
endobj
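The object syntax used in the body above can be sketched in a few lines of Python. This is only an illustration, not a PDF library: the `Name` and `Ref` classes and the `serialize` function are hypothetical helpers introduced here, streams are left out, and number formatting is simplified.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Name:
    """A PDF name object such as /Pages (hypothetical helper type)."""
    value: str


@dataclass(frozen=True)
class Ref:
    """An indirect reference such as `3 0 R` (hypothetical helper type)."""
    obj_id: int
    generation: int = 0


def serialize(value) -> str:
    """Render a Python value using the PDF object syntax described above."""
    if value is None:
        return "null"
    if isinstance(value, bool):  # checked before int: bool is an int subclass
        return "true" if value else "false"
    if isinstance(value, (int, float)):
        return str(value)
    if isinstance(value, Name):
        return "/" + value.value
    if isinstance(value, Ref):
        return f"{value.obj_id} {value.generation} R"
    if isinstance(value, str):
        # Backslashes and parentheses are escaped inside literal strings.
        escaped = value.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"
    if isinstance(value, list):
        return "[" + " ".join(serialize(item) for item in value) + "]"
    if isinstance(value, dict):
        pairs = " ".join(f"/{key} {serialize(item)}" for key, item in value.items())
        return f"<<{pairs}>>"
    raise TypeError(f"unsupported type: {type(value).__name__}")


# The page tree (object 2) from the example above.
page_tree = {"Type": Name("Pages"), "Kids": [Ref(3)], "Count": 1}
print(serialize(page_tree))
```

Serializing `page_tree` reproduces the dictionary of object 2 character for character, which is a convenient check that the syntax rules were applied correctly.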

The cross-reference table, the third part of a PDF document, lists all objects in the document (plus some helper entries) with their byte offsets from the beginning of the file and their version (generation) numbers. This lets a reader jump directly to a desired object without scanning the whole file.

xref
0 7
0000000000 65535 f
0000000009 00000 n
0000000056 00000 n
0000000111 00000 n
0000000212 00000 n
0000000250 00000 n
0000000317 00000 n
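To make the offsets concrete, here is a minimal Python sketch that derives them from the raw bytes of the header and the first three objects. It assumes single line-feed line endings in the body, and the function names `object_offsets` and `xref_table` are made up for this illustration.

```python
def object_offsets(header: bytes, objects: list[bytes]) -> list[int]:
    """Return the byte offset of each object, counted from the start of the file."""
    offsets = []
    position = len(header)
    for obj in objects:
        offsets.append(position)
        position += len(obj)
    return offsets


def xref_table(offsets: list[int]) -> bytes:
    """Format a cross-reference table for objects 1..n, all with generation 0."""
    lines = [b"xref\n", f"0 {len(offsets) + 1}\n".encode("ascii")]
    # Entry 0 is the head of the free list; each entry is 20 bytes including its EOL.
    lines.append(b"0000000000 65535 f\r\n")
    for offset in offsets:
        lines.append(f"{offset:010d} 00000 n\r\n".encode("ascii"))
    return b"".join(lines)


header = b"%PDF-1.4\n"
objects = [
    b"1 0 obj <</Type /Catalog /Pages 2 0 R>>\nendobj\n",
    b"2 0 obj <</Type /Pages /Kids [3 0 R] /Count 1>>\nendobj\n",
    b"3 0 obj<</Type /Page /Parent 2 0 R /Resources 4 0 R "
    b"/MediaBox [0 0 500 800] /Contents 6 0 R>>\nendobj\n",
]
offsets = object_offsets(header, objects)
print(offsets)
print(xref_table(offsets).decode("ascii"))
```

The first three computed offsets agree with the 9, 56 and 111 listed in the table above; the remaining ones depend on the exact bytes of the later objects.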

The trailer, the final part of a PDF document, contains information about the file, e.g. the number of entries in the cross-reference table (/Size), the root object (/Root) and the byte offset of the cross-reference table from the beginning of the file (startxref).

trailer <</Size 7/Root 1 0 R>>
startxref
406
%%EOF
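A reader typically starts at the end of the file: it looks for the startxref keyword, reads the offset that follows and jumps to the cross-reference table. The sketch below shows that first step only. The function name `find_startxref` and the 1024-byte tail window are assumptions made for this example.

```python
import re


def find_startxref(pdf_bytes: bytes) -> int:
    """Return the cross-reference table offset recorded before the final %%EOF."""
    tail = pdf_bytes[-1024:]  # assumed window; the trailer sits at the very end
    matches = re.findall(rb"startxref\s+(\d+)\s+%%EOF", tail)
    if not matches:
        raise ValueError("startxref not found")
    # A file may carry several trailers; the last one is the current one.
    return int(matches[-1])


tail_of_file = b"trailer <</Size 7/Root 1 0 R>>\nstartxref\n406\n%%EOF\n"
print(find_startxref(tail_of_file))
```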

The Hello World.pdf file can be downloaded here.

OpenXPS

Microsoft began developing OpenXPS (OXPS) in 2003 as a unified print spool file format and as an alternative to Adobe’s (at the time) proprietary PDF. The format was first introduced in 2006 in Windows Vista as Microsoft XPS (MSXPS). Since then, Microsoft has offered free MSXPS virtual printer drivers, viewer programs and APIs for all its operating systems starting from Windows XP. In 2009 the format was standardized by Ecma International as ECMA-388, with light modifications, under the name OpenXPS. The modifications made it incompatible with the earlier Microsoft XPS format and its software. Support for OpenXPS is, however, built into Windows 8 and is planned for future versions.

The OpenXPS format is used or accepted by some government agencies and enterprises. Its advantages over PDF have weakened over time (most notably, Microsoft Office support for the two formats is virtually equivalent in the latest version of the suite), but some persist: because the format uses standard XML technologies, it is easier for developers to work with, especially those using Microsoft technologies, and the availability of free official print drivers is also a notable advantage.

Documents in OpenXPS format are based on Open Packaging Conventions (OPC), a container-file technology that is part of the Office Open XML (OOXML) standards (ECMA-376 and ISO/IEC 29500). Using OPC means that OpenXPS files are in fact ordinary ZIP files containing collections of XML and non-XML files. Any OpenXPS file can therefore be opened simply by changing its extension to “.zip”. The content of OpenXPS files (metadata files, folders, content-description files, resource files) is defined by both the OPC and OpenXPS standards. Documents and their contents are described using a XAML-based syntax, while resources such as images or fonts are attached as referenced files.

Creating a Hello World document in OpenXPS isn’t as simple as creating a single file. Both OPC and OpenXPS favor splitting information into separate files within a ZIP archive, so even one simple page requires several files. Our Hello World.oxps ZIP archive will contain the following files:
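Because an OpenXPS file is an ordinary ZIP archive, any ZIP library can look inside it without renaming the file. The following sketch uses Python’s standard zipfile module; `list_parts` is a hypothetical helper, and the archive built in memory is only a stand-in for a real .oxps file.

```python
import io
import zipfile


def list_parts(package) -> list[str]:
    """Return OPC-style part names (with a leading slash) for every file in the archive."""
    with zipfile.ZipFile(package) as archive:
        return ["/" + info.filename for info in archive.infolist() if not info.is_dir()]


# Stand-in package built in memory; a real call would pass a path to an .oxps file.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("_rels/.rels", "<Relationships/>")
buffer.seek(0)

parts = list_parts(buffer)
print(parts)
```

ZIP entries are stored without a leading slash, while OPC part names start with one, which is why the helper adds it.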

/[Content_Types].xml
/FixedDocumentSequence.fdseq
/_rels/.rels
/Documents/1/FixedDocument.fdoc
/Documents/1/Pages/1.fpage
/Documents/1/Pages/_rels/1.fpage.rels
/Documents/1/Resources/Fonts/3FB560F5-A56A-4FC7-A3F8-3E5AE1DB8896.odttf

The embedded font is obfuscated and has a GUID name. This is currently necessary for the OpenXPS Viewer program to open OpenXPS files without error. According to the OpenXPS standard, however, it should not be required, and non-obfuscated fonts should also be usable.

A description of the Hello World.oxps package files follows:

/[Content_Types].xml [OPC standard] Contains an explicit specification of the content types of the extensions used. In our case it specifies types for the OpenXPS files and two other extensions (rels, odttf).

<?xml version="1.0" encoding="UTF-8"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml" />
<Default Extension="fdseq" ContentType="application/vnd.ms-package.xps-fixeddocumentsequence+xml" />
<Default Extension="fdoc" ContentType="application/vnd.ms-package.xps-fixeddocument+xml" />
<Default Extension="fpage" ContentType="application/vnd.ms-package.xps-fixedpage+xml" />
<Default Extension="odttf" ContentType="application/vnd.ms-package.obfuscated-opentype" />
</Types>
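A consumer can turn this file into a simple lookup table from extension to content type. The sketch below does so with Python’s standard ElementTree parser; the function name `default_content_types` is an assumption, and Override elements, which OPC also allows, are ignored for brevity.

```python
import xml.etree.ElementTree as ET

CT_NS = "http://schemas.openxmlformats.org/package/2006/content-types"


def default_content_types(xml_bytes: bytes) -> dict[str, str]:
    """Map each file extension to its content type, based on the Default elements."""
    root = ET.fromstring(xml_bytes)
    return {
        element.get("Extension").lower(): element.get("ContentType")
        for element in root.findall(f"{{{CT_NS}}}Default")
    }


content_types_xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default Extension="fpage" ContentType="application/vnd.ms-package.xps-fixedpage+xml" />
<Default Extension="odttf" ContentType="application/vnd.ms-package.obfuscated-opentype" />
</Types>"""

types = default_content_types(content_types_xml)
print(types["fpage"])
```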

/_rels/.rels [OPC standard] This is usually the first file opened by an OPC parser, as it contains references to package-level files and specifies their types. In our case it contains a reference to FixedDocumentSequence.fdseq, which is the root file of the OpenXPS format.

<?xml version="1.0" encoding="UTF-8"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="R0" Type="http://schemas.openxps.org/oxps/v1.0/fixedrepresentation" Target="FixedDocumentSequence.fdseq" />
</Relationships>
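Finding the root file therefore comes down to scanning the relationships for the fixedrepresentation type. A minimal sketch, assuming the relationship type URI shown above; the helper name `find_root_target` is made up for this example.

```python
import xml.etree.ElementTree as ET

RELS_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
ROOT_TYPE = "http://schemas.openxps.org/oxps/v1.0/fixedrepresentation"


def find_root_target(rels_xml: bytes) -> str:
    """Return the Target of the package-level fixedrepresentation relationship."""
    root = ET.fromstring(rels_xml)
    for relationship in root.findall(f"{{{RELS_NS}}}Relationship"):
        if relationship.get("Type") == ROOT_TYPE:
            return relationship.get("Target")
    raise ValueError("no fixedrepresentation relationship found")


rels_xml = b"""<?xml version="1.0" encoding="UTF-8"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="R0" Type="http://schemas.openxps.org/oxps/v1.0/fixedrepresentation" Target="FixedDocumentSequence.fdseq" />
</Relationships>"""

root_target = find_root_target(rels_xml)
print(root_target)
```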

/FixedDocumentSequence.fdseq [OpenXPS standard] Contains a list of specification files of documents contained in the OpenXPS package.

<?xml version="1.0" encoding="UTF-8"?>
<FixedDocumentSequence xmlns="http://schemas.openxps.org/oxps/v1.0">
<DocumentReference Source="Documents/1/FixedDocument.fdoc" />
</FixedDocumentSequence>

/Documents/1/FixedDocument.fdoc [OpenXPS standard] Contains a list of page specification files of a document.

<?xml version="1.0" encoding="UTF-8"?>
<FixedDocument xmlns="http://schemas.openxps.org/oxps/v1.0">
<PageContent Source="Pages/1.fpage" />
</FixedDocument>
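Together, the two files above form a chain from the root file to the individual pages. The sketch below follows that chain over an in-memory dictionary of parts, resolving each Source relative to the part that contains it. The `parts` dictionary and the helper names `resolve` and `list_pages` are assumptions made for this illustration.

```python
import posixpath
import xml.etree.ElementTree as ET

OXPS_NS = "http://schemas.openxps.org/oxps/v1.0"


def resolve(base_part: str, source: str) -> str:
    """Resolve a Source attribute relative to the part that contains it."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(base_part), source))


def list_pages(parts: dict[str, bytes], root_part: str) -> list[str]:
    """Follow fdseq -> fdoc -> fpage and return the page part names in order."""
    pages = []
    sequence = ET.fromstring(parts[root_part])
    for doc_ref in sequence.findall(f"{{{OXPS_NS}}}DocumentReference"):
        doc_part = resolve(root_part, doc_ref.get("Source"))
        document = ET.fromstring(parts[doc_part])
        for page in document.findall(f"{{{OXPS_NS}}}PageContent"):
            pages.append(resolve(doc_part, page.get("Source")))
    return pages


parts = {
    "/FixedDocumentSequence.fdseq": (
        b'<FixedDocumentSequence xmlns="http://schemas.openxps.org/oxps/v1.0">'
        b'<DocumentReference Source="Documents/1/FixedDocument.fdoc" />'
        b"</FixedDocumentSequence>"
    ),
    "/Documents/1/FixedDocument.fdoc": (
        b'<FixedDocument xmlns="http://schemas.openxps.org/oxps/v1.0">'
        b'<PageContent Source="Pages/1.fpage" />'
        b"</FixedDocument>"
    ),
}
pages = list_pages(parts, "/FixedDocumentSequence.fdseq")
print(pages)
```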

/Documents/1/Pages/1.fpage [OpenXPS standard] Contains the specification of a page’s content. In our case it places the “Hello World!” text on an A4 page. The font used must always be included in the OpenXPS package and referenced in the FontUri attribute.

<FixedPage Width="793.76" Height="1122.56" xmlns="http://schemas.openxps.org/oxps/v1.0" xml:lang="en">
<Glyphs Fill="#ff000000"
FontUri="../Resources/Fonts/3FB560F5-A56A-4FC7-A3F8-3E5AE1DB8896.odttf"
FontRenderingEmSize="20"
OriginX="100"
OriginY="100"
UnicodeString="Hello World!" />
</FixedPage>
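The Width and Height values are expressed in units of 1/96 inch. As a rough check of the A4 dimensions, millimetres can be converted as in this small sketch (the function name `mm_to_xps_units` is made up for the example):

```python
XPS_UNITS_PER_INCH = 96
MM_PER_INCH = 25.4


def mm_to_xps_units(millimetres: float) -> float:
    """Convert a length in millimetres to XPS units of 1/96 inch."""
    return millimetres / MM_PER_INCH * XPS_UNITS_PER_INCH


a4_width = mm_to_xps_units(210)
a4_height = mm_to_xps_units(297)
print(f"{a4_width:.2f} x {a4_height:.2f}")
```

The computed size is within a tenth of a unit of the Width and Height used in 1.fpage; the small difference most likely comes from rounding the page size in inches.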

/Documents/1/Pages/_rels/1.fpage.rels [OPC standard] Contains information about the dependencies of the 1.fpage file. In our case it specifies that the page depends on the font file it uses.

<?xml version="1.0" encoding="UTF-8"?>
<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="R0" Type="http://schemas.openxps.org/oxps/v1.0/required-resource" Target="../Resources/Fonts/3FB560F5-A56A-4FC7-A3F8-3E5AE1DB8896.odttf" />
</Relationships>
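Two naming rules are at work here: the relationships file of a part lives in a _rels folder next to that part, and a relative Target is resolved against the source part (here 1.fpage), not against the .rels file itself. A minimal sketch of both rules, with hypothetical helper names:

```python
import posixpath


def rels_part_name(part: str) -> str:
    """Return the name of the relationships file for a part ("/" means the package)."""
    directory, name = posixpath.split(part)
    return posixpath.join(directory, "_rels", name + ".rels")


def resolve_target(source_part: str, target: str) -> str:
    """Resolve a relationship Target relative to the source part's directory."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(source_part), target))


page = "/Documents/1/Pages/1.fpage"
font_target = "../Resources/Fonts/3FB560F5-A56A-4FC7-A3F8-3E5AE1DB8896.odttf"

print(rels_part_name("/"))
print(rels_part_name(page))
print(resolve_target(page, font_target))
```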

The Hello World.oxps file can be downloaded here.
