Introducing the Office (2007) Open XML File Formats
The Office XML Formats enable or improve many kinds of document-based solutions that you can build. You can access the contents of an Office document in the new file formats by using any tool or technology capable of working with ZIP archives. You can then manipulate the document content using any standard XML processing techniques or, for parts that exist as embedded native formats (such as images), process them using any appropriate tool for that object type. In addition, being able to open the container file of a 2007 Microsoft Office system document manually as a ZIP archive has some interesting benefits for developers.
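The ZIP-plus-XML access described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not a complete reader: the sample package built by `build_sample_docx` is hypothetical and far smaller than a real Word file, and the part name `word/document.xml` and the WordprocessingML namespace URI are those of the published Open XML standard, which is assumed here to match the format the article describes.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespace of the main WordprocessingML vocabulary (assumption: the
# published Open XML standard, not necessarily the beta described here).
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def build_sample_docx() -> bytes:
    """Build a tiny, hypothetical package in memory (not a full Word file)."""
    document_xml = (
        '<w:document xmlns:w="' + W_NS + '"><w:body>'
        "<w:p><w:r><w:t>Hello, Open XML</w:t></w:r></w:p>"
        "</w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", document_xml)
        # Binary parts such as images sit next to the XML parts.
        zf.writestr("word/media/image1.png", b"\x89PNG placeholder bytes")
    return buf.getvalue()


def list_parts(package: bytes) -> list[str]:
    """Return the names of all parts stored in the ZIP container."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        return zf.namelist()


def read_paragraph_text(package: bytes, part_name: str = "word/document.xml") -> list[str]:
    """Parse one XML part and return the text of each paragraph."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        root = ET.fromstring(zf.read(part_name))
    paragraphs = []
    for p in root.iter("{%s}p" % W_NS):
        paragraphs.append("".join(t.text or "" for t in p.iter("{%s}t" % W_NS)))
    return paragraphs
```

For a file on disk, the same functions would work on the bytes returned by `open(path, "rb").read()`.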
For example, when building Office-based solutions, you can examine the contents and structure of a document without having to write any code. This facility can be very helpful in solution design and when building prototypes. After you are inside a 2007 Microsoft Office system document, the structure makes it easy to navigate a document's parts and its relationships, whether to locate information, change content, or remove elements from a document. The use of XML, along with the published Office reference schemas, means you can easily create additional documents, add data to existing documents, or search for specific content in a body of documents. The rest of this article explores some scenarios in which Office XML Formats enable document-based solutions. These few are only part of an almost endless list of possibilities: data interoperability, content manipulation, content sharing and reuse, document assembly, document security, managing sensitive information, document styling, and document profiling.
Note: Do not confuse the Office XML Formats with the Microsoft Windows XML Paper Specification format. Office XML Formats also use the Open Packaging Conventions.
Data Interoperability
The emergence of XML as a popular standard for data exchange means the new Office XML Formats make document-based data more accessible among heterogeneous systems. Whether users are sharing document data across a department, or two organizations are trading business data, XML as a default file format for Microsoft Office documents means Office applications can participate in business processes without the limitations previously imposed by the binary formats. The openness of the new file formats unlocks data and introduces a broad, new level of integration beyond the desktop. For example, you could refer to the published specification of the new file formats to create data-rich documents without using an Office application.
Server-side applications could process documents in bulk to enable large-scale solutions that mesh enterprise data within the familiar, flexible Office applications. You could use standard XML technologies, such as XPath (a common XML query language) and Extensible Stylesheet Language Transformations (XSLT), to retrieve data from documents or to update the contents inside a document from external data. One such scenario could involve personalizing thousands of documents to distribute to customers. You could insert information programmatically into a standard document template by using a server application that uses XML that you extracted from an enterprise database or customer relationship management (CRM) application. Creating these documents is highly efficient because there is no requirement to run Office applications, yet the capability still exists for producing high-quality, rich Office documents.
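The personalization scenario above might look roughly like the following sketch. It assumes a hypothetical template whose text contains placeholders such as `{name}`, and it uses a plain dictionary to stand in for a row pulled from a database or CRM system. The template built by `build_template` is illustrative only, and the namespace URI is the one from the published Open XML standard.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
ET.register_namespace("w", W_NS)  # keep the familiar "w:" prefix on output


def build_template() -> bytes:
    """Hypothetical one-paragraph template with a {name} placeholder."""
    document_xml = (
        '<w:document xmlns:w="' + W_NS + '"><w:body><w:p><w:r>'
        "<w:t>Dear {name},</w:t></w:r></w:p></w:body></w:document>"
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", document_xml)
    return buf.getvalue()


def personalize(template: bytes, values: dict[str, str]) -> bytes:
    """Copy the template package, filling placeholders in the main part."""
    src = zipfile.ZipFile(io.BytesIO(template))
    out = io.BytesIO()
    with src, zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "word/document.xml":
                root = ET.fromstring(data)
                for t in root.iter("{%s}t" % W_NS):
                    for key, value in values.items():
                        t.text = (t.text or "").replace("{" + key + "}", value)
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            # All other parts are copied through unchanged.
            dst.writestr(item.filename, data)
    return out.getvalue()
```

Calling `personalize(build_template(), row)` once per customer row produces one package per customer without starting an Office application. In documents authored in Word, a placeholder can be split across several text runs, so a production version would need to handle that case.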
The use of custom schemas in Office is another way you can use documents to share data. Information that was once locked in a binary format is now easily accessible and, therefore, documents can serve as openly exchangeable data sources. Custom schemas not only make insertion or extraction of data simple, they also add structure to documents and are capable of enforcing data validation.
Content Manipulation
Editing the contents of existing Office documents is another valuable example where Office XML Formats enhance a process. The edit could involve updating small amounts of data, swapping entire parts, removing parts, or adding new parts altogether.
By using relationships and parts, the new file formats make content easier to find and manipulate. The use of XML and XML schema means you can use common XML technologies, such as XPath and XSLT, to edit data within document parts in virtually endless ways. One scenario might involve the need to edit text in the header of a Word document.
Of course, it is not logical to automate that task for one document. But, in another scenario, what if a company merged and needed to put its new name in the header of hundreds of different pieces of documentation? A developer could write code that loops through all the documents, locates the header part in the Word file structure, and performs an XPath query to find the old text. Then it could insert the new text, replace the header part, and repeat the process until every document is updated. Automation could save a lot of time, enable a process that might otherwise not be attempted, and prevent potential errors that might occur during a manual process.
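The header-update loop described above can be sketched as follows. This is an illustration under stated assumptions rather than a definitive implementation: the relationship type and namespace URIs are those of the published Open XML standard, the company names and the sample package built by `build_sample_docx` are hypothetical, and the query relies on the limited XPath subset that Python's `xml.etree.ElementTree` supports.

```python
import io
import posixpath
import zipfile
import xml.etree.ElementTree as ET

W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
REL_NS = "http://schemas.openxmlformats.org/package/2006/relationships"
HEADER_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships/header"
ET.register_namespace("w", W_NS)


def build_sample_docx(company: str) -> bytes:
    """Hypothetical minimal package: main part, its relationships, one header."""
    parts = {
        "word/document.xml": '<w:document xmlns:w="%s"><w:body><w:p><w:r>'
        "<w:t>Body text</w:t></w:r></w:p></w:body></w:document>" % W_NS,
        "word/_rels/document.xml.rels": '<Relationships xmlns="%s"><Relationship '
        'Id="rId1" Type="%s" Target="header1.xml"/></Relationships>' % (REL_NS, HEADER_REL),
        "word/header1.xml": '<w:hdr xmlns:w="%s"><w:p><w:r><w:t>%s Confidential'
        "</w:t></w:r></w:p></w:hdr>" % (W_NS, company),
    }
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        for name, xml_text in parts.items():
            zf.writestr(name, xml_text)
    return buf.getvalue()


def header_part_names(zf: zipfile.ZipFile) -> list[str]:
    """Follow the main part's relationships to find its header parts."""
    rels = ET.fromstring(zf.read("word/_rels/document.xml.rels"))
    names = []
    for rel in rels.findall("{%s}Relationship" % REL_NS):
        if rel.get("Type") == HEADER_REL:
            # Targets are resolved relative to the folder of the source part.
            target = posixpath.normpath(posixpath.join("word", rel.get("Target")))
            names.append(target.lstrip("/"))
    return names


def rename_in_headers(package: bytes, old: str, new: str) -> tuple[bytes, int]:
    """Return a new package with `old` replaced by `new` in header text runs."""
    src = zipfile.ZipFile(io.BytesIO(package))
    out = io.BytesIO()
    changed = 0
    with src, zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        headers = set(header_part_names(src))
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename in headers:
                root = ET.fromstring(data)
                # XPath query for every text run in the header part.
                for t in root.findall(".//w:t", {"w": W_NS}):
                    if t.text and old in t.text:
                        t.text = t.text.replace(old, new)
                        changed += 1
                data = ET.tostring(root, encoding="utf-8", xml_declaration=True)
            dst.writestr(item.filename, data)
    return out.getvalue(), changed
```

A batch job would call `rename_in_headers` once per file and write back only the packages for which the returned count is greater than zero.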
Another scenario might be one in which an existing Office document, such as an Excel 2007 file, must be updated by changing only an entire part. This kind of updating also applies to binary parts. You could swap an existing image, or even an OLE object, out for a new one, as necessary. You could update a Microsoft Office Visio drawing embedded as an OLE object in Office documents, for example, by overwriting that binary part. You could update URLs in hyperlinks to point to new locations. Following are some additional application-specific scenarios.
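Swapping a whole part, as described above for images and OLE objects, amounts to rewriting the ZIP container with one member replaced. The sketch below assumes nothing about the part's content. The part names in the sample package are hypothetical, and a real solution would also need to keep the package's content types and relationships consistent if the new part has a different format.

```python
import io
import zipfile


def build_sample_package() -> bytes:
    """Hypothetical package with one XML part and one binary image part."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("word/document.xml", "<document/>")
        zf.writestr("word/media/image1.png", b"old image bytes")
    return buf.getvalue()


def replace_part(package: bytes, part_name: str, new_data: bytes) -> bytes:
    """Return a copy of the package in which one part is overwritten."""
    src = zipfile.ZipFile(io.BytesIO(package))
    out = io.BytesIO()
    with src, zipfile.ZipFile(out, "w", zipfile.ZIP_DEFLATED) as dst:
        if part_name not in src.namelist():
            raise KeyError("no such part: " + part_name)
        for item in src.infolist():
            if item.filename == part_name:
                dst.writestr(item.filename, new_data)  # swapped part
            else:
                dst.writestr(item.filename, src.read(item.filename))  # copied as-is
    return out.getvalue()
```

The ZIP format has no in-place update in Python's `zipfile` module, which is why the sketch copies every other part into a fresh archive.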
Content Manipulation in Word 2007
It is a common business practice to incorporate "boilerplate" text inside a Word document. For example, an official legal disclaimer or a disclosure of terms and conditions can be required in every public document created by an organization. Another typical example of boilerplate is a "Company Overview" section that is used in authoring sales proposals or public releases of company announcements. Word offers features, such as AutoText, that can insert formatted text, but they are limited in scale because they require either Word automation or direct user interaction. Word 2007 offers a very flexible alternative for you to insert content into a document.
The Word XML Format allows you to add document parts, called document building blocks, that are referred to by the overall document when it opens in Word. This means you can build a library of document building blocks, which you can derive from document formats that Word is capable of rendering, and programmatically reuse them as needed in Word document solutions. This broader ability to manipulate Word content offers some interesting scenarios, such as server-side document assembly. Going back to the example given previously, you can automatically insert a legal disclaimer into a document created on a server. Imagine a multinational company that requires that all of its documents contain a legal disclaimer in local languages.
The company could create the appropriate language-specific disclaimers as separate document fragments. An application that is constructing documents can insert the corresponding document fragment for the required language as a part inside the document container. This fragment is then rendered as a seamless part of a Word document.
Content Manipulation in Excel 2007
To optimize loading and saving performance and file size, Excel 2007 stores each unique text value only once in the Excel file. To do so, Excel implements a shared string table in a document part specified by the target of the http://schemas. ... SharedStrings relationship.
Each unique text value found within a workbook is listed once in this part. Individual worksheet cells then reference the string table to derive their values. While this process optimizes the Excel XML Format, it also introduces some interesting opportunities for additional content manipulation solutions.
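The lookup from cells into the shared string table can be sketched as below. The part names `xl/sharedStrings.xml` and `xl/worksheets/sheet1.xml`, the SpreadsheetML namespace URI, and the `t="s"` cell attribute come from the published Open XML standard and are assumptions relative to this article. A robust reader would locate the parts through relationships rather than hard-coded paths. The sample workbook is hypothetical and minimal.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

S_NS = "http://schemas.openxmlformats.org/spreadsheetml/2006/main"


def build_sample_xlsx() -> bytes:
    """Hypothetical package with a string table and one small worksheet."""
    shared = (
        '<sst xmlns="%s" count="3" uniqueCount="2">'
        "<si><t>Region</t></si><si><t>North</t></si></sst>" % S_NS
    )
    sheet = (
        '<worksheet xmlns="%s"><sheetData>'
        '<row r="1"><c r="A1" t="s"><v>0</v></c><c r="B1"><v>42</v></c></row>'
        '<row r="2"><c r="A2" t="s"><v>1</v></c><c r="B2" t="s"><v>1</v></c></row>'
        "</sheetData></worksheet>" % S_NS
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("xl/sharedStrings.xml", shared)
        zf.writestr("xl/worksheets/sheet1.xml", sheet)
    return buf.getvalue()


def load_shared_strings(zf: zipfile.ZipFile) -> list[str]:
    """Return the string table as a list; the list index is the string id."""
    root = ET.fromstring(zf.read("xl/sharedStrings.xml"))
    strings = []
    for si in root.findall("{%s}si" % S_NS):
        # Join all text nodes so multi-run (rich text) items read as one string.
        strings.append("".join(t.text or "" for t in si.iter("{%s}t" % S_NS)))
    return strings


def cell_values(package: bytes, sheet_part: str = "xl/worksheets/sheet1.xml") -> dict[str, str]:
    """Map cell references to values, resolving shared-string indexes."""
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        strings = load_shared_strings(zf)
        sheet = ET.fromstring(zf.read(sheet_part))
    values = {}
    for c in sheet.iter("{%s}c" % S_NS):
        v = c.find("{%s}v" % S_NS)
        if v is None:
            continue  # empty cell
        # t="s" marks a cell whose <v> is an index into the string table.
        values[c.get("r")] = strings[int(v.text)] if c.get("t") == "s" else v.text
    return values
```

In the sample, cells A2 and B2 both point at index 1, so the text "North" is stored once even though it appears twice in the sheet.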
Developers in a multinational organization could use the shared string table to offer a level of multilanguage support. Instead of building unique workbooks for each language supported, a single workbook could use string tables that correspond to different languages. Another possibility is to use string tables to search for keyword terms inside a collection of workbooks. Processing a single, text-only XML document of strings is faster and simpler than having to manipulate the Excel object model over many worksheets and workbooks.
Content Manipulation in PowerPoint 2007
When a PowerPoint 2007 presentation is stored using the PowerPoint XML Format, the content remains highly accessible. Because this is the first version of PowerPoint to offer an XML format, it opens up many scenarios not possible in previous versions. You now have full access to slides and slide notes as text. Solutions that require searching, indexing, and creating presentation content are now possible.
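As a rough illustration of the searching and indexing scenario above, the following sketch pulls the text out of each slide part and finds slides that mention a keyword. The `ppt/slides/slideN.xml` naming pattern and the PresentationML and DrawingML namespace URIs are taken from the published Open XML standard and are assumptions relative to this article. The sample presentation is hypothetical and omits everything a real file needs beyond the slide parts.

```python
import io
import re
import zipfile
import xml.etree.ElementTree as ET

P_NS = "http://schemas.openxmlformats.org/presentationml/2006/main"
A_NS = "http://schemas.openxmlformats.org/drawingml/2006/main"
SLIDE_RE = re.compile(r"ppt/slides/slide(\d+)\.xml")


def slide_xml(texts: list[str]) -> str:
    """Build a hypothetical slide part with one text shape per string."""
    shapes = "".join(
        "<p:sp><p:txBody><a:p><a:r><a:t>%s</a:t></a:r></a:p></p:txBody></p:sp>" % t
        for t in texts
    )
    return (
        '<p:sld xmlns:p="%s" xmlns:a="%s"><p:cSld><p:spTree>%s'
        "</p:spTree></p:cSld></p:sld>" % (P_NS, A_NS, shapes)
    )


def build_sample_pptx() -> bytes:
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("ppt/slides/slide1.xml", slide_xml(["Quarterly results"]))
        zf.writestr("ppt/slides/slide2.xml", slide_xml(["Roadmap", "Next steps"]))
    return buf.getvalue()


def slide_texts(package: bytes) -> dict[str, list[str]]:
    """Map each slide part name to the text runs found on that slide."""
    result = {}
    with zipfile.ZipFile(io.BytesIO(package)) as zf:
        names = [n for n in zf.namelist() if SLIDE_RE.fullmatch(n)]
        # Sort numerically so slide10 comes after slide2.
        names.sort(key=lambda n: int(SLIDE_RE.fullmatch(n).group(1)))
        for name in names:
            root = ET.fromstring(zf.read(name))
            result[name] = [t.text for t in root.iter("{%s}t" % A_NS) if t.text]
    return result


def find_slides(package: bytes, keyword: str) -> list[str]:
    """Return the slide parts whose text contains the keyword (case-insensitive)."""
    needle = keyword.lower()
    return [
        name
        for name, texts in slide_texts(package).items()
        if any(needle in text.lower() for text in texts)
    ]
```

The part-name order reflects file numbering only. The presentation order of slides is recorded elsewhere in the package, so an indexer that cares about sequence would need to read that as well.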
You can easily produce data-driven presentations using XML. And you can access slide masters and slide layouts through XML parts to programmatically format existing or new PowerPoint presentations. You can take a different approach to assembling or reusing content from PowerPoint presentations by building an application that uses a catalog of slides stored independently of existing presentations. Because slides are represented as individual XML parts, a solution can optimize the way an organization stores and manages PowerPoint 2007 slides as data. You can even write a slide "viewer" that allows a user to discover and select slides to build a presentation from outside PowerPoint. The application can even be Web-based to allow centralized management.
Content Sharing and Reuse
The modularity of Office XML Formats opens up the possibility for generating content once and then repurposing it in a number of other documents. As a developer, you can imagine building a number of core templates and reusing portions as building blocks for other documents.