Wednesday, May 5, 2010

Content Controls and Open Xml 2.0 SDK

 

I’ve been working on Microsoft Word automation with Open Xml, Microsoft.Office.Interop.Word and the Open Xml 2.0 SDK. In this post I’ll focus on Content Controls and the Open Xml 2.0 SDK, based on the experience I gained over the last 2 months.

I’ll discuss the following points:

  • Add Custom Xml part to WordprocessingDocument
  • Get Custom Xml part from WordprocessingDocument
  • Each content control has a unique ID that Word assigns when the control is created (the issues this may cause and how they can be handled)
  • Convert in-memory Document to Bytes without saving to a File


Add Custom Xml part to WordprocessingDocument:

1. Get the MainDocumentPart

MainDocumentPart mainPart = doc.MainDocumentPart;

2. Define a root element for the Custom Xml part

string customXmlPartNamespace = "http://schemas.microsoft.com/Test.Sample";
string rootNodeName = "TestCoverageRoot";
XName rootName = XName.Get(rootNodeName, customXmlPartNamespace);
XElement rootElement = new XElement(rootName);

3. The method in the code snippet below does the rest

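// Requires: System.IO, System.Xml.Linq, DocumentFormat.OpenXml.Packaging
public static CustomXmlPart AddCustomXmlPart(MainDocumentPart mainPart, XElement rootElement)
{
    CustomXmlPart customXmlPart = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);

    // FileMode.Create starts from an empty part stream before the root element is written
    using (StreamWriter sw = new StreamWriter(customXmlPart.GetStream(FileMode.Create, FileAccess.Write)))
    {
        sw.Write(rootElement.ToString());
    }   // disposing the writer flushes and closes the stream
    return customXmlPart;
}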

 

Get Custom Xml part from a WordprocessingDocument:

The code snippet below assumes that each Custom Xml part has a unique namespace. Given that assumption, it is enough to check the namespace of the root node:


string namespaceUri = "http://schemas.microsoft.com/Test.Sample";

// Requires: System.IO, System.Xml, DocumentFormat.OpenXml.Packaging
public static CustomXmlPart GetCustomXmlPart(MainDocumentPart mainPart, string namespaceUri)
{
    CustomXmlPart result = null;
    foreach (CustomXmlPart part in mainPart.CustomXmlParts)
    {
        using (XmlTextReader reader = new XmlTextReader(part.GetStream(FileMode.Open, FileAccess.Read)))
        {
            // MoveToContent skips the XML declaration and positions the reader on the root element
            reader.MoveToContent();
            if (reader.NamespaceURI.Equals(namespaceUri))
            {
                result = part;
                break;
            }
        }
    }
    return result;
}

 

Each content control contains a unique ID that is assigned by Word upon creation of the content control:

Because every content control has a unique ID, you can associate a Content Control with a Custom Xml part and achieve functionality that would otherwise be impossible through Custom Xml alone. This has a downside, though: it may not matter in 95% of cases, but in the remaining 5% it can cause problems. I’ll discuss one issue I faced and an approach that worked.

I was implementing a lot of Word automation tasks, e.g. copying/pasting content controls and merging documents that contain content controls, when one of the test cases for the merge operation failed. When I drilled down, I found that the two documents contained different Content Controls with the same IDs. During the merge (the library was using Microsoft.Office.Interop.Word 12.0) we were doing what the code snippet below shows:

//Using Microsoft.Office.Interop.Word 12.0, where Range can be Selection.Range, Document.Range etc.
string fileName = "testFileToInsert.docx";
range.InsertFile(fileName, ref m_Missing, ref m_Missing, ref m_Missing, ref m_Missing);

In this scenario, for any controls in the inserted file whose IDs already exist in the target document, Word automatically assigns new IDs so that control IDs stay unique across the document. Because Custom Xml parts were associated with Content Controls in both documents, I could no longer map the data to the Custom Xml part. For example, if the duplicate control IDs are 10 and 20 and Word now assigns 23356 and 45556, I could not tell whether 10 corresponds to 23356 or to 45556. Without a mapping from the previous ID to the new ID, I could not extract the information stored in the Custom Xml part.

As I could not find a solution to this, I decided to use the Tag property of the Content Control. Instead of relying on the control ID, I assigned a unique GUID to every content control and saved it in the Tag property. The only drawback is that the tags become visible when design mode is switched on, either through ActiveDocument.ToggleFormsDesign or through “Design Mode” on the Developer tab in MS Word.

As this was not a functional limitation in my case (Developer mode was disabled), I proceeded with this solution.
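
The tagging step could look roughly like the sketch below. It assumes the tags are assigned through Microsoft.Office.Interop.Word; the names ContentControlTagger and EnsureGuidTags are hypothetical and only for illustration. Document.ContentControls may not cover controls in headers, footers or other stories, so those might need separate handling.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper, not part of any library.
public static class ContentControlTagger
{
    // Gives every content control that has no tag yet a GUID tag
    // and returns how many tags were assigned.
    public static int EnsureGuidTags(Word.Document document)
    {
        int assigned = 0;
        foreach (Word.ContentControl control in document.ContentControls)
        {
            if (string.IsNullOrEmpty(control.Tag))
            {
                // The default GUID format is 36 characters long
                control.Tag = Guid.NewGuid().ToString();
                assigned++;
            }
        }
        return assigned;
    }
}
```

Existing tags are left untouched, so data already keyed by a tag keeps its key.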

In brief the solution was

  1. Get the Range in the Document where you want to insert the .docx file
  2. Read the Custom Xml part associated with the file to be inserted
  3. Call the Range.InsertFile method
  4. Using the Custom Xml part read in step 2, add data, as your business logic requires, to the Custom Xml part associated with the Document (Range.Document) that received the inserted file
  5. Because the Tags are unique (GUIDs), any automatic ID reassignment that Word performs for duplicates does not affect our functionality
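
Step 4 depends entirely on your own Custom Xml layout, so the following is only a sketch under an assumed layout: both roots share one namespace and hold Entry elements with a tag attribute containing the control's GUID. The Entry element, the tag attribute and the names CustomXmlMerger and MergeEntriesByTag are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper; assumes <Entry tag="GUID">...</Entry> children under each root.
public static class CustomXmlMerger
{
    // Copies entries from sourceRoot into targetRoot when their tag is not
    // already present in the target. Returns the number of copied entries.
    public static int MergeEntriesByTag(XElement targetRoot, XElement sourceRoot)
    {
        XName entryName = targetRoot.Name.Namespace + "Entry";

        var knownTags = new HashSet<string>(
            targetRoot.Elements(entryName)
                      .Select(e => (string)e.Attribute("tag"))
                      .Where(t => t != null),
            StringComparer.OrdinalIgnoreCase);

        int copied = 0;
        foreach (XElement entry in sourceRoot.Elements(entryName))
        {
            string tag = (string)entry.Attribute("tag");
            if (tag != null && knownTags.Add(tag))
            {
                targetRoot.Add(new XElement(entry)); // deep copy into the target
                copied++;
            }
        }
        return copied;
    }
}
```

Because the lookup key is the tag and not the control ID, the result does not depend on which new IDs Word hands out during InsertFile.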

This issue may also appear during Copy/Paste operations, and the approach listed above may work there as well.

Convert in-memory Document to Bytes without saving to a File:

Here I’ll describe one approach that worked in my case, where I had to convert an in-memory document to bytes without saving it to a file. The document was loaded in another module (process) using Microsoft.Office.Interop.Word 12.0, and from there we had to pass a byte stream without saving the document to a file.

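The code snippet below was written for the Open Xml 2.0 side: I passed in the Outer Xml of the MainDocumentPart as a string, and the method returns the byte array. As the XPath query shows, it expects that string to contain pkg:part elements from the http://schemas.microsoft.com/office/2006/xmlPackage namespace.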

// Requires: System, System.IO, System.IO.Packaging, System.Text, System.Xml, System.Xml.Linq, System.Xml.XPath
public static byte[] GetDocumentStream(string mainDocumentPartOuterXml)
{
    byte[] output = null;

    if (string.IsNullOrEmpty(mainDocumentPartOuterXml))
    {
        return output;
    }

    string packageNodeName = "pkg";
    string packageUri = "http://schemas.microsoft.com/office/2006/xmlPackage";
    string partNameSpaceUri = "http://schemas.microsoft.com/office/2006/xmlPackage";

    XmlNamespaceManager namespaceManager = new XmlNamespaceManager(new NameTable());
    namespaceManager.AddNamespace(packageNodeName, packageUri);
    XPathDocument xpathDocument = new XPathDocument(new StringReader(mainDocumentPartOuterXml));
    XPathNavigator navigator = xpathDocument.CreateNavigator();
    XPathNodeIterator iterator = navigator.Select("//pkg:part", namespaceManager);

    using (MemoryStream ms = new MemoryStream())
    {
        using (Package pkg = Package.Open(ms, FileMode.Create))
        {
            while (iterator.MoveNext())
            {
                Uri partUri = new Uri(iterator.Current.GetAttribute("name", partNameSpaceUri), UriKind.Relative);

                if (pkg.PartExists(partUri))
                    pkg.DeletePart(partUri);

                PackagePart part = pkg.CreatePart(
                    partUri,
                    iterator.Current.GetAttribute("contentType", partNameSpaceUri));

                XElement elem = XElement.Parse(iterator.Current.InnerXml);

                byte[] buffer = null;
                string elementToWrite = elem.FirstNode.ToString();

                //Handled for Content Type = binaryData e.g. images
                //May need to handle for other content types
                if (elem.Name.LocalName.Equals("binaryData", StringComparison.OrdinalIgnoreCase))
                {
                    buffer = Convert.FromBase64String(elementToWrite);
                }
                else
                {
                    buffer = Encoding.UTF8.GetBytes(elementToWrite);
                }

                // Dispose the part stream once the content has been written
                using (Stream partStream = part.GetStream())
                {
                    partStream.Write(buffer, 0, buffer.Length);
                }
            }
        }   // disposing the package flushes and closes it

        // ToArray copies the whole buffer regardless of the current stream position
        output = ms.ToArray();
    }
    return output;
}
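
One way to sanity-check the returned bytes is to open them again as a package and list the parts they contain. This is only a sketch: PackageInspector and ListPartUris are hypothetical names, and it relies on System.IO.Packaging (WindowsBase.dll on the .NET Framework).

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.IO.Packaging;

// Hypothetical helper for checking a byte array produced by a method like GetDocumentStream.
public static class PackageInspector
{
    // Opens the bytes as a read-only package and returns the URI of every part.
    // Package.Open throws if the bytes are not a valid package.
    public static List<string> ListPartUris(byte[] packageBytes)
    {
        var uris = new List<string>();
        using (var ms = new MemoryStream(packageBytes, false))
        using (Package pkg = Package.Open(ms, FileMode.Open, FileAccess.Read))
        {
            foreach (PackagePart part in pkg.GetParts())
            {
                uris.Add(part.Uri.ToString());
            }
        }
        return uris;
    }
}
```

For a Word document you would expect a part such as /word/document.xml in the list, along with the relationship parts.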
 

Summary:

The solutions I have listed worked in my case; they may or may not fit other functional requirements. There may also be better ways to implement the same things, which I did not find due to lack of time and limited experience with MS Word automation: I only worked for 2 months with OpenXml 2.0 and Microsoft.Office.Interop.Word, while migrating an application from Custom Xml to Content controls. Below are the references that helped me a lot.

References:

http://msdn.microsoft.com/en-us/library/ff433638(office.14).aspx

http://blogs.technet.com/gray_knowlton/archive/2010/01/15/associating-data-with-content-controls.aspx
