I’ve been working on Microsoft Word automation using Open XML, Microsoft.Office.Interop.Word and the Open XML SDK 2.0. In this post I’ll focus on content controls and the Open XML SDK 2.0, based on the experience I gained over the last two months.
In this post I’ll discuss the following points:
- Add Custom Xml part to WordprocessingDocument
- Get Custom Xml part from WordprocessingDocument
- Each content control contains a unique ID that is assigned by Word upon creation of the content control (the issues this may cause and how to handle them)
- Convert in-memory Document to Bytes without saving to a File
Add Custom Xml part to WordprocessingDocument:
1. Get the MainDocumentPart
MainDocumentPart mainPart = doc.MainDocumentPart;
2. Define a root element for the Custom Xml part
string customXmlPartNamespace = "http://schemas.microsoft.com/Test.Sample";
string rootNodeName = "TestCoverageRoot";
XName rootName = XName.Get(rootNodeName, customXmlPartNamespace);
XElement rootElement = new XElement(rootName);
3. The method below creates the custom XML part and writes the root element into it
public static CustomXmlPart AddCustomXmlPart(MainDocumentPart mainPart, XElement rootElement)
{
    CustomXmlPart customXmlPart = mainPart.AddCustomXmlPart(CustomXmlPartType.CustomXml);
    // Disposing the writer flushes it and closes the part stream.
    using (StreamWriter sw = new StreamWriter(customXmlPart.GetStream()))
    {
        sw.Write(rootElement.ToString());
    }
    return customXmlPart;
}
Get Custom Xml part from a WordprocessingDocument:
The snippet below assumes that the root namespace is unique for each custom XML part. Under that assumption it is enough to check the namespace of the root node, as shown here:
string namespaceUri = "http://schemas.microsoft.com/Test.Sample";

public static CustomXmlPart GetCustomXmlPart(MainDocumentPart mainPart, string namespaceUri)
{
    CustomXmlPart result = null;
    foreach (CustomXmlPart part in mainPart.CustomXmlParts)
    {
        using (XmlTextReader reader = new XmlTextReader(part.GetStream(FileMode.Open, FileAccess.Read)))
        {
            // Move to the root element and compare its namespace.
            reader.MoveToContent();
            bool exists = reader.NamespaceURI.Equals(namespaceUri);
            if (exists)
            {
                result = part;
                break;
            }
        }
    }
    return result;
}
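Once the part has been found, you usually want its content as well. The following is a minimal sketch of one way to do that with LINQ to XML; the class and the helper names LoadRoot and FindRoot are hypothetical and not part of the SDK, and the code assumes every custom XML part contains well-formed XML (an empty part would throw).

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;
using DocumentFormat.OpenXml.Packaging;

public static class CustomXmlPartReaderSketch
{
    // Hypothetical helper: loads the whole custom XML part into an XElement.
    public static XElement LoadRoot(CustomXmlPart part)
    {
        using (Stream stream = part.GetStream(FileMode.Open, FileAccess.Read))
        using (XmlReader reader = XmlReader.Create(stream))
        {
            return XElement.Load(reader);
        }
    }

    // Hypothetical helper: returns the root element of the first part whose
    // root namespace matches, or null when no part matches.
    public static XElement FindRoot(MainDocumentPart mainPart, string namespaceUri)
    {
        foreach (CustomXmlPart part in mainPart.CustomXmlParts)
        {
            XElement root = LoadRoot(part);
            if (root.Name.NamespaceName == namespaceUri)
            {
                return root;
            }
        }
        return null;
    }
}
```

This loads each part completely, so for large parts the reader-based check in GetCustomXmlPart above is cheaper when you only need to know which part matches.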
Each content control contains a unique ID that is assigned by Word upon creation of the content control:
Because every content control has a unique ID, you can associate a content control with a custom XML part and achieve functionality that would otherwise be impossible with custom XML alone. This works fine in the vast majority of cases, but in a few it can cause problems. I’ll describe one issue I faced and the approach that worked for me.
I was implementing a number of Word automation tasks, e.g. copying and pasting content controls and merging documents that contain content controls, when one of the test cases for the merge operation failed. When I drilled down, I found that the two documents contained different content controls with the same IDs. During the merge (the library was using Microsoft.Office.Interop.Word 12.0) we were doing the following:
// Using Microsoft.Office.Interop.Word 12.0, where range can be Selection.Range, Document.Range etc.
string fileName = "testFileToInsert.docx";
range.InsertFile(fileName, ref m_Missing, ref m_Missing, ref m_Missing, ref m_Missing);
In this scenario, for any controls in the inserted file whose IDs already exist in the target document, Word automatically assigns new IDs so that every control ID stays unique across the document. Because custom XML parts were associated with content controls in both documents, I could no longer map the data to the custom XML part. For example, if the duplicate control IDs are 10 and 20 and Word now assigns 23356 and 45556, I cannot tell whether 10 corresponds to 23356 or to 45556. Without a mapping from the previous ID to the new ID, I was not able to extract the information I had in the custom XML part.
As I could not find any other solution, I decided to use the Tag property of the content control. Instead of relying on the control ID, I assign a unique GUID to every content control and save it in the Tag property. The only drawback is that the tags become visible when Design Mode is active, whether it is turned on through "Design Mode" on the Developer tab in MS Word or through ActiveDocument.ToggleFormsDesign.
Since I had no functional limitation here (Design Mode was disabled), I proceeded with this solution.
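Assigning the tags can be sketched roughly as follows, assuming the document is already open through Microsoft.Office.Interop.Word. The class and method names are hypothetical; only Document.ContentControls and ContentControl.Tag come from the Word object model. Controls that already carry a tag are left alone so that existing mappings are not broken.

```csharp
using System;
using Word = Microsoft.Office.Interop.Word;

public static class ContentControlTagSketch
{
    // Hypothetical helper: gives every untagged content control returned by
    // Document.ContentControls a GUID tag.
    public static void EnsureGuidTags(Word.Document document)
    {
        foreach (Word.ContentControl control in document.ContentControls)
        {
            if (string.IsNullOrEmpty(control.Tag))
            {
                // A GUID string is 36 characters long.
                control.Tag = Guid.NewGuid().ToString();
            }
        }
    }
}
```

The data in the custom XML part is then keyed by this tag value instead of by the control ID.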
In brief, the solution was:
1. Get the Range in the Document where you want to insert the .docx file
2. Read the custom XML part associated with the file to be inserted
3. Call the Range.InsertFile method
4. Using the custom XML part read in step 2, add data, as your business logic requires, to the custom XML part associated with the Document (Range.Document) into which the file was inserted
5. Because the Tags are unique (GUIDs), any automatic renumbering of duplicate IDs no longer affects our functionality
The same issue can appear during copy/paste operations, and the approach described above may work there too.
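To illustrate why the tag survives where the ID does not, here is a small self-contained sketch that uses only LINQ to XML on a hand-built WordprocessingML fragment. The class and method names are hypothetical, the fragment is a simplified content control (w:sdt), and the ID change is simulated in code rather than performed by Word.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

public static class TagLookupSketch
{
    static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the w:tag value of every content control (w:sdt) under the element.
    public static List<string> GetTags(XElement body)
    {
        return body.Descendants(W + "sdt")
            .Select(sdt => sdt.Element(W + "sdtPr"))
            .Where(pr => pr != null)
            .Select(pr => pr.Element(W + "tag"))
            .Where(tag => tag != null)
            .Select(tag => (string)tag.Attribute(W + "val"))
            .ToList();
    }

    public static void Main()
    {
        string guid = Guid.NewGuid().ToString();

        // A simplified content control with a GUID tag and the ID 10.
        XElement body = new XElement(W + "body",
            new XElement(W + "sdt",
                new XElement(W + "sdtPr",
                    new XElement(W + "tag", new XAttribute(W + "val", guid)),
                    new XElement(W + "id", new XAttribute(W + "val", "10"))),
                new XElement(W + "sdtContent")));

        // Simulate the control receiving a new ID during a merge.
        body.Descendants(W + "id").First().SetAttributeValue(W + "val", "23356");

        // The tag is unchanged, so data keyed by it can still be found.
        Console.WriteLine(GetTags(body).Contains(guid)); // prints True
    }
}
```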
Convert in-memory Document to Bytes without saving to a File:
Here I’ll describe one approach that worked in my case, where I had to convert an in-memory document to bytes without saving it to a file. This particular document was loaded in another module (process) using Microsoft.Office.Interop.Word 12.0, and from there we had to pass a byte stream without saving the document to a file.
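The snippet below is implemented on the Open XML 2.0 side, so I passed the outer XML of the MainDocumentPart to it as a string, and the method returns the byte array. Note that the method selects pkg:part elements, so the string has to be in the XML package format (namespace http://schemas.microsoft.com/office/2006/xmlPackage).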
public static byte[] GetDocumentStream(string mainDocumentPartOuterXml)
{
    byte[] output = null;
    if (string.IsNullOrEmpty(mainDocumentPartOuterXml))
    {
        return output;
    }

    string packageNodeName = "pkg";
    string packageUri = "http://schemas.microsoft.com/office/2006/xmlPackage";
    string partNameSpaceUri = "http://schemas.microsoft.com/office/2006/xmlPackage";

    XmlNamespaceManager namespaceManager = new XmlNamespaceManager(new NameTable());
    namespaceManager.AddNamespace(packageNodeName, packageUri);

    XPathDocument xpathDocument = new XPathDocument(new StringReader(mainDocumentPartOuterXml));
    XPathNavigator navigator = xpathDocument.CreateNavigator();
    XPathNodeIterator iterator = navigator.Select("//pkg:part", namespaceManager);

    using (MemoryStream ms = new MemoryStream())
    {
        using (Package pkg = Package.Open(ms, FileMode.Create))
        {
            while (iterator.MoveNext())
            {
                Uri partUri = new Uri(iterator.Current.GetAttribute("name", partNameSpaceUri), UriKind.Relative);
                if (pkg.PartExists(partUri))
                    pkg.DeletePart(partUri);

                PackagePart part = pkg.CreatePart(partUri, iterator.Current.GetAttribute("contentType", partNameSpaceUri));
                XElement elem = XElement.Parse(iterator.Current.InnerXml);
                byte[] buffer = null;
                string elementToWrite = elem.FirstNode.ToString();

                // Handled for Content Type = binaryData e.g. images
                // May need to handle for other content types
                if (elem.Name.LocalName.Equals("binaryData", StringComparison.OrdinalIgnoreCase))
                {
                    buffer = Convert.FromBase64String(elementToWrite);
                }
                else
                {
                    buffer = Encoding.UTF8.GetBytes(elementToWrite);
                }
                part.GetStream().Write(buffer, 0, buffer.Length);
            }
            pkg.Flush();
            pkg.Close();
        }
        ms.Position = 0;
        output = new byte[(int)ms.Length];
        ms.Read(output, 0, (int)ms.Length);
        ms.Flush();
        ms.Close();
    }
    return output;
}
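On the receiving side, the byte array can be opened directly from memory. The following is a minimal sketch, assuming the bytes form a valid WordprocessingML package; the class and method names are hypothetical, while WordprocessingDocument.Open(Stream, bool) is the Open XML SDK call that opens a package from a stream.

```csharp
using System.IO;
using System.Linq;
using DocumentFormat.OpenXml.Packaging;

public static class DocumentBytesUsageSketch
{
    // Hypothetical helper: opens the package read-only from memory and
    // counts the custom XML parts attached to the main document part.
    public static int CountCustomXmlParts(byte[] documentBytes)
    {
        using (MemoryStream ms = new MemoryStream(documentBytes))
        using (WordprocessingDocument doc = WordprocessingDocument.Open(ms, false))
        {
            return doc.MainDocumentPart.CustomXmlParts.Count();
        }
    }
}
```

Opening the package this way is also a quick check that the generated bytes are readable, because Open throws if the stream is not a valid package.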
Summary:
The solutions listed here worked in my case; they may or may not fit other functional requirements. There may also be better ways to implement them that I did not find, due to lack of time and limited experience with MS Word automation: I worked with Open XML 2.0 and Microsoft.Office.Interop.Word for only two months, while migrating an application from custom XML to content controls. Below is the reference that helped me a lot.
References:
http://msdn.microsoft.com/en-us/library/ff433638(office.14).aspx