
Thursday, December 5, 2013

Understanding Word style manipulation

To edit your document's styles, click the small arrow in the bottom right-hand corner of the Styles group on the Home tab.
When manipulating styles, the most important thing to note is where styles are referenced. All styles are referenced by ID, so changing the properties of a style is quite simple: you only need to edit the style node itself.
The most obvious place a style can be referenced is inside the document and any headers/footers. As you can see in the image below, I have a paragraph with the text "Coding Recipes" that is linked to "Demo Style".
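To make the document reference concrete, here is a minimal Python sketch that finds every paragraph pointing at a given style ID through its w:pStyle node. The sample markup is a stripped-down stand-in for document.xml, and the style ID "DemoStyle" is my assumption about the ID Word would assign to "Demo Style"; check styles.xml for the real value.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def paragraphs_using_style(document_xml, style_id):
    """Return the text of each paragraph whose w:pStyle points at style_id."""
    root = ET.fromstring(document_xml)
    hits = []
    for para in root.iter(f"{{{W}}}p"):
        p_style = para.find("w:pPr/w:pStyle", NS)
        if p_style is not None and p_style.get(f"{{{W}}}val") == style_id:
            # A paragraph's visible text is spread across its w:t nodes.
            hits.append("".join(t.text or "" for t in para.iter(f"{{{W}}}t")))
    return hits


# Simplified stand-in for word/document.xml ("DemoStyle" is an assumed ID).
SAMPLE_DOCUMENT = (
    f'<w:document xmlns:w="{W}"><w:body>'
    '<w:p><w:pPr><w:pStyle w:val="DemoStyle"/></w:pPr>'
    "<w:r><w:t>Coding Recipes</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Plain paragraph</w:t></w:r></w:p>"
    "</w:body></w:document>"
)

print(paragraphs_using_style(SAMPLE_DOCUMENT, "DemoStyle"))  # ['Coding Recipes']
```

The same function could be pointed at the header and footer parts, since they use the same paragraph markup.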

There are two other places this style could be referenced. If you look at the style xml, it has a "w:basedOn" node pointing to the "Normal" style. It is possible to base a style on any other style, so any style that is based on yours will reference it too. The last place is inside the numbering, but only if your style is linked to a custom number.
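The w:basedOn link can be checked the same way. This is a small sketch, not a full styles.xml parser: it lists the style IDs whose w:basedOn node points at a given parent. The sample markup and the "DemoStyle" ID are illustrative assumptions.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def styles_based_on(styles_xml, parent_id):
    """Return the IDs of all styles whose w:basedOn points at parent_id."""
    root = ET.fromstring(styles_xml)
    children = []
    for style in root.findall("w:style", NS):
        based_on = style.find("w:basedOn", NS)
        if based_on is not None and based_on.get(f"{{{W}}}val") == parent_id:
            children.append(style.get(f"{{{W}}}styleId"))
    return children


# Simplified stand-in for word/styles.xml.
SAMPLE_STYLES = (
    f'<w:styles xmlns:w="{W}">'
    '<w:style w:type="paragraph" w:styleId="Normal"><w:name w:val="Normal"/></w:style>'
    '<w:style w:type="paragraph" w:styleId="DemoStyle">'
    '<w:name w:val="Demo Style"/><w:basedOn w:val="Normal"/></w:style>'
    "</w:styles>"
)

print(styles_based_on(SAMPLE_STYLES, "Normal"))  # ['DemoStyle']
```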

After adding a number to my style, you can see that the style xml now has some numbering information.
If you used a built-in number for your style then you wouldn't have to worry about any other references. However, if your style was linked to a custom number, a numbering element would be created in numbering.xml, and this would hold a reference to your style as well, as shown in the image below.
I'll explain more about numbering in my next post, but if you are wondering how the two are linked: find the w:num element in numbering.xml that has the ID "1" (referenced in the w:numId node of the style). You'll then see that numId "1" is linked to abstractNumId "0", which you can see in the image above.
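The lookup chain (style, then w:numId, then w:num, then w:abstractNumId) can be followed in code. The sketch below assumes simplified stand-ins for styles.xml and numbering.xml that mirror the IDs mentioned above; the function name and the "DemoStyle" ID are my own.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}
VAL = f"{{{W}}}val"


def abstract_num_for_style(styles_xml, numbering_xml, style_id):
    """Follow style -> w:numId -> w:num -> w:abstractNumId; None if unnumbered."""
    styles = ET.fromstring(styles_xml)
    numbering = ET.fromstring(numbering_xml)
    for style in styles.findall("w:style", NS):
        if style.get(f"{{{W}}}styleId") != style_id:
            continue
        num_id_node = style.find("w:pPr/w:numPr/w:numId", NS)
        if num_id_node is None:
            return None  # the style carries no numbering information
        num_id = num_id_node.get(VAL)
        for num in numbering.findall("w:num", NS):
            if num.get(f"{{{W}}}numId") == num_id:
                return num.find("w:abstractNumId", NS).get(VAL)
    return None


# Simplified stand-ins for word/styles.xml and word/numbering.xml.
SAMPLE_STYLES = (
    f'<w:styles xmlns:w="{W}">'
    '<w:style w:type="paragraph" w:styleId="DemoStyle">'
    '<w:pPr><w:numPr><w:numId w:val="1"/></w:numPr></w:pPr></w:style>'
    "</w:styles>"
)
SAMPLE_NUMBERING = (
    f'<w:numbering xmlns:w="{W}">'
    '<w:abstractNum w:abstractNumId="0">'
    '<w:lvl w:ilvl="0"><w:pStyle w:val="DemoStyle"/></w:lvl></w:abstractNum>'
    '<w:num w:numId="1"><w:abstractNumId w:val="0"/></w:num>'
    "</w:numbering>"
)

print(abstract_num_for_style(SAMPLE_STYLES, SAMPLE_NUMBERING, "DemoStyle"))  # 0
```

Note that the w:lvl inside the abstract numbering points back at the style through its own w:pStyle node, which is the extra reference you need to update if you ever change the style ID.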



Tuesday, December 3, 2013

Understanding basic Office document manipulation

This post is for anyone who wants to know, at a very basic level, how Office documents work in terms of their xml structure.

The first thing you should know is that all modern Office documents (.docx, .xlsx, .pptx) are actually a ZIP archive containing a collection of xml files. The names of the files and folders inside this archive differ depending on what type of Office document you are working with (Word, PowerPoint, Excel, etc). In this tutorial we are going to be working with a Word document.

If you open a very basic Word document using a decompression tool (I used WinRAR in this example), you will see the following contents.
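If you would rather not use a decompression tool, the same listing can be produced with Python's standard zipfile module. This is a minimal sketch: to keep it runnable on its own, it builds a tiny stand-in archive in memory (placeholder content, not a valid Word file). With a real document you would pass its path instead, for example a hypothetical "example.docx".

```python
import io
import zipfile


def list_parts(docx):
    """Return the names of all files stored in the package (path or file object)."""
    with zipfile.ZipFile(docx) as package:
        return package.namelist()


def build_sample_package():
    """Build a tiny stand-in archive in memory; it is NOT a valid Word file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as package:
        for name in ("[Content_Types].xml", "_rels/.rels",
                     "word/document.xml", "word/_rels/document.xml.rels"):
            package.writestr(name, "<placeholder/>")
    buffer.seek(0)
    return buffer


for name in list_parts(build_sample_package()):
    print(name)
```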

The _rels folder contains all the relationship files. These relationships are used to link the xml files to one another.




So if you go into the word folder you will see a few different xml files.

If you go inside the _rels folder within the word folder, you will find the relationship file that corresponds to document.xml, called document.xml.rels.
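A relationship file is just a list of Relationship elements, each mapping an Id to a Target. As a rough sketch, it can be read into a dictionary like this. The sample markup is a simplified stand-in for document.xml.rels, and the function name is my own.

```python
import xml.etree.ElementTree as ET

PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"
OFFICE_REL = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"


def read_relationships(rels_xml):
    """Map each relationship Id to its Target path."""
    root = ET.fromstring(rels_xml)
    return {
        rel.get("Id"): rel.get("Target")
        for rel in root.findall(f"{{{PKG_REL}}}Relationship")
    }


# Simplified stand-in for word/_rels/document.xml.rels.
SAMPLE_RELS = (
    f'<Relationships xmlns="{PKG_REL}">'
    f'<Relationship Id="rId1" Type="{OFFICE_REL}/styles" Target="styles.xml"/>'
    f'<Relationship Id="rId2" Type="{OFFICE_REL}/settings" Target="settings.xml"/>'
    "</Relationships>"
)

print(read_relationships(SAMPLE_RELS))
# {'rId1': 'styles.xml', 'rId2': 'settings.xml'}
```

Targets are relative to the folder of the part that owns the relationship file, so "styles.xml" here means word/styles.xml.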






I added a chart to my document and you can see some new folders have been added. Opening document.xml you will see the following xml structure.

The main node is "document" and directly underneath this you can find the "body". This is the standard structure for this file. The body will mainly contain paragraphs (w:p), but there will be a few other nodes; as you can see, at the bottom there is a "w:sectPr" node. This is a section property node which contains information about the page (size, margins, columns, header, footer, etc). This node will always be found at the bottom of the body node. If you insert a section break inside your document then you will find additional w:sectPr nodes inside the body, each stored in the paragraph properties (w:pPr) of the last paragraph of its section.
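Here is a short sketch of walking that structure: it lists the direct children of the body and reads the page size from the final w:sectPr. The sample markup is a simplified stand-in for document.xml, and the page size values (A4, in twentieths of a point) are only an example.

```python
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
NS = {"w": W}


def summarize_body(document_xml):
    """Return the body's child tag names and the final section's page size."""
    body = ET.fromstring(document_xml).find("w:body", NS)
    child_tags = [child.tag.split("}")[1] for child in body]
    final_section = body.find("w:sectPr", NS)  # direct child of the body only
    page_size = final_section.find("w:pgSz", NS)
    return child_tags, (page_size.get(f"{{{W}}}w"), page_size.get(f"{{{W}}}h"))


# Simplified stand-in for word/document.xml.
SAMPLE_DOCUMENT = (
    f'<w:document xmlns:w="{W}"><w:body>'
    "<w:p><w:r><w:t>First paragraph</w:t></w:r></w:p>"
    "<w:p><w:r><w:t>Second paragraph</w:t></w:r></w:p>"
    '<w:sectPr><w:pgSz w:w="11906" w:h="16838"/></w:sectPr>'
    "</w:body></w:document>"
)

print(summarize_body(SAMPLE_DOCUMENT))
# (['p', 'p', 'sectPr'], ('11906', '16838'))
```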

In this example I have inserted a chart. This inserted a w:drawing element which contains information about the chart. The data for the actual chart, however, is stored elsewhere. To reference this data there is an r:id attribute on the c:chart element with a value of "rId5". If I open the document.xml.rels file I can see that this ID points to the file charts/chart1.xml.
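The r:id lookup can be sketched in a few lines: collect the r:id of every c:chart element in document.xml, then resolve each one through the relationship file. The sample markup is heavily simplified (a real w:drawing has several wrapper elements around c:chart), and the function name is my own.

```python
import posixpath
import xml.etree.ElementTree as ET

W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"
C = "http://schemas.openxmlformats.org/drawingml/2006/chart"
R = "http://schemas.openxmlformats.org/officeDocument/2006/relationships"
PKG_REL = "http://schemas.openxmlformats.org/package/2006/relationships"


def chart_parts(document_xml, rels_xml, base_dir="word"):
    """Resolve every c:chart r:id in the document to a path inside the package."""
    targets = {
        rel.get("Id"): rel.get("Target")
        for rel in ET.fromstring(rels_xml).iter(f"{{{PKG_REL}}}Relationship")
    }
    parts = []
    for chart in ET.fromstring(document_xml).iter(f"{{{C}}}chart"):
        rel_id = chart.get(f"{{{R}}}id")
        # Targets are relative to the folder that holds document.xml.
        parts.append(posixpath.normpath(posixpath.join(base_dir, targets[rel_id])))
    return parts


# Simplified stand-ins for word/document.xml and word/_rels/document.xml.rels.
SAMPLE_DOCUMENT = (
    f'<w:document xmlns:w="{W}" xmlns:c="{C}" xmlns:r="{R}"><w:body>'
    '<w:p><w:r><w:drawing><c:chart r:id="rId5"/></w:drawing></w:r></w:p>'
    "</w:body></w:document>"
)
SAMPLE_RELS = (
    f'<Relationships xmlns="{PKG_REL}">'
    f'<Relationship Id="rId5" Type="{R}/chart" Target="charts/chart1.xml"/>'
    "</Relationships>"
)

print(chart_parts(SAMPLE_DOCUMENT, SAMPLE_RELS))  # ['word/charts/chart1.xml']
```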
When you open this file in Word, it deserializes these xml files into its in-memory objects and shows the document. If the xml markup does not correspond to what Word expects then you will get a corruption error. For example, if I delete the charts folder and try to open the file I will get the following error:
You could also modify the contents of an xml file manually by extracting it from the docx, modifying the contents and dragging it back into WinRAR. This is quite handy when you are trying to troubleshoot.
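The extract, edit and drag-back routine can also be scripted. The zipfile module cannot overwrite a file inside an existing archive, so this sketch copies the package and swaps in the new content for one part along the way. The function name and the in-memory stand-in archive are assumptions for the sake of a runnable example; with real files you would pass two paths instead.

```python
import io
import zipfile


def replace_part(source, destination, part_name, new_content):
    """Copy a package to destination, swapping in new_content for one part."""
    with zipfile.ZipFile(source) as src, \
            zipfile.ZipFile(destination, "w", zipfile.ZIP_DEFLATED) as dst:
        for item in src.infolist():
            if item.filename == part_name:
                dst.writestr(item.filename, new_content)
            else:
                dst.writestr(item.filename, src.read(item.filename))


# Tiny stand-in archive built in memory; it is NOT a valid Word file.
original = io.BytesIO()
with zipfile.ZipFile(original, "w") as package:
    package.writestr("word/document.xml", "<old/>")
    package.writestr("word/styles.xml", "<styles/>")
original.seek(0)

modified = io.BytesIO()
replace_part(original, modified, "word/document.xml", "<new/>")

with zipfile.ZipFile(modified) as package:
    print(package.read("word/document.xml"))  # b'<new/>'
    print(package.read("word/styles.xml"))    # b'<styles/>'
```

Writing to a separate destination keeps the original untouched, which is useful when an edit turns out to corrupt the document.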