A little bit of POSH for document conversion


This is a completely non-SQL Server post. I recently had to convert a large number of Word documents into Web Archive (.MHT) format. I would not mind doing that by hand for one or two documents, but with over 50 documents the exercise gets cumbersome and monotonous! This is where PowerShell came to the rescue.

Most examples available on the web show how to convert Word documents to PDF. What I needed for my work was a way to convert Word documents to the Web Archive format. After a few Bing searches, I determined that the enumeration value for the Web Archive format is 9. The following link has information about all the enumeration values: http://msdn.microsoft.com/en-us/library/office/bb238158(v=office.12).aspx
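A minimal sketch of how that value could be given a readable name in PowerShell, so that a script does not have to carry a bare 9 around. The table name `$WdSaveFormat` and the variable `$SaveFormat` are illustrative choices only; the numbers are the ones listed for the WdSaveFormat enumeration.

```powershell
# Hypothetical lookup table; the name $WdSaveFormat is chosen for illustration.
# Only a few of the WdSaveFormat enumeration values are included here.
$WdSaveFormat = @{
    wdFormatHTML         = 8
    wdFormatWebArchive   = 9
    wdFormatFilteredHTML = 10
    wdFormatPDF          = 17
}

# Look the value up by name wherever a save format is needed
$SaveFormat = $WdSaveFormat['wdFormatWebArchive']
Write-Host "Web Archive save format value:" $SaveFormat
```

Switching to another output format would then only mean picking a different key from the table.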

The PowerShell script below traverses a folder recursively, picks up all the Word documents (.docx) in it, and converts each of them into a .MHT file with the same name in the same location.

The script can also be downloaded from OneDrive.


<#

#################################################################################
    
    Script Name: ConvertToWord
    Author: Amit Banerjee
    Date: April 28, 2014

    Description:
    This script takes a folder as an input and then converts the docx files present in the folder to web archive documents
#################################################################################

This Sample Code is provided for the purpose of illustration only and is not 
intended to be used in a production environment. THIS SAMPLE CODE AND ANY 
RELATED INFORMATION ARE PROVIDED "AS IS" WITHOUT WARRANTY OF ANY KIND, EITHER 
EXPRESSED OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE IMPLIED WARRANTIES OF 
MERCHANTABILITY AND/OR FITNESS FOR A PARTICULAR PURPOSE. We grant You a 
nonexclusive, royalty-free right to use and modify the Sample Code and to 
reproduce and distribute the object code form of the Sample Code, provided that 
You agree: (i) to not use Our name, logo, or trademarks to market Your software 
product in which the Sample Code is embedded; (ii) to include a valid copyright 
notice on Your software product in which the Sample Code is embedded; and (iii) 
to indemnify, hold harmless, and defend Us and Our suppliers from and against 
any claims or lawsuits, including attorneys fees, that arise or result from the 
use or distribution of the Sample Code.

    Enumeration values (WdSaveFormat) for the document formats that Microsoft Word can save to through the SaveAs option
    Const wdFormatDocument                    =  0
    Const wdFormatDocument97                  =  0
    Const wdFormatDocumentDefault             = 16
    Const wdFormatDOSText                     =  4
    Const wdFormatDOSTextLineBreaks           =  5
    Const wdFormatEncodedText                 =  7
    Const wdFormatFilteredHTML                = 10
    Const wdFormatFlatXML                     = 19
    Const wdFormatFlatXMLMacroEnabled         = 20
    Const wdFormatFlatXMLTemplate             = 21
    Const wdFormatFlatXMLTemplateMacroEnabled = 22
    Const wdFormatHTML                        =  8
    Const wdFormatPDF                         = 17
    Const wdFormatRTF                         =  6
    Const wdFormatTemplate                    =  1
    Const wdFormatTemplate97                  =  1
    Const wdFormatText                        =  2
    Const wdFormatTextLineBreaks              =  3
    Const wdFormatUnicodeText                 =  7
    Const wdFormatWebArchive                  =  9
    Const wdFormatXML                         = 11
    Const wdFormatXMLDocument                 = 12
    Const wdFormatXMLDocumentMacroEnabled     = 13
    Const wdFormatXMLTemplate                 = 14
    Const wdFormatXMLTemplateMacroEnabled     = 15
    Const wdFormatXPS                         = 18

#>

# Function to convert the file
# It is defined before the loop because a script can only call a function after its definition has run
function ConvertToMHT ($FileName)
{
    # Replace with the appropriate document format value from the enumeration provided above in the comments
    # 9 = wdFormatWebArchive
    $SaveFormat = 9

    # Start a Word application instance
    $Word = New-Object -ComObject Word.Application
    # Open the Word document
    $Doc = $Word.Documents.Open($FileName)

    # Save the document in the required format in the same location
    # Only the file extension is changed, so a "docx" elsewhere in the path is left alone
    $Target = [System.IO.Path]::ChangeExtension($FileName, ".mht")
    $Doc.SaveAs([ref]$Target, [ref]$SaveFormat)

    # Close the document and then quit Word
    $Doc.Close()
    $Word.Quit()
}

# Replace with the correct folder path
# Remove the -Recurse option if you only want to convert the documents in the top-level folder
# Retrieve the list of documents
$Files = Get-ChildItem "C:\Windows\*.docx" -Recurse

foreach ($File in $Files)
{
    # Create the name of the new document
    $Name = [System.IO.Path]::ChangeExtension($File.FullName, ".mht")

    if (Test-Path $Name)
    {
        # The file already exists, so do not do anything
        Write-Host "Skipping conversion for" $Name
    }
    else
    {
        # Save the file as a web archive if it does not exist
        Write-Host "Creating file" $Name
        ConvertToMHT $File.FullName
    }
}
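Two refinements are worth sketching here, under stated assumptions. First, replacing the text "docx" anywhere in a full path would also rename a folder called docx, whereas changing only the extension avoids that. Second, starting Word once for the whole folder and quitting it in a `finally` block should avoid leaving a Word process behind if one document fails. The function names `Get-MhtPath` and `Convert-FolderToMht` are hypothetical, and the second function assumes Word is installed; it is only defined here, not run.

```powershell
# Hypothetical helper: derive the .mht name by changing only the extension
function Get-MhtPath ([string]$DocxPath)
{
    return [System.IO.Path]::ChangeExtension($DocxPath, ".mht")
}

# Hypothetical variant of the conversion that shares one Word instance
# across all documents in a folder (requires Word to be installed)
function Convert-FolderToMht ([string]$Folder)
{
    $SaveFormat = 9    # wdFormatWebArchive
    $Word = New-Object -ComObject Word.Application
    try
    {
        $Files = Get-ChildItem -Path $Folder -Filter *.docx -Recurse
        foreach ($File in $Files)
        {
            $Target = Get-MhtPath $File.FullName
            if (-not (Test-Path $Target))
            {
                $Doc = $Word.Documents.Open($File.FullName)
                $Doc.SaveAs([ref]$Target, [ref]$SaveFormat)
                $Doc.Close()
            }
        }
    }
    finally
    {
        # Quit Word even if a document failed to open or save
        $Word.Quit()
    }
}

# Compare the two ways of building the target name
$Sample = "C:\docx\report.docx"
Write-Host $Sample.Replace("docx", "mht")    # C:\mht\report.mht
Write-Host (Get-MhtPath $Sample)             # C:\docx\report.mht
```

Calling `Convert-FolderToMht "C:\SomeFolder"` (a placeholder path) would then convert everything under that folder with a single Word process.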

Reference:
http://blogs.technet.com/b/heyscriptingguy/archive/2013/03/24/weekend-scripter-convert-word-documents-to-pdf-files-with-powershell.aspx
