# Word: splitting a Word file

## Initial Request

The request that led me to look into this subject came from a colleague.

My colleague receives a large Word document from his ERP, in which each page is a letter to a different recipient.

To integrate these letters into the EDM (Electronic Document Management) system, he needed to:

• Split the large file into several files (1 page = 1 file)
• Extract data from each document to name the PDF
• Save the new documents as PDF

Not knowing much about driving Word from PowerShell, I broke the problem down, as usual, into small steps.

## Processing the request

### Opening and splitting the file

The first step is therefore to open the Word document in question. To do this, we create a Word COM object and open the document:

```powershell
$word = New-Object -ComObject word.application
$word.Visible = $True
$doc = $word.Documents.Open($inputFile)
```


I leave `$word.Visible` set to `$True` while debugging, then set it to `$False` so that the processing runs entirely in the background. `$inputFile` is a variable containing the path of the Word file to process.
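
One caveat worth checking at this step: Word runs as a separate COM server and does not share PowerShell's current location, so a relative path passed to `Documents.Open` may be resolved against Word's own working folder rather than the one you are in. A minimal sketch, assuming the input should be turned into an absolute path before calling Word; `Resolve-InputFile` is a hypothetical helper, not part of the original script:

```powershell
# Hypothetical helper: turn a possibly relative path into an absolute one
# before handing it to Word, and fail early if the file does not exist.
function Resolve-InputFile {
    param (
        [Parameter(Mandatory = $True)][string]$Path
    )
    if (-not (Test-Path -LiteralPath $Path -PathType Leaf)) {
        throw "Input file not found: $Path"
    }
    # Resolve against PowerShell's current location and return the file system path
    (Resolve-Path -LiteralPath $Path).ProviderPath
}
```

It would be used as `$doc = $word.Documents.Open((Resolve-InputFile $inputFile))`.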

Once the document is open, we get to the heart of the matter: how do we split this X-page document into X one-page documents?

I searched the web for quite a while before finding a solution that is perhaps not the most elegant, but has the merit of working ;-).

The principle is actually quite simple: we go through the document page by page, copy each page, and paste it into a new document that we then save.
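
The arithmetic of the splitting loop can be checked without Word. A minimal sketch, assuming a hypothetical `Get-PageChunk` helper (not part of the original script) that reproduces the loop's `$i` and `$i + $pageLength` values: for each output file it returns the page where the copy starts, the page Word is sent to in order to find where the copy ends, and the last page actually included.

```powershell
# Hypothetical helper: reproduce the for-loop arithmetic of the splitting script.
function Get-PageChunk {
    param (
        [Parameter(Mandatory = $True)][int]$Pages,
        [int]$PageLength = 1
    )
    for ($i = 1; $i -le $Pages; $i += $PageLength) {
        [pscustomobject]@{
            StartPage = $i                                         # target of the first GoTo
            NextPage  = $i + $PageLength                           # target of the second GoTo
            LastPage  = [Math]::Min($i + $PageLength - 1, $Pages)  # last page copied
        }
    }
}
```

For example, `Get-PageChunk -Pages 5 -PageLength 2` returns three chunks (pages 1 and 2, pages 3 and 4, then page 5), while the default `-PageLength 1` gives one chunk per page. For the last chunk, `NextPage` points past the end of the document.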

```powershell
$pages = $doc.ComputeStatistics([Microsoft.Office.Interop.Word.WdStatistic]::wdStatisticPages)
$rngPage = $doc.Range()

for ($i = 1; $i -le $pages; $i += $pageLength) {
    [Void]$word.Selection.GoTo(
        [Microsoft.Office.Interop.Word.WdGoToItem]::wdGoToPage,
        [Microsoft.Office.Interop.Word.WdGoToDirection]::wdGoToAbsolute,
        $i # Starting page
    )
    $rngPage.Start = $word.Selection.Start

    [Void]$word.Selection.GoTo(
        [Microsoft.Office.Interop.Word.WdGoToItem]::wdGoToPage,
        [Microsoft.Office.Interop.Word.WdGoToDirection]::wdGoToAbsolute,
        $i + $pageLength # Next page number
    )
    $rngPage.End = $word.Selection.Start

    $marginTop = $word.Selection.PageSetup.TopMargin
    $marginBottom = $word.Selection.PageSetup.BottomMargin
    $marginLeft = $word.Selection.PageSetup.LeftMargin
    $marginRight = $word.Selection.PageSetup.RightMargin

    $rngPage.Copy()
    $newDoc = $word.Documents.Add()
    $word.Selection.PageSetup.TopMargin = $marginTop
    $word.Selection.PageSetup.BottomMargin = $marginBottom
    $word.Selection.PageSetup.LeftMargin = $marginLeft
    $word.Selection.PageSetup.RightMargin = $marginRight
    $word.Selection.Paste() # Now we have our new page in a new doc
    [Void]$word.Selection.EndKey(6, 0) # Move to the end of the file (6 = wdStory, 0 = wdMove)
    $word.Selection.TypeBackspace() # Seems to grab an extra section/page break
    [Void]$word.Selection.Delete() # Now we have our doc down to size
}
```

I won't go into detail, but the code is fairly easy to follow. I found this piece of code and adapted it slightly to my needs; I admit that I don't necessarily understand everything it does (but it does the job :-)). `$pageLength` is the number of pages to put in each output file (1 in our case). We will look at the saving step later in the article.

### Retrieving the file name from the Word document

Each new Word document corresponds to a letter to one recipient. For letters to tenants, the unique identifier I need to extract from the document is the lease number, a series of 10 digits that always starts with 0. To find this number, I used a regular expression:

```powershell
$FileNamePattern = ".*de bail.*(0\d{9})"
```
```powershell
$regex = [Regex]::Match($rngPage.Text, $FileNamePattern)
if ($regex.Success) {
    $id = $regex.Groups[1].Value
}
else {
    $id = "patternNotFound" + $i
}
```


The pattern searches the text of the page for the French string “de bail” (“bail” means lease), followed by any characters, and then captures a 0 followed by nine digits, i.e. the 10-digit lease number.

I apply the pattern and store the result in `$regex`. If the match succeeds, the lease number (capture group 1) ends up in `$id`, which I then use to build the document name; otherwise `$id` falls back to `patternNotFound` followed by the page number.
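
The pattern can be tried on plain strings before running it against Word. A minimal sketch, with a hypothetical `Get-LeaseId` helper that wraps the same `[Regex]::Match` call and fallback name as the script; the sample texts are made up for illustration:

```powershell
# Hypothetical helper: same matching logic as the script, applied to any string.
function Get-LeaseId {
    param (
        [Parameter(Mandatory = $True)][AllowEmptyString()][string]$Text,
        [Parameter(Mandatory = $True)][int]$PageNumber,
        [string]$FileNamePattern = ".*de bail.*(0\d{9})"
    )
    $regex = [Regex]::Match($Text, $FileNamePattern)
    if ($regex.Success) {
        # Group 1 holds the 10-digit number starting with 0
        return $regex.Groups[1].Value
    }
    # No lease number found: fall back to a name based on the page number
    return "patternNotFound" + $PageNumber
}
```

`Get-LeaseId -Text "Numero de bail : 0123456789" -PageNumber 1` returns `0123456789`, and a page without a match gives `patternNotFound1`. Because the `.*` before the group is greedy, a page that contains another 0-prefixed 10-digit number after the lease number yields the last one; a lazy `.*?` would capture the first one instead.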

### Saving each document as PDF

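```powershell
$path = $outputPath + $id + ".pdf"
$newDoc.SaveAs([ref]$path, 17)
$newDoc.Close([ref]$False)
```

`$outputPath` is a variable containing the path of the destination directory for the PDF files. The value 17 passed to `SaveAs` is `wdFormatPDF` in Word's `WdSaveFormat` enumeration, and `Close([ref]$False)` closes the new document without saving it as a Word file.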
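
Because the path is built by plain concatenation, `$outputPath` must end with a backslash, otherwise the lease number is glued to the folder name. A minimal sketch of an alternative using `Join-Path`, which adds the separator only when needed; `Get-PdfPath` is a hypothetical helper name:

```powershell
# Hypothetical helper: build the PDF path with Join-Path instead of string
# concatenation, so the output folder works with or without a trailing separator.
function Get-PdfPath {
    param (
        [Parameter(Mandatory = $True)][string]$OutputPath,
        [Parameter(Mandatory = $True)][string]$Id
    )
    Join-Path -Path $OutputPath -ChildPath ($Id + ".pdf")
}
```

In the loop, `$path = Get-PdfPath -OutputPath $outputPath -Id $id` would then replace the concatenation. Note that `Join-Path` checks that the drive in the path exists, which is the case for a real output folder.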

Here is the complete function:

```powershell
function Convert-Docx2Pdf {
    [CmdletBinding()]
    param (
        [Parameter(Mandatory = $False)][string]$FileNamePattern = ".*de bail.*(0\d{9})",
        [Parameter(Mandatory = $False)][int]$PageLength = 1,
        [Parameter(Mandatory = $True)][string]$InputFile,
        [Parameter(Mandatory = $False)][string]$OutputPath = $env:temp + "\Outputdir\"
    )

    BEGIN {
        # Start from an empty output directory (existing content is deleted!)
        if (Test-Path $OutputPath) {
            Remove-Item -Path $OutputPath -Recurse -Force -Confirm:$False
        }
        $null = New-Item -Path $OutputPath -ItemType Directory -Force -Confirm:$False
    }

    PROCESS {
        $word = New-Object -ComObject word.application
        $word.Visible = $False
        $doc = $word.Documents.Open($InputFile)

        $pages = $doc.ComputeStatistics([Microsoft.Office.Interop.Word.WdStatistic]::wdStatisticPages)
        $rngPage = $doc.Range()

        for ($i = 1; $i -le $pages; $i += $PageLength) {
            # Range = from the start of page $i to the start of page $i + $PageLength
            [Void]$word.Selection.GoTo(
                [Microsoft.Office.Interop.Word.WdGoToItem]::wdGoToPage,
                [Microsoft.Office.Interop.Word.WdGoToDirection]::wdGoToAbsolute,
                $i # Starting page
            )
            $rngPage.Start = $word.Selection.Start
            [Void]$word.Selection.GoTo(
                [Microsoft.Office.Interop.Word.WdGoToItem]::wdGoToPage,
                [Microsoft.Office.Interop.Word.WdGoToDirection]::wdGoToAbsolute,
                $i + $PageLength # Next page number
            )
            $rngPage.End = $word.Selection.Start

            $marginTop = $word.Selection.PageSetup.TopMargin
            $marginBottom = $word.Selection.PageSetup.BottomMargin
            $marginLeft = $word.Selection.PageSetup.LeftMargin
            $marginRight = $word.Selection.PageSetup.RightMargin

            # Paste the range into a new document with the same margins
            $rngPage.Copy()
            $newDoc = $word.Documents.Add()
            $word.Selection.PageSetup.TopMargin = $marginTop
            $word.Selection.PageSetup.BottomMargin = $marginBottom
            $word.Selection.PageSetup.LeftMargin = $marginLeft
            $word.Selection.PageSetup.RightMargin = $marginRight
            $word.Selection.Paste() # Now we have our new page in a new doc
            [Void]$word.Selection.EndKey(6, 0) # Move to the end of the file (6 = wdStory, 0 = wdMove)
            $word.Selection.TypeBackspace() # Seems to grab an extra section/page break
            [Void]$word.Selection.Delete() # Now we have our doc down to size

            # Get the name: lease number, or a fallback based on the page number
            $regex = [Regex]::Match($rngPage.Text, $FileNamePattern)
            if ($regex.Success) {
                $id = $regex.Groups[1].Value
            }
            else {
                $id = "patternNotFound" + $i
            }

            # Save as PDF (17 = wdFormatPDF), then close without saving the Word document
            $path = $OutputPath + $id + ".pdf"
            $newDoc.SaveAs([ref]$path, 17)
            $newDoc.Close([ref]$False)
        }

        # Close the source document and quit Word so no WINWORD.EXE is left running
        $doc.Close([ref]$False)
        $word.Quit()
    }

    END {
        [gc]::Collect()
        [gc]::WaitForPendingFinalizers()
    }
}
```
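
If an error interrupts processing (a PDF that cannot be written, for example), the Word instance created by the script can stay alive in the background as an invisible WINWORD.EXE process. A minimal sketch of a cleanup helper, assuming it is called from a `finally` block; the name `Close-WordSession` is hypothetical, and releasing the COM references with `Marshal.ReleaseComObject` is an addition not found in the original script:

```powershell
# Hypothetical helper: close the source document, quit Word and release the
# COM references, whatever state the processing stopped in.
function Close-WordSession {
    param (
        $Word,
        $Document
    )
    if ($null -ne $Document) {
        $Document.Close([ref]$False)   # close without saving changes
    }
    if ($null -ne $Word) {
        $Word.Quit()
    }
    foreach ($comObject in @($Document, $Word)) {
        # Only genuine COM objects can be released
        if ($null -ne $comObject -and [System.Runtime.InteropServices.Marshal]::IsComObject($comObject)) {
            [Void][System.Runtime.InteropServices.Marshal]::ReleaseComObject($comObject)
        }
    }
    [GC]::Collect()
    [GC]::WaitForPendingFinalizers()
}
```

In `Convert-Docx2Pdf`, the contents of `PROCESS` would then sit in a `try` block, with `Close-WordSession -Word $word -Document $doc` in the matching `finally` block, so that Word is shut down even when a page fails.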