I have a problem with a PDF document. The document contains some text elements which are formatted with a special font (i.e. Tahoma).
I am iterating over the pages and text lines to check whether each symbol is formatted with this font, and I append the matching characters to a StringBuilder. The idea is to extract information from the document for further processing.
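To make the idea concrete independently of the PDF library, here is a minimal self-contained sketch of the font filter. The `Glyph` record, the `FontFilter` class and the `AllowedFonts` set are hypothetical stand-ins for illustration only; they are not VintaSoft types, and the real symbol objects may expose the font differently.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical stand-in for a text symbol; NOT a VintaSoft type.
public sealed record Glyph(char Character, string FontName);

public static class FontFilter
{
    // Assumed list of fonts whose characters should be kept.
    private static readonly HashSet<string> AllowedFonts =
        new HashSet<string>(StringComparer.OrdinalIgnoreCase) { "Tahoma" };

    // Walks the lines in the order given and keeps only the
    // characters whose font name is in AllowedFonts.
    public static string Extract(IEnumerable<IEnumerable<Glyph>> lines)
    {
        var sb = new StringBuilder();
        foreach (var line in lines)
            foreach (var glyph in line)
                if (AllowedFonts.Contains(glyph.FontName))
                    sb.Append(glyph.Character);
        return sb.ToString();
    }
}
```

Note that a filter like this keeps the characters in whatever order the lines and symbols are enumerated, so the result depends entirely on how the library groups symbols into lines.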
Here is what I see in any kind of PDF Viewer (this also includes the VintaSoft Demo Applications):
&Field1:608121 &Field64:01.07.2010 &Field3:12.286,75
I am using the following code to extract the required stuff:
var pdf = new PdfDocument(file);
var sb = new StringBuilder();
for (int iPage = 0; iPage < pdf.Pages.Count; iPage++)
{
    var page = pdf.Pages[iPage];
    foreach (var textRegionLine in page.TextRegion.Lines)
        foreach (var symbol in textRegionLine.Symbols)
        {
            // Compare fonts with allowed ones and add the symbol to the StringBuilder
        }
}
pdf.Dispose();
pdf.ClearCache();
However, this is the result I get from the extraction:
&Field1:608121 &Field64:01.07.2010 &Field3:12 286 75
.
,
Any suggestions? Any ideas?
Thanks,
Sebastian