sample to read text content in pdf

Questions, comments and suggestions concerning VintaSoft PDF .NET Plug-in.

Moderator: Alex

marco.malizia
Posts: 1
Joined: Wed Feb 27, 2013 5:13 pm


Post by marco.malizia »

Hi,

I need to read the text content of a specific area on a specific page of a PDF file from a VB.NET application.
I tried the sample below, but it does not work correctly.
My problem is finding the exact coordinates: I tried the PdfReaderDemo sample, but its coordinates and resolutions do not correspond well to what my code needs.
Any suggestions? What is the recommended way to read text from a specific area of a PDF page?

Thanks.

Code:

Private Sub Button1_Click(ByVal sender As System.Object, ByVal e As System.EventArgs) Handles Button1.Click

    Dim x1 As Int16, x2 As Int16, y1 As Int16, y2 As Int16
    Dim vTesto As String = ""
    Dim vArea As SizeF
    Dim vRect As RectangleF

    Try

        _fileStream = New FileStream(edPdf1.Text, FileMode.Open, FileAccess.Read)
        _document = PdfDocumentController.OpenDocument(_fileStream)

        vTesto = _document.Pages(0).TextRegion.TextContent
        edTxt1.Text = vTesto

        vArea = _document.Pages(0).GetPageSizeInPixels(_document.Pages(0).DefaultResolution)

        x1 = Convert.ToInt16(edX1.Text)
        x2 = Convert.ToInt16(edX2.Text)
        y1 = Convert.ToInt16(edY1.Text)
        y2 = Convert.ToInt16(edY2.Text)

        If x1 <> 0 Or x2 <> 0 Or y1 <> 0 Or y2 <> 0 Then
            vRect = New RectangleF(x1, y1, x2, y2)

            vTesto = _document.Pages(0).TextRegion.GetSubregion(vRect).TextContent
        End If

    Catch ex As Exception

    End Try

End Sub

Alex
Site Admin
Posts: 2305
Joined: Thu Jul 10, 2008 2:21 pm

Re: sample to read text content in pdf

Post by Alex »

Hello,

There are two logical mistakes in your code.

First, you need to convert the coordinates from image space to page space before getting the text content.

Second, the third and fourth parameters of the RectangleF constructor are the width and height of the rectangle, not the coordinates of its second corner.

Here is the corrected code:

Code:

...
x1 = Convert.ToInt16(edX1.Text)
x2 = Convert.ToInt16(edX2.Text)
y1 = Convert.ToInt16(edY1.Text)
y2 = Convert.ToInt16(edY2.Text)

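' Convert the coordinates from image space to page space.
Dim points As Single() = {x1, x2, y1, y2}
_document.Pages(0).PointsToUnits(points, _document.Pages(0).DefaultResolution)

' Keep the converted values as Single: assigning them back to the Int16
' variables would round them (or fail to compile with Option Strict On).
Dim px1 As Single = points(0)
Dim px2 As Single = points(1)
Dim py1 As Single = points(2)
Dim py2 As Single = points(3)

If px1 <> 0 OrElse px2 <> 0 OrElse py1 <> 0 OrElse py2 <> 0 Then
    ' RectangleF takes (x, y, width, height), not two corner points.
    vRect = New RectangleF(px1, py1, px2 - px1, py2 - py1)

    vTesto = _document.Pages(0).TextRegion.GetSubregion(vRect).TextContent
End If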
...
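
To make the two fixes easier to check by hand, here is a minimal standalone sketch of the same arithmetic. It does not use the VintaSoft API at all. It assumes that page space is measured in points (1/72 inch) and that the input coordinates are pixels at a known resolution; it also ignores any difference in Y-axis direction between the two spaces. The names RegionSketch, PixelsToPoints and CornersToPageRect are hypothetical and exist only for this illustration.

```vbnet
Imports System
Imports System.Drawing

Module RegionSketch

    ' Assumption: page space uses points, 72 per inch.
    Private Const PointsPerInch As Single = 72.0F

    ' Converts one pixel coordinate to points for the given resolution (DPI).
    Function PixelsToPoints(ByVal pixels As Single, ByVal dpi As Single) As Single
        Return pixels * PointsPerInch / dpi
    End Function

    ' Builds a RectangleF in points from two opposite corners given in pixels.
    ' RectangleF expects (x, y, width, height), so the second corner is
    ' turned into a width and a height.
    Function CornersToPageRect(ByVal x1 As Single, ByVal y1 As Single,
                               ByVal x2 As Single, ByVal y2 As Single,
                               ByVal dpi As Single) As RectangleF
        Dim xMin As Single = PixelsToPoints(Math.Min(x1, x2), dpi)
        Dim yMin As Single = PixelsToPoints(Math.Min(y1, y2), dpi)
        Dim xMax As Single = PixelsToPoints(Math.Max(x1, x2), dpi)
        Dim yMax As Single = PixelsToPoints(Math.Max(y1, y2), dpi)
        Return New RectangleF(xMin, yMin, xMax - xMin, yMax - yMin)
    End Function

    Sub Main()
        ' 96 pixels at 96 DPI is one inch, which is 72 points.
        Dim r As RectangleF = CornersToPageRect(96, 192, 288, 384, 96)
        Console.WriteLine("X={0} Y={1} Width={2} Height={3}", r.X, r.Y, r.Width, r.Height)
    End Sub

End Module
```

If the rectangle printed here does not match the area you selected in the viewer, the problem is in the coordinates you entered rather than in the text extraction call.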
Best regards, Alexander