Vintasoft.Imaging.Ocr.Tesseract and Tesseract 4

Questions, comments and suggestions concerning VintaSoft Imaging .NET SDK.

Moderator: Alex

David_karlsson
Posts: 8
Joined: Mon Jan 14, 2019 12:55 pm

Vintasoft.Imaging.Ocr.Tesseract and Tesseract 4

Post by David_karlsson »

Hi
Does the Vintasoft.Imaging.Ocr.Tesseract plugin use Tesseract 4? If not, is it possible to do so?

Can I use multiple languages? In Tesseract we can use eng+latin, etc.
Alex
Site Admin
Posts: 2305
Joined: Thu Jul 10, 2008 2:21 pm

Re: Vintasoft.Imaging.Ocr.Tesseract and Tesseract 4

Post by Alex »

Hi David,
David_karlsson wrote: Mon Jan 14, 2019 3:47 pm Does the Vintasoft.Imaging.Ocr.Tesseract plugin use Tesseract 4? If not, is it possible to do so?
Current version of Vintasoft OCR .NET Plugin uses Tesseract OCR 3.04. We plan to use Tesseract OCR 4 in near time.

David_karlsson wrote: Mon Jan 14, 2019 3:47 pm Can I use multiple languages? In Tesseract we can use eng+latin, etc.
Please read how to recognize text in two languages here:
https://www.vintasoft.com/docs/vsimagin ... uages.html

Best regards, Alexander
David_karlsson
Posts: 8
Joined: Mon Jan 14, 2019 12:55 pm

Re: Vintasoft.Imaging.Ocr.Tesseract and Tesseract 4

Post by David_karlsson »

Alex wrote: Current version of Vintasoft OCR .NET Plugin uses Tesseract OCR 3.04. We plan to use Tesseract OCR 4 in near time.
Near time? 1 month? 1 year?
Alex wrote: Please read how to recognize text in two languages here:
https://www.vintasoft.com/docs/vsimagin ... uages.html
I have already read the documentation. It describes how to OCR different sections of a PDF with different languages.
In Tesseract, it is possible to combine several languages, e.g. eng+deu, to get a better interpretation of the same section (page).
It can even be used with multiple languages traineddata at a time eg. English and German:
tesseract myscan.png out -l eng+deu
https://github.com/tesseract-ocr/tesseract/wiki
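
To make the request concrete, the combined-language call can be sketched as a small Python wrapper around the Tesseract command line. This is only an illustration: it assumes the `tesseract` executable and the needed traineddata files are installed, and the helper names `build_tesseract_command` and `run_ocr` are hypothetical. They are not part of the Vintasoft or Tesseract APIs.

```python
import shutil
import subprocess


def build_tesseract_command(image_path, output_base, languages):
    """Build the argument list for a multi-language Tesseract CLI call.

    Hypothetical helper: joins the language codes with '+', the same
    form used in `tesseract myscan.png out -l eng+deu`.
    """
    if not languages:
        raise ValueError("at least one language code is required")
    return ["tesseract", image_path, output_base, "-l", "+".join(languages)]


def run_ocr(image_path, output_base, languages):
    """Run the CLI call; assumes `tesseract` is installed and on PATH."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    command = build_tesseract_command(image_path, output_base, languages)
    subprocess.run(command, check=True)


if __name__ == "__main__":
    # Only prints the command; call run_ocr(...) to actually recognize text.
    print(" ".join(build_tesseract_command("myscan.png", "out", ["eng", "deu"])))
    # prints: tesseract myscan.png out -l eng+deu
```

The point is that all languages are passed to a single recognition pass over the same page, rather than assigning one language per region.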
Alex
Site Admin
Posts: 2305
Joined: Thu Jul 10, 2008 2:21 pm

Re: Vintasoft.Imaging.Ocr.Tesseract and Tesseract 4

Post by Alex »

David_karlsson wrote: Mon Jan 14, 2019 9:52 pm Near time? 1 month? 1 year?
Tesseract 4 will be available in version 8.7.2 in 2 months.

David_karlsson wrote: Mon Jan 14, 2019 9:52 pm I have already read the documentation. It describes how to OCR different sections of a PDF with different languages.
In Tesseract, it is possible to combine several languages, e.g. eng+deu, to get a better interpretation of the same section (page).

It can even be used with multiple languages traineddata at a time eg. English and German:
tesseract myscan.png out -l eng+deu
https://github.com/tesseract-ocr/tesseract/wiki
Thank you for the information. We will analyze it and try to provide the best solution.


Best regards, Alexander
David_karlsson
Posts: 8
Joined: Mon Jan 14, 2019 12:55 pm

Re: Vintasoft.Imaging.Ocr.Tesseract and Tesseract 4

Post by David_karlsson »

Perfect. I will happily wait for the release of the Tesseract 4 plugin.
David_karlsson
Posts: 8
Joined: Mon Jan 14, 2019 12:55 pm

Re: Vintasoft.Imaging.Ocr.Tesseract and Tesseract 4

Post by David_karlsson »

Hi!
Alex wrote: Tesseract 4 will be available in version 8.7.2 in 2 months.
Is there a preview of version 8.7.2? We have started developing our system, and I need the Vintasoft.Imaging.Ocr.Tesseract API for Tesseract 4.
Is it possible to get access to Vintasoft.Imaging.Ocr.Tesseract 8.7.2 in advance?
Alex
Site Admin
Posts: 2305
Joined: Thu Jul 10, 2008 2:21 pm

Re: Vintasoft.Imaging.Ocr.Tesseract and Tesseract 4

Post by Alex »

Hi David,
David_karlsson wrote: Mon Feb 04, 2019 10:12 pm
Alex wrote: Tesseract 4 will be available in version 8.7.2 in 2 months.
Is there a preview of version 8.7.2? We have started developing our system, and I need the Vintasoft.Imaging.Ocr.Tesseract API for Tesseract 4.
Is it possible to get access to Vintasoft.Imaging.Ocr.Tesseract 8.7.2 in advance?
I think a preview version will be available in 2 weeks.

Best regards, Alexander
Alex
Site Admin
Posts: 2305
Joined: Thu Jul 10, 2008 2:21 pm

Re: Vintasoft.Imaging.Ocr.Tesseract and Tesseract 4

Post by Alex »

Hi David,
David_karlsson wrote: Mon Feb 04, 2019 10:12 pm
Alex wrote: Tesseract 4 will be available in version 8.7.2 in 2 months.
Is there a preview of version 8.7.2? We have started developing our system, and I need the Vintasoft.Imaging.Ocr.Tesseract API for Tesseract 4.
Is it possible to get access to Vintasoft.Imaging.Ocr.Tesseract 8.7.2 in advance?
Version 8.7.2.1 was released today. In this version, the Tesseract OCR engine has been updated to version 4.0.

Also, in version 8.7.2.1 you can specify that text must be recognized in several languages. Here is an example that shows how to recognize text written in English and German: https://www.vintasoft.com/docs/vsimagin ... uages.html

Best regards, Alexander