Form recognize using Template

Questions, comments and suggestions concerning VintaSoft Imaging .NET SDK.

Moderator: Alex

erasmo
Posts: 3
Joined: Fri Oct 09, 2015 8:19 pm

Form recognize using Template

Post by erasmo »

Hi guys, I created a template using the sample project, but I have not managed to write code that uses the template to extract text from PDF files.
The sample project is a desktop application, which makes the task hard: the recognition code is mixed with desktop UI components and is difficult to follow.
I need code for a "console application" project.
My code works, a little :S
But it is not efficient. I know there is a better way, but I can't get it to work.
Forgive my poor English :(
So, here is the code. This is my best solution so far.


part 1

Code:

List<VintasoftImage> images = new List<VintasoftImage>();
VintasoftImage imagem = new VintasoftImage(path + fileName + ext);
imagem.RenderingSettings = new RenderingSettings(new Resolution(96 * fator, 96 * fator), InterpolationMode.HighQualityBicubic, SmoothingMode.HighQuality);
FormRecognitionManager = new FormRecognitionManager();
FormRecognitionManager.FormTemplates.LoadFromDocument(pathTemplate + template + extTemplate);
images.Add(imagem);
FormDocumentTemplate FormDocumentTemplate = FormDocumentTemplate.Deserialize(pathTemplate + template + extTemplate);

method

Code:

public static String  PreprocessAndOcrImages(FormDocumentTemplate FormDocumentTemplate, OcrLanguage language, string filename)
        {
            String resultado = "";
            //Hashtable resultado = new Hashtable();
            List<RecognitionRegion> regions = new List<RecognitionRegion>();
            List<FormFieldTemplate> campos = (FormDocumentTemplate.Pages[0].Items[0] as Vintasoft.Imaging.FormsProcessing.FormRecognition.FormFieldTemplateGroup).Items.ToList();
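            foreach(FormFieldTemplate campo in campos){
                // scale first, then truncate: a cast binds tighter than '*',
                // so "(int)value*fator" would drop the fractional part before multiplying
                int x = (int)(campo.BoundingBox.X * fator);
                int y = (int)(campo.BoundingBox.Y * fator);
                int w = (int)(campo.BoundingBox.Width * fator);
                int h = (int)(campo.BoundingBox.Height * fator);
                
                regions.Add(new RecognitionRegion(new RegionOfInterest(x, y, w, h),language));
            }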

            // load image(s)
            ImageCollection images = new ImageCollection();
            VintasoftImage imagem = new VintasoftImage(filename);
            imagem.RenderingSettings = new RenderingSettings(new Resolution(96 * fator, 96 * fator), InterpolationMode.HighQualityBicubic, SmoothingMode.HighQuality);
            images.Add(imagem);

            //Console.WriteLine("Create Tesseract OCR engine...");
            using (TesseractOcr tesseractOcr = new TesseractOcr(TesseractOcrDllDirectory))
            {
                // create OCR engine manager
                OcrEngineManager engineManager =
                    new OcrEngineManager(tesseractOcr);

                OcrEngineSettings settings = new OcrEngineSettings(language);

                // foreach image
                foreach (VintasoftImage image in images)
                {
                    //Console.WriteLine("Preprocess image:");
                    //Console.WriteLine("BorderClear, Despeckle, Deskew, Segmentation...");
                    OcrPreprocessingCommand preprocessing = new OcrPreprocessingCommand();
                    preprocessing.Binarization = null;
                    preprocessing.ExecuteInPlace(image);

                    //Console.WriteLine("Recognize image...");
                    //OcrPage page = engineManager.Recognize(image, settings, preprocessing.SegmentationTextRegions);

                    OcrPage page = engineManager.Recognize(image, settings, regions);

                    //Console.WriteLine("Page Text:");
                    //Console.WriteLine(page.GetText());
                    Console.WriteLine();

                    //for (int i = 0; i < page.Regions.Count; i++)
                    //{
                    //    Rectangle ret = new Rectangle(
                    //        (int)campos[i].BoundingBox.X * fator,
                    //        (int)campos[i].BoundingBox.Y * fator,
                    //        (int)campos[i].BoundingBox.Width * fator,
                    //        (int)campos[i].BoundingBox.Height * fator);
                    //    Object[] obj= page.GetObjects(OcrObjectType.TextRegion, ret);
                    //    var dados =  obj==null ||obj.Length==0 ?"":(obj[0] as OcrTextRegion).Text.Trim();
                    //    resultado.Add(campos[i].Name,dados);
                    //}
                    resultado += page.GetText();
                }
                
            }

            // free resources
            images.ClearAndDisposeItems();
            images.Dispose();
            return resultado; 
           
        }
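
A side note on scaling field bounds by a factor such as "fator": in C#, a cast binds tighter than multiplication, so the position of the cast changes the result when the coordinate is fractional. The sketch below is only an illustration in plain C#; the class and method names are hypothetical, it does not use the VintaSoft API, and it assumes the coordinates are floating-point values.

```csharp
using System;

// Hypothetical helper, not part of any SDK: shows how the cast position
// affects a coordinate that is scaled by an integer factor.
public static class RegionScaling
{
    // Truncates first, then scales: the fractional part is lost before multiplying.
    public static int CastThenScale(float value, int factor)
    {
        return (int)value * factor;
    }

    // Scales first, then truncates: the fractional part contributes to the result.
    public static int ScaleThenCast(float value, int factor)
    {
        return (int)(value * factor);
    }
}
```

For a coordinate of 12.5 and a factor of 10, CastThenScale gives 120 and ScaleThenCast gives 125, so a region computed the first way can be shifted by up to almost "factor" pixels.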
vladimirG
Posts: 2
Joined: Sun Jan 25, 2015 1:18 am

Re: Form recognize using Template

Post by vladimirG »

Hi,
erasmo wrote:The sample project is a desktop application, which makes the task hard: the recognition code is mixed with desktop UI components and is difficult to follow.
I need code for a "console application" project.
VintaSoft provides detailed online documentation with code samples. Try reading this article: it covers the essentials of form recognition and contains several code samples that can be reused in console applications. The example "RecognizeFormWithOcrFields" method might be helpful for you.

Don't hesitate to ask if further questions arise.

Regards, Vladimir.
erasmo
Posts: 3
Joined: Fri Oct 09, 2015 8:19 pm

Re: Form recognize using Template

Post by erasmo »

Hi Vladimir, thank you for your help. My problem with the examples is the objects passed as parameters: is it enough to just instantiate them, or do I need to configure something on them?
I'm having trouble: recognitionResult is null on every attempt.
My RecognizeFormWithOcrFields method is the same as the one in the documentation.
We like your tools and what they make possible, but I need to get better results in my tests; the license purchase depends on that.

Code:

 
	// "fator" is a scale factor; I use it because the image resolution was changed when I created the template
	 static int fator = 10;
	 static ChangePixelFormatToBlackWhiteCommand _binarizeCommand = new ChangePixelFormatToBlackWhiteCommand(BinarizationMode.Global);
	// the other variables are just paths to the images, the Tesseract directory, and so on
	
	
    	static void Main(string[] args){
            FormRecognitionManager = new FormRecognitionManager();
            FormDocumentTemplate templateDocument = FormDocumentTemplate.Deserialize(pathTemplate + template + extTemplate);
            FormRecognitionManager.FormTemplates.LoadFromDocument(templateDocument);


            #region template
            RenderingSettings _renderingSettings = new RenderingSettings(new Resolution(fator * 96, fator * 96), InterpolationMode.HighQualityBicubic, SmoothingMode.HighQuality);

            // images to remove from template manager
            List<VintasoftImage> imagesToRemove = new List<VintasoftImage>();


            for (int i = 0; i < templateDocument.Pages.Count; i++)
            {
                // current page template
                FormPageTemplate templatePage = templateDocument.Pages[i];
                // get corresponding template image
                VintasoftImage templateImage = FormRecognitionManager.FormTemplates.GetTemplateImage(templatePage);

                if (templateImage.PixelFormat != Vintasoft.Imaging.PixelFormat.BlackWhite)
                {
                    ProcessingCommandBase processingCommand = null;

                    templateImage.RenderingSettings = _renderingSettings;
                    processingCommand = _binarizeCommand;

                    // if processing command is set
                    if (processingCommand != null)
                    {
                        bool processingError = false;
                        try
                        {
                            processingCommand.ExecuteInPlace(templateImage);
                        }
                        catch (Exception ex)
                        {
                            Console.WriteLine(ex.Message);
                            templateImage.Dispose();
                            processingError = true;
                        }
                        if (processingError)
                            continue;
                    }
                }
            }
            #endregion

            #region PDF file
            List<string> filenames = new List<string>();
            filenames.Add(path + fileName + ext);
            // for each selected file
            ImageCollection filledImages = new ImageCollection();
            foreach (string filename in filenames)
            {
                // temporary image collection for all images in current file

                filledImages.Add(filename);

                foreach (VintasoftImage image in filledImages)
                {
                    // binarize the image unless it is already black-and-white

                    if (image.PixelFormat != Vintasoft.Imaging.PixelFormat.BlackWhite)
                    {
                        ProcessingCommandBase processingCommand = null;
                        // set the rendering resolution, then binarize

                        image.RenderingSettings = _renderingSettings;
                        processingCommand = _binarizeCommand;

                        // if processing command is set
                        if (processingCommand != null)
                        {
                            bool processingError = false;
                            try
                            {
                                processingCommand.ExecuteInPlace(image);
                            }
                            catch (Exception ex)
                            {
                                Console.WriteLine(ex.Message);
                                image.Dispose();
                                processingError = true;
                            }
                            if (processingError)
                                continue;
                        }
                    }

                    RecognizeFormWithOcrFields(FormRecognitionManager, image);

                }

                
                filledImages.Clear();

            }

            #endregion
            }
            
            
Alex
Site Admin
Posts: 2303
Joined: Thu Jul 10, 2008 2:21 pm

Re: Form recognize using Template

Post by Alex »

Hello,

Could you send us (to support@vintasoft.com) a small working project that demonstrates your problem? We need to reproduce the problem on our side; after that we will be able to provide a good solution for your task.

Best regards, Alexander
erasmo
Posts: 3
Joined: Fri Oct 09, 2015 8:19 pm

Re: Form recognize using Template

Post by erasmo »

The support team helped me. The mistake was mine: I had not understood that the Tesseract OCR engine needs to be configured.

Code:

OcrEngine ocrEngine = new TesseractOcr(TesseractOcrDllDirectory);
OcrFieldTemplate.OcrEngineManager = new OcrEngineManager(ocrEngine);