iText pdfOCR, which is integrated into the iText 7 PDF SDK, offers OCR (Optical Character Recognition) functions that convert printed text in scanned documents and images into a fully searchable, PDF/A-3-compatible format (PDF version 1.7). This makes these texts much easier and faster to access. Without machine-readable text, printed or scanned documents cannot be searched, indexed or interpreted. Logical follow-up steps include, for example, data extraction with iText pdf2Data, secure redaction of content with iText pdfSweep, or the creation of new multilingual documents with iText pdfCalligraph. Reusing the data with the iText DITO® low-code document generator is often the icing on the cake.
The iText pdfOCR add-on is based on the Tesseract OCR engine. Tesseract supports over 100 languages; it was originally developed by Hewlett-Packard (1985) and released in 2005 under the Apache open source license. Google has sponsored its further development since 2006.
“Because of COVID-19, companies are forced to accelerate their digital transformation. They therefore have to look for new methods of data access and data management, for both existing and new data. As a leader in digital documents, we’re excited about our innovative contribution to this new era. So I’m very proud to announce the latest addition to our PDF library: iText pdfOCR. Its OCR capabilities open up a lot of new opportunities for users and businesses looking to take full advantage of their data,” said Yeonsu Kim, CEO of iText Group NV.
“True to our open source roots, we decided to develop iText pdfOCR based on the open source Tesseract OCR engine. In doing so, we would like to underline our position as an open source company, which benefits millions of users and customers.”
“This addition to our PDF library allows developers to use data in documents to which they previously had no access. Our latest product enables them to expand their digital workflow capabilities by accessing the data in scanned files and making it available for any operation or purpose that they or their end users want,” said Tony Van den Zegel, VP Product and Marketing at iText Group NV and managing director of iText Software, Belgium.
iText pdfOCR has a wide range of applications: for example, archiving historical documents, translating legal documents, automating data entry when processing different types of physical documents or application forms, and sorting printed or scanned documents that cannot otherwise be processed.
Take part in our live demos on July 9, 2020. For more information, go to www.itextpdf.com/events
iText is a leading global provider of innovative PDF software. Its award-winning products are used by several million users and are offered both as open source versions and in commercial form. The diverse customer base includes numerous Fortune 500 companies in technology, finance, travel and healthcare, as well as small businesses and government agencies. iText is headquartered in Belgium, with offices in Asia (Singapore and South Korea) and the United States (Boston).
iText Software Belgium
Phone: +32 (92) 9802-31
Fax: +32 (92) 7033-75