Description

Multilingual text recognition is crucial for cross-language information acquisition and related applications in the mobile computing era. The core problem is to find efficient representation and decoding methods for multilingual text recognition, including scene text recognition and handwriting recognition tasks. This book introduces primitive representation learning, a new deep learning framework for sequence modeling that contrasts with CNN-RNN-CTC (convolutional neural network, recurrent neural network, connectionist temporal classification) and attention-based encoder-decoder approaches. Primitive representations are learned via global feature aggregation and then transformed into high-level visual text representations by a graph convolutional network, which enables parallel decoding for text transcription. A multi-element attention mechanism and a temporal residual mechanism are further introduced to make better use of spatial and temporal feature information.

The methods presented in this book have been evaluated on public datasets and applied to scene text recognition and handwriting recognition systems. Readers will gain a better understanding of state-of-the-art methods and research findings in multilingual scene text recognition, handwriting recognition, and related fields. To follow the book, readers need a basic knowledge of machine learning and deep learning.