Rajesh Shreedhar Bhat works as a Senior Data Scientist at Walmart Labs, Bangalore. He completed his Bachelor’s degree at PESIT, Bangalore, and is currently pursuing a Master’s in CS with an ML specialization at Arizona State University.
He has a couple of research publications in NLP and computer vision, published at top-tier conferences such as ECML-PKDD, CoNLL, and ASONAM. He is a Kaggle Expert (world rank 966/122431) with 3 silver and 2 bronze medals and has been a regular speaker at Kaggle Days meetups.
In addition, Rajesh has been a mentor for the Udacity Deep Learning and Data Scientist Nanodegrees for the past two and a half years and has conducted ML and DL workshops at GE Healthcare, IIIT Kancheepuram, and many other places.
Extracting text of various sizes, shapes, and orientations from images containing multiple objects is an important problem in many contexts, especially e-commerce, augmented reality assistance systems in natural scenes, and content moderation on social media platforms. Text extracted from images can be a richer and more accurate source of data than human input, and it can be used in applications such as attribute extraction, offensive text classification, product matching, and compliance use cases. Text extraction is carried out in two stages.
Text detection: The detector locates individual characters in an image and then groups nearby characters into words based on an affinity score, which is also predicted by the network. Because the model works at the character level, it can detect text in any orientation. The detected text is then passed to the recognizer module.
Text recognition: Detected text regions are sent to a CRNN-CTC network to obtain the final text. CNNs extract image features, which are then passed to an LSTM network, as shown in the figure below. A Connectionist Temporal Classification (CTC) decoder is then applied to the LSTM outputs across all time steps to obtain the raw text from the image.
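To make the final CTC step concrete, here is a minimal sketch of greedy (best-path) CTC decoding. It assumes the LSTM emits, at every time step, a probability distribution over a label set whose index 0 is the CTC blank. The label set, the toy probabilities, and the helper names `ctc_greedy_decode` and `one_hot_like` are illustrative assumptions, not the implementation used in the talk; real systems may use CTC beam search instead of this greedy variant.

```python
from typing import List, Sequence

# Assumption: labels[0] is the CTC blank symbol; other entries are characters.
BLANK = 0


def ctc_greedy_decode(probs: Sequence[Sequence[float]], labels: Sequence[str]) -> str:
    """Best-path CTC decoding of a T x C matrix of per-time-step probabilities."""
    # 1. Take the most likely label at each time step (the "best path").
    best_path = [max(range(len(step)), key=step.__getitem__) for step in probs]

    # 2. Collapse consecutive repeats, then 3. drop blanks.
    chars: List[str] = []
    prev = None
    for k in best_path:
        if k != prev and k != BLANK:
            chars.append(labels[k])
        prev = k
    return "".join(chars)


def one_hot_like(idx: int, n: int, p: float = 0.9) -> List[float]:
    """Hypothetical helper: a toy softmax row that puts probability p on idx."""
    row = [(1.0 - p) / (n - 1)] * n
    row[idx] = p
    return row


if __name__ == "__main__":
    labels = ["-", "h", "e", "l", "o"]  # "-" stands for the blank
    # Toy LSTM output over 7 time steps: h e l l <blank> l o
    path = [1, 2, 3, 3, 0, 3, 4]
    probs = [one_hot_like(i, len(labels)) for i in path]
    print(ctc_greedy_decode(probs, labels))
```

The blank is what allows repeated characters: the path `h e l l <blank> l o` decodes to "hello", while `h e l l o` without the blank collapses to "helo".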