Rajesh Shreedhar Bhat

Senior Data Scientist, Walmart Global Tech India

Rajesh Shreedhar Bhat works as a Senior Data Scientist at Walmart, Bangalore. His work primarily focuses on building reusable machine/deep learning solutions that can be used across various business domains at Walmart. He completed his Bachelor's degree at PESIT, Bangalore.

He has a couple of research publications in the fields of NLP and vision, published at top-tier conferences such as CoNLL and ASONAM. He is a Kaggle Expert (world rank 966/122,431) with three silver and two bronze medals, and has been a regular speaker at various international and national conferences, including the Data & Spark AI Summit, ODSC, the Seoul & Silicon Valley AI Summit, Kaggle Days meetups, the Data Hack Summit, and others.

Apart from this, Rajesh has been a mentor for the Udacity Deep Learning and Data Scientist Nanodegree programs for the past three years and has conducted ML and DL workshops at GE Healthcare, IIIT Kancheepuram, and many other places.

Past sessions

Summit 2021 Conversational AI with Transformer Models

May 27, 2021 11:35 AM PT

With the advancements in Artificial Intelligence (AI) and cognitive technologies, automation has become a key prospect for many enterprises across domains. Conversational AI is one such area in which many organizations are investing heavily.


In this session, we discuss the building blocks of conversational agents and a Natural Language Understanding engine built with transformer models, which have proven to offer state-of-the-art results on standard NLP tasks.


We will first talk about the advantages of Transformer models over RNN/LSTM models, and then cover knowledge distillation and model compression techniques that make these parameter-heavy models work in production environments with limited resources.
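As a toy illustration of the distillation objective mentioned above (a plain-Python sketch, not the session's actual PyTorch/TF2 sample code), the student is trained to match the teacher's temperature-softened output distribution:

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    A higher temperature exposes the teacher's "dark knowledge":
    the relative probabilities it assigns to the wrong classes.
    """
    p = softmax(teacher_logits, temperature)   # soft targets
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))
    # The T^2 factor keeps gradient magnitudes comparable across temperatures.
    return temperature ** 2 * kl

# A student that exactly matches the teacher incurs zero loss.
teacher = [4.0, 1.0, 0.2]
print(distillation_loss(teacher, teacher))           # 0.0
print(distillation_loss([0.5, 0.4, 0.1], teacher))   # > 0, penalizes mismatch
```

In practice this term is combined with the usual cross-entropy against the hard labels, weighted by a mixing coefficient.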


Key takeaways:

  • Understanding the building blocks & flow of conversational agents
  • Advantages of Transformer-based models over RNNs/LSTMs
  • Knowledge distillation techniques
  • Different model compression techniques, including quantization
  • Sample code in PyTorch & TF2
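To make the quantization bullet concrete, here is a minimal, framework-free sketch of affine (asymmetric) 8-bit quantization; real deployments would use the built-in quantization tooling of PyTorch or TF2 rather than hand-rolled code like this:

```python
def quantize_int8(values):
    """Affine (asymmetric) quantization of floats to unsigned 8-bit ints.

    Returns the quantized values plus the (scale, zero_point) pair needed
    to map them back to (approximate) floats.
    """
    lo, hi = min(values), max(values)
    scale = (hi - lo) / 255 if hi != lo else 1.0
    zero_point = round(-lo / scale)      # the int that represents float 0.0
    q = [min(255, max(0, round(v / scale) + zero_point)) for v in values]
    return q, scale, zero_point

def dequantize(q, scale, zero_point):
    """Map 8-bit values back to floats."""
    return [(qi - zero_point) * scale for qi in q]

weights = [-0.75, -0.1, 0.0, 0.42, 1.3]
q, scale, zp = quantize_int8(weights)
restored = dequantize(q, scale, zp)
max_err = max(abs(w - r) for w, r in zip(weights, restored))
# Reconstruction error stays within half a quantization step.
assert max_err <= scale / 2 + 1e-9
```

Storing weights as 8-bit ints instead of 32-bit floats cuts model size roughly 4x, which is what makes parameter-heavy transformer models feasible in resource-limited production environments.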
In this session watch:
Rajesh Shreedhar Bhat, Senior Data Scientist, Walmart Global Tech India
Dinesh Ladi, Data Scientist, Walmart Global Tech India

[daisna21-sessions-od]

Summit Europe 2020 Detecting and Recognising Highly Arbitrary Shaped Texts from Product Images

November 18, 2020 04:00 PM PT

Extracting text of various sizes, shapes, and orientations from images containing multiple objects is an important problem in many contexts, especially in connection with e-commerce, augmented-reality assistance systems in natural scenes, content moderation on social media platforms, etc. At the scale at which Walmart operates, the text in a product image can be a richer and more accurate source of data than human inputs, and can be used in several applications such as Attribute Extraction, Offensive Text Classification, compliance use cases, etc. Accurately extracting text from product images is a challenge, given that product images come with a lot of variation: small, highly oriented, arbitrarily shaped text with fancy fonts, and so on. Typical word-level text detectors fail to detect or capture these variations, and even when the text is detected, recognition models without any transformation layers fail to recognize and accurately extract highly oriented or arbitrarily shaped text.
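The character-grouping idea behind this kind of detector can be sketched as a simple union-find over predicted affinity scores. This is an illustrative toy with made-up scores and threshold, not the actual detection pipeline:

```python
def group_characters(n_chars, affinities, threshold=0.5):
    """Union-find grouping: characters whose predicted affinity score
    exceeds the threshold are merged into the same word.

    `affinities` maps (i, j) character-index pairs to scores in [0, 1],
    as would be predicted by the detection network.
    """
    parent = list(range(n_chars))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path compression
            i = parent[i]
        return i

    def union(i, j):
        parent[find(i)] = find(j)

    for (i, j), score in affinities.items():
        if score >= threshold:
            union(i, j)

    words = {}
    for i in range(n_chars):
        words.setdefault(find(i), []).append(i)
    return list(words.values())

# Five detected characters; strong affinity links 0-1-2 and 3-4,
# while the weak 2-3 link marks a word boundary.
aff = {(0, 1): 0.9, (1, 2): 0.8, (2, 3): 0.1, (3, 4): 0.7}
print(group_characters(5, aff))   # [[0, 1, 2], [3, 4]]
```

Because the grouping works on pairwise character links rather than word-level boxes, it handles curved and rotated words that axis-aligned word detectors miss.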

In this talk, I will cover a text detection technique that detects the character locations in an image and then combines characters close to each other into words based on an affinity score, which is also predicted by the network. Since the model operates at the character level, it can detect text in any orientation. After this, I will talk about the need for Spatial Transformer Networks, which normalize the cropped input region containing text so that it can then be fed to a CRNN-CTC network or a CNN-LSTM-Attention architecture to accurately extract highly oriented or arbitrarily shaped text.
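The grid-generation and sampling steps at the heart of a Spatial Transformer Network can be sketched as follows. This is a pure-Python toy using nearest-neighbour sampling for brevity; a real STN uses differentiable bilinear sampling, and the 2x3 matrix `theta` would come from a learned localisation network:

```python
def affine_grid(theta, height, width):
    """Build a sampling grid in normalized [-1, 1] coordinates,
    transformed by a 2x3 affine matrix theta."""
    grid = []
    for i in range(height):
        y = -1 + 2 * i / (height - 1)
        row = []
        for j in range(width):
            x = -1 + 2 * j / (width - 1)
            xs = theta[0][0] * x + theta[0][1] * y + theta[0][2]
            ys = theta[1][0] * x + theta[1][1] * y + theta[1][2]
            row.append((xs, ys))
        grid.append(row)
    return grid

def grid_sample(image, grid):
    """Sample the input image at the transformed grid locations
    (nearest-neighbour here; real STNs use bilinear interpolation)."""
    h, w = len(image), len(image[0])
    out = []
    for row in grid:
        out_row = []
        for xs, ys in row:
            j = min(w - 1, max(0, round((xs + 1) * (w - 1) / 2)))
            i = min(h - 1, max(0, round((ys + 1) * (h - 1) / 2)))
            out_row.append(image[i][j])
        out.append(out_row)
    return out

img = [[1, 2],
       [3, 4]]
identity = [[1, 0, 0], [0, 1, 0]]
rot90    = [[0, -1, 0], [1, 0, 0]]    # rotate sampling coordinates 90 degrees
print(grid_sample(img, affine_grid(identity, 2, 2)))   # [[1, 2], [3, 4]]
print(grid_sample(img, affine_grid(rot90, 2, 2)))      # [[2, 4], [1, 3]]
```

The rotation example shows the point of the module: by learning `theta`, the network can un-rotate or un-warp a cropped text region before the recognizer ever sees it.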

Key Takeaways:

  • Understanding the need for text extraction from product images
  • Deep learning techniques for detecting highly oriented text based on image segmentation
  • Understanding the need for Spatial Transformer Networks for text recognition
  • Deep dive into the STN-CNN-LSTM-Attention architecture for text recognition
  • Sample code for Spatial Transformer Networks
  • Usage of text extraction in various fields/domains

Speaker: Rajesh Shreedhar Bhat

Extracting text of various sizes, shapes, and orientations from images containing multiple objects is an important problem in many contexts, especially in connection with e-commerce, augmented-reality assistance systems in natural scenes, content moderation on social media platforms, etc. The text in an image can be a richer and more accurate source of data than human inputs, and can be used in several applications such as Attribute Extraction, Offensive Text Classification, Product Matching, compliance use cases, etc. Extracting text is achieved in two stages. Text detection: the detector detects the character locations in an image and then combines characters close to each other into words based on an affinity score, which is also predicted by the network. Since the model operates at the character level, it can detect text in any orientation. The detected text is then sent to the recognizer module. Text recognition: detected text regions are sent to a CRNN-CTC network to obtain the final text. CNNs extract image features that are then passed to an LSTM network. A Connectionist Temporal Classification (CTC) decoding operation is then applied to the LSTM outputs for all time steps to finally obtain the raw text from the image.

Key Takeaways:

  1. Understanding the need for text extraction from product images.
  2. Deep learning techniques for detecting highly oriented text.
  3. End-to-end understanding of the CRNN-CTC network for text recognition with TF 2.0.
  4. The need for CTC loss and a theoretical understanding of the same.
  5. Usage of text extraction in various fields/domains.
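The CTC decoding step described above (collapse consecutive repeats, then drop blanks) can be sketched in a few lines. This is an illustrative greedy best-path decoder in plain Python, not the session's TF 2.0 code:

```python
BLANK = "-"   # the CTC blank symbol (often index 0 in real implementations)

def ctc_greedy_decode(frame_labels):
    """Collapse a per-timestep best-path labelling into the final text:
    1) merge consecutive repeated labels, 2) drop blanks.

    The blank symbol is what lets CTC represent genuine double letters
    (e.g. the "ll" in "hello") and is why no character-level alignment
    is needed at training time.
    """
    decoded = []
    prev = None
    for ch in frame_labels:
        if ch != prev and ch != BLANK:
            decoded.append(ch)
        prev = ch
    return "".join(decoded)

# Each element is the argmax character for one LSTM timestep.
print(ctc_greedy_decode(list("--hh-e-ll-lo--")))   # hello
```

The CTC loss used in training sums the probabilities of every frame labelling that collapses to the target string under this same rule, which is what makes the network trainable without knowing where each character sits in the image.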