ML Software Engineer Intern

Amazon.com, Seattle, WA

Deep Learning · AWS SageMaker · Recommender Systems · TensorFlow · Large Language Models

Personalization of the Outfit Builder API
  1. Built two use cases - "Build your own Outfit" and "Swap Item" - on top of an RNN-based recommendation model, with a P90 latency of less than 60 ms.
  2. Used Facebook's Faiss similarity-search library to speed up the KNN search over vector embeddings by 10x.
  3. Built an end-to-end data pipeline in AWS SageMaker Batch Transform to evaluate model performance in terms of Outfit Confidence Score (an internal Amazon metric for the quality of outfit recommendations).
  4. Developed an LLM-based personalized complementary recommendations application using SentenceBERT and cosine similarity.
  5. Won first prize in the Softlines organization-wide hackathon in 2023 for the idea "Personalized and Real-time Outfit Builder".
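The retrieval steps above (Faiss-accelerated KNN, SentenceBERT embeddings with cosine similarity) both reduce to nearest-neighbor search over embedding vectors. A minimal NumPy sketch of the underlying brute-force cosine top-k - Faiss replaces this linear scan with optimized indexes at scale, and all names here are illustrative:

```python
import numpy as np

def top_k_similar(query_vec, item_vecs, k=3):
    """Return indices of the k item vectors most cosine-similar to the query."""
    # Normalize so that a dot product equals cosine similarity.
    q = query_vec / np.linalg.norm(query_vec)
    items = item_vecs / np.linalg.norm(item_vecs, axis=1, keepdims=True)
    sims = items @ q
    # argsort is ascending; take the last k and reverse for descending order.
    return np.argsort(sims)[-k:][::-1]
```

With unit-normalized vectors, cosine similarity is just an inner product, which is also why inner-product indexes (as in Faiss) can serve cosine search directly.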
Offline Optimizations
  1. Optimized the Fashion Outfit team's offline Batch Transform API by 3x using multiprocessing, bringing the daily run time down from 8 hours to about 2 hours and saving about 100,000 USD annually.
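The multiprocessing speed-up can be sketched with Python's standard library: instead of transforming records in one serial loop, a `Pool` fans chunks of work out across CPU cores. `transform_record` is a stand-in for the actual (internal) per-record inference step:

```python
from multiprocessing import Pool

def transform_record(record):
    # Placeholder for the real per-record model inference / feature transform.
    return record * 2

def batch_transform(records, workers=4, chunksize=256):
    """Map transform_record over records using a pool of worker processes.
    chunksize batches records per task to amortize inter-process overhead."""
    with Pool(processes=workers) as pool:
        return pool.map(transform_record, records, chunksize=chunksize)
```

For CPU-bound transforms this scales close to linearly with core count, which is the effect behind the wall-clock improvement reported above.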

Jan 2021 - Jul 2022

Machine Learning Engineer

Deep Learning · Python3 · Flask · TensorFlow · Google Cloud TPUs · Apache Beam · Google Cloud Dataflow

Client: Google X, the Moonshot Factory (Confidential Research Project)

Neural Machine Translation
  1. Handled the entire life cycle of Transformer models:
    1. Data Preparation
      • Storage and retrieval of data in Google Cloud BigQuery and Cloud Spanner databases.
      • Large-scale processing of 100 million+ records using Apache Beam on the Google Cloud Dataflow runner.
    2. Model Creation, Training, and Evaluation
      • Training Transformer models and their variants on Google Cloud TPUs.
        • Masked Language Model (MLM) Pretraining - To build a strong encoder representation of input text
        • Denoising Autoencoder - To help the model identify and correct errors in the input text
        • Supervised Finetuning objectives - For sequence to sequence prediction
        • Training with language embeddings instead of language tokens, and positional embeddings instead of the positional encodings of the conventional Transformer architecture.
        • Worked on a causal masking objective for completing text conditioned on arbitrary left and right contexts.
      • Visualization of training and validation curves and metrics in TensorBoard.
      • Visualization of embeddings using PCA and t-SNE dimensionality reduction techniques.
      • Migration of checkpoints between different training objectives.
      • Setting up an automated evaluation pipeline which obviated manual evaluation of checkpoints.
    3. Deployment
      • Exporting optimized versions of checkpoints for TPU and GPU deployment.
      • Setting up API endpoints on Google Compute Engine - VM instances for model serving.
  2. Unintended Memorization in Neural Machine Translation
    1. Loosely based on "The Secret Sharer: Evaluating and Testing Unintended Memorization in Neural Networks".
    2. Wrote a scoring API to expose log-likelihood scores and test for unintended memorization as a function of a secret's frequency in the training data.
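The scoring idea from The Secret Sharer can be sketched model-agnostically: sum per-token log probabilities into a sequence score, then rank the inserted secret against random candidate sequences; an unusually high rank suggests memorization. In practice the per-token probabilities would come from the trained NMT model - everything below is illustrative:

```python
import math

def sequence_log_likelihood(token_probs):
    """Sum of per-token log probabilities under the model (higher = more likely)."""
    return sum(math.log(p) for p in token_probs)

def secret_rank(secret_score, candidate_scores):
    """1-based rank of the secret's score among random candidates.
    Rank 1 means the model prefers the secret over every candidate -
    a red flag for unintended memorization of training data."""
    return 1 + sum(1 for s in candidate_scores if s > secret_score)
```
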
Similarity Engine
  1. Developed a document similarity engine using SimHash and Multi-Indexed Hashing which can search 21 million records in under 150 milliseconds.
  2. Implemented CLIP loss on top of Transformer Encoder - an embedding based similarity approach.
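A sketch of the SimHash side of the engine: each document gets a 64-bit fingerprint such that near-duplicate documents differ in only a few bits, so similarity search becomes Hamming-distance search (which multi-indexed hashing then accelerates by splitting the fingerprint across several hash tables). Hashing tokens with md5 is an illustrative choice, not the production one:

```python
import hashlib

def simhash(tokens, bits=64):
    """SimHash fingerprint of a bag of tokens: each token's hash votes
    +1/-1 per bit position; the sign of each tally sets the output bit."""
    tally = [0] * bits
    for tok in tokens:
        h = int(hashlib.md5(tok.encode("utf-8")).hexdigest(), 16)
        for i in range(bits):
            tally[i] += 1 if (h >> i) & 1 else -1
    return sum(1 << i for i in range(bits) if tally[i] > 0)

def hamming_distance(a, b):
    """Number of differing bits between two fingerprints."""
    return bin(a ^ b).count("1")
```

Because the bit tallies are summed per token, the fingerprint is order-invariant (a bag-of-words signature), and small edits flip only a few tally signs, hence only a few output bits.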

Jan 2021 - Jul 2022

Data Scientist

Statistical Modeling · Deep Learning · Python3 · Flask · Elasticsearch · PySpark · MongoDB · AWS Cloud

User Profiling and Learning Outcomes Team

Concept Mastery - Knowledge Tracing
    Knowledge tracing is the task of modeling student knowledge over time so that we can accurately predict how students will perform on future interactions.
  1. Bayesian Knowledge Tracing (BKT)
    1. Developed the organization's baseline knowledge tracing model - Bayesian Knowledge Tracing (BKT) - which achieved an ROC-AUC score of 0.61.
    2. BKT is a four-parameter model based on a Hidden Markov Model with two hidden states (concept known/unknown) and two observed states (response correct/incorrect).
    3. Backbone of two downstream products
      • Learning Intervention (LI): Uses BKT concept mastery scores to recommend learning material (videos and practice questions) to remedy a student's weak concepts.
      • Personalized Achievement Journey (PAJ): Uses BKT concept mastery scores to recommend a set of personalized tests from the student's weak concepts.
    4. Set up an end-to-end ETL pipeline to update concept mastery scores based on BKT using cron jobs on AWS EC2 instances, PySpark and Elasticsearch.
    5. Produced an in-depth, scalable, and reproducible analysis of 10 million+ student attempts covering test-on-test student performance and concept mastery improvement. The report was sent to the National Testing Agency, the organization that conducts India's national entrance examinations, JEE and NEET.
    6. Developed eGo - a simulation engine built on pre-trained BKT models. Simulating student behavior helped uncover corner cases and bugs in two upcoming products.
  2. Deep Knowledge Tracing (DKT)
    1. Contributed to the development of an LSTM-based model, which yielded an ROC-AUC gain of 0.21 on the validation set.
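The BKT update behind the mastery scores fits in a few lines: given the four parameters (prior, transit/learn, slip, guess - the default values below are illustrative, not the production fit), each observed response Bayes-updates the probability that the student knows the concept, followed by a learning transition:

```python
def bkt_update(p_known, correct, p_slip=0.1, p_guess=0.2, p_transit=0.15):
    """One BKT step: Bayes-update P(known) on the observed response,
    then apply the probability of learning the concept this step."""
    if correct:
        evidence = p_known * (1 - p_slip)            # knew it and didn't slip
        posterior = evidence / (evidence + (1 - p_known) * p_guess)
    else:
        evidence = p_known * p_slip                  # knew it but slipped
        posterior = evidence / (evidence + (1 - p_known) * (1 - p_guess))
    return posterior + (1 - posterior) * p_transit

def mastery_trace(responses, p_init=0.3, **params):
    """Run BKT over a sequence of correct/incorrect responses."""
    p, history = p_init, []
    for correct in responses:
        p = bkt_update(p, correct, **params)
        history.append(p)
    return history
```

Mastery rises toward 1 under a streak of correct answers and drops after errors, which is the signal the LI and PAJ products consume.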

May 2020 - Jan 2021

Intern, Data Science Lab

Statistical Modeling · Deep Learning · Python3 · Google Apps Script

Knowledge Graph Team

Research Project
  1. Wrote an internal survey article exhaustively covering the approaches in the knowledge tracing literature, classified into Bayesian and non-Bayesian families.
  2. Conducted several talks on applications of Knowledge Tracing to familiarize the team with the literature.
Knowledge Graph Profiling and Hygiene
  1. Automated the Knowledge Graph Hygiene check using Google Apps Script and Python.
  2. Profiled the Knowledge Graph of the organization and helped add metrics to a daily internal email sent to various stakeholders.

Dec 2019 - May 2020

Undergraduate Peer Tutor

Ahmedabad University, Ahmedabad, India

Recommended by Professor Dhaval Patel to tutor sophomore students and resolve their course doubts.

Course: Probability and Random Processes (Aug 2019 - Dec 2019)
Aug 2019 - Dec 2019

Undergraduate Research Intern

Deep Learning · Signal Processing · Python3 · TensorFlow · Keras · MATLAB

Project title: “Non-parametric Smart Sensing Analytics based on Large Spectrum Data and Estimation of Channel Activity Statistics” as part of DST-UKIERI, a UK-India Education and Research Initiative.

Research Project
  1. Trained a customized LSTM network to leverage the temporal correlation in cognitive radio signal data, improving detection performance; for FM broadcasting, the detection probability increased by 0.16.

Apr 2018 - Nov 2018