Professional Projects
Welcome to my portfolio! This is where I showcase the projects I've worked on, offering a glimpse into my journey as a data science manager and AI professional. Each project has its own story, and I'm thrilled to share them with you.
Generating Insights from Medical CRM Data Using GPT-4
My Role: Data Science Project Manager
This project focused on listening to the notes Medical Liaisons record after their engagements with healthcare professionals (HCPs) in the field. The digital product features a Q&A mechanism powered by generative AI, primarily GPT-4 and Pinecone, which lets us query and surface insights from Global Field Medical CRM text data across therapeutic areas through a dashboard built with React and FastAPI.
It was an exhilarating experience that honed my leadership skills and data science prowess.
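The retrieval step behind a Q&A tool like this can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the project's code: simple term-count vectors stand in for a real embedding model, an in-memory list stands in for the Pinecone index, the field notes are invented, and the GPT-4 call is left out. The code only assembles the grounded prompt that would be sent to the model.

```python
import math
import re
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: lower-cased term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, notes: List[str], k: int = 2) -> List[str]:
    """Return the k notes most similar to the question (the job a vector index does)."""
    q = embed(question)
    return sorted(notes, key=lambda note: cosine(q, embed(note)), reverse=True)[:k]


def build_prompt(question: str, context: List[str]) -> str:
    """Ground the model's answer in the retrieved notes only."""
    bullets = "\n".join(f"- {c}" for c in context)
    return (
        "Answer the question using only the field notes below.\n"
        f"Field notes:\n{bullets}\n\nQuestion: {question}\nAnswer:"
    )
```

For example, `retrieve("What questions do HCPs have about dosing?", notes, k=1)` picks the note that mentions dosing, and `build_prompt` wraps it with the question. In a production setup, `embed` would call an embedding model and `retrieve` would query the vector index; the prompt-grounding pattern stays the same.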

Understanding Treatment Journeys of Patients in Immunology
My Role: Lead Data Scientist
This tool is designed to characterize the treatment journeys of patients in a broad cohort diagnosed with at least one immunological disease. We built the pipeline with Python and Hydra, and used G-estimation and SHAP to compare the performance of biologics and assess treatment efficacy.
The results are presented in a Power BI dashboard that provides novel information on key patient characteristics, including demographic profiles, clinical characteristics of the disease, and clinical phenotype (such as comorbidities).
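G-estimation can be illustrated on simulated data. This is a minimal sketch under assumptions, not the project's pipeline: a single binary treatment `A` (for example, biologic versus comparator), one discrete confounder `L`, and a linear structural nested mean model `E[Y - Y(0) | A, L] = psi * A`. The data and the true effect of 2.0 are hypothetical, and the SHAP analysis is not reproduced.

```python
import random
from typing import List, Tuple

Row = Tuple[int, int, float]  # (confounder L, treatment A, outcome Y)


def simulate(n: int = 5000, true_effect: float = 2.0, seed: int = 0) -> List[Row]:
    """Hypothetical data where L raises both the chance of treatment and the outcome."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l = 1 if rng.random() < 0.5 else 0
        a = 1 if rng.random() < 0.3 + 0.4 * l else 0  # L=1 patients are treated more often
        y = true_effect * a + 3.0 * l + rng.gauss(0.0, 1.0)
        rows.append((l, a, y))
    return rows


def g_estimate(rows: List[Row]) -> float:
    """Closed-form G-estimate of psi.

    Solves sum_i (A_i - p(L_i)) * (Y_i - psi * A_i) = 0, where p(L) = P(A=1 | L)
    is estimated as the treated fraction within each stratum of L.
    """
    counts = {}
    for l, a, _ in rows:
        treated, total = counts.get(l, (0, 0))
        counts[l] = (treated + a, total + 1)
    propensity = {l: treated / total for l, (treated, total) in counts.items()}
    num = sum((a - propensity[l]) * y for l, a, y in rows)
    den = sum((a - propensity[l]) * a for l, a, _ in rows)
    return num / den


def naive_difference(rows: List[Row]) -> float:
    """Unadjusted difference in mean outcome between treated and untreated."""
    treated = [y for _, a, y in rows if a == 1]
    control = [y for _, a, y in rows if a == 0]
    return sum(treated) / len(treated) - sum(control) / len(control)
```

On this simulated data, `g_estimate(simulate())` should land close to the true effect of 2.0, while `naive_difference` is biased upward by confounding (about 3.2 in expectation), because `L` increases both the probability of treatment and the outcome.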

Compliance Engine
My Role: Lead Data Scientist
Developed an automated compliance engine for free-text CRM notes using a combination of natural language processing (NLP) models.
It flags defamatory, derogatory, proprietary or confidential, and financial remarks.
Flagged notes are presented for review in an easy-to-use interface built with FastAPI and React, where Medical Governance Officers verify whether each flagged note is compliant or non-compliant.
Delivered as a responsive web application, the tool replaces routine random checks of a sample of notes with AI-driven guidance toward the notes most likely to need compliance review.
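The shift from random spot checks to prioritized review can be sketched as a triage queue. The keyword patterns and `toy_scorer` are hypothetical stand-ins for the NLP models, which would return per-category probabilities; `build_review_queue` shows only the ranking idea: flag notes whose highest category score meets a threshold and hand them to reviewers highest-risk first.

```python
import re
from typing import Callable, Dict, List, Tuple

Scorer = Callable[[str], Dict[str, float]]

# Hypothetical keyword patterns; in the real engine each category would be
# scored by an NLP model rather than by keyword hits.
PATTERNS = {
    "financial": re.compile(r"\b(payment|fee|honorarium|stock)\b", re.IGNORECASE),
    "confidential": re.compile(r"\b(confidential|internal only|unpublished)\b", re.IGNORECASE),
    "derogatory": re.compile(r"\b(incompetent|useless)\b", re.IGNORECASE),
}


def toy_scorer(text: str) -> Dict[str, float]:
    """Score each category by keyword hits, capped at 1.0 (a crude proxy for a probability)."""
    return {cat: min(1.0, 0.6 * len(pat.findall(text))) for cat, pat in PATTERNS.items()}


def build_review_queue(
    notes: Dict[str, str], scorer: Scorer, threshold: float = 0.5
) -> List[Tuple[str, float, List[str]]]:
    """Return (note_id, priority, flagged_categories) for flagged notes, highest priority first."""
    queue = []
    for note_id, text in notes.items():
        scores = scorer(text)
        flagged = sorted(cat for cat, s in scores.items() if s >= threshold)
        if flagged:
            queue.append((note_id, max(scores.values()), flagged))
    # Reviewers work from the top of this list instead of a random sample.
    queue.sort(key=lambda item: item[1], reverse=True)
    return queue
```

Swapping `toy_scorer` for a function that calls trained classifiers leaves the queue logic unchanged, which also makes it straightforward to compare model-guided review against a random-sampling baseline.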

Reusable Component for Natural Language Processing
My Role: Data Scientist
Created a reusable machine learning component as a Python library for:
- Anonymization: spaCy NER
- Language detection: fastText
- Translation: AWS Translate API
Multiple projects that needed these services reused the library rather than building their own, saving the company significant time and money.
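One way to structure a reusable component like this is to inject each backend as a callable, so projects share the orchestration and anonymization logic while swapping implementations. This is a sketch under assumptions: the class and function names are hypothetical, and the processing order (detect, translate, then anonymize) is a design choice rather than something stated above. In production, the callables might wrap spaCy's `doc.ents` (using `start_char`, `end_char`, `label_`), a fastText language-identification model's `predict`, and boto3's Amazon Translate `translate_text`.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Entity = Tuple[int, int, str]  # (start_char, end_char, label); spans assumed non-overlapping


def anonymize(text: str, entities: List[Entity]) -> str:
    """Replace each entity span with a [LABEL] placeholder.

    Spans are replaced right to left so earlier character offsets stay valid.
    """
    for start, end, label in sorted(entities, key=lambda e: e[0], reverse=True):
        text = text[:start] + f"[{label}]" + text[end:]
    return text


@dataclass
class NLPComponent:
    detect_language: Callable[[str], str]         # text -> language code, e.g. "en"
    translate: Callable[[str, str, str], str]     # (text, source_lang, target_lang) -> text
    find_entities: Callable[[str], List[Entity]]  # text -> entity spans to mask
    target_lang: str = "en"

    def process(self, text: str) -> str:
        """Detect language, translate to the target language if needed, then anonymize."""
        lang = self.detect_language(text)
        if lang != self.target_lang:
            text = self.translate(text, lang, self.target_lang)
        return anonymize(text, self.find_entities(text))
```

Running NER after translation lets a single English NER model handle every note, but it also means raw text reaches the translation service; anonymizing first is the safer order when a multilingual NER model is available. The callable-based design also makes the component easy to unit-test with stubs.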

Investment Target Identification
My Role: Data Scientist
Developed a custom Streamlit application that uses ensemble learning over a range of features to identify drug targets in clinical-trial phases as potential investment opportunities for the company.
Also built an NLP layer for the dashboard that takes research-paper text as input and highlights the compounds gaining the most ground in Phase 3 and Phase 4 clinical trials.
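The text does not say which ensemble method was used; soft voting, which averages the predicted probabilities of several models, is one common choice. A minimal sketch, with hypothetical function names and with target names and per-model probabilities supplied by the caller:

```python
from typing import List, Optional, Sequence, Tuple


def soft_vote(
    model_probs: Sequence[Sequence[float]], weights: Optional[Sequence[float]] = None
) -> List[float]:
    """Weighted average of each model's predicted probability for every candidate.

    model_probs[m][i] is model m's probability that candidate i is a good opportunity.
    """
    if weights is None:
        weights = [1.0] * len(model_probs)
    total = sum(weights)
    n_candidates = len(model_probs[0])
    return [
        sum(w * probs[i] for w, probs in zip(weights, model_probs)) / total
        for i in range(n_candidates)
    ]


def rank_targets(
    names: Sequence[str],
    model_probs: Sequence[Sequence[float]],
    weights: Optional[Sequence[float]] = None,
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Return the top_k candidates by ensemble score, highest first."""
    scores = soft_vote(model_probs, weights)
    return sorted(zip(names, scores), key=lambda pair: pair[1], reverse=True)[:top_k]
```

Weights let stronger models count more; for example, `soft_vote(probs, weights=[2.0, 1.0, 1.0])` doubles the first model's influence relative to the others.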

Testing Feasibility of Compounds
My Role: Data Scientist
Developed a machine learning model in Python using XGBoost to predict the feasibility of various pharmaceutical compounds for preclinical trials.
The model took in a range of compound features, from pH level to viscosity, to predict whether a compound would fare well in clinical trials given the profile of the endpoints.
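XGBoost itself is not reproduced here, but its core idea, gradient boosting on a logistic loss with Newton-step leaf weights `w = -G / (H + lambda)`, can be sketched with depth-1 trees (stumps). This is a stand-in under stated assumptions, not the project's model: the two features (pH and viscosity) echo the description above, but the data points and labels are invented, and real XGBoost adds deeper trees, a split penalty (`gamma`), subsampling, and efficient split finding.

```python
import math
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Stump:
    feature: int
    threshold: float
    left: float   # leaf value for x[feature] <= threshold, already scaled by eta
    right: float  # leaf value otherwise

    def predict(self, x: Sequence[float]) -> float:
        return self.left if x[self.feature] <= self.threshold else self.right


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_stump(X, g, h, lam: float, eta: float) -> Stump:
    """Pick the split maximizing G_L^2/(H_L+lam) + G_R^2/(H_R+lam) (XGBoost's split score)."""
    best_score, best = -math.inf, None
    for j in range(len(X[0])):
        for t in sorted({row[j] for row in X}):
            GL = HL = GR = HR = 0.0
            for row, gi, hi in zip(X, g, h):
                if row[j] <= t:
                    GL, HL = GL + gi, HL + hi
                else:
                    GR, HR = GR + gi, HR + hi
            score = GL * GL / (HL + lam) + GR * GR / (HR + lam)
            if score > best_score:
                # Newton leaf weight -G / (H + lam), shrunk by the learning rate eta.
                best_score = score
                best = Stump(j, t, -eta * GL / (HL + lam), -eta * GR / (HR + lam))
    return best


def fit_boosted_stumps(X, y, n_rounds: int = 50, eta: float = 0.3, lam: float = 1.0) -> List[Stump]:
    margins = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        p = [sigmoid(m) for m in margins]
        g = [pi - yi for pi, yi in zip(p, y)]  # gradient of log loss w.r.t. the margin
        h = [pi * (1.0 - pi) for pi in p]      # second derivative (hessian)
        stump = fit_stump(X, g, h, lam, eta)
        stumps.append(stump)
        margins = [m + stump.predict(x) for m, x in zip(margins, X)]
    return stumps


def predict_proba(stumps: List[Stump], x: Sequence[float]) -> float:
    return sigmoid(sum(s.predict(x) for s in stumps))


def log_loss(stumps: List[Stump], X, y) -> float:
    eps = 1e-12
    total = 0.0
    for x, yi in zip(X, y):
        p = predict_proba(stumps, x)
        total -= yi * math.log(p + eps) + (1 - yi) * math.log(1 - p + eps)
    return total / len(X)


# Hypothetical toy data: [pH, viscosity]; 1 = feasible, 0 = not feasible.
X = [[6.8, 12.0], [7.0, 25.0], [7.2, 30.0], [7.4, 18.0],
     [3.5, 20.0], [4.0, 15.0], [4.2, 28.0],
     [7.1, 85.0], [6.9, 90.0], [7.3, 75.0],
     [3.8, 88.0], [4.1, 80.0]]
y = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
model = fit_boosted_stumps(X, y)
```

With the real library, the same setup would use `xgboost.XGBClassifier` with `fit` and `predict_proba`, tuning parameters such as `max_depth`, `learning_rate`, and `reg_lambda` (the `lambda` above).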
