The Team

Our team consists of exceptional data scientists and faculty collaborators across multiple institutions who are keen to further our understanding of cancer and are on a mission to end this deadly disease. Here is a short summary of who they are (in alphabetical order) and the expertise they hold.

Affiliations: MDA - UT MD Anderson Cancer Center; UT - University of Texas at Austin; Mayo - Mayo Clinic; NUS - National University of Singapore

Data Scientist    Position                Expertise
Ali Pirani        Data Scientist, MDA     Spatial modeling, Graph engineering
Gayatri Kumar     Post-doc, Mayo          Spatial modeling
Matt Flick        MD PhD Student, Mayo    Graph engineering
Prahlad Bhat      Undergraduate, UT       Spatial modeling
Shruti Sridhar    PhD Candidate, NUS      Spatial modeling
Yang Liu          PhD Candidate, MDA      Graph engineering

Faculty Collaborator          Position                Expertise
Dr. Anand Jeyasekharan, MD    Assistant Prof, NUS     Hematology
Dr. Jason Huse, MD PhD        Professor, MDA          Neuropathology
Dr. Kanishka Sircar, MD       Professor, MDA          Anatomical Pathology
Dr. Krishna Bhat, PhD         Associate Prof, Mayo    Cancer Biology
Dr. Leland Hu, MD             Associate Prof, Mayo    Radiology
Dr. Nhan Tran, PhD            Professor, Mayo         Cancer Biology
Dr. Yinyin Yuan, PhD          Professor, MDA          Machine Learning/AI

A little about me

I am an Associate Professor leading a team on spatial and systems biology initiatives at The University of Texas MD Anderson Cancer Center. I am a data scientist with more than 17 years of experience and expertise in machine learning, AI, and bioinformatics. My training covers statistical modeling and image processing, and my software development work relies primarily on R and Python libraries and on databases such as Neo4j and PostgreSQL. My research interests are in systems biology and spatial modeling of tissues.

My scientific interests are diverse and include cancer biology, formal logic, physical theories, computing, and mathematics. My interest in biology, and in cancer genomics in particular, has resulted in numerous publications. I am also passionate about teaching and have designed, developed, and directed courses in data science, machine learning, and biomedical informatics.

I have a primary appointment in the Department of Translational Molecular Pathology with a joint appointment in the Department of Neurosurgery at MD Anderson Cancer Center, Houston. I also hold a research collaborator appointment at the Mayo Clinic in Arizona. My formal training is summarized below:

Training                 Institution                               Date
Post-doctoral            Memorial Sloan-Kettering Cancer Center    2011-2013
PhD, Computer Science    Texas A&M University, College Station     2002-2008
MS, Mathematics          Texas A&M University, College Station     2000-2002
MSc, Mathematics         Indian Institute of Technology, Madras    1998-2000
BSc, Mathematics         University of Madras, Chennai             1995-1998

Click below to view/download my CV.
Curriculum Vitae

Active Projects

Our team works on data science projects focused on two exciting and emerging areas: (1) Spatial Pathology and (2) Systems Biology and AI. Here are brief descriptions of some of our projects.

Geospatial modeling of tissues - Spatial point processes are powerful statistical frameworks for studying point patterns. By representing cells as points and annotating them with the measurements taken on those cells, such as gene expression at the single-cell level, we can study cell-cell interactions using point processes. We make extensive use of spatstat, an R package, to model these interactions and derive directed insights in cancers.
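
As a rough illustration of the point-pattern idea, the sketch below computes a naive estimate of Ripley's K function, a standard summary of clustering in a point pattern, and compares it with the value expected under complete spatial randomness. It is written in plain Python for readability and is not our pipeline: the cell coordinates, window size, and function names are hypothetical, and no edge correction is applied (spatstat offers edge-corrected estimators such as `Kest`).

```python
import math

def ripley_k(points, radius, area):
    """Naive Ripley's K estimate at one radius (no edge correction)."""
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    pairs = 0  # ordered pairs (i, j), i != j, lying within `radius` of each other
    for i, (xi, yi) in enumerate(points):
        for j, (xj, yj) in enumerate(points):
            if i != j and math.hypot(xi - xj, yi - yj) <= radius:
                pairs += 1
    # K(r) = |W| / (n * (n - 1)) * number of ordered pairs within r
    return area * pairs / (n * (n - 1))

def csr_k(radius):
    """Theoretical K under complete spatial randomness (homogeneous Poisson)."""
    return math.pi * radius ** 2

# Hypothetical cell centroids in a 10 x 10 window:
# one tight cluster of four cells plus four scattered cells.
cells = [(1.0, 1.0), (1.2, 1.1), (0.9, 1.3), (1.1, 0.8),
         (5.0, 5.0), (8.5, 2.0), (3.0, 9.0), (9.0, 9.0)]

k_obs = ripley_k(cells, radius=1.0, area=100.0)
k_csr = csr_k(1.0)
clustered = k_obs > k_csr  # K above the CSR value suggests clustering at this scale
```

An observed K well above the random-pattern value at a given radius suggests that cells sit closer together at that scale than chance would predict; in practice the comparison would use edge correction and simulation envelopes rather than a single threshold.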

Biomarker discovery using a graph database - Graph databases can help identify biomarkers by efficiently representing complex biological networks. By enabling powerful queries and community detection algorithms, graph databases make it easier to explore the relationships between multiple genes, thus facilitating the discovery of potential biomarkers that are critical for diagnostics or as therapeutic targets. We use Neo4j, a property graph database, and algorithms from the Graph Data Science Library (GDS) to derive insights and propose actionable biological targets. The concept paper for this effort can be viewed here.
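
To give a flavor of what community detection looks for, here is a minimal sketch of modularity, the quality score that algorithms such as Louvain (available in GDS) try to maximize. It runs in plain Python on a toy network rather than in Neo4j; the gene labels, edges, and partitions are hypothetical and serve only as an illustration.

```python
from collections import defaultdict

def modularity(edges, community_of):
    """Newman modularity Q of a partition of an undirected, unweighted graph."""
    m = len(edges)
    internal = defaultdict(int)  # edges with both endpoints in the same community
    degree = defaultdict(int)    # summed node degree per community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            internal[cu] += 1
    # Q = sum over communities of (internal edge fraction - expected fraction)
    return sum(internal[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)

# Hypothetical gene-gene network: two tightly connected modules joined by one edge.
edges = [("G1", "G2"), ("G1", "G3"), ("G2", "G3"),
         ("G4", "G5"), ("G4", "G6"), ("G5", "G6"),
         ("G3", "G4")]

two_modules = {"G1": "a", "G2": "a", "G3": "a",
               "G4": "b", "G5": "b", "G6": "b"}
one_module = {gene: "a" for gene in two_modules}

q_two = modularity(edges, two_modules)  # split along the bridge edge
q_one = modularity(edges, one_module)   # everything lumped together
```

The split into two modules scores higher than lumping all genes together, which is the signal a community detection algorithm follows when it groups densely connected genes.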

Biomarker validation using Graph Neural Networks (GNNs) - In silico validation of biomarkers is critical for providing actionable biological targets for experiments and prognosis. We apply GNNs (convolutional, attention-based, etc.) to validate critical biomarkers discovered through the graph database. GNNs enhance biomarker discovery by leveraging the structure of biological networks, such as gene-gene interactions, and learning meaningful node and edge representations. By aggregating information from a node's neighbors, GNNs capture both local and global patterns, allowing the model to predict relationships between genes and disease associations more effectively.
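
The neighbor-aggregation step can be sketched in a few lines. The example below performs mean aggregation over each node and its neighbors, which is the core of a graph convolution with the learned weights and nonlinearity deliberately left out. The three-gene graph, feature values, and function name are hypothetical; an actual model would be built with a GNN library.

```python
def mean_aggregate(features, neighbors):
    """One round of message passing: average each node's vector with its neighbors'."""
    updated = {}
    for node, own in features.items():
        # Include the node itself (a self-loop) along with its neighbors.
        group = [own] + [features[n] for n in neighbors.get(node, [])]
        dim = len(own)
        updated[node] = [sum(vec[d] for vec in group) / len(group)
                         for d in range(dim)]
    return updated

# Hypothetical path graph G1 - G2 - G3 with one scalar feature per gene.
neighbors = {"G1": ["G2"], "G2": ["G1", "G3"], "G3": ["G2"]}
features = {"G1": [1.0], "G2": [0.0], "G3": [0.0]}

h1 = mean_aggregate(features, neighbors)  # one hop: G3 still sees nothing from G1
h2 = mean_aggregate(h1, neighbors)        # two hops: G1's signal reaches G3
```

After one round, a gene only reflects its direct neighbors; after two rounds, information from G1 has reached G3 through G2. Stacking such rounds is how a GNN moves from local to more global patterns.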

Past Projects

I have obtained several directed insights in biology through data science applied to genomics data. Most of my scientific contributions are in omics and in numerical algorithms applied to physics problems, that is, computational physics.

Contributions in omics highlight my key work from 2009 to 2020, primarily at New York University, where I was a faculty member, and at Memorial Sloan-Kettering Cancer Center, where I was a post-doc. Contributions in computational physics mark my work as a graduate student at Texas A&M University from 2005 to 2008.

Teaching

Over several years during my tenure at New York University School of Medicine, I developed and taught several topics in biomedical informatics.

Programming for Data Analysis covers the fundamentals of data science using the R programming language. The course is based primarily on the tidyverse and ggplot2 packages. We also cover a bit of mathematical modeling (such as optimization) towards the end, but the main goal is for non-programmers to gain experience in data analysis. As case studies, we use clinical databases (diabetes and critical care databases).

Machine Learning and AI covers several important topics in this area, such as classification, ensemble methods, feature selection, and regularization. The focus is on treating these topics in depth from a statistical perspective. For example, the lecture on ensemble methods explains the statistical basis for bootstrapping and random forests. The idea is to help students realize that machine learning is not just programming or data exploration but is fundamentally statistics. The lecture series also includes a hands-on tutorial on AI-based image classification.

Methods in Quantitative Biology is a set of four disparate lectures I developed in Fall 2017 as part of the biomedical informatics program at NYU. I believe these topics belong to the core of informatics and that data science students need to understand them well. For example, algorithms are the core engines of any computing task, and it is important to understand how to analyze their complexity. Similarly, linear algebra plays a crucial role in fields ranging from quantum computing to deep learning.

A consolidated version of the Programming for Data Analysis and Methods in Quantitative Biology courses can be found here.
