The Team

Our team consists of exceptional data scientists and faculty collaborators across multiple institutions who are keen to further our understanding of cancer and are on a mission to end this deadly disease. Here is a short summary of who they are (in alphabetical order) and the expertise they hold.

MDA - UT MD Anderson Cancer Center, UT - University of Texas at Austin, Mayo - Mayo Clinic, NUS - National University of Singapore

Data Scientist | Position | Expertise
Ali Pirani | Data Scientist, MDA | Spatial Modeling, Graph Engineering
Gayatri Kumar | Post-doc, Mayo | Spatial Modeling
Matt Flick | MD PhD Student, Mayo | Graph Engineering
Prahlad Bhat | Undergraduate, UT | Spatial Modeling
Shruti Sridhar | PhD Candidate, NUS | Spatial Modeling
Yang Liu | PhD Candidate, MDA | Graph Engineering

Faculty Collaborator | Position | Expertise
Dr. Anand Jeyasekharan, MD | Assistant Prof, NUS | Hematology
Dr. Jason Huse, MD PhD | Professor, MDA | Neuropathology
Dr. Kanishka Sircar, MD | Professor, MDA | Anatomical Pathology
Dr. Krishna Bhat, PhD | Associate Prof, Mayo | Cancer Biology
Dr. Leland Hu, MD | Associate Prof, Mayo | Radiology
Dr. Nhan Tran, PhD | Professor, Mayo | Cancer Biology
Dr. Yinyin Yuan, PhD | Professor, MDA | Machine Learning/AI

A little about me

I am a Professor of Data Science at the Center for Strategic Leadership, US Army War College, where I lead AI-driven research initiatives that integrate advanced analytics into national security strategy, decision-making, and warfighting applications. A recognized leader in data science with nearly two decades of experience, I have held professorial appointments at two of the world’s premier institutions: Associate Professor at The University of Texas MD Anderson Cancer Center, where I directed the computational pathology program and led multidisciplinary teams advancing spatial and systems biology initiatives; and Assistant Professor at New York University, where I built and directed data science and machine learning curricula while pioneering next-generation sequencing pipelines and statistical modeling for cancer genomics.

My expertise spans machine learning, artificial intelligence, bioinformatics, statistical modeling, image processing, and semantic technologies (including Neo4j graph databases and ontologies). I apply methods from these areas to high-stakes domains: supporting senior-leader decision advantage in defense contexts and data integration efforts in cancer research. My research has produced over 35 peer-reviewed publications in Nature, Cancer Discovery, Cell, and Clinical Cancer Research, helped secure multiple NIH grants totaling millions in funding, and generated patented insights into cancer evolution and immune microenvironment dynamics. I have delivered invited talks at Army Test and Evaluation Command, Mayo Clinic, National University of Singapore, MD Anderson Cancer Center, and international symposia. I remain deeply committed to education, having designed, developed, and directed flagship courses in Data Science, Quantitative Biology, and Machine Learning.

Beyond cancer biology and computational pathology, my scientific curiosity extends to formal logic, physical theories, computing, and mathematics, fueling a career dedicated to translating complex data into actionable knowledge that drives discovery and strategic advantage. In addition to my primary appointment at the Center for Strategic Leadership at the US Army War College in Carlisle Barracks, I hold a Research Collaborator position at Mayo Clinic. My formal academic training is summarized below:

Training | Institution | Date
Post-doctoral | Memorial Sloan-Kettering Cancer Center | 2011-2013
PhD, Computer Science | Texas A&M University, College Station | 2002-2008
MS, Mathematics | Texas A&M University, College Station | 2000-2002
MSc, Mathematics | Indian Institute of Technology, Madras | 1998-2000
BSc, Mathematics | University of Madras, Chennai | 1995-1998

Click below to view/download my CV.
Curriculum Vitae

Active Projects

Our team is working on data science projects focused on two exciting and emerging areas: (1) Spatial Pathology and (2) Systems Biology and AI. Brief descriptions of some of our projects follow.

Geospatial modeling of tissues - Spatial point processes are powerful statistical frameworks for studying point patterns. Representing cells as points, annotated with the measurements taken on those cells (such as single-cell gene expression), lets us study cell-cell interactions with point process models. We use spatstat, an R package, extensively to model these interactions and derive directed insights in cancers.
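
As a rough illustration of the idea (not the team's spatstat workflow), the sketch below estimates Ripley's K function, a standard summary of a point pattern, in plain Python. The function name `ripley_k` and the toy coordinates are assumptions made for this example, and the estimator deliberately omits the edge corrections that spatstat's `Kest` applies. Under complete spatial randomness K(r) is close to pi * r^2, so values well above that suggest clustering of cells at scale r.

```python
import math
import random


def ripley_k(points, r, area):
    """Naive Ripley's K estimate at radius r (no edge correction).

    points: list of (x, y) cell coordinates (hypothetical data)
    r:      interaction radius, in the same units as the coordinates
    area:   area of the observation window
    """
    n = len(points)
    if n < 2:
        raise ValueError("need at least two points")
    # Count ordered pairs (i, j), i != j, whose distance is at most r.
    close_pairs = 0
    for i in range(n):
        xi, yi = points[i]
        for j in range(n):
            if i != j:
                xj, yj = points[j]
                if math.hypot(xi - xj, yi - yj) <= r:
                    close_pairs += 1
    # K(r) = area * (close ordered pairs) / (n * (n - 1))
    return area * close_pairs / (n * (n - 1))


if __name__ == "__main__":
    # Uniform points in the unit square: K(r) should be roughly pi * r^2,
    # slightly lower here because edge effects are not corrected.
    rng = random.Random(0)
    cells = [(rng.random(), rng.random()) for _ in range(200)]
    r = 0.1
    print("K estimate:", ripley_k(cells, r, area=1.0))
    print("CSR reference pi*r^2:", math.pi * r * r)
```

In practice, an edge-corrected estimator and simulation envelopes are needed before reading biological meaning into any departure from the reference curve.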

Biomarker discovery using graph database - Graph databases can help identify biomarkers by efficiently representing complex biological networks. By enabling powerful queries and community detection algorithms, graph databases make exploring the relationships between multiple genes easier, thus facilitating the discovery of potential biomarkers critical for diagnostics or therapeutic targets. We use Neo4j, a property graph database, and algorithms from the Graph Data Science Library (GDS) to derive insights and propose actionable biological targets. The concept paper for this effort can be viewed here.
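
To make the community detection step concrete, here is a minimal sketch of weighted label propagation on a toy gene network, written in plain Python rather than against Neo4j or GDS. The gene identifiers (`G1` to `G6`), the edge weights, and the helper names are all hypothetical, and the deterministic tie-breaking rule is a simplification chosen for reproducibility; it is not a description of how the GDS implementation behaves.

```python
from collections import defaultdict


def build_graph(edges):
    """Build an undirected weighted adjacency map from (a, b, weight) triples."""
    graph = defaultdict(dict)
    for a, b, weight in edges:
        graph[a][b] = weight
        graph[b][a] = weight
    return graph


def label_propagation(graph, max_rounds=20):
    """Each node repeatedly adopts the label with the largest total edge weight
    among its neighbors; ties go to the alphabetically smallest label."""
    labels = {node: node for node in graph}  # every node starts in its own community
    for _ in range(max_rounds):
        changed = False
        for node in sorted(graph):  # fixed visiting order keeps the result deterministic
            weight_by_label = defaultdict(float)
            for neighbor, weight in graph[node].items():
                weight_by_label[labels[neighbor]] += weight
            if not weight_by_label:
                continue  # isolated node keeps its own label
            best = max(weight_by_label.values())
            new_label = min(l for l, w in weight_by_label.items() if w == best)
            if new_label != labels[node]:
                labels[node] = new_label
                changed = True
        if not changed:
            break
    return labels


def communities(labels):
    """Group nodes by final label and return sorted member lists."""
    groups = defaultdict(set)
    for node, label in labels.items():
        groups[label].add(node)
    return sorted(sorted(members) for members in groups.values())


if __name__ == "__main__":
    # Two tightly connected gene triangles joined by one weak edge (toy data).
    edges = [
        ("G1", "G2", 1.0), ("G2", "G3", 1.0), ("G1", "G3", 1.0),
        ("G4", "G5", 1.0), ("G5", "G6", 1.0), ("G4", "G6", 1.0),
        ("G3", "G4", 0.2),
    ]
    print(communities(label_propagation(build_graph(edges))))
```

On this toy network the weak bridge between `G3` and `G4` is outweighed by the strong edges inside each triangle, so the two triangles end up as separate communities. Genes that bridge communities are the kind of candidates one might then inspect more closely.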

Biomarker validation using Graph Neural Networks (GNNs) - In silico validation of biomarkers is critical for providing actionable biological targets for experiments and prognosis. We apply GNNs (convolutional, attention-based, etc.) to validate critical biomarkers discovered through the graph database. GNNs enhance biomarker discovery by leveraging the structure of biological networks, such as gene-gene interactions, and learning meaningful node and edge representations. By aggregating information from a node's neighbors, GNNs capture local and global patterns, allowing the model to predict relationships between genes and disease associations more effectively.
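
The neighbor aggregation described above can be sketched in a few lines of plain Python. This is only an illustration of the message-passing idea under simplifying assumptions: the function `mean_aggregate`, the node names, and the one-dimensional features are hypothetical, and a real GNN layer would also apply a learned weight matrix and a nonlinearity after the aggregation.

```python
def mean_aggregate(features, adjacency):
    """One message-passing step: each node's new feature vector is the mean
    of its own vector and the vectors of its direct neighbors."""
    updated = {}
    for node, own in features.items():
        neighborhood = [own] + [features[n] for n in adjacency.get(node, [])]
        updated[node] = [
            sum(vec[k] for vec in neighborhood) / len(neighborhood)
            for k in range(len(own))
        ]
    return updated


if __name__ == "__main__":
    # Toy path graph A - B - C; only A carries a signal initially.
    adjacency = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
    features = {"A": [1.0], "B": [0.0], "C": [0.0]}

    one_hop = mean_aggregate(features, adjacency)
    two_hop = mean_aggregate(one_hop, adjacency)
    print("after 1 step:", one_hop)
    print("after 2 steps:", two_hop)
```

After one step node C still carries no signal from A, because A is two hops away; after a second step it does. This is the sense in which stacking layers lets a GNN move from local to more global patterns in the network.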

Past Projects

I have obtained several directed insights in biology through data science applied to genomics data. The majority of my scientific contributions are in omics and in numerical algorithms applied to physics problems (computational physics).

Contributions in omics highlights my key work from 2009 to 2020, primarily at New York University, where I was a faculty member, and at Memorial Sloan-Kettering Cancer Center, where I was a post-doc. Contributions in computational physics covers my work as a graduate student at Texas A&M University from 2005 to 2008.

Teaching

During my tenure at New York University School of Medicine, I developed and taught several courses in biomedical informatics.

Programming for Data Analysis covers the fundamentals of data science using the R programming language. The course is based primarily on the tidyverse and ggplot2 packages. We also cover some mathematical modeling (such as optimization) towards the end, but the main goal is for non-programmers to gain experience in data analysis. As case studies, we use clinical databases (diabetes and critical care databases).

Machine Learning and AI covers several important topics in this area, such as classification, ensemble methods, feature selection, and regularization. The focus is on the depth of these topics from a statistical perspective. For example, the lecture on ensemble methods explains the statistical basis for bootstrapping and random forests. The idea is to make students realize that machine learning is not just programming or data exploration; it is grounded in statistics. To balance the theory, the lecture series also includes a hands-on tutorial on AI-based image classification.

Methods in Quantitative Biology is a set of four disparate lectures I developed in Fall 2017 as part of the biomedical informatics program at NYU. I believe these topics belong to the core of informatics and that data science students need to understand them well. For example, algorithms are the core engines of any computing task, and it is important to understand the analysis of their complexity. Similarly, linear algebra plays a crucial role in fields ranging from quantum computing to deep learning.

A consolidated version of Programming for Data Analysis and Methods in Quantitative Biology courses can be found here.
