Why?
Rationale for starting algorithmicBio
When I was an internal medicine resident in Baltimore, I struggled to pick a subspecialty to pursue. Too many aspects of medicine interested me, and none dominated enough. Meanwhile, my South African friend Malegapuru Makgoba was doing an extremely productive post-doc in Steve Shaw’s lab at NIH on the molecular basis of lymphocyte adhesion. Despite my lack of a research pedigree, I took his place in Steve’s lab and enjoyed three years helping to further define the molecular heterogeneity of human lymphocytes. A particular highlight was discovering the human a4b7 integrin, which was subsequently validated as a therapeutic target for inflammatory bowel disease and became the basis for the highly successful vedolizumab (Entyvio), marketed by Takeda. We characterized the expression of many molecules on human lymphocytes, with a particular focus on integrins such as a4b7 and on isoforms of CD45. A notable observation was how few of the molecules appeared to have Gaussian patterns of expression; many had highly skewed distributions, illustrating inherent biological complexity. One didn’t need to be sophisticated to realize that the functional implications of this phenotypic complexity were likely to be immense.
The relevance of our findings on a4b7 nudged me toward further training in gastroenterology and a clinical focus on inflammatory bowel disease at UCLA. After two years on the UCLA faculty, I was restless: by both caring for inflammatory bowel disease patients and pursuing laboratory research, I felt I was not doing either optimally. Friends in biotech suggested that clinical research in industry would be an ideal fusion of my interests and background. I joined Merck and was lucky to be given the opportunity to lead the clinical development of aprepitant, the first neurokinin 1 (NK1) receptor antagonist to be successfully developed and approved for any indication. As an NK1 antagonist, aprepitant blocks the action of the neuropeptide substance P in the brain. Here was a drug with a complex drug interaction profile, to be prescribed by oncologists to prevent chemotherapy-induced nausea and vomiting, being developed by a gastroenterologist. A perfect project for a novice drug developer. One of the challenges in the development program was the assessment of nausea, the severity of which was very skewed across patients, kindling memories of the skewed distributions I had seen in the immunology lab.
For some reason, I became preoccupied with the different distribution patterns evident in the datasets around me, which seemed to reflect both complexity and non-linearity. No one around me shared my curiosity, and I thought I must be missing some obvious explanatory insight, until I read Nassim Taleb’s book “Fooled by Randomness” and realized that neglect of non-linearity and complexity was a widespread phenomenon. This coincided with the completion of the Human Genome Project and the emergence of gene expression microarrays with their colorful heat maps. Connecting the dots, I thought there must be some analytic approach that could handle high-dimensional biological complexity and inherent non-linearity to produce objective mathematical insight; otherwise we would not be able to understand all the biomedical data being produced.
In 2011, I found an analytic approach that produces such insight: one that fuses evolution, information theory and mathematics to yield transparent algorithms that are not overfitted.
My experience over the last 15 years with this approach has provided a window into biology and disease that is relevant to the discovery, development and deployment of therapies. There is a widespread sense that, although biomedical progress has continued, the trajectory of clinically significant advances has not kept pace with the growth in data and information. The obvious question is why. An answer was potentially provided by Sydney Brenner in 1971 when he said:
“We’re drowning in an ocean, or a sea of data but we are starved from knowledge. I believe that in biology programmatic explanations will be algorithmic explanations. We will seamlessly move between the molecular hardware and the logical software of biology. There will be no difficulty in adapting computers to biology. There will be luddites but they will be buried. But I think this is part of a kind of revolution in thinking, namely the whole of the theory of computation, which I think biologists have yet to assimilate.”
Despite Brenner’s prescient comments and Hector Zenil’s work, relatively little has been written about the algorithmic convergence in biology that will be necessary to realize the dividends of the immense data that has accumulated. This Substack is an attempt to foment awareness of, and more discourse on, this topic.
This endeavor has been inspired by many people, including Patrick Lilley, Paul Burchard, Michael Rose, Hector Zenil, Andy Grove, Malegapuru Makgoba, Steve Shaw, Yoji Shimizu, Gijs van Seventer, Mago Clerici, Nassim Taleb, Garret FitzGerald, Tom Simon, David Madigan, Vahan Simonyan, Michael McDermott, Doug Harrington, W Brian Arthur, Douglas S Robertson, Sydney Brenner, Stephen Friend, Jonathan Rees, Poul Strange, Tony Ho and James Gleick. Their contributions were made unwittingly, and they bear no responsibility whatsoever for what is written here.

