Formal Systems in Biology (github.com/prathyvsh)
133 points by tablet on June 9, 2020 | 37 comments


The first entry on this list is McCulloch-Pitts nerve nets, whose expressive power was analysed by S.C. Kleene [1]. In his article he coined the term "regular events" for the class of languages that could be expressed by nerve nets/finite automata, and this is where regular expressions got their name. If you have ever thought the name was strange, rest assured that Kleene didn't actually like it either; he just couldn't think of anything better at the time:

  > We shall presently describe a class of events which we will call "regular events" (We would welcome any suggestions as to a more descriptive term.*)
  > [...]
  > * McCulloch and Pitts use the term "prehensible," introduced rather differently; but since we did not understand their definition, we did not adopt the term.
So, had McCulloch and Pitts been a bit clearer in their seminal paper, maybe they would have been called "prehensible expressions" :).
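To make "regular events" a bit more concrete, here is a minimal Python sketch, assuming a simplified McCulloch-Pitts unit (0/1 inputs, integer weights, a threshold, one time step of delay). It is not Kleene's construction; the two-unit net and the names `mp_neuron`, `two_in_a_row_net` and `regular_event` are hypothetical. The net fires iff the last two inputs were both 1, which is the regular event described by the regex `[01]*11`, and the code checks that the two agree on every input history up to length 8.

```python
import itertools
import re


def mp_neuron(inputs, weights, threshold):
    """McCulloch-Pitts unit: output 1 iff the weighted sum of 0/1 inputs reaches the threshold."""
    return int(sum(w * x for w, x in zip(weights, inputs)) >= threshold)


def two_in_a_row_net(bits):
    """Hypothetical two-unit net. `delay` echoes the previous input; `out` fires iff
    the current input and the delayed one are both 1. Returns `out` after the last input."""
    delay, out = 0, 0
    for x in bits:
        # Synchronous update: both units read the values from the previous time step.
        out, delay = mp_neuron([x, delay], [1, 1], 2), mp_neuron([x], [1], 1)
    return out


def regular_event(bits):
    """The same event as a regular expression over the input history."""
    return int(re.fullmatch(r"[01]*11", "".join(map(str, bits))) is not None)


def agree_up_to(max_len):
    """Check the net and the regex on every 0/1 history of length 0..max_len."""
    return all(
        two_in_a_row_net(bits) == regular_event(bits)
        for n in range(max_len + 1)
        for bits in itertools.product([0, 1], repeat=n)
    )


print(agree_up_to(8))  # True
```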

[1] https://www.rand.org/content/dam/rand/pubs/research_memorand...


This stuff got me thinking: if regular expressions come from graphs, why not use them to query graph databases? This paper (https://arxiv.org/pdf/1904.11653.pdf) is about a year old. Can anyone point to open-source implementations?


Reminder: please link to arXiv abstract pages, not directly to PDF.

Wang, Han, Shao, and Li - Regular expression matching on billion-node graphs; https://arxiv.org/abs/1904.11653
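For anyone wondering what "regex matching on a graph" means operationally: a common textbook approach (not necessarily the one the paper uses for billion-node graphs) is a breadth-first search over the product of the graph and an automaton for the expression. Below is a minimal Python sketch, assuming the expression has already been compiled to a DFA by hand; the function name, the edge labels and the query `activates* inhibits` are hypothetical examples.

```python
from collections import deque


def regular_path_query(edges, dfa, start_state, accepting, source):
    """Nodes reachable from `source` by a path whose edge labels spell a word the DFA accepts.

    edges: iterable of (u, label, v); dfa: dict mapping (state, label) -> next state.
    """
    adj = {}
    for u, label, v in edges:
        adj.setdefault(u, []).append((label, v))

    # BFS over the product graph: a search node is (graph node, automaton state).
    seen = {(source, start_state)}
    queue = deque(seen)
    results = set()
    while queue:
        node, state = queue.popleft()
        if state in accepting:
            results.add(node)
        for label, nxt in adj.get(node, []):
            next_state = dfa.get((state, label))
            if next_state is None:  # label not allowed here by the expression
                continue
            if (nxt, next_state) not in seen:
                seen.add((nxt, next_state))
                queue.append((nxt, next_state))
    return results


# Hand-built DFA for the hypothetical query `activates* inhibits`.
DFA = {(0, "activates"): 0, (0, "inhibits"): 1}
EDGES = [
    ("A", "activates", "B"),
    ("B", "activates", "C"),
    ("C", "inhibits", "D"),
    ("A", "inhibits", "E"),
    ("D", "activates", "F"),
]

print(sorted(regular_path_query(EDGES, DFA, 0, {1}, "A")))  # ['D', 'E']
```

Each (node, state) pair is visited at most once, so the work is bounded by the size of the product graph. Compiling an arbitrary regex into the DFA is the part this sketch skips.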


Is this not how SPARQL or other graph query languages work?


This may be a rudimentary question, but if someone were going to study this at a university level, what would they study?

I ask because I'm starting my master's in CS, but I've also been going to workshops/events at a local citizen bio lab and really enjoying it. I'd really like to go deeper into the intersection of CS and bio, specifically the kinds of things listed in this repo (modeling biological phenomena as formal systems, using computation to simulate those systems, etc.).

But when I look at potential programs to pursue after my CS course, I get a bit lost in all the different titles—bioinformatics, systems biology, computational biology, etc. It's hard for an outsider in the field to discern any meaningful delineation. Does anyone with experience in the field know what category of study these resources would fall under, from a university perspective?


I recommend looking at Luca Cardelli's work [1]. He's probably the leading researcher at the intersection of CS and biology, in the sense of modelling biological phenomena as formal systems. Some of his lectures are online [2]. He's extremely approachable in my experience, so don't hesitate to contact him if you want to educate yourself more in this field.

[1] http://lucacardelli.name/

[2] https://www.youtube.com/watch?v=o8q7kFeGUTM


His paper Abstract Machines of Systems Biology [1] is a wonderful, only slightly outdated, overview of what the field might be able to achieve in the future.

I think and hope program analysis, abstract interpretation and friends will make a comeback in biology. These are definitely topics covered in CS but not in bioinformatics, which IMHO tends to be too applied for a first or second degree. I'd rather stick to the basics.

[1] http://lucacardelli.name/Papers/Abstract%20Machines%20of%20S...


Probabilistic model checking of (models of) biological systems is definitely done by Luca's group:

- Design and Analysis of DNA Strand Displacement Devices using Probabilistic Model Checking http://lucacardelli.name/Papers/Design%20and%20Analysis%20of...

- Central Limit Model Checking http://lucacardelli.name/Papers/Central%20Limit%20Model%20Ch...


It varies from place to place. The better schools generally offer a more rigorous combination of CS and biology. The reason it may appear applied is mostly that bioinformatics offerings on MOOCs like edX tend to include only courses that fall under its specific subject code, giving the impression that the entire field is about string manipulation. The same is true of other interdisciplinary subjects like mechatronics: they are generally a combination of two majors, but courses exclusive to the interdisciplinary subject itself are a lot rarer. Computational biology is a lot more than string processing and crunching through PDEs (synthetic biology); there is a tremendous amount of machine learning used in current state-of-the-art proteomics and drug synthesis research.


Totally agree. I think his paper "Can a Systems Biologist Fix a Tamagotchi?" is really nice; it shows some of the fundamental conceptual issues and is very fun reading nonetheless: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.131...


The lines between all of those fields are super blurry. Blurry lines between human-defined and ultimately somewhat meaningless categories are intrinsic to the field, I'm afraid. The syllabus for each course, with a listing of the actual topics you study, is going to be far more helpful in evaluating what you want to study. The broadest distinction I can draw (with the caveat that these are really fuzzy bounds) is that bioinformatics tends to focus on drawing insights out of big data (lots of the "omics": genomics, epigenomics, proteomics, microbiomics), while systems biology tends to be about modeling complex systems, work that usually involves a lot of bioinformatics. And computational bio is an umbrella term for everything, though sometimes the umbrella is weighted more toward bioinformatics than systems stuff.

It might make sense to ask yourself what kind of work you want to do first. As in, do you want to have a few terabytes of data dumped in front of you that you run analytics on to find correlations? Do you want to study interactions between networks in biology and tease apart the big picture? Or do you want to narrow in on individual protein domains and use neural nets to simulate protein folding, to characterize and engineer individual amino acid sequences that are important to understand? All of those involve computation and involve bio, and it'll probably be easier to decide on courses/programs after you have something like that in mind and can evaluate the syllabi directly.


A lot of the stuff in the repo is pretty marginal, from the point of view of mainstream molecular/cell/developmental biology, so i don't think there is a reliable systematic way to find it.

In particular, that repo collects what are basically discrete maths approaches to biology: representing living things as systems of symbols rather than differential equations. I have always found that approach intuitively appealing - something about biological robustness meaning you have an opportunity to ignore a load of quantitative details and focus on the underlying structure. But in twenty-five years of being vaguely interested in it, i have never seen a really productive application of that approach, outside of treating DNA as a string of symbols.

Still, perhaps 'marginal' is just another way of saying 'cutting-edge'. I think it's most likely to show up in elite research institutes where people can do slightly out-there stuff, or in explicitly cross-disciplinary institutes or programmes.

The specific terms you mention have different meanings to me:

bioinformatics - treating DNA, RNA, and protein sequences as text and applying computation to them, eg searching, phylogeny, structure and function prediction

computational biology - various approaches to simulating cells and tissues, usually involving numerically evaluating differential equations at some level, eg how morphogens cause tissue patterning

computational biophysics - computational chemistry but for large biomolecules, eg simulating how proteins work

systems biology - smoke and mirrors used to obtain grants

But biologists aren't really into rigorous definitions and fixed boundaries, so you might find interesting stuff within any of these.
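As a toy instance of the "computational biology" sense above (numerically evaluating a differential equation for a morphogen gradient), here is a minimal Python sketch, assuming a 1D tissue in which a morphogen diffuses with coefficient D and decays at rate k, with a fixed source at x = 0. At steady state D c'' = k c, and on a long domain the solution is approximately c(x) = c0 exp(-x/λ) with λ = sqrt(D/k), so the explicit finite-difference integration should relax to that profile. The function name and all parameter values are illustrative.

```python
import math


def morphogen_profile(D=1.0, k=1.0, c0=1.0, length=10.0, n=101, t_end=15.0):
    """Explicit finite differences for dc/dt = D * d2c/dx2 - k * c on [0, length],
    with c(0) = c0 (source) and c(length) = 0 (far-field sink).
    Returns (x, c) after integrating from c = 0 up to t_end."""
    dx = length / (n - 1)
    dt = 0.4 * dx * dx / D  # under the explicit stability limit of roughly dx^2 / (2 D)
    c = [0.0] * n
    c[0] = c0
    for _ in range(int(t_end / dt)):
        c = (
            [c0]
            + [
                c[i] + dt * (D * (c[i + 1] - 2 * c[i] + c[i - 1]) / dx**2 - k * c[i])
                for i in range(1, n - 1)
            ]
            + [0.0]
        )
    return [i * dx for i in range(n)], c


x, c = morphogen_profile()
decay_length = math.sqrt(1.0 / 1.0)  # lambda = sqrt(D / k) for the default parameters
for i in (10, 20, 30):
    print(f"x={x[i]:.1f}  numeric={c[i]:.4f}  analytic={math.exp(-x[i] / decay_length):.4f}")
```

The grid spacing is a tenth of the decay length, which keeps the discretization error well below the precision printed here.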


I'm curious about your thoughts (if you have any) on the Boolean modeling formalism. Basically, you represent biomolecules as having two states: active and inactive. Their states change according to Boolean logic update rules that are determined by the states of other molecules in the system. You end up with a very simple dynamical system. Theoretical biologists have been working with Boolean models for more than 15 years [1]. Boolean circuits also have some pretty deep connections to theoretical computer science [2]. The goal of this very simple formalism is to capture the structure of a system while retaining much of its qualitative behavior. How productive this is depends on your perspective, I guess.
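For readers who haven't seen one, here is a minimal Python sketch of the Boolean formalism described above, assuming synchronous updates (every node switches at once; many published models use asynchronous schemes instead). The two toy networks, a mutual-inhibition toggle switch and a two-node negative feedback loop, are hypothetical examples and not taken from [1]. `attractors` enumerates all 2^n states and returns the steady states and cycles the dynamics settle into.

```python
from itertools import product


def step(state, rules):
    """Synchronous update: every node reads the old state, then all nodes switch at once."""
    return tuple(int(rule(state)) for rule in rules)


def attractors(rules, n):
    """Follow the trajectory from each of the 2**n states until a state repeats, and
    collect the cycle it ends in (a steady state is a cycle of length 1)."""
    found = set()
    for start in product([0, 1], repeat=n):
        path, state = [], start
        while state not in path:
            path.append(state)
            state = step(state, rules)
        cycle = path[path.index(state):]
        # Rotate so a cycle reached from different starting states is stored only once.
        i = cycle.index(min(cycle))
        found.add(tuple(cycle[i:] + cycle[:i]))
    return found


# Hypothetical toy networks; state = (X, Y).
TOGGLE = [lambda s: not s[1], lambda s: not s[0]]  # X and Y repress each other
OSCILLATOR = [lambda s: not s[1], lambda s: s[0]]  # X activates Y, Y represses X

print(attractors(TOGGLE, 2))
print(attractors(OSCILLATOR, 2))
```

The toggle switch yields two steady states plus a (0, 0) <-> (1, 1) oscillation; that oscillation is an artifact of synchronous updating and is not an attractor if nodes update one at a time. The negative feedback loop has no steady state at all, only a 4-cycle.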

Also, that's a pretty uncharitable view of systems biology. It seems clear to me that understanding even moderately complex phenotypes practically requires models of biology that include many molecules, with significant feedback loops. Further, we see emergent biological behavior across multiple scales of time and space, from millisecond-long protein-protein interactions to multi-year developmental processes. Systems biology is basically just studying biology while taking all of that into account. That seems worth studying to me, especially given the ineffectiveness of our current therapies in managing most diseases.

So, I'm interested in what parts of systems biology you would describe as "smoke and mirrors".

[1] https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6388622/pdf/nih...

[2] https://www.quantamagazine.org/mathematician-solves-computer...


It's definitely not a charitable view of systems biology, i admit. I furiously agree that understanding living things requires including many molecules, feedback loops, multiple scales, all that stuff (and people always forget to include physical forces!).

What i'm so far unconvinced of is that formal modelling of that is more useful than just thinking about it in the usual way. The acid test of a formal model, for me, is the ability to make insights or reliable predictions that are useful, and that couldn't be done without the formal model. Do models actually do that?

That fly wing paper is nice - i remember reading that von Dassow et al paper from 2000 when i was an undergraduate! It's a really satisfying read (for me, a skim read right now, i confess). But what does it tell us about the fly's wing that we didn't know already? Don't mistake knowledge of the model for knowledge of the thing. The bit about steady states doesn't really seem physiologically relevant to anything.

At some point i should add a disclaimer that i got out of cell biology over a decade ago, and was only ever a bench scientist. These are strictly the opinions of an ill-informed amateur.

What this reminds me of more than anything is category theory. Category theory is a set of formal tools for modelling all sorts of things that are interesting to programmers and computer scientists (and beyond!), and when you make those models, they have an enormous intuitive appeal - you look at the model and think yes, that is the essence of what this thing is! The models kick you right in the brain in the same way that those boolean network models do. You can push arrows around and make the models recapitulate the behaviour of the real thing. But that's as far as it goes - the models describe, but in practice, they don't explain or predict in any useful way.

But perhaps your point about productivity depending on perspective is the key. We haven't traditionally had models like these in biology. So perhaps it's just that we don't recognize them as knowledge, because they don't look like the kind of knowledge we had before? I don't know. I'm even more out of practice at metaphysics than cell biology.


> bioinformatics, systems biology, computational biology

Of those bioinformatics is more specific (usually genomics data); the other two are overlapping and pretty non-specific terms.

For example I started a Sys Bio PhD and ended up in a Comp Bio research group. A friend started the same way but ended up in control theory/microbiology.

The title and even the department are somewhat arbitrary and have more to do with the organisation of the university than anything else (e.g. I was in CS, but my friend was in Engineering, I think).

If you can find a good interdisciplinary course they will be familiar with people moving around depending on their interests.


Lots of statistics, especially anything to do with clustering and with how to analyze and compare gene and protein expression networks from various species or tissues. Those are particularly important in this COVID-19 era.

In terms of general network analysis and visualization, I think logic, symbolic methods, and Petri nets are underutilized, probably because working with them requires people with more CS and math training than the typical biologist-turned-bioinformatician has.
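For those who haven't met Petri nets, here is a minimal Python sketch of their semantics, assuming a hypothetical simplified enzyme reaction E + S -> ES -> E + P (no unbinding step). Places hold token counts; a transition is enabled when its input places have enough tokens, and it fires by consuming those tokens and producing its outputs. Firing the first enabled transition in a fixed order is a simplification: Petri nets are nondeterministic in general, and stochastic Petri nets add reaction rates.

```python
from collections import Counter


def enabled(marking, transition):
    """A transition is enabled when every input place holds enough tokens."""
    pre, _ = transition
    return all(marking[place] >= count for place, count in pre.items())


def fire(marking, transition):
    """Consume the input tokens and produce the output tokens; returns a new marking."""
    pre, post = transition
    new = Counter(marking)
    new.subtract(pre)
    new.update(post)
    return new


def run(marking, transitions, max_steps=1000):
    """Repeatedly fire the first enabled transition (in dict order) until none is enabled
    or max_steps is reached. Real Petri net semantics would choose nondeterministically."""
    marking = Counter(marking)
    for _ in range(max_steps):
        for transition in transitions.values():
            if enabled(marking, transition):
                marking = fire(marking, transition)
                break
        else:
            break  # dead marking: nothing can fire
    return marking


# Hypothetical simplified enzyme net: E + S -> ES -> E + P.
ENZYME = {
    "bind": ({"E": 1, "S": 1}, {"ES": 1}),
    "release": ({"ES": 1}, {"E": 1, "P": 1}),
}

final = run({"E": 1, "S": 3}, ENZYME)
print(final["E"], final["S"], final["ES"], final["P"])  # 1 0 0 3
```

Starting from one enzyme token and three substrate tokens, the net ends in a dead marking with three product tokens, and E + ES stays at 1 throughout, which is the kind of invariant (conservation of enzyme) that Petri net analysis can derive from the net structure alone.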


Where do you live where there is a citizen bio lab? I'm considering a move in the near future and would be interested in having a place like that available to me.


I live in New York. If you're interested in checking it out, GenSpace is really fantastic:

https://www.genspace.org/


I was wondering if there was any work in the field of evolving 'creatures' moving in a virtual space, beyond what Karl Sims did 23 years ago. https://www.karlsims.com/evolved-virtual-creatures.html https://www.karlsims.com/galapagos/index.html

It would be nice if Karl Sims could open source it, as it is a really inspiring visual example in the field of Artificial Life, next to watching generations of methuselahs unfold in Conway's Game of Life.

Another interesting article in the field of Artificial Life: https://arxiv.org/pdf/1803.03453.pdf


"Flexible Muscle-Based Locomotion for Bipedal Creatures": https://www.goatstream.com/research/papers/SA2013/


Colleagues in my research group are working on evolving 'creatures' in physical space, e.g. evolvable robots [0]. They address deep questions regarding the interplay between evolution/the body, learning/the brain and the environment using evolutionary algorithms and machine learning.

[0] https://www.york.ac.uk/robot-lab/are/


Emergent Tool Use from Multi-Agent Interaction[1]. Also these agents lay a claim to being the most god damned cutest things ever.

[1] https://openai.com/blog/emergent-tool-use/


Haha, great aptronym. :)


"What Bodies Think About: Bioelectric Computation Outside the Nervous System"

https://www.youtube.com/watch?v=RjD1aLm4Thg

https://news.ycombinator.com/item?id=18736698

and

"Team Builds the First Living Robots, Tiny 'xenobots' assembled from cells..."

https://www.uvm.edu/uvmnews/news/team-builds-first-living-ro...

https://news.ycombinator.com/item?id=22040150


I LOVE the first talk, I think I discovered it on HN. I study bioinformatics and had no idea something like that existed.


I'd wager it's the most important and most obscure research being done today. Can you imagine being able to regrow a limb!? :-D


A-Life 2020 is virtual this summer (Jul 13-18). For all interested in lifting the veil between its and bits ;)

https://vermontcomplexsystems.org/events/ALIFE-2020/


Do you think software should be architected as autonomous agents for it to scale infinitely? I watched an Alan Kay video [1] some time back. In it, he argued that software systems cannot scale unless they are based on the most complex "computation" system we know of: our own biological system.

[1]: https://www.youtube.com/watch?v=NdSD07U5uBs


To limit complexity in complex systems design, you need to be able to create agents which perform simple functions based on well-defined inputs. You can have a few different types of those agents interacting, and each should be discrete and be able to "survive" in an adequate "environment". Then the system, if designed correctly, can become much greater than the sum of its parts, but you retain the relative simplicity to monkey around with the internals of the agents as well as the reservoirs to which they're attached, etc. Nature seems to have become a system where iterative improvement is performed by virtue of the finite life cycle and sexual reproduction (including all the ways that DNA shuttles around the necessary source code).


I love the neuron simulator linked under McCulloch and Pitts [0]. Very fun to play around with and build your own networks. Reminds me of Conway's Game of Life and Minecraft redstone.

0. https://github.com/prathyvsh/formal-systems-in-biology


At the end it just says:

> Prior Art

> Ramón y Cayal

> Golgi

Those guys did a hell of a lot of stuff, so sure!


Wow - takes me back to my PhD - I think I must have read everything on that list that was published before 2000.

Wonderful stuff!

Steve


hey Steve,

What was your thesis about?


I'm not sure why, but I feel certain that Ilya Prigogine's work on non-equilibrium systems deserves to be on this list.


Thanks for the note! I have added this to the research section.


Looks interesting, gotta take a closer look in the summer.

I think it's Stanislaw (Ulam) not Stainslaw.


it should be Stanisław Ulam if we want to be hyper precise :)



