The first entry on this list is McCulloch-Pitts nerve nets, whose expressive power was analysed by S.C. Kleene [1]. In his article he coined the term "regular events" for the class of languages that can be expressed by nerve nets/finite automata, and this is where regular expressions get their name. If you have ever thought the name was strange, rest assured that Kleene didn't actually like it either; he just couldn't think of anything better at the time:
> We shall presently describe a class of events which we will call "regular events" (We would welcome any suggestions as to a more descriptive term.*)
> [...]
> * McCulloch and Pitts use the term "prehensible," introduced rather differently; but since we did not understand their definition, we did not adopt the term.
So, had McCulloch and Pitts been a bit clearer in their seminal paper, then maybe it would have been called "prehensible expressions" :).
This got me thinking: if regular expressions come from graphs, why not use them to query graph databases? This paper is about a year old: https://arxiv.org/pdf/1904.11653.pdf Can anyone point to open source implementations?
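For what it's worth, the paper is about regular path queries: matching a regex over edge labels rather than characters. The core idea is small enough to sketch by hand. This is my own toy (graph, pattern, and function names all made up, not the paper's algorithm): it finds nodes reachable via a path matching `a b*` by walking the product of the graph and a two-state automaton.

```python
# Toy regular path query: find nodes reachable from `src` along a path whose
# edge labels match the pattern  a b*  (one 'a' edge, then zero or more 'b').
# A hand-rolled sketch, not from the linked paper.

from collections import deque

edges = {  # hypothetical edge-labeled graph: node -> [(label, target), ...]
    1: [("a", 2)],
    2: [("b", 3)],
    3: [("b", 4), ("a", 5)],
}

def query_a_bstar(graph, src):
    # NFA for a b*: state 0 --a--> state 1, state 1 --b--> state 1 (accepting).
    frontier = deque([(src, 0)])
    seen, results = {(src, 0)}, set()
    while frontier:
        node, q = frontier.popleft()
        if q == 1:
            results.add(node)
        for label, tgt in graph.get(node, []):
            nq = 1 if (q == 0 and label == "a") or (q == 1 and label == "b") else None
            if nq is not None and (tgt, nq) not in seen:
                seen.add((tgt, nq))
                frontier.append((tgt, nq))
    return results

# From node 1: the 'a' edge reaches 2, then 'b*' extends to 3 and 4.
```

Real engines generalize this to arbitrary patterns (Cypher's variable-length paths, SPARQL property paths), but the product-automaton walk is the same idea.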
This may be a rudimentary question, but if someone were going to study this at a university level, what would they study?
I ask because I'm starting my master's in CS, but I've also been going to workshops/events at a local citizen bio lab and really enjoying it. I'd really like to go deeper into the cross-section of CS and bio, specifically the kinds of things listed in this repo (modeling biological phenomena as formal systems, using computation to simulate those systems, etc.)
But when I look at potential programs to pursue after my CS course, I get a bit lost in all the different titles—bioinformatics, systems biology, computational biology, etc. It's hard for an outsider in the field to discern any meaningful delineation. Does anyone with experience in the field know what category of study these resources would fall under, from a university perspective?
I recommend looking at Luca Cardelli's work [1]. He's probably the leading researcher in the intersection of CS and biology, in the sense of modelling biological phenomena as formal systems. Some of his lectures are online [2]. He's extremely approachable in my experience, so don't hesitate to contact him if you want to educate yourself more in this field.
His paper Abstract Machines of Systems Biology [1] is a wonderful, only slightly dated, overview of what the field might be able to achieve in the future.
I think and hope program analysis, abstract interpretation, and friends will make a comeback in biology. These are definitely topics covered in CS, not in bioinformatics, which IMHO tends to be too applied for a first or second degree. I'd rather stick to the basics.
It varies from place to place; the better schools generally offer a more rigorous combination of CS and biology. The reason it may appear applied is mostly that bioinformatics offered on MOOCs like edX tends to include only courses that fall under its specific subject code, giving the impression the entire field is about string manipulation. The same goes for other interdisciplinary subjects like mechatronics: they are generally a combination of two majors, and courses exclusive to the particular interdisciplinary subject are a lot rarer. Computational biology is a lot more than string processing and crunching through PDEs (synthetic biology); there is a tremendous amount of machine learning used in current state-of-the-art proteomics and drug-synthesis research.
The lines between all of those fields are super blurry. Blurry lines between human-defined and ultimately somewhat meaningless categories are intrinsic to the field, I'm afraid. The syllabus for each course, with a listing of the actual topics you study, is going to be far more helpful in evaluating what you want to study. The broadest lines I can draw (with the caveat that these are really fuzzy bounds) are that bioinformatics tends to focus on drawing insights out of big data (lots of the "omics": genomics, epigenomics, proteomics, microbiomics), while systems biology tends to be about modeling complex systems (work that usually involves a lot of bioinformatics). And computational bio is an umbrella term for everything, though sometimes the umbrella is weighted more toward bioinformatics than systems stuff.
Might make sense to ask yourself what kind of work you want to do first. As in, do you want to have a few terabytes of data dumped in front of you that you run analytics on to find correlations? Do you want to study interactions between networks in biology and tease apart the big picture? Or do you want to narrow in on individual protein domains and use neural nets to simulate protein folding, to characterize and engineer individual sequences of amino acids that are important to understand? All of those involve computation and involve bio, and it'll probably be easier to decide on courses/programs after you have something like that in mind and can evaluate the syllabi directly.
A lot of the stuff in the repo is pretty marginal, from the point of view of mainstream molecular/cell/developmental biology, so i don't think there is a reliable systematic way to find it.
In particular, that repo collects what are basically discrete maths approaches to biology: representing living things as systems of symbols rather than differential equations. I have always found that approach intuitively appealing - something about biological robustness meaning you have an opportunity to ignore a load of quantitative details and focus on the underlying structure. But in twenty-five years of being vaguely interested in it, i have never seen a really productive application of that approach, outside of treating DNA as a string of symbols.
Still, perhaps 'marginal' is just another way of saying 'cutting-edge'. I think it's most likely to show up in elite research institutes where people can do slightly out-there stuff, or in explicitly cross-disciplinary institutes or programmes.
The specific terms you mention have different meanings to me:
bioinformatics - treating DNA, RNA, and protein sequences as text and applying computation to them, eg searching, phylogeny, structure and function prediction
computational biology - various approaches to simulating cells and tissues, usually involving numerically evaluating differential equations at some level, eg how morphogens cause tissue patterning
computational biophysics - computational chemistry but for large biomolecules, eg simulating how proteins work
systems biology - smoke and mirrors used to obtain grants
But biologists aren't really into rigorous definitions and fixed boundaries, so you might find interesting stuff within any of these.
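To make the first bullet concrete, "treating sequences as text" really is as literal as it sounds. Here's a toy illustration of my own (not from any course material): reverse-complementing a DNA string and doing a naive motif scan, which is the kernel of what fancier bioinformatics tooling builds on.

```python
# "Sequences as text" in two tiny functions: reverse-complement a DNA string
# and find all occurrences of a motif by naive scanning. (My own toy
# illustration of the bioinformatics bullet above.)

def reverse_complement(seq):
    """Complement each base and reverse the strand."""
    comp = {"A": "T", "T": "A", "G": "C", "C": "G"}
    return "".join(comp[b] for b in reversed(seq))

def find_motif(seq, motif):
    """Return all start positions of `motif` in `seq` (naive scan)."""
    return [i for i in range(len(seq) - len(motif) + 1)
            if seq[i:i + len(motif)] == motif]

# reverse_complement("ATGC") -> "GCAT"
# find_motif("GATTACAGATTACA", "GAT") -> [0, 7]
```

Real tools replace the naive scan with suffix arrays, BWT indexes, and probabilistic models, but the "it's all strings" framing carries surprisingly far.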
I'm curious about your thoughts (if you have any) on the Boolean modeling formalism. Basically, you represent bio-molecules as having two states, active and inactive, and their states change according to Boolean logic update rules determined by the states of other molecules in the system. You end up with a very simple dynamical system. Theoretical biologists have been working with Boolean models for >15 years [1]. Boolean circuits also have some pretty deep connections to theoretical computer science [2]. The goal of this very simple formalism is to capture the structure of a system while retaining much of its qualitative behavior. How productive this is depends on your perspective, I guess.
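For anyone who hasn't seen one, a Boolean network fits in a few lines of code. This is a made-up three-gene circuit of my own (not from the cited papers): each node is ON/OFF, update rules are Boolean functions of the current state, and iterating the synchronous update reveals the attractor.

```python
# A minimal Boolean network sketch (hypothetical three-gene circuit, not from
# the cited papers): each node is ON/OFF, and update rules are Boolean
# functions of the current state. Synchronously update all nodes until a
# previously seen state recurs (a fixed point or a limit cycle).

def step(state):
    """One synchronous update of all nodes."""
    return {
        "A": not state["C"],             # C represses A
        "B": state["A"],                 # A activates B
        "C": state["A"] and state["B"],  # A and B jointly activate C
    }

def trajectory(state, max_steps=100):
    """Run until a repeat; return the visited states and the cycle's start index."""
    seen, states = {}, []
    for i in range(max_steps):
        key = tuple(sorted(state.items()))
        if key in seen:
            return states, seen[key]     # trajectory re-entered state seen[key]
        seen[key] = i
        states.append(state)
        state = step(state)
    return states, None

states, cycle_start = trajectory({"A": True, "B": False, "C": False})
```

With only 2^n states, exhaustively enumerating attractors is feasible for small networks, which is part of the formalism's appeal.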
Also, that's a pretty uncharitable view of systems biology. It seems clear to me that understanding even moderately complex phenotypes practically requires models of biology that include many molecules, with significant feedback loops. Further, we see emergent biological behavior across multiple scales of time and space, from millisecond-long protein-protein interactions to multi-year developmental processes. Systems biology is basically just studying biology while taking all of that into account. That seems worth studying to me, especially given the ineffectiveness of our current therapies in managing most diseases.
So, I'm interested in what parts of systems biology you would describe as "smoke and mirrors".
It's definitely not a charitable view of systems biology, i admit. I furiously agree that understanding living things requires including many molecules, feedback loops, multiple scales, all that stuff (and people always forget to include physical forces!).
What i'm so far unconvinced of is that formal modelling of that is more useful than just thinking about it in the usual way. The acid test of a formal model, for me, is the ability to make insights or reliable predictions that are useful, and that couldn't be done without the formal model. Do models actually do that?
That fly wing paper is nice - i remember reading that von Dassow et al paper from 2000 when i was an undergraduate! It's a really satisfying read (for me, a skim read right now, i confess). But what does it tell us about the fly's wing that we didn't know already? Don't mistake knowledge of the model for knowledge of the thing. The bit about steady states doesn't really seem physiologically relevant to anything.
At some point i should add a disclaimer that i got out of cell biology over a decade ago, and was only ever a bench scientist. These are strictly the opinions of an ill-informed amateur.
What this reminds me of more than anything is category theory. Category theory is a set of formal tools for modelling all sorts of things that are interesting to programmers and computer scientists (and beyond!), and when you make those models, they have an enormous intuitive appeal - you look at the model and think yes, that is the essence of what this thing is! The models kick you right in the brain in the same way that those boolean network models do. You can push arrows around and make the models recapitulate the behaviour of the real thing. But that's as far as it goes - the models describe, but in practice, they don't explain or predict in any useful way.
But perhaps your point about productivity depending on perspective is the key. We haven't traditionally had models like these in biology. So perhaps it's just that we don't recognize them as knowledge, because they don't look like the kind of knowledge we had before? I don't know. I'm even more out of practice at metaphysics than cell biology.
> bioinformatics, systems biology, computational biology
Of those, bioinformatics is the most specific (usually genomics data); the other two are overlapping and pretty non-specific terms.
For example I started a Sys Bio PhD and ended up in a Comp Bio research group. A friend started the same way but ended up in control theory/microbiology.
The title and even the department are somewhat arbitrary and more to do with the organisation at the university than anything else (e.g. I was in CS but my friend was Engineering I think).
If you can find a good interdisciplinary course they will be familiar with people moving around depending on their interests.
Lots of statistics. Especially anything to do with clustering, and with analyzing and comparing gene and protein expression networks from various species or tissues. Those are particularly important in this COVID-19 era.
In terms of general network analysis and visualization, I think logic, symbolism, and Petri nets are underutilized. Probably because they require people with more CS- and math-style training than your typical biologist-turned-bioinformatician has picked up.
Where do you live where there is a citizen bio lab? I'm considering a move in the near future and would be interested in having a place like that available to me.
It would be nice if Karl Sims could open source it, as it is a really inspiring visual example in the field of Artificial Life, next to seeing generations of methuselahs unfolding in Conway's Game of Life.
Colleagues in my research group are working on evolving 'creatures' in physical space, e.g. evolvable robots [0].
They address deep questions regarding the interplay between evolution/the body, learning/the brain and the environment using evolutionary algorithms and machine learning.
Do you think that software should be architected as autonomous agents for it to scale indefinitely? I watched an Alan Kay video [1] some time back in which he argued that software systems cannot scale unless their basis is the most complex "computation" system that we know -- that system being our own biology.
To limit complexity in complex systems design, you need to be able to create agents that perform simple functions based on well-defined inputs. You can have a few different types of those agents interacting, and each should be discrete and able to "survive" in an adequate "environment". Then the system, if designed correctly, can become much greater than the sum of its parts, while you retain the relative simplicity to monkey around with the internals of the agents, the reservoirs to which they're attached, etc. Nature seems to have become a system where iterative improvement is performed by virtue of the finite life cycle and sexual reproduction (including all the ways that DNA shuttles around the necessary source code).
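The agent idea above can be sketched very simply. All the names here are hypothetical (a made-up producer/recycler loop, not anything from the video): each agent reads well-defined state from a shared environment, applies one simple function, and writes back, and the system-level behavior emerges from their interaction.

```python
# A minimal sketch of "simple agents in a shared environment" (hypothetical
# producer/recycler example, not from the Alan Kay video): each agent does
# one small thing per tick; the cycle emerges from their interaction.

class Environment:
    def __init__(self):
        self.store = {"food": 10, "waste": 0}

class Producer:
    """Consumes one unit of food, emits one unit of waste."""
    def act(self, env):
        if env.store["food"] > 0:
            env.store["food"] -= 1
            env.store["waste"] += 1

class Recycler:
    """Turns one unit of waste back into food, closing the loop."""
    def act(self, env):
        if env.store["waste"] > 0:
            env.store["waste"] -= 1
            env.store["food"] += 1

env = Environment()
agents = [Producer(), Recycler()]
for _ in range(5):            # each tick, every agent acts on the environment
    for agent in agents:
        agent.act(env)
```

Note that total matter (food + waste) is conserved no matter how the agents are scheduled; that kind of invariant is what makes it safe to "monkey around" inside an individual agent without destabilizing the whole.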
I love the neuron simulator linked under McCulloch and Pitts [0]. Very fun to play around with and build your own networks. Reminds me of Conway's Game of Life and Minecraft redstone.
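If you want the flavor without the simulator, a McCulloch-Pitts neuron is tiny: binary inputs, fixed weights, and a hard threshold. This is my own toy version (not the linked simulator's code), wired into the classic logic gates.

```python
# A McCulloch-Pitts neuron sketch: binary inputs, fixed weights, threshold
# activation. Enough to wire up AND/OR/NOT gates, in the spirit of the
# simulator linked above (my own toy version, not its code).

def mp_neuron(inputs, weights, threshold):
    """Fire (1) iff the weighted sum of binary inputs meets the threshold."""
    return 1 if sum(i * w for i, w in zip(inputs, weights)) >= threshold else 0

def AND(x, y):
    return mp_neuron([x, y], [1, 1], threshold=2)

def OR(x, y):
    return mp_neuron([x, y], [1, 1], threshold=1)

def NOT(x):
    # Inhibition modeled as a negative weight.
    return mp_neuron([x], [-1], threshold=0)
```

Composing these gates into networks is exactly what Kleene analysed; the regular-events result mentioned at the top of the thread is about what such nets can and cannot recognize over time.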
[1] https://www.rand.org/content/dam/rand/pubs/research_memorand...