Properties of Content-Based Networks
PhD Thesis by Duygu Balcan
Supervised by Prof. Ayse Erzan
March 2007
The research we present in this thesis has been devoted to the modelling and understanding of transcriptional gene regulatory networks, on the basis of an information theoretical approach.
Transcriptional gene regulation involves special proteins, the transcription factors, which bind to the DNA by recognizing specific subsequences embedded in it, the transcription factor binding sites. We have modelled the transcriptional regulation network of yeast within this approach by associating random linear codes with the genes of the organism, which are represented by the nodes of our content-based network, and by establishing an edge between two nodes if and only if they share a certain amount of information, a condition realized via a sequence-matching rule. The distributions of the amount of shared information, represented by the bitwise Shannon information of the random linear codes associated with the binding sequences and the promoter regions, are the most important biological inputs to our content-based model.
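The sequence-matching rule described above can be illustrated with a minimal sketch. It rests on several assumptions that are not taken from the thesis: each node carries one random binary binding sequence and one random binary promoter region, both of fixed length, and a directed edge i -> j is drawn when the binding sequence of node i occurs as a contiguous substring of the promoter region of node j. The names `random_bits` and `build_network` and all parameter values are hypothetical.

```python
import random


def random_bits(length, rng):
    """Return a random binary string of the given length."""
    return "".join(rng.choice("01") for _ in range(length))


def build_network(n_nodes, site_len, promoter_len, seed=0):
    """Build a toy content-based network.

    Assumption (not from the thesis): node i regulates node j when the
    binding sequence of i appears as a contiguous substring of the
    promoter region of j. Self-loops (i == j) are allowed.
    """
    rng = random.Random(seed)
    sites = [random_bits(site_len, rng) for _ in range(n_nodes)]
    promoters = [random_bits(promoter_len, rng) for _ in range(n_nodes)]

    edges = set()
    for i, site in enumerate(sites):
        for j, promoter in enumerate(promoters):
            if site in promoter:  # exact substring match
                edges.add((i, j))
    return sites, promoters, edges


if __name__ == "__main__":
    sites, promoters, edges = build_network(n_nodes=50, site_len=6, promoter_len=40)
    out_degree = [sum(1 for (i, _) in edges if i == n) for n in range(50)]
    print("number of edges:", len(edges))
    print("largest out-degree:", max(out_degree))
```

In this toy version every sequence has the same length; the model in the text instead draws these quantities from distributions, so the sketch only shows how content determines the edges.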
We have carefully analysed the transcriptional regulation networks of yeast and compared their topological features with those of the ensemble of our content-based networks. We have observed that our content-based model is able to reproduce all the global topological features of these networks, which provides us with an understanding of their emergent nature. We conclude that the complex networks of gene regulation can arise spontaneously even from random codes, so they do not need to be constructed from scratch by evolutionary mechanisms.
We have also introduced a hidden-variable version of our content-based model, which involves only the pairwise connection probabilities as a function of the string lengths, and observed that this model is able to capture the main properties of our double-string model. Analytical calculations on the hidden-variable model can therefore allow us to make predictions about further properties of real networks.
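The idea of a hidden-variable model can be sketched as follows: the sequences themselves are discarded and only their lengths are kept, with each edge drawn independently from a length-dependent probability. The probability used here is an assumption and not the expression derived in the thesis. It treats the k - l + 1 possible alignments of a random binary string of length l inside a random binary string of length k as independent, giving p(l, k) ≈ 1 - (1 - 2^(-l))^(k - l + 1). The names `match_probability` and `hidden_variable_network` are hypothetical.

```python
import random


def match_probability(l, k):
    """Approximate probability that a random binary string of length l
    occurs inside a random binary string of length k.

    Assumption: the k - l + 1 alignments are treated as independent,
    which ignores correlations between overlapping windows.
    """
    if l > k:
        return 0.0
    return 1.0 - (1.0 - 2.0 ** (-l)) ** (k - l + 1)


def hidden_variable_network(site_lengths, promoter_lengths, seed=0):
    """Draw directed edges i -> j using only the string lengths."""
    rng = random.Random(seed)
    edges = set()
    for i, l in enumerate(site_lengths):
        for j, k in enumerate(promoter_lengths):
            if rng.random() < match_probability(l, k):
                edges.add((i, j))
    return edges


if __name__ == "__main__":
    rng = random.Random(1)
    site_lengths = [rng.randint(3, 10) for _ in range(50)]
    promoter_lengths = [rng.randint(20, 60) for _ in range(50)]
    edges = hidden_variable_network(site_lengths, promoter_lengths)
    print("number of edges:", len(edges))
```

Because short binding sequences match far more often than long ones, the distribution of lengths alone already shapes the degree distribution in this simplified picture.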
The very close topological similarities between the content-based models and genetic regulatory networks have led us to consider a modified random Boolean dynamics on our content-based networks, which we believe will help us understand the relationship between the architecture of the underlying network and the function of these systems.
Our results point to further promising research problems in biological systems, where interactions between different components require the fulfillment of a series of constraints, which means the exchange of a certain amount of information. Examples are immune systems and protein interactions.