Absolutely elementary programming: Python at code(a)cademy

For the past few weeks, I’ve been grappling with this wonderful course offered on Coursera. I’m about halfway through. I’d like to say I’m feeling pleased with my progress, but this course turns out to be part of an 8-part bioinformatics specialization program offered by the University of California, San Diego. That would mean I’ve been able to maneuver through a whopping one-sixteenth of the whole thing at best. To top it all off, the Bioinformatics for Beginners course that I’m doing isn’t even part of the specialization- it’s a

. . .gently-paced introduction to our Bioinformatics Specialization, preparing learners to take the first course in the Specialization. . .

Meaning what I’ve been doing is preparation for the first step in the actual thing.

The goal of this course is to ease full-blooded biologists (like myself, and perhaps the mythical readership of this blog) into the world of programming (more specifically, Python). The course has you work through a number of coding problems, but they are presented only in the context of real-world biological problems. This is a whole lot better than learning to code from any old programming course, where the biologist would fail to see the utility of solving coding problems until substantial progress is made. Sure, it’s fun to play around with loops and lists and whatnot, but like the newbie guitar player, the new (and admittedly inept) coder can only enjoy themselves so much. The Coursera course is very successful in firmly establishing the biological relevance of programming. That was enough to keep me intrigued.
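For a taste of what these biologically flavored exercises feel like, here’s a sketch of a typical early problem: counting how many times a short DNA pattern occurs in a longer string. (The function name and sequences here are my own illustration, not necessarily the course’s exact material.)

```python
def pattern_count(text, pattern):
    """Count (possibly overlapping) occurrences of pattern in text."""
    count = 0
    for i in range(len(text) - len(pattern) + 1):
        if text[i:i + len(pattern)] == pattern:
            count += 1
    return count

print(pattern_count("ACAACTATGCATACTATCGGGAACTATCCT", "ACTAT"))  # 3
```

Trivial on its own, but you can immediately see where it’s going- the same loop finds regulatory motifs in a genome.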

Even then, there’s some debate to be had about exactly how friendly the course is to absolute beginners to programming. Exhibit A would be the comment section of their exercises. The forest is thick with frustration, lit only by the occasional glimmers of some hard-won Eurekas.

Which brings us to where we are now. This post is not a review of the course itself; it’s about something yet more upstream: Codecademy.

Codecademy’s course on Python is probably the most accessible introduction to the topic. Without some exposure to coding, it’s impossible to even cut your teeth on the biology problems. Yes, even the preparation for the specialization requires its own preparation. No one said this was gonna be easy.

Codecademy provides that training wheels platform. What’s more, the Coursera course liberally uses Codecademy as a reference- directing their students to solve this or that section on Codecademy in order to continue on to the next parts of the course.


With Codecademy, soon you too will be printing strings like an absolute pro.

The Codecademy interface is just lovely. It’s incredibly well-paced, starting from the very basics and slowly working its way up to more complex problems. It’s littered with references to Monty Python (liberally utilizing immortal sketches like The Dead Parrot or Argument Clinic) and cute animals- both of which are immediate wins.

Going back to the interface though- Codecademy almost functions as a coloring book. In the beginning, it tells you exactly what to do, baby step by baby step, and awards brownie points for every little line (or even word) of code you write. Even near the end of the ~13 hours of material, it hesitates to let go of your hand, making sure to hit you with the occasional reminder, just in case.

To top it all off, the latest version of the program comes with a feature which basically offers the user the solution if they run the wrong code four times- which means the particularly sneaky among you can basically cheat through the whole thing without learning to tell apart a string from a list.

I know none of y’all will do that, though- that’s why I got mad love for you. It’s a rap phrase.
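For the record, the string-versus-list distinction amounts to about this much Python (a toy example of my own, not Codecademy’s):

```python
dna = "GATTACA"        # a string: an immutable sequence of characters
bases = list(dna)      # a list: mutable, here holding one-letter strings

bases[0] = "C"         # lists can be edited in place...
# dna[0] = "C"         # ...strings cannot; this line would raise a TypeError

print(bases)           # ['C', 'A', 'T', 'T', 'A', 'C', 'A']
print("".join(bases))  # CATTACA
```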

The course still has its problems though. The level of spoon-feeding may strike someone as running counter to how one is supposed to learn to code (i.e. by learning to think). I understand that criticism, but I disagree with its thrust. Most people conceptualize working on computers in a particular way- you click on cute icons, things happen in terms of other cute icons. The idea that you can simply “write” stuff into being in a computer can be very counterintuitive. As such, there’s a need for an environment where all you do is get exposed to that one idea. Even if you’re being spoon-fed, even if you’re being told what to do every step of the way, for the coder spirit to germinate within you- for you to reach that point where you find coding fun- you absolutely need that exposure, that getting-used-to. We’re crossing into a different world here, and I feel people criticizing Codecademy’s approach underestimate that chasm. Heck, that’s how I learned to divide. No one could get it through my fat skull until grandma took pity on me and started telling me where to put every digit. You do that enough times, and at some point patterns begin to emerge.

A different, but important problem that people have pointed out has to do with the breadth of material covered on Codecademy. Trying to hit the sweet spot between being short enough to keep people involved and yet comprehensive enough to give a full view of the Python landscape, Codecademy often doesn’t have the luxury of repeating certain lessons. You might deal with dictionaries in one introductory lesson, and not deal with certain aspects of working with them until much later.

Well, the solution to that, unfortunately, is just to keep revisiting the course. I think the Practice Makes Perfect module (75% of the way through the course) is a pretty good indicator of the progress you’ve made. I could solve most (not all) of the problems, and found that occasion enough for a meaty pat on the back.

All in all, Codecademy receives a good review in my book. Will there be teething pain going through all of the modules? Definitely. Would it be worth it? You bet. Even if you give up somewhere in the middle, you can always come back and finish the rest.

All that’s needed is consistency, and never giving up. Here’s some inspirational footage for you to aid in your noble journey.



That mysterious second chromosome

There was a time when I thought bacterial cells were permitted only one chromosome each. I took that as one of the foundational axioms of biology, right up there with the genetic code being universal (wrong) and bacteria lacking internal compartmentalization or organelles (also wrong).

As it turns out, some bacteria come packing with more than one chromosome. Not plasmids, chromosomes. A number of questions arise in the wake of this paradigm shifter, and Egan et al (2005) provide a nice little barrage thereof:

What is a chromosome? Are there common features of multipartite genomes? Are the mechanisms of replication of each chromosome identical? Is the timing of replication co-ordinated among multiple replicons? How are multiple chromosomes faithfully partitioned to daughter cells? Are divided genomes stable? How did multipartite genomes evolve?

(Aside- I got reminded of this from that long list of questions)

So let’s talk about that last question a little. We’ll be talking exclusively about the dual-chromosome-carrying Vibrio, because it’s this genus that has been studied the most as far as multipartite genomes are concerned.

Whence cometh #2?

The hypothesis we’ll be exploring today is that the second chromosome of Vibrio was originally a plasmid, and made the conversion to being a chromosome relatively recently. Different lines of evidence converge on that conclusion. The first and most obvious clue comes from the description of the chromosomes themselves. Consider what would happen if Vibrios had harbored the two chromosomes since the deep evolutionary past. Wouldn’t we expect both of the chromosomes to bear important genes? Yes, because there would be no reason for evolution to “bias” the distribution of genes required for survival towards only one chromosome. But that’s not what you see in the case of Vibrio– the majority of the housekeeping and metabolic genes are found on the larger chromosome. This chromosome also carries all the genes of Vibrio that we hear about in clinical contexts: the cholera toxin gene, the toxin-coregulated pilus gene, and even the master transcription regulator gene (toxR). This indicates that Vibrios originally had only one chromosome, which encoded all the genes needed for survival. Somewhere down the line, it picked up a plasmid, and that plasmid eventually accumulated some essential genes itself. This results in something of a dominant vs. minor chromosomal hierarchy, something that a plasmid origin can handily account for.


The two chromosomes of Vibrio. Note the difference not only in size but also in gene distribution. Fun fact: chromosome 2 encodes a much higher percentage of “hypothetical proteins” compared to its cousin.

This dovetails into the question of how a chromosome is defined to begin with. What do we mean when we say chromosome 2 was once a plasmid, and then it became a chromosome?

Egan et al (2005), whose nice review I cited above, list two criteria. First, a plasmid doesn’t carry any gene that is essential for the organism’s survival under normal conditions. Plasmids aren’t essential for the usual life of a cell- by definition, they’re wayward gene shuttles that semi-fortuitously make their way into cells. The moment they start carrying housekeeping genes is the moment they become essential to the cell, and hence get promoted to chromosomehood. By this criterion, even tiny “plasmids” can become chromosomes- Buchnera sp. carries a 7.8 kb “chromosome” because it carries the only copy of the leucine biosynthesis genes.

Second, a plasmid’s replication isn’t tied to the cell cycle. Plasmid replication is regulated not by when the cell is ready to split, but via a feedback mechanism based on the extant number of plasmids in the cell. Chromosome replication, however, has to occur during a particular phase of the cell cycle.

Other evidence

More compelling evidence for the ex-plasmidhood of chromosome 2 is also available. For example, the two chromosomes have different replication requirements. Their replication origins look nothing like each other- chromosome 1 has a replication origin similar to E. coli‘s oriC region, while no such similarity exists in the ori of chromosome 2. We’d expect differences like this to “balance out” over evolutionary time- it’s always more efficient for a cell if its chromosomes have similar replication programs. After all, having chromosomes with different replication requirements means the cell has to accommodate both of them in the cell cycle window. That’s something of a balancing act.

There are also differences in the specific protein factors required for the two chromosomes to initiate replication. In fact, some evidence suggests the replication requirements of chromosome 2 are shared by some plasmids- providing more evidence for the plasmid origin hypothesis. Additionally, chromosome 2 also carries toxin-antitoxin loci, i.e. it harbors genes encoding a toxin as well as its neutralizer (antidote, if you will). This is thought to be a common feature of plasmids.

There’s one catch, however. Both the chromosomes have a remarkably similar GC content (approximately 47%). This is kind of puzzling if the two chromosomes did indeed have different origins- after all, how did the GC content of both “balance out” over time to become so conspiratorially similar?
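GC content itself is the easy part to pin down- it’s just the fraction of G and C bases in a sequence. A quick sketch (with a made-up toy sequence, not actual Vibrio data):

```python
def gc_content(seq):
    """Return the GC percentage of a DNA sequence."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq)

# A made-up toy sequence- not actual Vibrio data
print(gc_content("ATGCGCATTACGGCAT"))  # 50.0
```

The puzzle isn’t the measurement, of course- it’s why two replicons of supposedly different ancestry would converge on the same value.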

Perhaps there is something to be said for the ‘long cohabitation of two chromosomes’ hypothesis after all.

References and further reading

Egan, E. S., Fogel, M. A., & Waldor, M. K. (2005). Divided genomes: negotiating the cell cycle in prokaryotes with multiple chromosomes. Molecular Microbiology, 56(5), 1129-1138.

Shakhnovich, E. A., & Dziejman, M. (2008). Genomics of Vibrio cholerae and its Evolution. Vibrio Cholerae: Genomics and Molecular Biology, 9.

Trucksis, M., Michalski, J., Deng, Y. K., & Kaper, J. B. (1998). The Vibrio cholerae genome contains two unique circular chromosomes. Proceedings of the National Academy of Sciences, 95(24), 14464-14469.

How mobile are antibiotic resistance genes found in the environment?

Once we know that antibiotic resistance genes are widespread in the environment, there are two sorts of scientifically interesting questions that can be asked.

  1. What are the resistance genes found in the environment?
  2. How much of a threat do they pose to us humans?

Considerable work has already been done on question 1. This is a nice representative paper (and open access too!). It serves as a metagenomic systematic review, if you will- it pools a large number of the metagenomes available in the databases, then aligns them against databases of genetic elements that are directly or indirectly indicative of the presence of resistance genes. It’s basically like casting an iron net into a stack of hay containing the occasional magnet, then finding out which bits of the net stuck to which magnets. Maybe we could have a longer and less picturesque discussion on this some other day. In this blog post, I want to touch on question 2- although admittedly, I’ve just started reading about these topics and my views may very well evolve substantially.
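In the spirit of that iron-net metaphor, here’s a toy sketch of the screening idea: index a “database” of resistance genes by k-mers, then flag any read that shares a k-mer with it. (The gene name and sequences below are made up, and real pipelines use proper aligners- this is just the shape of the computation.)

```python
def kmers(seq, k=8):
    """All k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}

# A made-up one-entry "database" of resistance genes
resistance_db = {"blaX": "ATGAGTATTCAACATTTCCGTGTC"}
db_index = {km: name for name, seq in resistance_db.items() for km in kmers(seq)}

# Two made-up metagenomic "reads"
reads = ["CCAACATTTCCGTGTCAA", "GGGGGGGGGGGGGGGGGG"]
for read in reads:
    hits = {db_index[km] for km in kmers(read) if km in db_index}
    print(read, "->", hits if hits else "no hit")
```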

The question of the threat of environmental isolates (=non-pathogens) bearing resistance genes boils down to whether their resistance elements can be shuttled off into human pathogens. If the resistance genes are transferable to pathogens, then that means the environment can act as a supply store for pathogens looking to get resistant. So how do we know if genes are “transferable” in this way from the environment to pathogens?

This review, based on the findings of these authors, proposes two basic criteria.

  1. If the resistance gene is surrounded by mobile genetic elements (aka jumping genes), then the region between the elements can jump off elsewhere. There are some truly intriguing molecular “jumping” mechanisms which I hope to write about some day, but the point for now is- the presence of these mobile elements in the neighborhood of antimicrobial resistance genes indicates that the genes could plausibly be transferred.
  2. A high level of similarity between corresponding genes found in both types of isolates means there’s a (very) recent evolutionary relationship. So if you find two close-to-identical beta-lactamase genes in the two sorts of isolates, it’s a safe bet that the element was transferred from one to the other.
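Criterion 2 is simple enough to sketch in code- percent identity over an ungapped alignment. (The sequences below are invented for illustration, not from the actual paper.)

```python
def percent_identity(a, b):
    """Percent identity between two equal-length, ungapped aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    matches = sum(x == y for x, y in zip(a, b))
    return 100 * matches / len(a)

# Hypothetical gene fragments- not from the actual study
env  = "ATGGCGAAATTTGCAGGT"   # "environmental isolate"
clin = "ATGGCGAAATTTGCAGGA"   # "clinical isolate"
print(round(percent_identity(env, clin), 1))  # 94.4
```

Real comparisons of course run on full genes with proper alignment, but the logic of the >99% threshold is exactly this.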

Here’s the case study. The paper from Science I cited above compares a genetic region among a number of pathogens and a soil (environmental) isolate. This is how the comparison looks (taken from the Nature Reviews paper):


This picture captures both of the criteria I mentioned above.

First, in each of the sequences, the green bits indicate mobile genetic elements- meaning the blue and red resistance genes have the potential to be shuffled off elsewhere.

More interestingly, not only does the non-pathogenic soil bacterium’s genome come with genes flanked by the green bits, but this particular gene order, to varying extents, shows up in the clinical isolates as well. Preservation of the order of genes in this way is called synteny. This indicates that not only could the soil microorganism’s genes have been transferred, they did in fact get transferred- only that can explain the precise order of the genetic elements being preserved across all isolates.

Second, the blue shaded regions indicate regions of high (>99%) similarity. This too bolsters the earlier conclusion- if gene order weren’t enough, large parts of the sequences themselves have been near-perfectly preserved across both types of isolates.

These allow us to cumulatively infer not only the transferability, but the actual historical transfer of the resistance elements from the environment to the pathogens.


Antibiotic resistance in the environment

WatchMojo has a nice video on historical predictions that turned out to be false. #3 is pretty interesting:

The time has come to close the book on infectious diseases.

Those words were spoken back in 1969 by the Surgeon General of the US. I don’t doubt that they were reflective of the microbiological zeitgeist back then- the virtual onslaught of penicillin against anything infectious was a thing of wonder.

Of course, this era was soon followed by one of resistance.

Antibiotic resistance: origin story

Our discussions of antibiotic resistance usually take place in clinical contexts, which is why it may come off as strange to know that antibiotic resistance genes had their origins in the environment. There are so many different lines of evidence substantiating this. Resistance genes to popular antibiotics are found in environmental as well as clinical strains (1). Significantly, antibiotic resistance genes have been isolated from environments which had no earthly exposure to anything clinical- including glacier environments (2) and a cave that has been isolated for over 4 million years (3). In addition to being found in remote environments, antibiotic resistance genes are also ancient. A letter to Nature reported finding a “highly diverse collection of genes encoding resistance to [antibiotics]” in “rigorously authenticated ancient DNA from 30,000-year-old Beringian permafrost sediments” (4). They also ran structural analyses of the ancient variant of a particular resistance gene, and found it to be similar to its modern variant.

This pretty effectively seals the debate: antibiotic resistance genes originated in the environment, well before humans started using commercialized antibiotics in clinical settings.

Why do bacteria need antibiotic resistance in the environment?

Part of the answer is simple. Environmental bacteria need antibiotic resistance for the precise reason they need it in clinical settings: protection against antibiotics. Antibiotics have important roles in bacterial ecology- bacteria can secrete them to fend off competition, low levels of antibiotics can modulate the cell’s transcriptome (5), they act as signaling molecules for communication between bacterial cells, and so forth. Given the roles antibiotics play in bacterial ecology, it’s not surprising bacteria would have defense mechanisms to protect themselves against higher concentrations of the stuff. Heck, antibiotics are even used as nutrients- a report documented the growth of a wide range of bacteria using antibiotics as their sole carbon source (6). Interestingly, these same genera also have resistance mechanisms against the antibiotics at clinically relevant (i.e. harmful) levels.

This still doesn’t address some parts of the riddle. Antibiotic resistance genes are surprisingly widespread in nature. Consider the MDR efflux pump, a hallmark of a wide range of multidrug-resistant bacteria. It is found in a lot of environmental bacteria, even those that don’t produce any antibiotics themselves.

This goes to the question of what an antibiotic resistance gene even is. As the name suggests, the efflux pump’s actions are rather general- it pumps things out of the cell, which may or may not include antibiotics. A shovel is perfectly capable of bashing people’s heads in, it’s just not commonly used that way among people of high culture. Similarly, garden-variety furniture in the bacterial cell can sometimes be conscripted into the service of antibiotic resistance. This is especially notable among opportunistic bacteria with large genomes, which carry genes for adapting to a whole host of different environments. They may, quite unintentionally, come packing with some metabolite-modifying enzymes which double as antibiotic resistance machinery. This blurs the definitional lines considerably- perhaps a shovel, divorced from its context, has no intrinsic form that defines its function. Fight me, Aristotle.

Shift from environment to clinic

A clinically relevant case in point- Providencia stuartii has an enzyme that modifies the bacterial peptidoglycan (a 2′-N-acetyltransferase) (1). Peptidoglycan is a component of the bacterial cell wall, as garden-variety a molecule as they come. However, due to peptidoglycan’s similarity to the antibiotic compound gentamicin, our boy P. stuartii unintentionally comes equipped to deal with gentamicin in the environment as well.

Now consider some consequences down the line. In the context of a cell, every whir of every cog in the overarching metabolic network displays a finely tuned balance. This is because metabolic networks tend to be integrated systems, and a change in one component (say, the expression of an enzyme) may translate to a larger change in another. The 2′-N-acetyltransferase is tightly regulated in the biochemical pathway in which it normally finds itself. But what if the gene is shipped off to another bacterium via a plasmid? There is no more biochemical context, much less fine-tuning. The enzyme is now constitutively expressed. In its new host, the enzyme classifies strictly as an antibiotic resistance mechanism. A shift (or stripping away) of context thus results in a functional upheaval as well. Indeed, certain antibiotic resistance genes found on integrons come bundled with strong promoters.

This is a very basic example where a bacterial gene found in environmental strains that had nothing to do with antibiotic resistance becomes a resistance gene, triggered by an initial overlap in function. Not much was needed as modes of persuasion, just “shippability” via gene capture units and a consequent loss of regulatory context.


  1. Martínez JL. Antibiotics and antibiotic resistance genes in natural environments. Science. 2008 Jul 18;321(5887):365-7.
  2. Segawa T, Takeuchi N, Rivera A, Yamada A, Yoshimura Y, Barcaza G, Shinbori K, Motoyama H, Kohshima S, Ushida K. Distribution of antibiotic resistance genes in glacier environments. Environmental microbiology reports. 2013 Feb 1;5(1):127-34.
  3. Bhullar K, Waglechner N, Pawlowski A, Koteva K, Banks ED, Johnston MD, Barton HA, Wright GD. Antibiotic resistance is prevalent in an isolated cave microbiome. PloS one. 2012 Apr 11;7(4):e34953.
  4. D’Costa VM, King CE, Kalan L, Morar M, Sung WW, Schwarz C, Froese D, Zazula G, Calmels F, Debruyne R, Golding GB. Antibiotic resistance is ancient. Nature. 2011 Sep 22;477(7365):457-61.
  5. Fajardo A, Martínez JL. Antibiotics as signals that trigger specific bacterial responses. Current opinion in microbiology. 2008 Apr 30;11(2):161-7.
  6. Dantas G, Sommer MO, Oluwasegun RD, Church GM. Bacteria subsisting on antibiotics. Science. 2008 Apr 4;320(5872):100-3.

Systematic review and Meta-analysis: a brief introduction

I recently completed a short course on Systematic Review and Meta-Analysis offered by the James P. Grant School of Public Health (11th-13th July). It really helped clear up many of the misconceptions I previously had about the scientific papers called systematic reviews. Now publishing a systematic review myself doesn’t seem like such a distant prospect. Either way, in this post I talk briefly about what systematic reviews and meta-analyses are. I don’t know if I’ll write more on this particular topic.



Everyone who has written a journal paper or even a thesis knows that we need to preface our work with a “literature review”- a survey of the work that has already been done on the topic. This is done to identify gaps in knowledge in a particular field, so the new knowledge generated by the publication can try to fill that gap.

A systematic review is a very specialized sort of literature review. In a regular review, the author has considerable freedom in choosing which studies to include in the analysis, and also the extent to which to analyze them. A systematic review, on the other hand, must incorporate *all* available studies on the topic that can be found in the various scientific databases (Medline, Embase, Web of Science, etc.). This is why a systematic review is done with a lot of methodological rigor.


For example, the author (after setting out his intended research question) must explicitly specify how he conducted the literature search by mentioning the keywords he looked for, the databases he mined, and so forth. In this paper, for instance, the authors describe their search strategy as follows:

In accordance with the PRISMA guidelines, we identified published studies through a systematic review of Medline (via PubMed), Cochrane database, and EMBASE (via Ovid) from the inception to June 31, 2015, with the following search terms: (“Chlamydia trachomatis”) AND (“cervical carcinoma OR cervical cancer OR cancer of the cervix OR carcinoma of the cervix OR cervical neoplasm OR cervical dysplasia OR cervical intraepithelial neoplasia”). We also checked reference lists and citation histories during the search.

This sort of specialized search strategy cannot usually be devised without expert help, which is why all systematic review projects must enlist the help of a search coordinator or information scientist. The author must then go on to mention how many studies his search yielded, how many of them he included in the systematic review, how many he excluded- and on what basis.


The PRISMA flowchart for keeping track of excluding and including studies. Each systematic review boasts one of these figures.

This involves a lot of record-keeping, data tracking, and collaboration between groups with different expertise, so there is a lot of software used for searching, screening, and assessing the quality of the studies.

In short, the hassle involved in writing a systematic review is comparable to the hassle one faces when conducting original research: a detailed protocol needs to be submitted, teams need to be built, funds need to be specified, etc. It is often said that a top-notch systematic review cannot be conducted without at least 3 people (one of them being an information scientist), at least one year of time, and at least 100,000 US dollars.


It may seem odd that so much time, money and effort could be poured into the writing of a review which requires no lab space or reagent cost, but once we realize the purpose of writing systematic reviews- the mystery disappears. A systematic review is written to provide conclusive evidence on an issue. When there’s a veritable information overload in a particular field (not surprising, given PubMed has somewhere in the neighborhood of 27 million submissions) experts are expected to conduct a thorough systematic review. Decisions like whether particular drugs get prescribed, certain interventions are carried out, and whether there are statistical relationships between different variables (e.g. Zika virus infection and microcephaly, chlamydia and cervical cancer, urbanization and violence, and so forth) all hinge on the results provided by systematic reviews. Single studies are hardly ever definitive, even a combination of multiple studies in the form of a regular literature review may have reporting bias. It’s only when a systematic review is conducted- a thorough analysis of all the extant evidence and synthesis of a concrete conclusion- that policymakers and stakeholders take note. Given the ambitious purposes of a systematic review, it’s hardly surprising that it’s so methodologically cautious, labor-intensive, and often costly.


A meta-analysis (literally: an analysis of analyses) is a sort of statistical analysis that is often carried out in systematic reviews of quantitative studies. A meta-analysis is used to statistically combine the results of all the studies that have been done (and analyzed in the systematic review) and produce a single, overall result. This is usually done by first tallying the results of the studies (either as means/standard deviations or as odds ratios), giving a “weight” to each study’s data depending on how narrow its confidence interval is (i.e. how “confident” we can be in the conclusion of that particular study), and then combining the results into a total outcome, conventionally displayed in a forest plot. In addition, a meta-analysis also assesses publication bias by means of a funnel plot, to make sure the systematic review took studies of different outcomes and sample sizes into consideration.
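The weighting scheme described above can be sketched in a few lines- a fixed-effect, inverse-variance model on log odds ratios. (The study numbers below are invented for illustration; real meta-analyses are run in dedicated software like Review Manager.)

```python
import math

# Hypothetical (odds ratio, 95% CI lower, 95% CI upper) from three studies
studies = [(1.8, 1.2, 2.7), (2.4, 1.1, 5.2), (1.5, 0.9, 2.5)]

weights, weighted_logs = [], []
for odds, lo, hi in studies:
    log_or = math.log(odds)
    # Recover the standard error from the 95% CI width on the log scale
    se = (math.log(hi) - math.log(lo)) / (2 * 1.96)
    w = 1 / se ** 2                 # narrower CI -> larger weight
    weights.append(w)
    weighted_logs.append(w * log_or)

pooled_log = sum(weighted_logs) / sum(weights)
pooled_se = math.sqrt(1 / sum(weights))
print(f"Pooled OR: {math.exp(pooled_log):.2f}")
print(f"95% CI: {math.exp(pooled_log - 1.96 * pooled_se):.2f}"
      f" to {math.exp(pooled_log + 1.96 * pooled_se):.2f}")
```

Note how the pooled estimate sits closest to the studies with the tightest intervals- that’s the “weighting” doing its job.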


A forest plot being generated by the software Review Manager 5.


Although a meta-analysis is often done in a systematic review, it’s not an essential part of it. It may not even be feasible- either the data may be qualitative, which precludes rigorous statistical analysis of this type, or the studies may be so methodologically varied that it’s impossible to combine their results into a single statistical output. For example, in this systematic review, the authors mention:

We deemed a meta-analysis inappropriate due to the heterogeneous nature of the available publications.

In these cases, researchers instead opt for writing what is called a “narrative synthesis”.


Despite every impression given above, a systematic review may not always have to be so labor- and cost-intensive. It can be, but it doesn’t have to be. For one, there’s a difference between systematic reviews which target sociological questions (e.g. does urbanization lead to violence?) and clinical/environmental ones (e.g. does Vitamin C reduce sore throat?). The latter are almost always simpler than the former, because hard science has fewer variables, and the data is often simpler to interpret. Also, while a systematic review that wants to answer a question with a wide scope (e.g. the association between the causative agents of STDs and cervical cancer) is very complex, one with a narrow scope (e.g. the association between Chlamydia trachomatis and cervical cancer) may not be. In other words, there can be labor-intensive, expensive, high-impact systematic reviews, and there can be single-author, free, medium-impact systematic reviews.

For example, this systematic review has only one author and was published in a journal with an impact factor of 0.982. So while systematic reviews are indeed complex, it’s not fair to say they’re completely out of reach for everyday researchers with their own day jobs.

So yes, I think it’s a safe bet to say that anyone can indeed write systematic reviews, provided they give the effort a bit of time and energy. It’s a publication strategy not beyond the scope of any of us.

The Epigenetics Revolution by Nessa Carey- Study Notes pt. 5 [Ch. 6]

Previous parts: 1, 2, 3, 4


So in this part we’ll be looking at epigenetic inheritance- more precisely, the question of whether epigenetic marks, be they DNA methylation or histone modifications, are passed on from generation to generation. The relevant experimental model here is, unsurprisingly, the mouse.

We need to bring back the agouti gene mouse talked about in the last part. In an experiment done by Emma Whitelaw, it was seen that a yellow mother gives birth only to yellow or lightly colored pups, never a dark one, while a dark mother may give birth to some dark pups. As was discussed in part 4, this particular phenotypic variation is epigenetic, not genetic. It was because of a retrotransposon being methylated that the agouti gene’s expression was silenced (lack of methylation would mean constitutive expression). So the fact that a phenotypic variation caused by epigenetic mechanisms is heritable goes some distance toward proving that epigenetic patterns are in fact inherited.

Well, there are a few more complexities. For any such mother-offspring non-DNA inheritance, there are three possible explanations:

  1. It could be due to DNA methylation and/or histone modification marks being inherited transgenerationally, which is what we’re interested in.
  2. It could be due to the intrauterine environment that the offspring comes to have those peculiar features. In this case, perhaps the agouti gene affects other aspects of the organism, which may include the environment in which the fetus develops, or the specific way the fetus gets its nutrition, etc. In that case, it’s not really epigenetic marks that are being inherited; rather, the fetus is affected by the way the mother’s intrauterine environment is set up (which in turn may have been due to the agouti gene, or some environmental effect or other).
  3. It could be that the cytoplasmic environment that the fetus receives from the egg is what’s responsible. The mother passes on quite a bit of cytoplasm in her egg, so maybe that was shaped in a particular way by genetics or environment, which contributed to the fetus’ development.

To rule out (2) above, scientists took fertilized eggs from a yellow mother and implanted them into a dark one, and got the same results, showing the intrauterine environment wasn’t responsible. Complex breeding schemes also ruled out (3), firmly establishing that DNA methylation, in the case of agouti gene repression, can indeed be inherited transgenerationally. Male-to-offspring inheritance of epigenetic marks has also been demonstrated- and that effectively rules out both (2) and (3). In addition to not contributing to the intrauterine environment whatsoever, males also contribute very little in terms of cytoplasmic content (the sperm is tiny compared to the egg).

This, of course, brings us to an intriguing possibility. We established in earlier posts that the effects of the environment are preserved in biological organisms via epigenetic marks. But if epigenetic marks are inherited, does it mean environmental changes that happen within an organism’s lifetime are also inherited? Does it, then, give credence to Lamarck’s idea that acquired traits are sometimes inherited? The answer to both of these questions is yes.

Two papers published in Nature and Cell did the most to argue this point. One group worked with male inheritance (to rule out 2 and 3 above) involving a breed of rat. The males were given a high-fat diet and allowed to mate with ordinary females. The fathers were overweight and had many symptoms of type-2 diabetes. Their offspring, while of normal weight, also had many of these symptoms, including irregular metabolism. Another group ran a similar experiment with an inbred mouse strain, where the males were given a very low-protein diet, allowed to mate with normal females, and ended up producing offspring with metabolic abnormalities. This latter group also found peculiar epigenetic modifications in the livers of the offspring. A remarkable case of transgenerational inheritance was seen among rats: if a drug is administered to a pregnant rat at the time when the testes are developing, not only do the male offspring show reduced fertility, but the effect is carried over to the next three generations.

All of this taken together proves beyond doubt that Lamarckian inheritance is more than just a historical curiosity- it’s a real thing that occurs.

In the first part of this series, we talked about the Dutch Hunger Winter, and how effects due to diet seemed to be transferred across generations. If a mother was underfed in the first trimester, her granddaughter born of the child she was carrying had a higher chance of being overweight. This is weird on the everyday conception of inheritance, seeing as the baby she was carrying never underwent malnutrition herself. One could chalk that up as evidence for Lamarckism, but that couldn’t be said with certainty until these experiments were done. For one, there’s always the question of the reliability of old records. In addition, the effects could be explained by the peculiarities of either the intrauterine or cytoplasmic environments. But now, with controlled experiments involving different mammalian models and a number of traits, we’re in a position to say with much more confidence that traits acquired from the environment (in molecular biology terms- DNA methylation of particular genes) are, under certain circumstances, transferred across generations.

The Epigenetics Revolution by Nessa Carey- Study Notes pt. 4 [Ch. 5]

Previous parts: 1, 2, 3

This post (and chapter) has to do with a question we discussed in the opening post of this series: why are identical twins not identical in every way? Consider schizophrenia. Since it’s been proven to have a genetic basis, if one of the twins were schizophrenic, we’d expect the other to be so too. But that happens only about half the time. That probability is still tragic, but given that this disease has a genetic basis, and identical twins have identical genomes, shouldn’t we expect both twins to have the condition if either one does?

Scientists have designed interesting experiments to answer this question. Let’s consider our model first: genetically identical Mus musculus. We’re looking at three genetic varieties based on hair color.

Normal mouse- Has banded hair, which means their hair is black at the top and bottom (root), but yellow in the middle. This particular phenotype is due to the agouti gene. In normal mice, the agouti gene is switched on and off cyclically, resulting in this banded pattern.

Mouse with the a genotype- In these mice the agouti gene is wholly inactive, and their hair is black all the way through as a result.

Mouse with Avy genotype- For this interesting variety, there’s a retrotransposon insertion just upstream of the agouti gene. This element codes for a piece of RNA that messes with agouti regulation, keeping it switched on permanently. As a result, Avy mice have yellow hair all the way through.

Scientists crossed the pure Avy breed with the a variety, resulting in a strain with an Avy/a genotype. Avy happens to be dominant over a, so scientists expected all the mice to have yellow hair. That, however, didn’t happen- the mice, although genetically identical, varied in hair color across the board:


Why would this extent of variation be noticed in genetically identical mice, who were all supposed to be yellow?

The answer is given by epigenetics. It was discovered that the aforementioned retrotransposon controlling the transcription of the agouti gene could be methylated. In some of the mice, it was heavily methylated, reducing the activity of the agouti gene and the expression of yellow hair. In others, methylation was low, which let the retrotransposon keep the gene almost permanently active.

This sort of epigenetic-level control has been noticed not only in the case of hair color, but also for kinked tails, body weight, and other phenotypic properties. This mechanism seems representative of how epigenetic modifications can tinker with expression of particular genes even among genetically identical individuals. There are some additional observations to be made here.

First, this and other experiments showed epigenetic proteins to have a clear purpose. Naked DNA- without any epigenetic interference- would be subject to random transcription. Genes would be switched on all over the place without any rhyme or reason. This sort of spurious transcription is often called transcriptional noise. The key function of epigenetic proteins is to reduce this noise. Reduce, mind you, not eliminate- they function as something of a dimmer switch. In the agouti gene experiment above, the level of transcription was reduced by methylation to different degrees in different mice.

From the perspective of the cell, this sort of transcription dimming is something of a balancing act- on one hand, it gives the cell some flexibility to switch certain genes on and off. The cells have a degree of transcriptional autonomy which wouldn’t have been afforded to them were epigenetics to turn off transcription altogether. This sort of autonomy would be required if the cell faces adverse environmental conditions, say, where it would need to express proteins it had no need of otherwise. On the other hand, the epigenetic control of genes also keeps the cells committed enough to their respective lineages, making sure rods don’t start expressing hemoglobin all of a sudden, say.

Second, there’s a degree of stochasticity or randomness to this process. There’s no easily graspable reason that can be offered within the realm of biology as to why certain mice get more or less methylation than their siblings. To ask that question is just to ask why particular mutations happen in the DNA (some of it may be directed based on cell function and chromosome context- I’m not talking about that). That simply has to do with the randomness inherent in really small things interacting with each other. Similarly, in the case of agouti methylation, levels of methylation that are more or less stochastically fixed during early development stay with the organism throughout its life. Which brings us to the next point.

Third, running with our earlier analogy (see last post) of seeing an organism’s development as a Rube Goldberg machine- in such a machine, earlier events determine its course in the longer run. Same goes for development. Epigenetics is incredibly significant during early development, because what the cells acquire in terms of epigenetic marks during this period tend to stay with them in the long run. These effects are also amplified as development wears on. As mentioned above, the setting down of these epigenetic marks can certainly be random within some extents, but it can also be affected by environment. Certain environmental nudges during early development are picked up by the cell’s epigenetic writers, and they remain with the organism.

This also helps to explain the other phenomenon we brought up in part 1 of this series- the Dutch Hunger Winter. Malnutrition during certain parts of early development endowed the developing baby with certain epigenetic marks, and that stayed with them for all time to come.

The Epigenetics Revolution by Nessa Carey: Study notes pt. 3 [Ch. 4]

[Contd. from Part 2]

In the last parts of this series, we discussed natural observations and scientific experiments which establish the reality of epigenetics. To vaguely point to some mechanism that exists in the biological ether can get us only so far. And so from this post onwards, we’ll be entering the actual molecular nitty-gritty of it all and see what fundamental biological entities and processes epigenetic phenomena refer to.

Before we move on to the actual discussion in this part- I have two disclaimers. First, readers may note I don’t have any notes on chapter 3. That’s because chapter 3 is about the basics of genome biology- what the DNA looks like, how it’s transcribed, repaired, and expressed. My blog already assumes its readership to be knowledgeable about these issues. Second, Dr. Carey’s literary ingenuity really shines through in her accounts of the molecular processes that go on in the cell, which is exactly what we’ll be picking up today. She uses colorful but very to-the-point analogies to form readers’ intuitions about these matters (we’ll be seeing some examples later in this post). So even though we’re skipping notes on chapter 3, even seasoned biology students would get something out of reading it in terms of fine-tuning their visualizations.

Let’s first take stock of what we have established. As Gurdon’s experiments proved, cells don’t lose anything in terms of actual DNA content as they become more specialized. The ball rolling down the epigenetic landscape yet remains the same ball. Which means the cell uses extra-genetic mechanisms to preferentially suppress (or enhance) the expression of certain types of genes depending on the specific differentiation program they’re committed to. Cells in the eye, for example, can’t afford to express the genes for hemoglobin, and so they must turn them off.

This presents two conceptual questions molecular biology must answer. First, what precise mechanisms does the cell use to turn on and off its genes? Second, cells have a rather limited life-span, and almost every cell of our body gets replaced. That means these gene-suppressing (and enhancing) mechanisms must not only exist, but also be faithfully inherited from cell generation to generation. Given the view of DNA being the hereditary molecule in our cells, how can we make sense of this?

Dr. Carey offers a very apt analogy. DNA is often talked about as a book, but she submits that it makes more sense to talk about it as a script for a play. A director’s script usually has all the lines for all the actors. The same script will result in different sorts of plays, if not different plays altogether, depending on how the play is directed. This is done not by changing the script itself- all the actors may have the same basic script, but the director may make specialized notes to each actor’s copy depending on his or her role. So in the script for The Dark Knight, for example, Christian Bale would have the same script as Heath Ledger, but the director’s notes on their scripts would differ. Once these scripts are photocopied, both the script as well as the directorial notes will survive the process.

In this big picture analogy, the director is the cell’s epigenetic mechanisms, the script without the director’s notes is the DNA, the actors are individual cells, and photocopying represents replication. The play, of course, is life.

With this overarching logic in place, let’s see how the cell conducts its plays.

DNA methylation

One of the key processes the cell uses to turn off gene expression is slapping a methyl (-CH3) group onto the cytosine nucleotide in DNA. An enzyme by the name of DNA methyltransferase (DNMT) carries out this reaction. A methyl group is really small compared to the base it modifies (15 Da compared to the 600 Da of a base pair)- the author compares this to sticking a grape to a tennis ball.

The chemical structure of a methyl group (blue) stuck to a nucleotide- see how small it looks when you compare it to the entire impressive DNA structure in the model to the right.

Not all cytosines are created equal in their susceptibility to methylation- it usually happens at cytosines that are followed by a guanine (a C followed by a G, written as CpG). In 1985, the British scientist Adrian Bird discovered that CpG motifs are not randomly distributed throughout the genome- rather, they are concentrated in the upstream regions of genes where the gene promoters lie. So the hypothesis (that has now been confirmed) was that methylation of these CpG islands on or near gene promoters can switch off, or at least decrease, gene expression. The exact process involves a protein- MeCP2 (methyl-CpG-binding protein 2)- which, as its name suggests, binds to CpG motifs that have been methylated. This protein binding reduces expression by either recruiting other proteins in the cell which are involved in gene repression, or preventing transcription factors from binding (or both).
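To make the CpG idea concrete, here’s a toy Python sketch (my own illustration, not from the book) that locates CpG dinucleotides in a sequence and compares their density between a made-up, island-like promoter stretch and a made-up CpG-poor stretch:

```python
def cpg_positions(seq):
    """Return 0-based positions of every CpG dinucleotide (C followed by G)."""
    seq = seq.upper()
    return [i for i in range(len(seq) - 1) if seq[i:i + 2] == "CG"]

def cpg_density(seq):
    """Fraction of dinucleotide positions in seq that are CpG."""
    return len(cpg_positions(seq)) / max(len(seq) - 1, 1)

# Invented example sequences, purely for illustration:
promoter_like = "TTACGCGCGTACGCGGCGAT"   # CpG-rich, island-like
body_like     = "TTAGATTCATTGAACCTGAT"   # CpG-poor
```

Real CpG-island callers use longer windows plus GC-content and observed/expected CpG thresholds, but the core observation is the same: promoter regions show a much higher CpG density than the genomic background.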

These epigenetic “marks” can also be stably inherited, following a logic very similar to semiconservative replication. In brief, if a newly synthesized DNA molecule has one strand with and one strand without epigenetic marks, DNMT corrects the imbalance by marking the newly synthesized (and hence unmarked) strand according to the pattern left on the old strand.
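That copying logic can be sketched in a few lines of Python. This is a cartoon of my own, not the book’s; in reality the maintenance enzyme (DNMT1) recognizes hemimethylated CpG sites on the double helix:

```python
def replicate_and_maintain(parent_marks):
    """Toy model of one round of semiconservative replication followed by
    maintenance methylation. `parent_marks` is the set of methylated CpG
    positions on the parental strand. After replication, a daughter duplex
    pairs one old (marked) strand with one new (blank) strand; the
    maintenance step then marks the new strand wherever the template is
    methylated, restoring the full pattern."""
    template = set(parent_marks)      # old strand keeps its marks
    new_strand = set()                # freshly synthesized, unmarked
    new_strand.update(template)       # DNMT copies marks across
    return template, new_strand

old, new = replicate_and_maintain({12, 45, 108})
```

After the maintenance step, both strands carry the same marks, which is how the pattern survives cell division after cell division.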

DNA methylation gets us a little closer to understanding some of the body’s key epigenetic processes, but it still doesn’t explain everything. Different sorts of cells, for instance, repress different genes depending on their specialization. So how does the cell know which genes to methylate and repress? How does an eye cell know to methylate and repress the genes that belong in a skin cell?

Histone modification

Histone modification is the other key epigenetic process. There are some notable differences between methylation and histone modification- the latter is, for example, much less stable than the former. If DNA methylation marks are the printed directorial comments on each actor’s script, histone modifications are more like pencil marks that can only survive a few rounds of photocopying- in fact, they may be as transient as post-it notes and not be inherited at all. Methylation only represses gene expression, while histone modification can increase or decrease it. Methylation is a relatively simple process, somewhat similar to a (gene expression) on-off switch, while the effect of histone modification comes in degrees, reminiscent of a radio dial.

People often think of DNA as naked spaghetti strands suspended in the casserole of the nucleus. If it really were that loose, it couldn’t be fitted into the cell at all. Rather, DNA is spooled around globular proteins called histones. Histone proteins are organized in clusters of eight- four above and four below, like two layers of four ping-pong balls. The DNA strand loops around them like a licorice whip around a marshmallow (none of these colorful analogies are mine, to be clear). It’s impossible for the cell’s transcription apparatus to read the DNA where the coiling is too tight.

The model above shows each individual histone as completely globular, but that’s not totally accurate- each also has a wiggly chain or tail extending from it. These tails are where epigenetic codes are “written”. Just as a methyl group is added to cytosine in DNA methylation, an acetyl group can be added to a particular amino acid (lysine) on the tail. Unlike DNA methylation, however, this sort of histone acetylation drives gene expression up.

In fact, also unlike DNA methylation, there are many different ways in which cells slap groups onto histone tails, and each of these ways has its unique effect on gene expression. Different sorts of functional groups attached to histone tails mean different degrees of increase or decrease in gene expression. These correlations between particular histone modifications and their consequent effects on gene expression constitute a code or a grammar. The rules here are incredibly difficult to unearth, and that’s what scientists are working on today.

Functional groups attaching to histone tails have a myriad of effects on gene expression

Dr. Carey uses the following innovative illustration to give us a mental image of what the entire phenomenon of histone modification looks like:

Imagine a chromosome as the trunk of a very big Christmas tree. The branches sticking out all over the tree are the histone tails and these can be decorated with epigenetic modifications. We pick up the purple baubles and we put one, two or three purple baubles on some of the branches. We also have green icicle decorations and we can put either one or two of these on some branches, some of which already have purple baubles on them. Then we pick up the red stars but are told we can’t put these on a branch if the adjacent branch has any purple baubles. The gold snowflakes and green icicles can’t be present on the same branch. And so it goes on, with increasingly complex rules and patterns. Eventually, we’ve used all our decorations and we wind the lights around the tree. The bulbs represent individual genes. By a magical piece of software programming, the brightness of each bulb is determined by the precise conformation of the decorations surrounding it. The likelihood is that we would really struggle to predict the brightness of most of the bulbs because the pattern of Christmas decorations is so complicated.
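As a purely illustrative toy, the “brightness of each bulb” idea can be put in code. The mark names below are real histone modifications with their commonly reported directions of effect (acetylation and H3K4me3 activating, H3K9me3 and H3K27me3 repressive), but the numbers and the multiplicative rule are entirely invented by me:

```python
def expression_level(marks):
    """Toy scoring of a histone tail's modification set; returns a
    multiplier on baseline transcription. The numbers are invented
    for illustration only."""
    level = 1.0                       # baseline transcription
    if "H3K9ac" in marks:             # acetylation: generally activating
        level *= 2.0
    if "H3K4me3" in marks:            # an activating methyl mark
        level *= 1.5
    if "H3K9me3" in marks:            # a repressive methyl mark
        level *= 0.25
    if "H3K27me3" in marks:           # another repressive mark
        level *= 0.5
    return level
```

The point of the sketch is the combinatorics: with many possible marks per tail and many tails per gene, predicting a “bulb’s brightness” from the decoration pattern quickly becomes as hard as Carey’s Christmas tree suggests.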

As with DNMT and MeCP2 being the writer and reader respectively of DNA methylation, histone modification also employs a number of enzymes for these purposes. Again, the entirety of the code hasn’t been figured out yet, but the lack of these reader and writer proteins is often the cause of debilitating diseases- speaking to the importance of these modifications.

So to review- the two key ways in which cells modify the levels of gene expression are DNA methylation and histone modification. The former is akin to a simple on-off switch, while the latter is much more complex, allowing for sophisticated fine-tuning of gene expression patterns in particular sorts of cells.

As a concluding (somewhat personal) note- learning about these mechanisms is what finally convinced me that when it comes to control of biological processes, there’s no element that’s more primary than another. Much of my undergraduate education seemed to at least implicitly assume DNA/genetic reductionism, so when I learned about epigenetic mechanisms in graduate school, my mind couldn’t process how they could work without DNA precisely telling them what to do. In reality, protein processes in the cell are not exclusively dependent on DNA, or vice versa. DNA and cellular processes always work together in an organism, with neither part being in any way secondary or dependent in a unidirectional, asymmetric way.

Some people feel like the enormously complex processes that occur throughout development are similar to how a Rube Goldberg machine operates- a horribly complex and elaborate system with each of its parts depending on some other. This analogy holds with one exception- in a Rube Goldberg machine, you need an initial trigger, be it the pull of a cord, the kick of a boot, what have you. That really doesn’t happen in the case of life. At each stage of development, gene expression depends on epigenetic marks made by proteins, and the coding of these proteins depends on the genes. The epigenetic marks, in turn, are inherited from prior cells, in which the same process was repeated. You can trace an organism’s life history back this way until you reach the zygote. Surely the zygote is ultimately dependent on the DNA for instruction, non? Actually, here too crucial epigenetic information is inherited from the proteins in the egg. So it’s impossible to point to any part in the life of an organism- be it the DNA or any protein- and say, “this is the part that started it all”. In reality, it makes no sense to divide life’s processes in this way. All of life is integrated and interdependent, with no primacy of any part over another.

The Epigenetics Revolution by Nessa Carey: Study notes pt. 2 [Ch. 1-2]

[Contd. from Part 1]

In part 1 of this series, the basic idea of epigenetics was presented as a refutation of DNA/genetic reductionism. We explored this concept by reference to some examples and case studies. There’s a much simpler way to understand why epigenetics is needed for biology to make sense, however. Consider the fact that all of the 50 trillion cells in the human body essentially have the same genome (excluding rare mutations). However, a liver cell is completely different from a heart cell. The cells in your eye seem to have almost nothing in common with the ones in your gum. And yet, at the DNA level, their 3 billion nucleotides are arranged in the exact same way. To top it off, this bewildering variety of cell types in our bodies ultimately originated from just one cell, the zygote.

Clearly, reference to DNA alone can’t explain this phenomenon, and we need to appeal to some innate non-genetic biological mechanisms to explain any of this. Enter epigenetics. This seems like a very straightforward way of making the point we did in part 1.

Thing is, this rather everyday observation wasn’t always available to past scientists. Nowadays it’s common knowledge that all cells have the same DNA. But that took quite a bit of clever experimentation to find out.

John Gurdon’s experiment

When one thinks about the phenomenon of cell differentiation- how a single zygote differentiates into so many different types of cells in our body- two explanations conceptually present themselves:

One, as the zygote develops into a particular type of cell- the rods or cones in the eye, say- their DNA itself undergoes change. Only those genes that are responsible for the unique properties of light sensing are preserved in the cell, and the rest are done away with. So on this model, cell differentiation is equivalent to deletion of particular bits of DNA depending on cell type.

Two, during development, the DNA of a cell stays as it is. The zygote and the rods/cones have the same DNA, but extra-genetic biological mechanisms are used to silence the effects of some genes, while preserving the effects of others. On this model, cell differentiation sees no change at the level of the DNA, it’s just the extra-genetic cellular mechanisms that suppress particular types of genes from being expressed.

The English biologist Sir John Gurdon designed an experiment to see which of these hypotheses was true. The basic premise for his experiment was very simple. If cells do lose DNA as they become more specialized (as per hypothesis 1), then that means the DNA derived from a specialized cell shouldn’t be able to do the job of zygote DNA. On this hypothesis, the zygote DNA would have a fuller complement of genes that would be missing from more specialized cells.

Gurdon extracted the nucleus from a developed muscle cell of a toad and introduced it into an unfertilized toad egg whose own genetic material had been destroyed. As it happened, the eggs did manage to develop into normal tadpoles. This didn’t have a high success rate, but the fact that even one of these eggs, carrying DNA from a developed cell, could give rise to a fully functional organism meant that no DNA is removed from an adult cell during the process of cell differentiation or development. The “zygote DNA” is no different from “specialized cell DNA”, so the latter can be used to replace the former and still be expected to work.

Ripped off from Google Images.

To take stock, Gurdon’s experiment didn’t so much prove anything as disprove the hypothesis that DNA is lost from cells during development. It’s by this disproof that the second hypothesis above was confirmed.

Waddington’s epigenetic landscape

As it happened, a conceptual framework for understanding cell differentiation was already available by the time Gurdon ran his experiments.

Waddington’s epigenetic landscape.

The picture you see above is a model for cell differentiation due to the British polymath Conrad Waddington. It offers a particularly elegant framework for understanding the issue. If the ball in the picture is allowed to roll (because of gravity), it will travel down one of the troughs at the bottom. Once it reaches the bottom, it’s impossible for it to naturally “switch tracks” and go to the bottom of some other trough. Gravity would prevent the ball from rolling uphill under normal conditions.

The ball, of course, represents the zygote; and the troughs represent specialized differentiation pathways. Once the zygote rolls down any particular pathway, it becomes “committed” to that pathway and can’t normally switch tracks. This is why a liver cell can’t become a heart cell: once the zygote transforms into the former, it can’t normally ignore the restrictions imposed upon it by the differentiation program (think of that as the gravity and inertia working on the ball) and choose to become the latter.

This makes the landscape look all science-like.

Another important benefit of this landscape model is that it appropriately captures the results of Gurdon’s experiments. Notice that as the ball rolls down a particular trough, it still remains the same ball- it doesn’t morph into something else, a cube for example, which would preclude the possibility of putting it on top of the landscape and rolling it down once more. In the same way, the zygote’s DNA doesn’t undergo change as it commits to any particular differentiation program. That’s why in Gurdon’s experiments it was possible to introduce differentiated-cell DNA into an egg (putting the ball on top again) and have it develop (roll down particular troughs).

Can we roll the ball uphill more efficiently?

At this juncture, it would be useful to introduce some terminology. A totipotent cell is a cell that can transform into any cell type, including the placenta. Only the zygote fits this bill. A pluripotent cell, on the other hand, can also transform into pretty much all cell types, with the exception of the placenta. Embryonic stem (ES) cells fit this description- they are near the top of the landscape and can roll down any trough. The cells that roll down the troughs are differentiated or specialized cells; we’ll call them lineage-committed (LC) cells.

Gurdon engineered pluripotency by transplanting the DNA of an LC cell into an egg. There had to be a more efficient way of getting the job done. If pluripotent cells are characterized by a certain gene expression profile- meaning only certain genes are expressed in pluripotent cells- then by tinkering with the gene expression profile of LC cells, it should theoretically be possible to transform them into ES-like cells. All that would be required is to express the specific complement of genes that endows pluripotency on a cell, and that would revert any cell to an ES-like state. It would be possible to roll the ball uphill. This called for another set of experiments to be performed, and the baton was taken up by Shinya Yamanaka in Kyoto.

Professor Yamanaka and co. started with a complement of 24 genes that were known to be involved in developmental pathways. Their goal was to express these genes in mouse fibroblasts and, by knocking out a few genes at a time, find the minimal set required to induce pluripotency. Through this sort of progressive whittling-down, they discovered that the expression of just 4 genes was all it took for a cell to become pluripotent. Once a cell became committed to a particular lineage, these genes were switched off and more specialized genes started being expressed. Their findings, counterintuitive as they may have seemed back then, were later confirmed by other laboratories.
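The whittling-down logic itself (as opposed to the wet-lab execution, which was vastly harder) can be sketched as a simple elimination loop. In this sketch of mine, the assay is a stand-in function rather than the real reprogramming experiment, and I’ve used the four factor names the group eventually converged on (Oct4, Sox2, Klf4, c-Myc) plus invented filler genes:

```python
def minimal_factor_set(candidates, induces_pluripotency):
    """Greedy elimination sketch: repeatedly try dropping one gene at a
    time; if the remaining set still passes the assay, keep it dropped.
    `induces_pluripotency` stands in for the actual reprogramming assay."""
    current = set(candidates)
    shrunk = True
    while shrunk:
        shrunk = False
        for gene in sorted(current):
            trial = current - {gene}
            if induces_pluripotency(trial):
                current = trial       # gene was dispensable; leave it out
                shrunk = True
    return current

# Hypothetical setup: the stand-in assay passes whenever all four
# Yamanaka factors are present in the expressed set.
ESSENTIAL = {"Oct4", "Sox2", "Klf4", "c-Myc"}
candidates = ESSENTIAL | {f"gene{i}" for i in range(20)}
found = minimal_factor_set(candidates, lambda s: ESSENTIAL <= s)
```

In the real experiments each “call” to the assay was weeks of cell culture, which is why starting from a shortlist of 24 developmental genes, rather than the whole genome, mattered so much.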

Being able to generate pluripotent cells in this way was a huge technological breakthrough, and talking about them would take us a few more paragraphs. However, this post series is more about theoretical biology than the practical or technical relevance of the experiments. This is why I’ve omitted some crucial experimental details (e.g. how the controls were set up, why certain cell types were chosen for the tests, etc.) to present only what we eventually came to learn from them in terms of basic epigenetics research. To that point, it’s important to recognize that Yamanaka’s experiments offered additional confirmation of Gurdon’s findings- not only were the ES cells no different from the LC cells in terms of their DNA, but the latter could actually be transformed into the former by genetic manipulation. The fact that the ball can be rolled uphill in this way is very clear evidence that the ball at the top of the landscape and the one at the bottom of a trough are one and the same.

Gurdon told us cells manipulate, rather than erase, bits of the DNA to commit themselves to particular lineages. Yamanaka gave us more information on how that’s achieved in mechanistic terms.

The Epigenetics Revolution by Nessa Carey: Study notes pt. 1 [Preface]

I recently picked up Nessa Carey’s The Epigenetics Revolution: How Modern Biology is Rewriting our Understanding of Genetics, Disease, and Inheritance. I plan to blog through the book as I read it, making posts on the study notes from a few chapters at a time. After I’m done reading (and blogging about) the whole thing, I’ll write a cumulative review of some sort. I have some understanding of how Dr. Carey writes, and I think this review scheme would be the most productive one for a book like this.

I’ve hardly read any other author who can so effectively break down complex topics in biology for consumption by an average-to-intermediate reading level audience. Dr. Carey is extremely adept at build-up- she takes her good time to gradually introduce novel concepts, making liberal use of innovative and intuition-boosting analogies along the way. Her explanations of facts and phenomena are peppered with references to relevant real-life events and general comments about scientific research. That sort of ‘couching’ may well be the most balanced way to present information on a topic like this. All in all, this book is nothing short of a stylistic masterpiece in its genre. My post series, on the other hand, will record the study notes only- which means stripping away all the beautiful linguistic devices she makes use of and presenting the bare facts about epigenetics. That’s almost a crime.

This disclaimer is necessary because I don’t want anyone to get the impression that there’s nothing more to the book than my modest study notes. The book is an experience all on its own, and enthusiastic readers looking for their biology fix are strongly encouraged to get hold of their own copy.

One way to look at the book is as a sustained critique of what can be called DNA/genetic reductionism- the idea that DNA controls everything in our body, and that an exhaustive description of all biological processes can be given in terms of the way nucleotides are arranged. This idea is incredibly common among the scientific laity, and it has definite appeal, no doubt. It presents a Cartesian dualism-esque portrayal of how life functions- there’s a part of the cell with all the information (DNA), and other parts that do all the work (proteins). It’s a really simple, clear-cut way to sum up basic biology.

Reality, of course, is seldom as simple (and all the more so in biology). A very stark counterexample comes from twin studies of Schizophrenia. Schizophrenia is known to have a genetic basis, and not surprisingly, when one member of an identical twin pair has Schizophrenia, the other has a high chance of having it too. What’s puzzling is that this correlation holds at a lower rate than expected. Since identical twins are genetically- well, identical, and since Schizophrenia has a genetic basis, shouldn’t both twins always have the condition if either one does? In reality, the identical twin of a Schizophrenic patient has a much lower chance of having the condition (about 50%) than what’s expected (100%). Clearly, genetic determinism can’t explain this. If genes dictated everything about our biology, identical DNA would mean identical organisms. That’s not what’s happening here.
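To make the arithmetic behind that discordance concrete, here’s a toy Python simulation (my own illustration, not from the book). It assumes both twins carry the same risk genotype, and that the condition additionally requires an independent, non-genetic ‘switch’ to fire in each twin; the 0.5 switch probability is an arbitrary value chosen for illustration, not a real epidemiological parameter.

```python
import random

random.seed(42)

def simulate_pair(p=0.5):
    # Both twins share the same risk genotype; whether the condition
    # actually develops also depends on an independent non-genetic factor
    # (an epigenetic/environmental 'switch') firing in each twin.
    return (random.random() < p, random.random() < p)

pairs = [simulate_pair() for _ in range(100_000)]

# Concordance as measured in twin studies: given that one twin is
# affected, how often is the other twin affected too?
affected_first = [pair for pair in pairs if pair[0]]
both_affected = sum(1 for a, b in affected_first if b)
print(f"concordance given one affected twin: {both_affected / len(affected_first):.2f}")
# Roughly 0.5 - far below the 1.00 that strict genetic determinism predicts.
```

Under strict genetic determinism the conditional concordance would be 1.00; a single independent non-genetic factor is enough to pull it down to roughly the switch probability itself, which is exactly the kind of gap the twin data show.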

Here’s a more elaborate case study. During the closing months of WWII, the Netherlands was subjected to a really harsh famine, leading to the death of some 20,000 people. Tragic though it may have been, the epidemiological records collected at this time led to a fascinating scientific observation about the children who were born in this era. Consider the following two groups of children:

(A) This group of children were conceived during the tail end of the famine, meaning the first few months of their gestation were spent in nutritionally dire conditions. However, food supplies arrived soon, and the mothers were relatively well-off during the last few months of pregnancy.

(B) This group of children were conceived just before the famine started, so the first few months of their gestation were normal. The famine rolled in soon, however, and the mothers spent the last few months of pregnancy deprived of adequate nutrition.

The children of these two groups showed some remarkable differences in their biology. Group A children, for example, consistently showed higher rates of obesity, while those in Group B stayed small- and these effects stayed with both groups throughout their lives. To top it off, even the descendants of the latter group shared these effects.

Clearly, all of this indicates that an extra-genetic effect is being exercised on health, and even on heredity. It seems that during the early months after conception, nutrition (or the lack thereof) leaves a certain impression on the DNA, one that persists until death and may even be passed on to progeny.

The above examples demonstrate the falsity of DNA/genetic reductionism, and lay the groundwork for introducing epigenetics. Simply put, epigenetics refers to the extra-genetic biological mechanisms which account for differences among genetically identical individuals.

One may be tempted to say that all these examples establish is the rather obvious conclusion that both nature (genes) and nurture (environment) have a role to play, and that there’s no reason to posit a biological mechanism when the environment can explain these effects just as well. That, however, leaves a few questions unanswered. It’s a genuinely bizarre fact that environmental impressions made in the early months of development not only stay with you for all your life, but can also pass on to the next generation. So while the source of this extra-genetic influence may be the environment, there must be a biological mechanism which translates the environmental effect into something innate to the organism. (Besides, environment can’t account for all such deviations either- consider the twin study cited earlier. Identical twins usually grow up in similar environments, and whatever environmental differences exist are hardly drastic enough to account for the discordance.)

This gives us another useful way of defining epigenetics: it is a way of spelling out, in biologically precise terms, what “nurture” is. What does it mean to say that environment shapes a person’s health (and even offspring) in a certain way? Which mechanisms are at play here? That’s what epigenetics seeks to answer.