Omics Integration course 2021

Rui Benfeitas, Scilifelab, NBIS National Bioinformatics Infrastructure Sweden

rui.benfeitas@scilifelab.se

Abstract
In this notebook we will perform some of the basic operations in working with a genome-scale metabolic model (GEM). The vast majority of software that has been developed surrounding GEMs has been done in MATLAB, likely because this form of modeling has origins in engineering (specifically chemical engineering). Although well-suited for metabolic modeling, MATLAB is not open-source and therefore limits the accessibility of such software. Fortunately, the modeling community has implemented the MATLAB COnstrant-Based Reconstruction and Analysis (COBRA) Toolbox in Python, as COBRApy.

COBRApy is still relatively new and therefore lacks some of the functionality of its MATLAB counterparts, but the core utilities are available and quickly expanding. Here, we will demonstrate some of the basic functions and classes of the COBRApy package, which should also familiarize the user with the fundamentals of GEM structure and simulation.

Most of the commands and material covered in this tutorial can be found in the COBRApy Documentation, so we encourage you to reference the documentation if you encounter errors, warnings, or need further detail about something.


Contents

Global configuration object

Before jumping right into things, it is always nice to see what sort of default settings are in place. COBRApy has organized such defaults into a global configuration object, which can be viewed or adjusted as needed.

We can view a brief summary of the object

As well as the default reaction flux bounds (min,max)

Importing and inspecting models

GEMs, as their name implies ("genome"-scale), are often quite large, containing thousands of reactions, metabolites, and genes. It is therefore best to begin working with a simplified model that is quick to load and use, and easy to conceptualize.

For this exercise, we will use the textbook model that is provided with the COBRApy package. This model encompasses the core pathways of central carbon metabolism in E. coli. The cobra package ships with several test models in different formats

We will load the "textbook" model from the SBML (.xml) file

GEMs are often shared as SBML (Systems Biology Markup Language), an XML-based format commonly used to store GEMs. The aim of SBML is to serve as an open and standardized format to facilitate sharing of models and software. Feel free to open the textbook.xml file in a text editor to get an idea of how it is formatted.

You can find plenty of SBML models at the Metabolic Atlas, and can contribute to their curation on their github. Let's list the first few reactions in the model, with their reaction equations

We can specify the target method to retrieve a specific reaction (e.g., AKGDH)

Here we can see some of the reaction-associated information that is stored in the model. The Reaction identifier must be a unique string, and is typically a short abbreviation or code since there is also a more descriptive Name field.

Question: Is this reaction irreversible or reversible? From what field(s) can this be determined?

Question: Is the reaction catalyzed by isozymes or an enzyme complex?

Let's list the first few metabolites in the model, along with their chemical formula

We can use the identifiers above to inspect the 3pg_e metabolite in more detail

Note that we cannot reference the metabolite using model.metabolites.3pg_c because the metabolite ID begins with a number, which python doesn't like.

We can see the abbreviation of the compartment in which the metabolite exists, though c by itself is not super informative.

Question: What is the name of the c compartment, and what other compartments does the model have?

Adding reactions to the model

GEMs are never really "finished" because we continue to improve models based on known metabolism, finding errors or missing content, adding newly discovered reactions/genes/metabolites. It is therefore common for a user to need to add or remove content from the GEM.

For this example, we will add the aspartate aminotransferase reaction to enable the synthesis of aspartate:

L-glutamate + oxaloacetate <==> 2-oxoglutarate + L-aspartate

Create and edit the reaction object

Let's create a template reaction and determine what information we need to provide

To add it we'll need to add the name

We need to find the IDs of the metabolites involved in the reaction

Two interesting observations:

  1. There are two instances of 2-Oxoglutarate and L-Glutamate
  2. Aspartate is not yet in the model

For the first point, note that the _c and _e suffixes represent the compartment to which the metabolite belongs. Note that not all GEMs append the compartment information to the metabolite IDs, but it is quite common.

We can use the .compartments method to see the model's compartments

We want to add our reaction to the cytosol (c) compartment, so we will use the _c form of the metabolites.

For the second point, we will need to add aspartate to the model.

Create a new metabolite object

Let's create the aspartate metabolite

And fill in some information about the new aspartate metabolite

We will also need to specify all the stoichiometric coefficients for each metabolite in the new reaction

Is a reaction reversible by default?

This reaction is reversible biochemically, but the model considers it as irreversible by default. Unfortunately, we cannot directly edit the "reversibility" field

But instead we need to change the lower bound of the reaction

And now we can verify the arrow direction

Add a gene-protein-reaction (GPR) rule to the reaction

The reaction will still function even if it doesn't have a GPR (GEMs contain many non-enzymatic reactions, after all), but this information is important to include since it can affect some analyses, such as gene deletion analysis or reporter metabolite analysis.

Let's specify the new gene in the GPR. Aspartate aminotrasferase is encoded by aspC (b0928) in E. coli

Note that gene(s) in the GPR rule are automatically added to the "genes" field of the immutable reaction object.

Add the reaction to the model

We can then add the reaction to the model. The input should be a list.

We can also provide the gene name

Flux balance analysis

FBA is one of the most common and fundamental GEM-based analyses. It involves an optimization to estimate the flux (metabolite flow) through each reaction in the model, given the following constraints:

  1. The system is at steady state - each metabolite must be consumed and produced at the same rate (represented by the equation Sv = 0).

  2. Reaction fluxes must be within their defined lower and upper bounds (lb and ub, respectively).

The optimization will seek to minimize or maximize some objective defined by the user. Most often the objective is to maximize the flux through a "biomass" reaction, which represents an organism trying to allocate its resources such that it maximizes its growth rate.

Escher FBA

To help understand and visualize FBA and the resulting reaction fluxes, there is an excellent tool called Escher FBA. Follow this link to Escher FBA, and start the browser app by clicking the Launch image.

By default, the app is maximizing flux through the Biomass_Ecoli_core_w_GAM reaction, and flux values are represented by reaction arrow color and thickness. Hovering over a reaction name will show some information, as well as options to knock-out the reaction or to change the objective to maximizing or minimizing flux through that reaction.

Try changing the objective to different reactions, and see how the flux distribution changes. Also try knocking out some reactions to view how it affects the results. You can reset the app using the Reset Map button in the lower right-hand corner.

Inspect the optimization objective

Now we will perform FBA using (the less pretty but more functional) COBRApy. First let us take a look at the optimization objective (Biomass reaction by default in this model).

We can check the default objective function in this model with cobra.util.solver

Alternatively, we can use list comprehension. The "objective_coefficient" indicates which reactions are being maximized (1) or minimized (-1).

The 2 cells above also tell us the id of the objective functions. We can look into this reaction in more detail

As one might expect, the biomass reaction formula is quite long and has therefore been truncated in the preview. View the full formula to see all the metabolites involved, and the (molar) ratios in which they are consumed/produced.

Let's examine the entire reaction stoichiometry

We can also view the objective direction (maximize or minimize the reaction flux)

Perform the optimization

In order to run FBA, we can simply call model.optimize(). As indicated above, this will aim to maximmize the objective function. If we wanted to minimize it we could run model.optimize(objective_sense='minimize').

We can also view a summary of the returned optimal flux distribution

From the summary, we can view some details beyond the value of the objective, such as which metabolites are being consumed from the media, which are being produced, and at what rate.

We can also get a summary of the fluxes involving a specific metabolite

Alternatively, we can get a summary of the fluxes involving a specific reaction

Change the optimization objective

We have mentioned above that we could change an objective function to maximize/minimize. What if we want to change the reaction to optimize altogether? Let us now optimize the flux through ATPM ("ATP Maintenance"), which is just the hydrolysis of ATP.

Change the objective to ATPM

This will effectively maximize the production of ATP. Can you think of why we chose the ATPM reaction as the objective to do this?

Now let's run with the new objective and summarize the results

Note that there is now no metabolic flux through the biomass reaction model.reactions.Biomass_Ecoli_core.summary(). This gives an error because of its null flux.

Let's set the objective back to Biomass again, for the next sections

Perform an in silico knockout

As you may have seen in the Escher FBA app, we can simulate the effect of a gene knockout. In practice, this entails setting the upper and lower flux bounds of the associated reaction equal to zero, so that it cannot be used.

Reaction knockout

Although in reality we knockout a gene from an organism, not a reaction, we will start with knocking out a reaction.

Let's first copy the original model and optimize biomass production to view the initial maximum flux value

What if we knock out the AKGDH reaction? Is the output of the next cell expected after AKGDH is knocked out?

Note that the reaction is still present in the model, but it now cannot carry any flux. If we wanted to completely remove it from the model altogether, we could use the remove_from_model function: model.reactions.AKGDH.remove_from_model()

Let's now re-optimize biomass flux

And then calculate the difference in biomass flux between the original and knock-out objectives.

Since we observe a decrease in the biomass flux after knocking out the AKGDH reaction, it indicates that the model found an alternative but less efficient pathway to generate the required biomass precursors. However, the biomass is not zero, so we would predict that this knockout would likely not be lethal to the cell.

Gene knockout

In reality, genes are knocked out, not reactions. Let us now try knocking out the gapA (b1779) gene encoding the GAPD (Glyceraldehyde-3-phosphate dehydrogenase) reaction.

Let's view the GAPD reaction. Note the Lower and Upper bounds

And finally let's re-optimize the knockout model

Here we can see that the gapA gene (and its encoded GAPD reaction) were quite important, as the biomass flux has effectively been reduced to nearly zero ($<10^{-14}$). This is consistent with reports that E. coli cannot grow without this gene.

Perform all single gene or reaction deletions

Since it is a common analysis, COBRApy has specific functions for iterating through each gene (or reaction) in the model, knocking it out, and calculating the associated objective value.

We find 2 genes whose knockout renders the model infeasible

Question: Can you find the chemical formula of the reactions they catalyze, and suggest why these specific reactions are rendered infeasible?

Let's delete all reactions, one by one, and view the results

Isozyme and enzyme complex knockouts

If you still have time, try knocking out genes that encode an isozyme or a complex subunit to see what effect it has on a reaction (remember that the bounds will change to zero once it has been knocked out).

First try to inactivate the AKGDH reaction by knocking out one or more of its associated genes

Next try knocking out one or more of the genes associated with the ACALD reaction

Did you notice anything different in how each of these reactions responds to having one of its genes knocked out? Is that consistent with your understanding of the difference between isozymes and enzyme complex subunits?

Acknowledgements

This notebook was adapted from a notebook previously developed by Jonathan Robinson.