Introduction

Life developed under a variety of environmental pressures, such as pH, temperature, and osmotic pressure that have driven the evolution of robust mechanisms of cellular adaptation1,2. Through variations in metabolite levels, metabolic networks sense, adapt to, and influence the local environment to maintain functioning metabolic pathways under diverse and unpredictable conditions3,4,5. We have not yet achieved a quantitative understanding of how various environmental variables influence metabolite levels.

Constraint-based modeling provides a quantitative framework to study the interactions between cellular metabolism and the environment6. Constraint-based modeling of reaction fluxes, through flux-balance analysis (FBA), has been successful in predicting the growth rate and metabolic phenotypes by imposing mass-balance constraints on the flow of metabolites through the network7,8. Constraint-based formulation of metabolite levels, based instead on thermodynamic constraints, has also been shown to be feasible9,10. Unlike FBA, many more biophysical considerations must be accounted for when specifying the constraints on the quantitative metabolome. Moreover, while maximizing the growth rate is an intuitive objective to determine optimal pathway usage, cellular objectives that may underlie the quantitative metabolome have not yet been identified.

In this article, we introduce a comprehensive set of abiotic constraints (ABCs) on the metabolome. We present an ABC-based analysis to determine bounds on metabolite concentrations for a prescribed environment. We validate the results for a metabolic network of Escherichia coli against measured concentrations and known thermodynamic bottlenecks of central-carbon pathways. We then use the ABC-based analysis to evaluate how intracellular concentrations are influenced by the environment and shape biological functions. The ABCs are universal and apply from primitive to modern cells across the tree of life. Thus, this work paves the way to biophysically accurate representations of the quantitative metabolome at scales, where one can ask fundamental questions about the evolution and function of cellular processes.

Results

Formulation of the ABC-based analysis

We introduce an ABC-based approach to characterize the constraints that govern the interactions of biological functions with the inorganic world. This approach encompasses ten fundamental and evolutionary classes of abiotic constraints on cellular functions (see Fig. 1 and “Abiotic constraints” section): (i) thermodynamics, (ii) charge balance, (iii) solubility, (iv) membrane potential, (v) buffer capacity, (vi) enzyme saturation, (vii) ionic strength, (viii) osmotic balance, (ix) ion binding, and (x) experimentally unresolved metabolites (Fig. 1a).

Fig. 1: Constraint-based formulation of the ABCs determines the concentration state of metabolic networks.
figure 1

a Schematic representation of ten fundamental and evolutionary constraints that define a concentration solution space \({\mathcal{C}}\). Reactants, reactions, and species are associated with the indices i, j, and k, respectively. b Construction of \({\mathcal{C}}\) from the ABCs, formulated as equalities (e.g., charge balance) and inequalities (e.g., thermodynamic), illustrated in a two-dimensional space. The solution space \({\mathcal{C}}\) can be nonconvex or disconnected. Feasible concentration ranges furnished by the ABC-based analysis are generally more restricted than those provided by thermodynamic constrains alone (TMFA analysis10). c Sampling the solution space by generating random trajectories in \({\mathcal{C}}\) from an interior point. Interior points are identified in a region (purple area), where experimental data most likely fall. Trajectories are continued until at least one of the thermodynamic constraints is violated. For each thermodynamic constraint, the violation probability is defined as the ratio of the number of trajectories intersecting it and the total number of trajectories.

Mathematically, thermodynamic and solubility constraints are represented by inequalities and the rest by equalities. Together, these ten classes of constraints define a nonconvex and possibly disconnected set of feasible concentration states that is challenging to characterize (Fig. 1b).

The conceptual basis for the ABC-based analysis is analogous to the widely used flux-balance analysis8. A biologically relevant concentration solution space (CSS) is constructed by restricting the concentrations to those obeying all the ABCs. Thus, the CSS contains all quantitative metabolomes that satisfy the ABCs. Accordingly, the ABC-based analysis can be leveraged to determine bounds on metabolite concentrations and Gibbs free energy of reactions.

An objective function may be used to ascertain a particular concentration state from the CSS. We study two cellular objectives pertaining to the concentration state of the cell, namely the thermodynamic and enzyme-saturation efficiency. These objectives play the same role in the ABC-based analysis as the growth-rate objective does in FBA11 (see “Methods” section for detailed mathematical formulations and computational approaches). They both rest on the assumption that intracellular concentrations are regulated to ensure efficient enzyme utilization, regardless of the growth state12. To achieve a prescribed flux state at minimum protein cost, the first maximizes the thermodynamic driving force of all reactions in the network independently of enzyme saturation levels, whereas the second maximizes the saturation level of all enzymes independently of thermodynamic driving forces. Within the framework of the ABC-based analysis, how well these objectives could represent the functional state of the cell is determined by identifying the location of metabolomic data inside the CSS.

Characterization of the CSS for genome-scale metabolic models is computationally challenging. Therefore, we applied the ABC-based analysis to a reduced metabolic network of E. coli, comprising 78 reactions, 72 cytoplasmic, and 11 periplasmic metabolites (Supplementary Fig. S3). This network consists of (i) high-flux pathways and (ii) pathways connecting the reduced network to key metabolites with the largest concentrations among those reported in the literature13,14. Major pathways, such as glycolysis, tricarboxylic acid (TCA) cycle, anaplerotic reactions, electron transport chain (ETC), biosynthetic sugar metabolism, and cofactor interconversions are included in this network. Importantly, this reduced network contains over 90% of the observed metabolome by mole and includes all reactions with experimentally measured fluxes14. Therefore, it possesses the most essential characteristics of the global metabolome for the ABC-based analysis.

We present our results in three parts. First, we show that the ABC-based analysis leads to results that are consistent with concentration and flux data. We also elaborate on the implications of our analysis for other constraint-based approaches, such as thermodynamics-based metabolic flux analysis (TMFA)10. Then, we show that the ABCs necessitate the evolution of multiple specialized phosphate transporters. Finally, we provide two case studies, illustrating how the ABCs shape transcriptional responses to stress conditions.

Quantitative metabolomes defined by the ABCs are consistent with experimental data

We sought to determine whether the ABCs were consistent with available experimental metabolomic data. We considered growth on four carbon sources, including glucose, acetate, pyruvate, and succinate. The flux states of the network were determined using the latest genome-scale metabolic network reconstruction of E. coli15. The FBA model (see “Flux distribution” section) was solved to identify fluxes that best matched their experimentally measured values14 for each carbon source. The resulting flux directions were then used to specify the thermodynamic constraints.

To assess the consistency of metabolomic data with the results of the ABC-based analysis, we identified a point inside the CSS, referred to as computed concentrations Ccm (Fig. 2b, crosses), that is the closest to the experimental measurements of Gerosa et al.14 (Fig. 2b, circles) for each carbon source. We found a strong correlation (Pearson coefficient R = 0.84–0.93) and relatively small root-mean-square deviation (RMSD = 2.3–5.1 mM) between the computed and experimental concentrations across all carbon sources (Fig. 2a), showing that the CSS contains the measured concentrations. Note that, when only the thermodynamic constraints are included in the ABC-based analysis, the computed concentrations exhibit a higher correlation with metabolomic data, reflecting the restrictive characteristic of the charge-related constraints of the ABCs (see “Consistency of the ABCs with metabolomic data” section and Supplementary Fig. S12).

Fig. 2: The ABCs are consistent with metabolomic and isotope-labeling data for E. coli.
figure 2

a Comparison of measured concentrations Cex and computed concentrations Ccm. Here, computed concentrations refer to a point inside the CSS that is the closest to the experimental measurements of Gerosa et al.14. Dashed lines indicate deviation by root-mean-square deviation (RMSD) from the line Cex = Ccm. The Pearson correlation coefficient is denoted R. b Feasible ranges of metabolite concentrations. Dark gray bars, determined from known Michaelis constants63 (Methods: Eqs. (53) and (54)), show part of the feasible range, where some of the reactions a given metabolite participates in are undersaturated. Measured concentrations are reported by Gerosa et al.14. The number of molecules per cell is calculated from the respective molar concentration based on Vcell = 2.6 fL64. c Feasible ranges of transformed Gibbs energy of reactions and violation probabilities of thermodynamic constraints. Near-equilibrium reactions are those, for which the scaled transformed Gibbs free energy (\({{{\Delta }}}_{r}G^{\prime} /RT\)), evaluated at a point inside the CSS that is the closest to experimental values (crosses), is smaller in magnitude than 10−2. Therefore, the corresponding points are not shown on the respective green bars. For b and c, glucose is the sole carbon source. Pathway abbreviations are defined in Supplementary Table S3. Error bars indicate twice the standard deviation.

We ascertained feasible concentration ranges for all intracellular metabolites in the reduced network when glucose is the sole carbon source (Fig. 2b). The ABC-based analysis reveals a qualitative connection between the network topology and the global concentration bounds. The upper and lower concentration bounds are generally restrictive for hub metabolites (e.g., glucose 6-phosphate and fructose 6-phosphate), while they exhibit the opposite trend for end-node metabolites (e.g., trehalose and UDP-N-acetyl-glucosamine). The hub metabolites simultaneously participate in several reactions, and their possible concentrations are constrained to a narrow range, so that they satisfy all the thermodynamic constraints these reactions are subject to. The lower bound for exported metabolites with a low periplasmic concentration is restrictive (e.g., CO2) because the cell must retain a minimum cytoplasmic concentration to sustain a thermodynamic driving force across the cell membrane. For a similar reason, the upper bound for imported metabolites with a low periplasmic concentration is also restrictive (e.g., choline).

To derive order-of-magnitude estimates of cytoplasmic concentrations, we computed expected values for concentrations (Fig. 2b, asterisks) by sampling the CSS (see “Characterization of concentration solution space” section). We found a good agreement between the expected values and measured concentrations for all metabolites, except for a few cofactors, such as ATP, NADP, and NADH (RMSD = 5.47 mM, excluding the cofactors). These cofactors participate in many reactions across the network. They are highly regulated with concentrations spanning several orders of magnitude depending on the growth condition or in response to environmental stresses10,14. As a result, their concentrations may be driven towards the CSS boundaries to accommodate a desired flux state. Thus, they cannot be generally well-approximated by the expected values, which are usually interior points of the CSS.

Enzyme-saturation efficiency has been proposed in the literature as a cellular objective to determine optimal intracellular concentrations12. To maximize this objective, metabolite concentrations are expected to be of the same order as, or greater than, the Michaelis constants of the reactions the metabolites participate in. However, this heuristic principle does not apply to all enzymes (e.g., degradation reactions)13. Nevertheless, it is expected to hold for reactions in central-carbon metabolism, where flux directions alter in response to stress conditions, so that enzymes are efficiently saturated in both directions13.

We compared the maximum Michaelis constant \({K}_{{\rm{m}}}^{\max }\) (Methods: Eq. (53)) of metabolites (Fig. 2b, dark gray bars) with measured concentrations (Fig. 2b, circles). We performed this comparison only for reactions with known Michaelis constants. We found 14 undersaturated reactions (\({C}_{i}\, <\, {K}_{{\rm{m}},i}^{\max }\)) from central-carbon pathways (e.g., PGI, ICDHyr, TALA, and AKGDH), indicating that saturation efficiency is not a fundamental cellular objective or constraint, with which to determine the concentration state of metabolic networks. This result is consistent with previous studies on the efficiency of enzymatic reactions13,16. Nonetheless, saturation efficiency could be one of several competing factors determining the functional state of the cell17.

Thermodynamic bottlenecks identified by the ABC-based analysis are consistent with isotope-labeling measurements

Thermodynamics play a fundamental role in adaptation. Modern bacteria have an extraordinary ability to grow and survive using minimal environmental resources. To maximize their survival chances, they efficiently allocate available resources to essential biochemical reactions, maintaining a steady flow of materials through all their metabolic pathways. Sustaining this optimal flux state hinges on a trade-off between pathway thermodynamics and proteome allocation. Reactions with a small thermodynamic driving force have higher protein cost and vice versa17. Through this trade-off, metabolic fluxes are adapted to environmental perturbations18.

We sought to determine the thermodynamic bottlenecks resulting from the ABCs. We calculated the feasible ranges of transformed Gibbs energy of reactions (Fig. 2c, green bars) and the violation probabilities (see Fig. 1c and its caption for the definition) of their respective thermodynamic constraints (Fig. 2c, brown bars) to identify the most constraining reactions of the reduced network when glucose is the sole carbon source. Reactions with negative upper and lower bounds on transformed Gibbs energy (e.g., MDH2) are irreversible; all the other reactions can be either inactive or reversible. In FBA, every reaction is considered reversible if no information about Gibbs energies is available. However, using the feasible ranges of transformed Gibbs energy computed from the ABC-based analysis, the flux solution space can be reduced by imposing the corresponding flux bounds. We also found that, if the expectation of transformed Gibbs energy of a reaction is of the same order as RT or less, then the violation probability of that reaction is substantially larger than other reactions, providing a qualitative criterion for when a reaction is thermodynamically constraining in a metabolic network.

Reaction energies and concentrations are restricted more by the ABCs than thermodynamic constraints alone. As a result, the global concentration bounds derived from the ABC-based analysis can either coincide with (e.g., upper bound for choline) or be more restrictive than (e.g., lower bound for glucose 6-phosphate) those furnished by TMFA (Fig. 3a). The thermodynamic constraints are more dominant in the first, and charge-related constraints are more dominant in the second. The dominant constraints set the limit for the metabolite concentrations. We compared the feasible ranges of transformed Gibbs energy of reactions arising from the ABC-based analysis and TMFA (Fig. 3b), identifying five irreversible reactions (GLUN, GLUSy, MDH2, NH4tpp, CYTBO3-4pp) compared to the three provided by TMFA (MDH2, NH4tpp, CYTBO3-4pp).

Fig. 3: Abiotic constraints underlie the evolution of alternative transport systems.
figure 3

Feasible ranges of metabolite concentrations (a) and transformed Gibbs energy of reactions (b) furnished by the ABC-based analysis and TMFA are compared. c Feasible cytoplasmic-phosphate concentration ranges in mutants, where either the Pst system (blue) or the Pit system (red) is active at periplasmic concentrations \([\![ {\rm{pi}}]\!]_{{\mathrm{p}},1}=0.01\) and \([\![ {\rm{pi}}]\!]_{{\mathrm{p}},2}=69\) mM. d Phosphate-transport kinetics in E. coli. Solid blue and red lines show phosphate uptake rates in mutants, where either the Pst system or the Pit system is active, and dashed black line corresponds to a wild-type strain, where both systems are active. Maximum velocities and Michaelis constants are taken from Willsky et al.65. The total phosphate uptake rate is estimated vT = αPstvPst + αPitvPit with αPst = 0.09 and αPit = 0.9 at large \([\![ {\rm{pi}}]\!]_{\mathrm{p}}\)65. Uptake rates are estimated using \([\![ {\rm{pi}}]\!]_{\mathrm{c}}=1\), \([\![ {\rm{atp}}]\!]_{\mathrm{c}}=3.45\), and \([\![ {\rm{adp}}]\!]_{\mathrm{c}}=0.6\) mM. e Schematic representation of the Pst and Pit systems operating in parallel in wild-type strains.

We illustrated the thermodynamic bottlenecks of the reduced network on a network map (Fig. 4a) to visualize metabolite concentrations and transformed Gibbs energy of reactions. These were evaluated at a point inside the CSS with the minimum distance from the experimental data of Gerosa et al.14. We found several reactions in glycolytic (ENO, PGI, GAPD, PGM), TCA cycle (ICDHyr, MDH, ACONTa, SUCOAS), and sugar metabolism (PGMT, PGAMT, GF6PTA, GALUi) pathways with small transformed Gibbs energy of reactions (\(| {{{\Delta }}}_{r}G^{\prime} | \ll 1\) kJ/mol). The feasible concentration ranges of metabolites participating in these reactions are not necessarily restrictive, so they are regulated to operate close to their equilibrium.

Fig. 4: The ABC-based analysis elucidates the distinct ways, in which biochemical reactions perturb the physicochemical state of the cell.
figure 4

Computations are performed for exponential growth on glucose under physiological conditions (pHp = 7, pHc = 7.5, [Mg]c = 2 mM, Ic = 300 mM). a Concentration and energy map: reaction colorbar indicates transformed Gibbs energy of reactions in kJ/mol-rxn and metabolite colorbar indicates log10(Ci/Co) with Co = 1 M. b Charge map: reaction colorbar indicates intrinsic charge consumption in mol-e/mol-rxn and metabolite colorbar indicates metabolite effective charge in mol-e/mol-met. c Hydrogen map: reaction colorbar indicates intrinsic hydrogen consumption in mol-H/mol-rxn and metabolite colorbar indicates metabolite hydrogen content in mol-H/mol-met. d Magnesium map: reaction colorbar indicates intrinsic magnesium consumption in mol-Mg/mol-rxn and metabolite colorbar indicates metabolite magnesium content in mol-Mg/mol-met. Metabolite and reaction colorbars are denoted by “met” and “rxn”, respectively. For bd, arrows point in the direction, in which negative charge, hydrogen, and magnesium ions are consumed. Reaction charge, hydrogen, and magnesium consumption for transport reactions, as indicated by the colorbars, do not reflect translocated hydrogen ions. Small arrows next to each transport reaction show the direction of their respective net fluxes into and out of the cell, accounting for translocated hydrogen ions and the accompanying metabolites. Enlarged and detailed maps are provided in Supplementary Figs. S8–S11. The definitions of reaction charge, hydrogen, and magnesium consumption are given in “Abiotic constraints” section.

Four major findings from the foregoing results are notable: (i) Near-equilibrium reactions from the ABC-based analysis agree with those previously reported for glycolysis18 and the TCA cycle19,20 based on 2H and 13C metabolic flux analysis. (ii) Flux-direction reversal in glycolysis is often induced by carbon-source and nutrient alterations14. Thus, having near-equilibrium steps enhances the energy efficiency of pathways, enabling the cell to rapidly switch flux directions in response to energy and biomass demands with minimal concentration changes18. (iii) The analysis indicates that there are several thermodynamic bottlenecks in the TCA cycle, hampering its full cyclic operation19. (iv) The results suggest that thermodynamic efficiency (see “Enzyme kinetics” section) is not a fundamental cellular objective that determines the concentration state of metabolic networks.

To quantify the degree to which the additional constraints of the ABC-based analysis can reduce the feasible ranges, we compared the feasible ranges of concentrations and transformed Gibbs energy of reactions furnished by the ABC-based analysis and TMFA when glucose was the only carbon source. We found that the upper bounds on metabolite concentrations and transformed Gibbs energy of reactions from TMFA were on average 728% and 11% larger than those from the ABC-based analysis, and the lower bounds were on average 7% and 87% smaller.

Thermodynamic constraints drive the evolution of high-affinity phosphate transporters

By examining individual ABCs in isolation, one can identify the most dominant constraints under a given condition. Such identification may have important evolutionary implications because each of these constraints could have been dominant at different point in time during evolution. For example, thermodynamic laws are believed to have constrained the evolution of biochemical-reaction networks ever since the formation of primitive cells to ensure the energy requirements and spontaneity of prebiotic chemical reactions21. However, the osmotic-pressure differential and membrane potential likely have changed significantly during the course of evolution from ion-permeable porous membranes in protocells to ion-impermeable lipid-bilayers in modern cells22. This forms a rational basis for chronological postdictions concerning the evolution of alternative pathways.

E. coli has two major phosphate-transport systems (Fig. 3e), namely Pit (low-affinity) and Pst (high-affinity). We sought to determine whether any of the ABCs could have been restrictive enough to hinder the operation of the Pit system in phosphate-limited environment, thereby driving the evolution of the Pst system. These transporters, which differ in their stoichiometry (Fig. 3e), are of particular interest because inorganic phosphate is an important constituent of energy-carrying molecules (e.g., ATP), cofactors (e.g., NADH), and information-storage molecules (e.g., DNA). Therefore, E. coli maintains an optimal homeostatic phosphate level around 1–10 mM23,24.

We, thus, asked: Is it possible to achieve cytoplasmic-phosphate concentrations above 1 mM while satisfying all the ABCs? We computed feasible concentration ranges for the reduced network, in which either the Pit or Pst system was the only active phosphate transporter. We compared these feasible ranges at two periplasmic concentrations, representing phosphate-limited and phosphate-rich environments (Fig. 3c). While the feasible concentration range for the cytoplasmic phosphate is almost insensitive to its periplasmic concentration when the Pst system is active, achieving cytoplasmic concentrations above 1 mM is impossible in phosphate-limited environments when the Pit system is active. Moreover, comparing the upper bound on the cytoplasmic-phosphate concentration provided by the ABC-based analysis and TMFA (Fig. 1b) at the foregoing periplasmic concentrations reveals that the operation of the Pit system is limited by the thermodynamic constraints, not the osmotic-balance or other charge-related constraints (see “Role of phosphate transporters” section).

Next, we asked: Can high-affinity transporters maintain sufficient phosphate influx in phosphate-limited environment to support growth? We compared the transport kinetics of the two phosphate transport systems (Fig. 3d) in a wild-type strain of E. coli to demonstrate the transition between the Pit and Pst systems in response to phosphate-starvation stress. Clearly, E. coli can remain functional in a wide range of extracellular concentrations, importing phosphate through the Pit system at high rates when it is available in excess. However, when the Pit system reaches its equilibrium at low extracellular concentrations, E. coli exhibits a stress response by taking up phosphate through the Pst system to retain the phosphate influx, albeit at lower rates. Note that, this is not the case for every membrane protein in E. coli. Cytochrome oxidase in the ETC is an example, where the transition from low-affinity (cytochrome-bo3) to high-affinity (cytochrome-bd) system occurs most likely due to kinetic reasons. Because these reactions are both highly energetic in their forward direction (\({{{\Delta }}}_{r}{G}^{^{\prime} \circ }\approx -90\) kJ/mol), their equilibria are never reached for any biologically feasible concentration ranges of ubiquinone, ubiquinol, and oxygen. Thus, the thermodynamic constraints are unlikely to be responsible for the low activity of cytochrome-bo3 in oxygen-limited environments.

The ABCs shape transcriptional regulatory responses to environmental stresses

Biochemical reactions comprise transformations among metabolites that can have multiple charge states arising from hydrogen dissociation and metal-ion binding25. These reactions liberate or bind hydrogen and metal ions and, thus, interact with the cytoplasmic fluid to establish intracellular pH and metal-ion homeostasis. The reactions that have the highest rate of exchange with the cytoplasmic fluid (highlighted in Fig. 4) represent the key players in maintaining hydrogen and metal-ion concentrations. Thus, they are prime targets for evolving transcriptional regulatory mechanisms in response to environmental stresses.

Short-term transcriptional regulatory responses to acid and osmotic stress are usually classified into phase I (20 min; regulation is transient) and phase II (20–60 min; new steady state is reached)26. Early stress responses by wild-type strains differ from those exhibited by evolved strains that have adapted to high osmolarities27. In our formulation, strain-specific characteristics are captured by active pathways, dominant metabolite concentrations, and condition-specific parameters of the reduced network. We constructed a charge map (Fig. 4b) using the characteristics of a wild-type E. coli strain to interpret short-term stress responses. Accordingly, we sought to establish associations between phase-I/II osmoregulatory responses (see “Known osmoregulation mechanisms” section) and reaction charge consumption provided by the ABC-based analysis (Fig. 4b).

From the RNA-sequencing data of Seo et al.28, we identified 69 differentially expressed genes (>2-fold change 30 min after osmotic upshift) out of 124 genes associated with the reduced network. Among the primary-response genes that are differentially expressed, those associated with the ETC are all downregulated and those with osmoprotectant transporters are upregulated. Although not explicitly included in the reduced network, potassium importers and sodium exporters are also activated. Moreover, several intracellular reactions are among the secondary regulatory targets. For example, glycolytic reactions (except GAPD) and glutamate decarboxylase (GLUDC) (not explicitly included in the reduced network) are activated, while glutamate synthase (GLUSy) and glutamate dehydrogenase (GLUDy) are suppressed.

These observations point to possible connections between reaction charge consumption and transcriptional regulatory responses to osmotic stress. These connections arise from cellular requirements to maintain electroneutrality, pH homeostasis, and a steady energy flow by coordinating glycolytic and ETC reactions. Qualitative connections have been drawn between expression data and ion homeostasis in the literature. For example, an inhibited respiration is regarded as an early response to an alkalinized cytoplasm due to a sudden efflux of hydrogen ions that accompanies potassium intake upon osmotic upshift29. This initial response is followed by a rapid accumulation of glutamate—a potassium counterion—through glutamate importers or its biosynthetic pathways26,30. Charge and pH neutrality are resumed in phase II upon flux regulations targeting the ETC, glycolysis, potassium and sodium transporters, glutamate transporters, glutamate biosynthesis, and glutamate decarboxylase. Here, the apparent inconsistency between the downregulated glutamate biosynthesis, observed from early stages31, and elevated glutamate concentration is indicative of an active glutamate importer, such as glutamate/γ-aminobutyrate antiporter, in phase I.

To highlight the significant role that charge-balance and ion-binding constraints could play in transcriptional regulatory processes, we examined the regulation of glutamate in response to osmotic stress as a case study. We computed the intrinsic hydrogen and charge consumption (Methods: Eqs. (37) and (36); see “Abiotic constraints” section for the definition of intrinsic and extrinsic quantities) of glutamate-biosynthesis pathways (i.e., GLUDy, GLUSy, and GLUDC) and the glutamate/γ-aminobutyrate antiporter. We found that glutamate accumulation through these biosynthetic pathways does not affect the net cytoplasmic charge. It also lowers the hydrogen-ion concentration of an already alkalinized cytoplasm in phase I. In contrast, glutamate uptake through the glutamate/γ-aminobutyrate antiporter results in an inflow of negative charge, which can counterbalance the accumulated K+ in the cytoplasm during phase I without affecting the pH, demonstrating the osmoregulatory role of glutamate antiporters (see “Glutamate role in osmoregulation” section for a more detailed discussion).

The interpretation of these complex processes can be simplified using graphical representations. The hydrogen (Fig. 4c) and magnesium (Fig. 4d) maps graphically visualize equilibrium shifts with respect to pH and pMg perturbations. Similarly to the charge map, these maps can help elucidate regulatory responses to pH- and pMg-related stress conditions. Consider the hydrogen map for example. Here, a change in pHc affects equilibrium constants, Gibbs energy of reactions, and reaction rates according to La Chatelier’s principle25. If a reaction produces hydrogen ions (ΔrNH,j < 0) (Fig. 4c, blue arrows), then a positive perturbation in pH increases the equilibrium constant \({K}_{j}^{\prime}\), which, in turn, shifts the reaction more in the forward direction, so that hydrogen ions are produced at a higher rate to counter the effect of the change. The magnesium map can be interpreted in a similar manner.

Thus, the ABCs shape transcriptional regulatory responses to environmental stresses. The combined effects of ion-binding and charge-balance constraints strongly restrict possible regulatory strategies to restore electroneutrality and pH homeostasis upon osmotic shock. In particular, the ABC-based analysis revealed why regulating the cytoplasmic charge through glutamate antiporters is more desirable than glutamate-biosynthesis pathways. This highlights the importance of the GAD operons in osmoregulation, the transcriptomic function of which has been recently detailed using large datasets and machine learning32.

Hydrogen-ion and charge imbalances underlie transcriptional regulatory responses to acid stress

pH homeostasis is a fundamental cellular process. Bacteria can survive extreme acidic environments thanks to several sophisticated acid-resistance mechanisms, such as glucose-repressed-oxidative, glutamate-dependent, and arginine-dependent systems33,34. Acid stress causes increased hydrogen-ion entry into the cell, disrupting charge and hydrogen-ion balance. The cytoplasmic pH is perturbed as a result, affecting equilibrium constants, reaction energies, and reaction rates. This invokes transcriptional regulatory responses, similar to those observed in osmoregulation discussed above. Being directly involved in pH-homeostasis and reaction equilibria, reaction hydrogen consumptions are naturally expected to play a key role in acid-stress regulation.

To better understand how pH-dependent equilibrium shifts can affect transcriptional responses, we sought to identify connections between reaction hydrogen consumptions and gene expressions in phase-I/II RNA-sequencing data34. We identified 41 differentially expressed genes (>1.4-fold change within an hour after pH downshift) that are associated with the same reactions (except phosphate transporters) in the reduced network as those involved in osmoregulation. However, there are key differences in regulatory responses between the two conditions: Contrary to osmotic stress, NADH dehydrogenase is upregulated, while phosphate and osmoprotectant transporters are downregulated under acid stress. These gene-expression changes are consistent with the regulatory objective to neutralize the acidified cytoplasm by adjusting hydrogen-ion inflow and outflow rates.

For any given reaction, we found the intrinsic hydrogen consumption to be a reasonable indicator of its transcriptional regulatory state. Interestingly, among the 14 reactions with the highest hydrogen consumption or production (ΔrNH > 1 mol-H/mol-rxn) in the hydrogen map (Fig. 4c), 11 are associated with the differentially expressed genes in phase I35 and II34. To quantify the extent to which reaction hydrogen consumption in the reduced network can influence acid-stress response, we generated a connectivity map, identifying associations between key metabolic pathways and transcriptional regulators (Fig. 5). Glycolysis, glutamate metabolism, and the ETC are major regulatory targets in the reduced network. Their extrinsic hydrogen consumption in the base state is ΔrNH = −21.23, 7.18, and 113.78 mmol-H/gDW/h, respectively. Three important points concerning this analysis are notable:

Fig. 5: Association of transcriptional regulatory responses with key metabolic pathways reveals factors influencing the E. coli acid-stress response (pHp,ϵ = 5.5).
figure 5

Reaction colorbar “rxn” indicates hydrogen consumption in mol-H/mol-rxn, metabolite colorbar “met” indicates metabolite hydrogen content in mol-H/mol-met, and gene-expression colorbar “gene” indicates expression change log2(TPMϵ/TPM0). The subscripts ϵ and 0 denote the acid-stress condition and base state. The base state corresponds to a stress-free growth in a neutral medium with pHp,0 = 7, pHc,0 = 7.5, [Mg]c,0 = 2 mM, and Ic,0 = 300 mM. Genes with more than 1.4-fold expression change (shown in boldface) are considered differentially expressed. Expression data are taken from the RNA-sequencing analysis of Seo et al.34 on a wild-type E. coli strain. Reaction hydrogen consumption for the ETC reactions, as indicated by the colorbar, does not reflect translocated hydrogen ions. Small arrows next to each ETC reaction show the direction of net hydrogen-ion flux into (black) and out of (white) the cell.

First, glycolysis is a major regulatory target. Overall, it is upregulated under acid and osmotic stress. Interestingly, it comprises reactions with high intrinsic hydrogen production (GAPD and PFK) and consumption (PYK). The same is true of charge consumption and production. As a result, the net hydrogen consumption of the entire pathway (ΔrNH = −21.23 mmol-H/gDW/h) is much smaller than that of the ETC (ΔrNH = 113.78 mmol-H/gDW/h), even though the fluxes (obtained from FBA analysis) passing through these pathways are comparable. This feature allows the cell to reverse the flux direction through glycolysis in response to other simultaneous stresses (e.g., carbon-source change) without significant disruption to pH homeostasis (see “Glycolysis and gluconeogenesis in pH homeostasis” section).

Second, the glutamate-dependent system has been reported as the most effective acid-resistance mechanisms, complementing primary responses34. Glutamate metabolism contains some of the highest hydrogen-consuming reactions in the reduced network—a possible explanation for their observed upregulation. They also carry high flux (especially GLUDy) whether or not the cell is under stress29, so they can effectively attenuate acidification of the cytoplasm. The upper TCA cycle is downregulated, further promoting this mechanism by directing the flux of precursors from glycolysis and lower TCA towards glutamate biosynthesis.

As previously stated, these pathways are also involved in osmoregulation. However, unlike in pH regulation, GLUDy and GLUSy are both downregulated in response to osmotic stress. This raises a question as to why GLUDC is upregulated during osmoregulation when pH  7.5. Here, contrasting transcriptional responses to osmotic and acid stresses in relation to hydrogen-ion balance might offer a plausible answer: Under osmotic stress, K+ is the main glutamate counterion when the cytoplasm is alkalinized in phase I with GLUDC counterbalancing hydrogen-ion influx through sodium antiporters and osmoprotectant symporters in phase II. In contrast, under acid stress, hydrogen ion becomes one of the main glutamate counterions when the cytoplasm is acidic in phase I with GLUDC, GLUDy, and GLUSy offsetting hydrogen-ion leakage through the membrane in phase II.

Third, the ETC can contribute the most to hydrogen-ion balance in the cell. However, it is directly or indirectly coupled to many crucial cellular processes, such as the membrane potential, ion transport, energy balance, and redox balance. Therefore, regulation strategies that solely target the ETC would partially disrupt essential cellular functions and reduce the robust adaptability of bacteria to extreme and diverse environmental perturbations.

Discussion

“Those [constraints] that operate at any given level are still valid at all more complex levels” — Francois Jacob36. Biological functions are subject to myriad constraints. The constraints on the function of abiotic and early biotic cells have been the subject of much discussion from Darwin’s warm-pond hypothesis to the role of geothermal vents at the bottom of the ocean. Central to this discussion is the role of abiotic constraints. Given Francois Jacobs’ quote, these constraints are fundamental and affect the function of ancestral and modern cells. The ABCs, thus, apply to functions across the biosphere.

In this study, we (i) formulated the ABCs, (ii) developed a computational platform, enabling their application to biological systems, (iii) showed that measured quantitative metabolomes satisfy these constraints, (iv) found that they furnish thermodynamic bottlenecks that mirror isotope-labeling experiments, (v) explained the reasons underlying the evolution of multiple phosphate transporters, and (vi) elucidated the constraints that shape transcriptional regulatory responses to external stresses. These results show the broad applications of the ABC-based analysis to understand biological functions and their origins. In fact, the two are inseparable.

Constraint-based approaches have significantly advanced in recent years, incorporating more detailed molecular descriptions of biological networks7,37. Despite their expanded predictive scope, these approaches still fall short in one key regard: Metabolite concentrations do not enter the formulation through governing equations or constraints grounded in first principles. To fill this gap, we introduced a constraint-based approach to characterize the concentration state of metabolic networks. This approach predicts feasible concentration and Gibbs-energy ranges resulting from the constraints biological networks are subject to.

We showed that the ABCs are consistent with metabolomic data across carbon sources. To investigate whether intracellular concentrations are associated with optimality principles, we studied two cellular objectives that have been previously considered17, namely thermodynamic and enzyme-saturation efficiency. We found that biochemical reactions do not generally satisfy these objectives. Specifically, we identified several reactions from the TCA cycle and glycolysis that operate near their equilibrium during the exponential growth phase. Near-equilibrium reactions lower dissipative energy loss at higher protein cost. They also offer functional advantages, enabling rapid flux-direction reversal in energy-metabolism pathways to robustly achieve new steady states upon a wide variety of environmental perturbations. These observations underscore the complex and multifaceted nature of cellular metabolism, which cannot be fully described by single-objective or oversimplified optimality principles5,38.

We studied the role of the ABCs in the evolution of phosphate transport systems. These concepts may be applied in a broader context to prebiotic reaction networks to gain insights into the origins and evolution of early metabolism21,39. The fundamental constraints we implemented in our analysis are particularly relevant to origins-of-life theories. Specifically, the membrane potential, transmembrane-ion gradients, charge balance, and thermodynamic laws are believed to have constrained the most ancient pathways for the formation of small organic molecules21. Several scenarios have been suggested for the origins of primitive life40. For all these theories to be plausible, one must establish a balance between the availability of an energy source that could have existed on the early earth (e.g., pH gradient in hydrothermal vents) and the energy required to operate the most energy demanding step (e.g., carbon-carbon bond formation) of the proposed chemistry for carbon metabolism (e.g., CO2 reduction by H2)21. These hypotheses can be rigorously formulated and examined within the quantitative framework we developed here to address some of the fundamental questions concerning the emergence of life from prebiotic chemistry.

Besides characterizing the quantitative metabolome, abiotic constraints shape regulatory responses to stress conditions. We analyzed osmotic- and acid-stress regulation to establish a relationship between transcriptional responses and reaction hydrogen and charge consumption. We argued that the net hydrogen and charge consumption of the electron transport chain, glutamate biosynthesis, and glutamate transporters can effectively counteract the hydrogen-ion and charge imbalances induced by osmotic and acid stresses, providing an explanation for why they are subject to transcriptional regulation. Other reasons for why certain reactions are regulatory targets have been proposed. For example, reactions with a large thermodynamic driving force are believed to be under stricter transcriptional regulatory control than near-equilibrium ones since their rates are less sensitive to concentration perturbations10. Reaction rates that are sensitive to concentration variations can be effectively controlled through allosteric regulation. Hence, near-equilibrium reactions are expected to rely less on transcriptional regulations than highly exergonic ones.

An observed stress response is a resultant of overlapping allosteric and transcriptional regulatory processes, the organization and mechanism of which depend on: (i) regulation objective (e.g., restoring homeostasis after a vital constraint is violated), (ii) regulation constraints (e.g., charge and H+ balance, osmotic balance, cell structural integrity, gene clusters and physical proximity, ATP and redox balance), and (iii) regulatory targets (biochemical reactions). We showed several ways, in which flux adjustments to biochemical reactions can modulate critical intracellular variables (net charge, hydrogen, and magnesium-ion concentrations, metabolite concentrations) to alleviate stress-induced imbalances, providing a more comprehensive description of stress-specific regulatory responses. The gene-expression activity of glycolytic reactions in response to acid stress is an example, where pfkB and pykA are differentially expressed. Although the corresponding reactions have the largest thermodynamic driving force, they also produce and consume the highest amount of H+ among glycolytic reactions. Therefore, their transcriptional regulation is possibly more related to their role in pH homeostasis than their energetics.

The detailed formulation of the ABCs unraveled additional layers of attributes associated with biochemical reactions besides energetics and stoichiometry that are important to network functions. These constraints can help quantify perturbations in charge, pH, and pMg homeostasis induced by biochemical reactions, explaining the commonalities and differences in regulatory processes with similar objectives. Electroneutrality and pH homeostasis are examples of basic regulatory objectives that underlie transcriptional regulatory response to acid and osmotic stress, where we elucidated the associations between stress responses and charge-related constraints.

Taken together, the ABC-based analysis elucidates the role of physicochemical constraints in the operation of fundamental cellular processes, especially those involved in stress regulation. Thus, it lays the foundation for integrated models of flux, concentration, and macromolecular expression in the future that are capable of predicting functional states under any growth or stress conditions.

Methods

Reactants and species in biochemical reactions

Most metabolites behave as weak acids in the cell, donating hydrogen ion to the intracellular fluid in multiple steps (ref. 25, Ch. 1). The resulting protonation states can, in turn, bind to several metal ions (e.g., K+ and Mg2+) in separate steps. These charge states, referred to as species, can coexist in the cell with varying distributions depending on the pH and ionic strength of the solution. Several species can be biologically active under physiological conditions, simultaneously participating in biochemical reactions. In general, all the protonation states and their magnesium-bound counterparts are active in the cell, especially, those with phosphate groups (ref. 1, Sect. 9.4.2). We account for all the active charge states of a given metabolite, incorporating hydrogen-dissociation and magnesium-binding reactions (up to two Mg2+)

$${{\rm{H}}}_{k}{\rm{A}}\rightleftharpoons {\rm{H}}+{{\rm{H}}}_{k-1}{\rm{A}},\ {K}_{0,k},\ k=1\cdot \cdot N,$$
(1)
$${\rm{Mg}}{{\rm{H}}}_{k}{\rm{A}}\rightleftharpoons {\rm{Mg}}+{{\rm{H}}}_{k}{\rm{A}},\ {K}_{1,k},\ k=0\cdot \cdot N,$$
(2)
$${{\rm{Mg}}}_{2}{{\rm{H}}}_{k}{\rm{A}}\rightleftharpoons {\rm{Mg}}+{\rm{Mg}}{{\rm{H}}}_{k}{\rm{A}},\ {K}_{2,k},\ k=0\cdot \cdot N$$
(3)

into our formulation. Here, N is the number of hydrogen dissociation steps and A represents the minimum charge state of a metabolite in the cell with K0,k, K1,k, and K2,k the respective hydrogen-dissociation and magnesium-binding constants. These reactions tend to run faster than enzymatic reaction, always remaining close to their equilibrium (ref. 25, Ch. 1). This allows determining the distribution of species independently from the extension of biochemical reactions. Therefore, we simplify our analysis by lumping all the species together, representing them as a single reactant, with respect to which the abiotic constraints are to be expressed

$${\mathcal{A}}:=\left\{{\rm{A}},\cdots \ ,{{\rm{H}}}_{N}{\rm{A}},{\rm{Mg}}{\rm{A}},\cdots \ ,{\rm{Mg}}{{\rm{H}}}_{N}{\rm{A}},{{\rm{Mg}}}_{2}{\rm{A}},\cdots \ ,{{\rm{Mg}}}_{2}{{\rm{H}}}_{N}{\rm{A}}\right\}.$$
(4)

Here, \({\mathcal{A}}\) represent a reactant in the cell, the concentration of which is given by

$$\begin{array}{l}[\![ {\rm{A}}]\!] :=\left[{\rm{A}}\right]+\cdots +\left[{{\rm{H}}}_{N}{\rm{A}}\right]+\left[{\rm{Mg}}{\rm{A}}\right]+\cdots +\left[{\rm{Mg}}{{\rm{H}}}_{N}{\rm{A}}\right]\\ + \left[{{\rm{Mg}}}_{2}{\rm{A}}\right]+\cdots +\left[{{\rm{Mg}}}_{2}{{\rm{H}}}_{N}{\rm{A}}\right].\end{array}$$
(5)

For each species \({\rm{A}}^{\prime} \in {\mathcal{A}}\), the corresponding model fraction \({\rho }_{{\rm{A}}^{\prime} }:=[{\rm{A}}^{\prime} ]/[\![ {\rm{A}}]\!]\) is calculated from hydrogen-dissociation and magnesium-binding constants, irrespective of the reaction \({\rm{A}}^{\prime}\) participates in (see “Composition of reactants” section). Accordingly, we define an effective charge

$${\bar{z}}_{{\mathcal{A}}}:=\mathop{\sum}\limits_{{\rm{A}}^{\prime} \in {\mathcal{A}}}{\rho }_{{\rm{A}}^{\prime} }{z}_{{\rm{A}}^{\prime} }$$
(6)

for the reactant \({\mathcal{A}}\). Here, \({z}_{{\rm{A}}^{\prime} }\) denotes the charge of the species \({\rm{A}}^{\prime}\). This definition ensures the consistency of formulations with respect to reactants and species by requiring \({\bar{z}}_{{\mathcal{A}}}[\![ {\rm{A}}]\!]\) to furnish the total charge of all the species in \({\mathcal{A}}\). Although species always have an integral charge, reactants can generally carry fractional charges depending on the ionic strength, pH, and pMg of the solution because of their dependence on the mole fractions (Supplementary Fig. S1). The effective buffer intensity \({\bar{\beta }}_{{\mathcal{A}}}\), effective charge squared \({\bar{\omega }}_{{\mathcal{A}}}\), effective hydrogen content \({\bar{N}}_{{\rm{H}},{\mathcal{A}}}\), and effective magnesium content \({\bar{N}}_{{\rm{Mg}},{\mathcal{A}}}\) of \({\mathcal{A}}\) are defined similarly

$${\bar{\beta }}_{{\mathcal{A}}}:=\mathop{\sum}\limits_{{\rm{A}}^{\prime} \in {\mathcal{A}}}{\rho }_{{\rm{A}}^{\prime} }{\beta }_{{\rm{A}}^{\prime} },$$
(7)
$${\bar{\omega }}_{{\mathcal{A}}}:=\mathop{\sum}\limits_{{\rm{A}}^{\prime} \in {\mathcal{A}}}{\rho }_{{\rm{A}}^{\prime} }{z}_{{\rm{A}}^{\prime} }^{2},$$
(8)
$${\bar{N}}_{{\rm{H}},{\mathcal{A}}}:=\mathop{\sum}\limits_{{\rm{A}}^{\prime} \in {\mathcal{A}}}{\rho }_{{\rm{A}}^{\prime} }{N}_{{\rm{H}},{\rm{A}}^{\prime} },$$
(9)
$${\bar{N}}_{{\rm{Mg}},{\mathcal{A}}}:=\mathop{\sum}\limits_{{\rm{A}}^{\prime} \in {\mathcal{A}}}{\rho }_{{\rm{A}}^{\prime} }{N}_{{\rm{Mg}},{\rm{A}}^{\prime} }.$$
(10)

In this article, the terms metabolite and reactant are interchangeably used. To distinguish between individual charge states and their aggregate representation in our formulation when necessary, we specifically refer to a metabolite and its charge states as reactant and species, respectively.

Buffer capacity and buffer intensity

The buffer capacity \({\mathcal{B}}\) is a measure of how much a strong base (or acid), such as NaOH, is needed to increase the pH of a solution. The buffer capacity of a mixture of reactants can be obtained from a superposition of the buffer capacities of individual reactants41. Therefore, we first derive an expression for the buffer capacity of a reactant \({\mathcal{A}}\). We consider all the charge states of \({\mathcal{A}}\) arising from hydrogen-dissociation and magnesium-binding reactions Eqs. (1)–(3) in our derivation. The charge balance for this system reads

$${z}_{{\rm{Na}}}[{\rm{Na}}]+{z}_{{\rm{H}}}[{\rm{H}}]+{z}_{{\rm{OH}}}[{\rm{OH}}]+\mathop{\sum}\limits_{{\rm{A}}^{\prime} \in {\mathcal{A}}}{z}_{{\rm{A}}^{\prime} }[{\rm{A}}^{\prime} ]=0.$$
(11)

Because NaOH is a strong base, it completely hydrolyzes, so that [Na] represents the concentration NaOH needed to neutralize \({\mathcal{A}}\). Note that, we did not explicitly state the charge that each ion carries in this equation for brevity—a notational convention we adopt throughout this document. We decompose [Na] into water [Na]w and reactant component \({[{\rm{Na}}]}_{{\mathcal{A}}}\), which we refer to as the strong-base equivalent of \({\mathcal{A}}\), according to \([{\rm{Na}}]={[{\rm{Na}}]}_{\bf{w}}+{[{\rm{Na}}]}_{{\mathcal{A}}}\), where

$${[{\rm{Na}}]}_{\mathrm{w}}:=\frac{{K}_{\mathrm{w}}}{[{\rm{H}}]}-[{\rm{H}}],$$
(12)
$${[{\rm{Na}}]}_{{\mathcal{A}}}:=-[\![ {\rm{A}}]\!] \mathop{\sum}\limits_{{\rm{A}}^{\prime} \in {\mathcal{A}}}{\rho }_{{\rm{A}}^{\prime} }{z}_{{\rm{A}}^{\prime} }$$
(13)

with Kw the water dissociation constant. Accordingly, the buffer capacity of water and reactant \({\mathcal{A}}\) are defined41

$${{\mathcal{B}}}_{\mathrm{w}}:=\frac{{\rm{d}}{[{\rm{Na}}]}_{\mathrm{w}}}{{\rm{d}}{\rm{pH}}}={\mathrm{ln}}\,(10)\left(\frac{{K}_{\mathrm{w}}}{[{\rm{H}}]}+[{\rm{H}}]\right),$$
(14)
$${{\mathcal{B}}}_{{\mathcal{A}}}:=\frac{\partial {[{\rm{Na}}]}_{{\mathcal{A}}}}{\partial {\rm{pH}}}=-[\![ {\rm{A}}]\!] \mathop{\sum}\limits_{{\rm{A}}^{\prime} \in {\mathcal{A}}}\frac{\partial {\rho }_{{\rm{A}}^{\prime} }}{\partial {\rm{pH}}}{z}_{{\rm{A}}^{\prime} }.$$
(15)

The effective buffer intensity of \({\mathcal{A}}\) is simply defined as \({\bar{\beta }}_{{\mathcal{A}}}:={{\mathcal{B}}}_{{\mathcal{A}}}/[\![ {\rm{A}}]\!]\). The buffer capacity can be approximated by linearizing \({{\mathcal{B}}}_{{\mathcal{A}}}\) around a pH of interest. It is correspondingly defined as the amont of a strong base one should add to a solution to increase its pH by one. Supplementary Fig. S2 illustrates how the strong-base equivalent and buffer intensity of \({\mathcal{A}}\) vary with pH. The total buffer capacity of a mixture of reactants is determined from those of the individual reactants

$${{\mathcal{B}}}_{{\rm{T}}}={{\mathcal{B}}}_{\mathrm{w}}+\mathop{\sum}\limits_{i\in {\mathcal{I}}}{{\mathcal{B}}}_{i},$$
(16)

where \({\mathcal{I}}\) is the index set of reactants in the solution.

Osmotic pressure and activity models

Osmosis is a prevalent phenomenon in electrolyte systems, where there are ion-concentration differentials. Consider the osmotic coefficient of a multicomponent aqueous solution42

$$\phi :=\frac{{{\Pi }}}{{C}_{\mathrm{t}}RT}=-\frac{{\vartheta }_{\mathrm{w}}}{{C}_{\mathrm{t}}}{\mathrm{ln}}\,{a}_{\mathrm{w}},$$
(17)

where Ct is the total molar concentration of solutes, R gas universal constant, and T temperature. Here, ϑw, aw, and xw denote molar density, activity, and mole fraction of water. The osmotic pressure Π measures the pressure of the solution relative to that of pure water at the same temperature. We use Pitzer’s model for the water activity43

$${\mathrm{ln}}\,{a}_{\mathrm{w}}=-\left(\frac{{C}_{\mathrm{t}}}{{\vartheta }_{\mathrm{w}}}\right)\left(1+\frac{2{I}_{\mathrm{m}}}{{m}_{\mathrm{t}}}{f}^{\phi }({I}_{\mathrm{m}})\right)+{\mathcal{O}}\left(\frac{{m}_{i}^{2}}{{m}_{\mathrm{t}}}\right),$$
(18)

where

$${f}^{\phi }({I}_{\mathrm{m}}):=-\frac{{A}^{\phi }\sqrt{{I}_{\mathrm{m}}}}{1+{B}_{{\rm{PZ}}}^{\phi }\sqrt{{I}_{\mathrm{m}}}}$$
(19)

with Aϕ = 0.391475 kg1/2 mol−1/2, \({B}_{{\rm{PZ}}}^{\phi }=1.2\) kg1/2 mol−1/2, mt total molal concentration of solutes, and Im the molal ionic strength43. We neglect the second and higher order terms in mi, corresponding to ion-ion interactions, in Eq. (18). Since the parameters required to estimate these interactions are not generally known for biological systems, they are usually neglected25. Satisfactory results have been reported in equilibrium studies of biochemical reactions using activity models based on this approximation in concentration ranges of physiological relevance, justifying this assumption25. Molal concentrations can be converted to molar concentrations according to Ci =\({\hat{\rho }}_{\mathrm{w}}{m}_{i}\), where Ci and mi are molar and molal concentrations of solute i. Moreover, \({\hat{\rho }}_{\mathrm{w}}={\rho }_{\mathrm{w}}-{\sum }_{i\in {{\mathcal{I}}}_{\mathrm{s}}}{C}_{i}{{\rm{MW}}}_{i}\), where \({\hat{\rho }}_{\mathrm{w}}\) and ρw are the reduced density and density of solution with \({{\mathcal{I}}}_{\mathrm{s}}\) and MWi the index set of solutes in the solution and molecular weight of solute i. In general, these densities are functions of temperature, pressure, and solute concentrations. However, in biological systems, they are commonly approximated by the density of water since solute concentrations in the cell are negligible compared to water25. We adopt this approximation to convert between molar and molal concentrations through a multiplicative constant. Note that, if a molar ionic strength I is passed to fϕ, the constants in Eq. (19) must be adjusted according to \({A}^{\phi }\mapsto {A}^{\phi }/\sqrt{{\hat{\rho }}_{\mathrm{w}}}\) and \({B}_{{\rm{PZ}}}^{\phi }\mapsto {B}_{{\rm{PZ}}}^{\phi }/\sqrt{{\hat{\rho }}_{\mathrm{w}}}\). Given the relationship \({I}_{\mathrm{m}}=I/{\hat{\rho }}_{\mathrm{w}}\), these ensure the consistency of ionic-strength and parameter units. Substituting Eq. (18) in Eq. (17), we derive an expression for the osmotic coefficient

$$\phi =1+\frac{2I}{{C}_{\mathrm{t}}}{f}^{\phi }(I),$$
(20)

where molality-to-molarity conversion has been applied. We use this expression to estimate the osmotic pressure of the cytoplasmic and periplasmic fluids in our model

$$\frac{{{{\Pi }}}_{\mathrm{c}}}{RT}={C}_{{\mathrm{t}},{\mathrm{c}}}+2{I}_{\mathrm{c}}{f}^{\phi }({I}_{\mathrm{c}}),$$
(21)
$$\frac{{{{\Pi }}}_{\mathrm{p}}}{RT}={C}_{{\mathrm{t}},{\mathrm{p}}}+2{I}_{\mathrm{p}}{f}^{\phi }({I}_{\mathrm{p}}),$$
(22)

where the subscripts c and p denote the quantities associated with the cytoplasm and periplasm.

The activity coefficients of solutes are also needed for equilibrium computations that will be discussed in subsequent sections. The activity coefficient of charge solutes in ionic solutions can be generally expressed as

$${\mathrm{ln}}\,{\gamma }_{i}^{({\mathrm{m}})}=-{z}_{i}^{2}{f}^{\gamma }({I}_{\mathrm{m}})+{\mathcal{O}}\left({m}_{i}\right),$$
(23)

where \({\gamma }_{i}^{({\mathrm{m}})}\) is the molality-based activity coefficient. Two widely accepted activity models are the extended Debye–Hückel and Pitzer, respectively given by25,43

$${f}^{\gamma }({I}_{\mathrm{m}}):=\frac{3{A}^{\phi }\sqrt{{I}_{\mathrm{m}}}}{1+{B}_{{\rm{DH}}}^{\phi }\sqrt{{I}_{\mathrm{m}}}},$$
(24)
$${f}^{\gamma }({I}_{\mathrm{m}}):={A}^{\phi }\left[\frac{\sqrt{{I}_{\mathrm{m}}}}{1+{B}_{{\rm{PZ}}}^{\phi }\sqrt{{I}_{\mathrm{m}}}}+\frac{2}{{B}_{{\rm{PZ}}}^{\phi }}{\mathrm{ln}}\,(1+{B}_{{\rm{PZ}}}^{\phi }\sqrt{{I}_{\mathrm{m}}})\right],$$
(25)

where \({B}_{{\rm{DH}}}^{\phi }=1.6\) kg1/2 mol−1/244. As with the osmotic coefficient, the parameter transformations \({A}^{\phi\, }\mapsto\, {A}^{\phi }/\sqrt{{\hat{\rho }}_{\mathrm{w}}}\), \({B}_{{\rm{PZ}}}^{\phi }\,\mapsto\, {B}_{{\rm{PZ}}}^{\phi }/\sqrt{{\hat{\rho }}_{\mathrm{w}}}\), and \({B}_{{\rm{DH}}}^{\phi }\,\mapsto\, {B}_{{\rm{DH}}}^{\phi }/\sqrt{{\hat{\rho }}_{\mathrm{w}}}\) apply if I is passed to fγ instead of Im. Our formulations in the forthcoming sections are with respect to molarity-based activity coefficients γi. These are related to molality-based activity coefficients by \({a}_{i}:={\gamma }_{i}{C}_{i}/{C}^{\circ}:={\gamma }_{i}^{({\mathrm{m}})}{m}_{i}/{m}^{\circ}\), where Co = 1 mol/L-w and mo = 1 mol/kg-w are the standard concentrations. As will be discussed, activity coefficients are needed to calculate the formation energies of species at a given I from standard formation energies, which we approximate using group-contribution methods45. These techniques estimate group-contribution parameters using the extended Debye–Hückel model by minimizing the difference between the predicted and measured equilibrium constants of biochemical reactions. Since equilibrium constants are highly sensitive to these parameters, we adhere to the extended Debye–Hückel model in our formulation.

Abiotic constraints

Metabolism is a hallmark of living systems. It is orchestrated through a delicate balance between metabolite concentrations, biochemical-reaction fluxes, and macromolecular levels in the cell. Metabolic networks are tightly intertwined with complex, overlapping regulatory and signaling networks, enabling the cell to sustain homeostasis or adapt to new environments by transitioning between homeostatic states14. Quantitative description of dynamical systems of such complexity using mechanistic kinetic models is often restricted to small networks46 due to computational challenges, limited kinetic data, and incomplete knowledge about the mechanisms of enzymatic reactions47.

We introduce a constrained-based formalism to characterize the metabolome. Rather than specifying a unique time-dependent concentration state, we seek a set of concentration states that respect all the constraints biological networks are subject to, allowing for a unified treatment of steady-state and oscillatory dynamics. Fundamental constraints, such as the law of mass action, charge balance, and osmotic balance, imposed by the laws of thermodynamics, electroneutrality, and osmotic pressure, are obeyed by all electrochemical systems. The law of mass action, which we refer to as thermodynamic constraints in the main text, arises from the dissipative structure of chemical-reaction systems and the second law of thermodynamics48,49, while the charge and osmotic balances are essential for cell-volume and ion-transport stability50. Evolutionary constraints, such as fixed ionic strength and buffer capacity, have emerged from the evolutionary adaptation of biological networks. The ionic strength plays a crucial role in several cellular processes, such as osmoregulation, enzyme activity, and protein structure51, while the buffer capacity is associated with pH homeostasis52. These condition-specific parameters are controlled by regulatory networks to maintain a stable metabolism under various charge-related stress conditions.

The complexities of regulatory networks impede the application of mechanistic dynamic models, even for those controlling the simplest cellular processes14. Therefore, we adopt a top-down approach, where the phenotypes arising from important regulatory processes (e.g., osmoregulation and ion homeostasis) are implicitly accounted for by incorporating them as additional constraints (the evolutionary constraints) into our model. The foregoing fundamental and evolutionary constraints are mathematically expressed as

$$\mathop{\sum}\limits_{i\in {{\mathcal{I}}}_{\mathrm{c}}}{\bar{z}}_{i}{C}_{i}+\mathop{\sum}\limits_{i\in {{\mathcal{J}}}_{\mathrm{c}}}{\bar{z}}_{i}{C}_{i}={\zeta }_{\mathrm{c}},\ ({\rm{Charge}}\ {\rm{balance}})$$
(26)
$$\mathop{\sum}\limits_{i\in {{\mathcal{I}}}_{\mathrm{c}}}{\bar{\omega }}_{i}{C}_{i}+\mathop{\sum}\limits_{i\in {{\mathcal{J}}}_{\mathrm{c}}}{\bar{\omega }}_{i}{C}_{i}=2{I}_{\mathrm{c}},\ ({\rm{Ionic}}\ {\rm{strength}})$$
(27)
$$\mathop{\sum}\limits_{i\in {{\mathcal{I}}}_{\mathrm{c}}}{\bar{\beta }}_{i}{C}_{i}+\mathop{\sum}\limits_{i\in {{\mathcal{J}}}_{\mathrm{c}}}{\bar{\beta }}_{i}{C}_{i}={{\mathcal{B}}}_{\mathrm{c}}-{{\mathcal{B}}}_{\mathrm{w}},\ ({\rm{Buffer}}\ {\rm{capacity}})$$
(28)
$$\mathop{\sum}\limits_{i\in {{\mathcal{I}}}_{\mathrm{c}}}{C}_{i}+\mathop{\sum}\limits_{i\in {{\mathcal{J}}}_{\mathrm{c}}}{C}_{i}={C}_{{\mathrm{t}},{\mathrm{c}}},\ ({\rm{Bounded}}\ {\rm{cytoplasm}})$$
(29)
$${C}_{{\mathrm{t}},{\mathrm{c}}}-{C}_{{\mathrm{t}},{\mathrm{p}}}+2\left[{I}_{\mathrm{c}}{f}^{\phi }({I}_{\mathrm{c}})-{I}_{\mathrm{p}}{f}^{\phi }({I}_{\mathrm{p}})\right]=\frac{{{\Delta }}\Pi }{RT},\ ({\rm{Osmotic}}\ {\rm{balance}})$$
(30)
$${{{\Delta }}}_{r}{G}_{j}^{^{\prime} \circ }+RT\;{\mathrm{ln}}\,{{{\Gamma }}}_{j}\le 0,\ j=1\cdot \cdot m,\ ({\rm{Thermodynamics}})$$
(31)
$${C}_{i} \sim {K}_{{{m,i,j}}},\ i\in {{\mathcal{I}}}_{\mathrm{c}},\ j=1\cdot \cdot m\ ({\rm{Enzyme}}\ {\rm{saturation}})$$
(32)

with C the vector of cytoplasmic reactant concentrations (\({C}_{{\mathcal{A}}}\) and \([\![ {\rm{A}}]\!]\) are interchangeably used to denote the concentration of \({\mathcal{A}}\). For a reactant with only one charge state, [A] and \([\![ {\rm{A}}]\!]\) are indistinguishable), Ct,c total cytoplasmic concentration, Ct,p total periplasmic concentration, ζc net cytoplasmic charge, Ic cytoplasmic ionic strength, Ip periplasmic ionic strength, \({{\mathcal{B}}}_{\mathrm{c}}\) total cytoplasmic buffer capacity, \({{\mathcal{B}}}_{\mathrm{w}}\) buffer capacity of water, ΔΠ: = Πc − Πp osmotic-pressure differential, fϕ Pitzer’s function for osmotic coefficient [Eq. (5)43], \({{\boldsymbol{\Delta }}}_{{\bf{r}}}{{\bf{G}}}^{^{\prime} \circ }\) vector of standard transformed Gibbs energy of reaction, Γ reaction quotient, Km Michaelis constant, j reaction index, i reactant index, and m number of reactions. Moreover, \({{\mathcal{I}}}_{\mathrm{c}}\) and \({{\mathcal{J}}}_{\mathrm{c}}\) are the index sets of cytoplasmic reactants and transmembrane ions (K+, Na+, Cl, Mg2+, OH, and H+)53 affecting the membrane potential Δψ. Similarly, the index sets of periplasmic reactants and transmembrane ions are denoted \({{\mathcal{I}}}_{\mathrm{p}}\) and \({{\mathcal{J}}}_{\mathrm{p}}\). We applied these constraints to characterize the CSS of a reduced metabolic network of E. coli (Supplementary Fig. S3).

Equations (26)–(31) represent charge balance, fixed ionic strength, fixed buffer capacity, bounded cytoplasmic metabolome, osmotic balance, and the law of mass action, respectively. Equation (32) corresponds to the enzyme-saturation constraint (see “Enzyme kinetics” section). In general, we did not impose it as a hard constraint because it does not hold for every reaction in the network (e.g., amino-acid degradation reactions13). However, it is used to evaluate the saturation level of enzyme and how well it describes the functional state of the cell by comparing metabolomic date with \({K}_{{\rm{m}}}^{\max }\) values. Nonetheless, in Fig. 1b, feasible concentration ranges when enzyme saturation is imposed as a hard constraint, in which case dark gray bars are infeasible, and when it is not, in which case dark gray bars are feasible, are both determined and compared.

The standard transformed Gibbs energy \({{{\Delta }}}_{r}{G}_{j}^{^{\prime} \circ }\) of reaction j is obtained from the standard transformed Gibbs energy of formation \({{{\Delta }}}_{f}{G}_{i}^{^{\prime} \circ }\) of reactants \(i\in {{\mathcal{I}}}_{\mathrm{c}}\cup {{\mathcal{I}}}_{\mathrm{p}}\) according to their coefficients in the stoichiometric matrix S. The intrinsic reaction hydrogen consumption ΔrNH,j, magnesium consumption ΔrNMg,j, and charge consumption ΔrZj are similarly defined. Reactions with ΔrZj > 0 and ΔrZj < 0 consume and produce negative charge in the cytoplasm by convention, respectively. In the main text, we generally refer to ΔrZj as the reaction charge consumption, irrespective of its sign. When necessary, we specifically refer to it as charge consumption or charge production to emphasize the sign of ΔrZj. The same conventions apply to ΔrNH,j and ΔrNMg,j. We use the same notation to denote extrinsic reaction hydrogen, magnesium, and charge consumption. These are defined as the product of their intrinsic counterparts and the respective flux. We clarify whichever is relevant in the context by explicitly stating their units (intrinsic and extrinsic consumptions are measured in mol/mol-rxn and mmol/gDW/h, receptively).

Since biochemical reactions involve transformations between reactants with multiple charge states that exchange ions with the cytoplasmic and periplasmic fluids, they generally do not balance charge, hydrogen, and magnesium ions. Indeed, the reaction charge, hydrogen, and magnesium consumption discussed above are defined to quantify this property. In the following discussion, we make these definitions more precise. Starting with charge consumption, we introduce three components for ΔrZj

$${{{\Delta }}}_{r}{Z}_{j}^{(S)}:=\left\{\begin{array}{ll}0\hfill&\;{\rm{Exporters}},\\ -\mathop{\sum}\limits_{i\in {{\mathcal{I}}}_{\mathrm{c}}\cup {{\mathcal{I}}}_{\mathrm{p}}}{S}_{ij}{\bar{z}}_{i}\hfill&\;{\rm{Otherwise}},\end{array}\right.$$
(33)
$${{{\Delta }}}_{r}{Z}_{j}^{(T)}:=\left\{\begin{array}{ll}\mathop{\sum}\limits_{i\in {{\mathcal{I}}}_{\mathrm{c}}}{S}_{ij}{\bar{z}}_{i}&\;{\rm{Exporters}},\\ \mathop{\sum}\limits_{i\in {{\mathcal{I}}}_{\mathrm{p}}}{S}_{ij}{\bar{z}}_{i}&\;{\rm{Importers}},\\ 0\hfill&\;{\rm{Otherwise}},\end{array}\right.$$
(34)
$${{{\Delta }}}_{r}{Z}_{j}^{(H)}:={\tau }_{{\rm{H}},j}$$
(35)

with τH,j the number of translocated hydrogen ions in reaction j, where τH,j = 0 for intracellular reactions, τH,j > 0 if hydrogen ion is translocated from the periplasm to cytoplasm, and τH,j < 0 if hydrogen ion is translocated from the cytoplasm to periplasm. Exporters are thermodynamically spontaneous in the direction, where reactants are transferred from the cytoplasm to periplasm, and importers are thermodynamically spontaneous in the opposite direction. Note that, \({{{\Delta }}}_{r}{Z}_{j}^{(S)}\) reflects the amount of charge exchanges between reactants and water due to hydrogen dissociation and magnesium binding. For intracellular reactions, charge exchanges occur within the cytoplasmic fluid. However, ambiguity may arise for transport reactions as to whether reactants exchange charge with the cytoplasmic or periplasmic fluids. To avoid this ambiguity, we assume that these exchanges occur on the product side of transport reactions (Supplementary Fig. S4). The definition given by Eq. (34) is in accordance with this assumption. The reaction charge consumption is defined

$${{{\Delta }}}_{r}{Z}_{j}:={{{\Delta }}}_{r}{Z}_{j}^{(S)}+{{{\Delta }}}_{r}{Z}_{j}^{(T)}+{{{\Delta }}}_{r}{Z}_{j}^{(H)}.$$
(36)

We term \({{{\Delta }}}_{r}{Z}_{j}^{(S)}\), \({{{\Delta }}}_{r}{Z}_{j}^{(T)}\), and \({{{\Delta }}}_{r}{Z}_{j}^{(H)}\) the stoichiometry, reactant-transport, and hydrogen-transport components. According to this definition, ΔrZj does not affect the overall charge balance of the cytoplasmic fluid for intracellular reactions. However, for transport reactions, it measures the net negative charge flowing into or out of the cell.

Reaction hydrogen consumption is an important quantity, elucidating the role of biochemical reactions in pH homeostasis. To properly define it, a key difference between ΔrZj and ΔrNH,j must be accounted for: While the goal of ΔrZj is to capture how biochemical reactions affect reactant-cytoplasmic fluid charge transfer and the overall charge balance of the cell, ΔrNH,j is defined to measure the contribution of a reaction to pHc. This is based on the assumption that maintaining a near-neutral pHc is a more essential constraint for the operation of biological networks than the overall hydrogen balance of the cell. Accordingly, reaction hydrogen consumption is defined by two components

$${{{\Delta }}}_{r}{N}_{{\rm{H}},j}:={{{\Delta }}}_{r}{N}_{{\rm{H}},j}^{(S)}+{{{\Delta }}}_{r}{N}_{{\rm{H}},j}^{(H)},$$
(37)

where

$${{{\Delta }}}_{r}{N}_{{\rm{H}},j}^{(S)}:=\left\{\begin{array}{ll}0\hfill&{\rm{Exporters}},\\ \mathop{\sum}\limits_{i\in {{\mathcal{I}}}_{\mathrm{c}}\cup {{\mathcal{I}}}_{\mathrm{p}}}{S}_{ij}{\bar{N}}_{{\rm{H}},j}+2{S}_{w,j}&{\rm{Otherwise}},\end{array}\right.$$
(38)
$${{{\Delta }}}_{r}{N}_{{\rm{H}},j}^{(H)}:=-{\tau }_{{\rm{H}},j}$$
(39)

with Sw,j the stoichiometric coefficient of water in reaction j (see “Standard transformed Gibbs energy of reaction” section). As with charge consumption, hydrogen-ion exchanges between reactants and water is assumed to occur on the product side of transport reactions in Eq. (38). Reaction magnesium consumption is defined in a similar manner

$${{{\Delta }}}_{r}{N}_{{\rm{Mg}},j}:={{{\Delta }}}_{r}{N}_{{\rm{Mg}},j}^{(S)},$$
(40)

where

$${{{\Delta }}}_{r}{N}_{{\rm{Mg}},j}^{(S)}:=\left\{\begin{array}{ll}0\hfill&\;{\rm{Exporters}},\\ \mathop{\sum}\limits_{i\in {{\mathcal{I}}}_{\mathrm{c}}\cup {{\mathcal{I}}}_{\mathrm{p}}}{S}_{ij}{\bar{N}}_{{\rm{Mg}},j}&\;{\rm{Otherwise}}.\end{array}\right.$$
(41)

In Eqs. (26)–(32), the concentration of cytoplasmic reactants, excluding the transmembrane ions, are the unknowns. The transmembrane-ion concentrations, Ic, and \({{\mathcal{B}}}_{\mathrm{c}}\) are condition- and strain-specific, the values of which are specified at the outset (Supplementary Table S2). All other quantities are the given parameters of the model (Supplementary Table S2). Periplasmic concentrations are assumed to be identical to those of the growth medium with a specified composition (M9 minimal medium54, detailed in Supplementary Table S1).

We define the CSS as a subset of the concentration space, where all the biophysical constraint are satisfied:

$${\mathcal{C}}:=\left\{{{\bf{C}}}_{\mathrm{c}}\in {{\mathbb{R}}}_{+}^{n}:{\rm{Eqs}}.\ (26)\!-\!(31)\ {\rm{hold}}\right\}$$
(42)

with \(n:=| {{\mathcal{I}}}_{\mathrm{c}}|\), \(n^{\prime} :=| {{\mathcal{I}}}_{\mathrm{p}}|\), and Cc a subvector of C corresponding to \({{\mathcal{I}}}_{\mathrm{c}}\). Geometrically, the CSS can be represented as the intersection of the affine subspace defined by the equality constraints Eqs. (26)–(30) and the thermodynamically concentration solution space corresponding to Eq. (31) (Fig. 1b). We characterize the CSS, which can generally be nonconvex and disconnected, using global optimization and sampling techniques. The first furnishes global bounds on metabolite concentrations and reaction energies, regardless of whether the CSS is connect or not. The second provides expectations of concentrations \({\mathbb{E}}({C}_{i})\) and reaction energies \({\mathbb{E}}({{{\Delta }}}_{r}{G}_{j}^{\prime})\), standard deviations of concentrations σ(Ci) and reaction energies \(\sigma ({{{\Delta }}}_{r}{G}_{j}^{\prime})\), and violation probabilities of thermodynamic constraints Pj. Here, we first identify an interior point q by solving a parametric optimization problem (Eq. (85)) that maximally distances q from the thermodynamic constraints, simultaneously avoiding arbitrarily small concentrations, which are biologically irrelevant. We found these solutions to be well-correlated with metabolomic data reported in the literature. We then explore a neighborhood of q by generating curves along random directions inside the CSS to sample the space and determine the thermodynamic constraints that are violated more frequently (see “Characterization of Concentration Solution Space” section).

We reformulate Eqs. (26)–(31) with respect to mole fractions to arrive at a compact dimensionless form

$$\left\{\begin{array}{l}{\bf{A}}{\bf{x}}={{\bf{f}}}^{{\rm{eq}}},\hfill\\ {C}_{{\mathrm{t}},{\mathrm{c}}}^{* }-{C}_{{\mathrm{t}},{\mathrm{p}}}^{* }+2\left[{I}_{\mathrm{c}}^{* }{f}^{\phi }({I}_{\mathrm{c}}^{* })-{I}_{\mathrm{p}}^{* }{f}^{\phi }({I}_{\mathrm{p}}^{* })\right]=1,\hfill\\ {{\bf{S}}}_{\mathrm{c}}^{{\rm{T}}}{\mathrm{ln}}\,{\bf{x}}\le {{\bf{f}}}^{{\rm{ineq}}},\hfill\\ {\bf{x}}\ge {\bf{0}}\hfill\end{array}\right.$$
(43)

with

$${\bf{A}}:=\left[\begin{array}{l}{\bar{{\bf{z}}}}^{{\rm{T}}}\\ {\bar{{\boldsymbol{\omega }}}}^{{\rm{T}}}\\ {\bar{{\boldsymbol{\beta }}}}^{{\rm{T}}}\\ {{\bf{1}}}_{1\times n}\end{array}\right],\ {{\bf{f}}}^{{\rm{eq}}}:=\left[\begin{array}{l}-({\sum }_{i\in {{\mathcal{J}}}_{\mathrm{c}}}{\bar{z}}_{i}{C}_{i}^{* })/{C}_{{\mathrm{t}},{\mathrm{c}}}^{* }\\ (2{I}_{\mathrm{c}}^{* }-{\sum }_{i\in {{\mathcal{J}}}_{\mathrm{c}}}{\bar{\omega }}_{i}{C}_{i}^{* })/{C}_{{\mathrm{t}},{\mathrm{c}}}^{* }\\ ({{\mathcal{B}}}_{\mathrm{c}}^{* }-{{\mathcal{B}}}_{\mathrm{w}}^{* }-{\sum }_{i\in {{\mathcal{J}}}_{\mathrm{c}}}{\bar{\beta }}_{i}{C}_{i}^{* })/{C}_{{\mathrm{t}},{\mathrm{c}}}^{* }\\ 1-({\sum }_{i\in {{\mathcal{J}}}_{\mathrm{c}}}{C}_{i}^{* })/{C}_{{\mathrm{t}},{\mathrm{c}}}^{* }\end{array}\right],$$
(44)

and

$${{\bf{f}}}^{{\rm{ineq}}}:=-\left[\begin{array}{l}\frac{{{{\Delta }}}_{r}{G}_{1}^{^{\prime} \circ }}{RT}+{\upsilon }_{1}{\mathrm{ln}}\,(\frac{{C}_{\mathrm{s}}}{{C}^{\circ }})+{\upsilon }_{1}^{({\mathrm{c}})}{\mathrm{ln}}\,{C}_{{\mathrm{t}},{\mathrm{c}}}^{* }+\mathop{\sum}\limits_{i\in {{\mathcal{I}}}_{\mathrm{p}}}{S}_{i1}{\mathrm{ln}}\,{C}_{i}^{^{\prime} * }\\ \vdots \\ \frac{{{{\Delta }}}_{r}{G}_{\mathrm{m}}^{^{\prime} \circ }}{RT}+{\upsilon }_{\mathrm{m}}{\mathrm{ln}}\,(\frac{{C}_{\mathrm{s}}}{{C}^{\circ }})+{\upsilon }_{\mathrm{m}}^{({\mathrm{c}})}{\mathrm{ln}}\,{C}_{{\mathrm{t}},{\mathrm{c}}}^{* }+\mathop{\sum}\limits_{i\in {{\mathcal{I}}}_{\mathrm{p}}}{S}_{i{\mathrm{m}}}{\mathrm{ln}}\,{C}_{i}^{^{\prime} * }\\ \end{array}\right],$$
(45)

where all the concentrations, ionic strengths, and buffer capacities are scaled with Cs: = ΔΠ/RT. Here, x: = Cc/Ct,c is the vector of cytoplasmic mole fractions, the superscript * denotes scaled parameters or scaled variable, \({\bf{C}}^{\prime}\) is the vector of periplasmic reactant concentrations, Sc is a submatrix of S containing all the rows corresponding to \({{\mathcal{I}}}_{\mathrm{c}}\), \({\upsilon }_{j}:={\sum }_{i\in {{\mathcal{I}}}_{\mathrm{c}}\cup {{\mathcal{I}}}_{\mathrm{p}}}{S}_{ij}\), \({\upsilon }_{j}^{({\mathrm{c}})}:={\sum }_{i\in {{\mathcal{I}}}_{\mathrm{c}}}{S}_{ij}\), \({\bf{x}}\in {{\mathbb{R}}}^{n}\), \({\bf{S}}\in {{\mathbb{R}}}^{(n+n^{\prime} )\times m}\), \({\bf{A}}\in {{\mathbb{R}}}^{\ell \times n}\), \({{\bf{f}}}^{{\rm{eq}}}\in {{\mathbb{R}}}^{\ell }\), \({{\bf{f}}}^{{\rm{ineq}}}\in {{\mathbb{R}}}^{m}\), \({\boldsymbol{\upsilon }}\in {{\mathbb{R}}}^{m}\), \({{\boldsymbol{\upsilon }}}^{({\mathrm{c}})}\in {{\mathbb{R}}}^{m}\) and \({{\bf{S}}}_{\mathrm{c}}\in {{\mathbb{R}}}^{n\times m}\) with the number of linear equalities (i.e., rows of A) in Eq. (43). Note that, because I* is passed to fϕ instead of Im in Eq. (43), the constants of Eq. (19) must be adjusted according to \({A}^{\phi }\,\mapsto\, {A}^{\phi }\sqrt{{C}_{\mathrm{s}}/{\hat{\rho }}_{\mathrm{w}}}\) and \({B}_{{\rm{PZ}}}^{\phi }\,\mapsto\, {B}_{{\rm{PZ}}}^{\phi }\sqrt{{C}_{\mathrm{s}}/{\hat{\rho }}_{\mathrm{w}}}\). In Eq. (43), x and \({C}_{{\mathrm{t}},{\mathrm{c}}}^{* }\) are unknown variables; all the other parameters are known and specified at the outset. The CSS in the mole-fraction space is defined

$${\mathcal{X}}:=\left\{{\bf{x}}\in {[0,1]}^{n}:{\rm{Eq}}.\ (43)\ {\rm{holds}}\right\}.$$
(46)

The CSS spans many orders of magnitude in the concentration and mole-fraction spaces. It has a highly irregular shape in these spaces, extending several orders of magnitude wider in some directions than others. Therefore, geometric characterization of the CSS (e.g., sampling, identifying the global minimum of a function over the CSS, or constructing bounding box) poses major computational challenges. Accumulation of large rounding errors is one of the main computational issues that arise from systems involving variables with contrasting magnitudes. To overcome these challenges, we use the logarithmic map \({{\Xi }}:={\bf{x}}\,\mapsto\, {\bf{y}}={\mathrm{ln}}\,{\bf{x}}\), reformulating Eqs. (43) and (46) into the forms

$$\left\{\begin{array}{l}{\bf{A}}\exp {\bf{y}}={{\bf{f}}}^{{\rm{eq}}},\hfill\\ {C}_{{\mathrm{t}},{\mathrm{c}}}^{* }-{C}_{{\mathrm{t}},{\mathrm{p}}}^{* }+2\left[{I}_{\mathrm{c}}^{* }{f}^{\phi }({I}_{\mathrm{c}}^{* })-{I}_{\mathrm{p}}^{* }{f}^{\phi }({I}_{\mathrm{p}}^{* })\right]=1,\\ {{\bf{S}}}_{\mathrm{c}}^{{\rm{T}}}{\bf{y}}\le {{\bf{f}}}^{{\rm{ineq}}},\hfill\\ {\bf{y}}\le {\bf{0}},\hfill\end{array}\right.$$
(47)
$${\mathcal{Y}}:=\left\{{\bf{y}}\in {{\mathbb{R}}}_{-}^{n}:{\rm{Eq.}}\ (47)\ {\rm{holds}}\right\},$$
(48)

in which it is more computationally tractable to characterize the CSS. We refer to the mole-fraction and logarithmic mole-fraction spaces as the X and Y space, respectively.

Enzyme kinetics

The reversible Michaelis–Menten mechanism55

$${\nu }_{1}^{-}{S}_{1}+\cdot \cdot +{\nu }_{{n}_{\mathrm{s}}}^{-}{S}_{{n}_{\mathrm{s}}}+E\,\rightleftharpoons\, {S}_{1}\cdot \cdot {S}_{{n}_{\mathrm{s}}}E\,\rightleftharpoons\, E{P}_{1}\cdot \cdot {P}_{{n}_{\mathrm{p}}}\,\rightleftharpoons\, E+{\nu }_{1}^{+}{P}_{1}+\cdot \cdot +{\nu }_{{n}_{\mathrm{p}}}^{+}{P}_{{n}_{\mathrm{p}}}$$
(49)

is one of the simplest mechanisms proposed to study the kinetics of biochemical reactions. It leads to the separable rate law55

$$v=[E]{k}_{{\rm{cat}}}^{+}\kappa \varsigma ,$$
(50)

where

$$\kappa :=\frac{\mathop{\prod }\nolimits_{i = 1}^{{n}_{\mathrm{s}}}{([{S}_{i}]/{K}_{{\mathrm{s}},i})}^{{\nu }_{i}^{-}}}{1+\mathop{\prod }\nolimits_{i = 1}^{{n}_{\mathrm{s}}}{([{S}_{i}]/{K}_{{\mathrm{s}},i})}^{{\nu }_{i}^{-}}+\mathop{\prod }\nolimits_{i = 1}^{{n}_{\mathrm{p}}}{([{P}_{i}]/{K}_{{\mathrm{p}},i})}^{{\nu }_{i}^{+}}},$$
(51)

and

$$\varsigma :=1-\exp ({{{\Delta }}}_{r}G^{\prime} /RT)$$
(52)

are the enzyme-saturation and thermodynamic efficiencies. Here, Ks,i and Kp,i are the Michaelis constants of the substrates and products with κ the enzyme-saturation efficiency, ς thermodynamic efficiency, [E] total enzyme concentration, \({k}_{{\rm{cat}}}^{+}\) the turnover number, ns the number of substrates, and np the number of products. Note that the specific form of Eqs. (50)–(52) depends on the rate law describing the individual steps of the underlying mechanism, and the assumption that enzyme complexes always remain at a steady state. It is clear from Eq. (50) that to draw maximum flux at minimum protein cost, both efficiencies must be maximized. However, the maximum saturation efficiency (i.e., κ → 1 achieved when [Si] Ks,i and [Pi] Kp,i) cannot be realized by every reaction in the network because an unbounded increase in all substrate and product concentrations may conflict with other constraints imposed on the network, such as osmotic balance and molecular crowding. Therefore, a more relaxed efficiency criterion is usually used, where a reaction is considered efficient when it is at least half-saturated in one direction ([Si] ≥ Ks,i in the forward or [Pi] ≥ Kp,i in the backward direction)13,17. Reversible reactions are required to be half-saturated or more in both directions to be considered efficient. For reactant i, we define the maximum Michaelis constant

$${K}_{{\rm{m}},i}^{\max }:=\mathop{\max}\limits_{j\in \{1\cdot \cdot m\}}{K}_{{{{\mathrm{m}},i,j}}},$$
(53)

where Km,i,j stands for Ks,i and Kp,i in Eq. (51) that are associated with reaction j. If the reactant i does not participate in the reaction j, then Km,i,j = 0. Accordingly, the enzyme-saturation-efficiency criterion for the entire network is defined

$${C}_{i}\ge {K}_{{\rm{m}},i}^{\max },\ \forall i\in {{\mathcal{I}}}_{\mathrm{c}}.$$
(54)

If Eq. (54) is not satisfied for reactant i, it implies that at least one of the reactions it participates in is undersaturated. Note that, the saturation-efficiency criterion in this equation is a strong notion of efficiency. It requires reversible enzymes to be half-saturated regardless of the reaction direction or the growth conditions. However, it may be relaxed to describe condition-specific phenotypes by requiring enzymes to be half-saturated only in the direction, in which the respective reaction proceeds.

Composition of reactants

We discussed in previous sections how to represent the abiotic constraints with respect to effective quantities. These were defined as weighted averages of the respective quantity for species constituting a reactant. The mole fractions of species were the weights in these definitions. Here, we show how to compute these mole fractions from binding constants for a reactant \({\mathcal{A}}\). First, we derive expressions to relate hydrogen dissociation and magnesium binding constants at a given ionic strength to those evaluated at the reference condition (infinite dilution at the same pressure and temperature, where I → 0)

$${K}_{0,k}={K}_{0,k}^{\circ \circ }\exp [-2(1-k-{z}_{{\rm{A}}}){f}^{\gamma }({I}^{* })],\ k=1\cdot \cdot N,$$
(55)
$${K}_{1,k}={K}_{1,k}^{\circ \circ }\exp [4(k+{z}_{{\rm{A}}}){f}^{\gamma }({I}^{* })],\ k=0\cdot \cdot N,$$
(56)
$${K}_{2,k}={K}_{2,k}^{\circ \circ }\exp [4(2+k+{z}_{{\rm{A}}}){f}^{\gamma }({I}^{* })],\ k=0\cdot \cdot N,$$
(57)

where \({K}_{0,k}^{\circ \circ }\), \({K}_{1,k}^{\circ \circ }\), and \({K}_{2,k}^{\circ \circ }\) are reference equilibrium constants, and zA is the charge of the minimum charge state of \({\mathcal{A}}\) (see “Reactants and species in biochemical reactions” section). Moreover, we define K0,0: = 1 for notational convenience. Next, we introduce the binding polynomial

$${\mathcal{P}}:=\mathop{\sum }\limits_{k=0}^{N}\left(1+\frac{[{\rm{Mg}}]}{{K}_{1,k}}+\frac{{[{\rm{Mg}}]}^{2}}{{K}_{1,k}{K}_{2,k}}\right)\frac{{[{\rm{H}}]}^{k}}{\mathop{\prod }\nolimits_{i = 0}^{k}{K}_{0,i}}$$
(58)

for hydrogen-dissociation and magnesium-binding equilibria given by Eqs. (1)–(3). Finally, we obtain the species mole fractions

$${\rho }_{{{\rm{H}}}_{k}{\rm{A}}}=\frac{1}{{\mathcal{P}}}\frac{{[{\rm{H}}]}^{k}}{\mathop{\prod }\nolimits_{i = 0}^{k}{K}_{0,i}},\ k=0\cdot \cdot N,$$
(59)
$${\rho }_{{\rm{Mg}}{{\rm{H}}}_{k}{\rm{A}}}=\frac{1}{{\mathcal{P}}}\frac{[{\rm{Mg}}]}{{K}_{1,k}}\frac{{[{\rm{H}}]}^{k}}{\mathop{\prod }\nolimits_{i = 0}^{k}{K}_{0,i}},\ k=0\cdot \cdot N,$$
(60)
$${\rho }_{{{\rm{Mg}}}_{2}{{\rm{H}}}_{k}{\rm{A}}}=\frac{1}{{\mathcal{P}}}\frac{{[{\rm{Mg}}]}^{2}}{{K}_{1,k}{K}_{2,k}}\frac{{[{\rm{H}}]}^{k}}{\mathop{\prod }\nolimits_{i = 0}^{k}{K}_{0,i}},\ k=0\cdot \cdot N,$$
(61)

Formation energy of species from formation energy of minimum charge state

Before proceeding to calculate the formation energy of reactants, we provide useful expressions to ascertain the formation energy of species from that of the minimum charge state. The formation energies of all the charge states arising from Eqs. (1)–(3) can be obtained from \({{{\Delta }}}_{f}{G}_{{\rm{A}}}^{\circ \circ }\) according to

$${{{\Delta }}}_{f}{G}_{{{\rm{H}}}_{k}{\rm{A}}}^{\circ \circ }={{{\Delta }}}_{f}{G}_{{\rm{A}}}^{\circ \circ }+RT{\mathrm{ln}}\,\left(\mathop{\prod }\limits_{i=0}^{k}{K}_{0,i}^{\circ \circ }\right),\ k=0\cdot \cdot N,$$
(62)
$${{{\Delta }}}_{f}{G}_{{\rm{Mg}}{{\rm{H}}}_{k}{\rm{A}}}^{\circ \circ }={{{\Delta }}}_{f}{G}_{{\rm{Mg}}}^{\circ \circ }+{{{\Delta }}}_{f}{G}_{{\rm{A}}}^{\circ \circ }+RT{\mathrm{ln}}\,\left({K}_{1,k}^{\circ \circ }\mathop{\prod }\limits_{i=0}^{k}{K}_{0,i}^{\circ \circ }\right),\ k=0\cdot \cdot N,$$
(63)
$${{{\Delta }}}_{f}{G}_{{{\rm{Mg}}}_{2}{{\rm{H}}}_{k}{\rm{A}}}^{\circ \circ }=2{{{\Delta }}}_{f}{G}_{{\rm{Mg}}}^{\circ \circ }+{{{\Delta }}}_{f}{G}_{{\rm{A}}}^{\circ \circ }+RT{\mathrm{ln}}\,\left({K}_{1,k}^{\circ \circ }{K}_{2,k}^{\circ \circ }\mathop{\prod }\limits_{i=0}^{k}{K}_{0,i}^{\circ \circ }\right),k=0\cdot \cdot N.$$
(64)

Note that, in deriving these equations, we substituted the reference formation energy of hydrogen ion \({{{\Delta }}}_{f}{G}_{{\rm{H}}}^{\circ \circ }=0\) kJ/mol, and explicitly expressed the reference formation energy of magnesium ion \({{{\Delta }}}_{f}{G}_{{\rm{Mg}}}^{\circ \circ }=-455.3\) kJ/mol25.

Standard transformed Gibbs energy of formation of reactants

Following the development of Alberty25, the standard transformed Gibbs energy of formation of the species \({\rm{A}}^{\prime} \in {\mathcal{A}}\) is written

$${{{\Delta }}}_{f}{G}_{{\rm{A}}^{\prime} }^{\prime}={{{\Delta }}}_{f}{G}_{{\rm{A}}^{\prime} }^{^{\prime} \circ }+RT{\mathrm{ln}}\,([{\rm{A}}^{\prime} ]/{C}^{\circ }),$$
(65)

where \({{{\Delta }}}_{f}{G}_{{\rm{A}}^{\prime} }^{^{\prime} \circ }\) is the standard transformed Gibbs energy of formation evaluated at the same ionic strength, pH, and pMg, accounting for all the nonidealities and external force fields. It can, in turn, be expressed as

$${{{\Delta }}}_{f}{G}_{{\rm{A}}^{\prime} }^{^{\prime} \circ } = \, {{{\Delta }}}_{f}{G}_{{\rm{A}}^{\prime} }^{\circ }-{N}_{{\rm{H}},{\rm{A}}^{\prime} }({{{\Delta }}}_{f}{G}_{{\rm{H}}}^{\circ }+RT{\mathrm{ln}}\,1{0}^{-{\rm{pH}}})\\ \, -{N}_{{\rm{Mg}},{\rm{A}}^{\prime} }({{{\Delta }}}_{f}{G}_{{\rm{Mg}}}^{\circ }+RT{\mathrm{ln}}\,1{0}^{-{\rm{pMg}}})+{z}_{{\rm{A}}^{\prime} }F\psi .$$
(66)

The individual components of \({{{\Delta }}}_{f}{G}_{{\rm{A}}^{\prime} }^{^{\prime} \circ }\) are given by

$${{{\Delta }}}_{f}{G}_{{\rm{A}}^{\prime} }^{\circ }={{{\Delta }}}_{f}{G}_{{\rm{A}}^{\prime} }^{\circ \circ }+RT{\mathrm{ln}}\,{\gamma }_{{\rm{A}}^{\prime} },$$
(67)
$${{{\Delta }}}_{f}{G}_{{\rm{H}}}^{\circ }={{{\Delta }}}_{f}{G}_{{\rm{H}}}^{\circ \circ }+RT{\mathrm{ln}}\,{\gamma }_{{\rm{H}}},$$
(68)
$${{{\Delta }}}_{f}{G}_{{\rm{Mg}}}^{\circ }={{{\Delta }}}_{f}{G}_{{\rm{Mg}}}^{\circ \circ }+RT{\mathrm{ln}}\,{\gamma }_{{\rm{Mg}}}.$$
(69)

Here, F is the Faraday constant, ψ is the external potential field at the point where Gibbs energy is evaluated, and the superscript denotes the reference state of Gibbs energy at infinite dilution of the respective species. The standard transformed Gibbs energy of formation of reactant \({\mathcal{A}}\) is expressed with respect to that of species

$$\exp \left(-\frac{{{{\Delta }}}_{f}{G}_{{\mathcal{A}}}^{^{\prime} \circ }}{RT}\right)=\mathop{\sum}\limits_{{\rm{A}}^{\prime} \in {\mathcal{A}}}\exp \left(-\frac{{{{\Delta }}}_{f}{G}_{{\rm{A}}^{\prime} }^{^{\prime} \circ }}{RT}\right),$$
(70)

which can be reformulated to

$$\exp \left(-\frac{{{{\Delta }}}_{f}{G}_{{\mathcal{A}}}^{^{\prime} \circ }}{RT}\right)=\exp \left(-\frac{{{{\Delta }}}_{f}{G}_{{\rm{A}}}^{^{\prime} \circ }}{RT}\right)\mathop{\sum}\limits_{{\rm{A}}^{\prime} \in {\mathcal{A}}}\exp \left(\widehat{\frac{{{{\Delta }}}_{f}{G}_{{\rm{A}}^{\prime} }^{^{\prime} \circ }}{RT}}\right),$$
(71)

where \({{{\Delta }}}_{f}{G}_{{\mathcal{A}}}^{^{\prime} \circ }\) is the standard transformed Gibbs energy of formation of \({\mathcal{A}}\), \({{{\Delta }}}_{f}{G}_{{\rm{A}}}^{^{\prime} \circ }\) the standard transformed Gibbs energy of formation of the minimum charge state, and \({\widehat{{{\Delta }}_{f}{G}_{{\rm{A}}^{\prime} }^{^{\prime} \circ }}}:={{{\Delta }}}_{f}{G}_{{\rm{A}}}^{^{\prime} \circ }-{{{\Delta }}}_{f}{G}_{{\rm{A}}^{\prime} }^{^{\prime} \circ }\). Substituting Eqs. (66)–(69) in Eq. (71), we arrive at

$$\frac{{{{\Delta }}}_{f}{G}_{{\mathcal{A}}}^{^{\prime} \circ }}{RT}= \, \frac{{{{\Delta }}}_{f}{G}_{{\rm{A}}}^{\circ \circ }}{RT}+{N}_{{\rm{H}},{\rm{A}}}[{\mathrm{ln}}\,(10){\rm{pH}}-{\mathrm{ln}}\,{\gamma }_{{\rm{H}}}]\\ \,-{\mathrm{ln}}\,\left\{\mathop{\sum }\limits_{k=0}^{N}\frac{{\gamma }_{{\rm{H}}}^{k}1{0}^{-k{\rm{pH}}}}{{\gamma }_{{{\rm{H}}}_{k}{\rm{A}}}\mathop{\prod }\nolimits_{i = 0}^{k}{K}_{0,i}^{\circ \circ }}\exp \left[-({z}_{{\rm{A}}}+k)\frac{F\psi }{RT}\right]\right.\\ \, +\mathop{\sum }\limits_{k=0}^{N}\frac{{\gamma }_{{\rm{H}}}^{k}1{0}^{-k{\rm{pH}}}{\gamma }_{{\rm{Mg}}}1{0}^{-{\rm{pMg}}}}{{\gamma }_{{\rm{Mg}}{{\rm{H}}}_{k}{\rm{A}}}{K}_{1,k}^{\circ \circ }\mathop{\prod }\nolimits_{i = 0}^{k}{K}_{0,i}^{\circ \circ }}\exp \left[-({z}_{{\rm{A}}}+k+2)\frac{F\psi }{RT}\right]\\ \, +\left.\mathop{\sum }\limits_{k=0}^{N}\frac{{\gamma }_{{\rm{H}}}^{k}1{0}^{-k{\rm{pH}}}{\gamma }_{{\rm{Mg}}}^{2}1{0}^{-2{\rm{pMg}}}}{{\gamma }_{{{\rm{Mg}}}_{2}{{\rm{H}}}_{k}{\rm{A}}}{K}_{1,k}^{\circ \circ }{K}_{2,k}^{\circ \circ }\mathop{\prod }\nolimits_{i = 0}^{k}{K}_{0,i}^{\circ \circ }}\exp \left[-({z}_{{\rm{A}}}+k+4)\frac{F\psi }{RT}\right]\right\}$$
(72)

with NH,A and \({{{\Delta }}}_{f}{G}_{{\rm{A}}}^{\circ \circ }\) denoting the hydrogen content and reference formation energy of the minimum charge state. Note that, the second and third sums in Eq. (72) correspond to magnesium-bound states, while the first sum arises from protonation states. We estimated the standard transformed Gibbs energy of reactions with and without magnesium-bound states. However, all the results presented in the main text were computed by neglecting the charge states arising from magnesium bindings. As previously stated, we obtained reaction energies using the reference formation energies of species furnished by group-contribution methods45. These methods tune their parameters based on expressions that only account for protonation states. Therefore, using these formation energies in Eq. (72) with all the charge states included introduces errors into reaction-energy computations, which are no longer bounded by the guaranteed uncertainty bounds of group-contribution methods.

Standard transformed Gibbs energy of reaction

The standard transformed Gibbs energy of reactions and apparent equilibrium constants are obtained from the standard transformed Gibbs energy of formation of reactants

$$\frac{{{{\Delta }}}_{r}{G}_{j}^{^{\prime} \circ }}{RT}=\mathop{\sum}\limits_{i\in {{\mathcal{I}}}_{\mathrm{c}}\!\cup {{\mathcal{I}}}_{\mathrm{p}}}{S}_{ij}\frac{{{{\Delta }}}_{f}{G}_{i}^{^{\prime} \circ }}{RT}+\frac{{{{\Delta }}}_{r}{G}_{{\rm{H}},j}}{RT}+{S}_{{\mathrm{w}},j}\frac{{{{\Delta }}}_{f}{G}_{\mathrm{w}}^{^{\prime} \circ }}{RT},$$
(73)
$${K}_{j}^{\prime}:=\exp \left(-\frac{{{{\Delta }}}_{r}{G}_{j}^{^{\prime} \circ }}{RT}\right),$$
(74)

where j is the reaction index, and

$$\frac{{{{\Delta }}}_{r}{G}_{{\rm{H}},j}}{RT}={\tau }_{{\rm{H}},j}\left[{\mathrm{ln}}\,\left(\frac{{\gamma }_{{\rm{H}}}({I}_{\mathrm{c}}){[{\rm{H}}]}_{\mathrm{c}}}{{\gamma }_{{\rm{H}}}({I}_{\mathrm{p}}){[{\rm{H}}]}_{\mathrm{p}}}\right)+\frac{F{{\Delta }}\psi }{RT}\right]$$
(75)

is the proton motive force associated with reaction j with Δψ: = ψc − ψp the membrane potential. The standard transformed Gibbs energy of reaction j is

$$\frac{{{{\Delta }}}_{r}{G}_{j}^{\prime}}{RT}=\frac{{{{\Delta }}}_{r}{G}_{j}^{^{\prime} \circ }}{RT}+{\mathrm{ln}}\,{{{\Gamma }}}_{j},\ {{{\Gamma }}}_{j}:=\mathop{\prod}\limits_{i\in {{\mathcal{I}}}_{\mathrm{c}}\cup {{\mathcal{I}}}_{\mathrm{p}}}{\left(\frac{{C}_{i}}{{C}^{\circ }}\right)}^{{S}_{ij}},$$
(76)

where Γj is the quotient of reaction j. Note that, in this formulation, the concentrations of hydrogen ion and water do not contribute to the reaction quotients nor do their stoichiometric coefficient to S. Instead, the concentrations and stoichiometric coefficients of water and hydrogen ion are implicitly accounted for in Eq. (73). The standard transformed Gibbs energy of formation of water is given by

$$\frac{{{{\Delta }}}_{f}{G}_{\mathrm{w}}^{^{\prime} \circ }}{RT}=\frac{{{{\Delta }}}_{f}{G}_{\mathrm{w}}^{\circ \circ }}{RT}+{\mathrm{ln}}\,{\gamma }_{\mathrm{w}}-2{\mathrm{ln}}\,{\gamma }_{{\rm{H}}}+2{\mathrm{ln}}\,(10){\rm{pH}}.$$
(77)

In biological systems, γw ≈ 1 because the concentration of reactants are negligible compared to water25. In group-contribution methods, γw is assumed to be a constant and lumped into the reference formation energy \({{{\Delta }}}_{f}{G}_{\mathrm{w}}^{\circ \circ }\), and the resulting expression is used to fit the model parameters (i.e., formation energies of species at infinite dilution) to equilibrium data. In this work, we adopt \({{{\Delta }}}_{f}{G}_{\mathrm{w}}^{\circ \circ }=-238.7\) kJ/mol45.

Growth medium and model parameters

As previously stated, the concentration of cytoplasmic metabolites, excluding the transmembrane ions, are the only unknowns in Eqs. (26)–(32). The goal in our model is to characterize the CSS for any given growth medium, the composition and parameters of which are fully specified at the outset. The results presented in the main text all describe the metabolic state of E. coli growing in an M9 minimal medium (Supplementary Table S1) during the exponential growth phase. From the composition of the growth medium, we computed the concentrations of all the periplasmic ions, ascertaining effective charges and Ip.

Besides the composition of the growth medium, several other parameters must be specified at the outset to determine intracellular concentration ranges using Eqs. (26)–(32). We chose characteristic parameters associated with the exponential growth of E. coli (Supplementary Table S2) to study feasible intracellular concentrations, and how they are restricted by abiotic constraints. All other parameters used in this work to characterize the CSS for all carbon sources are tabulated in Supplementary Data 14.

Characterization of concentration solution space

In this section, we present two methods for characterizing the CSS constructed by the abiotic constraints outlined previously. First, we compute global bounds on cytoplasmic concentrations and reaction energies using global optimization techniques to determined their respective biologically feasible ranges. The lower bounds are obtained by solving

$${y}_{i,\min }:=\mathop{\min }\limits_{{\bf{y}}\in {\mathcal{Y}}}{y}_{i},\ i\in {{\mathcal{I}}}_{\mathrm{c}},$$
(78)
$${{{\Delta }}}_{r}{G}_{j,\min }^{\prime}:=\mathop{\min }\limits_{{\bf{y}}\in {\mathcal{Y}}}{{{\Delta }}}_{r}{G}_{j}^{\prime},\ j=1\cdot \cdot m,$$
(79)

where \({\mathcal{Y}}\) is given by Eqs. (47) and (48). The upper bounds are similarly determined. Note that, because the CSS is generally nonconvex and can be disconnected, these global bounds only provide intervals, into which feasible concentrations can fall. For example, when \({\mathcal{Y}}\) is disconnected, \({y}_{i}\in [{y}_{i,\min },{y}_{i,\max }]\) does not guarantee that \(y\in {\mathcal{Y}}\). Accordingly, \({y}_{i}\in [{y}_{i,\min },{y}_{i,\max }]\) provides a necessary, but not sufficient, condition for \(y\in {\mathcal{Y}}\). Therefore, the feasible concentration ranges defined by global concentration bounds should not be interpreted as intervals, in which the concentration of individual metabolites can freely vary independently of other metabolites.

In the concentration space, the feasible concentration ranges can span several orders of magnitude possibly over a disconnected CSS. Therefore, it is more convenient to perform global optimization computations in the Y space. Second, we compute expectations and standard deviations of concentrations and reaction energies. Given a random variable \(X:{\mathcal{C}}\,\mapsto\, {\mathbb{R}}\) defined over the CSS, the expectation and standard deviation of X are defined as

$${\mathbb{E}}(X):={\int}_{{\mathcal{C}}}X{\rm{d}}\mu ,$$
(80)
$$\sigma (X):=\sqrt{{\mathbb{E}}({(X-\hat{X})}^{2})},$$
(81)

where \(\hat{X}:={\mathbb{E}}(X)\), and μ is a probability measure of X, encapsulating all the intracellular dynamics and the related parameter uncertainties (e.g., kinetic parameters, protein structure, enzyme saturation, and spacial gradients). This probability measure, which is expected to be condition specific, is poorly understood in biological systems. Without any information about the probability measure for the organism of interest, the most natural choice for μ is the volume measure. However, as previously stated, \({\mathcal{C}}\) is generally a nonconvex and disconnected set, rendering volume integrals difficult to compute. Therefore, we further simplify our analysis by using a line measure to approximate expectations and standard deviations. We compute these line integrals along random curves generated in the equality-constraint manifold (Supplementary Fig. S5a, b). Since all these computations are performed in the Y space, we first describe briefly the mathematical formulation of the CSS in this space.

Let \({\mathcal{M}}\) denote the abstract (n − )-dimensional manifold in the Y space defined by the linear equalities in Eq. (43) and \(\bar{{\mathcal{M}}}\) its isometric embedding in \({{\mathbb{R}}}^{n}\) furnished by the embedding map \(\iota :{\mathcal{M}}\,\hookrightarrow\, Y\) (Supplementary Fig. S5c). We chose this particular embedding to simplify the construction and computation of the foregoing line integrals. Moreover, let \({\bf{{\mathscr{A}}}}:={{\bf{A}}}^{{\rm{T}}}\), \({\bf{b}}:={({{\bf{f}}}^{{\rm{eq}}})}^{{\rm{T}}}\), and

$${{\mathcal{F}}}^{i}(y):=\exp {y}^{j}{{\mathscr{A}}}_{j}^{i}-{b}^{i}.$$
(82)

Note that, in previous sections, we did not distinguish between subscripts and superscripts to denote the components of vectors and covectors. Vectors were treated as n-tuples, the components of which were represented by subscripts (column vector in matrix form) without reference to the space they were associated with. However, in this section, the components of vectors and covector are denoted by superscripts and subscripts, which are represented by a row and column vectors in matrix form, respectively. The same convention applies in general to covariant and contravariant components of tensors. We also adopt the Einstein summation rule, where repeating a dummy index as a superscript and subscript implies summation over an appropriate range of the index. Assuming that A has a full rank (i.e., rank(A) =  with  < n), we have

$$\bar{{\mathcal{M}}}:=\iota ({\mathcal{M}})=\{{\bf{y}}\in {{\mathbb{R}}}^{n}:{\mathcal{F}}({\bf{y}})=0\}.$$
(83)

The tangent space at a point y on \(\bar{{\mathcal{M}}}\) is given by \({T}_{{\bf{y}}}\bar{{\mathcal{M}}}\) = ker \((D{\mathcal{F}})\)56. In matrix form, the derivative of \({\mathcal{F}}\) can be written \(D{\mathcal{F}}={\bf{A}}{\bf{E}}\), where \({\bf{E}}:={\rm{diag}}(\exp {\bf{y}})\). Explicit expressions can be derived to construct coordinate charts φ for \({\mathcal{M}}\) using null-space bases of A. Let \({\bf{N}}\in {{\mathbb{R}}}^{n\times (n-\ell )}\) be a matrix containing an orthonormal basis of ker (A). Then, \({{\Phi }}:{\boldsymbol{\chi }}\,\mapsto\, {\mathrm{ln}}\,({\bf{x}}+{\boldsymbol{\chi }}{{\bf{N}}}^{{\rm{T}}})\) with \({\boldsymbol{\chi }}\in {{\mathbb{R}}}^{n-\ell }\) contains information about the coordinate chart φ at the point \({\bf{y}}={\mathrm{ln}}\,{\bf{x}}\) because Φ = ι φ−1. The components of χ can be regarded as the coordinates of φ at the point y. We can now define an induced metric57 for \({\mathcal{M}}\) according to

$${g}_{kl}: ={g}_{q}\left(\frac{\partial }{\partial {\chi }^{k}},\frac{\partial }{\partial {\chi }^{l}}\right)= {\bar{g}}_{y}\left(\iota_{\ast}\left(\frac{\partial }{\partial {\chi }^{k}}\right),\iota_{\ast} \left(\frac{\partial }{\partial {\chi }^{l}}\right)\right)\\ ={\bar{g}}_{y}\left({{\mathcal{G}}}_{k}^{i}\frac{\partial }{\partial {y}^{i}},{{\mathcal{G}}}_{l}^{j}\frac{\partial }{\partial {y}^{j}}\right)={{\mathcal{G}}}_{k}^{i}{{\mathcal{G}}}_{l}^{j}{\delta }_{ij},$$
(84)

where δ is the Kronecker delta tensor, y = ι(q), and \({\bar{g}}_{ij}:={\bar{g}}_{y}(\partial /\partial {y}^{i},\partial /\partial {y}^{j})={\delta }_{ij}\) is the standard Euclidean metric that \(Y\equiv {{\mathbb{R}}}^{n}\) is equipped with. Here, ι* is the derivative of ι, furnishing \({T}_{{\bf{y}}}\bar{{\mathcal{M}}}{ = \iota }_{* }({T}_{q}{\mathcal{M}})\), and \({\mathcal{G}}:=D{{{\Phi }}}^{{\rm{T}}}={{\bf{N}}}^{{\rm{T}}}{{\bf{E}}}^{-1}\) in matrix form.

Before generating random curves in \(\bar{{\mathcal{M}}}\), we first require a feasible point of the CSS, preferably far away from all its boundaries, in the Y space. We generate a one-parameter family of interior points by solving

$$\left\{\begin{array}{l}{{\bf{y}}}_{q}(w):={\rm{arg}}\ \mathop{\max}\limits_{r,{\bf{y}}}\ r-w\parallel {\bf{y}}{\parallel }_{2},\hfill\\ {\rm{s.t.}}\hfill\\ \exp {\bf{y}}{\bf{{\mathscr{A}}}}={\bf{b}},\hfill\\ {\bf{y}}{{\bf{s}}}_{(j)}+r\parallel {{\bf{s}}}_{(j)}{\parallel }_{2}\le {f}_{j}^{{\rm{ineq}}},\ j=1\cdot \cdot m,\end{array}\right.$$
(85)

where s(j) denotes the jth column of Sc. Note that, to derive Eq. (85), we solved the second equation in the system Eq. (47) for \({C}_{{\mathrm{t}},{\mathrm{c}}}^{* }\), the value of which is substituted in feq and fineq, so it is not explicitly included in the system Eq. (85). Observe that, yq(0) is the Chebyshev center (ref. 58, Sect. 8.5.1) of the polyhedron defined by the thermodynamic constraints in the Y space that is restricted to \(\bar{{\mathcal{M}}}\). For w > 0, the objective function identifies an interior point of this polyhedron that is maximally distanced from all the thermodynamic constraints with respect to the second norm in the Y space, while avoiding arbitrarily large negative values for y. This provides biologically relevant points inside the CSS.

Several approaches can be taken to generate random curves in \(\bar{{\mathcal{M}}}\) from y. Here, we outline a computationally tractable technique. First, a vector u = uk∂/∂χk is generated in \({T}_{q}{\mathcal{M}}\) using the basis ∂/∂χk associated with the coordinate chart φ discussed before, where the components uk [−1, 1] are random numbers. The embedding of u in \({T}_{{\bf{y}}}\bar{{\mathcal{M}}}\) can be readily obtained according to \(\bar{{\bf{u}}}{ = \iota }_{* }({\bf{u}})={\bar{u}}^{i}\partial /\partial {y}^{i}\), where \({\bar{u}}^{i}={u}^{k}{{\mathcal{G}}}_{k}^{i}\). Next, a line \(\bar{{\mathcal{L}}}\) is constructed in \({T}_{{\bf{y}}}\bar{{\mathcal{M}}}\) that passes through y along the tangent vector \(\bar{{\bf{u}}}\) with the parametric representation \(\bar{{\mathcal{L}}}:=\{{\bf{y}}^{\prime} \in {{\mathbb{R}}}^{n}:{\bf{y}}^{\prime} ={\bf{y}}+\bar{{\bf{u}}}t,t\in [0,{t}_{\max }]\}\). Finally, a trajectory \(\bar{{\mathcal{T}}}\) is constructed in \(\bar{{\mathcal{M}}}\) by finding the orthogonal projection of \(\bar{{\mathcal{L}}}\) onto \(\bar{{\mathcal{M}}}\). This is accomplished by solving

$$\left\{\begin{array}{l}{\bf{y}}^{\prime} (t):={\rm{arg}}\ \mathop{\min}\limits_{{\bf{y}}^{\prime\prime} \in {{\mathbb{R}}}^{n}}\ \frac{1}{2}\parallel {\bf{y}}^{\prime\prime} -{\bf{y}}-\bar{{\bf{u}}}t{\parallel }_{2},\\ {\rm{s.t.}}\hfill\\ {\bf{A}}\exp {{\bf{y}}}^{^{\prime\prime} {\rm{T}}}={{\bf{b}}}^{{\rm{T}}}\hfill\\ \end{array}\right.$$
(86)

with

$$\bar{{\mathcal{T}}}:=\{{\bf{y}}^{\prime} \in {{\mathbb{R}}}^{n}:{\bf{y}}^{\prime} \,{\rm{solves}}\ {\rm{Eq}}.\;(86),t\in [0,{t}_{\max }]\}.$$
(87)

The Karush–Kuhn–Tucker (KKT) conditions (ref. 58, Sect. 5.5.3) for Eq. (86) reads

$$\left\{\begin{array}{l}{\bf{y}}^{\prime\prime} +{{\boldsymbol{\lambda }}}^{{\rm{T}}}{\bf{A}}{\bf{E}}={\bf{y}}+\bar{{\bf{u}}}t,\\ {\bf{A}}\exp {{\bf{y}}}^{^{\prime\prime} {\rm{T}}}={{\bf{b}}}^{{\rm{T}}},\hfill\\ \end{array}\right.$$
(88)

where λ is the dual variable corresponding to the equalities in Eq. (86). It is also a column vector associated with the co-normal space \({N}_{{\bf{y}}}^{* }\bar{{\mathcal{M}}}\). Moreover, b is associated with the normal space \({N}_{{\bf{y}}}\bar{{\mathcal{M}}}\) and is a row vector. All the other vectors are associated with \({T}_{{\bf{y}}}\bar{{\mathcal{M}}}\) and are row vectors. Differentiating the KKT system Eq. (88) with respect to t, we arrive at

$$\left\{\begin{array}{l}\dot{{\bf{y}}}^{\prime\prime} +{\dot{{\boldsymbol{\lambda }}}}^{{\rm{T}}}{\bf{A}}{\bf{E}}+{{\boldsymbol{\lambda }}}^{{\rm{T}}}{\bf{A}}{\mathcal{E}}{\dot{{\bf{y}}}}^{^{\prime\prime} {\rm{T}}}=\bar{{\bf{u}}},\\ {\bf{A}}{\bf{E}}{\dot{{\bf{y}}}}^{^{\prime\prime} {\rm{T}}}={\bf{0}},\hfill\\ \end{array}\right.$$
(89)

where overdot denotes derivatives with respect to t, and \({\mathcal{E}}\in {{\mathbb{R}}}^{n\times n\times n}\) is defined as

$${{\mathcal{E}}}_{k}^{ij}:=\left\{\begin{array}{ll}\exp {y}^{^{\prime\prime} i},\ &i=j=k,\\ 0,\hfill\ &{\rm{Otherwise}}.\end{array}\right.$$
(90)

Equation (89) is integrated with respect to t subject to the initial conditions y″(0) = y and λ(0) = 0 until one of the thermodynamic constraints is violated. Note that, the constraint y ≤ 0 need not be checked directly because one of the equality constraints in Eq. (89) forces the sum of the metabolite mole fractions to be strictly less than one. As a result, yi = 0 can never be realized for any metabolites along trajectories constructed by solving Eq. (89). We formulated the global optimization problems Eqs. (78), (79), and (85) in the General Algebraic Modeling System (GAMS)59 and solved using the global solver BARON60 equipped with the local nonlinear programming solver CONOPT and the linear programming solver CPLEX. Equation (89) was solved using standard time integrators in MATLAB. All the custom codes used to general the results presented in this paper can be found in Supplementary Code 1.

Once random trajectories \({\{{\bar{\mathcal{T}}}_{i}\}}_{1}^{{n}_{{\rm{traj}}}}\) are constructed, the expectation of a random variable X, defined by Eq. (80), can be approximated

$${\mathbb{E}}(X)\approx \frac{\mathop{\sum }\limits_{i=1}^{{n}_{{\rm{traj}}}}{\int}_{{\bar{\mathcal{T}}}_{i}}X{\rm{d}}l}{\mathop{\sum }\limits_{i=1}^{{n}_{{\rm{traj}}}}{\int}_{{\bar{\mathcal{T}}}_{i}}{\rm{d}}l}$$
(91)

with the line measure \({\rm{d}}l=\sqrt{\left\langle \dot{{\bf{x}}}^{\prime} ,\dot{{\bf{x}}}^{\prime} \right\rangle }{\rm{d}}t=\sqrt{\left\langle \dot{{\bf{y}}}^{\prime} {\bf{E}},\dot{{\bf{y}}}^{\prime} {\bf{E}}\right\rangle }{\rm{d}}t=\sqrt{\dot{{\bf{y}}}^{\prime} {{\bf{E}}}^{2}{\dot{{\bf{y}}}}^{^{\prime} {\rm{T}}}}{\rm{d}}t\). The standard deviation of X is similarly approximated. The violation probability of thermodynamic constraint associated with reaction j is given by

$${P}_{j}\approx \frac{{n}_{{\rm{viol}},j}}{{n}_{{\rm{traj}}}},\ j=1\cdot \cdot m,$$
(92)

where nviol,j is the number of trajectories intersecting the thermodynamic constraint of reaction j, and ntraj the total number of randomly generated trajectories. Expectations, standard deviations, and violation probabilities in Fig. 2 are estimated by generating 50,000 trajectories in the CSS.

Known osmoregulation mechanisms

Under physiological conditions, there is a balance between reactions donating negative charge to (Fig. 4b, orange arrows) and accepting negative charge from (Fig. 4b, brown arrows) the cytoplasmic fluid, furnishing a stable electroneutral buffer for cellular metabolism. For example, in response to osmotic stress, E. coli imports osmoprotectants (e.g., glycine betaine and trehalose) or K+ from the extracellular environment to modulate its intracellular osmolarity, keeping a fixed osmotic pressure across the cell membrane61. The imbalance caused by these charged solutes can be alleviated by upregulating reactions producing counterions or downregulating those consuming counterions.

To maintain electroneutrality, the cell has many degrees of freedom, as to which reactions to upregulate and which ones to downregulate. Importantly, the robustness of E. coli to various environmental stresses suggests that the transcriptional regulatory network performing osmoregulation has evolved to sustain homeostasis with respect to other simultaneous disturbances (e.g., carbon-source changes and pH stress), minimally impacting pathways performing other critical cellular functions.

Many genes in the E. coli genome are involved in osmoregulation. Primary transcriptional regulatory targets include potassium transporters (e.g., trkA and kdpA)27,61, osmoprotectant transporters (e.g., proV)27, carbon-source transporters (e.g., crr and ptsG)31, sodium transporters (e.g., nhaA and nhaB)26, glutamate transporter (e.g., gadC and gltI)28, and the ETC (e.g., atpE and cyoD)29. Secondary responses affect several intracellular pathways, such as amino-acid biosynthesis, osmoprotectant biosynthesis, central-carbon metabolism, and nucleotide biosynthesis31.

Glutamate role in osmoregulation

To elucidate the glutamate-accumulation mechanism by which electroneutrality and pH homeostasis are achieved, we compared the intrinsic hydrogen and charge consumption (see Eqs. (37) and (36)) of the impacted pathways. Interestingly, all major glutamate-biosynthesis pathways (i.e., GLUDy, GLUSy, and GLUDC) simultaneously consume hydrogen ion and produce negative charge with similarly high intrinsic values (ΔrNH ≈ 1 mol-H/mol-rxn and ΔrZ ≈ −1 mol-e/mol-rxn), whereas glutamate antiporters only produce negative charge (ΔrNH ≈ 0 mol-H/mol-rxn and ΔrZ ≈ −1 mol-e/mol-rxn). These features highlight the importance of glutamate antiporters during phase I when the cytoplasmic solution is alkalinized, allowing the cell to rapidly accumulate glutamate and potassium from the extracellular environment, so as to maintain electroneutrality without a significant disruption to pH homeostasis. Therefore, exporting excess glutamate, possibly through mechanosensitive channels (e.g., mscL)26, when the cell is under no stress could be part of a robust regulatory network in bacteria, enabling them to create an extracellular glutamate pool, which can be used to flexibly handle various stress conditions causing ion imbalances.

Biochemical reactions can affect the cytoplasmic charge in two different ways, both of which are captured by our definition of ΔrZ (see “Abiotic constraints” section). Recall, ΔrZ reflects the amount of charge exchange between metabolites and the cytoplasmic fluid for intracellular reactions, while it measures the net charge transfer from the periplasm to the cytoplasm for transport reactions. If the overall cytoplasmic electroneutrality is regarded as a more essential osmoregulation objective than balanced intracellular charge exchanges, then it follows from our analysis that glutamate antiporters play a more significant role in glutamate accumulation during osmoregulation than glutamate-biosynthesis pathways—a conclusion supported by differential expression analyses of RNA-sequencing data.

Flux distribution

To compute extrinsic reaction charge, hydrogen, and magnesium consumptions, the flux distribution across the network is required. We determine the flux state of the reduced network during the exponential growth phase using the genome-scale reconstruction model iML151515. To reliably estimate the metabolic fluxes, we identify the parsimonious solution62 of the resulting FBA model that best matches experimentally measured fluxes14. The parsimonious solution, furnished by minimizing a norm of the flux vector, postulates that the optimal flux state minimizes the total protein cost of the network, assuming that all biochemical reactions have the same protein cost. This objective has been shown to be associated with the flux state of several organisms under various conditions17,38, and, therefore, is adopted in our study. We formulate the FBA model as a bi-level optimization problem, incorporating the foregoing objectives. To simplify our analysis, we performed several numerical experiments and found that the upper-level objective, measuring the distance between the experimental and predicted fluxes, has the largest sensitivity to the upper bounds on the GLGC and NDPK1 fluxes in our reduced network. Accordingly, we determine the flux state by solving

$$\left\{\begin{array}{l}{{\bf{v}}}_{0}^{* }:={\rm{arg}}\ \mathop{\mathrm{min}}\limits_{{\boldsymbol{\theta }},{\bf{v}}}\mathop{\sum}\limits_{j\in {{\mathcal{J}}}^{{\rm{ex}}}}{\left(\frac{{v}_{j}-{v}_{j}^{{\rm{ex}}}}{{\sigma }_{j}^{{\rm{ex}}}}\right)}^{2},\\ {\rm{s.t.}}\hfill\\ {\bf{v}}\in {v}_{1}^{* }({\boldsymbol{\theta }}),\hfill\\ \left\{\begin{array}{l}{v}_{1}^{* }({\boldsymbol{\theta }}):={\rm{arg}}\ \mathop{\mathrm{min}}\limits_{{\bf{v}}}\parallel {\bf{v}}{\parallel }_{2},\\ {\rm{s.t.}}\hfill\\ {\bf{S}}{\bf{v}}={\bf{0}},\hfill\\ {{\bf{v}}}^{{\rm{lb}}}\le {\bf{v}}\le {{\bf{v}}}^{{\rm{ub}}}({\boldsymbol{\theta }}),\hfill\end{array}\right.\end{array}\right.$$
(93)

where S is the stoichiometric matrix of iML1515, \({{\mathcal{J}}}^{{\rm{ex}}}\) the index set of reactions with experimentally measured flux, \({\sigma }_{j}^{{\rm{ex}}}\) the standard deviation reported for the measurement of flux through reaction j, vlb and vub the lower and upper bounds on the flux vector, and θ a vector containing the upper bounds of GLGC and NDPK1. These upper bounds, which are decision variables in the upper level, are linearly expressed in the respective inequality constraints of the lower level. Thus, the lower-level problem of Eq. (93) is a convex quadratic programming problem, which we solve using the quadratic programing solvers of CPLEX. We solve the upper-level problem using gradient-descent methods.

To evaluate the impact of the confidence intervals \({\sigma }_{j}^{{\rm{ex}}}\) on the solutions of Eq. (93), we performed flux computations with and without accounting for \({\sigma }_{j}^{{\rm{ex}}}\) for all four carbons sources and compared the results. Overall, incorporating the confidence intervals into computations did not significantly affect flux values or their directions. However, the flux distribution provided by Eq. (93) for the case where confidence intervals were taken into account generally agreed better (except when pyruvate was the carbon source) with experimental measurements (Supplementary Data 5).

Granted, the procedure that we introduced in this section cannot generally guarantee the correct direction of reactions, for which experimental flux data are unavailable. However, given the size of the reduced network and the number of reactions with flux data, it is unlikely that the enzymatic reactions associated with the reduced network proceed in the opposite direction of their canonical directions determined from the solutions of Eq. (93). Accordingly, the results presented in this work are not expected to be significantly influenced by flux-direction uncertainty.

Role of phosphate transporters

We compared the upper bound on the cytoplasmic-phosphate concentration in mutants, where either the Pit or Pst system is active (Supplementary Fig. S6). We first note that the upper bound on the cytoplasmic-phosphate concentration furnished by the ABC-bases analysis and TMFA coincide at low and high periplasmic concentrations when Pit is active. In contrast, when Pst is active, the upper bound provided by the ABC-based approach is more restrictive than TMFA. This implies that thermodynamics places a more restrictive constraint on the Pit system than the Pst system. On the other hand, we observe from Supplementary Fig. S6 that phosphate import through the Pit system cannot maintain the cytoplasmic-phosphate concentration at a level that is essential for the growth of E. coli. These results indicate that the transition from the Pit to Pst systems with decreasing \([\![ {\rm{pi}}]\!]_{\mathrm{p}}\) is induced by the Pit system becoming thermodynamically infeasible in phosphate-limited environments.

Glycolysis and gluconeogenesis in pH homeostasis

In the main text, we highlighted an important feature of glycolysis with regards to pH homeostasis. Namely, it produces hydrogen ion at a much lower rate than the ETC, whether it operates in forward direction or reverse (due carbon-source change). Here, we examine this feature in more detail.

We computed the flux state of the reduced network (see “Flux distribution” section) using glucose or pyruvate as the sole carbon source. When the cell grows on glucose, glycolytic reactions are active. However, when the carbon source is changed to pyruvate, glycolysis is reversed and gluconeogenetic reactions become active. We used the computed flux distributions to estimate the extrinsic hydrogen production of glycolysis and gluconeogenesis and found that the rate of hydrogen production remains almost unchanged when the carbon source is altered (Supplementary Fig. S7). We found that gluconeogenesis produces hydrogen ion at a slightly higher rate (ΔrNH = −29.08 mmol-H/gDW/h) than glycolysis (ΔrNH = −21.14 mmol-H/gDW/h). This may be attributed to lactate dehydrogenase and lactate exporters, which are active only when pyruvate is the carbon source. These reactions consume hydrogen ion, so they offset the slightly larger hydrogen-ion production of gluconeogenesis.

Our results suggest that the evolution of glycolytic and gluconeogenetic pathways enables E. coli to robustly restore pH homeostasis when it is simultaneously subject to carbon-source-starvation and acid stress.

Pathway classification

We classified reactants and reactions based on the pathway they participate in to contrast the energetics and metabolite distribution across major pathways in the reduced network. The abbreviations used to label these pathways in Supplementary Figs. S2 and S3 are defined in Supplementary Table S3.

Consistency of the ABCs with metabolomic data

To quantify the consistency of the ABCs with metabolomic data, we identified the closets point inside the CSS to measured intracellular concentrations14 when the CSS is defined by (i) thermodynamic constraints alone and (ii) all the ABCs in Eqs. (26)–(31). As expected, the closest point inside the CSS in the first case is better correlated with experimental data than the second case (Supplementary Fig. S12). This is a consequence of the ABC-based analysis being more restrictive than TMFA because there is more freedom in the feasible set of the optimization problem, in which to search for the best fit.

Confidence intervals for Gibbs energy of reactions

To impose the thermodynamic constraints given by Eq. (31), we estimated the reference Gibbs energy of formation of reactants using group contribution methods. These formation energies are subject to uncertainties that are bounded by confidence intervals accompanying group-contribution estimates. Confidence intervals for Gibbs energies were not accounted for in Eq. (31). They were also neglected when computing the feasible ranges of metabolite concentrations and  transformed Gibbs energy of reactions in Figs. 2 and 3. Confidence intervals can be incorporated into the formulation of thermodynamic constraints according to

$${{{\Delta }}}_{r}{G}_{j}^{^{\prime} \circ }+RT{\mathrm{ln}}\,{{{\Gamma }}}_{j}+{\delta }_{j}\le 0,\ j=1\cdot \cdot m,$$
(94)
$$-{\delta }_{r}{G}_{j}^{^{\prime} \circ }\le {\delta }_{j}\le {\delta }_{r}{G}_{j}^{^{\prime} \circ },\ j=1\cdot \cdot m.$$
(95)

Here, \({\delta }_{r}{G}_{j}^{^{\prime} \circ }\) is the standard deviation of errors associated with group-contribution computation of the standard transformed Gibbs energy of reaction j10, and δj are auxiliary variables introduced into the ABC-based framework to account for these errors.

Inclusion of confidence intervals in the ABC-based analysis relaxes the feasible ranges of metabolite concentrations and transformed Gibbs energy of reactions. Specifically, it increases the upper bounds and decreases the lower bounds. We repeated the computations of the feasible ranges reported in Figs. 2 and 3 with confidence intervals and observed no significant difference in the upper and lower bounds. Accordingly, all the phenotypic consequences of the ABCs discussed in the main text remained unchanged. For example, the ABC-based analysis predicts three more irreversible reactions (i.e., GLUN, GLUSy, NH4tpp) than TMFA when confidence intervals are included (Supplementary Fig. S13). We also compared the extent, to which the inclusion of confidence intervals could relax the upper and lower bounds provided by the ABC-based analysis and TMFA and found that the predictions of TMFA were more sensitive to confidence intervals than those of the ABC-based analysis (Supplementary Table S4).

Reporting summary

Further information on research design is available in the Nature Research Reporting Summary linked to this article.