ProteInProgress: a cellular assembly line for protein manufacturing




Proteins play an essential role in most life processes, such as biocatalysis, metabolic reactions, cell signaling, immune response, cell adhesion and cell cycle regulation. The enzyme industry is continuously expanding and enables the production of proteins for therapeutic, vaccine and agricultural applications. Genetic and protein engineering play an essential role in these processes, allowing large-scale production of the desired protein and thus meeting the high market demand. Escherichia coli is the most commonly used host for recombinant protein production (about 40% of recombinant proteins are produced in E. coli) because of its rapid growth, its high expression yields, its ease of propagation and DNA manipulation, and its cost-effectiveness for mass production. This bacterium is mainly used to produce proteins that are limited in size and relatively simple in structure: there are still many drawbacks in using E. coli as an expression host for proteins with disulfide bonds, glycosylated proteins or proteins to be secreted (its natural secretion mechanism is still not fully understood).

Yeasts are often used to produce recombinant proteins that are not produced well in E. coli because of problems with size (generally, proteins larger than 100 kDa are expressed in a eukaryotic system, while those smaller than 30 kDa are expressed in a prokaryotic system), folding (protein misfolding in E. coli often leads to the formation of inclusion bodies), glycosylation or secretion (S. cerevisiae can secrete heterologous proteins into the extracellular broth when proper signal sequences have been fused to the recombinant protein). However, thanks to recombinant DNA techniques, E. coli has also been successfully employed for the production of relatively complex proteins, and progress in recent years has widened the use of this organism even further.

Many expression vectors for E. coli are commercially available and widely used to over-express the peptide of interest. A well-designed expression vector is composed of a set of optimally configured genetic elements that affect both transcriptional and translational aspects of protein production:

  • the replicon of the plasmid, containing the origin of replication, which controls the plasmid copy number in the cell,
  • the resistance marker, to allow the selection of the cells expressing the protein,
  • the transcriptional promoter and the ribosome binding site (RBS), controlling the expression level of the protein.
Expression vector for the production of recombinant proteins


In order to achieve high expression levels, genes encoding heterologous proteins are often cloned into vectors carrying the p15A or the ColE1 replicon, which replicate in the cell at medium or high copy number, respectively (from 15–20 to a few hundred copies). When co-overexpression of additional gene products is desired, derivatives of the ColE1 and p15A replicons are often combined. These multiple-copy plasmids are stably replicated and maintained by the cell under selective pressure, and plasmid-free daughter cells are infrequent. In the absence of selective pressure, the plasmid loss frequency remains low (10⁻⁵ to 10⁻⁶ per generation) for medium copy number plasmids, but can increase significantly for high and very high copy number plasmids. The simplest way to exert selective pressure is to provide the plasmid with a sequence encoding a resistance marker. Usually an antibiotic resistance gene is used, such as the bla gene, encoding the beta-lactamase that confers ampicillin resistance, or the cat gene, encoding chloramphenicol resistance. Other well-known selection markers confer kanamycin or tetracycline resistance. This approach is commonly used, but for some applications it is not desirable to add antibiotic to the culture, for the following reasons:

  • cost of antibiotic to be used for medium supplementation;
  • loss of selective pressure as a result of antibiotic degradation or inactivation due to the leakage of detoxifying enzymes into the growth medium;
  • interference of the antibiotic with protein synthesis through binding to critical regions of the ribosome;
  • contamination of the product with antibiotic, not acceptable for therapeutic or medical applications (for many applications, antibiotics are normally removed readily during product purification, but validation of removal is necessary and may be expensive);
  • expression of antibiotic resistance can result in slower growth rate and higher metabolic burden for cells;
  • the retention of antibiotic resistance in the host is potentially unsafe for the environment, especially in high-volume industrial applications.

Many solutions to these drawbacks have been proposed, such as the expression of plasmid-encoded genes that cause cell death upon plasmid loss (e.g. the Toxin-Antitoxin systems).
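
The per-generation loss frequencies quoted above can be turned into a rough estimate of how much of an unselected culture still carries the plasmid. The sketch below is only illustrative: it assumes a constant loss probability per generation and equal growth rates for plasmid-bearing and plasmid-free cells, so it gives an optimistic upper bound (plasmid-free cells often grow faster in practice). The function name and the 10⁻³ value used for a high copy number plasmid are hypothetical and are not taken from the text.

```python
def plasmid_bearing_fraction(loss_per_generation: float, generations: int) -> float:
    """Fraction of cells still carrying the plasmid after a number of generations.

    Assumption (not from the source): every division loses the plasmid with the
    same probability, and plasmid-free cells have no growth advantage.
    """
    if not 0.0 <= loss_per_generation <= 1.0:
        raise ValueError("loss_per_generation must be a probability in [0, 1]")
    if generations < 0:
        raise ValueError("generations must be non-negative")
    # A cell lineage keeps the plasmid only if it is retained at every division.
    return (1.0 - loss_per_generation) ** generations


if __name__ == "__main__":
    # 1e-6 and 1e-5 are the medium copy number range quoted in the text;
    # 1e-3 is a hypothetical value for an unstable high copy number plasmid.
    for loss in (1e-6, 1e-5, 1e-3):
        fraction = plasmid_bearing_fraction(loss, generations=100)
        print(f"loss/generation = {loss:g}: {fraction:.4%} plasmid-bearing after 100 generations")
```

Under these assumptions, a loss frequency in the 10⁻⁵ to 10⁻⁶ range leaves the culture almost entirely plasmid-bearing after 100 generations, whereas a loss frequency a few orders of magnitude higher would visibly erode it.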

A radically different solution to the problem of plasmid loss is the direct insertion of the desired part into the chromosome of E. coli. This approach has been attempted with different strategies, such as using simple delivery vehicles (e.g. bacteriophage lambda) or exploiting the Tn1545 site-specific recombination module to randomly integrate a gene in the host strain, obtaining a library of clones that carry the gene integrated at different positions. These solutions avoid instability problems: chromosomally based expression usually provides genetically stable organisms and acceptable expression levels. However, this strategy may be compromised if essential loci in the chromosome are disrupted. Chromosomal integration also avoids the need to supplement the culture medium with antibiotic. A different approach has been proposed in the literature, based on genome targeting systems that use vectors carrying a conditional-replication origin and a phage attachment (attP) site. These plasmids, named CRIM (Conditional-Replication, Integration and Modular), can be used to construct the desired expression systems, which can then be integrated directly into bacterial chromosomes in single copy. Recently, the essential elements of this system have been refactored into standard parts and improved in order to obtain a more robust method [Anderson JC et al., 2010]. The performance of the CRIM systems has been demonstrated, but a standard, user-friendly solution for the stable integration of the desired genes in the host genome is not yet available.


Promoters are short DNA sequences that allow the transcription of downstream genes. These DNA parts can be catalogued by their transcriptional strength, by their activity or by their regulation mode. The transcriptional strength is a very important parameter to evaluate, because it determines the amount of protein produced. For this reason, much effort in synthetic biology has been devoted to defining standard measurement techniques for evaluating promoter activity.

The kind of regulation is another crucial element to evaluate, because it might affect the yield of the whole process of protein production:

  • constitutive promoters are always active: the transcription of the downstream genes does not depend on any external signal;
  • inducible promoters have an activity that depends on exogenous or endogenous signals, such as the addition of an inducer molecule to the growth medium or exposure to a physical stimulus, which activate transcription that is normally off. A primary requirement in protein manufacturing is that a promoter should be simple and inexpensive to induce.

For protein manufacturing, many expression systems are available, with several regulation mechanisms, to suit every need. Usually, in order to achieve high yields of the produced protein, it is preferable to use inducible expression systems. The culture of cells harbouring the plasmid encoding the desired peptide is grown undisturbed (i.e. the expression of the recombinant protein is off) until it reaches the desired culture density. When the cell population reaches the desired growth phase, production is triggered by the inducer stimulus. This solution may be preferred to constitutive production because it eases the burden on the organisms: the culture can grow without metabolic stress before production starts, and growth rates are not lowered if the produced peptide is toxic to the cell. In these cases, tight regulation is fundamental, because a leaky (non-silent) promoter can cause early overproduction of the heterologous protein, which might impair cell growth. It is therefore desirable to be able to repress the promoter during the initial growth phase, in order to achieve high cell densities, after which high-rate protein production is initiated by induction of the promoter. For these applications, promoter repressor signals are used to keep the promoter off during the initial growth phase (e.g. in the DE3/pLys system, where the T7 phage lysozyme, an inhibitor of T7 RNA polymerase, reduces and almost eliminates the leaky expression of the inducible T7 promoter).

As already explained, expression can be initiated by promoter induction in several ways. For laboratory-scale production, isopropyl beta-D-thiogalactopyranoside (IPTG)-inducible promoters, which are regulated by the product of the lacI gene (the lac repressor), are widely used. A disadvantage of some of these promoters is that they are not completely inactive under non-induced conditions, and thus are not suitable if the target gene product is toxic for the host. Lactose has been shown to be an inexpensive, but somewhat weaker, alternative for induction of the lac promoter in some applications. For large-scale cultivations, either the trp promoter or heat-induced promoters are commonly used. Systems using the Plambda promoter/cI repressor, the Trc promoter, the Tac promoter and hybrid lac/T7 promoters are common.

Self-inducible systems, which trigger gene expression upon carbon or phosphate starvation, are also used. The drawbacks of this kind of system are that it is usually not titratable, the media options are limited, and protein production may be restricted to a time interval in which part of the nutrients is already depleted.

Although these approaches are widely used in laboratories, where a small amount of an expensive inducer is not a problem, obtaining the desired regulation becomes quite expensive when they are scaled up to an industrial process.


Another aspect of recombinant protein production is the purification of the product: obtaining the protein in a highly purified and well-characterized form has become a major task in this field. At the same time, rapid and economical purification of recombinant proteins remains a persistent challenge in biotechnology: purification is generally very costly and can represent up to 80% of the total production costs. A large number of commercial vectors that facilitate single-step purification through different fusion tags have been developed. These systems are based on the fusion of the target protein with affinity tags that allow one-step adsorption purification while minimally affecting the tertiary structure and the biological activity of the protein. Some commonly used tags are, for example, the Polyarginine-tag (Arg-tag), the Polyhistidine-tag (His-tag) and the FLAG-tag. They can be removed easily and specifically to yield the native protein, and can be applied to several different proteins.

The tag can be removed from the protein of interest with a site-specific protease, and cleavage should not affect protein activity. Several drawbacks affect this purification step: the proteases are expensive, the cleavage is not always specific, and many proteolytic cleavage reactions require elevated temperatures, which may affect protein stability or activity. Furthermore, the cleavage is sometimes inefficient because the cleavage site on the fusion protein is inaccessible, and additional chromatographic steps may be required to separate the target protein from the affinity tag and the protease.

Another important drawback affecting the purification process is the high cost of the affinity resins that are typically used in separations.

For all these reasons, many researchers are working on novel purification methods in order to improve the process yield and lower the overall manufacturing cost.
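
To give a feel for why purification is such an attractive target, the cost share quoted above (purification can represent up to 80% of total production costs) can be combined with a hypothetical saving on the purification step. The snippet below is a back-of-the-envelope sketch: the function name and the 25% saving are illustrative assumptions, and 80% is the upper bound mentioned in the text rather than a typical value.

```python
def total_cost_reduction(purification_share: float, purification_saving: float) -> float:
    """Fractional reduction of the total production cost.

    purification_share:  fraction of total cost due to purification (0..1).
    purification_saving: fractional saving achieved on the purification step (0..1).
    Assumes all other cost items stay unchanged (an illustrative simplification).
    """
    for name, value in (("purification_share", purification_share),
                        ("purification_saving", purification_saving)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1")
    # Only the purification part of the total cost shrinks.
    return purification_share * purification_saving


if __name__ == "__main__":
    # 0.80 is the upper bound quoted in the text; 0.25 is a hypothetical saving.
    reduction = total_cost_reduction(purification_share=0.80, purification_saving=0.25)
    print(f"Total production cost reduced by {reduction:.0%}")
```

With these example numbers, a quarter saved on purification alone translates into a fifth of the total production cost.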