Saturday, 22 November 2025

Working with proteins

Proteome

The entire complement of proteins expressed by an organism, cell, or tissue at a particular time is called the proteome. It is far more complex and dynamic than the genome.

The proteome is not static; it is highly dynamic and varies significantly because it represents the actual functional expression of genetic information. The specific set of proteins expressed is constantly changing based on factors like:

· Cell Type (e.g., muscle cell vs. nerve cell)

· Developmental Stage (e.g., embryo vs. adult)

· Environmental Conditions (e.g., $pH$, temperature, or the presence of hormones or nutrients).

Protein Isolation and Purification

Extraction from Cells

Lysis: Breaking open the cell membrane (and cell wall, if present) using physical methods (e.g., sonication, French press, homogenization) or chemical methods (e.g., detergents like Triton X-100).
Centrifugation: Separating the soluble protein fraction (supernatant) from insoluble debris (pellet).
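Centrifugation protocols usually specify relative centrifugal force (RCF, in × g) rather than rotor speed, and the conversion depends on the rotor radius. The sketch below uses the standard conversion $\text{RCF} = 1.118 \times 10^{-5}\, r\, N^2$, with $r$ in cm and $N$ in rpm. The 10 cm radius and 12,000 × g target in the example are hypothetical values.

```python
def rcf_from_rpm(rpm, radius_cm):
    """Relative centrifugal force (x g): RCF = 1.118e-5 * r(cm) * rpm**2."""
    return 1.118e-5 * radius_cm * rpm ** 2


def rpm_from_rcf(rcf, radius_cm):
    """Inverse conversion: rotor speed needed to reach a target RCF."""
    return (rcf / (1.118e-5 * radius_cm)) ** 0.5


# Hypothetical rotor (10 cm radius): speed needed for a 12,000 x g clearing spin.
print(f"{rpm_from_rcf(12000, 10.0):.0f} rpm")
```

Because RCF scales with the radius, the same rpm on a different rotor gives a different g-force. Quoting × g keeps protocols portable.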

Extra Steps for Plant Tissue:

· Grinding tissue to a fine powder in liquid nitrogen, which breaks up the tissue while the low temperature suppresses protease activity.

· Using high-salt buffers or chaotropic agents to release proteins that are tightly bound to the rigid cell wall.

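· Adding phenol-removing agents (e.g., PVPP, polyvinylpolypyrrolidone, which adsorbs phenolic compounds) and/or polyphenol oxidase inhibitors to prevent phenols and their oxidation products from damaging proteins.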

Salting In and Salting Out

Salting In: Proteins have surface charges that, in the absence of salt, can lead to unfavorable protein-protein aggregation and precipitation. At low salt concentrations (e.g., NaCl, KCl), the added ions shield these charges, reducing inter-protein attraction, increasing protein-solvent interaction, and thereby increasing solubility.
Salting Out: At high ionic strength (high salt concentration, typically ammonium sulfate), salt ions compete with proteins for water molecules (the hydration shell). This reduces the water available to solvate the proteins, so hydrophobic protein-protein interactions increase, leading to aggregation and precipitation. Different proteins precipitate at different salt concentrations, so stepwise salt addition can be used to fractionate a protein mixture.
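Salting out is often summarized by the empirical Cohn equation, $\log S = \beta - K_s I$. Here $S$ is solubility, $I$ is ionic strength, and $\beta$ and $K_s$ are protein-specific constants. The equation applies only to the high-salt (salting-out) region. The sketch below computes the ionic strength of ammonium sulfate solutions and estimates where a protein begins to precipitate. The $\beta$ and $K_s$ values are hypothetical. They were chosen only to show that two proteins with different constants precipitate at different salt concentrations.

```python
import math


def ionic_strength(ions):
    """Ionic strength I = 1/2 * sum(c_i * z_i**2) for (molar concentration, charge) pairs."""
    return 0.5 * sum(c * z ** 2 for c, z in ions)


def ammonium_sulfate_ionic_strength(molar):
    """(NH4)2SO4 gives 2 NH4+ (z = +1) and 1 SO4^2- (z = -2) per formula unit, so I = 3c."""
    return ionic_strength([(2 * molar, +1), (molar, -2)])


def cohn_solubility(beta, ks, i):
    """Cohn equation (salting-out region only): log10(S) = beta - Ks * I."""
    return 10 ** (beta - ks * i)


def precipitation_onset(beta, ks, protein_conc):
    """Ionic strength at which predicted solubility falls to the protein's concentration."""
    return (beta - math.log10(protein_conc)) / ks


# Hypothetical constants (S in mg/mL); not measured values.
proteins = {"protein A": (3.0, 1.0), "protein B": (3.0, 0.5)}
for name, (beta, ks) in proteins.items():
    i_onset = precipitation_onset(beta, ks, protein_conc=1.0)
    # Convert ionic strength back to molar ammonium sulfate using I = 3c.
    print(f"{name}: precipitates above ~{i_onset / 3:.1f} M (NH4)2SO4")
```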

Dialysis

A technique to remove small molecules (like salts or detergents) from a protein solution based on size.

Mechanism:

o A protein solution is placed inside a semi-permeable membrane (dialysis bag) with a defined Molecular Weight Cut-Off (MWCO).

o The bag is immersed in a large volume of dialysis buffer (dialysate).

o Small molecules diffuse freely across the membrane down their concentration gradient until equilibrium is reached, while large proteins are retained inside the bag. Repeated changes of the dialysate buffer efficiently remove the small contaminants or exchange the buffer system.
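The effect of repeated buffer changes can be estimated with a simple model. Assume each change reaches full equilibrium, the sample volume stays constant, and the small solute does not bind the protein or membrane. Under these assumptions, each step leaves a fraction $V_{\text{sample}} / (V_{\text{sample}} + V_{\text{dialysate}})$ of the solute in the bag. The volumes and starting salt concentration below are hypothetical.

```python
def residual_fraction(sample_ml, dialysate_ml, changes):
    """Fraction of a freely permeable solute left in the sample after `changes` equilibrations.

    Assumes complete equilibrium at every step, constant sample volume,
    and no binding of the solute to the protein or the membrane.
    """
    per_step = sample_ml / (sample_ml + dialysate_ml)
    return per_step ** changes


# Hypothetical: 10 mL sample in 500 mM NaCl dialysed against 1 L of buffer, three changes.
start_mM = 500.0
for n in range(1, 4):
    remaining = start_mM * residual_fraction(10.0, 1000.0, n)
    print(f"after change {n}: ~{remaining:.3g} mM NaCl")

# Same total buffer (3 L) used in a single step, for comparison.
print(f"single 3 L step: ~{start_mM * residual_fraction(10.0, 3000.0, 1):.3g} mM NaCl")
```

Under these assumptions, three 1 L changes leave about $10^{-6}$ of the salt. A single 3 L step with the same total buffer leaves about $3 \times 10^{-3}$. This is why several changes are preferred over one large volume.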

Protein Chromatography Techniques

Gel Filtration / Size Exclusion Chromatography (SEC)

· Principle: Separation based on hydrodynamic radius (molecular size and shape).

· Stationary Phase: Inert, porous beads (e.g., cross-linked dextran or agarose, like Sephadex or Sepharose) with a defined range of pore sizes.

· Mechanism:

o The total volume of the column is the sum of the void volume (volume outside the beads, $V_o$), the inner volume (volume inside the pores, $V_i$), and the gel matrix volume ($V_g$).

o Large proteins are completely excluded from the pores, travel only through the void volume ($V_o$), and elute first (they have the smallest elution volume, $V_e \approx V_o$).

o Small proteins can fully enter the pores, travel the longest path ($V_e \approx V_o + V_i$), and elute last.

o Proteins of intermediate size are partially excluded.

· Key Application: Determining the molecular weight of a native protein (by comparing $V_e$ to known standards) and separating proteins from small molecules (like salts/dyes).

· Elution Volume ($V_e$): The volume of mobile phase required to elute a specific protein. Within the fractionation range of the column, $\log(\text{MW})$ decreases approximately linearly with $V_e$.
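Columns are usually calibrated with standards of known MW. A common approach expresses elution as $K_{av} = (V_e - V_o)/(V_t - V_o)$, where $V_t$ is the total column volume ($V_o + V_i + V_g$). A straight line of $\log(\text{MW})$ against $K_{av}$ is then fitted. The sketch below does an ordinary least-squares fit in plain Python and reads off an unknown. The standards, $V_o$, and $V_t$ in the example are hypothetical numbers, and the function names are illustrative.

```python
import math


def k_av(ve, vo, vt):
    """Partition coefficient K_av = (Ve - Vo) / (Vt - Vo)."""
    return (ve - vo) / (vt - vo)


def fit_line(xs, ys):
    """Ordinary least-squares fit y = slope * x + intercept."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, my - slope * mx


def calibrate(standards, vo, vt):
    """standards: list of (Ve in mL, MW in Da). Returns a function Ve -> estimated MW."""
    xs = [k_av(ve, vo, vt) for ve, _ in standards]
    ys = [math.log10(mw) for _, mw in standards]
    slope, intercept = fit_line(xs, ys)
    return lambda ve: 10 ** (slope * k_av(ve, vo, vt) + intercept)


# Hypothetical calibration run: three standards on a column with Vo = 7.5 mL, Vt = 22.5 mL.
estimate_mw = calibrate([(10.0, 1e5), (15.0, 1e4), (20.0, 1e3)], vo=7.5, vt=22.5)
print(f"unknown eluting at 12.5 mL: ~{estimate_mw(12.5):,.0f} Da")
```

The estimate is only valid inside the fractionation range covered by the standards. It also assumes the unknown has a similar shape (e.g., globular) to the standards, since SEC actually separates by hydrodynamic radius.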

Ion Exchange Chromatography (IEX)

· Principle: Separation based on the net electrical charge of the protein, which is determined by the buffer $pH$ relative to the protein's isoelectric point ($pI$).

· Stationary Phase: An insoluble polymer matrix with covalently attached charged functional groups (the ion exchanger).

· Mechanism: The binding of the protein to the column is an electrostatic interaction (ionic bond).

o Anion Exchanger (e.g., DEAE-cellulose): Has a positive charge; binds negatively charged proteins. Used when the $pH$ of the buffer is $> pI$ (protein is anionic).

o Cation Exchanger (e.g., CM-cellulose): Has a negative charge; binds positively charged proteins. Used when the $pH$ of the buffer is $< pI$ (protein is cationic).

· Elution: Proteins are released (eluted) by disrupting the electrostatic bond, typically by:

o Increasing the salt concentration ($NaCl$ or $KCl$): salt ions compete with the bound protein for the charged groups on the resin. Proteins that bind most weakly (lowest net charge of the relevant sign) elute first.

o Changing the $pH$ of the buffer to alter the net charge of the protein or the resin.
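The choice between an anion and a cation exchanger depends on whether the buffer $pH$ is above or below the $pI$. The sketch below estimates net charge with the Henderson-Hasselbalch equation, finds the $pI$ by bisection, and applies that rule. The pKa values are approximate textbook figures (an assumption: published tables differ by a few tenths of a unit). Real proteins also deviate because of local environment effects, so treat the result as a rough guide.

```python
# Approximate textbook pKa values; tables differ slightly between sources.
POSITIVE_PKA = {"N_term": 9.0, "K": 10.5, "R": 12.5, "H": 6.0}
NEGATIVE_PKA = {"C_term": 2.0, "D": 3.9, "E": 4.1, "C": 8.3, "Y": 10.1}


def net_charge(seq, ph):
    """Henderson-Hasselbalch estimate of net charge for a one-letter sequence at a given pH."""
    counts = {aa: seq.count(aa) for aa in "KRHDECY"}
    counts["N_term"] = counts["C_term"] = 1
    positive = sum(counts[g] / (1 + 10 ** (ph - pka)) for g, pka in POSITIVE_PKA.items())
    negative = sum(counts[g] / (1 + 10 ** (pka - ph)) for g, pka in NEGATIVE_PKA.items())
    return positive - negative


def isoelectric_point(seq, lo=0.0, hi=14.0, tol=1e-4):
    """Bisection on pH: the net charge falls monotonically as pH rises."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if net_charge(seq, mid) > 0:
            lo = mid  # still net positive, so the pI lies at higher pH
        else:
            hi = mid
    return (lo + hi) / 2


def choose_exchanger(seq, buffer_ph):
    """Apply the pH-versus-pI rule for choosing an ion exchanger."""
    pi = isoelectric_point(seq)
    if buffer_ph > pi:
        return "anion exchanger (e.g., DEAE): protein is net negative"
    if buffer_ph < pi:
        return "cation exchanger (e.g., CM): protein is net positive"
    return "protein carries no net charge at this pH"


# Hypothetical acidic peptide in a pH 7.4 buffer.
print(choose_exchanger("DEEDAKGE", 7.4))
```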

Affinity Chromatography

· Principle: Separation based on biological specificity, i.e., specific, reversible, non-covalent binding between the protein of interest and a specialized ligand.

· Stationary Phase: An insoluble matrix to which the ligand (e.g., substrate analog, inhibitor, antibody, metal ion) is covalently attached.

· Mechanism:

o Loading and Washing: Only the target protein binds specifically to the immobilized ligand. All other non-binding proteins are washed away.

o Elution: The target protein is released by methods that disrupt the specific protein-ligand interaction:

§ Competitive Elution: Adding a high concentration of the free ligand in the mobile phase, which competes for the protein's binding site.

§ Non-Specific Elution: Changing the $pH$ or ionic strength (e.g., high salt) to destabilize the binding.

· Key Example: IMAC (Immobilized Metal Affinity Chromatography): Used for His-tagged proteins. The tag binds to immobilized metal ions ($\text{Ni}^{2+}$ or $\text{Co}^{2+}$). Elution is done with high concentrations of imidazole, which competes with the histidine side chains (whose imidazole rings coordinate the metal) for the metal ions.
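Competitive elution with free ligand can be pictured with a minimal single-site equilibrium model, which is an assumption made for illustration. The target is distributed among three states, each weighted relative to the free target:
- free in solution (weight 1),
- bound to the immobilized ligand (weight $[L]/K_d$),
- bound to the free competitor (weight $[C]/K_c$).

All ratios in the example are hypothetical. In IMAC, imidazole competes for the metal sites rather than for the protein, so the formula differs, but the qualitative picture (enough competitor releases the target) is the same.

```python
def fraction_on_resin(ligand_over_kd, competitor_over_kc=0.0):
    """Fraction of target held by the immobilized ligand in a single-site competitive model.

    ligand_over_kd: effective immobilized-ligand concentration / Kd(target, ligand)
    competitor_over_kc: free competitor concentration / Kd(target, competitor)
    """
    return ligand_over_kd / (1.0 + ligand_over_kd + competitor_over_kc)


# Hypothetical ratios: strong capture, then elution with increasing free competitor.
for c in (0.0, 100.0, 1000.0, 10000.0):
    print(f"competitor/Kc = {c:>7}: {fraction_on_resin(99.0, c):.3f} still bound")
```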

High-Performance Liquid Chromatography (HPLC)

· Description: An advanced, highly precise form of column chromatography utilizing high pressure to pump the mobile phase through densely packed columns.

· Key Characteristics:

o Finer Stationary Phase: Uses very small, uniform particles (typically 3–5 µm), which significantly increases the surface area and separation efficiency.

o High Pressure: Requires high-pressure pumps (typically operating at several thousand psi) to overcome the flow resistance caused by the tightly packed column.

o High Resolution: Provides superior separation quality and narrower peaks.

o Fast Separation: Enables quick analysis due to rapid flow and high efficiency.

· Application: Often used for analytical protein and peptide separation, particularly in Reverse-Phase HPLC (RP-HPLC), where peptides are separated based on their hydrophobicity using a non-polar stationary phase and a polar-to-non-polar solvent gradient.
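"High resolution" and "narrower peaks" can be made quantitative with two standard chromatography figures of merit:
- resolution $R_s = 2(t_2 - t_1)/(w_1 + w_2)$,
- plate number $N = 16(t_R/w)^2$,

where $t$ are retention times and $w$ are baseline peak widths. The sketch compares a broad-peak and a sharp-peak separation of the same two compounds. The retention times and widths are hypothetical.

```python
def resolution(t1, t2, w1, w2):
    """Chromatographic resolution Rs = 2 (t2 - t1) / (w1 + w2), using baseline peak widths."""
    return 2.0 * (t2 - t1) / (w1 + w2)


def plate_number(t_r, w_base):
    """Column efficiency N = 16 (tR / w)^2, with w the baseline peak width."""
    return 16.0 * (t_r / w_base) ** 2


# Hypothetical peptides eluting 0.6 min apart, with broad versus sharp peaks.
for width in (1.0, 0.3):
    rs = resolution(10.0, 10.6, width, width)
    print(f"peak width {width} min: Rs = {rs:.2f}, N = {plate_number(10.0, width):.0f}")
```

With identical retention times, narrowing the peaks alone raises $R_s$ from below 1 to about 2. An $R_s$ of about 1.5 is commonly taken as baseline separation.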

Monday, 17 November 2025

Protein Folding

Protein Folding: Sequence, Structure, and Pathway

The fundamental question of how the amino acid sequence of a protein determines its unique three-dimensional structure (conformation) is central to biochemistry. The native structure is the most thermodynamically stable form.

1. Determining Secondary Structure

Secondary structures are local, regularly repeating conformations like the α-helix and β-strand.

Amino Acid Preferences: Certain amino acid residues have preferences for specific secondary structures, revealed by their frequency of occurrence in known proteins.

α-Helix Promoters: Alanine, Glutamate, Leucine.
β-Strand Promoters: Valine, Isoleucine.
Turn/Loop Promoters: Glycine, Asparagine, Proline.

Steric and Structural Constraints: These preferences are based on the compatibility of the residue's side chain with the local geometry:

Steric Clash: Residues with branching at the β-carbon (Val, Thr, Ile) destabilize α-helices due to clashes but are easily accommodated in β-strands.
Competing H-Bonds: Serine and Asparagine often disrupt α-helices because their polar side chains compete for main-chain hydrogen bonding groups.
Proline: Disrupts both α-helices and β-strands because it lacks a main-chain NH group (essential for H-bonding) and its cyclic structure severely restricts its geometry.
Glycine: Its high flexibility makes it well-suited for tight reverse turns.

Prediction Difficulty: Predicting secondary structure based only on local sequence preferences is difficult (only 60–70% accurate) because:

Preferences are marginal (e.g., Glutamate prefers α-helix by only a factor of three).
Tertiary Interactions (interactions between residues far apart in sequence) are often decisive in stabilizing the final structure of a segment.
Context is crucial; the environment provided by the rest of the protein is essential.
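The limitation of purely local prediction can be seen in a deliberately simplified, Chou-Fasman-style sketch. Propensities are averaged over a sliding window, and each residue is labelled helix (H), strand (E), or coil (C). The propensity numbers are hypothetical placeholders that only mirror the qualitative trends listed above; they are not a published table. Because the method sees only a short window, it cannot account for the tertiary interactions and context described above.

```python
# Hypothetical placeholder propensities (helix, strand); values above 1 favour that state.
PROPENSITY = {
    "A": (1.4, 0.8), "E": (1.5, 0.5), "L": (1.3, 1.1),   # helix formers
    "V": (1.0, 1.7), "I": (1.0, 1.6),                    # strand formers
    "G": (0.6, 0.8), "N": (0.7, 0.9), "P": (0.5, 0.5),   # turn/loop formers
}
NEUTRAL = (1.0, 1.0)  # assumed value for residues not in the table


def predict_local(seq, window=5):
    """Assign H, E or C from window-averaged propensities only (no tertiary context)."""
    half = window // 2
    states = []
    for i in range(len(seq)):
        segment = seq[max(0, i - half): i + half + 1]
        helix = sum(PROPENSITY.get(aa, NEUTRAL)[0] for aa in segment) / len(segment)
        strand = sum(PROPENSITY.get(aa, NEUTRAL)[1] for aa in segment) / len(segment)
        if helix > 1.0 and helix >= strand:
            states.append("H")
        elif strand > 1.0:
            states.append("E")
        else:
            states.append("C")
    return "".join(states)


print(predict_local("AEELLKAGNPGVIVIVE"))  # hypothetical sequence
```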

2. Protein Folding as a Cooperative Transition

Denaturation and Unfolding: Proteins can be denatured (unfolded) by disrupting the weak non-covalent bonds that stabilize the structure using heat or chemical agents (e.g., urea).

"All or None" Process: The transition between the folded (native) and unfolded (denatured) states, observed when changing denaturant concentration, is sharp, suggesting an "all or none" or cooperative transition.

Cooperative Folding Rationale: The stability of one part of the structure depends on interactions with the rest of the protein. If one region begins to unfold, the loss of its stabilizing interactions destabilizes the surrounding structure, leading to a rapid, complete unraveling of the entire protein.

Molecular Reality: Although the solution appears to be a mixture of only fully folded and fully unfolded molecules (a two-state model) at the transition midpoint, protein folding cannot occur in a single step. Unstable, transient intermediate structures must exist between the two states.
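The sharp, cooperative transition can be reproduced with a two-state model (only native N and unfolded U are populated). The sketch also uses the commonly assumed linear dependence of the unfolding free energy on denaturant concentration, $\Delta G = \Delta G_{\text{H}_2\text{O}} - m[D]$. The fraction unfolded is then $K/(1+K)$ with $K = e^{-\Delta G/RT}$. The $\Delta G_{\text{H}_2\text{O}}$ and $m$ values in the example are hypothetical.

```python
import math

R = 8.314e-3  # gas constant, kJ/(mol*K)


def fraction_unfolded(denaturant_m, dg_water, m_value, temp_k=298.15):
    """Two-state model with linear extrapolation: dG_unfold = dG_water - m * [D]."""
    dg = dg_water - m_value * denaturant_m
    k = math.exp(-dg / (R * temp_k))  # K = [U]/[N]
    return k / (1.0 + k)


# Hypothetical protein: dG_water = 20 kJ/mol, m = 5 kJ/(mol*M), so the midpoint is 4 M.
for d in range(0, 9):
    print(f"[D] = {d} M: fraction unfolded = {fraction_unfolded(d, 20.0, 5.0):.3f}")
```

With these numbers the fraction unfolded rises from about 0.1 at 3 M to about 0.9 at 5 M. The whole transition is compressed into a narrow denaturant range around the midpoint.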

3. The Folding Paradox and Pathway

Levinthal's Paradox: A random, exhaustive search of all possible conformations for even a small 100-residue protein would take astronomically long (1.6 x 10²⁷ years). Since small proteins fold in milliseconds, they cannot sample every conformation.

This paradox implies that folding cannot be a random search: it must follow a defined pathway with intermediates.
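The 1.6 × 10²⁷-year figure follows from simple arithmetic under two assumptions usually made for this back-of-envelope estimate. Each residue can adopt 3 conformations, and each conformation is sampled in $10^{-13}$ s.

```python
conformations_per_residue = 3      # assumption used in the classic estimate
residues = 100
seconds_per_conformation = 1e-13   # assumption: roughly the time scale of a bond rotation
seconds_per_year = 365.25 * 24 * 3600

total_conformations = conformations_per_residue ** residues
search_time_years = total_conformations * seconds_per_conformation / seconds_per_year
print(f"{total_conformations:.2e} conformations -> {search_time_years:.1e} years")
```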

Cumulative Selection: The solution to the paradox is that proteins fold by the principle of cumulative selection—partly correct intermediate structures are retained (stabilized by free energy) and used as a foundation for further folding.
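Cumulative selection is often illustrated with a string analogy. A random search for a 28-character phrase over a 27-symbol alphabet is hopeless. If every correct character is kept and only the wrong ones are re-randomized, the phrase is found in a modest number of rounds. The sketch below is an analogy only (no physics). The target phrase and alphabet are arbitrary choices.

```python
import random
import string

ALPHABET = string.ascii_uppercase + " "


def cumulative_selection(target, rng):
    """Re-randomize only the positions that are still wrong; keep every correct character."""
    current = [rng.choice(ALPHABET) for _ in target]
    rounds = 1
    while "".join(current) != target:
        for i, ch in enumerate(target):
            if current[i] != ch:
                current[i] = rng.choice(ALPHABET)
        rounds += 1
    return "".join(current), rounds


result, rounds = cumulative_selection("METHINKS IT IS LIKE A WEASEL", random.Random(0))
print(result, "found after", rounds, "rounds")
print(f"a blind search would face 27**28 = {27 ** 28:.2e} possible strings")
```

Retaining partially correct intermediates plays the role of the stabilized folding intermediates. It turns an astronomically large search into a short one.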

Nucleation-Condensation Model (Energy Funnel):

Mechanism: Local regions with strong structural preferences rapidly form an initial structure (nucleation), which then guides and stabilizes the formation of additional structure (condensation).

Energy Funnel: The folding process can be visualized as moving down an energy funnel. The wide rim represents the high-energy, unfolded state with many conformations. As the protein folds, free energy decreases, and the accessible conformational space narrows, leading to the bottom: the low-energy, unique native state.

4. Exceptions to the Single-Structure Paradigm

While the general paradigm is that a sequence specifies one structure, exceptions exist:

Intrinsically Unstructured Proteins (IUPs): These proteins (or regions, often rich in charged/polar residues) lack a discrete 3D structure under physiological conditions. They assume a defined structure only upon interacting with a specific molecular partner. This versatility allows a single protein to interact with multiple partners, performing different functions (especially in signaling).

Metamorphic Proteins: These proteins exist as an ensemble of two or more distinct structures (conformations) that are in equilibrium and have approximately equal energy.

Example: The chemokine lymphotactin exists in equilibrium between a receptor-activating chemokine structure and a β-sheet dimer structure that binds carbohydrates. Both distinct structures have mutually exclusive but required functions.

5. Approaches to Structure Prediction

Predicting the native 3D structure from the amino acid sequence has long been a central challenge. There are three main approaches:

A. Homology Modeling (Comparative Modeling)

Principle: This is the most accurate and reliable method when applicable. It is based on the evolutionary principle that proteins with similar amino acid sequences share very similar 3D structures (structure is more conserved than sequence).
Applicability: Used when the target protein has substantial sequence identity (typically $>30\%$) with one or more proteins of known structure, called templates, in the Protein Data Bank (PDB).
Key Steps:
1. Template Recognition and Selection: Search the PDB for homologous proteins with known structures (templates) using tools like BLAST.
2. Target-Template Alignment: Generate a sequence alignment between the target protein and the template(s). The quality of this alignment is crucial.
3. Model Construction (Backbone & Loops): Copy the backbone coordinates from the template to the target for regions of high sequence similarity. Highly divergent regions, especially loop regions and insertions/deletions, are modeled using separate algorithms (e.g., ab initio methods for short loops or knowledge-based methods).
4. Side-chain Modeling: Predict the conformation (rotamers) of the side chains for residues in the target.
5. Model Refinement and Validation: Minimize the model's energy and evaluate its quality using stereochemical checks and environmental fitness scores.

B. Protein Threading (Fold Recognition)

Principle: This approach exploits the observation that the number of distinct protein folds in nature is relatively small (estimated to be around 1,300–2,000). A protein's sequence might adopt a known fold even if the sequence similarity to known structures is low.
Applicability: Used when the sequence identity between the target and any known structure is low (the "twilight zone," typically $<20\%$ ), but the protein is likely to adopt a common fold.
Key Steps:
1. Template Library: Use a database of representative protein folds (templates).
2. Threading/Alignment: "Thread" (fit) the target protein's sequence onto the backbone of each structural template in the library. This involves creating a sequence-to-structure alignment.
3. Scoring Function: Evaluate the fit using a sophisticated scoring function (or potential function) that measures the compatibility of the target sequence with the template structure's environment. This score typically includes terms for:
  - Local environment fitness (e.g., preference for a specific residue in a particular secondary structure/solvent accessibility).
  - Pairwise residue-residue interactions (contacts).
  - Sequence similarity (though weighted less than structure features).
4. Fold Selection: The fold that yields the best (lowest energy or highest compatibility) score is selected as the predicted structure.
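The threading steps above can be sketched with a toy scoring function. It has an environment term (hydrophobic residues prefer buried positions, polar residues prefer exposed ones) and a pairwise term (a bonus for hydrophobic-hydrophobic contacts). This is a simplified illustration, not a published potential. All energies, the two-fold library, and the sequence are hypothetical. For brevity, the sketch also handles only ungapped threading of equal-length sequences and templates.

```python
HYDROPHOBIC = set("AVILMFWC")

# Hypothetical energy terms (lower = more compatible).
ENV_ENERGY = {(True, "B"): -1.0, (True, "E"): 1.0, (False, "B"): 1.0, (False, "E"): -1.0}
CONTACT_ENERGY = -0.5  # bonus for each hydrophobic-hydrophobic contact


def thread_score(seq, template):
    """Score an ungapped threading of seq onto a template (lower = more compatible)."""
    if len(seq) != len(template["env"]):
        raise ValueError("this sketch only handles ungapped, equal-length threading")
    env_term = sum(ENV_ENERGY[(aa in HYDROPHOBIC, env)]
                   for aa, env in zip(seq, template["env"]))
    pair_term = sum(CONTACT_ENERGY for i, j in template["contacts"]
                    if seq[i] in HYDROPHOBIC and seq[j] in HYDROPHOBIC)
    return env_term + pair_term


def best_fold(seq, library):
    """Fold selection: return the template name with the lowest score."""
    return min(library, key=lambda name: thread_score(seq, library[name]))


# Hypothetical library: 'B' = buried position, 'E' = exposed position.
library = {
    "fold_1": {"env": "BBBEE", "contacts": [(0, 2)]},
    "fold_2": {"env": "EEEBB", "contacts": [(3, 4)]},
}
print(best_fold("LVLKD", library))
```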

C. Ab Initio Modeling (Template-Free Modeling)

Principle: Predicts the 3D structure from the amino acid sequence alone, based on physical principles like energy minimization (Anfinsen's dogma: the native structure is the global minimum of the free energy). This is also known as de novo prediction.
Applicability: Used for proteins with entirely novel folds (no detectable homology or known fold in the PDB). This is the most computationally demanding method and is generally reserved for small proteins (under ~100 residues).
Key Concepts:
- Levinthal's Paradox: The astronomically large number of possible conformations a polypeptide chain can adopt, making a complete search impossible. Ab initio methods seek efficient search strategies.
- Energy Landscape: The concept that protein folding follows an energetically favorable pathway toward the native state, which is the global energy minimum.
Computational Methods:
- Conformational Search: Algorithms like Monte Carlo (MC) simulations or Molecular Dynamics (MD) simulations are used to explore the conformational space.
- Fragment Assembly: A popular technique (e.g., used by ROSETTA). The target sequence is broken down into small fragments, and the best-fitting fragments from known structures are assembled to generate candidate models.
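- Deep Learning (e.g., AlphaFold/AlphaFold 2): Modern advancements have revolutionized this area. These tools use neural networks trained on large datasets of known sequences and structures. AlphaFold predicted inter-residue distances that strongly constrain the possible 3D structures, and AlphaFold 2 predicts the 3D coordinates directly. This has effectively solved a major part of the structure prediction problem for many proteins. Strictly speaking, these methods also exploit evolutionary information from multiple sequence alignments, so they are not purely physics-based.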
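The Metropolis Monte Carlo criterion used in conformational search can be shown on a toy problem. Each residue takes one of 3 discrete states, and a hypothetical energy function is zero at a chosen "native" state and rises with distance from it. Downhill moves are always accepted. Uphill moves are accepted with probability $e^{-\Delta E/T}$, which lets the search escape shallow traps. The energy function, states, temperature, and step count are all illustrative assumptions; real programs use far richer energy functions and move sets (e.g., fragment insertion).

```python
import math
import random


def energy(conf, native):
    """Toy energy: zero at the hypothetical 'native' state, rising with distance from it."""
    return sum((a - b) ** 2 for a, b in zip(conf, native))


def metropolis_search(native, n_states=3, steps=5000, temperature=0.3, seed=0):
    """Single-residue moves accepted or rejected with the Metropolis criterion."""
    rng = random.Random(seed)
    conf = [rng.randrange(n_states) for _ in native]
    e = energy(conf, native)
    best_e = e
    for _ in range(steps):
        trial = conf.copy()
        trial[rng.randrange(len(conf))] = rng.randrange(n_states)
        e_trial = energy(trial, native)
        # Always accept downhill moves; accept uphill moves with probability exp(-dE/T).
        if e_trial <= e or rng.random() < math.exp(-(e_trial - e) / temperature):
            conf, e = trial, e_trial
            best_e = min(best_e, e)
    return conf, e, best_e


native = [0, 1, 2, 1, 0, 2, 1, 0, 2, 1]  # hypothetical native state per residue
conf, e, best = metropolis_search(native)
print("final energy:", e, "| lowest energy visited:", best)
```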

Feature	Homology Modeling	Protein Threading	Ab Initio Modeling
Template Requirement	High sequence identity template needed.	Template of the correct fold (low sequence identity) needed.	No template required.
Sequence Identity	High (>30%)	Low (<20%, "twilight zone")	Very low or None (novel folds)
Accuracy	Highest (Near-experimental quality if identity >50%)	Moderate (Good fold, variable side-chain accuracy)	Variable (Generally lower, but high with AlphaFold)
Computational Cost	Lowest	Moderate to High	Highest (Traditionally)
Basis	Evolutionary conservation of structure/sequence.	Small, finite number of protein folds.	Physical/chemical principles (energy minimization).
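Deciding which approach applies starts with the sequence identity between the target and its best template. The sketch below computes identity over aligned, ungapped columns. This is one of several conventions (others divide by the shorter sequence length, for example). It then maps the identity to the thresholds in the comparison table. The aligned sequences are hypothetical.

```python
def percent_identity(aligned_a, aligned_b):
    """Identity over alignment columns where neither sequence has a gap ('-')."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


def suggest_approach(identity):
    """Map identity to the approaches in the comparison table (thresholds taken from it)."""
    if identity > 30.0:
        return "homology modeling"
    if identity < 20.0:
        return "threading, or ab initio if no known fold fits"
    return "20-30%: between the table's two thresholds"


identity = percent_identity("ACDE-GHIK", "ACDFQGHLK")  # hypothetical alignment
print(f"{identity:.1f}% identity -> {suggest_approach(identity)}")
```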

Significance of CASP

The Critical Assessment of protein Structure Prediction (CASP) experiment is a community-wide, biennial blind competition that objectively evaluates the performance of different prediction methods. It has been instrumental in driving progress in the field. This culminated in the performance of DeepMind's AlphaFold 2 at CASP14 (2020), which achieved near-experimental accuracy for many targets and greatly narrowed the gap between prediction and experiment.

Disulfide Bridges and Protein Conformation

Disulfide bridges (or bonds) are crucial covalent cross-links in many proteins, playing a vital role in determining, stabilizing, and sometimes changing their three-dimensional structure (conformation) and, consequently, their function.

1. Role in Determining and Stabilizing Structure

· Covalent Stabilization: Disulfide bridges are formed by the oxidation of the sulfhydryl (-SH) groups of two nearby Cysteine amino acid residues, resulting in a stable bond (-S-S-) between them (forming a Cystine residue).

· Fixing Conformation: Unlike weaker non-covalent interactions (like hydrogen bonds or van der Waals forces), the covalent disulfide bridge provides a strong constraint that locks parts of the polypeptide chain into a specific spatial arrangement. This greatly increases the stability of the protein's native, active structure against denaturing conditions (e.g., heat or mild chemical changes).

· Sequence Determines Bridge Location: The formation of the correct disulfide pairings is typically guided by the overall folding process, which is dictated by the protein's primary amino acid sequence (as demonstrated by Anfinsen). Only one specific arrangement of bridges (out of many possibilities) yields the biologically active conformation.
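The number of possible bridge arrangements grows quickly with the number of cysteines. The first unpaired cysteine can pair with any of the $n-1$ others, the next with any of the $n-3$ remaining, and so on. This gives $(n-1)!! = (n-1)(n-3)\cdots 1$ arrangements when all $n$ cysteines are paired. The sketch computes this for ribonuclease A (the enzyme used in Anfinsen's experiments), which has 8 cysteines forming 4 disulfides.

```python
def disulfide_pairings(n_cys):
    """Ways to pair n_cys cysteines into n_cys/2 disulfides: (n_cys - 1)!!."""
    if n_cys % 2:
        raise ValueError("full pairing needs an even number of cysteines")
    count = 1
    for k in range(n_cys - 1, 0, -2):
        count *= k  # the first unpaired Cys can choose any of k remaining partners
    return count


# Ribonuclease A: 8 cysteines forming 4 disulfide bridges.
ways = disulfide_pairings(8)
print(ways, "possible pairings; a random one is native with probability", f"{1 / ways:.1%}")
```

A purely random pairing would be native only about 1 time in 105. This is why oxidation under denaturing conditions (scrambling) yields mostly inactive protein.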

2. Impact of Cleavage (Reduction)

· Denaturation and Loss of Stability: Treating a protein with a strong reducing agent (e.g., β-mercaptoethanol) cleaves the disulfide bonds back into two free sulfhydryl groups.

· Conformational Collapse: This cleavage removes the major covalent cross-links stabilizing the structure. If simultaneously treated with agents that disrupt non-covalent bonds (like urea), the protein unfolds from its ordered native state into a random coil—a process called denaturation.

· Functional Inactivation: The loss of the specific three-dimensional structure results in the complete loss of biological activity (e.g., an enzyme becomes inactive).

3. Impact of Oxidation (Re-formation)

· Renaturation: When a fully reduced and unfolded polypeptide is allowed to re-oxidize under appropriate conditions (e.g., without denaturing agents like urea), the protein can spontaneously refold into its original native conformation.

o The correct folding pathway ensures the correct, native disulfide pairings are reformed.

o This confirms that the information required to specify the active structure is inherent in the amino acid sequence.

· Mis-folding and Scrambling: If the reduced polypeptide is oxidized while it is prevented from folding correctly (e.g., by the presence of a strong denaturant), the disulfide bonds form randomly, creating incorrect or scrambled pairings.

o These mis-paired structures are generally inactive or have severely reduced activity.

· Thermodynamic Correction: Scrambled proteins can often be corrected. Trace amounts of a reducing agent can catalyze the rearrangement of the incorrect disulfide bonds. This process is driven by thermodynamics, as the protein spontaneously converts from unstable, scrambled conformations to the most stable, native conformation with the correct bridges.

Protein Misfolding and Diseases

Protein misfolding, where a protein fails to acquire or maintain its native 3D structure, is linked to a variety of diseases, including a group known as amyloidoses.

1. Amyloidosis:

· Diseases characterized by the deposition of insoluble protein aggregates called amyloid fibrils or plaques.

· Normally soluble proteins are converted into insoluble forms that are rich in β-sheets.

· Misfolded proteins aggregate and act as a seed (nucleation site), forcing other correctly folded proteins to adopt the incorrect, aggregated conformation.

2. Prion Diseases (Transmissible Spongiform Encephalopathies)

· Infectious neurological diseases transmitted by prions—agents composed solely of a misfolded protein.

· Examples: Mad cow disease (BSE), Creutzfeldt-Jakob disease (CJD).

· Prion Conversion:

o Normal Form ($\mathrm{PrP^{C}}$): Rich in α-helices.

o Infectious Form ($\mathrm{PrP^{Sc}}$): Has converted to a structure rich in β-strands that aggregate into fibers.

· Transmission: Transfer of the $\mathrm{PrP^{Sc}}$ aggregate seed spreads the disease.

3. Non-Infectious Amyloidoses

· Alzheimer Disease: Associated with amyloid plaques composed of the β-strand-rich polypeptide Aβ.

· Parkinson Disease: Also involves neurotoxic protein aggregates, mainly of α-synuclein.

· Toxicity: Small aggregates (oligomers) of the misfolded proteins are currently hypothesized to be the primary culprits, possibly by damaging cell membranes.

4. Post-Translational Modifications (PTMs)

Protein function is often augmented by chemical modifications applied after synthesis:

Modification	Purpose/Function	Example
Phosphorylation	Reversible "on/off" switch for regulation.	Addition to Ser, Thr, or Tyr during signal transduction (e.g., insulin receptor signaling).
Hydroxylation	Stabilizes structure.	Addition to Proline stabilizes collagen (vitamin C required).
Glycosylation	Makes proteins more hydrophilic; aids signaling.	Adding carbohydrate units to cell-surface or secreted proteins.
Cleavage/Trimming	Activates function or creates hormones.	Inactive digestive enzymes are activated by cleavage.

5. Protein Cleavage (Proteolytic Processing)

Many proteins require cleavage and trimming of the polypeptide chain after synthesis to become active or functional:

· Activation: Digestive enzymes are synthesized as inactive precursors (zymogens) and are activated by cleavage after release into the intestine.

· Structure/Function: In blood clotting, soluble fibrinogen is converted into insoluble fibrin via peptide-bond cleavage.

· Hormones/Viral Proteins: Many polypeptide hormones and viral proteins are produced by the cleavage of a single, large precursor polyprotein.
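Processing of a precursor into several active peptides can be sketched with a simplified rule. Precursors of many peptide hormones are cut C-terminal to pairs of basic residues (Lys/Arg), and the leftover basic residues are then trimmed from the new C-termini (carboxypeptidase E performs this step in vivo). Real convertases also depend on sequence context, and not every dibasic pair is cleaved, so this is only an illustration. The precursor sequence is hypothetical.

```python
import re

# Zero-width match just after any Lys/Arg pair: a simplified convertase cleavage rule.
DIBASIC_SITE = re.compile(r"(?<=[KR][KR])")


def process_precursor(precursor, trim_basic=True):
    """Split a precursor at dibasic sites; optionally trim basic residues left on cut ends."""
    pieces = DIBASIC_SITE.split(precursor)
    if trim_basic:
        # Every piece except the last ends at a cleavage site, so trim its C-terminal Lys/Arg.
        pieces = [p.rstrip("KR") for p in pieces[:-1]] + pieces[-1:]
    return [p for p in pieces if p]


print(process_precursor("MAAAKRGGGYFRRSLLCC"))  # hypothetical precursor
```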