LET'S LEARN PLANTS: PG: Biochemistry

Protein Folding: Sequence, Structure, and Pathway

The fundamental question of how the amino acid sequence of a protein determines its unique three-dimensional structure (conformation) is central to biochemistry. The native structure is the most thermodynamically stable form.

1. Determining Secondary Structure

Secondary structures are local, regularly repeating conformations like the α-helix and β-strand.

Amino Acid Preferences: Certain amino acid residues have preferences for specific secondary structures, revealed by their frequency of occurrence in known proteins.

α-Helix Promoters: Alanine, Glutamate, Leucine.
β-Strand Promoters: Valine, Isoleucine.
Turn/Loop Promoters: Glycine, Asparagine, Proline.
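
The idea of a preference "revealed by frequency of occurrence" can be made concrete with a small sketch. It assumes a propensity defined as the frequency of a residue within one secondary-structure state divided by its overall frequency, so values above 1 indicate over-representation. The function name and the toy sequence/structure strings are hypothetical and are not real protein data.

```python
from collections import Counter

def propensities(sequence, structure):
    """Return {(residue, state): propensity} for two equal-length strings.

    propensity = P(residue | state) / P(residue)
    Values above 1 mean the residue is over-represented in that state.
    """
    if len(sequence) != len(structure):
        raise ValueError("sequence and structure must have the same length")
    total = len(sequence)
    residue_counts = Counter(sequence)
    state_counts = Counter(structure)
    pair_counts = Counter(zip(sequence, structure))
    result = {}
    for (residue, state), n_pair in pair_counts.items():
        freq_in_state = n_pair / state_counts[state]
        freq_overall = residue_counts[residue] / total
        result[(residue, state)] = freq_in_state / freq_overall
    return result

# Hypothetical toy data: H = helix, E = strand, T = turn/loop.
toy_sequence = "AELAELVIVIGNPGAELVIV"
toy_structure = "HHHHHHEEEETTTTHHHEEE"
toy = propensities(toy_sequence, toy_structure)

for (residue, state), value in sorted(toy.items()):
    print(residue, state, round(value, 2))
```

With real data the counts would come from many proteins of known structure, and the resulting values would be far less extreme than in this deliberately clean toy example.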

Steric and Structural Constraints: These preferences are based on the compatibility of the residue's side chain with the local geometry:

Steric Clash: Residues with branching at the β-carbon (Val, Thr, Ile) tend to destabilize α-helices because of steric clashes, but they are easily accommodated in β-strands.
Competing H-Bonds: Serine and Asparagine often disrupt α-helices because their polar side chains compete for main-chain hydrogen bonding groups.
Proline: Disrupts both α-helices and β-strands because it lacks a main-chain NH group (essential for H-bonding) and its cyclic structure severely restricts its geometry.
Glycine: Its high flexibility makes it well-suited for tight reverse turns.

Prediction Difficulty: Predicting secondary structure based only on local sequence preferences is difficult (only 60–70% accurate) because:

Preferences are marginal (e.g., Glutamate, a strong helix former, prefers α-helix over β-strand by only a factor of about three).
Tertiary Interactions (interactions between residues far apart in sequence) are often decisive in stabilizing the final structure of a segment.
Context is crucial; the environment provided by the rest of the protein is essential.

2. Protein Folding as a Cooperative Transition

Denaturation and Unfolding: Proteins can be denatured (unfolded) by disrupting the weak non-covalent bonds that stabilize the structure using heat or chemical agents (e.g., urea).

"All or None" Process: The transition between the folded (native) and unfolded (denatured) states, observed when changing denaturant concentration, is sharp, suggesting an "all or none" or cooperative transition.

Cooperative Folding Rationale: The stability of one part of the structure depends on interactions with the rest of the protein. If one region begins to unfold, the loss of its stabilizing interactions destabilizes the surrounding structure, leading to a rapid, complete unraveling of the entire protein.

Molecular Reality: Although the solution appears to be a mixture of only fully folded and fully unfolded molecules (a two-state model) at the transition midpoint, protein folding cannot occur in a single step. Unstable, transient intermediate structures must exist between the two states.
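
The sharpness of a two-state transition can be illustrated numerically. The sketch below assumes a linear dependence of the unfolding free energy on denaturant concentration (a commonly used approximation that is not stated in the text above). The parameter values `dg_water` and `m_value` are hypothetical and chosen only to place the midpoint at 4 M.

```python
import math

R = 8.314e-3  # gas constant in kJ/(mol*K)

def fraction_unfolded(denaturant_molar, dg_water=20.0, m_value=5.0,
                      temperature=298.15):
    """Fraction of unfolded molecules in a two-state N <-> U model.

    dg_water: unfolding free energy without denaturant, kJ/mol (hypothetical)
    m_value:  change of that free energy per molar denaturant,
              kJ/(mol*M) (hypothetical)
    """
    dg_unfold = dg_water - m_value * denaturant_molar
    k_unfold = math.exp(-dg_unfold / (R * temperature))
    return k_unfold / (1.0 + k_unfold)

for conc in (0.0, 2.0, 3.0, 4.0, 5.0, 6.0, 8.0):
    print(f"{conc:4.1f} M  fraction unfolded = {fraction_unfolded(conc):.3f}")
```

Under these assumed parameters, almost all of the change in the unfolded fraction happens within a narrow concentration window around the midpoint, which is the signature of a cooperative transition.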

3. The Folding Paradox and Pathway

Levinthal's Paradox: A random, exhaustive search of all possible conformations for even a small 100-residue protein would take an astronomically long time (about 1.6 × 10²⁷ years). Since small proteins fold in milliseconds, they cannot be sampling every conformation.

This paradox implies that folding cannot be a random search; it must proceed along a limited set of pathways through intermediates.
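
The figure of roughly 1.6 × 10²⁷ years can be reproduced with back-of-the-envelope arithmetic. The sketch assumes three possible conformations per residue and 10⁻¹³ s to sample each one. Both numbers are common textbook assumptions and are not given in the text above.

```python
RESIDUES = 100
CONFORMATIONS_PER_RESIDUE = 3      # assumption, not stated in the text
SECONDS_PER_CONFORMATION = 1e-13   # assumption, not stated in the text
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def exhaustive_search_years(residues=RESIDUES,
                            per_residue=CONFORMATIONS_PER_RESIDUE,
                            seconds_each=SECONDS_PER_CONFORMATION):
    """Years needed to visit every conformation once, one at a time."""
    total_conformations = per_residue ** residues
    return total_conformations * seconds_each / SECONDS_PER_YEAR

years = exhaustive_search_years()
print(f"{years:.2e} years")
```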

Cumulative Selection: The solution to the paradox is that proteins fold by the principle of cumulative selection—partly correct intermediate structures are retained (stabilized by free energy) and used as a foundation for further folding.
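
Cumulative selection can be illustrated with a letter-matching analogy. This is not a physical folding simulation: the sketch simply re-draws only the positions that are still wrong and keeps those that are already correct. The target string, alphabet, and function names are hypothetical choices for illustration.

```python
import random
import string

ALPHABET = string.ascii_uppercase + " "

def cumulative_selection(target, rng, max_rounds=10_000):
    """Re-draw only the positions that are still wrong.

    Returns (final_string, rounds_used).
    """
    current = [rng.choice(ALPHABET) for _ in target]
    rounds = 0
    while "".join(current) != target and rounds < max_rounds:
        rounds += 1
        for i, wanted in enumerate(target):
            if current[i] != wanted:  # correct positions are retained
                current[i] = rng.choice(ALPHABET)
    return "".join(current), rounds

def expected_random_tries(target):
    """Expected whole-string guesses if nothing is ever retained."""
    return len(ALPHABET) ** len(target)

target = "NATIVE STATE"
result, rounds = cumulative_selection(target, random.Random(0))
print(result, "reached after", rounds, "rounds")
print("random search would need about", expected_random_tries(target), "guesses")
```

The analogy overstates the case in one respect: in a real protein, partly correct intermediates are only marginally stable and can be lost again, so retention is probabilistic rather than absolute.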

Nucleation-Condensation Model (Energy Funnel):

Mechanism: Local regions with strong structural preferences rapidly form an initial structure (nucleation), which then guides and stabilizes the formation of additional structure (condensation).

Energy Funnel: The folding process can be visualized as moving down an energy funnel. The wide rim represents the high-energy, unfolded state with many conformations. As the protein folds, free energy decreases, and the accessible conformational space narrows, leading to the bottom: the low-energy, unique native state.

4. Exceptions to the Single-Structure Paradigm

While the general paradigm is that a sequence specifies one structure, exceptions exist:

Intrinsically Unstructured Proteins (IUPs): These proteins (or regions, often rich in charged/polar residues) lack a discrete 3D structure under physiological conditions. They assume a defined structure only upon interacting with a specific molecular partner. This versatility allows a single protein to interact with multiple partners, performing different functions (especially in signaling).

Metamorphic Proteins: These proteins exist as an ensemble of two or more distinct structures (conformations) that are in equilibrium and have approximately equal energy.

Example: The chemokine lymphotactin exists in equilibrium between a receptor-activating chemokine structure and a β-sheet dimer structure that binds carbohydrates. Both distinct structures have mutually exclusive but required functions.

5. Approaches to Structure Prediction

Predicting the native 3D structure from the amino acid sequence is a long-standing challenge, with three main approaches:

A. Homology Modeling (Comparative Modeling)

Principle: This is the most accurate and reliable method when applicable. It rests on the evolutionary observation that proteins with similar amino acid sequences adopt very similar 3D structures (structure is more conserved than sequence).
Applicability: Used when the target protein shares sufficient sequence identity (typically >30%) with one or more proteins of known structure, called templates, in the Protein Data Bank (PDB).
Key Steps:
1. Template Recognition and Selection: Search the PDB for homologous proteins with known structures (templates) using tools like BLAST.
2. Target-Template Alignment: Generate a sequence alignment between the target protein and the template(s). The quality of this alignment is crucial.
3. Model Construction (Backbone & Loops): Copy the backbone coordinates from the template to the target for regions of high sequence similarity. Highly divergent regions, especially loop regions and insertions/deletions, are modeled using separate algorithms (e.g., ab initio methods for short loops or knowledge-based methods).
4. Side-chain Modeling: Predict the conformation (rotamers) of the side chains for residues in the target.
5. Model Refinement and Validation: Minimize the model's energy and evaluate its quality using stereochemical checks and environmental fitness scores.
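
Because the choice of method hinges on sequence identity, a short sketch of how identity might be computed from a target-template alignment may help. It assumes one common convention (identical residues divided by columns in which neither sequence has a gap); other conventions exist. The thresholds are the >30% and <20% figures used in this text, and the aligned sequences are hypothetical.

```python
def percent_identity(aligned_target, aligned_template, gap="-"):
    """Percent identical residues over columns where neither side has a gap."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    identical = 0
    for a, b in zip(aligned_target, aligned_template):
        if a == gap or b == gap:
            continue
        compared += 1
        if a == b:
            identical += 1
    return 100.0 * identical / compared if compared else 0.0

def suggest_approach(identity_percent):
    """Map identity to a method using the cutoffs quoted in this text."""
    if identity_percent > 30.0:
        return "homology modeling"
    if identity_percent < 20.0:
        return "threading or template-free modeling"
    return "borderline: inspect the alignment carefully"

# Hypothetical pre-aligned sequences ('-' marks a gap).
target_aln = "MKT-AYIAKQR"
template_aln = "MKSLAYLA-QR"
pid = percent_identity(target_aln, template_aln)
print(round(pid, 1), suggest_approach(pid))
```

In practice the alignment itself would come from a dedicated alignment tool; this sketch only covers the scoring step once an alignment exists.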

B. Protein Threading (Fold Recognition)

Principle: This approach exploits the observation that the number of distinct protein folds in nature is relatively small (estimated to be around 1,300–2,000). A protein's sequence might adopt a known fold even if the sequence similarity to known structures is low.
Applicability: Used when the sequence identity between the target and any known structure is low (the "twilight zone," typically <20%), but the protein is likely to adopt a common fold.
Key Steps:
1. Template Library: Use a database of representative protein folds (templates).
2. Threading/Alignment: "Thread" (fit) the target protein's sequence onto the backbone of each structural template in the library. This involves creating a sequence-to-structure alignment.
3. Scoring Function: Evaluate the fit using a sophisticated scoring function (or potential function) that measures the compatibility of the target sequence with the template structure's environment. This score typically includes terms for:
  - Local environment fitness (e.g., preference for a specific residue in a particular secondary structure/solvent accessibility).
  - Pairwise residue-residue interactions (contacts).
  - Sequence similarity (though weighted less than structure features).
4. Fold Selection: The fold that yields the best (lowest energy or highest compatibility) score is selected as the predicted structure.

C. Ab Initio Modeling (Template-Free Modeling)

Principle: Predicts the 3D structure from the amino acid sequence alone, based on physical principles like energy minimization (Anfinsen's dogma: the native structure is the global minimum of the free energy). This is also known as de novo prediction.
Applicability: Used for proteins with entirely novel folds (no detectable homology or known fold in the PDB). This is the most computationally demanding method and is generally reserved for small proteins (under ~100 residues).
Key Concepts:
- Levinthal's Paradox: The astronomically large number of possible conformations a polypeptide chain can adopt, making a complete search impossible. Ab initio methods seek efficient search strategies.
- Energy Landscape: The concept that protein folding follows an energetically favorable pathway toward the native state, which is the global energy minimum.
Computational Methods:
- Conformational Search: Algorithms like Monte Carlo (MC) simulations or Molecular Dynamics (MD) simulations are used to explore the conformational space.
- Fragment Assembly: A popular technique (e.g., used by ROSETTA). The target sequence is broken down into small fragments, and the best-fitting fragments from known structures are assembled to generate candidate models.
- Deep Learning (e.g., AlphaFold/AlphaFold 2): Modern advancements have revolutionized this area. These tools use neural networks trained on large datasets to accurately predict inter-residue distances and contacts, which significantly constrains the possible 3D structures, effectively solving a major part of the ab initio problem.
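
A conformational search of the Monte Carlo type can be sketched on a toy problem. The example below applies the Metropolis acceptance rule to a hypothetical one-dimensional "energy landscape" (a broad funnel with small ripples). A real protein search works in thousands of dimensions with a physical or knowledge-based energy function, so this only illustrates the accept/reject logic.

```python
import math
import random

def toy_energy(x):
    """Hypothetical rugged 1D landscape: a broad funnel plus small ripples."""
    return x * x + 2.0 * math.sin(5.0 * x)

def metropolis_search(energy, start, steps=5000, step_size=0.5,
                      temperature=1.0, seed=0):
    """Return (best_x, best_energy) found by a Metropolis random walk."""
    rng = random.Random(seed)
    x, e = start, energy(start)
    best_x, best_e = x, e
    for _ in range(steps):
        candidate = x + rng.uniform(-step_size, step_size)
        e_new = energy(candidate)
        # Always accept downhill moves; accept uphill moves with
        # Boltzmann probability so the walk can escape local minima.
        if e_new <= e or rng.random() < math.exp(-(e_new - e) / temperature):
            x, e = candidate, e_new
            if e < best_e:
                best_x, best_e = x, e
    return best_x, best_e

best_x, best_e = metropolis_search(toy_energy, start=8.0)
print(f"best x = {best_x:.3f}, energy = {best_e:.3f}")
```

The `temperature` parameter controls how readily uphill moves are accepted. Lowering it gradually during the run would turn this into simulated annealing.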

Feature	Homology Modeling	Protein Threading	Ab Initio Modeling
Template Requirement	High sequence identity template needed.	Template of the correct fold (low sequence identity) needed.	No template required.
Sequence Identity	High (>30%)	Low (<20%, "twilight zone")	Very low or None (novel folds)
Accuracy	Highest (Near-experimental quality if identity >50%)	Moderate (Good fold, variable side-chain accuracy)	Variable (Generally lower, but high with AlphaFold)
Computational Cost	Lowest	Moderate to High	Highest (Traditionally)
Basis	Evolutionary conservation of structure/sequence.	Small, finite number of protein folds.	Physical/chemical principles (energy minimization).

Significance of CASP

The Critical Assessment of protein Structure Prediction (CASP) experiment is a community-wide, biennial blind assessment that objectively evaluates the performance of different prediction methods. It has been instrumental in driving progress in the field. At CASP14 (2020), DeepMind's AlphaFold 2 demonstrated near-experimental accuracy for many targets, substantially narrowing the gap between prediction and experiment for most template-based cases and many ab initio cases.

Disulfide Bridges and Protein Conformation

Disulfide bridges (or bonds) are crucial covalent cross-links in many proteins, playing a vital role in determining, stabilizing, and sometimes changing their three-dimensional structure (conformation) and, consequently, their function.

1. Role in Determining and Stabilizing Structure

· Covalent Stabilization: Disulfide bridges are formed by the oxidation of the sulfhydryl (-SH) groups of two nearby Cysteine amino acid residues, resulting in a stable bond (-S-S-) between them (forming a Cystine residue).

· Fixing Conformation: Unlike weaker non-covalent interactions (like hydrogen bonds or van der Waals forces), the covalent nature of the disulfide bridge provides a permanent constraint that locks parts of the polypeptide chain into a specific spatial arrangement. This greatly increases the stability of the protein's native, active structure against denaturing conditions (e.g., heat or mild chemical changes).

· Sequence Determines Bridge Location: The formation of the correct disulfide pairings is typically guided by the overall folding process, which is dictated by the protein's primary amino acid sequence (as demonstrated by Anfinsen). Only one specific arrangement of bridges (out of many possibilities) yields the biologically active conformation.
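
The phrase "out of many possibilities" can be quantified. If 2n cysteines pair up into n bridges, the number of distinct pairings is (2n − 1) × (2n − 3) × ... × 3 × 1. The sketch below computes this; the function name is hypothetical, and the example of eight cysteines forming four bridges corresponds to ribonuclease A, the protein Anfinsen studied.

```python
def disulfide_pairings(n_bridges):
    """Number of ways to pair 2*n cysteines into n disulfide bridges.

    The first cysteine can pair with any of (2n - 1) others, the next
    unpaired one with any of (2n - 3), and so on.
    """
    count = 1
    for k in range(1, n_bridges + 1):
        count *= 2 * k - 1
    return count

for n in range(1, 6):
    print(n, "bridges:", disulfide_pairings(n), "possible pairings")
```

For four bridges this gives 105 possible pairings, only one of which is the native arrangement.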

2. Impact of Cleavage (Reduction)

· Denaturation and Loss of Stability: Treating a protein with a strong reducing agent (e.g., β-mercaptoethanol) cleaves the disulfide bonds back into two free sulfhydryl groups.

· Conformational Collapse: This cleavage removes the major covalent cross-links stabilizing the structure. If simultaneously treated with agents that disrupt non-covalent bonds (like urea), the protein unfolds from its ordered native state into a random coil—a process called denaturation.

· Functional Inactivation: The loss of the specific three-dimensional structure results in the complete loss of biological activity (e.g., an enzyme becomes inactive).

3. Impact of Oxidation (Re-formation)

· Renaturation: When a fully reduced and unfolded polypeptide is allowed to re-oxidize under appropriate conditions (e.g., after denaturing agents such as urea have been removed), the protein can spontaneously refold into its original native conformation.

o The correct folding pathway ensures the correct, native disulfide pairings are reformed.

o This confirms that the information required to specify the active structure is inherent in the amino acid sequence.

· Mis-folding and Scrambling: If the reduced polypeptide is oxidized while it is prevented from folding correctly (e.g., by the presence of a strong denaturant), the disulfide bonds form randomly, creating incorrect or scrambled pairings.

o These mis-paired structures are generally inactive or have severely reduced activity.

· Thermodynamic Correction: Scrambled proteins can often be corrected. Trace amounts of a reducing agent can catalyze the rearrangement of the incorrect disulfide bonds. This process is driven by thermodynamics, as the protein spontaneously converts from unstable, scrambled conformations to the most stable, native conformation with the correct bridges.

Protein Misfolding and Diseases

Protein misfolding, where a protein fails to acquire or maintain its native 3D structure, is linked to a variety of diseases known as amyloidoses.

1. Amyloidosis:

· Diseases characterized by the deposition of insoluble protein aggregates called amyloid fibrils or plaques.

· Normally soluble proteins are converted into insoluble forms that are rich in β-sheets.

· Misfolded proteins aggregate and act as a seed (nucleation site), forcing other correctly folded proteins to adopt the incorrect, aggregated conformation.

2. Prion Diseases (Transmissible Spongiform Encephalopathies)

· Infectious neurological diseases transmitted by prions—agents composed solely of a misfolded protein.

· Examples: Mad cow disease (BSE), Creutzfeldt-Jakob disease (CJD).

· Prion Conversion:

o Normal Form ($\mathrm{PrP^{C}}$): Rich in α-helices.

o Infectious Form ($\mathrm{PrP^{Sc}}$): Has converted to a structure rich in β-strands that aggregate into fibers.

· Transmission: Transfer of the $\mathrm{PrP^{Sc}}$ aggregate seed spreads the disease.

3. Non-Infectious Amyloidoses

· Alzheimer Disease: Associated with amyloid plaques composed of the β-strand-rich peptide Aβ.

· Parkinson Disease: Also involves neurotoxic protein aggregates.

· Toxicity: Small aggregates (oligomers) of the misfolded proteins are currently hypothesized to be the primary culprits, possibly by damaging cell membranes.

4. Post-Translational Modifications (PTMs)

Protein function is often augmented by chemical modifications applied after synthesis:

Modification	Purpose/Function	Example
Phosphorylation	Reversible "on/off" switch for regulation.	Addition of a phosphoryl group to Ser, Thr, or Tyr residues in signal transduction (e.g., insulin signaling).
Hydroxylation	Stabilizes structure.	Hydroxylation of Proline stabilizes collagen (vitamin C required).
Glycosylation	Makes proteins more hydrophilic; aids signaling.	Adding carbohydrate units to cell-surface or secreted proteins.
Cleavage/Trimming	Activates function or creates hormones.	Inactive digestive enzymes are activated by cleavage.

5. Protein Cleavage (Proteolytic Processing)

Many proteins require cleavage and trimming of the polypeptide chain after synthesis to become active or functional:

· Activation: Digestive enzymes are synthesized as inactive precursors (zymogens) and are activated by cleavage after release into the intestine.

· Structure/Function: In blood clotting, soluble fibrinogen is converted into insoluble fibrin via peptide-bond cleavage.

· Hormones/Viral Proteins: Many polypeptide hormones and viral proteins are produced by the cleavage of a single, large precursor polyprotein.

Monday, 17 November 2025
