Type I collagen peptides were extracted and sequenced from approximately 68-million-year-old fossils of Tyrannosaurus rex (Museum of the Rockies [MOR] 1125). However, despite multiple lines of evidence supporting the presence of collagen, including in situ antibody binding, the endogeneity of the MOR 1125 peptides was disputed, and the sequences were instead suggested to arise from microbial invasion, extant collagens introduced during laboratory experiments, or even statistical artifact. Collagen peptide sequences were subsequently derived from a second dinosaur, Brachylophosaurus canadensis (MOR 2598); that study included many of the earlier lines of supporting evidence as well as independent replication of the data in multiple laboratories.
Tyrannosaurus rex femur (MOR 1125) from which demineralized matrix (insets; bars, 20 µm) and peptides were obtained.
Surprisingly, advances in collagen biology also support the authenticity of the fossil peptides, because the molecular structure of collagen favors preservation. The triple-helical arrangement and the intra- and intermolecular cross-links confer stability upon this ubiquitous structural molecule. Additionally, when collagen is surrounded by or adsorbed to mineral surfaces, as in bone, its preservation potential is greatly enhanced. In fibrillar collagens, individual triple-helical molecules aggregate to form a fibril with a characteristic 67 nm banding pattern that is readily recognized by electron microscopy. Within each 67 nm D-period, segments of neighboring molecules are referred to as monomers 1–5, and specific functional regions have been mapped to each monomer using a variety of experimental approaches.
The collagen fibril (A) is composed of triple-helical monomers that polymerize in an overlapping fashion (B), and are derived from proteolysis of the soluble procollagen precursor (C).
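As a rough illustration of the monomer numbering described above, the following Python sketch assigns a residue position along the triple helix to a monomer segment (1–5) and to an axial offset within the D-period. It assumes a D-period of 234 residues and a triple helix of 1014 residues, values commonly quoted for type I collagen but not stated in this text; the function names and the tolerance parameter are hypothetical.

```python
# Sketch only: maps a helix residue number to its monomer segment.
# ASSUMPTIONS (not from the text): 234 residues per 67 nm D-period,
# 1014 residues in the triple helix, numbering from the N-terminus.
D_PERIOD_RESIDUES = 234
HELIX_RESIDUES = 1014
D_PERIOD_NM = 67.0


def locate_residue(position):
    """Return (monomer, offset_residues, offset_nm) for a 1-based helix position."""
    if not 1 <= position <= HELIX_RESIDUES:
        raise ValueError(f"position must be in 1..{HELIX_RESIDUES}")
    index = position - 1
    monomer = index // D_PERIOD_RESIDUES + 1   # segment number, 1..5
    offset = index % D_PERIOD_RESIDUES         # residues into the D-period
    offset_nm = offset * D_PERIOD_NM / D_PERIOD_RESIDUES
    return monomer, offset, offset_nm


def share_axial_position(pos_a, pos_b, tolerance=10):
    """True if two residues lie within `tolerance` residues of each other axially.

    The comparison is made on D-period offsets, so residues on different
    monomers can match; the distance wraps around the period boundary.
    """
    _, off_a, _ = locate_residue(pos_a)
    _, off_b, _ = locate_residue(pos_b)
    gap = abs(off_a - off_b)
    return min(gap, D_PERIOD_RESIDUES - gap) <= tolerance
```

Under these assumptions, residues with similar offsets on different monomers sit side by side across the fibril, which is how sequences that are far apart in the primary structure can neighbor one another in the D-period schematic.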
The stability and unique function conferred by the triple-helical structure of collagen have been known for over forty years, but how molecules assemble into microfibrils to form the massive cable-like fibrils in tissues has been less well understood. Recent advances in technology have allowed molecular-resolution imaging of type I collagen microfibrils and fibrils. This new information, coupled with the non-random distribution of collagen functional sequences and mutations, has led to a testable model linking structure to function in this massive protein assemblage. Discrete cell- and matrix-interaction domains have been identified, and collagen-binding ligands that cooperatively carry out fibril functions have been recognized.
We reasoned that particular functional regions of the molecule may be preferentially resistant to biological degradation throughout the lifetime of an individual organism. Such regions would not only need to remain highly conserved across species but might also resist degradation in the burial environment. Thus, molecular models for the differential functions of collagen fibril domains or sequences may provide a chemical or structural rationale for preservation. We mapped eleven fossil-derived peptide sequences from two dinosaurs, Tyrannosaurus rex and Brachylophosaurus canadensis, onto molecular models of extant human and rat collagens. These peptides represent eight sequences that localize to seven regions of the monomer and comprise less than fifteen percent of the length of the collagen triple helix. They were non-randomly distributed in several respects (Statistical Analyses [see Materials and Methods]). In particular, fossil sequences mapped to regions of the protein partly shielded by tight molecular packing, which may physically stabilize them and protect them from enzymatic degradation, thus contributing to their preservation. Comparing the amino acid compositions of the fossil peptides with the sequence of the entire human protein for predicted properties such as hydrophobicity, polarity, and charge revealed that most fossil peptides came from regions of collagen that contain relatively few acidic residues; eight of the peptides (five sequences) lacked such residues altogether, which would limit their solubility and propensity for proteolytic degradation. Also, five peptides mapped to a uniquely hydrophobic fibril region. These results imply that the most stable regions of the protein are those with a more hydrophobic, less acidic character. That the more exposed, charged regions of collagen with high densities of trypsin cleavage sites yielded fewer fossil peptides suggests that these regions were susceptible to proteolysis in early diagenesis, and supports non-random degradation and preservation patterns for the diverse type I collagen sequence set in fossil bone. Notably, perhaps the least stable region, the hydroxyproline-deficient, thermally labile domain located toward the C-terminal end of the molecule, is not represented by any of the fossil peptides.
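The composition comparison described above can be sketched in a few lines of Python. This is a minimal illustration, not the analysis used in the study: it computes the fraction of acidic residues (Asp, Glu) in a sequence and counts internal trypsin cleavage sites using the common rule that trypsin cleaves after Lys or Arg unless the next residue is Pro. The function names are hypothetical, and any sequences passed to them would need to come from the original data.

```python
# Sketch only: simple composition metrics for a peptide or protein sequence.
# The cleavage rule (after K/R, not before P) is a widely used approximation
# and is an assumption here, not a detail taken from the text.
ACIDIC = set("DE")


def acidic_fraction(seq):
    """Fraction of residues that are Asp (D) or Glu (E)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return sum(res in ACIDIC for res in seq) / len(seq)


def tryptic_sites(seq):
    """Count internal K/R residues that are not followed by P."""
    seq = seq.upper()
    count = 0
    # The last residue is skipped: cleavage after it would release nothing.
    for i in range(len(seq) - 1):
        if seq[i] in "KR" and seq[i + 1] != "P":
            count += 1
    return count


def tryptic_site_density(seq):
    """Internal trypsin cleavage sites per 100 residues."""
    if not seq:
        return 0.0
    return 100.0 * tryptic_sites(seq) / len(seq)
```

Applying the same functions to each fossil peptide and to the full-length chain would give the kind of side-by-side comparison the paragraph describes, with low acidic fractions and low cleavage-site densities expected for the regions that yielded peptides.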
Dinosaur peptide sequence positions were mapped on the two-dimensional human collagen fibril D-period schematic [33].
Figure 4. X-ray diffraction model of the rat collagen microfibril in situ. Integrins, predominant cell-binding site; MMP, matrix metalloproteinase cleavage site; FN, fibronectin binding site; decoron, decorin proteoglycan core protein binding sites; putative cell ...
All fossil-derived peptides mapped to monomers 2, 3, and 4 on the extant collagen models. The remaining monomers, 1 and 5, are joined across microfibrillar layers by intermolecular cross-links that, while stabilizing the molecule and protecting it from enzymatic attack, may also hinder peptide extraction. Notably, the only position where alpha 1 chain peptides (Peptides 3 and 8) co-localize with an alpha 2 chain peptide (Peptide 11) is the integrin binding site that promotes cell-collagen interactions, angiogenesis, and osteoblast differentiation; its fibril location and association with severe mutations also suggest its crucial nature and hence strong selective pressure for conservation of sequence. One peptide (Peptide 4) mapped to the matrix metalloproteinase-1 (MMP-1) cleavage domain, which is crucial for collagen remodeling and is also a site for fibronectin binding. In living tissues, the integrin binding site and the MMP-1 cleavage/fibronectin binding sequences are somewhat buried beneath the surface of the collagen fibril, so fibril proteolysis or injury may be needed to make them available for cell-collagen interactions and tissue regeneration. The molecularly "sheltered" environment required to protect crucial biological function may also account for the enhanced survival of those protein regions in fossils. Although the majority of the dinosaur peptides are from highly conserved regions of the molecule, both of the alpha 2 chain peptides are highly variable. That the peptides are not exclusively from sequences with high similarity to sequences in public databases suggests that they were not identified solely because they derive from highly conserved regions; thus, the gaps in our model are not simply due to a failure to identify peptides that have diverged from known organisms. Additional preservation potential may be conferred by association with biomineral, especially if some regions of the collagen molecule are more intimately associated with mineral than others. Conversely, the absence of peptide matches elsewhere in the molecule may reflect resistance to trypsin digestion resulting from unusual post-mortem modifications, which may also confer resistance to proteolytic degradation and contribute to preservation over time. Additional collagen sequences may have survived, but because of chemical modification or lack of representation in current databases, they may not have been recognized by existing search algorithms and therefore were not identified in the original analyses.
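To give a sense of scale for the observation that every recovered sequence falls within monomers 2 to 4, the sketch below computes the chance of that outcome under a deliberately simple null model in which each sequence is placed independently and uniformly along the helix. This is not the statistical analysis reported in Materials and Methods. It assumes 234-residue monomer segments and a 1014-residue helix (values not stated in this text), treats each sequence as a point, and ignores chain identity and detectability; the function names are hypothetical.

```python
# Sketch only: probability that all sequences land in a chosen set of monomers
# under a uniform, independent placement model. NOT the authors' analysis.
# ASSUMPTIONS: 234-residue segments, 1014-residue helix, sequences as points.
from fractions import Fraction

HELIX_RESIDUES = 1014
SEGMENT_RESIDUES = 234


def fraction_in_monomers(monomers):
    """Fraction of helix residues belonging to the given monomer numbers (1-5)."""
    covered = 0
    for m in set(monomers):
        start = (m - 1) * SEGMENT_RESIDUES
        end = min(m * SEGMENT_RESIDUES, HELIX_RESIDUES)  # monomer 5 is short
        covered += max(0, end - start)
    return Fraction(covered, HELIX_RESIDUES)


def prob_all_within(monomers, n_sequences):
    """Probability that n independent uniform placements all fall in `monomers`."""
    return fraction_in_monomers(monomers) ** n_sequences
```

For the eight distinct sequences described in the text, `float(prob_all_within([2, 3, 4], 8))` is roughly 0.05 under these assumptions. A fuller treatment would account for peptide length, the two chain types, and which tryptic peptides are detectable by mass spectrometry at all.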
Our results add to the evidence from sequence data, molecular phylogenetic analyses, microstructure, and immunoreactivity to anti-collagen antibodies that supports the persistence of elements of native collagen fibril structure across geological time in some fossils. Most of the peptide sequences aligned perpendicularly with one or more other sequences on the fibril model, implying that neighboring triple-helical segments, or fragments thereof, may have been preserved en bloc.
If supported by further peptide recovery and mapping, this observation would validate current models of collagen monomer arrangement in the fibril.
Mapping the distribution of fossil collagen peptides observed by mass spectrometry onto models of collagen function demonstrates that the preservation of fossil-derived collagen sequences is consistent with current concepts of collagen biology, and provides a molecular mechanism for the preservation of this protein in fossil bone. Moreover, these findings support the endogenous source and longevity of the fossil-derived peptides, because peptides arising from recent contamination would be expected to be more concentrated and randomly distributed. They would not be expected to be over-represented in regions that so closely reflect collagen fibril structure/function relationships in native vertebrate tissue.
Finally, by showing that functionally crucial protein regions are more stable than others over geologic time, we provide insight into the selective pressures constraining the molecular structure, function, and hence sequence, of collagen. Paleoproteomics therefore not only holds significant promise for elucidating evolutionary relationships between extinct and extant organisms, but is also potentially useful for enhancing our understanding of protein function in living animals. In addition, elucidating the molecular functions of extant proteins may help predict which proteins or protein regions are most likely to be preserved in fossils, as has also been shown for the highly conserved and structurally sheltered mineral-binding mid-region of the bone protein osteocalcin. As technologies continue to improve in both sensitivity and resolution, the recovery of additional protein sequences from fossils will be enhanced. An understanding of preferential preservation driven by molecular function may be used to adapt search algorithms and so optimize studies of ancient molecules recovered from multiple extinct taxa. The recovery of additional sequences, made possible by these advances, may shed further light on the biology of extracellular matrix superstructures in living organisms.