By combining synthetic poliovirus genome constructs with the large read depth conferred by Illumina sequencing, we describe a recombination map covering 82% of the poliovirus type 1 coding region, built from over 50 thousand observed recombinant molecules. Biological replicates yielded a whole-genome recombination rate of 0.10 to 0.12 crossovers per genome per infectious cycle. This rate falls within the range of previously published estimates (1–20%) for nearly identical strains in cell culture. Our estimate differs in form from most previous experiments in that we examined the RNA of all virions produced rather than viable isolates. We used a large number of input viral RNA genomes relative to observed genome equivalents (50 million vs. 723 thousand) to minimize repeated observation of the same viral genome amplified by PCR; we estimate that 87.5% of observed recombinants derive from unique input molecules. Our estimates of recombination frequency and location therefore carry the caveat that we observed the frequency of recombinants at the conclusion of infection, and not necessarily the recombination events themselves, because it is not possible to determine which recombinant molecules arose from phylogenetically independent events. For example, a recombination event could be followed by replication and subsequent positive-strand amplification, over-representing that crossover in a way that cannot be distinguished in these data.
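The whole-genome rate above is a count of crossovers normalized by the amount of sequence examined. A minimal sketch of that bookkeeping follows, assuming each read has already been reduced to a list of parental-origin calls ('A', 'B', or None for an uncalled marker) at known marker positions. The function names, the call encoding and the per-nucleotide extrapolation to a whole genome are illustrative assumptions, not the pipeline used in this study. Crossovers are counted as origin switches between consecutive called markers, so a double crossover between two adjacent markers is missed.

```python
from typing import List, Optional, Sequence

Call = Optional[str]  # 'A' or 'B' (parental origin at a marker), None if uncalled


def count_crossovers(calls: Sequence[Call]) -> int:
    """Count origin switches between consecutive called markers on one read."""
    called = [c for c in calls if c is not None]
    return sum(1 for left, right in zip(called, called[1:]) if left != right)


def spanned_length(positions: Sequence[int], calls: Sequence[Call]) -> int:
    """Nucleotides between the first and last called marker on one read."""
    idx = [i for i, c in enumerate(calls) if c is not None]
    if len(idx) < 2:
        return 0  # fewer than two calls cannot reveal a crossover
    return positions[idx[-1]] - positions[idx[0]]


def crossovers_per_genome(reads: List[Sequence[Call]],
                          positions: Sequence[int],
                          genome_length: int) -> float:
    """Crossovers per nucleotide examined, scaled to a whole genome.

    Assumes regions without markers recombine at the same average rate.
    """
    total_crossovers = sum(count_crossovers(r) for r in reads)
    total_span = sum(spanned_length(positions, r) for r in reads)
    if total_span == 0:
        raise ValueError("no read spans two called markers")
    return total_crossovers / total_span * genome_length


# Toy example with four hypothetical marker positions.
positions = [100, 200, 300, 400]
reads = [
    ["A", "A", "B", "B"],   # one crossover
    ["A", "A", "A", "A"],   # parental
    ["B", None, "A", "A"],  # one crossover; the uncalled marker is skipped
]
print(crossovers_per_genome(reads, positions, genome_length=7_440))
```

The toy reads give an unrealistically high rate because two of the three are recombinant; the point is the normalization, and the genome length of 7,440 nt is only an approximate poliovirus genome size chosen for the example.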
This mapping technique is applicable to any virus for which there is an infectious clone and a suitable cell line for transfection and coinfection, and it could subsequently be applied to animal infections. Notably, the strategy is also feasible for poorly studied viruses, because no pair of selectable mutations needs to be identified and characterized before construct design. This mapping strategy is intended for homologous recombination and is unsuited to mapping non-homologous recombination, because it relies on specific PCR and amplicon size selection, which select against large deletions or duplications. Maps of non-homologous recombination should nonetheless be feasible with a different methodology and controls, and comparing them with homologous recombination maps could prove informative.
Poliovirus was used here as a well-understood model, and it was also advantageous because of its robust growth in cell culture. Although our synthetic virus had a protein coding sequence identical to the wild type, the markers presumably disrupted as-yet-undiscovered RNA secondary structure elements in the poliovirus genome. Three mutations arose in the C1 strain; however, none of these coincided with markers, so they cannot be considered direct revertants. Whether these mutations represent compensatory changes to currently unknown secondary structure elements or rose to prominence in the population for other reasons is unknown. Unanticipated selective forces operating within this mapping system could also bias the population for or against recombinant viruses. While this possibility cannot be eliminated, we have attempted to minimize it by avoiding selectable markers and by collecting virus progeny after a single infectious cycle conducted at high multiplicity of infection.
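Recombination between the two marked genomes requires that a cell be coinfected with both, which is one reason for infecting at high multiplicity. Under the common simplifying assumption that the number of virions entering each cell is Poisson-distributed, the fraction of cells receiving at least one genome of each parent can be sketched as below. The function name and the 1:1 mixture of parents in the example are assumptions for illustration.

```python
import math


def coinfected_fraction(moi: float, fraction_a: float = 0.5) -> float:
    """Fraction of cells infected by at least one virion of each parent.

    Assumes virions of parent A and parent B enter cells independently,
    each Poisson-distributed with means moi * fraction_a and
    moi * (1 - fraction_a).
    """
    if not 0.0 < fraction_a < 1.0:
        raise ValueError("fraction_a must be strictly between 0 and 1")
    mean_a = moi * fraction_a
    mean_b = moi * (1.0 - fraction_a)
    # P(at least one A) * P(at least one B), using independence
    return (1.0 - math.exp(-mean_a)) * (1.0 - math.exp(-mean_b))


for moi in (0.1, 1, 10):
    print(f"MOI {moi:>4}: {coinfected_fraction(moi):.3f} of cells coinfected")
```

Under this model, a total MOI of 10 with equal parental input coinfects about 99% of cells, whereas an MOI of 1 coinfects only about 15%.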
Sample preparation for ultra-high-throughput sequencing is prone to artifactual recombination by template switching during library production. Previous studies that used RT-PCR to characterize recombination frequency may have avoided this issue by using extremely low starting concentrations of template. Library preparation, however, requires quantities of template orders of magnitude greater than those required for RT-PCR, which necessitated the development of the emulsion-based library generation protocol described here. Our emulsion generation method (bead milling) produces variable vesicle sizes that require generous template dilutions, and this could likely be improved by using microfluidic droplet makers. Alternatively, Ozsolak et al have sequenced RNA molecules directly without reverse transcription, which could provide a more direct means of assaying recombination with a similar viral construct design.
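Why variable droplet sizes force more dilution can be seen with a simple Poisson loading model, which is an assumption here rather than a description of the protocol. If templates per droplet are Poisson with mean proportional to droplet volume (concentration c times volume v), a template in a droplet of volume v shares it with at least one other template with probability 1 - exp(-c*v), and templates preferentially land in larger droplets. The sketch below (hypothetical function names, arbitrary volume units) finds the highest loading concentration that keeps this co-encapsulation fraction, a proxy for opportunities for artifactual template switching, below a chosen tolerance.

```python
import math
from typing import Optional, Sequence


def shared_fraction(conc: float, volumes: Sequence[float],
                    weights: Optional[Sequence[float]] = None) -> float:
    """Fraction of templates sharing a droplet with at least one other template.

    Assumes templates per droplet ~ Poisson(conc * volume). A given template
    then has Poisson(conc * v) companions, and templates are found in droplets
    of volume v in proportion to weight * v.
    """
    if weights is None:
        weights = [1.0] * len(volumes)
    num = sum(w * v * (1.0 - math.exp(-conc * v)) for v, w in zip(volumes, weights))
    den = sum(w * v for v, w in zip(volumes, weights))
    return num / den


def max_loading(tolerance: float, volumes: Sequence[float],
                weights: Optional[Sequence[float]] = None,
                upper: float = 100.0, iterations: int = 100) -> float:
    """Largest concentration with shared_fraction <= tolerance (bisection).

    Valid because shared_fraction increases monotonically with concentration.
    """
    lower = 0.0
    for _ in range(iterations):
        mid = (lower + upper) / 2.0
        if shared_fraction(mid, volumes, weights) <= tolerance:
            lower = mid
        else:
            upper = mid
    return lower


uniform = max_loading(0.01, volumes=[1.0])
variable = max_loading(0.01, volumes=[0.2, 1.8])  # same mean volume
print(f"uniform droplets: {uniform:.4f}, variable droplets: {variable:.4f}")
```

With a 1% tolerance, an equal mixture of droplets with volumes 0.2 and 1.8 permits roughly 40% lower template loading than uniform droplets of the same mean volume, because the larger droplets carry most of the templates.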
Phylogenetic studies rarely observe enterovirus recombinants with crossovers in the capsid region. This could reflect protein incompatibility affecting viability, low nucleotide similarity preventing recombination from occurring at all, or some sequence-based factor dampening recombination. Our results do not support a significant difference in recombination rate between the capsid and the non-structural region, even when the large hotspot at the RNAseL element is included.
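Whether two regions differ in crossover rate can be framed as a comparison of two Poisson rates. One standard approach, shown only as an illustration and not necessarily the analysis behind the statement above, conditions on the total number of crossovers: under equal rates, the count in the first region is binomial, with success probability equal to that region's share of the exposure (for example, interval length multiplied by read depth). The sketch works in log space so it stays usable with tens of thousands of crossovers; the function names and the counts in the example are made up.

```python
import math


def log_binom_pmf(k: int, n: int, p: float) -> float:
    """Natural log of the Binomial(n, p) probability of k successes."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def equal_rate_pvalue(k1: int, exposure1: float, k2: int, exposure2: float) -> float:
    """Two-sided conditional test that two regions share one crossover rate.

    k1, k2: crossovers observed in each region.
    exposure1, exposure2: sequence examined in each region (e.g. nt x depth).
    Under H0, k1 | (k1 + k2) ~ Binomial(k1 + k2, exposure1 / total exposure).
    The p-value sums the probabilities of all outcomes no more likely than k1.
    """
    n = k1 + k2
    p = exposure1 / (exposure1 + exposure2)
    observed = log_binom_pmf(k1, n, p)
    slack = 1e-7  # tolerate rounding when comparing equal probabilities
    total = sum(math.exp(lp)
                for lp in (log_binom_pmf(k, n, p) for k in range(n + 1))
                if lp <= observed + slack)
    return min(1.0, total)


# Made-up counts: capsid vs non-structural region with a 1:2 exposure ratio.
print(equal_rate_pvalue(k1=980, exposure1=1.0, k2=2050, exposure2=2.0))
```

Conditioning on the total removes the unknown common rate from the test, so only the relative exposures need to be estimated.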
The extremes of GC content, and in particular long tracts consisting only of A and U or only of G and C nucleotides, are also associated with bias in recombination frequency. In the simplest interpretation, incomplete nascent RNAs terminating in GC-rich sequences would be expected to anneal to a new template genome more robustly than those terminating in AU-rich sequences, as a straightforward matter of thermodynamics and in line with the established copy-choice mechanism (treated in King 1988). This interpretation suggests that in poliovirus, thermodynamic factors influence annealing of the nascent strand to the recipient genome to a greater extent than its initial dissociation from the donor genome. In the converse scenario, GC-rich regions would instead be less prone to fraying or dissociation from the original template and would be associated with reduced recombination. The inverse symmetry of the GC and AU effects further favors a simple thermodynamic model. An alternative, not mutually exclusive, model would consider RNA secondary structure to be the mechanism of recombination modulation, with GC and AU content influencing recombination indirectly by altering secondary structure stability.
Our results support earlier associations of the RNAseL element with recombination and further suggest that local secondary structure, as predicted in silico, also influences recombination rate across the genome. We also note that a recently described RNA secondary structure (Burril et al, personal communication) corresponds to a recombination hotspot in the 3D region. While these two biologically functional secondary structures coincide with regions of high recombination, our in silico prediction examines only the potential for local secondary structure, not biological function. These conclusions suggest that a global redesign of the poliovirus genome could plausibly be implemented with the aim of reducing recombination potential by disrupting secondary structure elements and modulating nucleotide use.
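An in silico scan of local secondary structure is usually performed with an energy-based folding program. As a much cruder stand-in that needs no external software, the sketch below scores local pairing potential with the Nussinov base-pair maximization algorithm in sliding windows. The window size, step, minimum hairpin loop length and the use of Nussinov instead of an energy model are all assumptions for illustration; the scores are pair counts, not free energies, and the function names are hypothetical.

```python
from typing import List

# Watson-Crick and G-U wobble pairs allowed in RNA helices.
PAIRS = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}


def max_pairs(seq: str, min_loop: int = 3) -> int:
    """Nussinov maximum number of nested base pairs in an RNA sequence.

    A pair (i, j) requires at least min_loop unpaired bases between i and j.
    """
    n = len(seq)
    if n == 0:
        return 0
    best = [[0] * n for _ in range(n)]  # best[i][j]: max pairs in seq[i..j]
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            score = max(best[i + 1][j], best[i][j - 1])  # i or j unpaired
            if (seq[i], seq[j]) in PAIRS:
                score = max(score, best[i + 1][j - 1] + 1)  # i pairs with j
            for k in range(i + 1, j):  # split into two independent substructures
                score = max(score, best[i][k] + best[k + 1][j])
            best[i][j] = score
    return best[0][n - 1]


def window_scores(seq: str, window: int = 60, step: int = 10) -> List[float]:
    """Base pairs per nucleotide in sliding windows (higher = more foldable)."""
    seq = seq.upper().replace("T", "U")
    return [max_pairs(seq[s:s + window]) / window
            for s in range(0, len(seq) - window + 1, step)]


print(max_pairs("GGGGAAAACCCC"))  # 4: a 4-bp stem closing a 4-nt loop
```

Because pair counting rewards any complementary stretch equally, GC-rich windows are not favored over AU-rich ones here; an energy model would capture that difference.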
The frequency of AU and GC tracts is associated with genomic GC content across Picornavirus species. Poliovirus represents a moderate case, with a GC content of 46%. Other Enterovirus species, the genus Cardiovirus and most newly described or proposed genera have a similar GC content and AU/GC tract frequency. The genera Parechovirus and Hepatovirus and the Rhinovirus species all have higher-than-average AU content, while the genera Aphthovirus and Kobuvirus are GC-rich relative to other picornaviruses. Based on the AU and GC tract associations described, we would predict that intra-typic homologous recombination rates in the GC-rich clades (e.g., Aichivirus, FMDV) would be greater than in poliovirus, and that the AU-rich clades (parechoviruses, hepatoviruses, rhinoviruses) would have less intra-typic recombination potential than poliovirus. A major caveat of this prediction is that other factors, such as replication kinetics, the formation of replication rosettes and differences in the viral polymerase, could confound such a simple relationship. Further, each clade's tendency toward mixed heterotypic infections, as a function of the number of strains, shared cell tropism or frequency of infection, is a confounding variable. No comparable in vitro recombination studies using nearly identical strains have been performed in these other picornaviruses, so we cannot directly compare their recombination frequencies or separate them from other limits on homologous recombination.
AU- and GC-tract frequency in Picornavirus species.
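Tract statistics of this kind can be computed with a simple scan. The sketch below treats a tract as a maximal run of at least min_len nucleotides drawn only from {A, U} or only from {G, C}, and reports tracts per kilobase alongside GC content. That definition, the default length of 8 and the per-kilobase normalization are assumptions for illustration and may differ from the criteria used in this study; the function names are hypothetical.

```python
import re
from typing import Dict, List, Tuple


def find_tracts(seq: str, alphabet: str, min_len: int = 8) -> List[Tuple[int, int]]:
    """Maximal runs of at least min_len residues drawn only from alphabet.

    Returns half-open (start, end) coordinates. The greedy quantifier and
    left-to-right scanning of finditer make each match a maximal run.
    """
    pattern = re.compile(f"[{alphabet}]{{{min_len},}}")
    return [(m.start(), m.end()) for m in pattern.finditer(seq)]


def tract_summary(seq: str, min_len: int = 8) -> Dict[str, float]:
    """GC content and AU/GC tracts per kilobase for one genome sequence."""
    seq = seq.upper().replace("T", "U")
    n = len(seq)
    if n == 0:
        raise ValueError("empty sequence")
    gc_content = (seq.count("G") + seq.count("C")) / n
    au_tracts = find_tracts(seq, "AU", min_len)
    gc_tracts = find_tracts(seq, "GC", min_len)
    return {
        "gc_content": gc_content,
        "au_tracts_per_kb": 1000 * len(au_tracts) / n,
        "gc_tracts_per_kb": 1000 * len(gc_tracts) / n,
    }


# Toy sequence: a 10-nt AU tract, a mixed linker, then a 10-nt GC tract.
example = "AUAUUAAUAU" + "CGAUCGAU" + "GGCGCCGCGG"
print(find_tracts(example, "AU"), find_tracts(example, "GC"))
print(tract_summary(example))
```

Running tract_summary over a set of genomes and plotting tract frequency against GC content would reproduce the kind of comparison summarized for the Picornavirus species.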
The GC/AU and secondary structure motifs are straightforward to identify and can be engineered, with caveats. We modified a test region representing 4.5% of the genome with synonymous mutations to create or extend GC-rich tracts and eliminate AU tracts. The net effect of this modification was an increase in GC content (by 12%) and an increase in predicted folding energy (by 26%). This redesign underscores the difficulty of modifying coding sequence while leaving other, possibly vital, sequence factors intact. GC content in virus sequences may be a form of adaptation to the host, and making GC content changes across an entire genome could render a virus non-viable or alter its growth parameters, such as cell tropism and permissive temperature. CpG and UpA dinucleotides are underrepresented in mammalian RNA viruses and have been associated with immune stimulation and endonuclease susceptibility. Notably, Burns et al (2009) re-engineered poliovirus type 2 to increase GC content by 15% while maintaining CpG and UpA frequency, without compromising viability in cell culture; however, when only 9% of the genome was saturated with UpA and CpG elements, the virus was rendered almost non-viable.
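The tension described here, raising GC content by synonymous substitution while keeping track of CpG and UpA frequencies, can be illustrated with a short sketch. It uses the standard genetic code and a deliberately crude recoding rule that replaces each codon with its highest-GC synonym; this is not the targeted, tract-based redesign described above, and all function names are hypothetical. Representation of a dinucleotide XY is scored with the common observed/expected ratio: the count of XY divided by (L - 1), divided by the product of the mononucleotide frequencies of X and Y.

```python
from typing import Dict, List

BASES = "UCAG"
# Standard genetic code (NCBI table 1); codons ordered UUU, UUC, UUA, ... GGG.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TO_AA: Dict[str, str] = dict(zip(CODONS, AMINO))
SYNONYMS: Dict[str, List[str]] = {}
for codon, aa in CODON_TO_AA.items():
    SYNONYMS.setdefault(aa, []).append(codon)


def gc_count(s: str) -> int:
    return sum(base in "GC" for base in s)


def translate(rna: str) -> str:
    return "".join(CODON_TO_AA[rna[i:i + 3]] for i in range(0, len(rna), 3))


def maximize_gc(rna: str) -> str:
    """Swap each codon for its highest-GC synonym; the protein is unchanged."""
    if len(rna) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(rna), 3):
        codon = rna[i:i + 3]
        best = max(SYNONYMS[CODON_TO_AA[codon]], key=gc_count)
        out.append(codon if gc_count(codon) >= gc_count(best) else best)
    return "".join(out)


def dinucleotide_oe(rna: str, first: str, second: str) -> float:
    """Observed/expected ratio for the dinucleotide first+second (e.g. C, G)."""
    n = len(rna)
    observed = sum(rna[i] == first and rna[i + 1] == second for i in range(n - 1))
    expected = (rna.count(first) / n) * (rna.count(second) / n) * (n - 1)
    return observed / expected if expected else float("nan")


original = "UUAAAAGAUUCUCGA"  # Leu-Lys-Asp-Ser-Arg
recoded = maximize_gc(original)
assert translate(recoded) == translate(original)
for label, s in (("original", original), ("recoded", recoded)):
    print(label, s, f"GC={gc_count(s) / len(s):.2f}",
          f"CpG O/E={dinucleotide_oe(s, 'C', 'G'):.2f}",
          f"UpA O/E={dinucleotide_oe(s, 'U', 'A'):.2f}")
```

Because the rule ignores codon context, it can create CpG dinucleotides across codon boundaries, which is the kind of side effect the Burns et al results suggest a genome-wide redesign would need to control.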
Lessons from poliovirus vaccines clearly demonstrate the need for a better understanding of recombination potential and the factors that influence it. Ultimately, knowledge and manipulation of these factors may assist in the development and validation of recombination-deficient attenuated vaccine strains.