Integrons were discovered about two decades ago as a result of their role in the evolution of multi-drug-resistant bacteria [1,2]. These genetic elements can acquire, rearrange and express genetic material carried in gene cassettes. Gene cassettes are among the simplest known mobile elements. They comprise one or more genes associated with a recombination site most commonly referred to as attC [3] and less commonly as 59-base elements (59-be) [4]. The integron captures gene cassettes through site-specific recombination carried out by the tyrosine recombinase it encodes (IntI). Captured cassettes are most commonly inserted by this recombination activity at the integron attachment site (attI) [4] (Figure ). Such capture events can occur repeatedly and, in the case of some chromosomal integrons, can create large arrays encoding hundreds of gene cassettes [5]. A promoter, Pc, often located upstream of attI, is thought to enhance expression of proximal cassette-associated genes in some integrons [6]. The ability to capture disparate individual genes and physically link them in arrays suitable for co-expression is a trait unique to this genetic element. The result is an assembly of functionally interacting genes that theoretically facilitates the rapid evolution of new phenotypes [7].
Initially, integrons were thought of as specialized elements mostly involved in the accumulation of gene cassettes encoding antibiotic resistance determinants in pathogenic bacteria [8]. The advent of genomics and the availability of numerous genome sequences from environmental bacteria made it clear that the integron is a more ancient and widespread gene capture system [9]. Although this genetic element is found in about 10% of all sequenced genomes and cassette arrays can be as large as 150 kb [10], few integrons have been properly identified and annotated as such. Even when the integron integrase gene is annotated because of its sequence similarity to characterized homologs, the gene cassettes associated with it are labeled as simple open reading frames (ORFs). This is because attC sites, the most distinctive feature of gene cassettes, are non-coding regions and are therefore not recognized by standard automated genome annotation pipelines.
Integrons are a flexible and fast-evolving part of microbial genomes, and their associated cassette arrays represent a unique and segregated gene pool (the genes they carry are rarely found outside these genetic elements). However, because integrons (other than the specialized variants carrying antibiotic resistance genes) rarely have a detectable phenotype under laboratory conditions, they are seldom emphasized in genomic studies. Their proper identification and annotation could help us understand their role in microbial adaptation.
We have created ACID (Annotation of Cassette and Integron Data) as a resource for the biology community. ACID contains all integron and gene cassette sequences available in public databases, manually curated and accurately annotated. Users can freely access and download these data, and can automatically annotate and submit their own sequences. Tools for visualizing and comparing cassette arrays are also available.