Differentiation of alleles from paralogues. a) The solid boxes highlight assembly issues in the current genome. The solid gray boxes (locus "A") show a helicase gene for which 1 allele is fully sequenced on the P chromosome and the other is split into 2 pieces on S. The solid brown boxes (locus "B") show a hypothetical protein with no fully sequenced copy: the 2 genes on the P chromosome and the 2 genes on the S chromosome all lie at contig boundaries and should be merged. Finally, the solid blue boxes (locus "C") show a case in which the same hypothetical protein has 2 copies (at least partially sequenced) on the P chromosome and 1 presumed fully sequenced copy on S. In all cases, the dotted boxes indicate the predicted correct coding sequences. The circled genes mark other cases of gene truncation at contig boundaries. b) Alignment of the 3 annotated genes in the A locus (a helicase gene). The A1 allele is full length, whereas A2 and A3 are short fragments that lie at the ends of contigs and are not fully sequenced. A2 has near-perfect identity with the N-terminus of A1 and A3 has perfect identity with the C-terminus, suggesting that A2 and A3 are pieces of the same gene.
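The reasoning in panel b can be illustrated with a minimal sketch. It is not the method used to produce the figure: the sequences are invented toy strings, the function names are hypothetical, the ungapped sliding comparison stands in for a real alignment, and the 0.9 identity threshold is an arbitrary assumption. The idea is only that two fragments which place at opposite ends of a full-length allele with high identity are more plausibly pieces of one gene than separate paralogues.

```python
def best_ungapped_hit(full, frag):
    """Slide frag along full without gaps; return (start, identity) of the best placement."""
    best_start, best_identity = 0, 0.0
    for start in range(len(full) - len(frag) + 1):
        window = full[start:start + len(frag)]
        matches = sum(a == b for a, b in zip(window, frag))
        identity = matches / len(frag)
        if identity > best_identity:  # keep the first placement on ties
            best_start, best_identity = start, identity
    return best_start, best_identity


def looks_like_split_gene(full, frag_n, frag_c, min_identity=0.9):
    """True if frag_n sits at the start and frag_c at the end of full.

    min_identity is an assumed, arbitrary cutoff chosen for illustration.
    """
    n_start, n_identity = best_ungapped_hit(full, frag_n)
    c_start, c_identity = best_ungapped_hit(full, frag_c)
    return (
        n_identity >= min_identity
        and c_identity >= min_identity
        and n_start == 0                        # anchored at the N-terminus
        and c_start + len(frag_c) == len(full)  # anchored at the C-terminus
    )


# Toy data (invented): a1 plays the full-length allele, a2 an N-terminal
# fragment with one mismatch, a3 an exact C-terminal fragment.
a1 = "MKTAYIAKQRQISFVKSHFSRQLEERLGLIEVQ"
a2 = "MKTAYLAKQRQI"
a3 = a1[-10:]

split_gene = looks_like_split_gene(a1, a2, a3)
```

On real data, a gapped aligner and a threshold justified by the observed allelic divergence would be needed; the sketch only makes the end-anchoring logic explicit.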
Articles from BMC Genomics are provided here courtesy of