Here is a Sanger FASTQ read from the NCBI SRA (shown earlier in the FASTA and QUAL formats):
There are four line types in the FASTQ format. First a ‘@’ title line which often holds just a record identifier. This is a free format field with no length limit—allowing arbitrary annotation or comments to be included, as in the example above where the NCBI have included an alternative ID and the sequence length. Some sequencing centers encode paired end read information here (alternatively two matched FASTQ files are often used).
Second comes the sequence line(s), which as in the FASTA format can be line wrapped. Also like FASTA format, there is no explicit limitation on the characters expected, but restriction to the IUPAC single letter codes for (ambiguous) DNA or RNA is wise, and upper case is conventional. In some contexts, the use of lower or mixed case or the inclusion of a gap character may make sense. White space such as tabs or spaces is not permitted.
Third, to signal the end of the sequence lines and the start of the quality string, comes the ‘+’ line. Originally this also included a full repeat of the title line text (as shown in the NCBI example above); however, by common usage and the MAQ tool convention, this is optional and the ‘+’ line can contain just this one character, reducing the file size significantly. The OBF tools follow this MAQ convention on output, and omit the optional repeated title text.
Finally, comes quality line(s) which again can be wrapped. As discussed above, these use a subset of the ASCII printable characters (at most ASCII 33–126 inclusive) with a simple offset mapping. Crucially, after concatenation (removing line breaks), the quality string must be equal in length to the sequence string.
It is vital to note that the ‘@’ marker character (ASCII 64) may occur anywhere in the quality string—including at the start of any of the quality lines. This means that any parser must not treat a line starting with ‘@’ as indicating the start of the next record, without additionally checking the length of the quality string thus far matches the length of the sequence.
Because of this complication, most tools output FASTQ files without line wrapping of the sequence and quality string. This means each read consists of exactly four lines (sometimes very long lines), ideal for a very simple parser to deal with. The OBF tools follow this convention on output, as does the MAQ conversion script. We recommend this for maximum compatibility with (simplistic) parsers.
Because FASTQ files (like FASTA files) are plain text, the new line characters will normally follow the operating system convention. However, as data are shared between machines, any parser should cope with both Unix style new lines (line feed only, ASCII 10) and DOS/Windows style (carriage return and line feed, ASCII 13 then 10).