Genomic Sequence Coordinate System

 

Like the SAM format and the SAMtools program, SVA uses a 1-based coordinate system to represent sequence positions. However, to represent insertions and deletions more intuitively, SVA reports variant coordinates slightly differently from the standard SAMtools output (see figure below).

Please note that you do not need to modify SAMtools output to fit the SVA coordinate system, as long as the coordinates in your files follow the "SAMtools" convention illustrated below. SVA converts them to its own coordinate system automatically.

You do, however, need to be careful when you compare the coordinates in the results, particularly for insertions and deletions.

Here is a brief explanation of this figure illustrating the coordinate system:

1. SNV

Both SAMtools and SVA represent an SNV by its physical position in the reference sequence coordinate system, counting from 1 (as opposed to 0).

2. Deletion

SAMtools represents a deletion using a single location: the position of the base before the first deleted base, in the reference sequence coordinate system.

SVA represents a deletion using two locations: the positions of the first and last deleted bases, in the reference sequence coordinate system.

3. Insertion

SAMtools represents an insertion using a single location: the position of the reference base before the first inserted base, in the reference sequence coordinate system.

SVA represents an insertion using two locations: the positions of the two adjacent reference bases between which the new bases are inserted, in the reference sequence coordinate system.
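
The three rules above can be summarized in a short Python sketch. This is only an illustration of the conversion as described in the text, not SVA's actual code: the function name samtools_to_sva, the variant type labels, and the length argument are all hypothetical. It assumes that "a location before" means the reference base immediately preceding the event, and it returns an SNV as a pair with the same position twice purely so that every result has the same shape.

```python
from typing import Tuple


def samtools_to_sva(variant_type: str, pos: int, length: int = 1) -> Tuple[int, int]:
    """Convert a 1-based SAMtools-style position to an SVA-style (start, end) pair.

    Hypothetical helper for illustration only; names and signature are assumptions.
    `length` is the number of deleted bases and is used only for deletions.
    """
    if pos < 1:
        raise ValueError("positions are 1-based, so pos must be >= 1")
    if length < 1:
        raise ValueError("length must be >= 1")

    if variant_type == "snv":
        # Same position in both systems; repeated here only to return a pair.
        return (pos, pos)
    if variant_type == "deletion":
        # SAMtools: base before the first deleted base.
        # SVA: first and last deleted bases.
        return (pos + 1, pos + length)
    if variant_type == "insertion":
        # SAMtools: base before the first inserted base.
        # SVA: the two reference bases flanking the inserted bases.
        return (pos, pos + 1)
    raise ValueError(f"unknown variant type: {variant_type!r}")


# A 3-base deletion reported by SAMtools at position 100
# removes reference bases 101 through 103.
print(samtools_to_sva("deletion", 100, length=3))
# An insertion reported by SAMtools at position 100
# lies between reference bases 100 and 101.
print(samtools_to_sva("insertion", 100))
```

When comparing results by hand, the same arithmetic applies in reverse: under these assumptions, subtracting 1 from an SVA deletion start gives the SAMtools-style position, while the first coordinate of an SVA insertion already equals it.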