Functional categories defined by SVA

 

A number of different functional classifications are used to categorize variants in SVA. By default, the protein-coding status of a given transcript is determined by the annotation in Ensembl release 50_36l (build 36, June 2008 version). The phrase “known gene” below indicates that the region is annotated as a transcribed gene according to Ensembl release 50 (build 36, June 2008 version) by default. The database version is determined by the SVA supporting databases specified in the project file, and newer versions of the supporting databases will be released in the future.

Please note that one variant may have more than one functional category, and/or the same category for several transcripts belonging to the same gene or to several different genes.

We define the functional categories separately for three different types of variants:

Single nucleotide variants (SNVs)

Name: Definition used in SVA

stop gained: a variant introducing a premature TAG, TAA, or TGA stop codon in a transcript of a protein-coding gene
stop lost: a variant causing the loss of a TAG, TAA, or TGA stop codon in a transcript of a protein-coding gene
non-synonymous coding: a variant located in a codon and resulting in a change from one amino acid residue to another, excluding variants that fall into the two categories above
essential splice site: a variant changing the highly conserved GU at the first two positions of the intron, or the highly conserved AG at the last two positions of the intron
intron-exon boundary: a variant occurring in any of the first N1 nucleotides into the intron, or the first N2 nucleotides into the exon. N1 is 8 by default and can be set by the user with the "[INTRON_EXON_BOUNDARY_INTO_INTRON]=" statement in the .gsap project file. N2 is 3 by default and can be set by the user with the "[INTRON_EXON_BOUNDARY_INTO_EXON]=" statement in the .gsap project file. Note: a variant that can be defined as "essential splice site" is not included in this category again.
5' UTR: a variant located within the 5' UTR of a transcript of a protein-coding gene
3' UTR: a variant located within the 3' UTR of a transcript of a protein-coding gene
exonic non-coding RNA: a variant occurring in an exon of a non-coding RNA
upstream: an intergenic variant occurring within N nucleotides of the transcript start site of a known gene. N is 1000 by default and can be defined by the user
downstream: an intergenic variant occurring within N nucleotides of the transcript end site of a known gene. N is 1000 by default and can be defined by the user
intronic: a variant in the intron of a known gene that cannot be defined as essential splice site or intron-exon boundary
synonymous coding: a variant located in a codon but not resulting in a change from one amino acid residue to another
intergenic: a variant not located within a known gene that cannot be defined as an upstream or a downstream variant
splice site: reserved blank field in SNV records; replaced by the "intron-exon boundary" functional category
frameshift coding: reserved blank field in SNV records; replaced by the INDEL functional categories
regulation region: reserved field
reference: reserved field
cannot annotate: reserved field
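
The coding and splice-related SNV definitions above can be illustrated with a short Python sketch. This is not SVA's implementation: the function names, the inputs (a reference and an alternate codon, or a region label plus a 1-based distance to the nearest exon-intron junction), and the reading of "the first N1 nucleotides into the intron" as applying at either end of the intron are assumptions made for illustration. The defaults n1=8 and n2=3 mirror the documented defaults.

```python
# Illustrative sketch only; names and inputs are hypothetical, not SVA's API.

BASES = "TCAG"
# Standard genetic code in the conventional TCAG ordering ('*' marks a stop).
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_coding_snv(ref_codon, alt_codon):
    """Label an SNV inside a codon of a protein-coding transcript."""
    ref_aa = CODON_TABLE[ref_codon]
    alt_aa = CODON_TABLE[alt_codon]
    if ref_aa != "*" and alt_aa == "*":
        return "stop gained"
    if ref_aa == "*" and alt_aa != "*":
        return "stop lost"
    if ref_aa != alt_aa:
        return "non-synonymous coding"
    return "synonymous coding"


def classify_near_junction(region, distance, n1=8, n2=3):
    """Label an SNV by its position relative to an exon-intron junction.

    region   -- "intron" or "exon" (assumed input)
    distance -- 1-based distance to the nearest junction (1 = adjacent base)
    Returns None when no splice-related category applies.
    """
    if region == "intron":
        if distance <= 2:
            # The two intronic bases at either end (GU ... AG).
            return "essential splice site"
        if distance <= n1:
            return "intron-exon boundary"
        return "intronic"
    if region == "exon" and distance <= n2:
        return "intron-exon boundary"
    return None
```

For example, `classify_coding_snv("TGG", "TGA")` yields "stop gained", and `classify_near_junction("intron", 2)` yields "essential splice site" rather than "intron-exon boundary", matching the note that essential splice site variants are not counted in the boundary category again.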

Small insertion/deletions (INDELs)

Name: Definition used in SVA

coding disrupting frameshift: an indel (insertion or deletion) that is located in a protein-coding sequence and whose length is not a multiple of three, and that will thus cause a frameshift in the resulting protein
coding disrupting other: an indel (insertion or deletion) that is either (1) located in a protein-coding sequence with a length that is a multiple of three, and will thus cause coding changes but not a frameshift in the resulting protein; or (2) located in a non-protein-coding sequence
transcript included: used to indicate that the indel overlaps a known gene transcript along its full length. Please note that this is usually not possible for an INDEL defined in a whole-genome sequencing study, but it is included here as a separate functional category.
5' UTR: a variant that overlaps the 5' UTR of a transcript of a protein-coding gene
3' UTR: a variant that overlaps the 3' UTR of a transcript of a protein-coding gene
intron-exon boundary: a variant occurring in any of the first N nucleotides into the intron. N is 6 by default and can be defined by the user
upstream: an intergenic variant occurring within N nucleotides of the transcript start site of a known gene. N is 1000 by default and can be defined by the user
downstream: an intergenic variant occurring within N nucleotides of the transcript end site of a known gene. N is 1000 by default and can be defined by the user
intronic: a variant in the intron of a known gene that cannot be defined as intron-exon boundary
intergenic: a variant not located within a known gene that cannot be defined as an upstream or a downstream variant
cannot annotate: reserved field
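
The distinction between the two "coding disrupting" categories for an indel inside protein-coding sequence comes down to whether its length is a multiple of three. A minimal Python sketch follows; the function name and the allele-string inputs are hypothetical, and the sketch covers only case (1) of "coding disrupting other", not indels in non-protein-coding sequence.

```python
# Illustrative sketch only; not SVA's actual code.

def classify_coding_indel(ref_allele, alt_allele):
    """Label an indel that lies within protein-coding sequence.

    The net length is the difference between the allele lengths;
    a net length that is not a multiple of three shifts the reading frame.
    """
    net_length = abs(len(alt_allele) - len(ref_allele))
    if net_length == 0:
        raise ValueError("not an indel: alleles have equal length")
    if net_length % 3 != 0:
        return "coding disrupting frameshift"
    return "coding disrupting other"
```

Under these assumptions, a one-base insertion such as `classify_coding_indel("A", "AT")` is a frameshift, while a three-base deletion such as `classify_coding_indel("ATGC", "A")` falls into "coding disrupting other".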

Large structural variations (SVs)

Name: Definition used in SVA

coding disrupted: a structural variant that overlaps part of the coding sequence of a known protein-coding gene, but does not cover the whole length of any transcript of that gene
possible coding disrupted: same as the above category, but reserved for SVs whose break points are particularly difficult to estimate
transcript included: used to indicate that the structural variation overlaps a known gene transcript along its full length
5' UTR: a variant that overlaps the 5' UTR of a transcript of a protein-coding gene
3' UTR: a variant that overlaps the 3' UTR of a transcript of a protein-coding gene
intron-exon boundary: a variant occurring in any of the first N nucleotides into the intron. N is 6 by default and can be defined by the user
upstream: an intergenic variant occurring within N nucleotides of the transcript start site of a known gene. N is 1000 by default and can be defined by the user
downstream: an intergenic variant occurring within N nucleotides of the transcript end site of a known gene. N is 1000 by default and can be defined by the user
intronic: a variant in the intron of a known gene that cannot be defined as splice site or intron-exon boundary
intergenic: a variant not located within a known gene that cannot be defined as an upstream or a downstream variant
cannot annotate: reserved field
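
The difference between "transcript included" and "coding disrupted" can be sketched as an interval comparison. The Python below is a simplified illustration, not SVA's implementation: it considers a single transcript rather than every transcript of a gene, assumes 1-based inclusive coordinates, and uses hypothetical function and parameter names. It does not model "possible coding disrupted", which depends on break point uncertainty.

```python
# Illustrative sketch only; coordinates are assumed 1-based and inclusive.

def overlaps(a_start, a_end, b_start, b_end):
    """True when the two closed intervals share at least one position."""
    return a_start <= b_end and b_start <= a_end


def classify_sv(sv_start, sv_end, tx_start, tx_end, coding_exons):
    """Label a structural variant against one transcript.

    coding_exons -- list of (start, end) tuples for the coding parts of exons
    Returns None when neither category applies (e.g. a purely intronic SV).
    """
    if sv_start <= tx_start and tx_end <= sv_end:
        # The SV spans the transcript along its full length.
        return "transcript included"
    if any(overlaps(sv_start, sv_end, s, e) for s, e in coding_exons):
        return "coding disrupted"
    return None
```

With a hypothetical transcript at 1000-5000 and coding exons at 1200-1500 and 3000-3400, an SV at 900-5100 is "transcript included", an SV at 1400-3100 is "coding disrupted", and an SV at 1600-2900 receives neither label because it lies entirely within the intron.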