Functional categories defined by SVA
SVA uses a number of different functional categories to classify variants. By default, whether a transcript is protein-coding is determined by the annotation in Ensembl release 50_36l (build 36, June 2008 version), and the term "known gene" below means a region annotated as a transcribed gene in that same release. The database version actually used is determined by the SVA supporting databases specified in the project file; newer versions of the supporting databases will be released in the future.
Please note that a single variant may be assigned more than one functional category, and may be assigned the same category for several transcripts of the same gene or of different genes.
We define the functional categories separately for three different types of variants:
Single nucleotide variants (SNVs)
Name | Definition used in SVA |
stop gained | a variant introducing a premature TAG, TAA, or TGA stop codon in a transcript of a protein-coding gene |
stop lost | a variant causing the loss of a TAG, TAA, or TGA stop codon in a transcript of a protein-coding gene |
non-synonymous coding | a variant within a codon that changes the encoded amino acid to a different one, excluding variants that fall into either of the two categories above |
essential splice site | a variant changing either the highly conserved GU dinucleotide at the first two positions of an intron or the AG dinucleotide at the last two positions of an intron |
intron-exon boundary | a variant located in any of the first N1 nucleotides into the intron, or any of the first N2 nucleotides into the exon, counted from the intron-exon boundary. N1 defaults to 8 and can be set with the "[INTRON_EXON_BOUNDARY_INTO_INTRON]=" statement in the .gsap project file. N2 defaults to 3 and can be set with the "[INTRON_EXON_BOUNDARY_INTO_EXON]=" statement in the .gsap project file. Note: a variant that qualifies as "essential splice site" is not also counted in this category. |
5' UTR | a variant located within the 5' UTR of a transcript of a protein-coding gene |
3' UTR | a variant located within the 3' UTR of a transcript of a protein-coding gene |
exonic non-coding RNA | a variant located in an exon of a non-coding RNA |
upstream | an intergenic variant located within N nucleotides of the transcript start site of a known gene. N defaults to 1000 and can be set by the user |
downstream | an intergenic variant located within N nucleotides of the transcript end site of a known gene. N defaults to 1000 and can be set by the user |
intronic | a variant in an intron of a known gene that cannot be classified as an essential splice site or intron-exon boundary variant |
synonymous coding | a variant within a codon that does not change the encoded amino acid |
intergenic | a variant not located within a known gene that cannot be classified as an upstream or downstream variant |
splice site | reserved field, left blank in SNV records; superseded by the "intron-exon boundary" functional category |
frameshift coding | reserved field, left blank in SNV records; superseded by the INDEL functional categories |
regulation region | reserved field |
reference | reserved field |
cannot annotate | reserved field |
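The coding and splice-related SNV categories above come down to two small checks: comparing the translations of the reference and alternate codons, and measuring how far a position lies from the nearest end of an intron. The Python sketch below illustrates both; it is not SVA's own implementation. The function names `classify_coding_snv` and `classify_splice_region` are hypothetical, the standard genetic code is assumed, coordinates are assumed to be 1-based and inclusive, and the N1/N2 windows are assumed to apply at both ends of every intron.

```python
from itertools import product
from typing import Optional

# Standard genetic code, codons enumerated in TCAG order; "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}


def classify_coding_snv(ref_codon: str, alt_codon: str) -> str:
    """Hypothetical helper: map a reference -> alternate codon change to an SNV category."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    if ref_aa != "*" and alt_aa == "*":
        return "stop gained"
    if ref_aa == "*" and alt_aa != "*":
        return "stop lost"
    if ref_aa != alt_aa:
        return "non-synonymous coding"
    # Stop-to-stop changes such as TAA -> TAG also land here (an assumption).
    return "synonymous coding"


def classify_splice_region(pos: int, intron_start: int, intron_end: int,
                           n1: int = 8, n2: int = 3) -> Optional[str]:
    """Hypothetical helper: classify a position relative to one intron.

    Positions outside the intron are assumed to lie in its flanking exons.
    Both intron ends are treated alike, so the result does not depend on strand.
    """
    if intron_start <= pos <= intron_end:
        # 1 = first or last intronic base (part of the GU or AG dinucleotide).
        depth = min(pos - intron_start, intron_end - pos) + 1
        if depth <= 2:
            return "essential splice site"
        if depth <= n1:
            return "intron-exon boundary"
        return "intronic"
    # 1 = exonic base directly adjacent to the intron.
    exon_depth = intron_start - pos if pos < intron_start else pos - intron_end
    return "intron-exon boundary" if exon_depth <= n2 else None
```

For example, `classify_coding_snv("TGG", "TAG")` (tryptophan to stop) returns "stop gained", and for an intron spanning positions 100 to 200, position 101 is an essential splice site while position 105 falls in the intron-exon boundary window. Checking the essential splice site first mirrors the rule that such variants are not counted again as intron-exon boundary.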
Small insertions/deletions (INDELs)
Name | Definition used in SVA |
coding disrupting frameshift | an indel (insertion or deletion) located in a protein-coding sequence whose length is not a multiple of three, and which therefore causes a frameshift in the resulting protein |
coding disrupting other | an indel (insertion or deletion) that is either (1) located in a protein-coding sequence with a length that is a multiple of three, and therefore changes the coding sequence without causing a frameshift in the resulting protein; or (2) located in a non-protein-coding sequence |
transcript included | an indel that overlaps the full length of a known gene transcript. This is usually not possible for an INDEL identified in a whole-genome sequencing study, but it is included here as a separate functional category |
5' UTR | a variant that overlaps the 5' UTR of a transcript of a protein-coding gene |
3' UTR | a variant that overlaps the 3' UTR of a transcript of a protein-coding gene |
intron-exon boundary | a variant located in any of the first N nucleotides into the intron. N defaults to 6 and can be set by the user |
upstream | an intergenic variant located within N nucleotides of the transcript start site of a known gene. N defaults to 1000 and can be set by the user |
downstream | an intergenic variant located within N nucleotides of the transcript end site of a known gene. N defaults to 1000 and can be set by the user |
intronic | a variant in an intron of a known gene that cannot be classified as an intron-exon boundary variant |
intergenic | a variant not located within a known gene that cannot be classified as an upstream or downstream variant |
cannot annotate | reserved field |
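The split between the two coding INDEL categories depends only on whether the net length change is a multiple of three. A minimal Python sketch, assuming the indel is already known to lie in a protein-coding sequence and that `ref`/`alt` are VCF-style alleles sharing an anchor base; the function name `classify_coding_indel` is hypothetical and is not part of SVA.

```python
def classify_coding_indel(ref: str, alt: str) -> str:
    """Hypothetical helper: classify an indel known to lie in a protein-coding sequence.

    ref/alt are assumed to be VCF-style alleles sharing an anchor base, so the
    net length change equals the number of inserted (+) or deleted (-) bases.
    """
    net_change = len(alt) - len(ref)
    if net_change == 0:
        raise ValueError("not an insertion or deletion: allele lengths are equal")
    # Python's % always returns a non-negative result for a positive divisor,
    # so deletions work too: -4 % 3 == 2 and -3 % 3 == 0.
    if net_change % 3 != 0:
        return "coding disrupting frameshift"
    return "coding disrupting other"
```

For instance, a single-base insertion `("A", "AT")` shifts the reading frame, whereas a three-base deletion `("ATGC", "A")` removes one codon's worth of sequence and keeps the frame intact.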
Large structural variations (SVs)
Name | Definition used in SVA |
coding disrupted | a structural variant that overlaps part of the coding sequence of a known protein-coding gene but does not cover the full length of any transcript of that gene |
possible coding disrupted | same as the category above, but reserved for SVs whose breakpoints are particularly difficult to estimate |
transcript included | a structural variant that overlaps the full length of a known gene transcript |
5' UTR | a variant that overlaps the 5' UTR of a transcript of a protein-coding gene |
3' UTR | a variant that overlaps the 3' UTR of a transcript of a protein-coding gene |
intron-exon boundary | a variant located in any of the first N nucleotides into the intron. N defaults to 6 and can be set by the user |
upstream | an intergenic variant located within N nucleotides of the transcript start site of a known gene. N defaults to 1000 and can be set by the user |
downstream | an intergenic variant located within N nucleotides of the transcript end site of a known gene. N defaults to 1000 and can be set by the user |
intronic | a variant in an intron of a known gene that cannot be classified as a splice site or intron-exon boundary variant |
intergenic | a variant not located within a known gene that cannot be classified as an upstream or downstream variant |
cannot annotate | reserved field |