25 May 2011:: SVA is published in Bioinformatics.

21 Mar 2011:: SVA V1.10 is released: [1] Supports GRCh build 37/hg19; [2] Supports user annotation tracks in GFF3 or BED format.

9 Sep 2010:: The characterization of twenty sequenced human genomes. [Article]

12 Jul 2010:: LabCorp Launches Interleukin 28B Polymorphism (IL28B) Genotype Test to Support Individualized Treatment Decisions for Patients with Hepatitis C Viral Infection.

17 Jun 2010:: Causal variants for metachondromatosis are identified.
[Article] [SVA screenshot]
[GenomeWeb: The Daily Scan]

18 Mar 2010:: SVA 1.02[beta] is released.

11 Mar 2010:: SVA 1.01[beta] is released with a command line tool.

8 Mar 2010:: A lite evaluation edition is released for Windows. Play with it on your laptop!

23 Jan 2010:: SVA 1.00[beta] is released.

Proof of concept

[Study 1]

As a proof of concept, or positive control, for the feasibility and capability of SVA, we use a next-generation sequencing (NGS) project funded by the Bill & Melinda Gates Foundation and awarded to David Goldstein at the Duke University Center for Human Genome Variation. This NGS project involves sequencing a number of type A hemophilia patients (who also have a separate, second phenotype) and a number of control genomes (with various other phenotypes, funded by grants from other agencies to David Goldstein).

The genetic cause for type A hemophilia is known. Therefore, as a positive control experiment, it is interesting for us to ask: can SVA identify this known genetic cause?

Step 1.

We use SVA to fully annotate the functions of genetic variants identified from 10 type A hemophilia patients (cases), and 10 control genomes (controls). These variants include Single Nucleotide Variants (SNVs), Insertion/Deletions (INDELs), and larger Structural Variations (SVs). Here is a table listing the functional categories defined in SVA. Here is a diagram illustrating how the annotation process works.

We use the human reference genome build 36, Ensembl database build 50, and a set of other biomedical resources to perform this annotation.

Step 2.

Assuming a recessive model, we then ask: in this dataset of 10 case genomes and 10 control genomes, if we rank each gene in the human genome by the number of case genomes that carry at least one homozygous gene-disrupting variant, where that homozygous variant is not carried by any of the control genomes, what does the resulting list look like? Note that eligible variants are counted per gene: they must disrupt the same gene, but the variant itself may differ between case genomes.

We define a 'gene-disrupting variant' here as a premature-stop SNV or a frameshifting INDEL. Users also have the option to include other functional categories.

We answer this question through a 'Gene Prioritization' function in SVA.
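The ranking described above can be illustrated with a minimal Python sketch. This is not SVA's implementation: the `Variant` record, the category labels `stop_gained` and `frameshift`, the function name `prioritize_genes`, and the toy data are all hypothetical. It also assumes one reading of the exclusion rule, namely that a variant is dropped when any control genome is homozygous for it; a stricter reading would drop it when any control carries it at all.

```python
from collections import defaultdict
from typing import NamedTuple


class Variant(NamedTuple):
    """One variant call in one genome (hypothetical record layout)."""
    sample: str
    gene: str
    variant_id: str
    category: str      # functional category label (assumed names)
    homozygous: bool


# Assumed labels for premature-stop SNVs and frameshifting INDELs.
GENE_DISRUPTING = {"stop_gained", "frameshift"}


def prioritize_genes(variants, cases, controls):
    """Rank genes by the number of case genomes carrying at least one
    homozygous gene-disrupting variant that no control is homozygous for."""
    cases, controls = set(cases), set(controls)

    # Assumption: a variant is excluded if any control is homozygous for it.
    seen_in_controls = {
        v.variant_id for v in variants
        if v.sample in controls and v.homozygous
    }

    carriers = defaultdict(set)  # gene -> set of qualifying case genomes
    for v in variants:
        if (v.sample in cases
                and v.homozygous
                and v.category in GENE_DISRUPTING
                and v.variant_id not in seen_in_controls):
            carriers[v.gene].add(v.sample)

    # Most case genomes first; ties broken alphabetically by gene name.
    return sorted(
        ((gene, len(samples)) for gene, samples in carriers.items()),
        key=lambda item: (-item[1], item[0]),
    )


# Toy data: the variant differs between cases, but the disrupted gene is shared.
toy_variants = [
    Variant("case1", "F8", "X:1:stop", "stop_gained", True),
    Variant("case2", "F8", "X:2:fs", "frameshift", True),
    Variant("case3", "F8", "X:1:stop", "stop_gained", True),
    Variant("case1", "GENE_B", "1:5:fs", "frameshift", True),
    Variant("ctrl1", "GENE_B", "1:5:fs", "frameshift", True),      # also in a control
    Variant("case2", "GENE_C", "2:9:stop", "stop_gained", False),  # heterozygous
    Variant("case3", "GENE_D", "3:7:stop", "stop_gained", True),
]

ranking = prioritize_genes(
    toy_variants,
    cases=["case1", "case2", "case3"],
    controls=["ctrl1"],
)
for gene, n_cases in ranking:
    print(gene, n_cases)
```

In the toy data, the made-up gene `GENE_B` is dropped because a control is homozygous for the same variant, and `GENE_C` is dropped because the call is heterozygous, so only genes with qualifying case-only homozygous variants remain in the ranking.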


We find that the Factor VIII (F8) gene is ranked number 1 in this list. It is well known and well confirmed that type A hemophilia is caused by Factor VIII deficiency.

This positive control experiment demonstrates that SVA is capable of not only annotating the genetic variants identified in a considerable number of genomes using a next-generation sequencing design, but also of helping identify the causal genetic locus for a genetic disorder with a specific genetic model.

Example dataset

We include part of these data as an example dataset released with SVA. Only part of the data is released, to reduce its size, but the concept is the same. Interested users may use this dataset to replicate the process above.


[Study 2]

Sobreira, N.L.M. et al. (2010) Whole-Genome Sequencing of a Single Proband Together with Linkage Analysis Identifies a Mendelian Disease Gene. PLoS Genet 6, e1000991. doi:10.1371/journal.pgen.1000991. [SVA screenshot] [GenomeWeb: The Daily Scan]