1. Data inputs
This document describes the old input format required by SVA versions prior to 1.1. For SVA 1.1 and later, which support the VCF format, see here.
SVA users need to prepare four types of input files for an SVA project. All of these files, except for type 3, can be generated from a pileup file, a format first used by Tony Cox and Zemin Ning at the Sanger Institute. A pileup file can be produced by software tools such as SAMtools in a next-generation sequencing study. Please note, however, that the specific pileup format used here differs slightly from the default SAMtools output format described here. Detailed information and programs for generating these files are given later on this page.
In addition, there is an optional pedinf file for an SVA project. This file lists the subjects in a linkage format. It is not needed for SVA annotation tasks, but it is required for some SVA analysis and export functions.
I will assume that SVA users are already familiar with next-generation sequencing data pipelines, particularly those using BWA/SAMtools. The file name extensions in the above box serve only to let SVA conveniently recognize the corresponding format. Although we use BWA/SAMtools ourselves, the file extensions do not mean that SVA accepts only SAMtools output. SVA does not distinguish which software generated the alignment results, as long as the data are in the pileup file format described below.
There is another important note:
The basic data generation flow described below is based on our own experience and is provided for your reference.
Step 1. Generating pileup file
We used SAMtools to generate the pileup file:
There is an important note regarding the chromosome designations, which affect the data generation steps that follow.
Step 2. Generating variant file
We used SAMtools to generate the variant file (Please note this is a basic example. Your actual parameters may vary.):
Step 3. Generate SNV file .samtools
We used a simple Perl script, snp_filter.pl (download it here), to generate the SNV file:
Here is an example of the generated .samtools file:
The columns are: chromosome name, coordinate, reference allele, variant allele, Phred-like consensus score, SNP quality, RMS score, read depth, pileup bases.
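To make the column layout above concrete, here is a minimal Python sketch that reads one .samtools line into named fields. It is not part of SVA: the names `SnvRecord` and `parse_snv_line` are hypothetical, the sketch assumes whitespace-delimited columns with integer scores, and the sample line is made up for illustration only.

```python
from typing import NamedTuple


class SnvRecord(NamedTuple):
    """One row of a .samtools SNV file, in the column order listed above."""
    chrom: str            # chromosome name
    pos: int              # coordinate
    ref: str              # reference allele
    alt: str              # variant allele
    consensus_score: int  # Phred-like consensus score
    snp_quality: int      # SNP quality
    rms_score: int        # RMS score
    depth: int            # read depth
    bases: str            # pileup bases


def parse_snv_line(line: str) -> SnvRecord:
    """Parse one line; assumes whitespace-delimited columns (an assumption)."""
    fields = line.split()
    if len(fields) < 9:
        raise ValueError(f"expected at least 9 columns, got {len(fields)}")
    chrom, pos, ref, alt, cons, snpq, rms, depth, bases = fields[:9]
    return SnvRecord(chrom, int(pos), ref, alt,
                     int(cons), int(snpq), int(rms), int(depth), bases)


# Hypothetical line, invented for illustration (not real data).
example = "1\t12345\tA\tG\t30\t30\t60\t12\tGGgG.,GgGGg."
record = parse_snv_line(example)
print(record.chrom, record.pos, record.ref, record.alt, record.depth)
```

A reader like this can serve as a quick sanity check that a generated file has the expected number of columns before it is loaded into an SVA project.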
Step 4. Generate INDEL file .samtoolsindels
We used a simple Perl script, indel_filter.pl (download it here), to generate the INDEL file:
Here is an example of the generated .samtoolsindels file:
The columns are: chromosome name, coordinate, a star, the genotype, consensus quality, SNP quality, RMS mapping quality, # covering reads, the first allele, the second allele, # reads supporting the first allele, # reads supporting the second allele.
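The twelve INDEL columns can be checked in the same spirit. The following Python sketch is an illustration under stated assumptions, not SVA code: `INDEL_COLUMNS` and `parse_indel_line` are hypothetical names, columns are assumed to be whitespace-delimited, and the sample line is invented.

```python
# Column names follow the order listed above; the names themselves are hypothetical.
INDEL_COLUMNS = [
    "chrom", "pos", "star", "genotype",
    "consensus_quality", "snp_quality", "rms_mapping_quality", "covering_reads",
    "allele1", "allele2", "reads_allele1", "reads_allele2",
]
# Columns assumed to hold integers.
INT_COLUMNS = [
    "pos", "consensus_quality", "snp_quality", "rms_mapping_quality",
    "covering_reads", "reads_allele1", "reads_allele2",
]


def parse_indel_line(line: str) -> dict:
    """Parse one .samtoolsindels line into a dict keyed by column name."""
    fields = line.split()
    if len(fields) < len(INDEL_COLUMNS):
        raise ValueError(
            f"expected at least {len(INDEL_COLUMNS)} columns, got {len(fields)}")
    record = dict(zip(INDEL_COLUMNS, fields))  # extra trailing fields are ignored
    if record["star"] != "*":
        raise ValueError("third column should be a star for an indel line")
    for name in INT_COLUMNS:
        record[name] = int(record[name])
    return record


# Hypothetical line, invented for illustration (not real data).
example = "1\t2000\t*\t*/+AG\t45\t45\t60\t20\t*\t+AG\t14\t6"
indel = parse_indel_line(example)
print(indel["genotype"], indel["reads_allele1"], indel["reads_allele2"])
```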
(Optional) Step 5. Generate SV file .events
We used a separate program (ERDS) to generate the SV file:
Here is an example of the generated .events file:
The columns are: chromosome name, start coordinate, end coordinate, SV status (diploid=2), LOD score.
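As an illustration of the five-column .events layout, the Python sketch below parses event lines and keeps those whose status differs from the diploid value of 2. The names `SvEvent`, `parse_event_line`, and `non_diploid` are hypothetical. The sketch assumes whitespace-delimited columns, an integer status, and a numeric LOD score, and the sample lines are invented.

```python
from typing import List, NamedTuple


class SvEvent(NamedTuple):
    """One row of a .events file, in the column order listed above."""
    chrom: str
    start: int
    end: int
    status: int  # SV status; 2 means diploid
    lod: float   # LOD score


def parse_event_line(line: str) -> SvEvent:
    """Parse one line; assumes whitespace-delimited columns (an assumption)."""
    fields = line.split()
    if len(fields) < 5:
        raise ValueError(f"expected at least 5 columns, got {len(fields)}")
    chrom, start, end, status, lod = fields[:5]
    return SvEvent(chrom, int(start), int(end), int(status), float(lod))


def non_diploid(events: List[SvEvent]) -> List[SvEvent]:
    """Keep only events whose status differs from the diploid value 2."""
    return [event for event in events if event.status != 2]


# Hypothetical lines, invented for illustration (not real data).
lines = [
    "1\t10000\t15000\t1\t25.4",
    "2\t50000\t52000\t2\t3.1",
    "X\t700\t900\t3\t12.0",
]
events = [parse_event_line(line) for line in lines]
for event in non_diploid(events):
    print(event.chrom, event.start, event.end, event.status)
```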
Step 6. Generate coverage and quality score file .bco
We used a simple Java program, pileup2bco.jar (download it here), to generate the chromosome-wise .bco files. Please note that the output parameter must take this particular form: [YOUR_BCO_OUTPUT_DIRECTORY]/[YOUR PREFIX TO THE OUTPUT]. For example, with /usr/jack/bco/subject1 as the output parameter, the .bco output files will be named /usr/jack/bco/subject1_.1.bco through /usr/jack/bco/subject1_.Y.bco.
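The naming pattern in the example can be spelled out with a short Python sketch that lists the expected .bco paths for a given output prefix and reports any that are missing on disk. The function names are hypothetical. The chromosome list 1-22, X, Y is an assumption taken from the "1 through Y" example above; whether a file is also written for M is not stated here, so M is left out.

```python
import os
from typing import List

# Assumed chromosome list, based on the "1 through Y" example above.
CHROMOSOMES = [str(n) for n in range(1, 23)] + ["X", "Y"]


def expected_bco_paths(output_prefix: str) -> List[str]:
    """Build the per-chromosome file names: <prefix>_.<chromosome>.bco"""
    return [f"{output_prefix}_.{chrom}.bco" for chrom in CHROMOSOMES]


def missing_bco_files(output_prefix: str) -> List[str]:
    """Return the expected .bco paths that do not exist on disk."""
    return [path for path in expected_bco_paths(output_prefix)
            if not os.path.exists(path)]


paths = expected_bco_paths("/usr/jack/bco/subject1")
print(paths[0])
print(paths[-1])
```

Running `missing_bco_files` after the conversion gives a quick check that every chromosome produced an output file before the project is created.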
Note: This small Java program (pileup2bco.jar) accepts a pileup file whose chromosome designations (column 1) are an integer from 1 to 22, or X, Y, or M.
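If a pileup file uses other chromosome labels, they need to be converted to these designations first. The Python sketch below shows one possible normalization. It is not part of SVA, and `normalize_chromosome` is a hypothetical name. Stripping a "chr" prefix and mapping "MT" to "M" are assumptions about common naming conventions; any label that cannot be mapped returns None.

```python
from typing import Optional

# Designations accepted by pileup2bco.jar according to the note above.
ACCEPTED = {str(n) for n in range(1, 23)} | {"X", "Y", "M"}


def normalize_chromosome(name: str) -> Optional[str]:
    """Map a chromosome label to 1-22, X, Y, or M; return None if impossible."""
    label = name.strip()
    if label.lower().startswith("chr"):  # assumption: "chr1" style prefix
        label = label[3:]
    label = label.upper()
    if label == "MT":                    # assumption: "MT" means mitochondrial
        label = "M"
    return label if label in ACCEPTED else None


for raw in ["chr1", "chrX", "MT", "22", "chrUn_gl000220"]:
    print(raw, "->", normalize_chromosome(raw))
```

Lines whose label maps to None (for example unplaced contigs) would have to be dropped or handled separately before running the conversion.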
After you generate these four types of files (with step 5 as optional), you may proceed to create your project.
© 2011 Dongliang Ge, PhD.