Segmental Duplication DB

This page contains updated data on recent segmental duplications in the human genome assembly build30 (June 2002). It focuses on genomic duplications >1 kb in length with >90% sequence identity.

The Whole Genome Shotgun Sequence Detection (WSSD) Database
The April 2002 WSSD data set was compared to the build30 assembly and used to filter the assembly-based (WGAC) alignments.

Mapping of sequence coordinates to build30 (June 2002)
    Redundant:           (Unix gzip format)  (PC zip format)
    Non-redundant:     (Unix gzip format)  (PC zip format)
    (Unique regions with >90% identity and >500 bp in length were mapped.)


The Whole Genome Assembly Comparison (WGAC) for genome build30 (June 2002)
An all-by-all comparison of duplications (>90% identity, >500 bp in length) using a previously described method (Bailey et al., 2001).

Unfiltered Set (covering ~8% of assembly)
    Pairwise Alignments  (Unix gzip format)  (PC zip format)

Filtered Set (covering ~4.5% of assembly) 
(Alignments with >98% identity but insufficient WSSD evidence were removed.)
    Pairwise Alignments  (Unix gzip format)  (PC zip format)
    Fasta Sequence of underlying Assembly  (Unix gzip format)  (PC zip format)
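
The filtering rule described for the Filtered Set can be sketched roughly as follows. This is only an illustration, not the pipeline actually used: the page does not define "insufficient WSSD evidence", so the support measure (fraction of the aligned region overlapped by WSSD intervals), the min_support threshold of 0.5, and all function names are hypothetical assumptions.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open [start, end) on a single chromosome


def covered_bases(region: Interval, wssd: List[Interval]) -> int:
    """Count bases of `region` overlapped by WSSD intervals.

    Assumes the WSSD intervals do not overlap one another.
    """
    start, end = region
    total = 0
    for w_start, w_end in wssd:
        lo = max(start, w_start)
        hi = min(end, w_end)
        if hi > lo:
            total += hi - lo
    return total


def keep_alignment(identity: float, region: Interval, wssd: List[Interval],
                   min_support: float = 0.5) -> bool:
    """Return True if the alignment survives the (assumed) WSSD filter."""
    # Only alignments above 98% identity are subject to the filter.
    if identity <= 0.98:
        return True
    # Hypothetical definition of "evidence": fraction of bases under WSSD.
    support = covered_bases(region, wssd) / (region[1] - region[0])
    return support >= min_support
```

For example, with WSSD intervals (100, 200) and (300, 400), a 99% identity alignment spanning (150, 350) has half of its bases supported and is kept under the assumed threshold, while one spanning (0, 1000) is removed.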

Coordinates of Merged Non-redundant Filtered WGAC and WSSD
    Pairwise Alignments    (Unix gzip format)  (PC zip format)
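
One way to picture how a merged, non-redundant coordinate set could be built from WGAC and WSSD regions is a standard interval merge per chromosome. The sketch below is an assumption about the general approach, not the procedure used to produce the files above; the (chromosome, start, end) tuple layout and the name merge_regions are hypothetical and do not reflect the actual file headers.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def merge_regions(
    regions: Iterable[Tuple[str, int, int]]
) -> Dict[str, List[Tuple[int, int]]]:
    """Merge overlapping or abutting half-open intervals per chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in regions:
        by_chrom[chrom].append((start, end))

    merged = {}
    for chrom, intervals in by_chrom.items():
        intervals.sort()
        out = [intervals[0]]
        for start, end in intervals[1:]:
            last_start, last_end = out[-1]
            if start <= last_end:
                # Overlaps (or touches) the previous interval: extend it.
                out[-1] = (last_start, max(last_end, end))
            else:
                out.append((start, end))
        merged[chrom] = out
    return merged
```

Feeding the combined WGAC and WSSD regions through such a function yields each duplicated base once, regardless of how many alignments or detections cover it.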

   WSSD/WGAC Header Descriptions

Additional Data Access

Chromosomal Views of Duplications with Gaps Emphasized
Views of chromosomes emphasizing gaps (green). The WSSD duplication regions (top track) are black. The assembly (WGAC) duplications (red for interchromosomal, blue for intrachromosomal) are split into >98% similar (top) and <98% similar (bottom).
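
The color and track assignment for WGAC duplications described above can be expressed as a small helper. This is a minimal sketch under assumptions: the function name classify_wgac is hypothetical, identity is taken as a fraction between 0 and 1, and alignments at exactly 98% are placed on the bottom track because the page does not say where they belong.

```python
from typing import Tuple


def classify_wgac(chrom_a: str, chrom_b: str, identity: float) -> Tuple[str, str]:
    """Return (color, track) for one pairwise WGAC alignment."""
    # Red marks interchromosomal pairs, blue marks intrachromosomal pairs.
    color = "blue" if chrom_a == chrom_b else "red"
    # >98% similar goes on the top track; everything else on the bottom
    # (the handling of exactly 98% is an assumption).
    track = "top" if identity > 0.98 else "bottom"
    return color, track
```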