All reads were mapped to the template set sequence database

To remove false positives, potential distant spliced reads from Step 1 were re-tested using GSNAP parameters that favor local alignment. Each alignment from the GSNAP re-run was examined, and any read meeting all of the following criteria was considered a false-positive distant splicing read in the original GSNAP output and removed from further analyses: the total matched length in the local alignment was at least 44 bp, with a gap alignment tolerance of 1 bp. Reads that passed this step were considered to contain a distant spliced junction.

Ion Torrent Suite software was used to generate FASTQ files in which the barcode adaptors and 3′ end low-quality sequences were removed as recommended. To recover read sequences longer than the desired 100 bp in the case of an expected amplicon of 126 bp, end quality trimming was turned off for this design. All reads were mapped to the template set sequence database containing the fusion templates. For each expected fusion amplicon in a given sample, the most abundant read mapped to the fusion template was selected as the PCR amplicon. The sequence of this read was compared to the sequence of the expected amplicon. If the PCR amplicon matched the expected fusion amplicon, the fusion junction sequence was considered confirmed.

The underlying fusion transcript detection method is based on distant splicing within a single read, as detected by the RNA-Seq aligner GSNAP. The utility of GSNAP for fusion transcript detection has been demonstrated in methods such as GSTRUCT-fusions and GFP. Both of these methods depend on GSNAP to provide fusion read candidates and then apply a set of filtering modules to remove false positives in paired-end RNA-Seq datasets. To compensate for the short FFPE RNA length, we leveraged data from the two patient cohorts, as shown in Figure 1A.
The sample-based strategy interrogates each RNA-Seq sample individually and nominates candidate fusion junctions for the subsequent cohort-based analysis, which confirms the presence of each fusion candidate in each individual sample across the whole cohort by examining read alignment and expression profiling evidence.
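
The amplicon confirmation step described above (select the most abundant read mapped to each fusion template, then compare it to the expected amplicon) can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' pipeline: the function names `most_abundant_read` and `confirm_fusions`, the dictionary-based input layout, and the use of exact, case-insensitive string equality as the definition of a "match" are all assumptions made for this example.

```python
from collections import Counter


def most_abundant_read(read_seqs):
    """Return (sequence, count) for the most frequent read sequence, or None if there are no reads."""
    if not read_seqs:
        return None
    # Counter.most_common(1) returns a one-element list of (sequence, count).
    return Counter(read_seqs).most_common(1)[0]


def confirm_fusions(reads_by_template, expected_amplicons):
    """Map each fusion template ID to True if its most abundant read matches the expected amplicon.

    reads_by_template: dict of template ID -> list of read sequences mapped to that template
                       (hypothetical layout; the mapping itself is assumed to be done upstream).
    expected_amplicons: dict of template ID -> expected amplicon sequence.
    """
    results = {}
    for template_id, expected in expected_amplicons.items():
        top = most_abundant_read(reads_by_template.get(template_id, []))
        if top is None:
            # No reads mapped to this template, so nothing can be confirmed.
            results[template_id] = False
            continue
        pcr_amplicon, _count = top
        # Assumption: "matches" means exact, case-insensitive sequence identity.
        results[template_id] = pcr_amplicon.upper() == expected.upper()
    return results
```

For instance, `confirm_fusions({"T1": ["ACGT", "ACGT", "ACGA"]}, {"T1": "ACGT"})` marks `T1` as confirmed because the most frequent read equals the expected sequence. A real analysis would likely need a more tolerant comparison (for example, allowing for sequencing errors), which is deliberately left out here.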
