OP-HELICOS-CAGE-Filtering-v1.0
From FANTOM5_SSTAR
Protocol: OP-HELICOS-CAGE-Filtering-v1.0
Author: Katayama, Shintaro
Created: July 8, 2010
Updated: July 16, 2010
Parameters:
Description:
FilterSMS in the helisphere-0.14.a015 package is used to filter out artificial reads and reads that are too short or too long, using the options below. In short, all raw reads were filtered by filterSMS according to the following criteria: (1) read length is 20 to 70 nt; (2) AT content <= 90% and fraction of CT/TA/AG/GC dinucleotides <= 80%; (3) the longest prefix consisting of < 75% T; and (4) no similarity to the base-addition-order (BAO) sequence or to the oligonucleotides added in the wet experiments and in the sequencing (all listed in the BAOCNT file). Under the last rule, any read with an alignment score >= 3.5 was removed. The score is (5m-4e)/l, where m is the number of matches, e is the number of errors of any type, and l is the read length.
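Criteria (1) and (4) each take only a few lines to express. The Python sketch below is illustrative and is not the filterSMS implementation. The function names are hypothetical. It also assumes that the match count m and the error count e have already been obtained by aligning the read against each BAOCNT sequence, a step this sketch does not perform.

```python
# Hypothetical sketch of criteria (1) and (4); not the filterSMS source code.
# Assumption: m (matches) and e (errors of any type) come from an alignment of
# the read against a BAOCNT sequence, which this sketch does not compute.

MIN_LEN = 20      # --minlen 20
MAX_LEN = 70      # --maxlen 70
MIN_SCORE = 3.5   # --minscore 3.5


def length_ok(read: str) -> bool:
    """Criterion (1): keep reads of 20 to 70 nt, inclusive."""
    return MIN_LEN <= len(read) <= MAX_LEN


def normalized_score(m: int, e: int, read_len: int) -> float:
    """Alignment score (5m - 4e) / l from the description."""
    return (5 * m - 4 * e) / read_len


def is_artifact(m: int, e: int, read_len: int) -> bool:
    """Criterion (4): a score >= 3.5 marks the read for removal."""
    return normalized_score(m, e, read_len) >= MIN_SCORE


if __name__ == "__main__":
    # 30-nt read, 28 matches and 2 errors: 132/30 = 4.4, so it is removed.
    print(normalized_score(28, 2, 30), is_artifact(28, 2, 30))
    # 30-nt read, 24 matches and 6 errors: 96/30 = 3.2, so it is kept.
    print(normalized_score(24, 6, 30), is_artifact(24, 6, 30))
```

Suppose every read position is either a match or an error (m + e = l). Then the score equals 9m/l - 4, and the 3.5 cutoff removes reads with m/l >= 7.5/9, i.e. roughly 83% identity or better to a BAOCNT sequence.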
-- FilterSMS option begin ------
--minlen 20 \
--maxlen 70 \
--quality 10 \
--dinuc ${DINUC} \
--trim_hp T/H/1/0.75 \
--align ${BAOCNT} \
--minscore 3.5 \
--percent_error 30 \
--config_file ${CONF}
-- FilterSMS option end ------
-- DINUC file begin ------
Filter AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT Thresh
BAO 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 0.80
AT 1 0.5 0.5 1 0.5 0 0 0.5 0.5 0 0 0.5 1 0.5 0.5 1 0.90
-- DINUC file end ------
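The DINUC table gives each of the 16 dinucleotides a weight in each filter row. In the AT row, each weight is the number of A/T bases in the dinucleotide divided by 2, so averaging these weights over a read approximates the read's AT content. The BAO row marks the four dinucleotides (AG, CT, GC, TA) that occur in the CTAG base-addition order. The Python sketch below rests on two assumptions that the helisphere documentation does not confirm. It assumes filterSMS averages the weights over all overlapping dinucleotides of a read. It also assumes the read is removed when that average exceeds Thresh. The function names are hypothetical.

```python
# Hypothetical sketch of a DINUC-style filter; not the filterSMS source code.
# Assumptions: a filter's score is the mean weight over all overlapping
# dinucleotides of the read, and the read fails when the score exceeds Thresh.

DINUC_TABLE = """\
Filter AA AC AG AT CA CC CG CT GA GC GG GT TA TC TG TT Thresh
BAO 0 0 1 0 0 0 0 1 0 1 0 0 1 0 0 0 0.80
AT 1 0.5 0.5 1 0.5 0 0 0.5 0.5 0 0 0.5 1 0.5 0.5 1 0.90
"""


def parse_dinuc(text: str) -> dict[str, tuple[dict[str, float], float]]:
    """Return {filter name: (weight per dinucleotide, threshold)}."""
    rows = [line.split() for line in text.strip().splitlines()]
    header = rows[0]  # "Filter", the 16 dinucleotides, "Thresh"
    filters = {}
    for row in rows[1:]:
        values = [float(v) for v in row[1:]]
        weights = dict(zip(header[1:-1], values[:-1]))
        filters[row[0]] = (weights, values[-1])
    return filters


def dinuc_score(read: str, weights: dict[str, float]) -> float:
    """Mean weight over overlapping dinucleotides; pairs not in the table (e.g. with N) count as 0."""
    pairs = [read[i:i + 2] for i in range(len(read) - 1)]
    if not pairs:
        return 0.0
    return sum(weights.get(p, 0.0) for p in pairs) / len(pairs)


def failing_filters(read: str, filters: dict[str, tuple[dict[str, float], float]]) -> list[str]:
    """Names of the filters whose score exceeds their threshold."""
    return [name for name, (weights, thresh) in filters.items()
            if dinuc_score(read, weights) > thresh]


FILTERS = parse_dinuc(DINUC_TABLE)
```

Consider a CTAG-repeat read such as "CTAGCTAGCTAGCTAGCTAG". Every one of its dinucleotides carries weight 1 in the BAO row, so it fails only the BAO filter (1.0 > 0.80). A poly-A read fails only the AT filter (1.0 > 0.90).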
-- BAOCNT file begin ------
>BASE_ADDTION_ORDER_REFERENCE
CTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAGCTAG
>dge_spike_low
AGATGCATCAATGGCGACACTGAGAGGCTGATGAGCCAATGCCTTCAAGAGACTCTTCTC
ATCATTAGTAGGTACGTCTTGGTGTCCATTAATGGTTACT
>dge_spike_medium
CTGACGGTCCACAGAAGTTTGAGCCTGACTCTTGAGTGTTGTGAGACCGTTGCAGCAGAG
GGTTGGGACCGGTCCGCCCTGAGTCACGTAGGATAAGCAA
>dge_spike_high
TCTCCTGCGTTTCCACTCTCAAGCTCTCCAGCACTCATCATGATTGGGTTGATACCCATC
TTGGCCATGACAAGCTCACACTGGAAGGATTTACCTTGAC
>dge_tailing_A
CAGGGCAGAGGATGGATGCAAGGATAAGTGGA
>dge_tailing_B
GACACTCACTTCTTACGACTCAGCGATGATGG
>dge_tailing_C
TTAGCCAACCGCGGACAGCTACATGGACTTCT
-- BAOCNT file end ------
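The BAOCNT file is a multi-FASTA file in which each 100-nt spike-in sequence is wrapped at 60 nt per line, so any reader must join wrapped lines. The Python sketch below parses an excerpt of the file. It also confirms that the CTAG base-addition-order reference contains exactly the four dinucleotides weighted 1 in the BAO row of the DINUC file. The helper names are hypothetical, and the BAO reference in the excerpt is shortened.

```python
# Hypothetical helper for reading the BAOCNT multi-FASTA; not part of helisphere.
# Records are copied from the BAOCNT file; the BAO reference is truncated
# to five CTAG repeats for brevity.

BAOCNT_EXCERPT = """\
>BASE_ADDTION_ORDER_REFERENCE
CTAGCTAGCTAGCTAGCTAG
>dge_spike_low
AGATGCATCAATGGCGACACTGAGAGGCTGATGAGCCAATGCCTTCAAGAGACTCTTCTC
ATCATTAGTAGGTACGTCTTGGTGTCCATTAATGGTTACT
>dge_tailing_A
CAGGGCAGAGGATGGATGCAAGGATAAGTGGA
"""


def read_fasta(text: str) -> dict[str, str]:
    """Parse multi-FASTA text; wrapped sequence lines are joined into one string."""
    chunks: dict[str, list[str]] = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:].split()[0]
            chunks[name] = []
        elif name is not None:
            chunks[name].append(line)
    return {key: "".join(parts) for key, parts in chunks.items()}


def dinucleotides(seq: str) -> set[str]:
    """Set of overlapping dinucleotides in a sequence."""
    return {seq[i:i + 2] for i in range(len(seq) - 1)}


SEQS = read_fasta(BAOCNT_EXCERPT)
```

After parsing, dge_spike_low is a single 100-nt string, and dge_tailing_A is 32 nt long.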
-- CONF file begin ------
LocalGlobalOption GL
HomoPolymerOptionInReference 0
TagNumReads 1
HomoPolymerOptionInTag 0
-- CONF file end ------
-- SCORE file begin ------
ReferenceNonHomoPolymerGap -4
ReferenceHomoPolymerGap -1
TagNonHomoPolymerGap -4
TagHomoPolymerGap -1
NucleotideMatch 5
NucleotideMismatch -4
NucleotideToNMatch -2
-- SCORE file end ------
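The SCORE file lists a weight for each type of alignment event. Consider an alignment that contains only matches (5), mismatches (-4) and non-homopolymer gaps (-4). For such an alignment, a length-normalized sum of these weights reduces to the (5m-4e)/l score used by the --minscore 3.5 rule. Homopolymer gaps (-1) and base-to-N matches (-2) carry smaller penalties. The Python sketch below assumes the score is additive over alignment events and divided by read length. This is an assumption, not documented helisphere behaviour, and alignment_score is a hypothetical name.

```python
# Hypothetical sketch: weights copied from the SCORE file; the additive,
# length-normalised scoring is an assumption, not documented helisphere behaviour.

SCORE = {
    "ReferenceNonHomoPolymerGap": -4,
    "ReferenceHomoPolymerGap": -1,
    "TagNonHomoPolymerGap": -4,
    "TagHomoPolymerGap": -1,
    "NucleotideMatch": 5,
    "NucleotideMismatch": -4,
    "NucleotideToNMatch": -2,
}


def alignment_score(event_counts: dict[str, int], read_len: int) -> float:
    """Sum of (count x weight) over alignment events, divided by read length."""
    unknown = set(event_counts) - set(SCORE)
    if unknown:
        raise KeyError(f"unknown event types: {sorted(unknown)}")
    return sum(SCORE[event] * n for event, n in event_counts.items()) / read_len


if __name__ == "__main__":
    # 28 matches, 1 mismatch and 1 non-homopolymer gap in a 30-nt read:
    # (140 - 4 - 4) / 30, the same value as (5*28 - 4*2) / 30.
    print(alignment_score({"NucleotideMatch": 28, "NucleotideMismatch": 1,
                           "TagNonHomoPolymerGap": 1}, 30))
    # Two homopolymer gaps instead cost only 2 points: (140 - 2) / 30.
    print(alignment_score({"NucleotideMatch": 28, "TagHomoPolymerGap": 2}, 30))
```

Under this reading, a read whose errors are all homopolymer gaps scores higher than one with the same number of mismatches. It is therefore more likely to reach the 3.5 cutoff and be removed.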