Team:Heidelberg/Modeling/descriptions

From 2010.igem.org

Revision as of 19:19, 26 October 2010 by Janulrich

Modeling Training Set

Parameterization Concept

One of the hardest tasks in developing our models was to find a good strategy for generating input parameters from the raw data. In our case, the raw data are the binding site (BS) sequence and the corresponding sh/miRNA sequence. The final parameterization concept combines a basic distinction between perfect, bulged (near-perfect), and endogenous miRNA-like binding sites with the more advanced 3'-scoring and AU-content evaluation. The endogenous miRNA-like BS parameter is further split into the three types of binding site seed (see Modeling, shRNA binding sites).
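As a rough illustration of how the most basic of these parameters could be derived from the two raw sequences, the following Python sketch checks whether a binding site contains the exact reverse complement of the sh/miRNA and, for sites of the same length as the sh/miRNA, counts the mismatching positions. The function names are hypothetical and this is not the team's actual code. For the sAg_19 and haat rows of the training set the mismatch count coincides with the listed bulge size, but rows in which the site covers only part of the sh/miRNA (for example sag_25_3) would need a looser criterion. The 3'-score and AU-score are not covered here.

```python
# Hypothetical helpers for illustration only; not taken from the team's code.
_COMPLEMENT = str.maketrans("ACGTU", "TGCAA")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement in the DNA alphabet (U is read as T)."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def is_perfect_site(mirna: str, binding_site: str) -> bool:
    """True if the site contains the full reverse complement of the sh/miRNA."""
    site = binding_site.upper().replace("U", "T")
    return reverse_complement(mirna) in site


def count_mismatches(mirna: str, binding_site: str) -> int:
    """Count positions where an equal-length site deviates from the perfect site."""
    perfect = reverse_complement(mirna)
    site = binding_site.upper().replace("U", "T")
    if len(site) != len(perfect):
        raise ValueError("binding site and sh/miRNA must have the same length")
    return sum(a != b for a, b in zip(perfect, site))


if __name__ == "__main__":
    shrna = "GAACAAATGGCACTAGTAA"  # sAg_19 sequence from the training set
    print(is_perfect_site(shrna, "TTACTAGTGCCATTTGTTC"))   # sAg_19_bs_p_fw_E
    print(count_mismatches(shrna, "TTACTAGACGCATTTGTTC"))  # sAg_19_bs_r10_12_acg_fw
```

A substring test is used for the perfect check so that sites with flanking nucleotides, such as miR122_102, are still recognized.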

Training Set Overview

| mi/shRNA name | mi/shRNA sequence | BS sequence | number of BS | perfect? | bulged? | bulge size | seed type | 3'-score | AU-score | knockdown % | experimental conditions |
| miR122_102 | TGGAGTGTGACAATGGTGTTTGT | GACAAACACCATTGTCACACTCCA | 1 | 1 | 0 | 0 | 0 | 0 | 0 | | |
| miR122_106 | TGGAGTGTGACAATGGTGTTTGT | ACAAACACCATGAAGACACTCCA | 1 | 0 | 1 | 4 | 3 | 7.5 | 0.624 | | |
| miR122_134 | TGGAGTGTGACAATGGTGTTTGT | ACAAACACCATACGGACACTCCAGAGACACAAACACCATGAAGACACTCCA | 2 | 0 | 1 | 4 | 3 | 7.5 | 0.576 | | |
| miR122_136 | TGGAGTGTGACAATGGTGTTTGT | ACAAACACCATACGGACACTCCA | 1 | 0 | 1 | 4 | 3 | 7.5 | 0.595 | | |
| miR122_197 | TGGAGTGTGACAATGGTGTTTGT | ACAAACACCATGTCGACACTCCA | 1 | 0 | 1 | 4 | 3 | 7.5 | 0.597 | | |
| miR122_199 | TGGAGTGTGACAATGGTGTTTGT | ACAAACACCATGCCAACACTCCA | 1 | 0 | 1 | 4 | 3 | 7.5 | 0.603 | | |
| miR122_201 | TGGAGTGTGACAATGGTGTTTGT | ACAAACACCATACGAACACTCCA | 1 | 0 | 1 | 4 | 3 | 7.5 | 0.624 | | |
| miR122_203 | TGGAGTGTGACAATGGTGTTTGT | ACAAACACCATGCAGACACTCCA | 1 | 0 | 1 | 4 | 3 | 7.5 | 0.6 | | |
| miR122_277 | TGGAGTGTGACAATGGTGTTTGT | ACAAACACCATGCCTACACTCCA | 1 | 0 | 1 | 4 | 3 | 7.5 | 0.603 | | |
| miR122_138 | TGGAGTGTGACAATGGTGTTTGT | GGCCAGCACCATTTCACACACACTCCTTCTAGAGGCCGCTGGC | 1 | 0 | 0 | 0 | 2 | 5 | 0.336 | | |
| miR122_140 | TGGAGTGTGACAATGGTGTTTGT | GCCCCTGATGGGGGCGACACTCCATCTAGAGGCCGCTGGC | 1 | 0 | 0 | 0 | 3 | 1.5 | 0.327 | | |
| miR122_142 | TGGAGTGTGACAATGGTGTTTGT | GACTAAGGCTGCTCCATCAACACTCCATCTAGAGGCCGCTGGC | 1 | 0 | 0 | 0 | 3 | 4 | 0.314 | | |
| miR122_144 | TGGAGTGTGACAATGGTGTTTGT | GCAATGGAGAGTCACCTAGACACTCCATCTAGAGGCCGCTGGC | 1 | 0 | 0 | 0 | 3 | 2.5 | 0.314 | | |
| miR122_146 | TGGAGTGTGACAATGGTGTTTGT | GACTTGAGCAGAACAAACACTCCATCTAGAGGCCGCTGGC | 1 | 0 | 0 | 0 | 3 | 2 | 0.327 | | |
| miR122_148 | TGGAGTGTGACAATGGTGTTTGT | GCAAATCATGATCAAAAACACTCCCTCTAGAGGCCGCTGGC | 1 | 0 | 0 | 0 | 2 | 2.5 | 0.221 | | |
| sAg_19_bs_r10_12_acg_fw | GAACAAATGGCACTAGTAA | TTACTAGACGCATTTGTTC | 1 | 0 | 1 | 3 | 2 | 5.5 | 0.442 | | |
| sAg_19_bs_r10_12_taa_fw | GAACAAATGGCACTAGTAA | TTACTAGTAACATTTGTTC | 1 | 0 | 1 | 2 | 2 | 6 | 0.492 | | |
| sAg_19_bs_r9_12_acgg_fw | GAACAAATGGCACTAGTAA | TTACTAGACGGATTTGTTC | 1 | 0 | 1 | 4 | 2 | 5.5 | 0.442 | | |
| sAg_19_bs_r9_12_atgt_fw | GAACAAATGGCACTAGTAA | TTACTAGATGTATTTGTTC | 1 | 0 | 1 | 4 | 2 | 5.5 | 0.495 | | |
| sAg_19_bs_m10cg_fw | GAACAAATGGCACTAGTAA | TTACTAGTGGCATTTGTTC | 1 | 0 | 1 | 1 | 2 | 6.5 | 0.442 | | |
| sAg_19_bs_m10ca_fw | GAACAAATGGCACTAGTAA | TTACTAGTGACATTTGTTC | 1 | 0 | 1 | 1 | 2 | 6.5 | 0.496 | | |
| sAg_19_bs_m11gc_fw | GAACAAATGGCACTAGTAA | TTACTAGTCCCATTTGTTC | 1 | 0 | 1 | 1 | 2 | 6 | 0.442 | | |
| sAg_19_bs_m11ga_fw | GAACAAATGGCACTAGTAA | TTACTAGTACCATTTGTTC | 1 | 0 | 1 | 1 | 2 | 6 | 0.466 | | |
| sAg_19_bs_onlyseed_fw_E | GAACAAATGGCACTAGTAA | AATGATCACGGATTTGTTC | 1 | 0 | 0 | 0 | 2 | 0 | 0.442 | | |
| sAg_19_bs_p_fw_E | GAACAAATGGCACTAGTAA | TTACTAGTGCCATTTGTTC | 1 | 1 | 0 | 0 | 0 | 0 | 0 | | |
| sag_25_1 | GAACAAATGGCACTAGTAAACTGAG | ATAATTTGTTCATTTGTTC | 2 | 0 | 0 | 0 | 2 | 1.5 | 0.491 | | |
| sag_25_2 | GAACAAATGGCACTAGTAAACTGAG | ATAATTTGTTCATTTGTTCATTTGTTC | 3 | 0 | 0 | 0 | 2 | 1.5 | 0.491 | | |
| sag_25_3 | GAACAAATGGCACTAGTAAACTGAG | AGTTTACTAGTGCCATTTGTTCAAAUAUAGCC | 1 | 1 | 0 | 0 | 0 | 0 | 0 | | |
| sag_25_4 | GAACAAATGGCACTAGTAAACTGAG | AGTTTACTAGTGCAATTTGTTAAAAUUUAGCC | 1 | 0 | 1 | 1 | 3 | 8.5 | 0.587 | | |
| sag_25_5 | GAACAAATGGCACTAGTAAACTGAG | AGTTTACTAGTGAAATTTGTTAAAAUUUAGCC | 1 | 0 | 1 | 2 | 3 | 8 | 0.587 | | |
| sag_25_6 | GAACAAATGGCACTAGTAAACTGAG | AGTTTACTAGTAAAATTTGTTAAAAUUUAGCC | 1 | 0 | 1 | 3 | 3 | 7.5 | 0.587 | | |
| sag_25_7 | GAACAAATGGCACTAGTAAACTGAG | AGTTTACTAGAAAAATTTGTTAAAAUUUAGCC | 1 | 0 | 1 | 4 | 3 | 7 | 0.587 | | |
| sag_25_8 | GAACAAATGGCACTAGTAAACTGAG | CTGGGCAATTATAAATTTGTTAAAAUUUAGCC | 1 | 0 | 0 | 0 | 3 | 2 | 0.603 | | |
| sag_25_9 | GAACAAATGGCACTAGTAAACTGAG | CTGGGCAGCCGCAAATTTGTTAAAGGCCCGCC | 1 | 0 | 0 | 0 | 3 | 2 | 0.305 | | |
| sag_25_10 | GAACAAATGGCACTAGTAAACTGAG | CTGGGCAGCTATAATTTTGTTAAAAUUUAGCC | 1 | 0 | 0 | 0 | 1 | 2.5 | 0.69 | | |
| sag_25_11 | GAACAAATGGCACTAGTAAACTGAG | AGTTTACGCCGTAAATTTGTTGAAGGCCCGCC | 1 | 0 | 0 | 0 | 2 | 4 | 0.226 | | |
| sag_25_12 | GAACAAATGGCACTAGTAAACTGAG | CTGGGCAATTATAAATTTGTTGAAAUUUAGCC | 1 | 0 | 0 | 0 | 2 | 2 | 0.526 | | |
| sag_25_13 | GAACAAATGGCACTAGTAAACTGAG | TCCTTACTAGTGCAATTTGTTAAAGGCCCGCC | 1 | 0 | 0 | 0 | 3 | 7 | 0.305 | | |
| sag_25_14 | GAACAAATGGCACTAGTAAACTGAG | CTGAATATAGTGAAATTTGTTAAAAUUUAGCC | 1 | 0 | 0 | 0 | 3 | 4 | 0.603 | | |
| sag_25_15 | GAACAAATGGCACTAGTAAACTGAG | AGTTTACTACCTAATTTTGTTAAAAUCCGGCC | 1 | 0 | 0 | 0 | 1 | 4 | 0.497 | | |
| sag_25_16 | GAACAAATGGCACTAGTAAACTGAG | CTGGGCCTAGTGGATTTTGTTAAAGGCCCGCC | 1 | 0 | 0 | 0 | 1 | 5 | 0.366 | | |
| sag_25_17 | GAACAAATGGCACTAGTAAACTGAG | AGTTTACATTGCAAATTTGTTGAAGGCCCGCC | 1 | 0 | 0 | 0 | 2 | 4 | 0.226 | | |
| sag_25_18 | GAACAAATGGCACTAGTAAACTGAG | CTGGGACTAGTGCAATTTGTTGAAGGCCCGCC | 1 | 0 | 0 | 0 | 2 | 6 | 0.226 | | |
| sag_25_20 | GAACAAATGGCACTAGTAAACTGAG | AGTTTACTAGAAAATTTTGTTAAAAUUUAGCC | 1 | 0 | 0 | 0 | 1 | 7 | 0.69 | | |
| sag_25_21 | GAACAAATGGCACTAGTAAACTGAG | AGTTTACTAGAAAAATTTGTTGAAAUUUAGCC | 1 | 0 | 0 | 0 | 2 | 7 | 0.526 | | |
| sag_25_23 | GAACAAATGGCACTAGTAAACTGAG | CTGGGCATAGATAATTTTGTTAAAAUUUAGCC | 1 | 0 | 0 | 0 | 1 | 3 | 0.69 | | |
| sag_25_24 | GAACAAATGGCACTAGTAAACTGAG | CTGGTACTAGCTAATTTTGTTAAAAUCCGGCC | 1 | 0 | 0 | 0 | 1 | 5 | 0.497 | | |
| sag_25_25 | GAACAAATGGCACTAGTAAACTGAG | AGTTTACTAGCCGATTTTGTTAAAGGCCCGCC | 1 | 0 | 0 | 0 | 1 | 7 | 0.366 | | |
| sag_25_26 | GAACAAATGGCACTAGTAAACTGAG | CTGGGCCTAGAAAAATTTGTTGAAAUCCGGCC | 1 | 0 | 0 | 0 | 2 | 4 | 0.348 | | |
| sag_25_27 | GAACAAATGGCACTAGTAAACTGAG | CTTTTACTAGAAAAATTTGTTGAAUUUAGCC | 1 | 0 | 0 | 0 | 2 | 6 | 0.51 | | |
| sag_25_28 | GAACAAATGGCACTAGTAAACTGAG | CTGGTACTAGGCAAATTTGTTGAAGGCCCGCC | 1 | 0 | 0 | 0 | 2 | 5 | 0.226 | | |
| sag_25_29 | GAACAAATGGCACTAGTAAACTGAG | GCTTTACTAGAAAAATTTGTTAAAAUUUAGCC | 1 | 0 | 0 | 0 | 3 | 6 | 0.603 | | |
| sag_25_30 | GAACAAATGGCACTAGTAAACTGAG | AGTTTACTTTAAAAATTTGTTAAAAUUUAGCC | 1 | 0 | 0 | 0 | 3 | 5 | 0.603 | | |
| haat_bs_p_fw | AAACATGCCTAAACGCTTC | GAAGCGTTTAGGCATGTTT | 1 | 1 | 0 | 0 | 0 | 0 | 0 | | |
| haat_bs_r10_12_aat_fw | AAACATGCCTAAACGCTTC | GAAGCGTAATGGCATGTTT | 1 | 0 | 1 | 3 | 2 | 5.5 | 0.799 | | |
| haat_bs_r10_12_agc_fw | AAACATGCCTAAACGCTTC | GAAGCGTAGCGGCATGTTT | 1 | 0 | 1 | 3 | 2 | 5.5 | 0.749 | | |
| haat_bs_m10at_fw | AAACATGCCTAAACGCTTC | GAAGCGTTTTGGCATGTTT | 1 | 0 | 1 | 1 | 2 | 6.5 | 0.799 | | |
| haat_bs_m10ac_fw | AAACATGCCTAAACGCTTC | GAAGCGTTTCGGCATGTTT | 1 | 0 | 1 | 1 | 2 | 6.5 | 0.773 | | |
| haat_bs_m11ta_fw | AAACATGCCTAAACGCTTC | GAAGCGTTAAGGCATGTTT | 1 | 0 | 1 | 1 | 2 | 1.5 | 0.38 | | |
| haat_bs_m11tg_fw | AAACATGCCTAAACGCTTC | GAAGCGTTGAGGCATGTTT | 1 | 0 | 1 | 1 | 2 | 1.5 | 0.38 | | |
| haat_bs_r9_12_aatc_fw | AAACATGCCTAAACGCTTC | GAAGCGTAATCGCATGTTT | 1 | 0 | 1 | 4 | 2 | 1.5 | 0.38 | | |
| haat_bs_r9_12_agcc_fw | AAACATGCCTAAACGCTTC | GAAGCGTAGCCGCATGTTT | 1 | 0 | 1 | 4 | 2 | 1.5 | 0.38 | | |
| haat_bs_onlyseed_fw | AAACATGCCTAAACGCTTC | CTTCGCAAATCGCATGTTT | 1 | 0 | 1 | 0 | 2 | 2 | 0.799 | | |
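
To make the column layout of the training set concrete, the following Python sketch stores the seven numeric columns of one row and turns them into a flat input vector, with the seed type split into three indicator values as the parameterization concept describes. The class name, field names, and the ordering of the vector are assumptions made for illustration. How the team's models actually encoded their inputs is not stated here, and reading seed type 0 as "no seed type assigned" is likewise an interpretation.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class TrainingRow:
    """Numeric columns of one training-set row (hypothetical field names)."""
    name: str
    n_sites: int             # number of BS
    perfect: int             # perfect? (0 or 1)
    bulged: int              # bulged? (0 or 1)
    bulge_size: int          # bulge size
    seed_type: int           # 1, 2 or 3; 0 is read here as "none assigned"
    three_prime_score: float
    au_score: float


def to_features(row: TrainingRow) -> List[float]:
    """Flatten a row into a numeric vector with a one-hot seed type."""
    seed_one_hot = [1.0 if row.seed_type == k else 0.0 for k in (1, 2, 3)]
    return [
        float(row.n_sites),
        float(row.perfect),
        float(row.bulged),
        float(row.bulge_size),
        *seed_one_hot,
        row.three_prime_score,
        row.au_score,
    ]


if __name__ == "__main__":
    # Values copied from the miR122_106 row of the training set.
    row = TrainingRow("miR122_106", 1, 0, 1, 4, 3, 7.5, 0.624)
    print(to_features(row))
```

The one-hot split avoids treating the seed type as an ordered quantity, which seems closer to the idea of separate parameters per seed type than a single integer column would be.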