chipD user's guide

(last update: 2010/5/5)

chipD is a program that computes sets of oligonucleotide probes for genome-scale microarray applications. chipD can be used to design genome tiling arrays that are used for chromatin immuno-precipitation on a chip (ChIP-chip), or to design expression arrays that are used to monitor changes in transcript abundance.

This guide provides background information about the chipD algorithm and describes the functions of the optional input parameters.

Motivation and design goals
Algorithm and terminology
Examples
Option descriptions
Links
References
Appendix

1. Motivation and design goals

For experiments such as ChIP-chip that require tiling arrays, two factors are particularly important. First, there should be no gaps in the sequence coverage; otherwise, critical information about particular genomic regions could be missed. Second, all probes need to have identical or nearly identical hybridization characteristics to yield consistent data. These two imperatives can be hard to reconcile in some regions, especially those containing sequences of unusual composition, such as stretches of identical bases or stable secondary structures.

The chipD program was created to obtain chip designs that offer complete and uniform sequence coverage of small genomes, such as those of bacteria and yeast. Candidate probes are scored according to three criteria: melting temperature, number of targets in the genome, and sequence complexity. Then, instead of applying an arbitrary score threshold, chipD ranks the probes by score and selects the best ones iteratively until complete coverage is achieved. In this way, no genomic region is left unrepresented, and each region is covered by the best available probe.

2. Algorithm and terminology

contig

A contig refers to a single contiguous stretch of DNA, usually the entire sequence of a specific plasmid or chromosome. One or more contigs are read by chipD from a FASTA file. Each contig is introduced in the FASTA file by its own header line, which must have a "greater than" symbol (>) as the first character.

>Contig1
GTCGTACGTAGAT...
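
As an illustration of the input format described above, the following minimal Python sketch reads contigs from FASTA-formatted lines. It is not chipD's own parser; the function name read_fasta and the choice to keep only the first word of the header as the contig name are assumptions made for this example.

```python
def read_fasta(lines):
    """Return a list of (name, sequence) tuples, one per contig."""
    contigs = []
    name, chunks = None, []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A header line starts a new contig; store the previous one.
            if name is not None:
                contigs.append((name, "".join(chunks)))
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line.upper())
    if name is not None:
        contigs.append((name, "".join(chunks)))
    return contigs


example = [">Locus1", "ATGAGATACACAGT", ">Locus2", "ATGATATGTCTGAT"]
for contig_name, sequence in read_fasta(example):
    print(contig_name, len(sequence))
```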

To design probes for a tiling array, the full sequences of all the contigs should be used as an input for chipD. The sequences may be pre-processed by the user prior to submission to mask repeated sequences or regions irrelevant to the study.

To design probes for expression microarrays, the FASTA file should contain only the coding strand sequences of the genes that are being targeted. Each gene sequence should be treated as a contig with a unique identifier preceded by the ">" symbol in the FASTA file.

>Locus1
ATGAGATACACAGT...
>Locus2
ATGATATGTCTGAT...

Due to memory limitations, the chipD server cannot handle sequence files larger than around 8 megabases. However, users can partition the sequences, submit portions in multiple instances to the server, and concatenate the resulting lists of probes.
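
One way to partition a large input is to group whole contigs into batches that each stay under the size limit. The Python sketch below is only an illustration: the function name batch_contigs and the exact limit of 8,000,000 bases are assumptions, and a single contig longer than the limit ends up alone in its batch and would still need to be split by hand. Keep in mind that the frequency score is computed over all contigs in a submission, so splitting the input may change the target counts.

```python
def batch_contigs(contigs, max_bases=8_000_000):
    """Group (name, sequence) tuples into batches of at most max_bases."""
    batches, current, size = [], [], 0
    for name, seq in contigs:
        # Start a new batch when adding this contig would exceed the limit.
        if current and size + len(seq) > max_bases:
            batches.append(current)
            current, size = [], 0
        current.append((name, seq))
        size += len(seq)
    if current:
        batches.append(current)
    return batches


demo = [("a", "A" * 5), ("b", "C" * 4), ("c", "G" * 3)]
for batch in batch_contigs(demo, max_bases=8):
    print([name for name, _ in batch])
```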

ShortOligo

The term ShortOligo will be used in this document to refer to a short oligomer of DNA consisting of 15 base pairs occurring in input sequences. The characteristics of the ShortOligos will be used to determine the overall score of the probes.

Scoring ShortOligos

The overall score for each ShortOligo is obtained by summing the following 2 parameters:

Frequency: all contigs from the FASTA file are scanned and a global count (based on all contigs) is made of each unique ShortOligo and its complement. This count is then used to define a frequency score for each ShortOligo.

Complexity: a complexity score is calculated for each ShortOligo sequence based on the information content of the sequence. For example, the sequence AAAAAGGGGGCCCCC has less information than AAGTGATTAGCGTCA, and is thus said to be less complex.
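
The two ShortOligo properties above can be sketched in Python as follows. This is an illustration under stated assumptions, not chipD's implementation: "complement" is read here as the reverse complement, and complexity is approximated by the Shannon entropy of overlapping dinucleotides. How chipD converts the count and the information content into scores is not described in this guide, so the sketch stops at the raw values.

```python
import math
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def count_short_oligos(contigs, k=15):
    """Count every k-mer over all contig sequences (global count)."""
    counts = Counter()
    for seq in contigs:
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def target_count(oligo, counts):
    """Occurrences of an oligo plus those of its reverse complement."""
    rc = reverse_complement(oligo)
    if rc == oligo:  # palindromic word: do not count it twice
        return counts[oligo]
    return counts[oligo] + counts[rc]


def word_entropy(seq, word=2):
    """Shannon entropy (bits) of overlapping words; an assumed complexity proxy."""
    words = [seq[i:i + word] for i in range(len(seq) - word + 1)]
    total = len(words)
    freq = Counter(words)
    return -sum(c / total * math.log2(c / total) for c in freq.values())


print(word_entropy("AAAAAGGGGGCCCCC") < word_entropy("AAGTGATTAGCGTCA"))
```

With this proxy, the low-complexity example sequence from the text receives a lower value than the mixed sequence.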

LongOligo

The term LongOligo will be used in this document to refer to any oligomer that is composed of multiple overlapping ShortOligos. LongOligos are used by the program to determine the set of candidate probes entering the final selection step.

At each position in the contigs, the program extracts every sequence whose length falls within the permitted range (40 to 70 bases by default, as set by the Minimum and Maximum Probe Length parameters) and calculates its hybridization characteristics. Only the best LongOligo for each position is added to the list of candidate probes.
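
A minimal Python sketch of this per-position search is shown below. The scoring function is passed in as a parameter because the actual scoring is described in the next section; the name best_long_oligo and the convention that a lower score is better are assumptions (the example output later in this guide suggests that probes near the ideal length and target Tm receive scores close to zero).

```python
def best_long_oligo(seq, start, score, min_len=40, max_len=70):
    """Return (oligo, score) for the best-scoring length at one position.

    Returns None when not even the shortest oligo fits in the sequence.
    """
    best = None
    for length in range(min_len, max_len + 1):
        end = start + length
        if end > len(seq):
            break  # longer candidates would run past the contig end
        candidate = seq[start:end]
        value = score(candidate)
        if best is None or value < best[1]:  # assumed: lower is better
            best = (candidate, value)
    return best


# Toy scoring function: distance from an ideal length of 50.
toy_score = lambda oligo: abs(len(oligo) - 50)
print(len(best_long_oligo("ACGT" * 30, 0, toy_score)[0]))
```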

Scoring LongOligos

[THIS SECTION IS CURRENTLY BEING REVISED.]

The overall score for each LongOligo is obtained by adding the following 4 parameters:

Target Length: the chipD parameter Ideal Probe Length specifies the length given the best score. Any probes with lengths greater or less than the ideal have their score linearly down-weighted.

Target Melting Temperature: this scoring feature measures how far the calculated melting temperature deviates from the target melting temperature. The square or the cube of the distance is evaluated if the calculated melting temperature is above or below the target temperature, respectively. Three different models for calculating Tm, the melting temperature, are optionally available in chipD.

Model 1 is based on the work of SantaLucia[1] and uses a set of parameters to estimate the total deltaS and deltaH of perfectly complementary strands of helical DNA (no base-pair mismatches). This model belongs to the class of so-called "nearest neighbor" models. The value of deltaS is adjusted for the concentration of sodium ions and the concentration of excess DNA. The ratio of deltaH over deltaS then gives an estimate of Tm. This model is known to give good values for short DNA oligomers of less than about thirty base pairs. Note, however, that hybridization on the surface of a chip may behave rather differently from the free-solution thermodynamics on which the model parameters are based. The goal in using any temperature model for scoring probes in chipD is not so much to obtain accurate predictions of Tm as to identify sets of probes that are likely to have similar properties.

Model 2, from the work of Wetmur[2], uses the following formula:

Tm = 81.5 + 16.6*log10([Na+]/(1 + 0.7*[Na+])) + 0.41*(%GC) - 500/(probe length)

This model works best for longer oligomers in the range of 50 base pairs and up.
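
The Model 2 formula can be evaluated with a few lines of Python. The sketch below reads the formula in its commonly published form, in which the 500/(probe length) term is separate from the %GC term; that reading, and the function names, are assumptions of this example rather than a copy of chipD's code.

```python
import math


def gc_percent(seq):
    """Percentage of G and C bases in a sequence."""
    seq = seq.upper()
    return 100.0 * (seq.count("G") + seq.count("C")) / len(seq)


def tm_model2(seq, sodium=0.10):
    """GC-content melting temperature estimate (sodium in mol/L)."""
    salt_term = 16.6 * math.log10(sodium / (1.0 + 0.7 * sodium))
    return 81.5 + salt_term + 0.41 * gc_percent(seq) - 500.0 / len(seq)


probe = "GAACTGTCGCCTCTTCCTGTCGGGACAATGGAGGATCGGCGGCATGGGAT"
print(round(tm_model2(probe), 3))
```

Because the default Model 3 blends Models 1 and 2 for probes of this length, the value printed here is not expected to match the TM column of the example output exactly.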

Model 3 is a hybrid model. It uses Model 1 for short probes and, starting at 43 bp, smoothly switches over to Model 2. At 60 bp and up, only Model 2 is used.
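
The transition between the two models can be pictured as a weighted average whose weight depends on probe length. The Python sketch below assumes a linear ramp between 43 and 60 bp; the guide only states that the switch is smooth, so the exact shape of the weighting, like the function names, is an assumption.

```python
def model2_weight(length, start=43, end=60):
    """Fraction of Model 2 in the blend (assumed linear ramp)."""
    if length < start:
        return 0.0
    if length >= end:
        return 1.0
    return (length - start) / (end - start)


def tm_hybrid(tm_model1, tm_model2, length):
    """Blend two Tm estimates according to probe length."""
    w = model2_weight(length)
    return (1.0 - w) * tm_model1 + w * tm_model2


for n in (40, 50, 60):
    print(n, round(model2_weight(n), 3))
```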

For a graphical view of the Tm models, please see Appendix A.
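
The Target Length and Target Melting Temperature terms described above can be sketched in Python as penalties, where a lower value is better. The weights, the function names, and the penalty convention are assumptions of this example; the guide does not give the constants used by chipD.

```python
def length_penalty(length, ideal=50, weight=1.0):
    """Linear penalty for deviating from the ideal probe length."""
    return weight * abs(length - ideal)


def tm_penalty(tm, target, weight=1.0):
    """Squared distance above the target Tm, cubed distance below it."""
    distance = abs(tm - target)
    if tm >= target:
        return weight * distance ** 2
    return weight * distance ** 3


print(tm_penalty(83.0, 81.0), tm_penalty(79.0, 81.0))
```

The asymmetry means that a probe 2 degrees below the target is penalized more heavily than a probe 2 degrees above it.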

Cycles: the number of cycles necessary to synthesize each LongOligo is calculated according to Nimblegen specifications (bases are added in this order: A, C, G, T) and LongOligos requiring more than the set limit (148 cycles by default) are discarded.
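
The cycle count can be illustrated with the following Python sketch. It assumes that each synthesis cycle offers exactly one base in the repeating order A, C, G, T and that the probe is built in the order in which the sequence is written; the actual synthesis direction used by chipD is not stated in this guide, so treat the direction and the function name as assumptions.

```python
def synthesis_cycles(seq, order="ACGT"):
    """Number of cycles needed to build seq when bases are offered in order."""
    cycles = 0
    next_slot = 0  # index in `order` of the base offered by the next cycle
    for base in seq.upper():
        slot = order.index(base)  # raises ValueError for ambiguous bases
        # Cycles spent waiting until the required base is offered.
        cycles += (slot - next_slot) % len(order) + 1
        next_slot = (slot + 1) % len(order)
    return cycles


print(synthesis_cycles("ACGT"), synthesis_cycles("AA"), synthesis_cycles("TTTT"))
```

Under these assumptions a homopolymer needs close to four cycles per base, which is why the cycle limit can restrict probe length in a sequence-dependent way.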

ShortOligos: the scores of all the ShortOligos that compose a LongOligo are summed according to a weight function that gives more weight to the ShortOligo at the center of the LongOligo sequence.
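
A center-weighted sum of ShortOligo scores might look like the Python sketch below. The triangular weight function is purely an assumption chosen for illustration; the guide states only that ShortOligos near the center of the LongOligo receive more weight.

```python
def center_weights(n):
    """Triangular weights that peak at the middle of n positions."""
    mid = (n - 1) / 2
    return [1.0 + mid - abs(i - mid) for i in range(n)]


def short_oligo_term(scores):
    """Weighted sum of the scores of the ShortOligos in one LongOligo."""
    weights = center_weights(len(scores))
    return sum(w * s for w, s in zip(weights, scores))


# A LongOligo of length L contains L - 15 + 1 overlapping ShortOligos.
print(center_weights(5))
print(short_oligo_term([0, 0, 1, 0, 0]), short_oligo_term([1, 0, 0, 0, 0]))
```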

Selecting the final list of probes

Once the best LongOligo has been determined for each position in the sequences, the list of candidate probes is ranked according to their scores and the iterative selection process begins. The best scoring probe is selected and the neighboring probes, according to the interval specifications, are removed from the list. The next best probe is selected and the process continues until the list is depleted. This process ensures that all regions of the contigs are represented by the best possible probes.

The interval is specified either by the user or calculated according to the total length of the contigs divided by the maximum number of probes that can be synthesized on an array.
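
The selection loop can be sketched in Python as a greedy pass over the ranked candidates. This is an illustration for a single contig, with assumptions labeled in the comments: a lower score is treated as better, a "neighbor" is any candidate closer than the interval to an already selected probe, and the automatic interval is rounded up.

```python
import math


def default_interval(total_length, max_probes):
    """Spacing from total sequence length and chip capacity (rounding assumed)."""
    return max(1, math.ceil(total_length / max_probes))


def select_probes(candidates, interval):
    """Greedy selection from (position, score) candidates of one contig."""
    ranked = sorted(candidates, key=lambda c: c[1])  # assumed: lower is better
    selected = []
    for position, score in ranked:
        # Skipping candidates near a selected probe is equivalent to
        # removing the neighbors of each probe as soon as it is picked.
        if all(abs(position - p) >= interval for p, _ in selected):
            selected.append((position, score))
    return sorted(selected)


candidates = [(0, 5.0), (5, 1.0), (10, 2.0), (20, 0.5)]
print(select_probes(candidates, interval=10))
print(default_interval(449, 100))
```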

Reverse complement strand

For the design of tiling arrays only, every other probe, in order of location on the sequences, is converted to its reverse complement. Both strands of the DNA are therefore represented by probes on the array.

For the design of expression arrays, no transformation is done, so the probes remain strand specific.
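
The strand alternation for tiling arrays can be sketched in Python as follows. The function names and the tuple layout are assumptions made for this example.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    return seq.translate(_COMPLEMENT)[::-1]


def alternate_strands(probes):
    """Flip every other probe, in order of position, to the reverse strand.

    probes: list of (position, sequence) tuples for one contig.
    """
    result = []
    for index, (position, seq) in enumerate(sorted(probes)):
        if index % 2 == 0:
            result.append((position, seq, "+"))
        else:
            result.append((position, reverse_complement(seq), "-"))
    return result


print(alternate_strands([(1, "AACG"), (12, "AACG"), (24, "AACG")]))
```

Applied to the 41 bases starting at position 12 of contig1 in the tiling example of this guide, reverse_complement gives the sequence listed for probe TESTCASE_R000001.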

3. Examples

Tiling array

Input FASTA file:

>contig1
GAACTGTCGCCTCTTCCTGTCGGGACAATGGAGGATCGGCGGCATGGGATGGGTGCTGAT
GAGCGAGCGCGAACTGAACCGCATCGAGATCCTGTCGAAGGTGCTCGATCGGAGGATGAC
GAGCCGCAACCCACGGCGCCGCCCAATGCAATCCGCGCCCGCCTCCATGCAACATAACTA
TCCTTATCCGTTCTGTCGGTGTAAGCGCAAAGTAGAATTGTCGCATCCAAGCAAAGTAAT
CAACTTGAGAGTTTGATCCTGGCTCAGAATGAACGCTGGCGGCAGGCCTAACACATGCAA
GTCGAGCGAAGTCTTCGGACTTAGCGGCGGACGGGTGAGTAACGCGTGGGAACGTGCCCT
GTAACTTGGCACATGGACAGAAAGACCTCGGGCGATGCCCGAGGCAGATGTGCGAAGGTT
CGACGTCAAGGACAGCGCTTCGGCGCTTT

Options:

Design Type: Genome
Job Name: Demo_Tiling_Job
Chip ID: TestCase
Maximum Number of Probes on Chip: 100
Interval Size: 10
Melting Temperature Model Type: Hybrid 3
Sodium Ion Concentration: 0.10
Target Melting Temperature: E
Minimum Melting Temperature Offset: 5.0
Ideal Probe Length: 50
Minimum Probe Length: 40
Maximum Probe Length: 70
Maximum Cycles to Print a Probe: 148
Maximum Consecutive Ambiguities: 3

Statistics output:

Estimating target melting temperature...
ESTIMATING Tm, Total Number bp: 449.0
contig 1  Number bp: 449 NumRandomSamps: 1000
        Tm estimate for this contig: 81.07113891257697
FINAL Tm estimate using all contigs: 81.07113891257697
Finished........ Target melting temperature set to: 81.071
Tm minimum offset: 5.0
Tm minimum: 76.07113891257697

Probe statistics...
Probe Length:
        mean:   50.029
        stdDev: 4.618
        Cv:     0.092
Probe Melting Temp:
        mean:   80.619
        stdDev: 2.159
        Cv:     0.027
Probe Score:
        mean:   25.011
        stdDev: 51.312
        Cv:     2.052

Number of probes used:         34
MAX  Number of probes:        100
Percent Chip Utilized:     34.000

Probe list output:

PROBE_ID        CHROMOSOME      POSITION        PROBE_SEQUENCE  SENSE   LENGTH  TM      SCORE
TESTCASE_F000000        contig1 1       GAACTGTCGCCTCTTCCTGTCGGGACAATGGAGGATCGGCGGCATGGGAT      +       50      82.064  10.513
TESTCASE_R000001        contig1 12      CCATCCCATGCCGCCGATCCTCCATTGTCCCGACAGGAAGA       -       41      81.838  20.625
TESTCASE_F000002        contig1 24      GACAATGGAGGATCGGCGGCATGGGATGGGTGCTGATGAGCGAGCGCGAACTGAA +       55      81.258  28.569
TESTCASE_R000003        contig1 35      ATCTCGATGCGGTTCAGTTCGCGCTCGCTCATCAGCACCCATCCCATGCCGCCGAT        -       56      81.484  24.686
TESTCASE_F000004        contig1 46      GGGATGGGTGCTGATGAGCGAGCGCGAACTGAACCGCATCGAGATCCTGT      +       50      82.200  5.547
TESTCASE_R000005        contig1 57      GAGCACCTTCGACAGGATCTCGATGCGGTTCAGTTCGCGCTCGCTCATCA      -       50      81.185  0.013
TESTCASE_F000006        contig1 76      GAACCGCATCGAGATCCTGTCGAAGGTGCTCGATCGGAGGATGACGAGCC      +       50      81.188  0.014
TESTCASE_R000007        contig1 86      GTGGGTTGCGGCTCGTCATCCTCCGATCGAGCACCTTCGACAGGATCTC       -       49      81.841  3.037
TESTCASE_F000008        contig1 96      CGAAGGTGCTCGATCGGAGGATGACGAGCCGCAACCCACG        +       40      82.603  15.004
TESTCASE_R000009        contig1 116     TTATGTTGCATGGAGGCGGGCGCGGATTGCATTGGGCGGCGCCGTGGGTTGCGGCTCGTCAT  -       62      82.799  32.398
TESTCASE_F000010        contig1 130     CCCACGGCGCCGCCCAATGCAATCCGCGCCCGCCTCCATGCAACATAACTATCCTTA       +       57      81.530  20.857
TESTCASE_R000011        contig1 141     CGGATAAGGATAGTTATGTTGCATGGAGGCGGGCGCGGATTGCATTGGGC      -       50      80.374  9.024
TESTCASE_F000012        contig1 153     CCGCGCCCGCCTCCATGCAACATAACTATCCTTATCCGTTCTGTCGGTGT      +       50      80.412  3.330
TESTCASE_R000013        contig1 164     TGCGCTTACACCGACAGAACGGATAAGGATAGTTATGTTGCATGGA  -       46      76.132  28.396
TESTCASE_F000014        contig1 176     AACTATCCTTATCCGTTCTGTCGGTGTAAGCGCAAAGTAGAATTGTCGCA      +       50      75.439  178.627
TESTCASE_R000015        contig1 187     TGCTTGGATGCGACAATTCTACTTTGCGCTTACACCGACAGAACGGA -       47      78.233  11.056
TESTCASE_F000016        contig1 197     CGGTGTAAGCGCAAAGTAGAATTGTCGCATCCAAGCAAAGT       +       41      76.286  31.898
TESTCASE_R000017        contig1 216     AGCCAGGATCAAACTCTCAAGTTGATTACTTTGCTTGGATGCGACAATT       -       49      74.652  265.546
TESTCASE_F000018        contig1 227     CCAAGCAAAGTAATCAACTTGAGAGTTTGATCCTGGCTCAGAATGAACGCTGGCGGCAGGCC  +       62      77.509  24.689
TESTCASE_R000019        contig1 240     AGGCCTGCCGCCAGCGTTCATTCTGAGCCAGGATCAAACTCTCAAGTTGA      -       50      80.249  0.676
TESTCASE_F000020        contig1 255     GATCCTGGCTCAGAATGAACGCTGGCGGCAGGCCTAACACATGCAAGTCG      +       50      81.147  0.006
TESTCASE_R000021        contig1 268     CGAAGACTTCGCTCGACTTGCATGTGTTAGGCCTGCCGCCAGCGTTCATT      -       50      80.860  0.045
TESTCASE_F000022        contig1 284     AGGCCTAACACATGCAAGTCGAGCGAAGTCTTCGGACTTAGCGGCGGACG      +       50      81.208  0.019
TESTCASE_R000023        contig1 294     GTTACTCACCCGTCCGCCGCTAAGTCCGAAGACTTCGCTCGACTTGCATG      -       50      80.614  4.694
TESTCASE_F000024        contig1 304     GAGCGAAGTCTTCGGACTTAGCGGCGGACGGGTGAGTAACGCGTGGGAAC      +       50      82.344  11.296
TESTCASE_R000025        contig1 314     TTACAGGGCACGTTCCCACGCGTTACTCACCCGTCCGCCGCTAAGTCCGAA     -       51      82.752  21.169
TESTCASE_F000026        contig1 328     CGGACGGGTGAGTAACGCGTGGGAACGTGCCCTGTAACTTGGCACATGGA      +       50      82.298  20.224
TESTCASE_R000027        contig1 339     AGGTCTTTCTGTCCATGTGCCAAGTTACAGGGCACGTTCCCACGCGTTAC      -       50      79.722  14.643
TESTCASE_F000028        contig1 350     GAACGTGCCCTGTAACTTGGCACATGGACAGAAAGACCTCGGGCGATGCC      +       50      81.067  7.887
TESTCASE_R000029        contig1 360     ATCTGCCTCGGGCATCGCCCGAGGTCTTTCTGTCCATGTGCCAAGTTACA      -       50      80.971  12.661
TESTCASE_F000030        contig1 370     CACATGGACAGAAAGACCTCGGGCGATGCCCGAGGCAGATGTGCGAAGGTT     +       51      81.678  16.956
TESTCASE_R000031        contig1 381     CTTGACGTCGAACCTTCGCACATCTGCCTCGGGCATCGCCCGAGGTCTTT      -       50      82.276  14.103
TESTCASE_F000032        contig1 394     GATGCCCGAGGCAGATGTGCGAAGGTTCGACGTCAAGGACAGCGCTTCGG      +       50      82.723  5.117
TESTCASE_R000033        contig1 405     AAGCGCCGAAGCGCTGTCCTTGACGTCGAACCTTCGCACATCTG    -       44      82.102  7.063

The probe list is given in a tab-delimited text file with one line per probe. Each probe receives a unique ID that indicates which strand of the DNA it represents ('F' for the forward strand, 'R' for the reverse strand). Column 5 (SENSE) also indicates the strand.
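
A probe list in this format can be loaded with Python's standard csv module, as in the sketch below. The column names come from the header line shown above; the function name and the dictionary keys are choices made for this example.

```python
import csv
import io


def read_probe_list(handle):
    """Parse a tab-delimited chipD probe list into a list of dictionaries."""
    probes = []
    for row in csv.DictReader(handle, delimiter="\t"):
        probes.append({
            "id": row["PROBE_ID"],
            "contig": row["CHROMOSOME"],
            "position": int(row["POSITION"]),
            "sequence": row["PROBE_SEQUENCE"],
            "sense": row["SENSE"],
            "length": int(row["LENGTH"]),
            "tm": float(row["TM"]),
            "score": float(row["SCORE"]),
        })
    return probes


text = (
    "PROBE_ID\tCHROMOSOME\tPOSITION\tPROBE_SEQUENCE\tSENSE\tLENGTH\tTM\tSCORE\n"
    "TESTCASE_F000000\tcontig1\t1\t"
    "GAACTGTCGCCTCTTCCTGTCGGGACAATGGAGGATCGGCGGCATGGGAT\t+\t50\t82.064\t10.513\n"
)
for probe in read_probe_list(io.StringIO(text)):
    print(probe["id"], probe["sense"], probe["length"], probe["tm"])
```

When a job was split across several submissions, the parsed lists can simply be concatenated.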

Expression array

Input file:

>locus1
GAACTGTCGCCTCTTCCTGTCGGGACAATGGAGGATCGGCGGCATGGGATGGGTGCTGAT
TATGCAGATCAGACGACTCGAGCATCTGAGCTCAGGCAGTACTCAGAGGCATCTCATGAG
GACTTAGAGCGCAGAGGCGCGTCTATTAGCGAGACGGCAGATCTTATCTAGAGCGACTAT
TAGCAGACGGATCTTATATCGCGCGGGCGGCATTATATTATGCGATCATGCAGACTCAGC
>locus2
GAGCGAGCGCGAACTGAACCGCATCGAGATCCTGTCGAAGGTGCTCGATCGGAGGATGAC
GAGCCGCAACCCACGGCGCCGCCCAATGCAATCCGCGCCCGCCTCCATGCAACATAACTA
GTCAGCATCATCAGCAGCTATCATCATCATGCAGTCATCAGCGAGCAGTGACGCGTAGCG
>locus3
TCCTTATCCGTTCTGTCGGTGTAAGCGCAAAGTAGAATTGTCGCATCCAAGCAAAGTAAT
CATCGATGCATGCTGCTGATCGTACGTGCTCGATGCTAGCTGTGCTGATGATCGTAGCTG
ACTGATGCTAGCTGATGTCGCTGCTGATCGTAGCTGATGTGCTGACTGATCGTGATCGTA
>locus4
CAACTTGAGAGTTTGATCCTGGCTCAGAATGAACGCTGGCGGCAGGCCTAACACATGCAA
GTCGAGCGAAGTCTTCGGACTTAGCGGCGGACGGGTGAGTAACGCGTGGGAACGTGCCCT
GTAACTTGGCACATGGACAGAAAGACCTCGGGCGATGCCCGAGGCAGATGTGCGAAGGTT
CGACGTCAAGGACAGCGCTTCGGCGCTTT

Options:

Job Name: Demo_Expression_Job
Chip ID: TestCase
Design Type: expression
Maximum Number of Probes on Chip: 20
Replicate Number: 1
Offset Adjustment: 0
Melting Temperature Model Type: Hybrid 3
Sodium Ion Concentration: 0.10
Target Melting Temperature: E
Minimum Melting Temperature Offset: 5.0
Ideal Probe Length: 50
Minimum Probe Length: 40
Maximum Probe Length: 70
Maximum Cycles to Print a Probe: 148
Maximum Consecutive Ambiguities: 3

Statistics output:

Estimating target melting temperature...
ESTIMATING Tm, Total Number bp: 809.0
FINAL Tm estimate using all contigs: 80.10455328555695
Finished........ Target melting temperature set to: 80.105
Tm minimum offset: 5.0
Tm minimum: 75.10455328555695
Number of replicates manually set to 1
Target number of probes per contig: 5
Using spacer offset of 0 to calculate the spacer for each contig.
Determining the spacer for each contig.......done!
Average spacer size: 40.25

Not reversing any probes. Don't need to do this for expression arrays.
Average number of probes per contig : 3.750
Max number of probes: 4
Min number of probes: 3
Transcripts with fewer than 6 probes:
locus1   240 bases) - spacer: 48 - number of probes: 4
locus2   180 bases) - spacer: 36 - number of probes: 4
locus3   180 bases) - spacer: 36 - number of probes: 3
locus4   209 bases) - spacer: 41 - number of probes: 4

All contigs have at least one probeCalculating probe statistics.....
Finished.........................Wed May 05 14:30:47 CDT 2010
Probe statistics...
Probe Length:
        mean:   50.200
        stdDev: 3.331
        Cv:     0.066
Probe Melting Temp:
        mean:   80.191
        stdDev: 1.276
        Cv:     0.016
Probe Score:
        mean:   6.197
        stdDev: 9.048
        Cv:     1.460

Number of replicates:        1
Number of unique probes used:         15
Number of total probes used:          15
MAX  Number of probes:                20
Percent Chip Utilized:        75.000

Probe list output:

PROBE_ID        CHROMOSOME      POSITION        PROBE_SEQUENCE  SENSE   LENGTH  TM      SCORE
TESTCASE_F000000        locus1  1       GAACTGTCGCCTCTTCCTGTCGGGACAATGGAGGATCGGCGGCATGGGAT      +       50      82.064  13.366
TESTCASE_F000001        locus1  62      ATGCAGATCAGACGACTCGAGCATCTGAGCTCAGGCAGTACTCAGAGGCA      +       50      79.382  0.522
TESTCASE_F000002        locus1  112     TCTCATGAGGACTTAGAGCGCAGAGGCGCGTCTATTAGCGAGACGGCAGA      +       50      80.210  0.011
TESTCASE_F000003        locus1  166     ATCTAGAGCGACTATTAGCAGACGGATCTTATATCGCGCGGGCGGCA +       47      79.112  7.160
TESTCASE_F000004        locus4  12      TTTGATCCTGGCTCAGAATGAACGCTGGCGGCAGGCCTAACACATGCAAG      +       50      80.076  0.001
TESTCASE_F000005        locus4  54      CATGCAAGTCGAGCGAAGTCTTCGGACTTAGCGGCGGACGGGTGAGTAAC      +       50      80.614  4.745
TESTCASE_F000006        locus4  110     GAACGTGCCCTGTAACTTGGCACATGGACAGAAAGACCTCGGGCGATGCC      +       50      81.067  8.813
TESTCASE_F000007        locus4  154     GATGCCCGAGGCAGATGTGCGAAGGTTCGACGTCAAGGACAGCGCTTC        +       48      82.116  8.580
TESTCASE_F000008        locus2  14      CTGAACCGCATCGAGATCCTGTCGAAGGTGCTCGATCGGAGGATGACGAG      +       50      80.164  0.004
TESTCASE_F000009        locus2  56      ATGACGAGCCGCAACCCACGGCGCCGCCCAATGCAATCCGCGCCCGCCTCCATGCAACATAA  +       62      82.799  36.674
TESTCASE_F000010        locus2  94      CGCGCCCGCCTCCATGCAACATAACTAGTCAGCATCATCAGCAGCTATCA      +       50      79.660  2.586
TESTCASE_F000011        locus2  131     TCAGCAGCTATCATCATCATGCAGTCATCAGCGAGCAGTGACGCGTAGC       +       49      79.147  1.916
TESTCASE_F000012        locus3  7       TCCGTTCTGTCGGTGTAAGCGCAAAGTAGAATTGTCGCATCCAAGCA +       47      78.233  6.503
TESTCASE_F000013        locus3  71      TGCTGCTGATCGTACGTGCTCGATGCTAGCTGTGCTGATGATCGTAGCTG      +       50      79.317  0.620
TESTCASE_F000014        locus3  117     GCTGACTGATGCTAGCTGATGTCGCTGCTGATCGTAGCTGATGTGCTGAC      +       50      78.901  1.449

4. Option descriptions

The information enclosed in square brackets refers to the command line arguments for the standalone program version of chipD. If you are using the Web-based version, please ignore this information.

Type of Chip [tiling chip is default, use flag --exp_array for expression chip]

Expression: If you want to design strand specific probes, choose 'Expression Chip'.
Tiling: If you want to design probes that tile both DNA strands, choose 'Genome Chip'.

Job Name

Pick a name for your job that is meaningful to you. This name will appear in the job queue listing of your jobs.

Chip ID [ -D ]

Name of the array design, also used as a prefix for all probe IDs written to the probe file. The ID should be no more than eight characters in length.

Maximum Number of Probes on Chip [ -N ]

This parameter will depend on the maximum capacity of the vendor's chip. It is also used to determine the optimal spacing between consecutive probes.

Interval Size (for Genome chip designs ONLY) [ -s ]

Sets the desired spacing between consecutive probes. If left blank the program will determine optimal spacing using the total length of the sequences and the maximum number of probes.

Replicate Number (for Expression chip designs ONLY) [ -r ]

Default value is 1. Increasing this number to N will result in N copies of the base set of probes being added to the chip design. Your vendor may already offer this option, in which case you may want to keep the default.

Offset Adjustment (for Expression chip designs ONLY) [ -c ]

Default value is 0. The spacer size may be increased or decreased by using a negative or positive value, respectively.

Melting Temperature Model Type [ -Tm ]

An integer (1, 2 or 3) that specifies which melting temperature model to use. For short probes, when the Ideal Probe Length is around 25, model 1 is best. For longer probes, around 50, model 2 is best. For intermediate cases, model 3 is suggested, since it calculates a weighted average of model 1 and model 2. The default value is model 3.

1: Nearest-Neighbor interactions model, best for short probes ( Nbp<50 )
2: GC content model, best for long probes ( Nbp>=60 )
3: Hybrid, if Nbp<43 use Model 1; if Nbp>59 use Model 2; else use a weighted average of both.

Sodium Ion Concentration for Melting Temperature Model [ -tN ]

Both melting temperature models use the sodium ion concentration. The default value is set to 0.10 M.

Target Melting Temperature [ -t ]

Sets the desired target melting temperature for the probes. If set to 'E', the program will estimate an optimal target temperature by sampling one thousand oligomers of length Ideal Probe Length randomly throughout each contig of the input sequences. The target melting temperature is then set to the average Tm of this sample.
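
The estimation performed for the 'E' setting can be sketched in Python as follows. The melting temperature function is passed in as a parameter, and the sketch pools the samples from all contigs before averaging; how chipD combines the per-contig estimates is not described here, so the pooling, the fixed random seed, and the function name are assumptions.

```python
import random


def estimate_target_tm(contigs, tm_func, ideal_length=50, samples=1000, seed=0):
    """Average Tm of randomly placed oligomers of the ideal probe length."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    values = []
    for seq in contigs:
        if len(seq) < ideal_length:
            continue  # contig too short to sample from
        for _ in range(samples):
            start = rng.randrange(len(seq) - ideal_length + 1)
            values.append(tm_func(seq[start:start + ideal_length]))
    if not values:
        raise ValueError("no contig is long enough to sample")
    return sum(values) / len(values)


# Stand-in Tm function for the demonstration: GC percentage only.
gc_only = lambda s: 100.0 * (s.count("G") + s.count("C")) / len(s)
print(estimate_target_tm(["ACGT" * 50], gc_only, samples=100))
```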

Minimum Melting Temperature Offset [ -to ]

Sets the desired minimum probe melting temperature relative to the target melting temperature. The magnitude of the offset is subtracted from the target melting temperature to obtain the minimum probe melting temperature. The minimum probe melting temperature is not a 'hard' threshold, but the score penalty increases for probes that reach this minimum. The default offset is 5 degrees.

Ideal Probe Length [ -Li ]

Probes closer to this length are weighted more favorably in the scoring.

Minimum Probe Length [ -Lm ]

Sets the minimum number of bases per probe. The default value is 40. This value must not be less than 15 since that is the size of the short oligos used as primitive elements in building the overall scores.

Maximum Probe Length [ -Lx ]

Sets the maximum number of bases per probe. The default value is 70. Note that the vendor-specific maximum number of cycles allowed for the synthesis of probes may also act as a sequence-dependent limit on the largest probe size.

Maximum Cycles to Synthesize a Probe [ -C ]

Sets the maximum number of cycles allowed for the synthesis of probes as defined by the chip manufacturer. For Nimblegen, the bases are added in this order: A, C, G, T, with a maximum of 148 cycles allowed. This is the default value.

Maximum Consecutive Ambiguities [ -a ]

If your sequence contains a stretch of no more than N ambiguous bases, they will be replaced randomly by one of the four bases. Otherwise, no probe will be designed over the ambiguous sequence.
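
The handling of ambiguous bases can be illustrated with the Python sketch below. Short runs are replaced by random bases, while longer runs are rewritten as N so that a later step can avoid them; marking long runs with N, the fixed seed, and the function name are assumptions of this example.

```python
import random
import re


def resolve_ambiguities(seq, max_run=3, seed=0):
    """Randomize short runs of non-ACGT characters; mask longer runs with N."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable

    def replace_run(match):
        run = match.group(0)
        if len(run) <= max_run:
            return "".join(rng.choice("ACGT") for _ in run)
        return "N" * len(run)  # too long: no probe should cover this stretch

    return re.sub(r"[^ACGT]+", replace_run, seq.upper())


print(resolve_ambiguities("ACNNGTNNNNNAC"))
```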

5. Links

NCBI Bacterial Genomes: NCBI

Getting Started in Tiling Microarray Analysis, Liu XS, 2007, PLoS Comput Biol 3(10). (View)

BACTER Institute contributors to server edition of chipD

The design strategy, the initial implementation in Perl, and the actual use of a resulting chip design in a study on Rhodobacter sphaeroides are due to the efforts of Yann Dufour in collaboration with Tim Donohue [3]. The code was ported to Java by Andrew Tritt, who also added the expression array functionality. The server version of chipD was inspired by Julie Mitchell, who also provided invaluable guidance in its development. Improvements to the Java code and server scripts were made by Gary Wesenberg. Special thanks to Madeline Fisher for improving the use of language in this document and assisting in other aspects of web page design.

6. References

1. John SantaLucia, Jr. (1998) A unified view of polymer, dumbbell, and oligonucleotide DNA nearest-neighbor thermodynamics, Proc Natl Acad Sci U S A., 95, 1460-1465. (Full Text)

2. James G. Wetmur (1991) DNA Probes: Applications of the Principles of Nucleic Acid Hybridization, Critical Reviews in Biochemistry and Molecular Biology, 26, 227-259.

3. Yann S. Dufour, Robert Landick, Timothy J. Donohue (2008) Organization and Evolution of the Biological Response to Singlet Oxygen Stress, Journal of Molecular Biology, 383, 713-730. (View)

4. Bolton ET, McCarthy BJ. (1962) A general method for the isolation of RNA complementary to DNA, Proc Natl Acad Sci U S A. 48, 1390-7. (View)

5. Yann S. Dufour, Gary E. Wesenberg, Andrew J. Tritt, Jeremy D. Glasner, Nicole T. Perna, Julie C. Mitchell, Timothy J. Donohue (2010) chipD: a web tool to design oligonucleotide probes for high-density tiling arrays, Nucleic Acids Research 38 W321-W325, (doi:10.1093/nar/gkq517), (View)

7. Appendix

Appendix A. Graphical exploration of the melting temperature models used in chipD

The plots below compare the three Tm models available in chipD. The x-axis is the number of nucleotides and the y-axis is Tm. Model 1 (NN) is plotted as a red curve and Model 2 (%GC) as a blue curve. Points from Model 3 (Hybrid) are plotted as black circles. The effect of salt concentration is demonstrated. Additionally, the role of nucleotide composition is examined using oligomer models generated by concatenating dinucleotide pairs. The same dinucleotide pair is successively repeated to obtain all even-numbered lengths within the plot domain. For example, the 20-mer in a CA repeat would be: CACACACACACACACACACA. Plots for all possible pairs are shown to facilitate comparisons, though the even-length reverse complements of the repeats used here will clearly have the same plots (in odd-length cases the initiation terms used in Model 1 may result in a minor asymmetry).

Case 1A: chipD default settings: [Na+] = 0.10 M, [DNA excess] = 0.0001 M.
Model 1 (NN) is plotted as a red curve and Model 2 (%GC) as a blue curve. Points from Model 3 (Hybrid) plotted as black circles.

Case 1B: Closer view, chipD default settings: [Na+] = 0.10 M, [DNA excess] = 0.0001 M.
Model 1 (NN) is plotted as a red curve and Model 2 (%GC) as a blue curve. Points from Model 3 (Hybrid) plotted as black circles.

Case 2: Higher salt: [Na+] = 0.50 M, [DNA excess] = 0.0001 M.
Model 1 (NN) is plotted as a red curve and Model 2 (%GC) as a blue curve. Points from Model 3 (Hybrid) plotted as black circles.

Case 3: Very high salt: [Na+] = 1.0 M, [DNA excess] = 0.0001 M
Model 1 (NN) is plotted as a red curve and Model 2 (%GC) as a blue curve. Points from Model 3 (Hybrid) plotted as black circles.

UW BACTER Institute - chipD Server