
Synthetic fragment troubleshooting

A crucial next step in my thesis is to be able to synthesize a 200 bp fragment with a randomized section. The plan is to synthesize this fragment from two oligos, a 135 bp oligo and a 95 bp oligo.


The strategy to synthesize the fragment is to anneal and extend the oligos, and finally amplify the fragment using two 15–16-base primers.


The annealing and extension steps show a 200 bp band, as expected, with a lot of crude product trailing as a smear in the agarose gel. When I used this anneal/extension product as template for a PCR, two bands were visible in the agarose gel, a 200 bp band and a 250 bp band, along with a lot of smear.





‘Anneal oligo’ shows the gel migration of the anneal-only product; ‘extension only’ shows the migration of the anneal/extended product. Lanes 1 to 20 are products of PCR amplification using the annealed oligos as template with 1, 5, 10 and 20 cycles.






Looking at this result, my next strategy was to first gel-extract the 200 bp band from the anneal/extended oligo products and use it as template for a PCR. Results showed that now the 250 bp band was stronger and the 200 bp band was fainter.














When I saw this result, I decided to look closer at my oligo design. After closer analysis, I found that a mispriming event was possible.


A hypothesis I had at the moment was that maybe the PCR oligos were too small (15–16 bases) and were mispriming somewhere in the template. I designed new primers, 22–25 bases long, that were more specific to the ends of the annealed/extended fragment. However, amplification with these primers showed a strong 250 bp band, with very faint bands at higher molecular weight (e.g. 300, 350 bp).


This mispriming event was already described in past posts. I sequenced the 250 bp fragment to see if I could find any evidence of mispriming; however, the traces showed no such evidence. What they did show was that the traces were not very good: peaks were very close to each other, with overlapping shoulders, and in some cases there were small peaks at the bottom.


My next strategy was to clone the PCR-amplified 250 bp and 200 bp bands into a pGEM vector. Then I could sequence a clone of each band size using primers that bind in the vector. This strategy would allow me to sequence the ends of the fragment accurately.

The entire insert of the clone from the 250 bp band was sequenced, and the traces were very clear. This fragment was in reality 234 bp long and had a duplication of the last 34 bases of the fragment's right end.





Now that I know exactly what the extra 34 bases in my synthetic fragment are, I only need to understand why the duplication happened and how I can fix it.

The first idea that comes to mind is polymerase slippage caused by some type of hairpin structure in the template or the copied DNA strand. But I have no clear idea of exactly how this could be happening. I can only think of something like the polymerase slippage model proposed for minisatellite/microsatellite evolution.


Sorry I couldn't get a better figure!

My next idea was to test whether there are places within the fragment where hairpins could form. So I took the right oligo and ran a hairpin analysis using the OligoAnalyzer program from IDT.

I took the most stable hairpin structure, with a ΔG of −15.03 kcal/mol, and saw several places that could potentially form hairpins within the right oligo.




Three potential places where hairpins could occur are shown. The hairpin at place 1 occurs in the Illumina sequencing priming site (purple). If this hairpin were forming, I would be in big trouble, since I cannot change that region without changing the design entirely. The second place, the longest, occurs within the extra bases of the oligo (dark green). This hairpin is key because it is very close to the position that was duplicated (pos 38). If this hairpin were the cause of the extra bases, then the solution would be to redesign only one oligo. The last hairpin is located in the overlapping region with the other oligo. If this hairpin were responsible for the extra 34 bases (less likely, since it is farther away), the solution would be to redesign both oligos.

Finally, I can try using a PCR additive that reduces secondary structures in the template DNA. The Biosizebio website describes two additives that I could use: DMSO and glycerol. However, even if these additives work, I might have to redesign at least one oligo, since the hairpin structures might interfere with further steps in which adapters and barcodes will be added to the fragments by low-cycle PCR.

Note: Primers with Illumina adapters (in yellow and blue) and barcodes (in grey and light blue) are shown annealed to the strands of the synthetic fragment.




P-32 calculations


The next step in my calculations is to determine how many nanograms of perfectly labelled sheared DNA I would have to load on the gel to see the size distribution with the phosphorimager.

To calculate this, I first need to know how many dpm a band needs in order to be visible with a phosphorimager.


My methodology is based on exposing a dried agarose gel containing radioactively labelled P32-DNA to a phosphor screen, which is then scanned with a Typhoon laser scanner. Ideally, I need detailed information on the limit of detection (LOD) of P32 for a given exposure time and scan resolution. Unfortunately, information on the internet is not abundant. The manufacturer only provides an LOD of 2 dpm/mm2 for 14C scanned at 200 µm.

According to this information, if I load 1 ng of perfectly labelled DNA (3547.5 dpm/ng, see my last post) in a well (4 mm × 5 mm), I will have a band with 177 dpm/mm2. However, my DNA will not run as a band but as a smear, since it has been sheared. The size of the smear will depend on the migration time of the gel. With a rough eye estimate, I would say that the smear is 4 mm × 20 mm (after migrating for an hour). If that is the case, then my smear will be: 3547.5 dpm / 80 mm2 = 44 dpm/mm2. Of course, this is a rough calculation, since the average dpm/ng that I am using was calculated with an average 6 kb fragment size; smaller fragments have more ends, will be hotter, and I expect them to be more intense in the scanned gel.
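The surface-activity arithmetic above can be sketched in a few lines (the 3547.5 dpm/ng specific activity and the gel dimensions are the values assumed in this post, not universal constants):

```python
# Surface activity (dpm/mm^2) of labelled DNA spread over a gel area.
SPECIFIC_ACTIVITY = 3547.5  # dpm per ng of perfectly labelled DNA (6 kb average)

def surface_activity(ng_loaded, width_mm, length_mm):
    """dpm per mm^2 for DNA spread over a rectangular area of the gel."""
    return ng_loaded * SPECIFIC_ACTIVITY / (width_mm * length_mm)

band = surface_activity(1, 4, 5)    # 1 ng in a 4 mm x 5 mm well
smear = surface_activity(1, 4, 20)  # the same 1 ng smeared over 4 mm x 20 mm
print(round(band, 1), round(smear, 1))  # 177.4 44.3
```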

I am not quite convinced that these calculations are right, since the LOD presented by the manufacturer is really vague. Fortunately, I found some useful data in a presentation on figshare.


Roy, Christian (2013): Limits of detection for P32 Phosphor Screens. figshare.

Retrieved 15:52, Oct 06, 2015 (GMT)

This figure shows the amount of radioactivity compared with relative band volume. Each colour represents a different exposure time. The darker grey represents "the sweet spot", or the ideal visibility of the bands according to the author.

Let's start with unit conversions:

1 nCi = 2220 cpm

100 pCi = 222 cpm

10 pCi = 22 cpm

So if my smear has 44 dpm/mm2, it will be visible by exposing it for 8 hours, while exposing it for only an hour will give me a barely visible band. According to this figure, the minimum activity visible with an 8-hour exposure is 1 pCi = 2.2 cpm. This is equal to the LOD reported by the manufacturer in the first place.

Interestingly, the former postdoc recommended exposing my radioactively labelled samples overnight. This exposure time is congruent with the data shown in the figure. Maybe he optimised the exposure in a similar way.

These data suggest that the minimum amount of perfectly labelled DNA I need is:

X dpm / 80 mm2 = 2.2 dpm/mm2

X = 176 dpm

1 ng / 3547.5 dpm × 176 dpm = 0.05 ng = 50 pg of gDNA sheared to a 6 kb average size.
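Putting the pieces together, the minimum-load calculation is just (a sketch; the 2.2 dpm/mm2 LOD is the 8-hour figure read from the figshare data):

```python
SPECIFIC_ACTIVITY = 3547.5  # dpm per ng of perfectly labelled DNA (see last post)
LOD = 2.2                   # dpm/mm^2 visible with an 8-hour exposure
SMEAR_AREA = 4 * 20         # mm^2, the rough 4 mm x 20 mm smear

min_dpm = LOD * SMEAR_AREA            # total activity the smear must carry
min_ng = min_dpm / SPECIFIC_ACTIVITY  # convert activity to DNA mass
print(min_dpm, round(min_ng * 1000))  # 176.0 dpm -> 50 pg
```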





Increase in probability with number of motifs in random fragments

More calculations….

This post is part of my thought process in trying to understand the preliminary analysis that I have to do (read the last two posts).

In the previous post I calculated the distribution of motif counts in random fragments of a given size according to the binomial distribution.

Then I calculated the expected change in the distribution if having 1 or more motifs increases the uptake probability at a certain rate. What I did was:

First, decrease the frequency of fragments with 0 motifs (x = 0).

Then, the decreased amount, divided by the total number of motifs present, was added to the frequency of each class with one or more motifs.

This calculation, however, is wrong, since I am not taking into account that the increase in frequency has to be proportional.

For instance:

if my input is:

X  =    0      1      2      3      4
#  =  100    200    300    200    100

and 10% are taken up (90/900):

#  =   10     20     30     20     10
Fx = 0.111  0.222  0.333  0.222  0.111

Now, what if having X ≥ 1 increases uptake 2-fold?

#  =   10     40     60     40     20
Fx = 0.059  0.235  0.353  0.235  0.118

The same strategy can be used when the probability increases with the number of motifs, e.g. if each motif increments the probability by 5%:

prb =  10%    15%    20%    25%    30%
#   =   10     30     60     50     30
Fx  = 0.056  0.167  0.333  0.278  0.167
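The renormalization in both examples above can be written as one small function (a sketch; `biased_frequencies` and the probability vectors are my own names and assumptions, not from any library):

```python
def biased_frequencies(counts, uptake_probs):
    """Apply a per-class uptake probability to fragment counts with
    X = 0..k motifs, then renormalize so the frequencies sum to 1."""
    taken = [n * p for n, p in zip(counts, uptake_probs)]
    total = sum(taken)
    return [t / total for t in taken]

counts = [100, 200, 300, 200, 100]  # fragments with X = 0..4 motifs

# Two-fold increase for any fragment with >= 1 motif:
two_fold = biased_frequencies(counts, [0.10, 0.20, 0.20, 0.20, 0.20])
# 5% extra uptake probability per additional motif:
per_motif = biased_frequencies(counts, [0.10, 0.15, 0.20, 0.25, 0.30])

print([round(f, 3) for f in two_fold])   # [0.059, 0.235, 0.353, 0.235, 0.118]
print([round(f, 3) for f in per_motif])  # [0.056, 0.167, 0.333, 0.278, 0.167]
```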

Sensitivity analysis

This post is intended as a plan of the steps I need to take for the sensitivity analysis that will detect the experimental limitations of my study. In other words, the sensitivity analysis will help me determine what kind of motifs I will be able to detect.

My thesis study will use synthetic degenerate fragments flanked with Illumina tags, as well as sheared genomic DNA (input DNA fragments), in naturally competent species with (Campylobacter jejuni) or without self-specificity (Thermus thermophilus, Acinetobacter baylyi). Input fragments will then be recovered from the periplasm of Rec2 knockout mutants using an organic extraction technique, sequenced to high coverage (~1000X), and analyzed to determine the presence of uptake bias.

For degenerate fragments I need to:

1. Determine how small the uptake motifs need to be for me to be able to detect them.

I have already calculated the frequency distributions of a dimer (AA) and a trimer (AAA) motif per 30 bp, 50 bp and 100 bp fragment.

[Figures: frequency distributions of the AA and AAA motifs for the three fragment sizes]









An important consideration I have to take into account is the probability that the motif is also found in the rest of the synthetic fragment (spacers and Illumina flanking sequences).

2. Determine how strong the uptake bias needs to be for me to be able to detect it.

2.1. Assuming that having one motif confers the same uptake probability as having more than one.

This distribution seems easy to calculate, since fragments without a motif would be expected to decrease in frequency, while the fragments with 1 or more motifs would increase evenly by the amount that the 0-motif class decreased.

2.2. Assuming that the probability of a fragment being taken up increases as the number of motifs increases.

The idea I have to solve point 2.2 is to first assume that fragments without a motif decrease in frequency. The frequency subtracted from the 0-motif fragments is then distributed unevenly among the rest. I am not sure how to calculate this, since the amounts by which they increase have to be proportional while the total frequency stays at 1 (100%).

For sheared genomic fragments I need to:

1. Determine how small the uptake motifs need to be for me to be able to detect them.

1.1. How many uptake motifs (of different sizes) are expected for distinct average sizes of sheared genomic fragments?

1.2. How will coverage affect the ability to detect different uptake motifs?

For a given genome coverage, I need to calculate the coverage per base and estimate the number of bases with a low number of reads (below a threshold, for example 10 reads).

1.3. What will be the average number of reads per uptake motif, given that motifs increase the chances of being taken up by different amounts?
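For point 1.2, a first-pass estimate can treat per-base read depth as Poisson-distributed around the mean coverage (an assumption of mine; real sequencing coverage is usually more dispersed than Poisson):

```python
import math

def fraction_low_coverage(mean_coverage, threshold):
    """Expected fraction of bases with fewer than `threshold` reads,
    assuming per-base depth ~ Poisson(mean_coverage)."""
    return sum(math.exp(-mean_coverage) * mean_coverage**k / math.factorial(k)
               for k in range(threshold))

# e.g. at a mean depth of 20x, what fraction of bases falls below 10 reads?
print(fraction_low_coverage(20, 10))
```

Multiplying this fraction by the genome length gives a rough count of bases that would fall below the read threshold.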


Fragments and laws of probability

Today I spent the day working through the calculations necessary to infer the probability of a simple uptake motif (let's say a dimer, AA) being in a 30 bp or a 50 bp degenerate fragment. The objective is to be able to choose the appropriate size of the degenerate region in my synthetic fragments.

First, my supervisor and I did a basic calculation to figure out the probability of finding an AA in a 30bp fragment.

Given 4 bases (A, C, G, T), the probability of having an A is 0.25 (1/4).

The probability of having an AA is 0.25 × 0.25 = 0.0625; or, of the 16 dimer combinations, we have 1 success (1/16).

0.0625 × 29 positions (or events) = 1.8125

But first, to understand the number of positions, let's imagine we have a 5-base-pair sequence:

[Figure: counting events in a 5 bp sequence]

If we consider each nucleotide position as an independent event, we have 5 events.

If we consider each dimer as an event, we have 4 events.

0.0625 × 29 positions (or events) = 1.8125 "probability" in one strand

1.8125 × 2 = 3.625 "probability" on two strands (since at random base proportions the probability of AA equals that of TT, and the complementary strand should have the same proportions)

We thought (or at least I thought) that this result meant we have 3.6 AAs on average in a 30 bp fragment.

However, after one day of calculations and frustration, I realized that these calculations were wrong.

I went back to my statistics books from my undergraduate statistics classes (10 years ago) and realized that this is a classic binomial probability problem. The biggest mistake in the calculations above is that we only took into account the probability of "success" (p), but not the probability of failure (q).

The probability is calculated using the formula:

P(x; n, p) = nCx · p^x · q^(n−x)

where n = number of trials

x = binomial random variable (0, 1, 2, 3, …, n)

p = success probability

q = failure probability (1 − p)

nCx can be computed in two ways. The first is using the binomial coefficients of Pascal's triangle, and the second is resolving nCx with factorials. If factorials are used, then nCx = n! / (x! (n − x)!), and the formula above becomes P(x; n, p) = [n! / (x! (n − x)!)] · p^x · q^(n−x).

Using this formula (multiplying by 2 to account for the two strands), I was able to calculate the frequency distribution of the probability of AA in 30 bp and 50 bp fragments. Note: if I were statistically strict, I would not multiply the frequencies by 2, since they would no longer add to 1 (100%); they would add to 200% (100% per strand). Instead, I would have to multiply by 2 the estimate of the number of fragments given a certain number of reads. This is just an observation, since it wouldn't change my results.
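The binomial calculation above can be sketched as follows, using the same n = 29 dimer positions and p = 1/16 (note that this treats overlapping dimer positions as independent, the same simplification made in this post):

```python
from math import comb

def binomial_pmf(x, n, p):
    """P(X = x) for a binomial with n trials and success probability p."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

# AA dimer in a 30 bp fragment: 29 dimer positions, p = 1/16 per position
n, p = 29, 1 / 16
dist = [binomial_pmf(x, n, p) for x in range(n + 1)]

print(n * p)  # expected number of AAs per strand: 1.8125
print(dist[0])  # probability of a fragment with no AA at all
```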



Summary of last week meeting with former post-doc of the laboratory

Last week, my supervisor and I met with the former post-doc of the lab (now a PI at another university) regarding the plans for the experimental design of my thesis.

My experiments analyze DNA uptake bias in gram-negative bacteria using a methodology that allows us to extract DNA taken up by a Rec2 knockout (DNA in this mutant is stuck in the periplasm) and sequence it at high coverage. The fragments taken up (output) are then compared to the input DNA to find motifs that are taken up more frequently than others. As input fragments for my uptake experiments I will use sheared genomic DNA and a synthetic fragment with a 30–50 bp degenerate region.

One part of the discussion with the former post-doc and my supervisor was how the degenerate fragments will be synthesized. These fragments would contain Illumina adapters, an Illumina priming site, the degenerate region, and an Illumina barcode (or index). The main question was where to locate the barcodes. It seems that the most effective strategy is to have a left fragment containing an adapter, the priming site, the degenerate region, and a spacer complementary to the right fragment. Then we can have 12 right fragments, each with a different barcode as well as the other adapter.


Each sample tested would have 3 biological replicates and 3 input replicates. Each replicate would use a different barcode, which would allow us to determine if a barcode accidentally matches an uptake sequence. I was thinking that the mix-and-match of samples and replicates could follow a randomized block design using two Illumina lanes.

[Figure: randomized block design across two Illumina lanes]

Finally, we discussed that it would not be a good idea to sequence all the samples from my thesis in the same run (even though economically this would make more sense), since it could be risky if something does not go according to plan. So, first I will sequence the H. influenzae samples on a MiSeq.

The second part of the discussion concerned short-term plans for the preliminary analysis I have to do.

The first challenge is understanding what the input would look like. In other words, what would be the random variation (noise) in the number of fragments taken up if all fragments were taken up equally (the null hypothesis)? Once I figure this out, the next step is understanding the limits on how strong or weak the bias has to be to be detectable with our current methodology. This point is important, since my thesis will use species that take up their own DNA as efficiently as DNA from distantly related species, and we expect to find very simple uptake biases that might not be as strong as in species that take up only DNA from closely related species (Haemophilus influenzae and Neisseria spp.). Another challenge is to determine how useful each technique (degenerate vs. genomic fragments) is and what their limitations are. This relates to the previously explained analysis of the limitations of each technique.

Periplasm preliminary experiments

This post explains a series of preliminary experiments whose objective is to answer three specific questions about a periplasmic DNA extraction procedure described by Kahn et al. 1983, Barouki & Smith 1985, and Mell et al. 2012.

Questions to be asked are:

1. Is there any contamination by DNA fragments that were not taken up remaining after the wash steps (step 3, figure below)?

2. Is there any chromosomal DNA contamination in the periplasmic extraction?

3. How much DNA is being extracted by the periplasmic DNA extraction?

The protocol is based on a DNA uptake assay done in Haemophilus influenzae, a naturally transformable bacterium, followed by an organic periplasmic extraction using 10 mM Tris, 10 mM EDTA, CsCl and phenol:acetone. The periplasmic extract will be used as donor DNA in a transformation assay following the Poye & Redfield (2003) protocol. This transformation should be sensitive enough to detect DNA present in the periplasm.

An overview of the protocol is shown in the figure below.


The first question will be answered by using a Rec2 (ComEC) knockout and a pilA knockout as recipient cells, with sheared genomic DNA (average 6 kb) of Map7 (an Rd strain with seven different antibiotic resistance markers) as donor. Rec2 is an inner-membrane porin involved in DNA internalisation; knockout mutants can take up DNA but are not able to translocate it to the cytoplasm. In contrast, pilA knockout mutants are not able to take up DNA at all.

A.) If the washes at step 3 work effectively, then I should expect to see no colonies in the transformation (step 9) using pilA knockout cells, and plenty using Rec2 knockout cells.

B.) If I see colonies with the pilA mutant, I will have to re-think how to remove the DNA that was not taken up. I might increase the number of washes.

The second question will be addressed by using Kw20 cells with a nov or kan antibiotic marker in their genome as recipient cells, and sheared genomic DNA of NP carrying a Nal antibiotic marker as donor. By having two antibiotic markers, I can assess the amount of chromosomal DNA that leaks into the periplasmic extraction and compare it to the amount of DNA recovered from the periplasm.

This brings me to my third question: how can I quantify the amount of DNA extracted from the periplasm? This relates to my second question, as I could estimate the donor DNA amount from the colonies I get in the transformation, but I would need a calibration curve assessing the transformation frequency at low amounts of donor DNA (from 1 ng to 20 ng; 5 or 6 points taken).
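Once the calibration curve exists, estimating donor DNA from a colony count is a simple interpolation. A minimal sketch, where all the calibration numbers are placeholders I invented, not real data:

```python
# Hypothetical calibration curve: (ng of donor DNA, transformant colonies).
calibration = [(1, 50), (5, 240), (10, 470), (15, 700), (20, 930)]

def estimate_ng(colonies):
    """Linearly interpolate the donor DNA amount (ng) from a colony count."""
    pts = sorted(calibration, key=lambda t: t[1])
    for (ng0, c0), (ng1, c1) in zip(pts, pts[1:]):
        if c0 <= colonies <= c1:
            return ng0 + (ng1 - ng0) * (colonies - c0) / (c1 - c0)
    raise ValueError("colony count outside the calibration range")

print(estimate_ng(470))  # -> 10.0 ng with these placeholder numbers
```

In practice the calibration points would come from the 5–6 transformations with known donor DNA amounts, and a fitted curve (rather than piecewise-linear interpolation) might be more appropriate if the response is non-linear.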

Another alternative is using a fluorometric dye such as the Qubit reagents. There is not much information about the dye used in Qubit assays, but according to the manual it binds specifically to dsDNA, as shown by the figure below, and is much more reliable than Nanodrop quantification. What I am not sure about is how often it binds to DNA (every 10 bp, 20 bp, 100 bp?). I know that some real-time PCR assays use saturating dyes that bind very frequently to DNA molecules, but I was not able to get that information (and given the secrecy of some kit providers, I might not be able to find it).