Last week, my supervisor and I met with the former post-doc of the lab (now a PI at another university) to discuss the plans for the experimental design of my thesis.
My experiments analyze DNA uptake bias in Gram-negative bacteria using a methodology that allows us to extract the DNA taken up by a Rec2 knockout (DNA in this mutant is stuck in the periplasm) and sequence it at high coverage. The fragments taken up (output) are then compared to the input DNA to find motifs that are taken up more frequently than others. As input fragments for my uptake experiments I will use sheared genomic DNA and a synthetic fragment with a 30–50 bp degenerate region.
One part of the discussion with the former post-doc and my supervisor was how the degenerate fragments will be synthesized. These fragments would contain Illumina adapters, an Illumina priming site, the degenerate region, and an Illumina barcode (or index). The main question was where to place the barcodes. It seems that the most effective strategy is to have a left fragment containing an adapter, the primer site, the degenerate region, and a spacer complementary to the right fragment. Then we can have 12 right fragments, each with a different barcode as well as the other adapter.
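To make the left/right layout concrete, here is a minimal Python sketch of how the 12 barcoded constructs could be put together on paper. Everything in it is an assumption for illustration: the adapter, primer site, spacer and barcode sequences are random placeholders (not real Illumina sequences), the 6 bp barcode length and 30 bp degenerate length are arbitrary choices, and I am assuming the right fragment would be ordered as the reverse complement so that it anneals to the spacer of the left fragment.

```python
import random

BASES = "ACGT"
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def random_seq(length, rng):
    """Random DNA sequence, used here only as a placeholder."""
    return "".join(rng.choice(BASES) for _ in range(length))

def make_barcodes(n, length, rng):
    """Draw n distinct placeholder barcodes (NOT real Illumina indexes)."""
    barcodes = set()
    while len(barcodes) < n:
        barcodes.add(random_seq(length, rng))
    return sorted(barcodes)

rng = random.Random(1)

# Hypothetical placeholder parts; real sequences would come from the kit docs.
ADAPTER_1 = random_seq(20, rng)
PRIMER_SITE = random_seq(20, rng)
SPACER = random_seq(15, rng)
ADAPTER_2 = random_seq(20, rng)
DEGENERATE_LEN = 30  # the post mentions 30-50 bp; 30 is an arbitrary pick

# One left fragment, synthesized with N at every degenerate position.
left_template = ADAPTER_1 + PRIMER_SITE + "N" * DEGENERATE_LEN + SPACER

# Twelve right fragments, one per barcode, written as the annealing strand.
barcodes = make_barcodes(12, 6, rng)
right_oligos = {bc: revcomp(SPACER + bc + ADAPTER_2) for bc in barcodes}

def sample_construct(barcode, rng):
    """One full-length molecule: a random degenerate region plus one barcode."""
    left = ADAPTER_1 + PRIMER_SITE + random_seq(DEGENERATE_LEN, rng) + SPACER
    right_top = revcomp(right_oligos[barcode])  # reads spacer-barcode-adapter
    return left + right_top[len(SPACER):]       # the spacer is shared

constructs = {bc: sample_construct(bc, rng) for bc in barcodes}
for bc, seq in constructs.items():
    print(bc, len(seq))
```

The point of the sketch is only that the barcode sits outside the degenerate region, so the same left fragment pool can be paired with any of the 12 right fragments.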
Each sample tested would have 3 biological replicates and 3 input replicates. Each replicate would use a different barcode, which would allow us to determine whether a barcode accidentally matches an uptake sequence. I was thinking that the mix-and-match of samples and replicates could follow a randomized block design using two Illumina lanes.
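As a rough idea of what such a layout could look like, here is a small Python sketch that treats each lane as a block. It is one possible reading of the design, not a settled plan: the four sample names are hypothetical, and I am assuming that every sample should be split evenly across the two lanes and that barcodes are drawn at random without repeats within a lane.

```python
import random

def randomized_block_layout(samples, n_bio=3, n_input=3, n_lanes=2,
                            n_barcodes=12, seed=0):
    """Assign every library to a lane (block) and a barcode at random."""
    rng = random.Random(seed)
    lanes = {lane: [] for lane in range(1, n_lanes + 1)}

    for sample in samples:
        libs = [(sample, "uptake", r) for r in range(1, n_bio + 1)]
        libs += [(sample, "input", r) for r in range(1, n_input + 1)]
        rng.shuffle(libs)
        # Deal the shuffled libraries round-robin so that each lane
        # receives the same number of libraries from this sample.
        for i, lib in enumerate(libs):
            lanes[i % n_lanes + 1].append(lib)

    layout = []
    for lane, libs in lanes.items():
        if len(libs) > n_barcodes:
            raise ValueError("more libraries than barcodes in one lane")
        # Barcodes are unique within a lane but reused across lanes.
        chosen = rng.sample(range(1, n_barcodes + 1), len(libs))
        for (sample, kind, rep), barcode in zip(libs, chosen):
            layout.append({"lane": lane, "barcode": barcode,
                           "sample": sample, "type": kind, "replicate": rep})
    return layout

# Hypothetical sample names: 4 samples x 6 libraries = 24 = 2 lanes x 12.
samples = ["sample_A", "sample_B", "sample_C", "sample_D"]
layout = randomized_block_layout(samples)
for row in sorted(layout, key=lambda r: (r["lane"], r["barcode"])):
    print(row)
```

Changing `seed` gives a different random assignment; keeping the seed in the lab notes makes the layout reproducible.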
Finally, we discussed that it would not be a good idea to sequence all the samples from my thesis in the same run (even though this would make more sense economically), since it could be risky if something does not go according to plan. So, I will first sequence the H. influenzae samples on a MiSeq.
The second part of the discussion covered the short-term plans for the preliminary analyses I have to do:
The first challenge is understanding what the input would look like. In other words, what would be the random variation (noise) in the number of fragments taken up if all fragments were taken up equally (the null hypothesis)? Once I figure this out, the next step is to understand how strong a bias has to be to be detectable with our current methodology. This point is important since my thesis will use species that take up their own DNA as efficiently as DNA from distantly related species, and we expect to find very simple uptake biases that might not be as strong as those in species that take up DNA only from closely related species (Haemophilus influenzae and Neisseria spp.). Another challenge is to determine how useful each technique (degenerate vs. genomic fragments) is and what its limitations are. This relates to the detection-limit analysis described above.
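A first pass at both questions could be a small simulation like the Python sketch below. It is a toy model under assumptions of my own, not the actual analysis: 50 equally abundant fragments, 5000 output reads, reads drawn independently with probability proportional to input frequency times an uptake weight, and "detectable" defined as the biased fragment's output/input ratio exceeding the 99th percentile of ratios seen under the null. All of these numbers are placeholders to be replaced with realistic ones.

```python
import random
from collections import Counter

def simulate_uptake(input_freqs, uptake_weights, n_reads, rng):
    """Draw output read counts; P(fragment) ~ input frequency x uptake weight."""
    probs = [f * w for f, w in zip(input_freqs, uptake_weights)]
    draws = rng.choices(range(len(probs)), weights=probs, k=n_reads)
    counts = Counter(draws)
    return [counts.get(i, 0) for i in range(len(probs))]

def uptake_ratios(counts, input_freqs):
    """Output frequency divided by input frequency, per fragment."""
    total = sum(counts)
    return [(c / total) / f for c, f in zip(counts, input_freqs)]

def null_threshold(input_freqs, n_reads, n_sims, rng, quantile=0.99):
    """Upper quantile of the uptake ratio when nothing is preferred."""
    flat = [1.0] * len(input_freqs)
    ratios = []
    for _ in range(n_sims):
        counts = simulate_uptake(input_freqs, flat, n_reads, rng)
        ratios.extend(uptake_ratios(counts, input_freqs))
    ratios.sort()
    return ratios[int(quantile * (len(ratios) - 1))]

def detection_rate(bias, threshold, input_freqs, n_reads, n_sims, rng):
    """Fraction of simulations where fragment 0 (the biased one) is flagged."""
    weights = [bias] + [1.0] * (len(input_freqs) - 1)
    hits = 0
    for _ in range(n_sims):
        counts = simulate_uptake(input_freqs, weights, n_reads, rng)
        if uptake_ratios(counts, input_freqs)[0] > threshold:
            hits += 1
    return hits / n_sims

rng = random.Random(42)
n_fragments, n_reads, n_sims = 50, 5000, 200   # placeholder values
input_freqs = [1 / n_fragments] * n_fragments

threshold = null_threshold(input_freqs, n_reads, n_sims, rng)
print("null 99% ratio threshold:", round(threshold, 3))
for bias in (1.0, 1.2, 1.5, 2.0):
    rate = detection_rate(bias, threshold, input_freqs, n_reads, n_sims, rng)
    print("bias", bias, "detected in", rate, "of simulations")
```

Running it with different values of `n_reads` shows how the noise under the null shrinks with coverage, and therefore how weak a bias could still be picked up.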