Jasper Toscani Field
Single-core versions of Rappel were completed in both Python and C. Both versions provided the expected functionality: rapid mapping of read locations to identify homologous species sequences, together with standard sequence variation information. A parallel, multi-core version of Rappel was completed through the k-mer hashing and mapping stage. Anecdotally, the speed of Rappel on multiple processing cores appears promising, and with additional development the program is expected to perform as intended. Additionally, novel designs were conceived late in program development. Multiple sequence alignments (MSAs) often contain significant regions of DNA where multiple organisms display identical sequences. These tracts of exactly homologous sequence could be used to construct a consensus sequence that is searched a single time without requiring multiple k-mer searches, which could substantially improve Rappel’s runtime. Where several variants of a shared region are found (one group of species shares one identical version of a sequence region while another group shares a different version of the same region), multiple consensus sequences may be constructed, so that runtime improves regardless of the number of sequences present in the MSA. Moreover, it may be possible to construct a Burrows-Wheeler transform from the consensus sequence, allowing extremely rapid whole-read matching, while k-mer matching would remain preferable for regions of inexact homology between the MSA members.
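The proposed consensus design can be illustrated with a minimal Python sketch. It is not Rappel's code: the function names (`identical_tracts`, `group_by_variant`, `naive_bwt`), the representation of the MSA as a list of equal-length strings with `-` for gaps, and the `$` sentinel are all assumptions made for illustration. The sketch finds runs of columns that are identical across every row, groups species by the variant they carry in a chosen region, and builds a Burrows-Wheeler transform by sorting rotations, which is quadratic and suitable only for small examples.

```python
from collections import defaultdict


def identical_tracts(msa, min_len=1):
    """Return half-open (start, end) column ranges in which every row of the
    alignment carries the same non-gap character."""
    if not msa:
        return []
    width = len(msa[0])
    if any(len(row) != width for row in msa):
        raise ValueError("all aligned rows must have the same length")
    tracts, start = [], None
    for col in range(width):
        column = {row[col] for row in msa}
        conserved = len(column) == 1 and "-" not in column
        if conserved and start is None:
            start = col                      # a conserved tract opens here
        elif not conserved and start is not None:
            if col - start >= min_len:
                tracts.append((start, col))  # tract closed by a variable column
            start = None
    if start is not None and width - start >= min_len:
        tracts.append((start, width))        # tract runs to the alignment end
    return tracts


def group_by_variant(msa, names, start, end):
    """Group species names by the sequence they carry in columns [start, end)."""
    groups = defaultdict(list)
    for name, row in zip(names, msa):
        groups[row[start:end]].append(name)
    return dict(groups)


def naive_bwt(text, sentinel="$"):
    """Burrows-Wheeler transform via sorted rotations (illustration only)."""
    if sentinel in text:
        raise ValueError("sentinel must not occur in the text")
    text += sentinel
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rotation[-1] for rotation in rotations)


if __name__ == "__main__":
    names = ["sp1", "sp2", "sp3"]            # hypothetical species labels
    msa = ["ACGTACGTAA",
           "ACGTTCGTAA",
           "ACGTTCGTAC"]
    for start, end in identical_tracts(msa, min_len=3):
        consensus = msa[0][start:end]        # any row works inside a tract
        print(start, end, consensus, naive_bwt(consensus))
    print(group_by_variant(msa, names, 4, 5))
```

In this sketch each conserved tract yields one consensus string that would be indexed and searched once, while a variable region yields one string per variant group rather than one per species. A production implementation would need a suffix-array-based transform and an FM-index to support whole-read matching at scale.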