
Short update on our research projects

March 21, 2019 · by Agnieszka Szmurło

Over the past several months we have been working on the SeQuiLa solution (the name comes up if you mix tequila & SQL ;) which, in general, is an application of Big Data technologies to computationally expensive genomic problems.

First, we tackled distributed range joins using broadcastable interval trees injected into Apache Spark’s optimizer. That is how the first SeQuiLa package was created.
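To give a feel for the idea, here is a minimal plain-Python sketch of an interval-tree range join. It is not SeQuiLa's code and does not use Spark: the names `build`, `query` and `range_join`, the closed-coordinate convention and the sample data are all assumptions made for illustration. The smaller table is indexed in a centered interval tree, and every row of the larger table probes it for overlaps. In a Spark setting, the tree is the piece that would be broadcast to the executors.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# (start, end, label) with closed coordinates; this layout is an assumption.
Interval = Tuple[int, int, str]


@dataclass
class Node:
    center: int
    overlapping: List[Interval]  # intervals that contain `center`
    left: Optional["Node"]       # intervals ending before `center`
    right: Optional["Node"]      # intervals starting after `center`


def build(intervals: List[Interval]) -> Optional[Node]:
    """Build a centered interval tree from a list of intervals."""
    if not intervals:
        return None
    # Use a median endpoint as the center, so at least one interval contains it.
    points = sorted(p for s, e, _ in intervals for p in (s, e))
    center = points[len(points) // 2]
    here = [iv for iv in intervals if iv[0] <= center <= iv[1]]
    left = [iv for iv in intervals if iv[1] < center]
    right = [iv for iv in intervals if iv[0] > center]
    return Node(center, here, build(left), build(right))


def query(node: Optional[Node], qs: int, qe: int) -> List[Interval]:
    """Return all indexed intervals overlapping the closed range [qs, qe]."""
    if node is None:
        return []
    hits = [iv for iv in node.overlapping if iv[0] <= qe and qs <= iv[1]]
    if qs < node.center:   # only then can an interval on the left overlap
        hits += query(node.left, qs, qe)
    if qe > node.center:   # only then can an interval on the right overlap
        hits += query(node.right, qs, qe)
    return hits


def range_join(small: List[Interval], large: List[Interval]):
    """Join two interval tables on overlap, indexing the smaller one."""
    tree = build(small)  # in Spark, this tree would be the broadcast value
    return [(q[2], hit[2]) for q in large for hit in query(tree, q[0], q[1])]


# Hypothetical sample data: annotated regions and aligned reads.
genes = [(100, 200, "geneA"), (150, 300, "geneB"), (500, 600, "geneC")]
reads = [(180, 190, "read1"), (250, 260, "read2"), (700, 800, "read3")]
pairs = sorted(range_join(genes, reads))
print(pairs)
```

The point of the tree is that each probe skips whole subtrees that cannot overlap, instead of comparing every read against every region.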

Second, we implemented an event-based method of coverage calculation that runs in a distributed manner, using accumulators and broadcast variables to reduce network shuffles.
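The event-based idea can be sketched in a few lines of plain Python. This is an illustration under stated assumptions rather than the SeQuiLa implementation: the function name `coverage`, the 0-based inclusive coordinates and the sample reads are made up, and the distributed part (partitioning, accumulators, broadcast variables) is left out. Each read contributes a +1 event where it starts and a -1 event just past where it ends, and a running sum of the events gives the depth at every position.

```python
from itertools import accumulate
from typing import List, Tuple


def coverage(reads: List[Tuple[int, int]], length: int) -> List[int]:
    """Per-base depth for reads given as (start, end), 0-based, end inclusive."""
    events = [0] * (length + 1)   # one extra slot for reads ending at the last base
    for start, end in reads:
        events[start] += 1        # a read begins covering here
        events[end + 1] -= 1      # and stops covering just after its end
    # The running sum of events is the number of reads covering each position.
    return list(accumulate(events[:length]))


# Hypothetical reads on a 7-base contig.
depth = coverage([(0, 3), (2, 5), (4, 4)], length=7)
print(depth)
```

Each read costs two array updates regardless of its length, which is what makes the event representation cheap compared with incrementing every covered position.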

Have a look at the documentation site: SeQuiLa docs

The range-joins part is already published in the peer-reviewed journal Bioinformatics: publication

The coverage part is under review in GigaScience, but you can already read it on bioRxiv: preprint

More is coming. We are still working on bringing SQL access and distributed processing to genomic data.
