Q&A DNA

The Future of Mainstream Genomics

Dom Vinyard
Jul 24, 2017

I visited the future. Ten years from now, the software that owns the market looks like this.

It is 2027 and a Question & Answer based genomics app for the casual user is fully mainstream. Users add their DNA samples and pay to get straightforward Answers. The creators of the app started with a simple premise:

Know what’d be cool? A service offering simple answers to simple questions about DNA. For everyone 👍.

To achieve it, they relied on third-party developers to build Answers, which were collected together in a store, much like smartphone apps. It’s centralised; the head scientist at an international health agency gets access to exactly the same Answers as the ten-year-old in science class. This democratic spirit is core to the design and the platform succeeded on this network effect.

Contents

I played with it for a while and here is my write-up, split into five parts. If there are any glaring spelling or grammatical mistakes it’s safe to assume that the text got mangled in the time machine.

  1. The Users
  2. The Samples
  3. The Answers
  4. The Answers Store
  5. The Ecosystem

I wasn’t allowed to take screenshots but I did do a few quick sketches. I’ll include them if I get time. If I missed a couple of things, come find me in 10 years and let me know. We’ll discuss it over some sequencer-approved single-origin coffee.

The Users

The same tool is used by people across all classes of industry.

Food distributors, supply chain logistics, supermarkets, chefs, brewers, sommeliers, gourmets, water treatment specialists, beauticians, dermatologists, dieticians, nutritionists, allergists, doctors, dentists, vets, nurses, pharmacists, outbreak specialists, genealogists, science teachers, farmers, animal breeders, gardeners, landscapers, botanists, cleaners, sanitation experts, pest controllers, criminal investigators, behavioural therapists, scientists, researchers.

Not to mention users at home. Have I got a flu developing? Should I visit my GP? Why has my pond gone bright green all of a sudden? Does my dog have a bit of poodle in it?

In fact there are two popular apps to check if your dog has poodle in it. PoodlePercent will tell you exactly how much poodle, to the fraction of a percent, and WIMP (What’s In My Puppy) will tell you more roughly what combination of breeds makes up your dog.

How big is the market?

It was supposed to be $22 billion by 2020, but nobody anticipated widespread access to an app like this. The creators designed a single ecosystem that grew organically enough for all available markets to address themselves. That turned out to be far more viable than having teams design custom workflows for individual members of individual industries on an ad-hoc basis. In the words of one of the creators:

It accommodated the industries we least expected and self-pivoted to target industries that didn’t exist when we started.

All of the sequencing data lives on the blockchain. Nobody owns it. We all own it. They train neural networks with the data, carry out research with it, and license insights into it (properly anonymised) to others to use for research. The bigger companies with their own bespoke platforms resisted the model initially but were eventually persuaded by the sheer power and ubiquity of the system.

The Problem

It was always a field with a lot of valuable data, but data which was extremely difficult to interpret. No matter how it was annotated, the learning curve was insurmountable for the casual user.

Non-specialist users needed two things, in order of importance:

  1. Answers to specific questions about their DNA samples which they could take action upon without learning any new terminology or undergoing special training.
  2. A way to ‘visualise’ their DNA samples. Enough that they could feel confident that it was in the hands of people that understood it and that they could distinguish one sample from another.

Point 2 existed solely to achieve point 1. In other words:

It’s all about the Answers. We visualise the samples solely so the user can clearly see which one they are asking for an Answer about.

The Samples

The users have only a DNA sequencer and this app. Uploading a sample requires no config whatsoever: connect DNA sequencer, add sample, one click to upload.

Once a sample has been uploaded into the cloud it is immediately deleted from the device and remains 100% locked in to the platform from the point of upload.

What’s a sample?

It could be blood, sweat, tears. Cow dung, pond water. A frozen 100% beef lasagne which is 99% beef and 1% camel. If it exists in nature, this app can look for Answers in it.

Once the sample is uploaded it is displayed inside the app. The app generates a visual thumbnail for each sample based on what’s in the sample. All uploaded samples are species-identified by default and this species identification data informs the thumbnail design.

The thumbnails are awesome, nothing short of a visual language for all of life. They tell you at a glance what was found in your sample and let you quickly see which sample you’re requesting an Answer about. When a sample is uploaded, the thumbnail appears in a Samples List like a new photo in a photo library.

Scrolling through a list of samples is a rich and visually compelling experience.

Tapping a thumbnail opens up the sample. The sample shows a list of Answers that the user requested about that sample, and a gateway into the Answers Store to request new Answers about the sample. That’s it.

The Answers

Opening up a sample lets you get an Answer to a question. It’s quick too: they upload nothing but the minimum required for analysis (no cruft, crazy compression). The Answer scans through the sample and returns a simple piece of information which has specific value to the user.

There is no need to wait until the sample is fully uploaded. Some Answers, such as ‘is pathogen X present?’ will return immediately if the pathogen is found immediately. You can see an estimate of how long each Answer will take to calculate before you ask.
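
I never got to see how this works under the hood, so here is only my own rough sketch of the idea in Python. It assumes the reads arrive as a stream while the upload is still running; the function names and the marker sequence are made up for illustration.

```python
from typing import Iterable, Iterator


def pathogen_present(reads: Iterable[str], marker: str) -> bool:
    """Return True as soon as any read contains the marker sequence."""
    for read in reads:
        if marker in read:
            return True  # stop here; the rest of the stream is never consumed
    return False  # only a negative result needs the whole sample


def stream_reads() -> Iterator[str]:
    """Hypothetical stand-in for reads arriving during an upload."""
    yield "ACGTACGT"
    yield "TTGACCAGGA"  # contains the made-up marker "GACCA"
    yield "GGGGCCCC"    # never reached when the marker is found first


found = pathogen_present(stream_reads(), "GACCA")
```

A positive result comes back after the second read, before the third has even arrived. A negative result still has to wait for the whole sample, which is presumably why the app shows a time estimate up front.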

Answer Types

The type of information returned varies from Answer to Answer depending on what the user needs to know. Some Answers return a single word, others return a sentence, a yes or no, a number, a list, a simple graph, an image.

There is another Answer Type, and it is crucial to the way the platform operates. An Answer can return a subset of its sample. This means that Answers can be used as filters. For example, a ‘Chicken Only’ Answer will go through every strand in the sample and strip out everything that’s not chicken DNA.
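
To make the filter idea concrete, here is a toy sketch in Python. It is my guess at the shape of the thing, not the platform's code: the `Read` class, its `species` label, and both functions are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Read:
    sequence: str
    species: str  # assumed to come from the default species identification


def chicken_only(sample: List[Read]) -> List[Read]:
    """A filter Answer: the result is itself a (smaller) sample."""
    return [read for read in sample if read.species == "chicken"]


def count_reads(sample: List[Read]) -> int:
    """An ordinary Answer that returns a number."""
    return len(sample)


lasagne = [Read("ACGT", "cow"), Read("GGCA", "chicken"), Read("TTAG", "camel")]

# Because the filter returns a sample, any other Answer can run on its output.
chicken_reads = count_reads(chicken_only(lasagne))
```

The point is that a filter Answer and its input have the same type, so the output of one Answer can be fed straight into the next.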

Combining Answers

You can do enormously powerful things without ever breaking out of the simple Answer model. Answers can be combined so that simple Answers can be used as building blocks for more complex answers. This turned out to be one of the app’s most powerful concepts.

A developer can write an Answer that does nothing but call half a dozen other simple Answers published by other developers (mutations in various locations, for example), combine the results, and publish this new Answer for others to use.

Plus, Answers can be combined and stacked to any depth. Like a pyramid. The most complex Answers are built on top of hundreds of simple Answers to return a result.
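
Here is a minimal sketch of that stacking, again in Python and again my own guess. It treats a sample as a plain string and an Answer as a function that returns yes or no; the motifs and names are invented.

```python
from typing import Callable, List

Sample = str                       # toy stand-in: one sequence string
Answer = Callable[[Sample], bool]  # an Answer takes a sample, returns yes/no


def has_motif(motif: str) -> Answer:
    """Build a simple 'component Answer' that checks for one motif."""
    def answer(sample: Sample) -> bool:
        return motif in sample
    return answer


def all_of(parts: List[Answer]) -> Answer:
    """Build an Answer that only calls other Answers and combines the results."""
    def answer(sample: Sample) -> bool:
        return all(part(sample) for part in parts)
    return answer


# Level 1: simple Answers, imagined as published by different developers.
mutation_a = has_motif("GATTACA")
mutation_b = has_motif("CCGG")
mutation_c = has_motif("TTTT")

# Level 2: an Answer built from two others.
risk_pair = all_of([mutation_a, mutation_b])

# Level 3: an Answer built on top of an Answer that is itself a combination.
full_panel = all_of([risk_pair, mutation_c])
```

Calling `full_panel("AAGATTACACCGGTTTT")` runs all three component checks without the caller ever knowing they exist, which is the whole appeal of the pyramid.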

Answer Domain

Some Answers are only applicable to certain types of sample: a particular gene, or region, or chromosome, or species, or sufficiently varied metagenome. Some Answers have a very specific domain (paternity test, ‘two Y chromosomes’); others, especially the super low-level ones, are more universal. Making it clear to the user which Answers are married to which domains was a massive usability undertaking. They also added Comparison Answers, which work on groups of samples.
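
One way to picture a domain, sketched in Python under my own assumptions: each store listing declares which species it applies to, and the app hides listings that don't match what was found in the sample. The `AnswerListing` class, the `applicable` function, and the ‘Paternity Test’ and ‘Species Count’ listings are hypothetical.

```python
from dataclasses import dataclass
from typing import FrozenSet, List, Optional


@dataclass(frozen=True)
class AnswerListing:
    name: str
    species: Optional[FrozenSet[str]] = None  # None means "works on any sample"


def applicable(listings: List[AnswerListing],
               found_species: FrozenSet[str]) -> List[str]:
    """Names of the listings whose domain overlaps the species in the sample."""
    return [
        listing.name
        for listing in listings
        if listing.species is None
        or not listing.species.isdisjoint(found_species)
    ]


STORE = [
    AnswerListing("PoodlePercent", frozenset({"dog"})),
    AnswerListing("Paternity Test", frozenset({"human"})),
    AnswerListing("Species Count"),  # low-level and universal
]

dog_sample_answers = applicable(STORE, frozenset({"dog"}))
```

For a sample containing only dog DNA this leaves PoodlePercent and Species Count, and hides the paternity test. A real domain would also have to cover genes, regions, and chromosomes, which this toy version ignores.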

The Answers Store

It’s like an App Store but for Answers. This store is curated in-house and takes a strong editorial line for the sake of consistency. Developers of Answers are paid every time somebody requests them. Answers do not stand still. Developers of the top Answers regularly release updates and bug fixes.

Answers that the user has requested before and Answers that the user has favourited rise to the top. New or trending Answers specific to their industry rise to the top. It has enough cool and interesting Answers surfacing that it’s actually possible to take a random sample and then look for something fun to analyse it for.

Although they say “all Answers are available to all users”, a side-loading process circumvents this in extraordinary cases (defence, government, etc).

Developers

What abilities do they give their developers? What ways to manipulate data are permitted? How do developers hook onto other Answers, external databases, external resources? This is a big topic and will require a more detailed developer-centric write-up. It’s enough to say here that it required an enormous amount of experience and expertise in multiple fields and disciplines to realise. In fact, getting this right consumed the majority of the development time. In their words:

At its heart, this was a project to build an Answer Store and development environment for developers to easily build and distribute Answers. 95% of the work was spent building the developer framework. It took years and some grey hairs.

The ecosystem was the secret. The app just wrapped an uploader and an exceptionally pretty skin on top.

Developer economics

Developers of Answers earn their commission per request and the commission rates are variable.

The commission structure is sufficiently generous to attract talented developers to the platform.

The Answer payment points are tiered and the platform is in full control of the actual pricing of the tiers. They vary between regions and use-cases (lower rates for charity, education).

If one Answer depends on two other Answers, and each of those depends on two other Answers still, all seven developers in the pyramid are paid for that request. The bigger the pyramid, the smaller the income for each individual developer, as the capped commission pot is spread thinner. The further down the chain of Answers a developer sits, the lower the rate. Developers of basic ‘component Answers’ near the base of the pyramid earn less on each transaction but are used far more frequently. Developers of ‘consumer Answers’ at the top of the pyramid earn far more per transaction but are only used by the users they directly target.
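
I wasn't shown the actual formula, so the numbers here are pure assumption. As a worked example in Python, suppose each level down the pyramid earns half the rate of the level above, and the capped pot for one request is 120 units of whatever currency we're using in 2027.

```python
from typing import Dict, List, Tuple

# (developer, depth) pairs; depth 0 is the consumer Answer at the top.
PYRAMID: List[Tuple[str, int]] = [
    ("top", 0),
    ("mid_1", 1), ("mid_2", 1),
    ("base_1", 2), ("base_2", 2), ("base_3", 2), ("base_4", 2),
]


def split_commission(pot: float,
                     pyramid: List[Tuple[str, int]],
                     decay: float = 0.5) -> Dict[str, float]:
    """Share a fixed pot so that each level down earns `decay` times the rate.

    The halving rule is an assumption for illustration, not the platform's.
    """
    weights = {dev: decay ** depth for dev, depth in pyramid}
    total = sum(weights.values())
    return {dev: pot * weight / total for dev, weight in weights.items()}


payouts = split_commission(120.0, PYRAMID)
solo = split_commission(120.0, [("top", 0)])
```

With these assumptions the top developer gets 40, each of the two in the middle gets 20, and each of the four at the base gets 10. The same developer working alone, with no pyramid underneath, would keep the full 120, which is the "spread thinner" effect in miniature.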

Billing is crazy simple from the user perspective. Storage tiers are available but speed/service tiers are not. Every user gets the same excellent user-experience and time-to-data.

The Ecosystem

They knew that if this was to succeed as a universal platform, it would be essential to attract developers from outside of the field. With some lower-level Answers it’s just not possible (they are still written by genomic specialists), but higher-level Answers are well within the realms of traditional software development. They knew this early on:

We needed to convince traditional software developers with no prior experience in genomics or bioinformatics to consider writing an Answer as their next project.

They had to write a good deal of Answers in-house to get the ball rolling but when it took off, non-industry developers flocked to take advantage of the early gold-rush and the platform flourished.

They don’t run adverts. They tried it but someone had their stomach bug misidentified as athlete’s foot and was shown an advert for running equipment. A tabloid ran a headline about ‘having the runs’. Bad PR. They now show ‘sponsored Answers’ which is much more tactful.

Competition

When this app launched it was pretty expensive. The Answers Pyramid model meant that commission was spread more widely than the competing services that provided bespoke and direct analysis. So, the developers targeted areas where cost per base was not the primary commercial driver and snuck in through a few low-key markets.

Developers competed against each other in areas where there was a clear addressable market, and wherever the pyramid grew uneconomically large they had an incentive to disintermediate. Finally, the cost started dropping as storage and compute got cheaper, until the gulf between the general and the bespoke was negligible.

In short, they went all-in on the idea that affordability increases with Moore’s law, but usability does not.

And they won.

Also, the lotto numbers for that week in 2027 will be 6, 12, 19, 30, 48 and 50. The bonus ball is 11.
