The future of data analysis

Technologies such as big data and artificial intelligence make one thing clear: the collection and analysis of large data sets will become increasingly important.

Industry and science are already collecting large amounts of data and evaluating them for their own purposes.

At the same time, the further development of technologies such as machine learning and AI depends on the availability of large data sets that can be used to train them. Yet these two interests rarely come together.

Particularly in the chemical and health industries, sharing data is a delicate matter: on the one hand, companies do not want to disclose their data for competitive reasons; on the other hand, the (personal) data collected is often subject to data protection rules, so it cannot simply be reused.

As a result, companies and institutes research and develop in their own bubbles instead of building models jointly and supporting each other.

The startup Apheris AI is dedicated to solving this problem. Apheris aims to let organisations cooperate securely and efficiently without having to cede the rights to their own data.

Founder Robin Röhm explains how this works in our interview.

Apheris founders Robin Röhm and Michael Höh

What is Apheris AI?

Robin: With Apheris, we offer distributed, privacy-enhancing computing for training shared models across multiple data providers. Specifically, we develop a software system and a set of algorithms. Some of these algorithms evaluate data directly in the data providers’ data centres; others are cryptographic procedures that ensure the data underlying a result cannot be reconstructed.

For example, we can allow two companies to evaluate their data together without the companies ever seeing each other’s data.

We build a distributed network to which companies can connect their data without the data ever leaving their servers. Our algorithms are sent to the data providers’ servers, trained there, and then returned as a black box. A pharmaceutical company interested in such models, for example, receives a trained algorithm without being able to access the data of the data providers (for example, its suppliers).
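To illustrate the idea of sending a model to the data instead of sending data to the model, here is a minimal Python sketch of federated averaging. It is not Apheris’s implementation, which the interview does not describe in detail; federated averaging is used as a well-known stand-in. The function names, the one-parameter model and the toy data are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical toy setup: each provider privately holds (x, y) pairs.
Dataset = List[Tuple[float, float]]


def local_update(weight: float, data: Dataset,
                 lr: float = 0.01, epochs: int = 5) -> float:
    """Runs on one provider's server; only the updated weight leaves it."""
    for _ in range(epochs):
        # Gradient of the mean squared error for the model y = weight * x.
        grad = sum(2 * (weight * x - y) * x for x, y in data) / len(data)
        weight -= lr * grad
    return weight


def federated_round(weight: float, providers: List[Dataset]) -> float:
    """Send the model out, collect the updates, average them by data size."""
    updates = [(local_update(weight, data), len(data)) for data in providers]
    total = sum(n for _, n in updates)
    return sum(w * n for w, n in updates) / total


def train(providers: List[Dataset], rounds: int = 50) -> float:
    """Repeat federated rounds starting from an untrained model."""
    weight = 0.0
    for _ in range(rounds):
        weight = federated_round(weight, providers)
    return weight


if __name__ == "__main__":
    provider_a = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    provider_b = [(4.0, 8.0), (5.0, 10.0)]
    print(train([provider_a, provider_b]))
```

In this sketch the coordinator only ever sees model weights, never the `(x, y)` records themselves. Weights alone can still leak information about the training data, which is why a scheme like this is usually combined with additional privacy measures.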

However, this is only one of the ways in which our platform can be used. In the case described, we would provide our algorithms to evaluate data. But it is also possible that a pharmaceutical company has already developed models and now wants to train them based on the data of a certain provider – in this case we would use our cryptographic methods to ensure security.

In both cases, the provider can offer the training of models on their data as a service and can thus monetize the data in a targeted manner.

This is an advantage because, in the past, it was often the buyers of data who profited most, while the providers had to cede their rights to it.

Apheris acts here as a mediator, so to speak. In the long term, however, we as a company also want to become part of the value chain, for example by approaching data providers ourselves and suggesting possible collaborations.

Essentially, the start-up consists of two components: an engineering part and a privacy part. The engineering part covers forwarding an untrained model, together with the requirements, rights and obligations of the companies involved, to the infrastructure of the data providers, where the model is then trained.

The privacy part consists, on the one hand, of a built-in security layer, which ensures that the models contain no code for extracting data, and, on the other hand, of our cryptographic methods.

A big problem in machine learning arises when models learn a lot about the underlying data on which they were trained. In research on face recognition models, for example, it has already happened that individual faces from the training data could be reconstructed.

Applied to health data, it would be disastrous if reconstructed records were to appear in databases.

To address this, we offer differential privacy. For each result we add a calibrated amount of randomness that largely balances out over the course of the computation. The results therefore remain useful, while the added noise ensures that the original data cannot be recovered.
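As a rough illustration of adding calibrated randomness to a result, the following Python sketch applies the Laplace mechanism, a standard textbook technique for differential privacy, to a mean. It is an assumption that something of this kind is involved; the interview does not say which mechanism Apheris uses. The function names, the clamping bounds and the `epsilon` parameter are hypothetical, and the number of records is assumed to be public.

```python
import random
from typing import Optional, Sequence


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw from Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_mean(values: Sequence[float], lower: float, upper: float,
                 epsilon: float, rng: Optional[random.Random] = None) -> float:
    """Release the mean of `values` with epsilon-differential privacy."""
    if epsilon <= 0:
        raise ValueError("epsilon must be positive")
    if not values:
        raise ValueError("values must not be empty")
    if rng is None:
        rng = random.Random()
    # Clamp every record to [lower, upper]. Changing one record can then
    # shift the mean by at most (upper - lower) / n: the query's sensitivity.
    clamped = [min(max(v, lower), upper) for v in values]
    sensitivity = (upper - lower) / len(clamped)
    true_mean = sum(clamped) / len(clamped)
    # Noise scale grows with sensitivity and shrinks as epsilon grows.
    return true_mean + laplace_noise(sensitivity / epsilon, rng)


if __name__ == "__main__":
    ages = [34.0, 51.0, 47.0, 29.0, 62.0]  # made-up example values
    print(private_mean(ages, lower=0.0, upper=100.0, epsilon=1.0))
```

A smaller `epsilon` means more noise and stronger privacy; a larger one means less noise and a more accurate result. With many records the sensitivity shrinks, so the released mean stays close to the true one while any single record remains hidden.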

What does your business model look like?

Robin: We mainly offer the service of training these models and making our software available. At the end of this process, there is a black box to which a pharmaceutical company can send requests and from which it receives computed results. This black box then lives on our infrastructure.

The pricing is still flexible. On the one hand, license fees are charged for the models, and on the other hand, we also get paid directly for our service.

In the long run, we would like to become the owners of these models, because they could be of interest to several companies and could close a gap.

At the moment companies often want to have exclusive rights for their models, so our hands are still tied.

What distinguishes you as a team?

Robin: What distinguishes us as a team is that we have very diverse backgrounds. For example, we have expertise in mathematics, medicine and philosophy. Apheris is also the fourth company that I have been involved in founding.

One of the previous companies was Janos Genomics, where we developed software for genetic data. The feedback we received there was that interest in such software already existed, but that access to the data still needed to be expanded.

This is how we came across the problem that we want to solve using Apheris.

Our co-founder Michael Höh has experience with algebraic theories of data linkage. Originally, the idea was to compute on data that resides on different computers in a network. For Apheris, we applied this to physically distributed data.

What are your plans for the future?

Robin: At the moment, we play more of a role as an intermediary between different companies and data providers. But in the long run we also want to become the owners of the models we develop and distribute.

Our company is already growing well, and the prospects for further growth also look very good. We can imagine many more areas of application in the future.

One example is training submodules on many different data sets. By tapping health data from hospitals or institutes, which are obviously not allowed to sell their data, it would be possible, for example, to cover an entire field. It would also be possible to combine the data of different providers and thus create interesting data sets.

Be a part of the Digital Hub Mannheim/Ludwigshafen:

Get talking to innovative startups and established corporates in our network.