How connecting data will lead to faster medical breakthroughs

Graph databases hold incredible potential for medical researchers to gain previously unattainable insight, predicts Emil Eifrem, co-founder and CEO of Neo4j.

Medical progress in the past decade has been great, but we’ve reached a bottleneck of which many of us are unaware.

The spreadsheets and relational databases that have been a mainstay over the past 30 years have reached their limits. Because they cannot cope with the vast amounts of data, or with the complexity of the multiple data sources we want to explore, research is going to run into increasing difficulty.

Medical data is highly heterogeneous and complex by its very nature, ranging from cell-level measurements to massive population studies, often within the same study group. Researchers almost always want to link data from both ends of that scale, because the meeting point between different data sets is where the compelling results tend to sit.

This presents a formidable challenge. At the same time, researchers working toward promising new medical treatments are also looking at huge amounts of data, often running into thousands of gigabytes.

Working out how multiple researchers can access and collaborate on the data is also a challenge. The data often arrives in an unstructured format and needs to be turned into a valuable research ingredient as quickly as possible, not simply analysed once and stored. We clearly need a dynamic, scalable way to leverage and connect big data.

The good news is that graph databases are a viable and powerful alternative. Let me explain why.

Graph technology was first used by social web giants Google, LinkedIn and Facebook, and more recently in large-scale data applications like The Panama and Paradise Papers. The deep data mining and pattern detection capabilities of graph technology provide a route to invaluable insight.

Graph technology also has the innate power to collaboratively filter data, making great use of the information gathered by many users. Collaborative filtering is also a core technique used by recommendation engines, where information or patterns can be filtered via data sources, viewpoints, multiple agents and so forth. This approach allows research teams to work on lots of promising data in parallel, saving time and money.
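To make the idea of collaborative filtering concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular graph database implements the technique: the researcher names, the data set names and the `recommend` function are all hypothetical, and a plain dictionary stands in for the graph.

```python
from collections import Counter

# Hypothetical graph: each researcher is linked to the data sets
# they have found useful. All names here are invented.
USES = {
    "alice": {"genome_a", "trial_1", "imaging_x"},
    "bob": {"genome_a", "trial_1", "proteome_b"},
    "carol": {"trial_1", "proteome_b", "cohort_z"},
}


def recommend(user, uses, top_n=3):
    """Rank unseen data sets by how much their users overlap with `user`."""
    seen = uses[user]
    scores = Counter()
    for other, items in uses.items():
        if other == user:
            continue
        overlap = len(seen & items)  # shared neighbours in the graph
        if overlap == 0:
            continue
        for item in items - seen:
            scores[item] += overlap  # weight by similarity to `user`
    return [item for item, _ in scores.most_common(top_n)]


print(recommend("alice", USES))
```

In this toy data, the researcher "alice" is pointed toward data sets that colleagues with similar interests have already found useful, which is the same principle a recommendation engine applies at scale.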

Graph technology is already being leveraged by researchers. In the fight against cancer, for example, there are at least eight different graph-powered projects working toward a cure. One powerful example is the Candiolo Cancer Institute (IRCC) based in Torino, Italy.

IRCC makes significant contributions to the fight against cancer by understanding its scientific basis and providing state-of-the-art diagnostic and therapeutic services. The Institute is at the interface of molecular biology and precision medicine, with the team performing molecular and biological tests on cancer samples that have been collected from hospitals around Europe.

A more flexible model

The team at IRCC needed to develop a laboratory information management system to track data, such as the biological and molecular properties of the cancer samples, and the subsequent scientific procedures performed on these samples. This information feeds into a database used to analyse data and generate high-level biological hypotheses. Different types of structurally-complex data tend to be hierarchical, with intricate and frequently-changing relationships, which necessitated a number of integrated data models.

“Our application relies on complex hierarchical data, which required a more flexible model than the one provided by the traditional relational database model,” confirms Dr Andrea Bertotti, the manager of the project.
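As a rough illustration of why hierarchical data with changing relationships suits a graph model, consider the sketch below. It is not IRCC's actual system: the sample names, the relationship types and the `lineage` function are hypothetical, and a simple list of edges stands in for a graph database.

```python
# Hypothetical edges in the form (source, relationship, target).
EDGES = [
    ("aliquot_7", "DERIVED_FROM", "biopsy_3"),
    ("biopsy_3", "DERIVED_FROM", "patient_42"),
    ("aliquot_7", "MEASURED_BY", "sequencing_run_9"),
]


def lineage(node, edges, rel="DERIVED_FROM"):
    """Follow one relationship type upward until no parent remains."""
    parents = {src: dst for src, r, dst in edges if r == rel}
    chain = [node]
    while chain[-1] in parents:
        parent = parents[chain[-1]]
        if parent in chain:  # guard against accidental cycles
            break
        chain.append(parent)
    return chain


print(lineage("aliquot_7", EDGES))

# A new kind of relationship is just another edge; no schema change is
# needed and existing traversals keep working.
EDGES.append(("biopsy_3", "TREATED_WITH", "compound_x"))
print(lineage("aliquot_7", EDGES))
```

The point of the sketch is the last step: when a new type of relationship appears in the lab, it can be recorded as one more edge rather than as a redesign of the tables.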

IRCC has developed a production version of its database that relies on a relational database to store the legacy data and track entities, characteristics and laboratory procedures, supported by a graph database. With the document database MongoDB storing the raw, complex data, the graph database does the heavy lifting: finding complex relationships, analysing the Institute’s experimental procedures, and modelling the genomic domain and its complex semantics.

With a graph database, the team gains real insights into its data, while the new flexibility of the database allows it to evolve and accommodate continually changing biological research, along with the ability to model relationships between concepts. 

In another example of the potential of graph, the German Centre for Diabetes Research, the DZD, is using graph technology in combination with techniques such as artificial intelligence (AI) to make valuable data connections.

A master database approach

The DZD, the Federal Republic’s national centre for studying diabetes, brings together experts from multiple disciplines to develop effective prevention and treatment measures for diabetes. The DZD’s research network accumulates a huge amount of data from clinical trials and patient information, covering a host of disciplines and distributed across various locations. DZD’s internal IT leadership decided it needed a new ‘master database’ that consolidates information and provides its team of 400 scientist peers with a holistic view of available information.

According to its head of Bioinformatics and Data Management, Dr Alexander Jarasch, graph technology provides a means of finding connections across all this data for queries and experimentation, which dramatically speeds up data analysis. For example, the DZD is using graph database software to mine deeper into its diabetes ‘map’ to seek out hidden relationships, allowing its researchers to examine new avenues of research.
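The kind of question a graph makes easy to ask is "how is this thing connected to that thing?". The sketch below shows the idea with a breadth-first search over a handful of links. It is an assumption-laden toy, not the DZD's software: every entity name is invented, and `shortest_path` is a hypothetical helper written for this illustration.

```python
from collections import deque

# Hypothetical links between entities drawn from different data sets.
LINKS = [
    ("mouse_study_5", "gene_a"),
    ("gene_a", "protein_p1"),
    ("protein_p1", "pathway_q"),
    ("pathway_q", "phenotype_x"),
    ("clinical_trial_2", "phenotype_x"),
]


def shortest_path(start, goal, links):
    """Breadth-first search for the shortest chain of links, or None."""
    graph = {}
    for a, b in links:  # treat every link as undirected
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for neighbour in sorted(graph.get(path[-1], ())):
            if neighbour not in visited:
                visited.add(neighbour)
                queue.append(path + [neighbour])
    return None


print(shortest_path("mouse_study_5", "clinical_trial_2", LINKS))
```

Here an animal study and a clinical trial that share no direct link turn out to be connected through a chain of intermediate entities, which is the sort of hidden relationship a researcher might then investigate.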

The DZD is also looking to build new data models, and better compare animal and human data it has collected. In a graph representation, abnormalities, patterns or connections can be easily picked out and questioned, according to Dr Jarasch. In the future, data from diabetes research could be integrated with that of Alzheimer’s research, for example, to discover possible new connections.

The DZD is also looking to exploit the combination of machine learning (ML) with graph databases to identify new subtypes of diabetes, believing it will eventually be possible to create predictive models that will track the stages of a disease to a certain degree of probability. IRCC and DZD are some of the many use cases illustrating graph technology’s innate ability to discover data relationships, which promises to feature strongly in the future of medical research.
