Ten million cells analysed in less than two hours, with memory usage roughly three times lower than the best existing tools and, on the largest datasets, speeds up to forty times faster. This is the remarkable result achieved by a group of researchers from the University of Trieste, Area Science Park, SISSA and Human Technopole, who developed DEVIL (Differential Expression with Variational Inference Learning), a new high-performance computational tool. The work has been published in Nature Communications.

Understanding which genes are active in cells is one of the keys to understanding diseases and developing new therapies. Today, the most advanced technologies make it possible to measure gene activity in millions of cells from dozens or hundreds of patients, generating an unprecedented amount of data for biomedical research. This revolution, however, brings with it two major challenges: on the one hand, the difficulty of analysing such large volumes of information; on the other, the risk of errors in data interpretation.

The first challenge is computational: analysing millions of cells requires enormous computing power. Traditional methods are too slow and consume too much memory to handle these volumes, a bottleneck that risks undermining the advantages offered by new data collection technologies. The second challenge is statistical. Cells collected from the same patient resemble one another more than they resemble cells from different patients, because they share the same individual biology, the same environment and the same personal characteristics. Ignoring this fact, as many currently used tools do, can lead to distorted statistical conclusions: cellular changes may be flagged as “significant” when they are not, or, conversely, real changes may be missed.
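The statistical problem can be illustrated with a small simulation. The sketch below is not DEVIL's method: it is a toy example, under assumed values (5 patients per group, 100 cells per patient, Gaussian noise, equal patient-level and cell-level spread), in which the two groups have no real difference. It compares a naive test that treats every cell as independent with a simple test on per-patient averages. All function names and parameters are hypothetical and chosen for illustration only.

```python
import math
import random


def mean(xs):
    return sum(xs) / len(xs)


def t_statistic(a, b):
    """Pooled-variance two-sample t statistic."""
    na, nb = len(a), len(b)
    ma, mb = mean(a), mean(b)
    ss_a = sum((x - ma) ** 2 for x in a)
    ss_b = sum((x - mb) ** 2 for x in b)
    pooled_var = (ss_a + ss_b) / (na + nb - 2)
    return (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))


def simulate_group(rng, n_patients, n_cells, patient_sd, cell_sd):
    """Simulate one gene in one group: each patient has its own offset,
    shared by all of that patient's cells. There is no group effect."""
    group = []
    for _ in range(n_patients):
        offset = rng.gauss(0.0, patient_sd)
        group.append([offset + rng.gauss(0.0, cell_sd) for _ in range(n_cells)])
    return group


def false_positive_rates(n_sims=200, n_patients=5, n_cells=100,
                         patient_sd=1.0, cell_sd=1.0, seed=0):
    """Fraction of null simulations declared significant at the 5% level
    by (a) a cell-level test and (b) a patient-level test."""
    rng = random.Random(seed)
    # Two-sided 5% critical values: normal approximation for the cell-level
    # test (998 degrees of freedom), Student t with 8 degrees of freedom
    # for 5 + 5 patients. Both must be changed if the group sizes change.
    cell_crit, patient_crit = 1.96, 2.306
    naive_hits = patient_hits = 0
    for _ in range(n_sims):
        a = simulate_group(rng, n_patients, n_cells, patient_sd, cell_sd)
        b = simulate_group(rng, n_patients, n_cells, patient_sd, cell_sd)
        # (a) Naive: pool all cells and ignore which patient they came from.
        cells_a = [x for patient in a for x in patient]
        cells_b = [x for patient in b for x in patient]
        if abs(t_statistic(cells_a, cells_b)) > cell_crit:
            naive_hits += 1
        # (b) Patient-aware: one average per patient, then compare patients.
        means_a = [mean(patient) for patient in a]
        means_b = [mean(patient) for patient in b]
        if abs(t_statistic(means_a, means_b)) > patient_crit:
            patient_hits += 1
    return naive_hits / n_sims, patient_hits / n_sims


if __name__ == "__main__":
    naive, per_patient = false_positive_rates()
    print(f"cell-level false positive rate:    {naive:.2f}")
    print(f"patient-level false positive rate: {per_patient:.2f}")
```

Under these assumptions the cell-level test reports a "significant" difference in most simulations even though none exists, while the patient-level test stays close to the nominal 5%. Averaging per patient is only the simplest way to respect the structure of the data; it discards cell-level information that a model treating cells from the same patient as correlated can retain.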

To address these two issues, the researchers designed DEVIL to combine statistical rigour with computational speed. From a computational perspective, DEVIL, which was also developed with the support of Fondazione AIRC, makes efficient use of the most advanced parallel computing architectures typical of artificial intelligence. Moreover, DEVIL is not only faster but also uses less memory, a far from secondary detail: analyses previously reserved for major computing centres can now become accessible to smaller research infrastructures and laboratories. From a statistical perspective, DEVIL takes a Bayesian approach that correctly accounts for the structure of the data, treating cells from the same patient as correlated and therefore separating differences between patients from genuine differences in cellular activity.

“This work would not have been possible without ORFEO, the Area Science Park data centre, recently upgraded thanks to funding from Italy’s National Recovery and Resilience Plan,” says Stefano Cozzini, Director of Area Science Park’s Research and Technological Innovation Institute. “The availability of latest-generation GPUs, characterised by extremely high computing performance, together with careful optimisation of the algorithms for this architecture, developed by our team, now makes it possible to use DEVIL to address and solve problems on a significantly larger scale. We are very satisfied: it is not often that one can rely on a team with such high-level expertise, capable of making the most of the resources acquired.”

“Differential expression, that is, the statistical analysis that identifies which genes are significantly more or less active across two or more different biological conditions,” explains Giulio Caravagna of the University of Trieste, “is a mature technology. However, the transition to single-cell analysis has introduced statistical and computational issues that make the integrated analysis of large patient cohorts complex. Our work was developed precisely to overcome this bottleneck, combining methodological innovation and high-performance computing in order to scale up to the analysis of millions of cells from hundreds of patients.”

“In the development of DEVIL, the synergy between classical and Bayesian statistical tools represents a key strength within the reference oncological literature,” says Leonardo Egidi of the University of Trieste, “and makes DEVIL an efficient computational protocol with a strong methodological characterisation. Future developments could involve spatio-temporal models for multiple patients and introduce further computational approximations based on theoretical properties that are currently under study: a valuable combination of statistical, computational and biological expertise.”

DEVIL was tested on two concrete biological case studies. In the first, focused on the identification of immune system cells, the tool proved more precise and specific in recognising relevant biological functions. In the second, concerning the ageing of human muscle tissue, it identified age-related transcriptional changes in a more stable and biologically grounded way, reducing noise and highlighting key processes for subsequent analyses.

DEVIL has been released as free and open-source software, available to laboratories and hospitals around the world, paving the way for a new generation of large-scale genomic analyses for the study of tumours, degenerative diseases and the development of personalised medicine.