Theme leads:
Most traditional data analysis algorithms are limited to datasets that contain homogeneous data (for example, only images or only text). The digital era has led to a rapid increase in the diversity of data used in cancer research and interventions, such as electronic patient records, genomics, functional genomics, metabolic phenotyping, and deep immunophenotyping analysis. This diversity, together with the ever-increasing capabilities of imaging technology, has led to the availability of mixed-type, high-dimensional datasets.
Making full use of these datasets requires data analysis techniques that can efficiently handle mixed-type data (images, text, microarrays, serial analysis of gene expression, RNA sequencing) of high dimensionality. These techniques should not only extract useful information from the data; they should also communicate meaningful and explainable insights to humans. We will address these challenges by employing AI-based data analysis methods that can learn and represent knowledge from high-dimensional datasets. Ultimately, our research will result in a Digital Oncologist and a Digital Patient. With the Digital Oncologist, we will be able to search and screen the Digital Patient and generate recommendations for what new data would be most valuable to collect. Together they will support the following kinds of query:
- Probabilistic search: query by probability (“find me the treatment with the best expected clinical outcome”) and query by example (“find me the 5 patients that are most similar to Paula”)
- AI-assisted assessment of data quality: “Find me the RNA sequencing results from the lab that are most likely to be errors and/or anomalies.”
- Virtual measurements: “Generate some MRI measurements we might expect for a patient with this type of cancer, given all the other MRI measurements we have observed so far.”
- AI-assisted inferential statistics: “What tissue markers, if any, predict increased risk of cancer recurrence given the current diagnosis? And how confident can we be in the amount of increase, given uncertainty due to statistical errors in the data and a large number of possible alternative explanations?”
- AI data completion: “What is the most probable value for this missing information in the patient record, given all of this patient’s other data and the data of all the patients we have observed so far?”
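To make two of the queries above concrete, the following is a minimal sketch of query by example and data completion on mixed-type records. It is an illustration under stated assumptions, not the method this theme will adopt: it swaps in a simple Gower-style distance (range-normalised differences for numbers, match/mismatch for categories) in place of a learned probabilistic model, and the field names, patient identifiers, and toy values are all invented for the example.

```python
from typing import Dict, List, Optional, Union

Value = Optional[Union[float, str]]   # None marks a missing entry
Record = Dict[str, Value]


def numeric_ranges(records: List[Record]) -> Dict[str, float]:
    """Range (max - min) of every numeric field, used to normalise differences."""
    ranges: Dict[str, float] = {}
    for key in {k for r in records for k in r}:
        vals = [r[key] for r in records if isinstance(r.get(key), (int, float))]
        if vals:
            ranges[key] = max(vals) - min(vals)
    return ranges


def gower_distance(a: Record, b: Record, ranges: Dict[str, float]) -> float:
    """Gower-style distance over shared fields; missing values are skipped."""
    total, used = 0.0, 0
    for key in a.keys() & b.keys():
        x, y = a[key], b[key]
        if x is None or y is None:
            continue
        if isinstance(x, str) or isinstance(y, str):
            total += 0.0 if x == y else 1.0          # categorical: mismatch = 1
        else:
            span = ranges.get(key, 0.0)
            total += abs(x - y) / span if span > 0 else 0.0
        used += 1
    return total / used if used else 1.0


def most_similar(query: Record, cohort: Dict[str, Record], k: int) -> List[str]:
    """Query by example: identifiers of the k records closest to the query."""
    ranges = numeric_ranges(list(cohort.values()) + [query])
    ranked = sorted(cohort, key=lambda pid: gower_distance(query, cohort[pid], ranges))
    return ranked[:k]


def complete_field(query: Record, cohort: Dict[str, Record], field: str, k: int = 3) -> Value:
    """Data completion: mean (numeric) or mode (categorical) over the k nearest records."""
    candidates = {pid: r for pid, r in cohort.items() if r.get(field) is not None}
    partial = {key: v for key, v in query.items() if key != field}
    values = [candidates[pid][field] for pid in most_similar(partial, candidates, k)]
    if not values:
        return None
    if all(isinstance(v, (int, float)) for v in values):
        return sum(values) / len(values)
    return max(set(values), key=values.count)


# Toy cohort; every value below is made up purely for illustration.
cohort = {
    "P1": {"age": 61, "tumour_stage": "II", "marker_level": 3.1},
    "P2": {"age": 45, "tumour_stage": "I", "marker_level": 1.2},
    "P3": {"age": 64, "tumour_stage": "II", "marker_level": 2.9},
    "P4": {"age": 70, "tumour_stage": "III", "marker_level": 4.8},
}
paula = {"age": 62, "tumour_stage": "II", "marker_level": None}

print(most_similar(paula, cohort, k=2))
print(complete_field(paula, cohort, "marker_level", k=2))
```

A nearest-neighbour approach like this gives only a point estimate. The probabilistic queries described above would additionally require a model that reports how uncertain each answer is.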
The ultimate aim of this theme is to develop a Digital Oncologist and a Digital Patient. The Digital Oncologist will be able to search and screen the Digital Patient to 1) find the treatment with the best expected clinical outcome; 2) predict the risk of cancer recurrence; 3) generate recommendations for what new data would be most valuable to collect to improve diagnosis and predictions; 4) generate virtual measurements and detect data anomalies.
This new AI system will improve the accuracy of diagnosis and treatment, providing more targeted care for patients. It will reduce the cost of unnecessary screenings and treatments. It will influence companies to change the way they use, collect, store and share mixed-type health data. As a result, industry business processes will need to be redesigned to embrace the Digital Oncologist and Digital Patient AI system, and this will increase commercial opportunities for the industry.