Security-first Federated Quantum Machine Learning for Genomics
Over the past decade, there has been an explosion of Machine Learning (ML) use cases across multiple industries, healthcare included. However, for many key organisations that hold sensitive private datasets, such as NHS Trusts, mainstream centralised ML training doesn't provide the necessary privacy and security assurances. This is where an emerging ML paradigm, known as Federated Learning (FL), comes into play. FL allows multiple parties that don't necessarily trust each other to collaborate on training a common machine learning model, all without having to share their data with each other. Because the raw data never leaves its owner, this technology addresses a core part of the data privacy and security problem.
Nevertheless, a crucial detail to note is that, while FL is a robust solution when it comes to data access, governance, and ownership, it does not guarantee security and privacy unless it is combined with other security add-ons. On its own, FL remains exposed to several cyber security attacks. For example, if the local training datasets are not encrypted, attackers can steal personally identifiable data directly from the training nodes. They can also interfere with the training process itself, for instance through a poisoning attack, in which manipulated data or model updates are fed into the shared model. Moreover, in a normal FL setup the models are not encrypted either, which leaves them open to adversarial attacks, including attacks that extract sensitive training data from the models themselves.
Now, a natural question one may ask is: can FL be combined with anything to make it a viable solution in the healthcare space? The answer is yes, and our proposal for this project is to supplement FL with two emerging technologies:
1. Fully Homomorphic Encryption (FHE): In a nutshell, FHE is a computational paradigm that allows computation to be applied to encrypted datasets, i.e. directly to ciphertext, without decrypting the data at any point during the computation. Only the holder of the secret key can decrypt the output, and the decrypted result should match what the same computation would have produced on the unencrypted data (exactly, or to within a small approximation error, depending on the scheme).
2. Quantum Machine Learning (QML): In effect, QML sits at the intersection of Quantum Computing and ML. In our case, the aim is to exploit quantum mechanical properties, such as superposition and entanglement, to build better and faster algorithms.
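To make the idea of computing on ciphertext concrete, here is a minimal sketch of a homomorphic addition. It is an illustration only and makes several assumptions: it uses the Paillier scheme, which is additively homomorphic rather than fully homomorphic, the primes are toy-sized and offer no security, and the function names (`keygen`, `encrypt`, `decrypt`, `add_encrypted`) are hypothetical and not part of this project.

```python
import math
import random
from typing import Tuple

# Toy Paillier cryptosystem: additively homomorphic only, NOT fully homomorphic.
# The default primes are tiny and insecure; they are assumptions for illustration.

def keygen(p: int = 101, q: int = 113) -> Tuple[int, Tuple[int, int]]:
    """Return (public modulus n, private key (lam, mu))."""
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse (Python 3.8+), valid because g = n + 1
    return n, (lam, mu)

def encrypt(n: int, message: int) -> int:
    """Encrypt an integer 0 <= message < n using generator g = n + 1."""
    n_sq = n * n
    while True:
        r = random.randrange(1, n)
        if math.gcd(r, n) == 1:  # the random blinding factor must be invertible
            break
    return (pow(n + 1, message, n_sq) * pow(r, n, n_sq)) % n_sq

def decrypt(n: int, private_key: Tuple[int, int], ciphertext: int) -> int:
    """Recover the plaintext from a ciphertext."""
    lam, mu = private_key
    l_value = (pow(ciphertext, lam, n * n) - 1) // n
    return (l_value * mu) % n

def add_encrypted(n: int, c1: int, c2: int) -> int:
    """Multiplying ciphertexts adds the underlying plaintexts (mod n)."""
    return (c1 * c2) % (n * n)

# A party without the private key can add two encrypted values.
public_n, private_key = keygen()
c_a = encrypt(public_n, 1200)
c_b = encrypt(public_n, 345)
c_sum = add_encrypted(public_n, c_a, c_b)
total = decrypt(public_n, private_key, c_sum)  # equals 1200 + 345
```

A fully homomorphic scheme additionally supports multiplication on ciphertexts, which is what allows arbitrary computations, such as model training steps, to run on encrypted data. A real deployment would rely on an audited FHE library rather than hand-written code like this.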
In the healthcare sector, this solution would provide even greater security assurances, significantly lowering information governance barriers when sharing sensitive data with third parties. This would in turn enhance collaboration, service innovation, and patient outcomes, without compromising data integrity and security. In brief, we're aiming to address the unmet need for privacy-enhancing ML.
Automated Classical-to-Quantum Data Encoding for Genomics Data
Biomedical data is an essential resource for developing machine learning models that can aid in the diagnosis, treatment, and prevention of diseases. However, the collection, storage, and sharing of biomedical data present significant challenges due to its sensitive nature and the ethical considerations involved. Healthcare data is subject to strict regulations and privacy laws, making it challenging for researchers to access and share it. Moreover, machine learning models trained on a single dataset tend to overfit and may not generalise well to new data, which limits their potential use in real-world applications.
To address these challenges, we must explore new approaches to biomedical machine learning that can leverage large and diverse datasets, whilst also ensuring data privacy and security. One promising solution is to combine the power of quantum computing with the benefits of federated learning (FL), an approach known as hybrid classical-quantum federated learning. This distributed quantum learning approach enables organisations to train hybrid quantum machine learning models on their respective classical datasets, without sharing raw data. In this setup, model training is distributed to individual devices or servers, each with access to a quantum processing unit (QPU) to run the quantum part of the model. Each participant trains the model on its own dataset, and only the resulting model updates are combined into the shared model.
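The training loop described above can be sketched with federated averaging, a common FL aggregation rule. This is a minimal sketch under stated assumptions: the quantum part of the model is replaced by a purely classical two-parameter linear model so that the example runs without quantum hardware, the two "sites" hold made-up toy data, and all names (`local_update`, `federated_average`, `train`) are hypothetical.

```python
from typing import List, Tuple

Weights = List[float]                 # [slope, intercept] of a toy linear model
Dataset = List[Tuple[float, float]]   # (x, y) pairs held privately by one site

def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.05, epochs: int = 5) -> Weights:
    """Run a few gradient-descent steps on one site's private data."""
    w, b = weights
    for _ in range(epochs):
        grad_w = sum(2 * (w * x + b - y) * x for x, y in data) / len(data)
        grad_b = sum(2 * (w * x + b - y) for x, y in data) / len(data)
        w -= lr * grad_w
        b -= lr * grad_b
    return [w, b]

def federated_average(updates: List[Weights], sizes: List[int]) -> Weights:
    """Average the sites' weights, weighted by local dataset size."""
    total = sum(sizes)
    return [
        sum(update[i] * size for update, size in zip(updates, sizes)) / total
        for i in range(len(updates[0]))
    ]

def train(clients: List[Dataset], rounds: int = 200) -> Weights:
    """Coordinate training: only weights travel, never the raw (x, y) pairs."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = federated_average(updates, [len(d) for d in clients])
    return global_weights

# Two hypothetical sites whose toy data both follow y = 2x + 1.
site_a: Dataset = [(0.0, 1.0), (1.0, 3.0), (2.0, 5.0)]
site_b: Dataset = [(3.0, 7.0), (4.0, 9.0)]
weights = train([site_a, site_b])  # slope and intercept approach 2 and 1
```

In a hybrid classical-quantum variant, `local_update` would instead optimise the parameters of a quantum circuit on the site's QPU, while the aggregation step could stay the same.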
Unfortunately, classical datasets cannot be loaded directly into a quantum computer for processing; they first need to be encoded into a form that a quantum computer can work with. In essence, classical-to-quantum data encoding is the process of converting classical data into quantum states for use in a quantum algorithm. However, because the current generation of quantum hardware, known as Noisy Intermediate-Scale Quantum (NISQ) hardware, offers only a limited number of qubits, existing general-purpose data encoding methods are not always fit for purpose, especially when it comes to datasets such as DNA sequences. For example, we've recently tested a popular encoding scheme known as amplitude encoding on DNA sequences and found a key shortcoming: it is highly sensitive to the input data, as small changes in the DNA sequence can result in large changes in the output.
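As a rough illustration of amplitude encoding, the sketch below turns a DNA sequence into a normalised amplitude vector and compares two encoded sequences. It is not the encoding used in our tests: the base-to-number mapping (`BASE_VALUES`), the zero-padding strategy, and the function names are all assumptions made for this example, and the vector is computed classically rather than prepared on a quantum device.

```python
import math
from typing import Dict, List

# Hypothetical numeric mapping for the four bases (an assumption for illustration).
BASE_VALUES: Dict[str, float] = {"A": 1.0, "C": 2.0, "G": 3.0, "T": 4.0}

def amplitude_encode(sequence: str) -> List[float]:
    """Map a DNA string to a unit-norm amplitude vector of length 2**k."""
    if not sequence:
        raise ValueError("sequence must not be empty")
    values = [BASE_VALUES[base] for base in sequence.upper()]
    dim = 1
    while dim < len(values):       # pad with zeros up to the next power of two
        dim *= 2
    values += [0.0] * (dim - len(values))
    norm = math.sqrt(sum(v * v for v in values))
    return [v / norm for v in values]

def num_qubits(state: List[float]) -> int:
    """Number of qubits needed to hold a state vector of this length."""
    return len(state).bit_length() - 1

def fidelity(state_a: List[float], state_b: List[float]) -> float:
    """Squared overlap of two real-valued states; 1.0 means identical states."""
    return sum(a * b for a, b in zip(state_a, state_b)) ** 2

# Two sequences that differ only in their final base.
reference = amplitude_encode("ACGTACGT")
mutated = amplitude_encode("ACGTACGA")
overlap = fidelity(reference, mutated)
```

Because the whole vector is divided by its norm, changing a single base rescales every amplitude rather than only the one at the mutated position, which is one way a local edit to the sequence can spread across the entire encoded state. The sketch also shows why the scheme is attractive on NISQ hardware: eight bases fit into three qubits.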
Our idea is to develop an efficient and automated classical-to-quantum data encoding software-as-a-service (SaaS) toolkit, codenamed NZ-SeQTech, that is optimised for hybrid genomics data and federated quantum learning use cases. The aim is for NZ-SeQTech to provide researchers, students, professionals, and enthusiasts working in genomics with an easy-to-use and accessible data encoding tool for training hybrid classical-quantum models in a federated setup.