Publications
List of publications in reverse chronological order.
2025
- ProtoDrift: Evaluating the Impact of Small-to-Big Adjustments to Chemotherapy Protocols. Alice Rogier, Elisabeth Ashton, Corine Barat, and 8 more authors. medRxiv, 2025. Preprint available at medRxiv:2025.05.09.25327311
Adjustments to chemotherapy protocols are common to adapt treatments to individual patient needs, yet consensus on the impact of such adjustments is lacking. We introduce ProtoDrift, a novel metric that quantifies time and dose adjustments in chemotherapies. This weighted distance enables an assessment of their impact on patient outcomes, offering more detailed analyses than the traditional Relative Dose Intensity (RDI). We compared ProtoDrift and RDI prediction performances through survival analyses on 20,808 patients across 38 groups, categorised by cancer location and treatment line at two hospitals. Without optimisation, ProtoDrift achieves either comparable or better prediction results in 71% of patient groups (27 out of 38). Once optimised, ProtoDrift surpasses the RDI C-index predictions in 89% (16 out of 18) of patient groups from the first hospital. This study confirms ProtoDrift as an advanced tool for refining chemotherapy regimen design, highlighting the critical role of time adjustments in patient outcomes.
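The C-index used in the abstract above to compare ProtoDrift and RDI can be illustrated with a minimal sketch of Harrell's concordance index. This is not the authors' code: the function name, the toy data, and the choice to skip pairs with tied survival times are assumptions made for illustration only.

```python
from itertools import combinations


def concordance_index(times, events, risks):
    """Simplified Harrell's C-index (hypothetical helper, illustration only).

    times  : observed follow-up times
    events : 1 if the event was observed, 0 if the patient was censored
    risks  : predicted risk scores (higher means earlier expected event)
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        if times[i] == times[j]:
            continue  # assumption: tied times are skipped in this sketch
        first, second = (i, j) if times[i] < times[j] else (j, i)
        if not events[first]:
            continue  # earlier patient was censored, so the order is unknown
        comparable += 1
        if risks[first] > risks[second]:
            concordant += 1.0  # higher risk for the earlier event
        elif risks[first] == risks[second]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data: three patients, the last one censored.
times = [5, 10, 15]
events = [1, 1, 0]
print(concordance_index(times, events, [0.9, 0.5, 0.1]))
```

A value of 1.0 means the predicted risks order every comparable pair correctly, 0.5 corresponds to uninformative predictions, and 0.0 to a fully reversed ordering.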
2024
- Facilitating phenotyping from clinical texts: the medkit library. Antoine Neuraz, Ghislain Vaillant, Camila Arias, and 8 more authors. Bioinformatics, Nov 2024
Phenotyping consists in applying algorithms to identify individuals associated with a specific, potentially complex, trait or condition, typically out of a collection of Electronic Health Records (EHRs). Because much of the clinical information in EHRs lies in text, phenotyping from text plays an important role in studies that rely on the secondary use of EHRs. However, the heterogeneity and highly specialized nature of both the content and form of clinical texts make this task particularly tedious, and are a source of time and cost constraints in observational studies. To facilitate the development, evaluation and reproducibility of phenotyping pipelines, we developed an open-source Python library named medkit. It enables composing data processing pipelines made of easy-to-reuse software bricks, named medkit operations. In addition to the core of the library, we share the operations and pipelines we already developed and invite the phenotyping community to reuse and enrich them. medkit is available at https://github.com/medkit-lib/medkit. Documentation, examples and tutorials are available at https://medkit-lib.org/.
- Representation and comparison of chemotherapy protocols with ChemoKG and graph embeddings. Jong Ho Jhee, Alice Rogier, Dune Giraud, and 4 more authors. In SWAT4HCLS 2024 - 15th International Semantic Web Applications and Tools for Health Care and Life Sciences Conference, Feb 2024
Background: Chemotherapy, a central cancer treatment, employs antineoplastic drugs to hinder cancer cell replication by disrupting DNA synthesis or mitosis. Chemotherapies follow complex protocols composed of cycles of treatment in which antineoplastic and adjuvant drugs are prescribed at different doses and times. Various protocols exist, differing from one another by a few small variations or by many large ones, making it hard to compare chemotherapies with each other, to compare their differential outcomes, and ultimately to choose the most suitable one for a particular patient. Method: We propose ChemoKG, a knowledge graph for chemotherapy protocols that encompasses, first, administration programs (drugs, dosages, treatment durations) and, second, drug properties and classes imported from ChEBI, DrugBank and the ATC classification. These three drug resources provide complementary hierarchies and chemical properties that help to better identify similar chemotherapy protocols. To this aim, we tested on ChemoKG a novel graph embedding method employing graph neural networks (GNNs) to compare nodes in the graph that represent protocols. Unlike previous approaches that focus on triple-based embeddings, the proposed method captures subgraph structures inherited from the aggregation scheme in GNNs. Results: The resulting knowledge graph encompasses 329,164 triples with 99,901 entities and 75 predicates, including 1,358 chemotherapy protocols and 226 anti-cancer drugs. We performed a cluster analysis of protocol embeddings learned on ChemoKG to propose groups of similar protocols. This should facilitate the comparison of chemotherapies themselves and, by extension, of their potential effectiveness. Additionally, it should aid in analyzing gaps between commonly accepted protocols and their real-world implementation.
2022
- Design of an Ontology-Based Triage System for Patients with Chronic Pain. Alexandre Saadi, Alice Rogier, Anita Burgun, and 1 more author. Studies in Health Technology and Informatics, 2022
Objective: Waiting time for a consultation for chronic pain is a widespread health problem. This paper presents the design of an ontology used to assess patients referred to a consultation for chronic pain. Methods: We designed OntoDol, an ontology of the pain domain for patient triage based on priority degrees. Terms were extracted from clinical practice guidelines and mapped to SNOMED-CT concepts through the Python module Owlready2. Selected SNOMED-CT concepts, relationships, and the TIME ontology were implemented in the ontology using Protégé. Decision rules were implemented with SWRL. We evaluated OntoDol on 5 virtual cases. Results: OntoDol contains 762 classes, 92 object properties and 18 SWRL rules to assign patients to 4 categories of priority. OntoDol was able to assert every case and classify it in the right category of priority. Conclusion: Further work will extend OntoDol to other diseases and assess OntoDol with real-world data from the hospital.
- Using an Ontological Representation of Chemotherapy Toxicities for Guiding Information Extraction and Integration from EHRs. Alice Rogier, Adrien Coulet, and Bastien Rance. Studies in Health Technology and Informatics, 2022
Introduction: Chemotherapies against cancers are often interrupted due to severe drug toxicities, reducing treatment opportunities. For this reason, the detection of toxicities and their severity from EHRs is of importance for many downstream applications. However, toxicity information is dispersed across various sources in the EHRs, making its extraction challenging. Methods: We introduce OntoTox, an ontology designed to represent chemotherapy toxicities, their attributes and provenance. We illustrated the interest of OntoTox by integrating toxicities and grading information extracted from three heterogeneous sources: EHR questionnaires, semi-structured tables, and free-text. Results: We instantiated 53,510, 2,366 and 54,420 toxicities from questionnaires, tables and free-text respectively, and compared the complementarity and redundancy of the three sources. Discussion: We illustrated with this preliminary study the potential of OntoTox to guide the integration of multiple sources, and identified that the three sources are only moderately overlapping, stressing the need for a common representation.
2020
- Natural Language Processing for Rapid Response to Emergent Diseases: Case Study of Calcium Channel Blockers and Hypertension in the COVID-19 Pandemic. Antoine Neuraz, Ivan Lerner, William Digan, and 8 more authors. Journal of Medical Internet Research, 2020
Background: A novel disease poses special challenges for informatics solutions. Biomedical informatics relies for the most part on structured data, which require a preexisting data or knowledge model; however, novel diseases do not have preexisting knowledge models. In an emergent epidemic, language processing can enable rapid conversion of unstructured text to a novel knowledge model. However, although this idea has often been suggested, no opportunity has arisen to actually test it in real time. The current coronavirus disease (COVID-19) pandemic presents such an opportunity. Objective: The aim of this study was to evaluate the added value of information from clinical text in response to emergent diseases using natural language processing (NLP). Methods: We explored the effects of long-term treatment by calcium channel blockers on the outcomes of COVID-19 infection in patients with high blood pressure during in-patient hospital stays using two sources of information: data available strictly from structured electronic health records (EHRs) and data available through structured EHRs and text mining. Results: In this multicenter study involving 39 hospitals, text mining increased the statistical power sufficiently to change a negative result for an adjusted hazard ratio to a positive one. Compared to the baseline structured data, the number of patients available for inclusion in the study increased by 2.95 times, the amount of available information on medications increased by 7.2 times, and the amount of additional phenotypic information increased by 11.9 times. Conclusions: In our study, use of calcium channel blockers was associated with decreased in-hospital mortality in patients with COVID-19 infection. This finding was obtained by quickly adapting an NLP pipeline to the domain of the novel disease; the adapted pipeline still performed sufficiently to extract useful information. When that information was used to supplement existing structured data, the sample size could be increased sufficiently to see treatment effects that were not previously statistically detectable.