Generative Spatial Gene Intelligence (GSI) for Human Muscle Dynamics
Code: BBSRC-DFA_2026_18 (CASE)
Primary Supervisor: James Timmons
Email: j.timmons@qmul.ac.uk
Institute: William Harvey Research Institute
Secondary Supervisor: Qianni Zhang
Email: qianni.zhang@qmul.ac.uk
Institute: School of Electronic Engineering and Computer Science
CASE Partner: AstraZeneca
Abstract:
Maintaining muscle integrity is a central challenge for healthy aging in humans. This project builds on novel human network representations of >1,650 bulk transcriptomic profiles and >100 paired 960-gene, true single-cell Merscope profiles. Using single-sample datasets comprising 7-slice Z-stacks, we will develop cross-scale generative models that reconstruct high-fidelity 3D tissue structure from limited spatial information and dense network representations. We will focus primarily on the major human muscle fibres, which are multinucleated and therefore challenging for AI-driven single-cell segmentation. Methods including diffusion-based slice inpainting, generative super-resolution and 3D priors will be developed to integrate information across Z-stacks, enabling accurate segmentation of muscle cells under distinct load conditions (associated with atrophy or hypertrophy) throughout the 10 µm Z depth. Using established 2D segmentation and neighbourhood-based analyses as a baseline, we will first apply recently emerging 3D spatial modelling approaches originally developed for other tissue types, and then develop novel AI-based frameworks tailored to the unique structural and functional properties of human muscle. These will include learning-based methods for three-dimensional spatial reconstruction, graph-based modelling of cell–cell interactions, and multimodal representation learning that integrates spatial gene expression, tissue morphology and muscle load status. The project will deliver broadly applicable AI methodologies for multimodal, image-based data integration and analysis.
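As an illustration of the 2D neighbourhood-based baseline, the sketch below scores whether two cell types are neighbours more or less often than chance. It follows the label-permutation idea used by neighbourhood enrichment tests in tools such as Squidpy, but it is a minimal stand-in written for this description, not the project's pipeline. The radius, the number of permutations and the type names are illustrative assumptions, and the all-pairs search is only suitable for small examples.

```python
import math
import random
from collections import Counter
from itertools import combinations, combinations_with_replacement


def radius_edges(coords, radius):
    """Index pairs (i, j), i < j, whose centroids lie within `radius` (same units as coords)."""
    return [(i, j) for i, j in combinations(range(len(coords)), 2)
            if math.dist(coords[i], coords[j]) <= radius]


def pair_counts(edges, labels):
    """Count neighbour edges by unordered cell-type pair, e.g. ('fibre', 'immune')."""
    return Counter(tuple(sorted((labels[i], labels[j]))) for i, j in edges)


def neighbourhood_enrichment(coords, labels, radius, n_perm=200, seed=0):
    """Z-score of observed cell-type contacts versus a label-permutation null.

    The spatial graph stays fixed; only the cell-type labels are shuffled, so a
    positive z means two types touch more often than chance, negative means less.
    """
    edges = radius_edges(coords, radius)
    observed = pair_counts(edges, labels)
    rng = random.Random(seed)
    shuffled = list(labels)
    null = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        null.append(pair_counts(edges, shuffled))
    scores = {}
    for pair in combinations_with_replacement(sorted(set(labels)), 2):
        samples = [counts[pair] for counts in null]
        mean = sum(samples) / n_perm
        sd = math.sqrt(sum((s - mean) ** 2 for s in samples) / n_perm)
        scores[pair] = (observed[pair] - mean) / sd if sd > 0 else 0.0
    return scores
```

For example, two spatially separated clusters of hypothetical "fibre" and "immune" centroids give a negative score for the ("fibre", "immune") pair and positive scores for same-type pairs. For real sections, a spatial index (for example a k-d tree) would replace the all-pairs edge search.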
Lay Summary:
Multiple chronic diseases interact to promote muscle weakness, cause frailty and undermine healthy aging. As part of a 20-year multidisciplinary project, we have established a uniquely large cohort of muscle tissue profiles (from men and women) and applied multiple new technologies to better understand the regulators of muscle integrity in health and disease. Our particular focus is understanding how muscle load relates to aging and atrophy.
To this end, we have measured gene ‘activation’ in >1,650 human muscle biopsy samples collected under different levels of physical activity. For a subset of samples, we have generated Merscope 960-gene profiles and standard pathology (H&E) serial-section images (n=150), complementing the physiological data available for each person. Our aim is to use these data to build novel models that help us understand how each cell type in muscle tissue contributes to functional status.
This PhD project will exploit this uniquely large and deeply phenotyped human muscle dataset to develop advanced AI-based models for spatial and molecular tissue analysis. Specifically, the PhD candidate will develop data-driven, multimodal methods that integrate spatial gene expression, tissue morphology and physiological variables to characterise cell-type-specific behaviour and cell–cell interactions within muscle tissue.
In the long term, our work may help identify the pathways that control muscle function and support screening for drugs that enhance patient function and quality of life. Further, because the AI research components are fundamental in nature, they may have broad impact on image-based data modelling beyond single-cell spatial biology.
Aims and Objectives:
Contrast 2D and 3D spatial modelling to evaluate how each approach characterises human muscle cell types (myocyte, immune, vascular and stem cell) during dynamic changes in load status and insulin sensitivity
This will be achieved by constructing hierarchical spatial graphs that capture cell–cell interactions at multiple spatial scales and analysing them with 2D and 3D graph neural network (GNN) frameworks. Spatio-temporal representation learning will be used to derive embeddings that encode dynamic changes in tissue organisation under varying load conditions and insulin sensitivity. In addition, contrastive learning will be used to compare representations between healthy and diseased states, and between loaded and unloaded conditions, enabling systematic evaluation of the added value of 3D spatial modelling over conventional 2D approaches.
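To make the 2D versus 3D contrast concrete, the sketch below builds radius graphs at several spatial scales and applies one parameter-free message-passing step (each cell averages its own features with those of its neighbours), concatenating the results across scales. It is a simplified stand-in for a GNN layer: learned layers such as GraphSAGE-style convolutions would replace the fixed average with trainable transforms. The radii, coordinates (in µm) and one-hot cell-type features are illustrative assumptions.

```python
import math
from itertools import combinations


def radius_graph(coords, radius):
    """Adjacency list linking cells whose centroids lie within `radius`; works in 2D or 3D."""
    adj = {i: [] for i in range(len(coords))}
    for i, j in combinations(range(len(coords)), 2):
        if math.dist(coords[i], coords[j]) <= radius:
            adj[i].append(j)
            adj[j].append(i)
    return adj


def mean_aggregate(features, adj):
    """One parameter-free message-passing step: average each cell with its neighbours."""
    out = []
    for i in range(len(features)):
        group = [features[i]] + [features[j] for j in adj[i]]
        out.append([sum(col) / len(group) for col in zip(*group)])
    return out


def multiscale_embedding(coords, features, radii):
    """Concatenate aggregated features from graphs built at several spatial scales."""
    per_scale = [mean_aggregate(features, radius_graph(coords, r)) for r in radii]
    return [sum((scale[i] for scale in per_scale), []) for i in range(len(coords))]
```

Running the same cells through `radius_graph` with and without their Z coordinate shows why the two approaches can disagree: a cell lying 9 µm deeper than another projects onto the same 2D position, so it becomes a neighbour in 2D but not in 3D at a 6 µm radius.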
Develop data integration methods for stitching Z-stacks (up to 7 slices) and 10 µm serial sections, so that distances between cell types and cellular molecular sub-phenotypes can be calculated and related to clinical status
This will involve learning-based image registration and stitching methods to align Z-stack volumes and serial tissue sections, coupled with 3D reconstruction models designed to handle missing or sparsely sampled slices. The reconstructed volumes can be represented using graph-based spatial models, enabling robust calculation of distances and spatial relationships between cell types and molecular sub-phenotypes, and facilitating their association with clinical status.
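Once volumes are aligned, distance calculations must respect the anisotropy of Z-stacks: in-plane pixel size and slice spacing differ, so voxel indices have to be converted to physical units before any Euclidean distance is meaningful. The sketch below does this and then computes, for each cell of one type, the distance to the nearest cell of another type. The 0.5 µm pixel size and the 10/6 µm slice spacing (7 slices evenly spanning 10 µm) are hypothetical values, as are the cell records and type names.

```python
import math


def to_microns(centroids_px, xy_um_per_px, z_um_per_slice):
    """Convert (row, col, slice) centroids into physical (y, x, z) coordinates in µm.

    Z-stacks are anisotropic, so the in-plane pixel size and the slice spacing
    are applied separately.
    """
    return [(r * xy_um_per_px, c * xy_um_per_px, s * z_um_per_slice)
            for r, c, s in centroids_px]


def nearest_distances(cells, source_type, target_type):
    """Map each source_type cell id to its distance (µm) from the nearest target_type cell.

    A cell is never matched to itself, so source_type == target_type gives the
    nearest-neighbour spacing within one population. Returns inf if no target exists.
    """
    targets = [(c["id"], c["xyz"]) for c in cells if c["type"] == target_type]
    return {
        c["id"]: min((math.dist(c["xyz"], xyz) for tid, xyz in targets if tid != c["id"]),
                     default=math.inf)
        for c in cells if c["type"] == source_type
    }
```

For example, with hypothetical centroids at (0, 0, 0) for a fibre and (0, 10, 0) and (0, 0, 6) for two immune cells, the fibre's nearest immune cell is 5 µm away in-plane, even though the second immune cell sits six slices (10 µm) deeper. The brute-force search scales with the product of the two population sizes; a k-d tree would be the natural replacement for whole sections.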
Use paired spatial images and gene counts, H&E images and bulk transcriptomics data to infer spatial molecular profiles from H&E data alone
First, multiple-instance learning (MIL) methods will be applied to learn whole-slide and patch-level embeddings that link H&E tissue morphology to underlying gene expression. Generative models, such as conditional variational autoencoders and diffusion-based approaches, will then be developed to synthesise spatial gene expression maps directly from H&E images, trained on paired H&E, spatial transcriptomic and bulk transcriptomic data. To improve robustness and generalisability, self-supervised pretraining and transfer learning strategies will be explored, leveraging large-scale histopathology datasets and available paired spatial transcriptomics–H&E data.
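The MIL step can be pictured with attention-based pooling in the style of Ilse et al. (2018): each H&E patch embedding receives a softmax attention weight, and the slide embedding is the attention-weighted sum of patches. The sketch below shows only the forward pass with fixed weights. In practice `V`, `w` and the gene read-out `W` would be learned, and using bulk transcriptomic profiles as slide-level targets for such a read-out is an assumption about one possible design, not a description of the project's method. `predict_genes` is a hypothetical helper.

```python
import math


def attention_mil_pool(patches, V, w):
    """Attention-based MIL pooling in the style of Ilse et al. (2018).

    patches: K patch embeddings (the "instances"), each a list of length D.
    V: L x D projection matrix; w: length-L scoring vector (learned in practice).
    Returns (slide_embedding, attention_weights); the weights sum to 1.
    """
    scores = []
    for h in patches:
        hidden = [math.tanh(sum(v * x for v, x in zip(row, h))) for row in V]
        scores.append(sum(wl * zl for wl, zl in zip(w, hidden)))
    top = max(scores)  # subtract the max before exp for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    attn = [e / total for e in exps]
    dim = len(patches[0])
    slide = [sum(a * h[d] for a, h in zip(attn, patches)) for d in range(dim)]
    return slide, attn


def predict_genes(slide, W):
    """Hypothetical linear read-out from a slide embedding to per-gene scores."""
    return [sum(wg * s for wg, s in zip(row, slide)) for row in W]
```

A useful property of this design is that the attention weights themselves indicate which patches drive the slide-level prediction, which offers a route from slide-level (bulk) supervision towards spatially resolved, patch-level estimates.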