Methods for analysis of spatial peptidergic communication networks

Author

Donghyun Kim

Figure 1: Peptidergic communication networks

Communication in the brain can occur between cells that release specific neuropeptides and cells that sense these neuropeptides.

In this study, we develop computational methods to study the organization of this form of communication.

We explore regimes in which neighborhood information is helpful for cell type classification with graph neural networks and adapt Node2Vec (Grover and Leskovec 2016) to obtain multilayer graph embeddings.

Background on neuropeptidergic communication

We study neuropeptidergic communication between ligands and receptors in the brain.

Ligands are small molecules that bind selectively to receptor proteins.

The ligands of interest are peptides (small chains of amino acids), and the receptor proteins of interest are G-protein coupled receptors or GPCRs (Rosenbaum, Rasmussen, and Kobilka 2009).

Peptides released by one neuron can bind to GPCRs on other neurons and modulate their activity. Many such cognate neuropeptide-GPCR pairs have been identified in the literature.

Single cell transcriptomic data suggests that a vast majority of cortical neurons participate in such communication in a cell-type dependent manner (Smith et al. 2019).

Thus, compatible (a.k.a cognate) peptide-GPCR pairs can be considered as an unique communication channel between neurons (Smith et al. 2019).

It is important to study these networks because combinatorial expression of neuropeptides and receptors enables local, type specific communication between neurons. Such peptidergic signaling is slower and longer lasting compared to synaptic signaling.

Computational studies also suggest that such neuron-type–specific local neuromodulation may be a critical piece of the biological credit-assignment puzzle (Liu et al. 2021).

Background on spatial transcriptomic data

We study spatial peptidergic communication networks with spatial transcriptomic data.

Spatial transcriptomic experiments measure gene expression while preserving spatial context. There are many technologies to acquire such data, with trade-offs in spatial resolution, speed, and compatibility with other kinds of experiments on the same tissue sample Chen et al. (2023). Some of these technologies, namely SeqFISH, MERFISH, Slide-Seq have been used to produce atlas scale datasets in the mouse brain by the BICCN (Hawrylycz et al. 2023).

We consider a recent whole mouse brain dataset obtained with an in-situ hybridization-based technique called MERFISH (Yao et al. 2023).

This dataset consists of expression measurements of approximately 500 genes in 4 million cells from a single mouse brain.

We focus on the primary visual cortex (VISp region) in this study as shown in Figure 2.

Figure 2: VISp region of the brain. The different colors assigned to cells correspond to different subclasses.

We note that the subclass distribution of cells in the dataset is imbalanced. This makes subclass classification tasks more difficult.

Figure 3: Subclass distribution of cells in VISp

Neuropeptidergic communication as directed multilayer graphs

Peptidergic communication networks can be viewed as directed multilayer graphs.

The multilayer graph G is the set \(\{V, \{E_l\}\}\) where:

\(V\) is the set of vertices (cells).

\(E_l\) s the set of directed edges for the lth network.

For the \(l\) th network, an edge exists from vertex \(i\) to vertex \(j\) if all the following conditions are met:

  1. distance \(d[i,j] \leq d_{threshold}\)
  2. vertex i expresses the peptide precursor gene
  3. vertex j expresses the GPCR gene

The peptidergic multilayer graph does not contain any edges between layers.

We set \(d_{threshold} = 40 \textmu m\) for all experiments unless mentioned otherwise.

Code for graph construction can be found here.

Figure 4: Sliced section of each layer of the multilayer graph, where each layer corresponds to a unique communication channel

5 neuropeptide precursor and GPCR pairs were experimentally measured in the MERFISH dataset. The 5 pairs each make up a layer in the multilayer graph as shown in Figure 4.

Node classification with graph neural networks

We first explore regimes in which neighborhood information is helpful for cell type classification with graph neural networks.

Graph neural networks combine features of nodes that are adjacent according to the underlying graph.

Figure 5: Overview of graph attention networks

A nonlinear function ∅ of aggregated (sum here) weighted node features determines updated node representation.

In GATv2, the node features are weighted using a function of node pairs, \(\theta\) (Brody, Alon, and Yahav 2021).

Functions \(\phi\) and \(\theta\) are learned to optimize a per-node classification objective.

Code for node classification with graph neural networks can be found here.

To test if neighborhood information is helpful for cell type classification, we construct a graph with cells as nodes and their gene expression values as initial node features. Cognate neuropeptide-GPCR pairs that lie within the specified distance are connected with an edge.

Initial experiments showed that a relatively simple classifier like a support vector machine could predict supertypes extremely well (> 95%) if given the full gene expression (550 genes) as input. In such a setting, neighborhood information won’t contribute much to the classification task.

Thus, we considered a more difficult problem where only a subset of genes are available. We randomly selected 3 nonoverlapping subsets with size equal to 5% of the total genes available. We combined these subsets to create 3 subsets of size equal to 5%, 10% and 15% of the total genes available.

Figure 6: Supertype classification accuracy with graph neural networks where we use cells as nodes and subsets of their gene expression values as initial node features. Cognate neuropeptide-GPCR pairs that lie within the specified distance are connected with an edge.

We found that gene expression in neighboring cells can improve supertype classification when fewer genes are available.

Node embeddings

Having found that neighboorhood information can be helpful for cell type classification, we wanted to see if the neighborhood structure of a cell by itself (without any gene expression information) can be informative of cell types.

To do so, we use node embedding methods.

Node2Vec learns an embedding \(f(u)\) that is predictive of the neighborhood \(\mathcal{N}(u)\) of each node \(u\).

Figure 7: Overview of Node2Vec

The neighborhood for each node is determined through fixed length, 2nd order random walks starting from each node.

Parameters \(p\) and \(q\) determine the transition probabilities for the random walk.

Spectral graph clustering frames the problem as an eigen decomposition (Von Luxburg 2007).

Specifically, the eigenvectors of the Laplacian matrix are used as features for clustering.

Figure 8: Eigenvectors of the laplacian matrix for the cell x cell matrix from the toy problem in Figure 9

Simulation

We build intuition for how embeddings based on peptidergic connectivity could partition cells into meaningful groups even in the absence of explicit gene expression information with a toy problem.

Figure 9: Type x type matrix, cell x cell matrix, and visualization of the cell x cell matrix

We start with a type x type adjaceny matrix that shows which cell types can communicate with which cell types. The type x type adjacency matrix can be sampled to obtain a cell x cell adjacency matrix. The block structure is equivalent to the graph having distinct connected components (as shown on the right).

Code for Node2Vec and spectral graph clustering applied to the simulation can be found here.

Figure 10: TSNE projection of Node2Vec embeddings obtained from the cell x cell matrix from Figure 9

Clustering the embeddings obtained with Node2Vec is a scalable way to identify such components. We found that spectral graph clustering can identify these components too. However, spectral graph clustering has limited applicability to our dataset given its computational complexity of \(O(n^3)\) (Von Luxburg 2007).

Conencting this back to the problem of peptidergic communication, we can think of the dark blue, light blue, red, and green nodes as 4 cell types that communicate with each other, and the yellow, brown, orange, and violet nodes as 4 cell types that communicate with each other.

This type x type adjacency matrix denotes 1 communication network between the cells or 1 layer in the multilayer graph. With 1 layer, we can distinguish between 2 groups of cells.

Multilayer graph clustering

For multilayer graphs, one approach to obtain embeddings for the whole graph is to obtain independent embeddings for each layer in the multigraph. Combining information from each embedding can identify all the ‘types’ when the blocks are chosen carefully across layers.

If we have n types \((t_1,…,t_n)\) and k layers \((X_1,…, X_k)\):

\(P(t_i) = \frac{1}{n}\)

\(P(t_i | X_1) = \frac{2}{n}\)

\(P(t_i | X_1, X_2) = \frac{4}{n}\)

\(P(t_i | X_1, X_2, ..., X_k) = 1\)

where \(k = \log_2 n\)

Figure 11: 3 carefully chosen cell x cell matrices and a TSNE projection of concatenated Node2Vec embeddings from the 3 matrices

An embedding is generated from each type x type matrix. The 3 embeddings are concatenated and visualized with TSNE. We see that with 3 layers, we can perfectly distinguish between the 8 cell types.

Simiarly, we hypothesize that with 3 peptidergic communication networks, we can distinguish between 8 groups of cells.

Node2Vec performance against noise/sparsity

Because gene expression data is noisy and our peptidergic communication networks are quite sparse, we study Node2Vec’s performance when faced with these issues.

We see that Node2Vec embeddings cluster according to the underlying block structure in a manner that is robust to sparsity as well as noise in the cell x cell graph.

Noise experiment Sparsity experiment

Application to MERFISH VISp data

We apply Node2Vec to MERFISH VISp data to inspect the utility of graph embeddings in identifying celltypes (subclass) and spatial groupings.

We consider 4 different methods of obtaining Node2Vec embeddings.

  1. considering only proximity of cells in the graph (distance)
  2. concatenating embeddings of individual peptidergic graphs (independent)
  3. collapsing the multi-layer graphs into a single layer graph (squashed)
  4. jointly considering the multilayer graphs with an adapted Node2Vec model (joint)

For the joint method, we sample random walks from all layers to determine the neighborhood for each node (as opposed to neighborhoods constructed from only 1 layer). To do so, we modify the Pytorch Geometric implementation of Node2Vec.

Code for distance, independent, squashed, and joint.

Figure 12: Topmost plots are TSNE projections of embeddings with colors obtained from leiden clustering. Bottom plots are projections of leiden clusters onto the VISp region. Accuracy refers to subclass classification accuracy using the embeddings. Note that classification accuracy using a random classifier is approximately 25%.

We observe that even in the absence of explicit gene expression information, embeddings obtained from peptidergic connectivity contain information about cell types.

We speculate that cell clusters obtained with leiden clustering correspond to spatial domains relevant from a functional point of view.

Gene Imputation

We speculated that with more communication channels (more layers in the peptidergic multilayer graph), we should be able to distinguish between cells with higher accuracy (only 5 communication channels were available in the dataset out of 37 putative channels).

To test this hypothesis, we tried gene imputation methods.

Gene imputation methods can be used to recover the expression of peptidergic communication genes that were not part of the MERFISH dataset. Several tools have been benchmarked for this task (Li et al. 2022), and we used the top performing model Tangram to impute values of all peptidergic communication genes.

Tangram maps cells from a scRNA-seq dataset to cells (or voxels) from a spatial transcriptomic dataset (Biancalani et al. 2021).

Tangram learns a mapping matrix M defined such that \(M_{ij} \geq 0\) is the probability of cell i (from scRNA-seq) mapping to cell \(j\) (from spatial transcriptomic data). The loss function used to optimize M is shown below.

\[ L(S, M) = \sum_{k}^{n_{genes}} \cos_{sim} ((M^T S)_{*, k}, G_{*,k}) \]

S: scRNA-seq matrix (cell x gene)

G: spatial transcriptomic data matrix (cell x gene)

M: mapping matrix (cell * cell)

We assessed imputation performance by using peptidergic communication genes that were already measured as test data.

Figure 13: The diagrams on the left are scatter plots of measured and predicted gene expression values for test genes (colors indicate subclass). The diagrams on the right are violin plots of test gene expression for each subclass (blue indicates real gene expression values and orange indicates imputed values).

We found that the correlation between imputed gene expression and actual gene expression is not strong enough for us to use imputed results in our multilayer graphs. In particular, a non trivial amount of null gene expression values were mapped to positive values.

Limitations and future work

Only 5 out of the 37 cognate neuropeptide precursor and GPCR pairs were experimentally measured in the MERFISH dataset, which precludes an assessment at more detailed levels of the taxonomy.

Neuron morphology may be an important consideration relevant for the full spatial extent of peptidergic communication. Arbor density representations derived from patch-seq datasets could offer a way forward.


The poster can be found here.

Code repository: https://github.com/donghyunkm/cciGraphs


We thank Tom Chartrand, Ian Convy, Olga Gliko, Anna Grim, Yeganeh Marghi, Uygar Sümbül, Meghan Turner, and Kasey Zhang for helpful feedback and discussions.

References

Biancalani, Tommaso, Gabriele Scalia, Lorenzo Buffoni, Raghav Avasthi, Ziqing Lu, Aman Sanger, Neriman Tokcan, et al. 2021. “Deep Learning and Alignment of Spatially Resolved Single-Cell Transcriptomes with Tangram.” Nature Methods 18 (11): 1352–62.
Brody, Shaked, Uri Alon, and Eran Yahav. 2021. “How Attentive Are Graph Attention Networks?” arXiv Preprint arXiv:2105.14491.
Chen, Tsai-Ying, Li You, Jose Angelito U Hardillo, and Miao-Ping Chien. 2023. “Spatial Transcriptomic Technologies.” Cells 12 (16): 2042.
Grover, Aditya, and Jure Leskovec. 2016. “Node2vec: Scalable Feature Learning for Networks.” In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 855–64.
Hawrylycz, Michael, Maryann E Martone, Giorgio A Ascoli, Jan G Bjaalie, Hong-Wei Dong, Satrajit S Ghosh, Jesse Gillis, et al. 2023. “A Guide to the BRAIN Initiative Cell Census Network Data Ecosystem.” PLoS Biology 21 (6): e3002133.
Lein, Ed, Lars E Borm, and Sten Linnarsson. 2017. “The Promise of Spatial Transcriptomics for Neuroscience in the Era of Molecular Cell Typing.” Science 358 (6359): 64–69.
Li, Bin, Wen Zhang, Chuang Guo, Hao Xu, Longfei Li, Minghao Fang, Yinlei Hu, et al. 2022. “Benchmarking Spatial and Single-Cell Transcriptomics Integration Methods for Transcript Distribution Prediction and Cell Type Deconvolution.” Nature Methods 19 (6): 662–70.
Liu, Yuhan Helena, Stephen Smith, Stefan Mihalas, Eric Shea-Brown, and Uygar Sümbül. 2021. “Cell-Type–Specific Neuromodulation Guides Synaptic Credit Assignment in a Spiking Neural Network.” Proceedings of the National Academy of Sciences 118 (51): e2111821118.
Marx, Vivien. 2021. “Method of the Year: Spatially Resolved Transcriptomics.” Nature Methods 18 (1): 9–14.
Moffitt, Jeffrey R, Emma Lundberg, and Holger Heyn. 2022. “The Emerging Landscape of Spatial Profiling Technologies.” Nature Reviews Genetics 23 (12): 741–59.
Ortiz, Cantin, Marie Carlén, and Konstantinos Meletis. 2021. “Spatial Transcriptomics: Molecular Maps of the Mammalian Brain.” Annual Review of Neuroscience 44 (1): 547–62.
Rosenbaum, Daniel M, Søren GF Rasmussen, and Brian K Kobilka. 2009. “The Structure and Function of g-Protein-Coupled Receptors.” Nature 459 (7245): 356–63.
Smith, Stephen J, Uygar Sümbül, Lucas T Graybuck, Forrest Collman, Sharmishtaa Seshamani, Rohan Gala, Olga Gliko, et al. 2019. “Single-Cell Transcriptomic Evidence for Dense Intracortical Neuropeptide Networks.” Elife 8: e47889.
Von Luxburg, Ulrike. 2007. “A Tutorial on Spectral Clustering.” Statistics and Computing 17: 395–416.
Yao, Zizhen, Cindy TJ van Velthoven, Michael Kunst, Meng Zhang, Delissa McMillen, Changkyu Lee, Won Jung, et al. 2023. “A High-Resolution Transcriptomic and Spatial Atlas of Cell Types in the Whole Mouse Brain.” Nature 624 (7991): 317–32.