Formal Semantic Control over Language Models

About

This thesis advances semantic representation learning to make language representations or models more semantically and geometrically interpretable, and to enable localised, quasi-symbolic, compositional control by deliberately shaping the geometry of their latent spaces. We pursue this goal within a variational autoencoder (VAE) framework, exploring two complementary research directions: (i) sentence-level learning and control, which disentangles and manipulates specific semantic features in the latent space to guide sentence generation, with explanatory text serving as the testbed; and (ii) reasoning-level learning and control, which isolates and steers inference behaviours in the latent space to control natural language inference (NLI). In this second direction, we focus on Explanatory NLI tasks, in which a conclusion is inferred from two premises (explanations). The overarching objective is to move toward language models whose internal semantic representations can be systematically interpreted, precisely structured, and reliably directed. Throughout the thesis, we introduce a set of novel theoretical frameworks and practical methodologies, together with corresponding experiments, to demonstrate that our approaches enhance both the interpretability and the controllability of latent spaces for natural language.
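The abstract describes shaping and steering a VAE latent space but gives no code. As a generic, non-authoritative illustration (not the thesis's own method), the sketch below shows the standard VAE building blocks that latent-space control of this kind relies on: reparameterised sampling, the closed-form KL regulariser against a standard normal prior, linear interpolation between two latent codes, and shifting a code along a direction vector. All function names and the `direction` vector are hypothetical; a real system would obtain `mu`/`logvar` from a trained encoder and pass the resulting codes to a decoder.

```python
import math
import random
from typing import List, Sequence

Vector = List[float]


def reparameterize(mu: Sequence[float], logvar: Sequence[float], rng: random.Random) -> Vector:
    """Sample z ~ N(mu, diag(exp(logvar))) as z = mu + sigma * eps with eps ~ N(0, 1)."""
    return [m + math.exp(0.5 * lv) * rng.gauss(0.0, 1.0) for m, lv in zip(mu, logvar)]


def kl_to_standard_normal(mu: Sequence[float], logvar: Sequence[float]) -> float:
    """Closed-form KL(N(mu, diag(exp(logvar))) || N(0, I)), the usual VAE regulariser."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, logvar))


def interpolate(z_a: Sequence[float], z_b: Sequence[float], steps: int) -> List[Vector]:
    """Points on the straight line from z_a to z_b, endpoints included."""
    if steps < 2:
        raise ValueError("steps must be at least 2")
    path = []
    for i in range(steps):
        t = i / (steps - 1)
        path.append([(1.0 - t) * a + t * b for a, b in zip(z_a, z_b)])
    return path


def steer(z: Sequence[float], direction: Sequence[float], alpha: float) -> Vector:
    """Move z along a (hypothetical) semantic direction by step size alpha."""
    return [zi + alpha * di for zi, di in zip(z, direction)]


if __name__ == "__main__":
    rng = random.Random(0)
    z = reparameterize([0.5, -1.0, 0.0], [0.0, -2.0, 0.0], rng)
    print(len(z))  # 3
    path = interpolate([0.0, 0.0, 0.0], [1.0, 2.0, -1.0], steps=5)
    print(path[2])  # [0.5, 1.0, -0.5]
    print(steer([0.0, 0.0], [1.0, 0.0], 2.0))  # [2.0, 0.0]
```

Linear interpolation is only the simplest path between two codes; the smoothness of such paths, and whether a single direction controls a single semantic feature, depend on how the latent geometry has been shaped during training.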

Yingji Zhang • 2026

Related benchmarks

Task	Dataset	Result
Mathematical Reasoning	Mathematics out-of-domain (test)	--	30
Conclusion Generation	EntailmentBank (test)	BLEU 42	26
Sentence Interpolation Smoothness	ARGO randomly sampled 200 sentence pairs	Average IS 0.282	22
Autoencoding	Mathematical expressions EVAL (test)	BLEU 98	22
Natural Language Inference	EntailmentBank (test)	--	20
Language modelling	Explanatory sentences	BLEU 65	19
Language modelling	Mathematical expression EVAL (test)	Exact Match 100	19
Disentanglement	ARG0	Accuracy 98	18
Autoencoding	Explanatory sentences (test)	BLEU 82	13
Explanatory Inference	EntailmentBank	BLEU 46	12
