Language Resources and Evaluation Conference (LREC) 2026
Hyperbolic embeddings such as the Poincaré model effectively represent lexical hierarchies with low distortion, yet their cross-lingual generalizability remains largely unexplored. This study investigates cross-lingual transfer by training 20-dimensional Poincaré embeddings exclusively on Open English WordNet (OEWN) hypernymy relations and evaluating on aligned Chinese Wordnet (CWN) synsets under a vocabulary-constrained transfer setting, where CWN-relevant synsets appear in OEWN training data but no Chinese-language supervision is used. We report robust statistical evidence based on the final 10 training checkpoints: Poincaré embeddings achieve 2.57× higher Mean Reciprocal Rank (MRR) than Euclidean embeddings on CWN (0.030 ± 0.001 vs 0.012 ± 0.000, p < 0.001, Cohen’s d = 34.48) and 5.61× higher on OEWN (0.016 ± 0.000 vs 0.003 ± 0.000, p < 0.001, d = 42.48). Furthermore, hierarchical filtering leveraging the radial dimension of hyperbolic space provides substantial additional gains: +74.6% MRR improvement on CWN and +25.8% on OEWN (both p < 0.001). The model achieves higher absolute performance on the zero-shot CWN test set (MRR = 0.052 ± 0.002) than on the in-domain OEWN test set (MRR = 0.020 ± 0.001). We attribute this to structural alignment: CWN’s broader branching factor (4.32 vs 1.10) and moderate depth naturally suit hyperbolic geometry’s capacity to compactly represent hierarchies. Our findings demonstrate that geometric properties learned from English hypernymy transfer robustly across languages when semantic structures align. We release the aligned CWN–OEWN hypernymy evaluation dataset and complete evaluation framework to facilitate future research on geometry-based cross-lingual semantic modeling.
@inproceedings{ku-etal-2026-when,
title = {When Structure Matters: Cross-Lingual Hyperbolic Embeddings for Chinese and English Wordnets},
author = {Mao-Chang Ku and Da-Chen Lian and Pin-Er Chen and Po-Ya Angela Wang and Wei-Ling Chen and Shu-Kai Hsieh},
booktitle = {Proceedings of the Fifteenth Language Resources and Evaluation Conference (LREC 2026)},
month = {May},
year = {2026},
pages = {12054--12071},
address = {Palma, Mallorca, Spain},
publisher = {European Language Resources Association (ELRA)},
editor = {Piperidis, Stelios and Bel, Núria and van den Heuvel, Henk and Ide, Nancy and Krek, Simon and Toral, Antonio},
doi = {10.63317/55a4sr9mfucq},
abstract = {Hyperbolic embeddings such as the Poincaré model effectively represent lexical hierarchies with low distortion, yet their cross-lingual generalizability remains largely unexplored. This study investigates cross-lingual transfer by training 20-dimensional Poincaré embeddings exclusively on Open English WordNet (OEWN) hypernymy relations and evaluating on aligned Chinese Wordnet (CWN) synsets under a vocabulary-constrained transfer setting, where CWN-relevant synsets appear in OEWN training data but no Chinese-language supervision is used. We report robust statistical evidence based on the final 10 training checkpoints: Poincaré embeddings achieve 2.57× higher Mean Reciprocal Rank (MRR) than Euclidean embeddings on CWN (0.030 ± 0.001 vs 0.012 ± 0.000, p < 0.001, Cohen’s d = 34.48) and 5.61× higher on OEWN (0.016 ± 0.000 vs 0.003 ± 0.000, p < 0.001, d = 42.48). Furthermore, hierarchical filtering leveraging the radial dimension of hyperbolic space provides substantial additional gains: +74.6% MRR improvement on CWN and +25.8% on OEWN (both p < 0.001). The model achieves higher absolute performance on the zero-shot CWN test set (MRR = 0.052 ± 0.002) than on the in-domain OEWN test set (MRR = 0.020 ± 0.001). We attribute this to structural alignment: CWN’s broader branching factor (4.32 vs 1.10) and moderate depth naturally suit hyperbolic geometry’s capacity to compactly represent hierarchies. Our findings demonstrate that geometric properties learned from English hypernymy transfer robustly across languages when semantic structures align. We release the aligned CWN–OEWN hypernymy evaluation dataset and complete evaluation framework to facilitate future research on geometry-based cross-lingual semantic modeling.}
}
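The abstract in the entry above compares Poincaré and Euclidean embeddings by Mean Reciprocal Rank (MRR) and reports further gains from filtering candidates along the radial dimension. The following is a minimal sketch of those ideas on toy data, not the authors' evaluation code. The 2-dimensional vectors, the function names, and in particular the filtering rule (keep only candidates that lie closer to the origin than the query) are assumptions made for illustration; the paper may define hierarchical filtering differently.

```python
import math


def norm(u):
    """Euclidean norm; in the Poincaré ball, a smaller norm means nearer the origin."""
    return math.sqrt(sum(x * x for x in u))


def poincare_distance(u, v):
    """Geodesic distance between two points strictly inside the unit ball."""
    sq_u = sum(x * x for x in u)
    sq_v = sum(x * x for x in v)
    sq_diff = sum((a - b) ** 2 for a, b in zip(u, v))
    # d(u, v) = arcosh(1 + 2 * |u - v|^2 / ((1 - |u|^2) * (1 - |v|^2)))
    return math.acosh(1.0 + 2.0 * sq_diff / ((1.0 - sq_u) * (1.0 - sq_v)))


def mean_reciprocal_rank(queries, embeddings, distance, radial_filter=False):
    """Average of 1 / rank of the gold hypernym over (hyponym, hypernym) pairs.

    radial_filter is a hypothetical stand-in for hierarchical filtering:
    it keeps only candidates whose norm is smaller than the query's norm.
    """
    total = 0.0
    for child, gold in queries:
        candidates = [k for k in embeddings if k != child]
        if radial_filter:
            limit = norm(embeddings[child])
            candidates = [k for k in candidates if norm(embeddings[k]) < limit]
        ranked = sorted(
            candidates, key=lambda k: distance(embeddings[child], embeddings[k])
        )
        if gold in ranked:  # a filtered-out gold label contributes 0
            total += 1.0 / (ranked.index(gold) + 1)
    return total / len(queries)


# Toy hierarchy (assumed values): general concepts sit near the origin.
toy = {
    "entity": (0.0, 0.0),
    "animal": (0.5, 0.0),
    "dog": (0.8, 0.1),
    "cat": (0.8, -0.1),
}
pairs = [("dog", "animal"), ("cat", "animal")]

plain = mean_reciprocal_rank(pairs, toy, poincare_distance)
filtered = mean_reciprocal_rank(pairs, toy, poincare_distance, radial_filter=True)
print(plain, filtered)
```

On this toy data the sibling concept ("cat" for "dog" and vice versa) outranks the true hypernym without filtering, giving an MRR of 0.5. With the assumed radial filter the sibling is excluded and the MRR rises to 1.0.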
Empowering Elementary Learning: Utilizing Large Language Models to Craft Tailored Textbooks with Expert Insight
Large language models (LLMs) have in recent years spurred research across various sectors, owing to their remarkable zero-shot or few-shot performance. This capability has become indispensable for individuals seeking to integrate these language models into their workflows effectively. In this paper, based on in-depth linguistic analyses, we explore the application of an LLM, specifically GPT-4, in generating Chinese language textbooks tailored for grade school students. This encompasses the creation of main lesson texts alongside accompanying Chinese character exercises. Experimental results suggest that the LLM-generated textbook lessons are a viable research direction. The initial outcomes demonstrate the ability of LLM to generate texts of satisfactory quality appropriate for a specified grade level. The contributions of this work include pioneering the quantitative analysis of Chinese language textbooks for native speakers in Taiwan and leveraging an LLM to automatically generate textbook content and accompanying Chinese character exercises targeted at native Chinese speakers, which is a novel approach facilitated by the development of prompts tailored to different language learning levels. The study also conducts quantitative and qualitative comparisons between machine-generated lessons and those developed by educational professionals in Taiwan.
@article{lian_empowering_2025,
title = {Empowering Elementary Learning: Utilizing Large Language Models to Craft Tailored Textbooks with Expert Insight},
author = {Da-Chen Lian and Mao-Chang Ku and Po-Ya Angela Wang and Wei-Ling Chen and Shu-Kai Hsieh},
journal = {Journal of Library and Information Studies},
year = {2025},
}
Vec2Gloss: definition modeling leveraging contextualized vectors with Wordnet gloss
Yu-Hsiang Tseng, Mao-Chang Ku, Wei-Ling Chen, Yu-Lin Chang, Shu-Kai Hsieh
Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation 2023
Contextualized embeddings have proven to be powerful tools in various NLP tasks. However, their interpretability and how they encode lexical semantics remain challenging issues. In this paper, we tackle this problem by using definition modeling, a technique that aims to generate human-readable definitions for words, as a means to evaluate and understand high-dimensional semantic vectors. We introduce the Vec2Gloss model, which generates glosses from the contextualized embeddings of target words. The systematic gloss patterns provided by Chinese Wordnet enable us to examine the mechanism behind the model’s gloss generation. To delve deeper into this mechanism, we devise two dependency indices to measure the semantic and contextual dependencies of the generated glosses. These indices allow us to analyze the generated texts at both the gloss and token levels. Our results demonstrate that the proposed Vec2Gloss model enhances our understanding of lexical semantics in contextualized embeddings.
@inproceedings{tseng_vec2gloss_2023,
title = {Vec2Gloss: definition modeling leveraging contextualized vectors with Wordnet gloss},
author = {Yu-Hsiang Tseng and Mao-Chang Ku and Wei-Ling Chen and Yu-Lin Chang and Shu-Kai Hsieh},
booktitle = {Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation},
year = {2023},
}