Pin-Er Chen

When Structure Matters: Cross-Lingual Hyperbolic Embeddings for Chinese and English Wordnets

Mao-Chang Ku, Da-Chen Lian, Pin-Er Chen, Po-Ya Angela Wang, Wei-Ling Chen, Shu-Kai Hsieh

Language Resources and Evaluation Conference (LREC) 2026

Continual Pre-Training is (not) What You Need in Domain Adaption

Pin-Er Chen, Da-Chen Lian, Shu-Kai Hsieh, Sieh-Chuen Huang, Hsuan-Lei Shao, Jun-Wei Chiu, Yang-Hsien Lin, Zih-Ching Chen, Eddie TC Huang, Simon See

arXiv preprint arXiv:2504.13603 2025

The recent advances in Legal Large Language Models (LLMs) have transformed the landscape of legal research and practice by automating tasks, enhancing research precision, and supporting complex decision-making processes. However, effectively adapting LLMs to the legal domain remains challenging due to the complexity of legal reasoning, the need for precise interpretation of specialized language, and the potential for hallucinations. This paper examines the efficacy of Domain-Adaptive Continual Pre-Training (DACP) in improving the legal reasoning capabilities of LLMs. Through a series of experiments on legal reasoning tasks within the Taiwanese legal framework, we demonstrate that while DACP enhances domain-specific knowledge, it does not uniformly improve performance across all legal tasks. We discuss the trade-offs involved in DACP, particularly its impact on model generalization and performance in prompt-based tasks, and propose directions for future research to optimize domain adaptation strategies in legal AI.

The semantic relations in LLMs: An information-theoretic compression approach

Yu-Hsiang Tseng, Pin-Er Chen, Da-Chen Lian, Shu-Kai Hsieh

Proceedings of the Workshop: Bridging Neurons and Symbols for Natural Language Processing and Knowledge Graphs Reasoning (NeusymBridge) @ LREC-COLING 2024

Compressibility is closely related to the predictability of the texts from the information theory viewpoint. As large language models (LLMs) are trained to maximize the conditional probabilities of upcoming words, they may capture the subtlety and nuances of the semantic constraints underlying the texts, and texts aligning with the encoded semantic constraints are more compressible than those that do not. This paper systematically tests whether and how LLMs can act as compressors of semantic pairs. Using semantic relations from English and Chinese Wordnet, we empirically demonstrate that texts with correct semantic pairings are more compressible than incorrect ones, measured by the proposed compression advantages index. We also show that, with the Pythia model suite and a fine-tuned model on Chinese Wordnet, compression capacities are modulated by the model’s seen data. These findings are consistent with the view that LLMs encode the semantic knowledge as underlying constraints learned from texts and can act as compressors of semantic information or potentially other structured knowledge.

Lexical Retrieval Hypothesis in Multimodal Context

Po-Ya Angela Wang, Pin-Er Chen, Hsin-Yu Chou, Yu-Hsiang Tseng, Shu-Kai Hsieh

Proceedings of the 4th Conference on Language, Data and Knowledge 2023

Multimodal corpora have become an essential language resource for language science and grounded natural language processing (NLP) systems due to the growing need to understand and interpret human communication across various channels. In this paper, we first present our efforts in building the first Multimodal Corpus for Languages in Taiwan (MultiMoco). Based on the corpus, we conduct a case study investigating the Lexical Retrieval Hypothesis (LRH), specifically examining whether the hand gestures co-occurring with speech constants facilitate lexical retrieval or serve other discourse functions. With detailed annotations on eight parliamentary interpellations in Taiwan Mandarin, we explore the co-occurrence between speech constants and non-verbal features (i.e., head movement, face movement, hand gesture, and function of hand gesture). Our findings suggest that while hand gestures do serve as facilitators for lexical retrieval in some cases, they also serve the purpose of information emphasis. This study highlights the potential of the MultiMoco Corpus to provide an important resource for in-depth analysis and further research in multimodal communication studies.

Exploring affordance and situated meaning in image captions: A multimodal analysis

Pin-Er Chen, Po-Ya Angela Wang, Hsin-Yu Chou, Yu-Hsiang Tseng, Shu-Kai Hsieh

Proceedings of the 37th Pacific Asia Conference on Language, Information and Computation 2023

This paper explores the grounding issue regarding multimodal semantic representation from a computational cognitive-linguistic view. We annotate images from the Flickr30k dataset with five perceptual properties: Affordance, Perceptual Salience, Object Number, Gaze Cueing, and Ecological Niche Association (ENA), and examine their association with textual elements in the image captions. Our findings reveal that images with Gibsonian affordance show a higher frequency of captions containing 'holding-verbs' and 'container-nouns' compared to images displaying telic affordance. Perceptual Salience, Object Number, and ENA are also associated with the choice of linguistic expressions. Our study demonstrates that comprehensive understanding of objects or events requires cognitive attention, semantic nuances in language, and integration across multiple modalities. We highlight the vital importance of situated meaning and affordance grounding in natural language understanding, with the potential to advance human-like interpretation in various scenarios.

陳品而

Research

Affiliated Projects

Chinese Wordnet

MultiMoCo NTU

Academic Output

Affiliated Publications

When Structure Matters: Cross-Lingual Hyperbolic Embeddings for Chinese and English Wordnets

Continual Pre-Training is (not) What You Need in Domain Adaption

The semantic relations in LLMs: An information-theoretic compression approach

Lexical Retrieval Hypothesis in Multimodal Context

Exploring affordance and situated meaning in image captions: A multimodal analysis

Let's explore language
frontiers together.
