Multi-VALUE:
  A toolkit for Cross-Dialectal NLP

About

🤔   Problem: Dialect differences cause performance degradation for many users of language technologies. If we want fair, inclusive, and equitable NLP, our systems need to be dialect invariant: performance should remain constant across dialect shifts.

💡   Solution: Multi-VALUE is a suite of resources for evaluating and achieving English dialect invariance. It contains tools for systematically modifying written text in accordance with 189 attested linguistic patterns from 50 varieties of English. Researchers can use this to (1) build dialect stress tests and (2) train more robust models using Multi-VALUE for data augmentation.

🧪   Experiments: You can reproduce experiments showing significant performance disparities on dialect versions of question answering (CoQA), machine translation (WMT19), and semantic parsing (Spider) tasks. To close these gaps, you can start by training on synthetic dialect data, as sketched below.
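For a concrete picture of that augmentation step, here is a minimal sketch in Python. It assumes the value package shown in the demo below; the toy training pairs are illustrative, and IndianDialect is an assumed class name analogous to the demo's HongKongDialect.

import value

# Toy (text, label) training pairs; substitute your own dataset.
train_set = [("I talked with them yesterday", "past")]

# HongKongDialect appears in the demo below; IndianDialect is an
# assumed analogous class name, used here only for illustration.
dialects = [value.HongKongDialect(), value.IndianDialect()]

# Mix dialect-transformed copies back into the training data.
# Labels are preserved: the transformations change surface form only.
augmented = list(train_set)
for text, label in train_set:
    for dialect in dialects:
        augmented.append((dialect.transform(text), label))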


Attribution: Multi-VALUE was designed and built at the SALT Lab 🧂 at Stanford University 🌲 and the Georgia Institute of Technology 🐝 and was supported by Amazon Fairness in AI. This resource draws heavily from the Electronic World Atlas of Varieties of English (eWAVE), which aggregates the work of over 80 field linguists.

What Multi-VALUE can do for you

In short, Multi-VALUE can produce synthetic forms of dialectal text with modular control over which linguistic features are expressed. This allows you to systematically isolate and measure the effects of specific syntactic and morphological structures on the performance of English NLP systems (see the sketch after this list). Research applications include:


📐   Benchmarking: ML researchers can more comprehensively evaluate task performance across domain shifts.


⚖️   Bias and Fairness: Fairness, accountability and transparency researchers can more directly examine the ways NLP systems systematically harm disadvantaged or protected groups.


🌏   Linguistic Typology: Computational linguists can systematically probe the internal representations of large language models in light of the theoretical and field linguistics literature.


🌱   Low-resource NLP: Practitioners can adapt models to low-resource dialects.
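As a sketch of the modular feature control mentioned above: this page does not document a constructor for custom feature sets, so value.DialectFromFeatureList and its argument are assumptions; the feature name 'bare_past_tense' is the rule type that executed_rules() reports in the demo below.

import value

# Hypothetical constructor (not documented on this page): build a
# synthetic variety expressing only the chosen features, so the effect
# of each pattern on task performance can be isolated.
probe = value.DialectFromFeatureList(["bare_past_tense"])
print(probe.transform("I talked with them yesterday"))
# Expected, if only past-tense unmarking applies: 'I talk with them yesterday.'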

Getting Started

Use the following demo to get started with Multi-VALUE.

Demo

>>> import value                          # load the Multi-VALUE toolkit
>>> hke = value.HongKongDialect()         # instantiate a dialect transformer
>>> print(hke)
[VECTOR DIALECT] Hong Kong English (abbr: HKE)
      Region: South and Southeast Asia
      Latitude: 22.26
      Longitude: 114.25

>>> hke.transform("I talked with them yesterday")   # rewrite SAE text in this dialect
'I talk with them yesterday.'

>>> hke.executed_rules()                  # rules that fired, keyed by character span
{(2,7): {'value': 'talk', 'type': 'bare_past_tense'}}
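
Building on this demo, a dialect stress test applies the same transform across an entire evaluation set and compares scores. In the sketch below, the toy examples and model.predict are placeholders for your own benchmark and inference harness.

import value

hke = value.HongKongDialect()

# Toy labeled examples; substitute your benchmark's evaluation split.
eval_set = [("I talked with them yesterday", "past")]
stress_set = [(hke.transform(text), label) for text, label in eval_set]

def accuracy(model, examples):
    # model.predict is a placeholder for your task-specific inference call.
    return sum(model.predict(text) == label for text, label in examples) / len(examples)

# The gap between the two scores is the model's dialect disparity:
#   accuracy(model, eval_set) - accuracy(model, stress_set)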

Important Considerations

Limitations

🤔   Multi-VALUE does not cover orthographic (writing) or lexical variation; researchers can draw such variation from corpus data.


🤔   Multi-VALUE covers only what linguists have observed frequently enough to document, and the catalogue is incomplete. Speech and communication can vary in myriad forms not captured by this resource.


Additional Considerations

Please keep the following points in mind when using Multi-VALUE:


📐   Benchmarking: Dialects are not fixed or deterministic; they are living elements of the communities that speak them.


⚖️   Bias and Fairness: Synthetic transformations are designed for stress-testing NLP systems and not for impersonating spoken or written dialect.


Data Agreement

I will not use Multi-VALUE for malicious purposes, including but not limited to: deception, impersonation, mockery, discrimination, hate speech, targeted harassment, and cultural appropriation. In my use of this resource, I will respect the dignity and privacy of all people.

BibTeX



@inproceedings{ziems-etal-2022-multi-value,
    title = "Multi-VALUE: Evaluating Cross-dialectal NLP",
    author = "Ziems, Caleb and
        Held, Will and
        Yang, Jingfeng and
        Yang, Diyi"
}


Experiments

We find that current NLP systems show significant discrepancies on dialect versions of popular benchmarks, including CoQA (conversational question answering), Spider (semantic parsing), and WMT19 (machine translation). We report some of these discrepancies here.

CoQA Experiments

CoQA results by training set (rows) and test dialect (columns); parenthesized values are relative performance changes. Abbreviations: SAE = Standard American English, AppE = Appalachian English, ChcE = Chicano English, CollSgE = Colloquial Singapore English, IndE = Indian English, UAAVE = Urban African American Vernacular English.

Model          Train Set    SAE            AppE           ChcE           CollSgE        IndE           UAAVE          Average
BERT Base      SAE          77.2           77.4 (-3.8%)   76.6 (-0.7%)   61.5 (-25.4%)  70.8 (-9%)     71.2 (-8.4%)   71.9 (-7.3%)
               AppE         76.3 (-1.1%)   76.4 (-1%)     76.1 (-1.4%)   64.7 (-19.3%)  72.8 (-6%)     73.2 (-5.4%)   73.3 (-5.3%)
               ChcE         76.8           74.7 (-3.3%)   76.5 (-0.8%)   63.6 (-21.3%)  71.6 (-7.8%)   71.4 (-8.1%)   72.4 (-6.5%)
               CollSgE      75.7 (-1.9%)   74.1 (-4.2%)   75.5 (-2.2%)   74.7 (-3.3%)   73.6 (-4.8%)   73.4 (-5.1%)   74.5 (-3.6%)
               IndE         76.0 (-1.5%)   75.4 (-2.4%)   75.7 (-2%)     63.2 (-22%)    75.1 (-2.7%)   74.1 (-4.1%)   73.3 (-5.3%)
               UAAVE        76.1 (-1.4%)   75.6 (-2%)     76.0 (-1.5%)   64.6 (-19.5%)  74.5 (-3.6%)   75.3 (-2.5%)   73.7 (-4.7%)
               Multi        76.2 (-1.2%)   75.6 (-2%)     76.1 (-1.3%)   73.7 (-4.7%)   74.9 (-3.1%)   75.1 (-2.7%)   75.3 (-2.5%)
               In-Dialect   77.2           76.4 (-1%)     76.5 (-0.8%)   74.7 (-3.3%)   75.1 (-2.7%)   75.3 (-2.5%)   75.9 (-1.7%)
RoBERTa Base   SAE          81.8           79.1 (-3.4%)   81.5 (-0.3%)   68.8 (-18.9%)  76.1 (-7.5%)   76.6 (-6.7%)   77.3 (-5.8%)
               AppE         82.0 (0.3%)    81.8           81.8           71.2 (-14.9%)  79.0 (-3.5%)   79.6 (-2.8%)   79.2 (-3.2%)
               ChcE         81.7 (-0.1%)   79.3 (-3.1%)   81.5 (-0.4%)   68.8 (-18.9%)  76.5 (-7%)     77.3 (-5.9%)   77.5 (-5.5%)
               CollSgE      81.5 (-0.4%)   80.1 (-2.2%)   81.2 (-0.7%)   80.2 (-2%)     79.4 (-3%)     78.7 (-3.9%)   80.2 (-2%)
               IndE         81.1 (-0.8%)   80.5 (-1.5%)   80.9 (-1.1%)   67.2 (-21.7%)  80.3 (-1.9%)   79.2 (-3.3%)   78.2 (-4.6%)
               UAAVE        81.6 (-0.2%)   81.1 (-0.9%)   81.5 (-0.3%)   69.2 (-18.2%)  79.6 (-2.7%)   81.1 (-0.9%)   79.0 (-3.5%)
               Multi        80.6 (-1.5%)   80.4 (-1.7%)   80.5 (-1.6%)   78.5 (-4.2%)   79.7 (-2.7%)   80.0 (-2.2%)   80.0 (-2.3%)
               In-Dialect   81.8           81.8           81.5 (-0.4%)   80.2 (-2%)     80.3 (-1.9%)   81.1 (-0.9%)   81.1 (-0.9%)