Multi-VALUE: A toolkit for Cross-Dialectal NLP

About

⚠️   Problem: Dialect differences degrade the performance of language technologies for many groups of users. We show significant performance disparities across dialects in question answering (QA), machine translation (MT), and semantic parsing. For NLP to be fair and equitable, system performance should remain constant across dialect shifts in order to avoid allocative harms. We call this property dialect invariance.

📊   Data: Multi-VALUE is a suite of resources for evaluating and achieving English dialect invariance. It contains tools for systematically modifying written text in accordance with 189 attested linguistic patterns from 50 varieties of English. Researchers can use this to build dialect stress tests and train more robust models for their task using data augmentation.
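The two uses named above, stress tests and data augmentation, can be sketched as follows. This is a minimal sketch that assumes only that a dialect transformation is a function from one sentence to another (the `transform` method in the demo below has that shape). The helpers `build_stress_test`, `augment`, and `toy_completive_done` are hypothetical, and the toy rule is a crude string replacement standing in for Multi-VALUE's actual rules.

```python
from typing import Callable, List, Tuple

# Assumption: a dialect transformation is any str -> str function,
# e.g. the `transform` method of a Multi-VALUE dialect object.
Transform = Callable[[str], str]
Example = Tuple[str, str]  # (text, label)


def build_stress_test(examples: List[Example], transform: Transform) -> List[Example]:
    """Rewrite every input into the dialect while keeping its gold label."""
    return [(transform(text), label) for text, label in examples]


def augment(examples: List[Example], transform: Transform) -> List[Example]:
    """Append a dialect copy of each example whose text actually changed."""
    extra = []
    for text, label in examples:
        new_text = transform(text)
        if new_text != text:  # unchanged copies would only duplicate data
            extra.append((new_text, label))
    return examples + extra


def toy_completive_done(sentence: str) -> str:
    """Hypothetical stand-in rule: insert completive 'done' before 'talked'."""
    return sentence.replace(" talked", " done talked")


train = [("I talked with them yesterday", "neutral"), ("The movie was great", "positive")]
print(build_stress_test(train, toy_completive_done))
print(len(augment(train, toy_completive_done)))  # 3: two originals plus one changed copy
```

Comparing a model's score on the original test set with its score on the stress-test copy gives a direct estimate of the dialect gap for that task.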

Multi-VALUE: A Framework for Cross-Dialectal English NLP

🌐   Modeling: Multi-VALUE powers several plug-and-play methods for mitigating dialect discrepancies under different data scenarios. When the goal is to serve a specific community of speakers, task-agnostic dialect adapters provide invariance using a small number of pairwise translations¹. However, language use varies across time, place, and social context, which creates many low-data scenarios. We work toward addressing these by integrating expert linguistic knowledge into language models, both through dynamic aggregation of linguistic rules² and through model adaptation with a HyperNetwork conditioned on eWAVE features³.

Task-Agnostic Dialect Adapters for English¹
Dialect Adaptation via Dynamic Aggregation of Linguistic Rules²
Task-Agnostic Low-Rank Adapters for Unseen English Dialects³
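To make the HyperNetwork idea mentioned above more concrete, here is a toy sketch of a hypernetwork that maps a dialect's feature vector (for example, which eWAVE features the dialect uses) to the two factors of a low-rank adapter, so the generated weight update has rank at most r. It is a linear hypernetwork with random, untrained weights written in plain Python for illustration only. The class name `ToyHyperLoRA`, the dimensions, and the feature encoding are assumptions, not the architecture from the cited paper.

```python
import random
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product of an (n x k) and a (k x m) matrix."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


class ToyHyperLoRA:
    """Toy linear hypernetwork: dialect feature vector -> low-rank update A @ B."""

    def __init__(self, n_features: int, d: int, r: int, seed: int = 0):
        rng = random.Random(seed)
        self.d, self.r = d, r
        # Untrained random weights; each maps the feature vector to a flattened factor.
        self.w_a = [[rng.gauss(0.0, 0.1) for _ in range(d * r)] for _ in range(n_features)]
        self.w_b = [[rng.gauss(0.0, 0.1) for _ in range(r * d)] for _ in range(n_features)]

    @staticmethod
    def _project(feats: List[float], w: Matrix, rows: int, cols: int) -> Matrix:
        """Linear map of the features, reshaped into a rows x cols matrix."""
        flat = [sum(f * w[i][j] for i, f in enumerate(feats)) for j in range(rows * cols)]
        return [flat[k * cols:(k + 1) * cols] for k in range(rows)]

    def delta_weight(self, feats: List[float]) -> Matrix:
        """Generate a d x d adapter update of rank <= r for one dialect."""
        a = self._project(feats, self.w_a, self.d, self.r)  # d x r
        b = self._project(feats, self.w_b, self.r, self.d)  # r x d
        return matmul(a, b)


hyper = ToyHyperLoRA(n_features=4, d=3, r=1)
dialect_features = [1.0, 0.0, 1.0, 0.0]  # hypothetical presence/absence of 4 features
delta = hyper.delta_weight(dialect_features)
print(len(delta), len(delta[0]))  # 3 3
```

Because the adapter is generated from features rather than learned per dialect, a dialect never seen in training can still receive an update as long as its feature vector is known; in a real system the hypernetwork weights would be trained rather than random.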

Attribution: Multi-VALUE was designed and built at the SALT Lab 🧂 at Stanford University 🌲 and the Georgia Institute of Technology 🐝. This resource draws heavily from the Electronic World Atlas of Varieties of English (eWAVE), which aggregates the work of over 80 field linguists.

What Multi-VALUE can do for you

In short, Multi-VALUE can produce synthetic forms of dialectal text with modular control over which linguistic features are expressed. This allows you to systematically isolate and measure the effects of specific syntactic and morphological structures on the performance of English NLP systems. Research applications include:


📐   Benchmarking: ML researchers can more comprehensively evaluate task performance across domain shifts.


⚖️   Bias and Fairness: Fairness, accountability and transparency researchers can more directly examine the ways NLP systems systematically harm disadvantaged or protected groups.


🌏   Linguistic Typology: Computational linguists can systematically probe the internal representations of large language models in light of the literature on theoretical and field linguistics.


🌱   Low-resource NLP: Practitioners can adapt models to dialects which have limited labeled data by building on rich knowledge from dialectology.
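Because each linguistic feature can be toggled on its own, isolating the effect of individual structures, as described above, can be sketched as an ablation: apply one rule at a time and measure the accuracy drop. The rules, the keyword "model", and the names `per_feature_drop`, `zero_copula`, and `completive_done` are hypothetical stand-ins chosen so the example runs without Multi-VALUE or a trained model.

```python
from typing import Callable, Dict, List, Tuple

Example = Tuple[str, str]  # (text, label)
Rule = Callable[[str], str]


def per_feature_drop(data: List[Example], rules: Dict[str, Rule],
                     predict: Callable[[str], str]) -> Dict[str, float]:
    """Accuracy lost when each rule is applied alone to every example."""
    def accuracy(examples: List[Example]) -> float:
        return sum(predict(text) == label for text, label in examples) / len(examples)

    base = accuracy(data)
    return {name: base - accuracy([(rule(t), y) for t, y in data])
            for name, rule in rules.items()}


# Hypothetical toy rules (crude string edits, not Multi-VALUE's implementations).
rules = {
    "zero_copula": lambda s: s.replace(" is good", " good"),
    "completive_done": lambda s: s.replace(" talked", " done talked"),
}


def predict(text: str) -> str:
    """Hypothetical brittle "model" that relies on the exact phrase 'is good'."""
    return "positive" if " is good" in text else "other"


data = [("The food is good", "positive"), ("The movie is good", "positive"),
        ("I talked with them yesterday", "other")]
print(per_feature_drop(data, rules, predict))
```

Here the ablation attributes the whole loss to the copula rule and none to completive "done", which is the kind of per-feature attribution that a single all-features transformation cannot give.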

Getting Started

Use the following demo to start using Multi-VALUE.

Installation

pip install value-nlp

Demo

from multivalue import Dialects

# Create a transformer for Southeast American enclave dialects (a single call)
southern_usa = Dialects.SoutheastAmericanEnclaveDialect()
print(southern_usa)
# [VECTOR DIALECT] Southeast American enclave dialects (abbr: SEAmE)
#       Region: United States of America
#       Latitude: 34.2
#       Longitude: -80.9

# Rewrite a sentence into the dialect
southern_usa.transform("I talked with them yesterday")
# 'I done talked with them yesterday'

# Inspect which rules fired during the last transformation
southern_usa.executed_rules
# {(2, 7): {'value': 'done talked', 'type': 'completive_done'}}
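The `executed_rules` output above appears to map a span of the input to the rewritten text and the rule type. The following is a minimal sketch for tallying which rules fired across many sentences, which tells you what a stress test actually exercised. It assumes every entry has the `'type'` key shown in the demo and that you copy `executed_rules` after each `transform` call (we assume it describes only the most recent call). `count_rule_types` is a hypothetical helper, not part of the library.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Assumed shape, copied from the demo output:
# {(start, end): {'value': rewritten_text, 'type': rule_name}}
ExecutedRules = Dict[Tuple[int, int], Dict[str, str]]


def count_rule_types(logs: Iterable[ExecutedRules]) -> Counter:
    """Tally how often each rule type fired across many transformed sentences."""
    counts: Counter = Counter()
    for log in logs:
        counts.update(info["type"] for info in log.values())
    return counts


logs = [
    {(2, 7): {"value": "done talked", "type": "completive_done"}},  # from the demo
    {},  # a sentence that no rule changed
]
print(count_rule_types(logs))  # Counter({'completive_done': 1})
```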

Important Considerations

Limitations

🤔   Multi-VALUE does not cover orthographic (writing) or lexical variation; researchers can draw such variation from corpus data.


🤔   Multi-VALUE covers only what linguists have observed frequently enough to document, and the catalogue is incomplete. Speech and communication can vary in myriad forms not captured by this resource.


Additional Considerations

Please keep the following points in mind when using Multi-VALUE:


📐   Benchmarking: Dialects are not fixed or deterministic; they are living elements of the communities that speak them.


⚖️   Bias and Fairness: Synthetic transformations are designed for stress-testing NLP systems and not for impersonating spoken or written dialect.


Data Agreement

I will not use Multi-VALUE for malicious purposes, including but not limited to deception, impersonation, mockery, discrimination, hate speech, targeted harassment, and cultural appropriation. In my use of this resource, I will respect the dignity and privacy of all people.

References


@inproceedings{ziems-etal-2022-value,
    title = "{VALUE}: {U}nderstanding {D}ialect {D}isparity in {NLU}",
    author = "Ziems, Caleb and
      Chen, Jiaao and
      Harris, Camille and
      Anderson, Jessica and
      Yang, Diyi",
    booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
    month = may,
    year = "2022",
}

@inproceedings{ziems-etal-2022-multi-value,
    title = "{M}ulti-{VALUE}: {E}valuating {C}ross-dialectal {NLP}",
    author = "Held*, William and
      Ziems*, Caleb and
      Yang, Jingfeng and
      Dhamala, Jwala and
      Gupta, Rahul and
      Yang, Diyi",
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics",
    month = jul,
    year = "2023",
}

@inproceedings{held-etal-2023-tada,
    title = "{TADA}: {T}ask {A}gnostic {D}ialect {A}dapters for {E}nglish",
    author = "Held, William and
      Ziems, Caleb and
      Yang, Diyi",
    booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
    month = jul,
    year = "2023",
}

@inproceedings{liu-etal-2023-dada,
    title = "{DADA}: {D}ialect {A}daptation via {D}ynamic {A}ggregation of {L}inguistic {R}ules",
    author = "Liu, Yanchen and
      Held, William and
      Yang, Diyi",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
}

@inproceedings{xiao-etal-2023-task,
    title = "Task-Agnostic Low-Rank Adapters for Unseen English Dialects",
    author = "Xiao, Zedian and
      Held, William and
      Liu, Yanchen and
      Yang, Diyi",
    booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
    month = dec,
    year = "2023",
}

Measuring the Multi-Dialectal Gap

We find that current NLP systems show performance discrepancies on dialectal versions of popular benchmarks such as CoQA (conversational QA), Spider (semantic parsing), and WMT19 (machine translation). We report some of these discrepancies here.

CoQA Experiments

                          Test Dialect
Base Model    Train Set   SAE            AppE           ChcE           CollSgE        IndE           UAAVE          Average
BERT Base     SAE         77.2           77.4 (-3.8%)   76.6 (-0.7%)   61.5 (-25.4%)  70.8 (-9%)     71.2 (-8.4%)   71.9 (-7.3%)
              AppE        76.3 (-1.1%)   76.4 (-1%)     76.1 (-1.4%)   64.7 (-19.3%)  72.8 (-6%)     73.2 (-5.4%)   73.3 (-5.3%)
              ChcE        76.8           74.7 (-3.3%)   76.5 (-0.8%)   63.6 (-21.3%)  71.6 (-7.8%)   71.4 (-8.1%)   72.4 (-6.5%)
              CollSgE     75.7 (-1.9%)   74.1 (-4.2%)   75.5 (-2.2%)   74.7 (-3.3%)   73.6 (-4.8%)   73.4 (-5.1%)   74.5 (-3.6%)
              IndE        76.0 (-1.5%)   75.4 (-2.4%)   75.7 (-2%)     63.2 (-22%)    75.1 (-2.7%)   74.1 (-4.1%)   73.3 (-5.3%)
              UAAVE       76.1 (-1.4%)   75.6 (-2%)     76.0 (-1.5%)   64.6 (-19.5%)  74.5 (-3.6%)   75.3 (-2.5%)   73.7 (-4.7%)
              Multi       76.2 (-1.2%)   75.6 (-2%)     76.1 (-1.3%)   73.7 (-4.7%)   74.9 (-3.1%)   75.1 (-2.7%)   75.3 (-2.5%)
              In-Dialect  77.2           76.4 (-1%)     76.5 (-0.8%)   74.7 (-3.3%)   75.1 (-2.7%)   75.3 (-2.5%)   75.9 (-1.7%)
RoBERTa Base  SAE         81.8           79.1 (-3.4%)   81.5 (-0.3%)   68.8 (-18.9%)  76.1 (-7.5%)   76.6 (-6.7%)   77.3 (-5.8%)
              AppE        82.0 (0.3%)    81.8           81.8           71.2 (-14.9%)  79.0 (-3.5%)   79.6 (-2.8%)   79.2 (-3.2%)
              ChcE        81.7 (-0.1%)   79.3 (-3.1%)   81.5 (-0.4%)   68.8 (-18.9%)  76.5 (-7%)     77.3 (-5.9%)   77.5 (-5.5%)
              CollSgE     81.5 (-0.4%)   80.1 (-2.2%)   81.2 (-0.7%)   80.2 (-2%)     79.4 (-3%)     78.7 (-3.9%)   80.2 (-2%)
              IndE        81.1 (-0.8%)   80.5 (-1.5%)   80.9 (-1.1%)   67.2 (-21.7%)  80.3 (-1.9%)   79.2 (-3.3%)   78.2 (-4.6%)
              UAAVE       81.6 (-0.2%)   81.1 (-0.9%)   81.5 (-0.3%)   69.2 (-18.2%)  79.6 (-2.7%)   81.1 (-0.9%)   79.0 (-3.5%)
              Multi       80.6 (-1.5%)   80.4 (-1.7%)   80.5 (-1.6%)   78.5 (-4.2%)   79.7 (-2.7%)   80.0 (-2.2%)   80.0 (-2.3%)
              In-Dialect  81.8           81.8           81.5 (-0.4%)   80.2 (-2%)     80.3 (-1.9%)   81.1 (-0.9%)   81.1 (-0.9%)
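The page does not state how the parenthesized percentages are computed. The sketch below checks two candidate definitions against a few cells copied from the table, taking the SAE-trained model's SAE score (77.2 for BERT Base, 81.8 for RoBERTa Base) as the baseline. On these cells, (score - baseline) / score comes within about 0.15 percentage points of every reported value, while the conventional (score - baseline) / baseline misses by up to about 5 points. This is our inference from the numbers, not a documented definition; the function names are illustrative.

```python
from typing import Callable, List, Tuple

# (score, SAE-on-SAE baseline, reported percentage), copied from the table above
CELLS: List[Tuple[float, float, float]] = [
    (61.5, 77.2, -25.4),  # BERT Base, SAE train, CollSgE test
    (70.8, 77.2, -9.0),   # BERT Base, SAE train, IndE test
    (71.2, 77.2, -8.4),   # BERT Base, SAE train, UAAVE test
    (68.8, 81.8, -18.9),  # RoBERTa Base, SAE train, CollSgE test
    (76.1, 81.8, -7.5),   # RoBERTa Base, SAE train, IndE test
    (79.1, 81.8, -3.4),   # RoBERTa Base, SAE train, AppE test
]


def change_vs_baseline(score: float, baseline: float) -> float:
    """Conventional relative change: (score - baseline) / baseline, in percent."""
    return 100.0 * (score - baseline) / baseline


def change_vs_score(score: float, baseline: float) -> float:
    """Alternative: (score - baseline) / score, in percent."""
    return 100.0 * (score - baseline) / score


def max_error(formula: Callable[[float, float], float]) -> float:
    """Largest absolute gap between a formula and the reported percentages."""
    return max(abs(formula(s, b) - reported) for s, b, reported in CELLS)


for f in (change_vs_baseline, change_vs_score):
    print(f.__name__, round(max_error(f), 2))
```

The small residual error for the second formula is consistent with the table reporting percentages computed from unrounded scores.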