Multi-VALUE: A Framework for Cross-Dialectal English NLP

About
⚠️ Problem: Dialect differences cause performance issues for many users of language technologies. We show significant performance disparities across dialects on question answering (QA), machine translation (MT), and semantic parsing tasks. If we want fair and equitable NLP, system performance should remain constant over dialect shifts, a property we call dialect invariance, in order to avoid allocative harms.
📊 Data: Multi-VALUE is a suite of resources for evaluating and achieving English dialect invariance. It contains tools for systematically modifying written text in accordance with 189 attested linguistic patterns from 50 varieties of English. Researchers can use these tools to build dialect stress tests and, via data augmentation, to train more robust models for their tasks.
🌐 Modeling: Multi-VALUE powers several plug-and-play methods for mitigating dialect discrepancies in different data scenarios. When the goal is to serve a specific community of speakers, task-agnostic dialect adapters provide invariance using a small number of pairwise translations [1]. However, language use varies across time, place, and social context, creating many low-data scenarios. We work towards addressing these by integrating expert linguistic knowledge into language models through dynamic aggregation of linguistic rules [2] and through model adaptation with a HyperNetwork conditioned on eWAVE features [3].
[1] Task-Agnostic Dialect Adapters for English
[2] Dialect Adaptation via Dynamic Aggregation of Linguistic Rules
[3] Task-Agnostic Low-Rank Adapters for Unseen English Dialects
Attribution: Multi-VALUE was designed and built at the SALT Lab 🧂 at Stanford University 🌲 and the Georgia Institute of Technology 🐝. This resource draws heavily from the Electronic World Atlas of Varieties of English (eWAVE), which aggregates the work of over 80 field linguists.
What Multi-VALUE can do for you
In short, Multi-VALUE can produce synthetic forms of dialectal text with modular control over which linguistic features are expressed. This allows you to systematically isolate and measure the effects of specific syntactic and morphological structures on the performance of English NLP systems. Research applications include:
📐 Benchmarking: ML researchers can more comprehensively evaluate task performance across dialect shifts.
⚖️ Bias and Fairness: Fairness, accountability and transparency researchers can more directly examine the ways NLP systems systematically harm disadvantaged or protected groups.
🌏 Linguistic Typology: Computational linguists can systematically probe the internal representations of large language models against patterns documented in the theoretical and field linguistics literature.
🌱 Low-resource NLP: Practitioners can adapt models to dialects with limited labeled data by building on rich knowledge from dialectology (see the sketch after this list).
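To make the augmentation workflow concrete, here is a minimal sketch that maps a dialect transformer over a toy training set. It assumes the `multivalue` module and class names shown in the Getting Started demo below, so verify them against the released API.

```python
# Sketch: augment SAE training data with synthetic dialect variants.
# The `multivalue` module and the dialect class name are assumptions
# mirrored from the demo output in Getting Started.
from multivalue import Dialects

# Toy stand-ins for a real labeled training corpus.
sae_train = [
    "I talked with them yesterday",
    "She has already finished the work",
]

dialect = Dialects.SoutheastAmericanEnclaveDialect()  # assumed class name

# Keep the original examples and add their dialect-transformed twins.
augmented_train = sae_train + [dialect.transform(s) for s in sae_train]
```

Because each transformer object encodes a fixed feature vector, repeating the comprehension over several dialect objects yields a multi-dialect training mix like the "Multi" rows in the CoQA table below.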
Getting Started
Use the following demo to get started with Multi-VALUE.
Installation
```
pip install value-nlp
```
Demo
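Below is a minimal sketch of the demo. It assumes the `value-nlp` package installs a `multivalue` module whose dialect classes are named after their eWAVE varieties and expose a `transform` method plus an `executed_rules` record; these names are inferred from the output that follows, so check them against the released API.

```python
# Sketch only: module, class, and attribute names are assumptions
# inferred from the demo output shown below.
from multivalue import Dialects

# Build a transformer for the Southeast American enclave dialects (SEAmE).
dialect = Dialects.SoutheastAmericanEnclaveDialect()
print(dialect)  # dialect metadata: name, region, coordinates

# Rewrite a Standard American English sentence into the dialect.
print(dialect.transform("I talked with them yesterday"))

# Inspect which attested features fired, keyed by character span.
print(dialect.executed_rules)
```

Running this prints output like the following: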
```
[VECTOR DIALECT] Southeast American enclave dialects (abbr: SEAmE)
Region: United States of America
Latitude: 34.2
Longitude: -80.9
'I done talked with them yesterday'
{(2, 7): {'value': 'done talked', 'type': 'completive_done'}}
```
Important Considerations
Limitations
🤔 Multi-VALUE does not cover orthographic (writing) or lexical variation; researchers can draw such variation from corpus data.
🤔 Multi-VALUE covers only what linguists have observed frequently enough to document, and the catalogue is incomplete. Speech and communication can vary in myriad forms not captured by this resource.
Additional Considerations
Please keep the following points in mind when using Multi-VALUE:
📐 Benchmarking: Dialects are not fixed or deterministic – they are living elements of the communities which speak them.
⚖️ Bias and Fairness: Synthetic transformations are designed for stress-testing NLP systems and not for impersonating spoken or written dialect.
Data Agreement
I will not use Multi-VALUE for malicious purposes, including but not limited to: deception, impersonation, mockery, discrimination, hate speech, targeted harassment, and cultural appropriation. In my use of this resource, I will respect the dignity and privacy of all people.
References
@inproceedings{ziems-etal-2022-value,
title = "{VALUE}: {U}nderstanding {D}ialect {D}isparity in {NLU}",
author = "Ziems, Caleb and
Chen, Jiaao and
Harris, Camille and
Anderson, Jessica and
Yang, Diyi",
booktitle = "Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics",
month = may,
year = "2022",
}
@inproceedings{ziems-etal-2022-multi-value,
title = "{M}ulti-{VALUE}: {E}valuating {C}ross-dialectal {NLP}",
author = "Held*, William and
Ziems*, Caleb and
Yang, Jingfeng and
Dhamala, Jwala and
Gupta, Rahul and
Yang, Diyi",
booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics",
month = jul,
year = "2023",
}
@inproceedings{held-etal-2023-tada,
title = "{TADA} : {T}ask {A}gnostic {D}ialect {A}dapters for {E}nglish",
author = "Held, William and
Ziems, Caleb and
Yang, Diyi",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2023",
month = jul,
year = "2023",
}
@inproceedings{liu-etal-2023-dada,
title = "{DADA}: {D}ialect {A}daptation via {D}ynamic {A}ggregation of {L}inguistic {R}ules",
author = "Liu, Yanchen and
Held, William and
Yang, Diyi",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2023",
}
@inproceedings{xiao-etal-2023-task,
title = "{T}ask-{A}gnostic {L}ow-{R}ank {A}dapters for {U}nseen {E}nglish {D}ialects",
author = "Xiao, Zedian and
Held, William and
Liu, Yanchen and
Yang, Diyi",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2023",
}
CoQA Experiments

Rows give the training set and columns give the test dialect; cells report CoQA scores, with relative changes in parentheses. Dialect abbreviations: SAE (Standard American English), AppE (Appalachian English), ChcE (Chicano English), CollSgE (Colloquial Singapore English), IndE (Indian English), UAAVE (Urban African American Vernacular English).

| Model | Train Set | SAE | AppE | ChcE | CollSgE | IndE | UAAVE | Average |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BERT Base | SAE | 77.2 | 77.4 (-3.8%) | 76.6 (-0.7%) | 61.5 (-25.4%) | 70.8 (-9%) | 71.2 (-8.4%) | 71.9 (-7.3%) |
| | AppE | 76.3 (-1.1%) | 76.4 (-1%) | 76.1 (-1.4%) | 64.7 (-19.3%) | 72.8 (-6%) | 73.2 (-5.4%) | 73.3 (-5.3%) |
| | ChcE | 76.8 | 74.7 (-3.3%) | 76.5 (-0.8%) | 63.6 (-21.3%) | 71.6 (-7.8%) | 71.4 (-8.1%) | 72.4 (-6.5%) |
| | CollSgE | 75.7 (-1.9%) | 74.1 (-4.2%) | 75.5 (-2.2%) | 74.7 (-3.3%) | 73.6 (-4.8%) | 73.4 (-5.1%) | 74.5 (-3.6%) |
| | IndE | 76.0 (-1.5%) | 75.4 (-2.4%) | 75.7 (-2%) | 63.2 (-22%) | 75.1 (-2.7%) | 74.1 (-4.1%) | 73.3 (-5.3%) |
| | UAAVE | 76.1 (-1.4%) | 75.6 (-2%) | 76.0 (-1.5%) | 64.6 (-19.5%) | 74.5 (-3.6%) | 75.3 (-2.5%) | 73.7 (-4.7%) |
| | Multi | 76.2 (-1.2%) | 75.6 (-2%) | 76.1 (-1.3%) | 73.7 (-4.7%) | 74.9 (-3.1%) | 75.1 (-2.7%) | 75.3 (-2.5%) |
| | In-Dialect | 77.2 | 76.4 (-1%) | 76.5 (-0.8%) | 74.7 (-3.3%) | 75.1 (-2.7%) | 75.3 (-2.5%) | 75.9 (-1.7%) |
| RoBERTa Base | SAE | 81.8 | 79.1 (-3.4%) | 81.5 (-0.3%) | 68.8 (-18.9%) | 76.1 (-7.5%) | 76.6 (-6.7%) | 77.3 (-5.8%) |
| | AppE | 82.0 (0.3%) | 81.8 | 81.8 | 71.2 (-14.9%) | 79.0 (-3.5%) | 79.6 (-2.8%) | 79.2 (-3.2%) |
| | ChcE | 81.7 (-0.1%) | 79.3 (-3.1%) | 81.5 (-0.4%) | 68.8 (-18.9%) | 76.5 (-7%) | 77.3 (-5.9%) | 77.5 (-5.5%) |
| | CollSgE | 81.5 (-0.4%) | 80.1 (-2.2%) | 81.2 (-0.7%) | 80.2 (-2%) | 79.4 (-3%) | 78.7 (-3.9%) | 80.2 (-2%) |
| | IndE | 81.1 (-0.8%) | 80.5 (-1.5%) | 80.9 (-1.1%) | 67.2 (-21.7%) | 80.3 (-1.9%) | 79.2 (-3.3%) | 78.2 (-4.6%) |
| | UAAVE | 81.6 (-0.2%) | 81.1 (-0.9%) | 81.5 (-0.3%) | 69.2 (-18.2%) | 79.6 (-2.7%) | 81.1 (-0.9%) | 79.0 (-3.5%) |
| | Multi | 80.6 (-1.5%) | 80.4 (-1.7%) | 80.5 (-1.6%) | 78.5 (-4.2%) | 79.7 (-2.7%) | 80.0 (-2.2%) | 80.0 (-2.3%) |
| | In-Dialect | 81.8 | 81.8 | 81.5 (-0.4%) | 80.2 (-2%) | 80.3 (-1.9%) | 81.1 (-0.9%) | 81.1 (-0.9%) |