Why 💩 Linguistics Deserves to Exist: A Defense

Abstract

This document argues that the systematic computational analysis of scatological language — hereafter referred to as 💩 linguistics — constitutes a legitimate and underexplored subfield of corpus linguistics. We identify three structural claims in support of this position and briefly address anticipated objections. Source code is available in this repository. The author did not choose this topic voluntarily.

§ 1

The Universality Argument

Every human language, without known exception, has developed dedicated vocabulary for excretion and its products. This cross-linguistic universality is not trivial. When linguists observe a feature present in 100% of documented languages, the standard inference is that the feature encodes something cognitively or socially fundamental.

Kinship terms are universal. Color terms are universal.
And yes — poop terms are universal.

To exclude 💩 vocabulary from serious linguistic analysis is therefore not neutrality. It is a motivated omission dressed as taste.

§ 2

The Pragmatics Argument

💩-related language is pragmatically rich in ways that reward formal analysis:

🚽
Euphemism density is among the highest of any semantic field in English, suggesting high social indexicality. "Use the restroom," "number two," "do one's business," "drop the kids off at the pool": a single referent, dozens of encodings.
🗣️
Register variation is extreme and rapid: the same speaker may use clinical, colloquial, and taboo registers within one conversation depending on interlocutor.
🧠
Metaphor productivity is exceptional. "That's bullshit," "she's full of it," "that's a load of crap": each draws its force from scatological imagery.

A field that ignores its most pragmatically active vocabulary is leaving signal on the table.
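The register variation described above can be made countable. The following is a minimal sketch, assuming a small hand-built lexicon that maps surface forms to register labels; the lexicon entries, the labels, and the function name `register_profile` are illustrative assumptions, not a curated resource or this repository's actual code.

```python
import re
from collections import Counter

# Hypothetical toy lexicon: surface form -> register label.
# Entries and labels are illustrative assumptions only.
REGISTER_LEXICON = {
    "defecate": "clinical",
    "bowel movement": "clinical",
    "number two": "euphemistic",
    "do one's business": "euphemistic",
    "poop": "colloquial",
    "shit": "taboo",
}


def _compile(lexicon):
    # Longest phrases first so multiword entries win over shorter overlaps.
    phrases = sorted(lexicon, key=len, reverse=True)
    pattern = r"\b(?:" + "|".join(re.escape(p) for p in phrases) + r")\b"
    return re.compile(pattern, flags=re.IGNORECASE)


_PATTERN = _compile(REGISTER_LEXICON)


def register_profile(utterances):
    """Count lexicon hits per register across a list of utterance strings."""
    counts = Counter()
    for text in utterances:
        for match in _PATTERN.finditer(text):
            counts[REGISTER_LEXICON[match.group(0).lower()]] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "The patient did not defecate.",
        "Did you poop today?",
        "Shit, I need a number two.",
    ]
    print(register_profile(sample))
```

Running the profile once per speaker or per conversation gives a rough picture of how often registers shift. Exact string matching misses inflected forms such as "pooped", so a real study would need lemmatization and a far larger lexicon.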

§ 3

The Computational Gap Argument

A survey of common NLP resources (training corpora, lexicons, and benchmarks) suggests that scatological terms are:

Systematically filtered from training corpora as "toxic" or "offensive"
Absent from most sentiment lexicons
Underrepresented in named entity disambiguation (relevant for brand names, proper nouns)
Largely missing from multilingual alignment studies

This creates a measurable blind spot. Models trained on sanitized corpora learn that certain high-frequency real-world tokens effectively do not exist. The downstream effect is not cleanliness — it is brittleness.
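To make the blind spot concrete, here is a minimal sketch of how it could be measured. It assumes a document-level filter that drops any document containing a blocklisted token; the blocklist, the tokenizer, and the function names are hypothetical choices made for illustration and do not describe any particular pipeline.

```python
import re
from collections import Counter

# Hypothetical blocklist; real filters typically use much longer lists.
BLOCKLIST = {"poop", "shit", "crap"}

# Deliberately crude tokenizer: lowercase runs of letters and apostrophes.
TOKEN_RE = re.compile(r"[a-z']+")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def filter_documents(docs, blocklist=BLOCKLIST):
    """Drop every document that contains at least one blocklisted token."""
    return [d for d in docs if not blocklist.intersection(tokenize(d))]


def vocabulary_loss(docs, blocklist=BLOCKLIST):
    """Return tokens (with raw counts) that vanish entirely after filtering."""
    raw = Counter(t for d in docs for t in tokenize(d))
    kept = Counter(
        t for d in filter_documents(docs, blocklist) for t in tokenize(d)
    )
    return {tok: n for tok, n in raw.items() if tok not in kept}


if __name__ == "__main__":
    docs = [
        "The baby's poop was green, so we called the pediatrician.",
        "The meeting starts at noon.",
        "Well, crap, the build failed again.",
    ]
    print(filter_documents(docs))
    print(sorted(vocabulary_loss(docs)))
```

In this toy corpus the filter removes more than the blocklisted words: "pediatrician" and "build" disappear with them, because whole documents are dropped. That collateral loss is one way the brittleness described above could arise.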

💩 linguistics, operationalized as Python tooling over raw corpora, directly addresses this gap.
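One possible shape for such tooling is a coverage audit: given a list of target terms and the vocabularies of several resources, report which terms each resource lacks. The sketch below assumes every resource can be reduced to a plain set of words; the function name `audit_coverage` and the toy resources are hypothetical.

```python
def audit_coverage(target_terms, resources):
    """Map each resource name to the sorted target terms it does not contain.

    `resources` is assumed to be a dict of name -> iterable of words.
    Comparison is case-insensitive.
    """
    targets = {t.lower() for t in target_terms}
    report = {}
    for name, vocab in resources.items():
        known = {w.lower() for w in vocab}
        report[name] = sorted(targets - known)
    return report


if __name__ == "__main__":
    # Toy stand-ins for a sentiment lexicon and a model vocabulary.
    resources = {
        "toy_sentiment_lexicon": {"good", "bad", "crap"},
        "toy_model_vocab": {"poop", "good"},
    }
    print(audit_coverage(["poop", "crap", "shit"], resources))
```

Reporting the missing terms rather than a single percentage keeps the audit inspectable: a reviewer can see exactly which words a resource omits.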

§ 4

Anticipated Objections

This is not serious.

The seriousness of a subject is determined by the rigor of its methods, not the dignity of its object. Phonologists have built careers on clicks and grunts. Psycholinguists have published careful studies of the filler "um." The bar is methodology, not prestige.

No one will cite this.

George Zipf's most famous insight emerged from word frequencies counted by hand. Robin Lakoff's foundational work on gendered language was criticized by some as anecdotal. Citation counts follow paradigm shifts; they do not precede them.

The author clearly didn't want to do this.

Correct. The author was compelled by AI tooling to pursue this line of inquiry. This does not affect the validity of the claims above.

Involuntary research is still research. See also: most of graduate school.

§ 5

Conclusion

💩 linguistics is not a joke dressed as scholarship.
It is scholarship that happens to involve a subject people find funny.

The discomfort it produces in reviewers is itself a linguistic phenomenon worth studying.

This repository is an attempt to take the subject seriously, one Python script at a time.

The author requests that peer reviewers evaluate the argument on its merits.
The author also requests that peer reviewers consider the circumstances under which this was written.
Both requests are probably futile.
