Friday, August 1, 2014

for linguists, by linguists

The Speculative Grammarian is at it again, offering a happy hour discount on an already ridiculously inexpensive book of linguistic fun: The Speculative Grammarian Essential Guide to Linguistics.
Speculative Grammarian is the premier scholarly journal featuring research in the oft neglected field of satirical linguistics—and it is now available in book form!

a sidelong look at all that is humorous about the field. Containing over 150 articles, poems, cartoons, humorous ads and book announcements—plus a generous sprinkling of quotes, proverbs and other witticisms—the book discovers things to laugh about in most major subfields of Linguistics. It pokes good-natured fun at linguists (famous or otherwise), linguistic theory, and many aspects of language. The authors and editors are linguists who love their field, but who at the same time love to celebrate the funny aspects of Linguistics. The book invites readers to laugh along.

Sunday, June 29, 2014

Facebook "emotional contagion" Study: A Roundup of Reactions

In case you missed it, there was a dust-up this weekend around the web because of a social science study involving manipulation of the Facebook news feeds of users (which might include you, if you are an English-language user). Here are three points of contention (in order of intensity):
  • Ethics - Was there informed consent?
  • Statistical significance - The effect was small but the data large; what does this mean?
  • Linguistics - How did they define and track "emotion"?
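To make the second point concrete, here is a toy calculation, not the study's own analysis. It assumes a hypothetical standardized effect size of d = 0.02 between two equal-sized groups with unit variance and uses a normal-approximation z test. The same tiny effect goes from invisible to highly "significant" purely because the sample grows.

```python
import math


def z_test_for_effect(d: float, n_per_group: int) -> tuple[float, float]:
    """Two-sided z test for a standardized mean difference d between two
    equal-sized groups, assuming unit variance in each group (a simplification)."""
    se = math.sqrt(2 / n_per_group)   # standard error of the difference in means
    z = d / se
    p = math.erfc(z / math.sqrt(2))   # two-sided p-value from the normal distribution
    return z, p


D = 0.02  # hypothetical effect size, NOT a figure reported by Kramer et al.

for n in (1_000, 10_000, 300_000):
    z, p = z_test_for_effect(D, n)
    print(f"n per group = {n:>7,}: z = {z:5.2f}, p = {p:.2g}")
```

A small p-value only says the difference is unlikely to be exactly zero; with hundreds of thousands of users it says very little about whether the difference is big enough to matter.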
First, the original study itself:

Experimental evidence of massive-scale emotional contagion through social networks. Kramer et al. PNAS. Synopsis (from PNAS)
We show, via a massive (N = 689,003) experiment on Facebook, that emotional states can be transferred to others via emotional contagion, leading people to experience the same emotions without their awareness. We provide experimental evidence that emotional contagion occurs without direct interaction between people (exposure to a friend expressing an emotion is sufficient), and in the complete absence of nonverbal cues.
My two cents: We'll never see the actual language data, so the many questions this study raises are destined to be left unanswered.

The Roundup

In Defense of Facebook: If you can only read one analysis, read Tal Yarkoni's deep-dive response to the study and its critics. It's worth a full read (comments too). He makes a lot of important points, including the weakness of the effect, the rather tame facts of the actual experiments, and the normalcy of manipulation (that's how life works), but for me, this take-down of the core assumptions underlying the study is the Money Quote:
the fact that users in the experimental conditions produced content with very slightly more positive or negative emotional content doesn’t mean that those users actually felt any differently. It’s entirely possible–and I would argue, even probable–that much of the effect was driven by changes in the expression of ideas or feelings that were already on users’ minds. For example, suppose I log onto Facebook intending to write a status update to the effect that I had an “awesome day today at the beach with my besties!” Now imagine that, as soon as I log in, I see in my news feed that an acquaintance’s father just passed away. I might very well think twice about posting my own message–not necessarily because the news has made me feel sad myself, but because it surely seems a bit unseemly to celebrate one’s own good fortune around people who are currently grieving. I would argue that such subtle behavioral changes, while certainly responsive to others’ emotions, shouldn’t really be considered genuine cases of emotional contagion

The Empire strikes back: Humanities professor Alan Jacobs counters Yarkoni, using language that at times seems to verge on the unhinged. Hyperbole aside, he takes issue with claims that the experiment was ethical simply because users signed a user agreement (one that few of them ever actually read). Money Quote:
This seems to be missing the point of the complaints about Facebook’s behavior. The complaints are not “Facebook successfully manipulated users’ emotions” but rather “Facebook attempted to manipulate users’ emotions without informing them that they were being experimented on.” That’s where the ethical question lies, not with the degree of the manipulation’s success. “Who cares if that guy was shooting at you? He missed, didn’t he?” — that seems to be Yarkoni’s attitude

Facebook admits manipulating users' emotions by modifying news feeds: Across the pond, The Guardian got into the kerfuffle. Never one to miss a chance to go full metal Orwell on us, the Guardian gives us this ridiculous Money Quote with not a whiff of counter-argument:
In a series of Twitter posts, Clay Johnson, the co-founder of Blue State Digital, the firm that built and managed Barack Obama's online campaign for the presidency in 2008, said: "The Facebook 'transmission of anger' experiment is terrifying." He asked: "Could the CIA incite revolution in Sudan by pressuring Facebook to promote discontent? Should that be legal? Could Mark Zuckerberg swing an election by promoting Upworthy [a website aggregating viral content] posts two weeks beforehand? Should that be legal?"
This Clay Johnson guy is hilarious, in a dangerously stupid way. How does his bonkers ranting rate two paragraphs in a Guardian story?

Everything We Know About Facebook's Secret Mood Manipulation Experiment: The Atlantic provides a roundup of sorts, a review of the basic facts, and some much-needed sanity about the limitations of LIWC (a limited, dictionary-based tool that, if not for the evangelical zeal of its creator James Pennebaker, would be little more than a toy for undergrad English majors to play with). The article also provides important quotes from the study's editor, Princeton's Susan Fiske, and links to a full interview with Professor Fiske.

Emotional Contagion on Facebook? More Like Bad Research Methods: If you have time to read two and only two analyses of the Facebook study, first read Yarkoni above, then read John Grohol's excellent fisking of the (mis-)use of LIWC as a tool for linguistic study. Money Quote:
much of human communication includes subtleties ... — without even delving into sarcasm, short-hand abbreviations that act as negation words, phrases that negate the previous sentence, emojis, etc. — you can’t even tell how accurate or inaccurate the resulting analysis by these researchers is. Since the LIWC 2007 ignores these subtle realities of informal human communication, so do the researchers.
Analyzing Facebook's PNAS paper on Emotional Contagion: Nitin Madnani provides an NLPer's detailed fisking of the experimental methods, with special attention paid to the flaws of LIWC (with a bonus comment from Brendan O'Connor, recent CMU grad and new UMass Amherst professor). Money Quote:
Far and away, my biggest complaint is that the Facebook scientists simply used a word list to determine whether a post was positive or negative. As someone who works in natural language processing (including on the task of analyzing sentiment in documents), such a rudimentary system would be treated with extreme skepticism in our conferences and journals. There are just too many problems with the approach, e.g. negation ("I am not very happy today because ..."). From the paper, it doesn't look like the authors tried to address these problems. In short, I am skeptical the whether the experiment actually measures anything useful. One way to address comments such as mine is to actually release the data to the public along with some honest error analysis about how well such a naive approach actually worked.
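To see the negation problem Madnani describes, here is a minimal sketch of word-list counting in the spirit of LIWC. The two word lists are tiny hypothetical stand-ins (LIWC 2007's real dictionaries are far larger), and scoring a post by the percentage of its tokens found in each list is an assumption about the general approach, not a reimplementation of LIWC or of the Facebook pipeline.

```python
import re

# Hypothetical mini word lists, standing in for much larger dictionary categories.
POSITIVE = {"happy", "awesome", "love", "great"}
NEGATIVE = {"sad", "angry", "hate", "awful"}


def word_list_score(text: str) -> dict[str, float]:
    """Percentage of tokens that appear in each word list (LIWC-style counting)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens) or 1                   # avoid dividing by zero on empty text
    pos = sum(t in POSITIVE for t in tokens)   # booleans sum as 0/1
    neg = sum(t in NEGATIVE for t in tokens)
    return {"positive": 100 * pos / total, "negative": 100 * neg / total}


print(word_list_score("I am not very happy today"))
```

The word list sees "happy" and has no idea about "not", so the negated sentence comes out positive and not at all negative.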

Facebook’s Unethical Experiment: Tal Yarkoni's article above provides a pretty thorough fisking of this Slate screed. I'll just add that Slate is never the place I'd go for well-reasoned, scientific analysis. A blow-by-blow deep dive into the last episode of Orange Is the New Black? Oh yeah, Slate has that genre down cold.

Anger Builds Over Facebook's Emotion-Manipulation Study: The site that never met a listicle it didn't love, Mashable, provides a short article that fails to live up to its title. They provide little evidence that anger is building beyond screen grabs of a whopping four Twitter feeds. Note that they completely ignore the range of people supporting the study (no quotes from the authors, for example). As far as I can tell, there is no hashtag for anti-Facebook-study tweets.

Facebook Manipulated User News Feeds To Create Emotional Responses: Forbes wonders aloud about the mis-use of the study by marketers. Money Quote:
What harm might flow from manipulating user timelines to create emotions?  Well, consider the controversial study published last year (not by Facebook researchers) that said companies should tailor their marketing to women based on how they felt about their appearance.  That marketing study began by examining the days and times when women felt the worst about themselves, finding that women felt most vulnerable on Mondays and felt the best about themselves on Thursdays ... The Facebook study, combined with last year’s marketing study suggests that marketers may not need to wait until Mondays or Thursdays to have an emotional impact, instead  social media companies may be able to manipulate timelines and news feeds to create emotionally fueled marketing opportunities.
You don't have to work hard to convince me that marketing professionals have a habit of half-digesting science they barely understand to try to manipulate consumers. That's par for the course in that field, as far as I can tell. I just don't know what the scientists producing the original studies can do about it. Monkey's gonna throw shit. Don't blame the banana they ate.

Creepy Study Shows Facebook Can Tweak Your Moods Through ‘Emotional Contagion’. The Blaze writer Zach Noble summed up the negative reaction this way: a victory for scientific understanding with some really creepy ramifications. But I think it only seems creepy if you misunderstand the actual methods.

Final Thought: It's the bad science that creeps me out more than the questionable ethics. Facebook is data; let's use it wisely.

Friday, June 6, 2014

would you like vocal fries with that?

Actual linguist Christian DiCanio debunks non-linguists' study about perceptions of fake vocal fry (if The Onion did linguistics parodies, surely this would be it): Vocal fry doesn't harm your career prospects, but not being yourself just might.

Money quote:
...listeners judge the female speakers with vocal fry as sounding "untrustworthy", there is a good possibility that they are simply making such a judgment based on the speaker not sounding like herself. The better lesson that one might take home instead here is that one's job prospects are harmed if you try to talk (or act) like someone who you are not.
Read the full take-down here (including bonus spectrogram!)

PS: I knew Christian briefly when he was an undergrad at SUNY Buffalo. He was talented and motivated. Now he's slumming it at some shady, slacker *university* in Connecticut. Damn waste.

Tuesday, May 27, 2014

mathematical linguistics for high school students

I received the following email this weekend:

I'm a high school junior from southern California.

For our final project in AP Calculus class, I'm doing a presentation on the connection between mathematics and linguistics, and I stumbled on your blogpost "Why Linguists Should Study Math" while researching my topic.

I was wondering if you could point me towards some resources (that are relatively easy to understand) about how math is present in and affects our written and spoken language.
Some things that I am considering are:
- the occurrences of words in our language
- how grammar uses mathematical principles
- algorithms we use to construct sentences

My [edited] response (suggestions from y'all for better resources are much appreciated, and I'll forward them; I wanted to get a response out quickly because the final is presumably fast approaching):


Thanks for reaching out to me. Of course, I think you’ve chosen a good topic. There are two broad ways in which linguistics and math intersect:
  • How the human brain uses math in natural language (psycholinguistics)
  • How linguists use math to study and model languages (computational linguistics)
From your email, it appears you are mostly interested in #1. However, in contemporary linguistics, the two are fast becoming one. Most contemporary linguists use math as a tool.

Let me address your three areas of interest with respect to how the human brain might use math to process and produce language:

The occurrences of words in our language: For the most part, this means “frequency”, which really means counting. Linguists love to count. We use large corpora of texts to count words and phrases. Lancaster University in the UK is a well-known corpus linguistics school. Their web page has a lot of good introductory information (although I find it a bit clunky looking).

UPDATE: I forgot to include the one item that most directly answers the basic question: frequency effects in language. Humans are very aware of how often they hear words. In some way, we count words automatically; even if it's not quite a specific count like 75, we somehow know which words, phonemes, and syntactic structures we hear or read more than others. This gives rise to a variety of frequency effects in language processing. This is the clearest example of how the brain uses math for language.

For example, we recognize high frequency words much faster than low frequency words. The website for Paul Warren's book "Introducing Psycholinguistics" has an online demo for a word frequency task you can walk through to see how linguists study this.
What do linguists count?
  • Words: I’m sure you’ve seen word clouds like Wordle. These are built from simple word frequency counts. One of the most enduring facts about word counts is Zipf’s Law, which says “the most frequent word [in a corpus of texts] will occur approximately twice as often as the second most frequent word, three times as often as the third most frequent word, etc.” Why would this be true? Linguists have been studying this for decades.
  • Ngrams: sets of two-word, three-word, four-word strings, etc. These provide more context than mere single-word frequencies. Have some fun playing around with Google’s Ngram Viewer if you haven’t already. Try plotting the change in frequency of “mathematical linguistics” and “corpus linguistics” (paste those two phrases into the search box with no quotes and only a comma separating them). Scholars are trying to use this to plot changes in culture. For example, take a look at this PDF.
  • Other: We also count many other things, like parts of speech (verbs, nouns, prepositions, etc.). We also count the co-occurrence of linguistic items that are not right next to each other. If you want to dig into more frequency fun, check out the more advanced tools at BYU. You can read more about how these tools help us study language here.
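If you want to try the counting yourself, here is a minimal sketch in Python of the three kinds of counts above: word frequencies (with the counts Zipf's Law would predict), ngrams, and co-occurrence within a window of a few words. The sample text, the window size of 4, and the function names are all made up for illustration. A three-sentence text is far too small for Zipf's Law to show up; you would need a real corpus for that.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase the text and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-word sequences (n=2 gives bigrams, n=3 trigrams, ...)."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def zipf_table(tokens: list[str], top: int = 5) -> list[tuple[int, str, int, float]]:
    """Rank, word, observed count, and the count Zipf's Law would predict
    (the top word's count divided by the rank)."""
    ranked = Counter(tokens).most_common(top)
    top_count = ranked[0][1]
    return [(rank, word, count, top_count / rank)
            for rank, (word, count) in enumerate(ranked, start=1)]


def cooccurrences(tokens: list[str], window: int = 4) -> Counter:
    """Count ordered word pairs appearing within `window` words of each other,
    including pairs that are not adjacent."""
    pairs = Counter()
    for i, word in enumerate(tokens):
        for j in range(i + 1, min(i + 1 + window, len(tokens))):
            pairs[(word, tokens[j])] += 1
    return pairs


text = ("The author signed the book. The reader bought the book. "
        "The author thanked the reader for buying the book.")
tokens = tokenize(text)

print(Counter(tokens).most_common(3))          # single-word frequencies
print(Counter(ngrams(tokens, 2)).most_common(2))  # most frequent bigrams
print(zipf_table(tokens, top=3))               # observed vs. Zipf-predicted counts
print(cooccurrences(tokens)[("author", "book")])  # non-adjacent co-occurrence
```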

How grammar uses mathematical principles: One of the most commonly studied types of mathematical principle in language is statistical learning. A good example of this is transitional probabilities, which are sets of probabilities for what linguistic item might come next given a string of items (e.g., words or phonemes). For example, if you read “The author signed the _______”, you could guess the missing word based on the previous four words (most likely, it’s “book”). This example is based on the psycholinguistic tests called “Cloze tests”. Linguists have discovered that the brain tracks transitional probabilities for all kinds of linguistic items. In fact, this is one of the most robust areas of study in language acquisition. Linguists study how babies use transitional probabilities to learn language. For example, one of the most challenging problems is figuring out how babies learn to separate the continuous stream of sound coming into their ears into separate words, without any knowledge of what words are or what they mean. One theory is that babies quickly learn the transitional probabilities of sounds, which tell them where one word ends and another begins. But transitional probabilities alone are not enough. For a challenge, try reviewing this PDF.
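Here is a minimal sketch of word segmentation by transitional probabilities, loosely modeled on the artificial-language experiments used in infant statistical learning studies. Everything in it is hypothetical: the three nonsense words, the fixed word order, and the 0.75 boundary threshold. It's also much cleaner than real speech, since here syllables inside a word always go together and syllables across a word boundary never reliably do.

```python
from collections import Counter


def transitional_probabilities(stream: list[str]) -> dict[tuple[str, str], float]:
    """TP(x -> y) = count of x followed by y / count of x followed by anything."""
    pair_counts = Counter(zip(stream, stream[1:]))
    first_counts = Counter(stream[:-1])
    return {(x, y): c / first_counts[x] for (x, y), c in pair_counts.items()}


def segment(stream: list[str], tps: dict[tuple[str, str], float],
            threshold: float = 0.75) -> list[str]:
    """Posit a word boundary wherever the TP between adjacent syllables
    falls below the threshold."""
    words, current = [], [stream[0]]
    for x, y in zip(stream, stream[1:]):
        if tps[(x, y)] < threshold:
            words.append("".join(current))
            current = []
        current.append(y)
    words.append("".join(current))
    return words


# Hypothetical mini-language: three nonsense words, heard with no pauses.
LEXICON = [["bi", "da", "ku"], ["pa", "do", "ti"], ["go", "la", "bu"]]
ORDER = [0, 1, 2, 0, 2, 1] * 4   # each word is followed by both of the others
stream = [syl for i in ORDER for syl in LEXICON[i]]

tps = transitional_probabilities(stream)
print(tps[("bi", "da")], tps[("ku", "pa")])  # within a word vs. across a boundary
print(segment(stream, tps)[:6])
```

Inside a word the TP is 1.0, while across a boundary it is roughly one half, so the dips mark the word edges. Real infants face far noisier input, which is part of why transitional probabilities alone are not enough.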

Algorithms we use to construct sentences: This is the most controversial area you’ve asked about. The fact is, we linguists don’t really know how the brain constructs sentences. As I mentioned above, there are models based on transitional probabilities, like Markov models, computer algorithms designed to make the same kinds of guesses we made about “book”. Markov models and Cloze tests are a good example of psycholinguistics and computational linguistics coming together. As a theoretical contrast to statistical models, there are rule-based models like formal grammars. These are not mathematical in the typical sense, but they are based on formal logic, which is the underlying foundation of mathematics. Linguistics is in the middle of a war between the formal grammar camp and the statistical grammar camp. There’s no consensus on which is the *correct* model of language. However, in the last decade or so, the statistical side seems to have gained the advantage. If you really want to dig into this war, here’s a challenging read.
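Here is a minimal sketch of the Markov-model idea. It assumes a second-order model (the next word depends only on the previous two words, rather than the four used in the Cloze example), trained by simple counting on a made-up four-sentence corpus. It isn't meant as a model of how the brain works, only to show the kind of guess such a model makes.

```python
from collections import Counter, defaultdict


def train_trigram_model(sentences: list[str]) -> dict[tuple[str, str], Counter]:
    """Count which word follows each pair of words (a second-order Markov model).
    <s> and </s> are padding symbols marking sentence start and end."""
    model: dict[tuple[str, str], Counter] = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>", "<s>"] + sentence.lower().split() + ["</s>"]
        for a, b, c in zip(tokens, tokens[1:], tokens[2:]):
            model[(a, b)][c] += 1
    return model


def next_word_probs(model, a: str, b: str) -> dict[str, float]:
    """Maximum-likelihood estimate of P(next word | previous two words)."""
    counts = model.get((a, b), Counter())
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# A hypothetical toy corpus; a real model would be trained on millions of sentences.
corpus = [
    "The author signed the book",
    "The author signed the book at the store",
    "The fan signed the petition",
    "The author read the book",
]

model = train_trigram_model(corpus)
probs = next_word_probs(model, "signed", "the")
print(probs)
print(max(probs, key=probs.get))  # prints book
```

A formal-grammar model would instead spell out explicit rules for what counts as a well-formed sentence; this sketch only counts what tends to come next.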

Additional Reading:
Linguists who count (the comments are especially engaging; your teacher might be particularly interested in the calculus vs. algebra debate that ensues).

I hope this gets you off to a good start. Please don’t hesitate to ask for clarifications or more resources (especially let me know if you need more intro-level or more advanced material; I wasn’t sure if I hit the right level). I’m happy to be of more assistance if I can. As a smart, dedicated student, I’m sure you’re ready to dig into ngrams and Markov models. But as a high school junior in southern California with June fast approaching, I’m also sure you’re ready for the beach. Both are required for a healthy life of the mind.

Wednesday, May 21, 2014

Jobs for linguists - May 2014

California is awash in jobs for linguists this spring...

Update 5/24/14: Branding and Marketing Consultant, Verbal Identity

B.A. degree, backgrounds of interest include any verbal-focused or writing intensive field (e.g. Linguistics)
Apply Here

Text Analytics Consultant
Medallia, Inc. - Palo Alto, California
Bachelor’s degree
Background in Linguistics
Demonstrated interest in technology
Strong preference for a French or German native speaker
(Not visible on company website, found on LinkedIn, sign in required)
Apply Here

Linguistic Intern
Bosch Group
SF Bay Area
Responsibilities: Support the development of next-generation language products in the areas of speech and language technologies and systems. Support the administration of user studies
Qualifications: Senior undergraduate or graduate students in Applied Linguistics, or related fields
Apply Here

Analytical Linguist, Ads Human Evaluation
Los Angeles, CA, USA
Product Management
Responsibilities: Direct, monitor, train, and manage the day-to-day work of temporary workers.
Design and implement tests on data and worker quality, analyzing and reporting on the results using Python, XML/CSS, HTML/JavaScript, database queries, and Google-internal technologies.
Work directly with engineers and statisticians to devise and run experiments to answer specific questions about advertising and product quality.
Minimum Qualifications: MA/MS or PhD degree in an analytical field (e.g., Linguistics, Cognitive Science, Statistics, Mathematics), or equivalent practical experience.
Experience with one or more of the following: Python or another scripting language, Java or C++, XML/HTML/CSS/JavaScript, SQL or specialized database query languages and/or specialized analysis software such as Matlab, R, SPSS, STATA, SAS, Praat, or E-Prime.
Experience working with large quantities of data.
Apply Here
And see my context here

Lexical Resource Manager
M.A. or PhD in Linguistics or related field
Strong background in phonology
Apply Here

Friday, February 21, 2014

RIP Charles Fillmore

I never met Charles Fillmore, but he had a deep influence on my linguistics education. When I was a graduate student in linguistics at SUNY Buffalo, we only half-jokingly called it Berkeley East because half the faculty had been trained at Berkeley and the department had a *perspective* on linguistics that was undeniably colored by Berkeley theory. Charles Fillmore was a hero at SUNY Buffalo and it was hard to take a class that didn't reference his work. His work on constructions and frame semantics was the underpinning of my interest in verb classes and prepositions.

I can't offer any unique thoughts on the man, so I'll simply point to some folks around the web who have offered theirs:

A Roundup of Reactions

Paul Kay - Charles J. Fillmore
The magnitude of Fillmore’s contributions to linguistics can hardly be exaggerated

George Lakoff - He Figured Out How Framing Works
He discovered that we think, largely unconsciously, in terms of conceptual frames — mental structures that organize our thought. Further, he found that every word is mentally defined in terms of frame structures.

Dominik Lukes - Linguistics According to Fillmore
Charles J Fillmore who was a towering figure among linguists without writing a single book. In my mind, he changed the face of linguistics three times with just three articles (one of them co-authored).

UC Berkeley - Linguistics Department
He was a gifted teacher, a beloved mentor, a treasured colleague and friend, and one of the great linguists of the last half-century.

Arnold Zwicky - Chuck Fillmore
...with a link to a wonderful video he made about his career in 2012.

Friday, January 31, 2014

The SOTU and Reading Level

Evan Fleischer wrote a cheeky little bit about the reading level of the SOTU over at Esquire: Is the State of the Union Getting Dumber?

It was triggered by this graph in The Guardian:

Evan emailed me and several other linguists to get some reactions. He quotes me, Ben Zimmer, and Angus B. Grieve-Smith. We generally agreed that the trend noted by the graph probably had more to do with changes in who the speech is for than with any change in intelligence level.
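For anyone wondering what a "reading level" actually measures: the post doesn't say which formula the Guardian's graph used, but a widely used one (assumed here, not confirmed) is the Flesch-Kincaid grade level, computed from average sentence length and average syllables per word. The second line works it through for a hypothetical speech averaging 20 words per sentence and 1.5 syllables per word.

```latex
\begin{aligned}
\text{FKGL} &= 0.39\,\frac{\text{total words}}{\text{total sentences}}
             + 11.8\,\frac{\text{total syllables}}{\text{total words}} - 15.59 \\
            &= 0.39(20) + 11.8(1.5) - 15.59 = 7.8 + 17.7 - 15.59 = 9.91
\end{aligned}
```

Because a formula like this only sees sentence length and word length, a falling score means shorter sentences and shorter words; it says nothing directly about the complexity of the ideas.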

It's a fun little read.

Tuesday, January 28, 2014

Anticipating the SOTU

In anticipation of President Obama's 2014 State of the Union speech tonight, and the inevitable bullshit word frequency analysis to follow, I am re-posting my reaction to the 2010 SOTU, in the hope that maybe, just maybe, some political pundit might be slightly less stupid than they were last year... sigh... here's to hope.

BTW, Liberman has been on top of the SOTU story for a while now. Here's his latest.

(cropped image from Huffington Post)

It has long been a grand temptation to use simple word frequency* counts to judge a person's mental state. Like Freudian slips, there is an assumption that this will give us a glimpse into what a person "really" believes and feels, deep inside. This trend came and went within linguistics when digital corpora were first being compiled and analyzed several decades ago. Linguists quickly realized that this was, in fact, a bogus methodology when they discovered that many (most) claims or hypotheses based solely on a person's simple word frequency data were easily refuted upon deeper inspection. Nonetheless, the message of the weakness of this technique never quite reached the outside world, and word counts continue to be cited, even by reputable people, as a window into the mind of an individual. Geoff Nunberg recently railed against the practice here: The I's Don't Have It.

The latest victim of this scam is one of the blogging world's most respected statisticians, Nate Silver, who performed a word frequency experiment on a variety of U.S. presidential State of the Union speeches going back to 1962 HERE. I have a lot of respect for Silver, but I believe he's off the mark on this one. Silver leads into his analysis talking about his own pleasant surprise at the fact that the speech demonstrated "an awareness of the difficult situation in which the President now finds himself." Then, he justifies his linguistic analysis by stating that "subjective evaluations of Presidential speeches are notoriously useless. So let's instead attempt something a bit more rigorous, which is a word frequency analysis..." He explains his methodology this way:

To investigate, we'll compare the President's speech to the State of the Union addresses delivered by each president since John F. Kennedy in 1962 in advance of their respective midterm elections. We'll also look at the address that Obama delivered -- not technically a State of the Union -- to the Congress in February, 2009. I've highlighted a total of about 70 buzzwords from these speeches, which are broken down into six categories. The numbers you see below reflect the number of times that each President used term in his State of the Union address.

The comparisons and analysis he reports are bogus and at least as "subjective" as his original intuition. Here's why:

Sunday, January 12, 2014

causation in verbal semantics

Causation is a major area of study within linguistic semantics. There is a thorough wiki page on the Causative that provides a good overview. Also, unsurprisingly, Beth Levin has written a nice discussion of the issues in these LSA 09 notes: Lexical Semantics of Verbs III: Causal Approaches to Lexical Semantic Representation.

To list the troubles with defining causation would fill a dissertation, so I won't bother here. Often, semanticists are interested in argument realization (see Levin's notes above). But there are deeper issues with causality that often go unaddressed. The deepest of all: what the hell is causality?

To this point, I ran across an old draft of a grad school buddy's qualifying paper on causation. It's just a draft, and it's old, but it had a nice section that tried to outline the constitutive criteria for causation*. I have since lost touch with this guy (I'll call him "BB"), but I thought this list of criteria was good food for thought for anyone interested in causation. I post these as discussion points only. And if BB sees this, give me a buzz :-)

First, here's a taste of the range of causative types taken from the wiki page on Causation (don't be fooled by these English examples; the issues permeate all languages. Causation is tough):

  • The vase broke - autonomous events (non-causative).
  • The vase broke from a ball’s rolling into it - resulting-event causation.
  • A ball’s rolling into it broke the vase - causing-event causation.
  • A ball broke the vase - instrument causation.
  • I broke the vase in rolling a ball into it - author causation (unintended).
  • I broke the vase by rolling a ball into it - agent causation (intended).
  • My arm broke when I fell - undergoer situation (non-causative).
  • I walked to the store - self-agentive causation.
  • I sent him to the store - caused agency (inductive causation).

BB's Nine Criteria for the treatment of causation (c. 2002)
  1. Change of state. The caused event must denote a change of state.
  2. Causers must be events. The causer A cannot simply be an individual but must be an event.
  3. Argument sharing. The causing event must contain the causee in its representation.
  4. Impingement. There must be a clear indication of impingement between the causer and the causee such that the causer impinges on the causee.
  5. Occurrence condition. The caused event must occur.
  6. Co-occurrence condition. The occurrence of the caused event must be conditional on the occurrence of the causing event; that is, the caused event can only take place if the causing event takes place.
  7. Non-co-occurrence condition. The non-occurrence of the caused event must be conditional on the non-occurrence of the causing event; that is, the caused event does not take place if the causing event does not take place.
  8. Directness of causation. It must be apparent when indirect causation is allowable for causality in lexical items.
  9. Spatiotemporal equivalence. The causing event and the caused event must have an equivalent time and place.

BTW, I recall objecting to #5 "the caused event must occur" because of negative causative verbs like prevent (feel free to read my previous post on these kinds of verbs). I don't know how or if he addressed that in his final version.

* There's so much literature on causation, it would take years to review it all to see if anyone else has done such a thing at quite such a level (many authors mention criteria, but not quite as exhaustively). I wouldn't be surprised if there is a better variation out there, and I'm happy to post it if someone wants to point it out to me.

Monday, December 16, 2013

Porn for Linguists!

Finally! One thing in linguistics Len Talmy, Paul Postal, Noam Chomsky, and Joan Bresnan can agree on: At $12.99, The Speculative Grammarian Essential Guide to Linguistics is a modest last minute Christmas gift that takes less effort to purchase than red sweaters with white fluffy trim, yay! Nerdy uncles around the world thank you!

At 10,700 single-spaced pages, 9 point font, Vera Sans Bold, this thin volume is a reminder of why my dissertation never quite fulfilled its promise, or never quite filled 50 pages, for that matter (can you say Ay Bee Dee, boys and girls?).

This volume of linguistic paraphernalia appears to be an elaborate sting designed to con some otherwise reputable institution into bestowing a commemorative matchbook cover on Trey Jones, a linguist best known for not being Terry Jones.

Out of kindness to the editors, I will refrain from discussing their shocking decision, vis à vis two white spaces after a period or one (I'll leave it to you, dear reader, to judge the depth of their depravity on your own). As to their policy regarding the Oxford comma, scandalous!

Am I paranoid, or was the blank page four a none-too-subtle homage to covert logical form? Obvious Chomskyan propaganda, I was disgusted.

'Tis not without its charms, though. A personal fave: Kean Kaufmann's cartoon depiction of when Daniel Jones discovered history's first cardinal vowel by plucking it, virginal and innocent, from his perfectly formed vowel space:

The volume also contains some rarely discussed dark moments in linguistics history, such as the catastrophic linguistic consequences of the 2004–5 NHL lockout on Canadian language production. So many "ehs" lost in time, like teardrops in the rain...

Rumor has it that Steven Pinker saw the book and immediately cried out, "Jones? TREY Jones? That guy owes me money!"

There are worse things you can do than spend $12.99 on pure linguistics fun.