How Complex is Natural Language? The Chomsky Hierarchy

So let’s talk about complexity. For a codebreaker
who spends all day decrypting messages and unlocking secrets, knowing what kind of cipher
was used makes puzzling out the answer a lot easier. But the vast machine of everyday communication
didn’t come with any blueprints. We need to work extra hard at reverse engineering
language into its component parts to work out just how complicated a system it really is.
So what do we find when we finally crack the code? I’m Moti Lieberman, and this is The
Ling Space. Any language can be broken up into different
systems: syntax, semantics, morphology, phonology. And different languages divvy up the workload
in different ways. But linguists generally agree that, on the whole, all languages are
equally capable of expressing pretty complicated ideas. But what about human language overall?
Just how complex is it, and how do we even measure that? To get to the bottom of things,
we’ll need to home in on what language actually is. As we’ve touched on before, one thing that
we can rule out pretty much right away is the idea that a language is just a bunch of
sentences, pre-built and ready to be pronounced. If that were true, each of us would need an
infinite memory to carry them all around in our heads. That’s because, in principle,
there’s no upper limit on how many sentences there are, or even how long each one can be!
Like, take the sentence “Joan should be allowed to work at Bletchley Park”. We can always add a few words to the beginning
of it and still end up with a perfectly good sentence, like in “Alan thinks that Joan
should be allowed to work at Bletchley Park”. And there’s nothing stopping us from doing that again. And again. And again. So, it doesn’t seem very likely that we’ve
just got all these possible sentences memorized ad infinitum, especially since you’ve probably
never even heard most of them before, but you still recognize them as English. Instead
of a list, language is more like a set of rules encoded in our heads, which we can tap
into at any moment, to produce any sort of sentence we like! And we’ve seen these rules in action before.
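To make that concrete, here’s a minimal sketch of the idea in Python. The rule, the names, and the `embed` helper are all hypothetical, just to show how a single recursive rule generates unboundedly many sentences where no memorized list could:

```python
# A toy illustration of "rules, not a list": one recursive rule,
# S -> NAME "thinks that" S, generates unboundedly many
# English-like sentences, so nothing is stored sentence-by-sentence.

BASE = "Joan should be allowed to work at Bletchley Park"

def embed(sentence, depth):
    """Apply the rule 'S -> NAME thinks that S' depth times."""
    names = ["Alan", "Joan", "Hugh"]  # hypothetical vocabulary
    for i in range(depth):
        sentence = f"{names[i % len(names)]} thinks that {sentence}"
    return sentence

print(embed(BASE, 0))
print(embed(BASE, 2))
# Each extra application yields a new, still-grammatical sentence,
# so the set of sentences has no upper bound.
```

Every pass through the loop is one more application of the same rule, which is exactly why a finite rule system can stand in for an infinite list of sentences.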
When we talked about X-bar theory, we showed you the sorts of trees that we use to represent
the structures of sentences. And at the roots of these trees are the rules that generate
them. Take a sentence like “Christopher might help”, which looks like this. Underneath this lie a couple of rules, which
can be put together and fed into each other. These say that a sentence — or inflectional
phrase — can be made up out of a noun phrase, plus some indication of tense or modality,
plus a verb phrase. There’s a lot of different kinds of rules
we can come up with to show how elements come together, different combinations that encode
any linguistic system. But how complex do the rules need to be to fully capture actual
human language? Depending on the answer, we can get an idea of how complex natural language
really is compared to other systems, like the artificial languages that we program computers
with. That’s because once we can nail down the kinds of rules we need to account for
it, we can fit them into the Chomsky hierarchy, named after linguist Noam Chomsky. The Chomsky hierarchy is like a ranking of
different types of rules, along with the different kinds of languages they’re able to generate.
The rules at the very bottom of the hierarchy form what are called Type-3 grammars, with
more sophisticated rules showing up the higher you go on the list, all the way up to Type-0.
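For reference, the standard correspondence between the four levels, the language classes they generate, and the machines that recognize them (this summary comes from formal language theory generally, not from the episode itself) can be jotted down like this:

```python
# The four levels of the Chomsky hierarchy, with the language class
# each one generates and the abstract machine that recognizes it.
CHOMSKY_HIERARCHY = {
    3: ("regular (finite state)", "finite automaton"),
    2: ("context-free", "pushdown automaton"),
    1: ("context-sensitive", "linear-bounded automaton"),
    0: ("recursively enumerable", "Turing machine"),
}

# Smaller type numbers mean strictly more generative power:
for t in (3, 2, 1, 0):
    cls, machine = CHOMSKY_HIERARCHY[t]
    print(f"Type-{t}: {cls} languages, recognized by a {machine}")
```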
It’s like golf, where smaller numbers mean better scores. So where does natural language fit? Well,
let’s start with the simplest possible grammars, the Type-3s, and see if it fits there. These
work using rules that look a bit like this. That first rule works by outputting some word
in the language you want to generate, followed by a placeholder which can be filled in with
more words later on. So this rule could start by producing something like “If phrase”.
Then, we can apply the rule again to replace that phrase part with another word, followed
by yet another phrase; so, we get “If you phrase”, “If you pass phrase”, and so
on. We can keep going like this as long as we
like, until we get to the end of our sentence and apply our second rule. It says we can
fill in that last bit with just one word, so we can get “If you pass the Turing test,
then you’re conscious”. The sort of language generated by a Type-3
grammar is known as a regular, or finite state, language. Since we managed to get
our rule to produce an English sentence, maybe English is a finite state language! But as you probably guessed, it isn’t quite
that simple. For one, the number of “if”s in a sentence like that usually has to match
the number of “then”s. So, you can’t have sentences like “If if you pass the Turing
test, then you’re conscious” or “You pass the Turing test, then then then you’re
conscious”. And our rules are too simple to account for that – they’ll let you stack
things up wrong, or give you garbled results. Now, there is a bit of flexibility here, and
with the right amount of tinkering, we could probably debug our rules and sidestep these
specific cases. But the problem only gets worse. Since any clause in a finite state grammar can be replaced with any other clause of the same sort, we can replace the first half of
that sentence with something like “either you pass the Turing test, or you have a mind”,
so the end result winds up as “If either you pass the Turing test or you have a mind,
then you’re conscious”. And again, nothing’s stopping us at just
one swap, so we can also get a sentence like “If either you pass either the Turing test
or a psychological exam, or you have a mind, then you’re conscious”. In general, you
need as many ors as there are eithers; there’s really no room for error. This is working so far, with a lot of effort,
but it looks like we really need some kind of counter to keep track of how many “if”s and
“either”s we use, or the rules might just start giving us nonsense. Our grammar needs some kind of
memory, which a Type-3 grammar just doesn’t have. So, what about the next level up — Type-2?
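Before moving on, the contrast can be sketched in Python. Everything here is a toy: the vocabulary and both generator functions are made up for illustration, one following the Type-3 “word plus placeholder” scheme, and one using the kind of recursive, pair-matching rule we’re about to meet:

```python
import random

# Type-3 (right-linear) rules emit one word at a time, left to right:
#   S -> word S    and    S -> word
# Generation is just appending; there is no memory of what came before.
def finite_state_generate(words):
    sentence = []
    for w in words[:-1]:
        sentence.append(w)      # apply S -> w S
    sentence.append(words[-1])  # apply S -> w to finish
    return " ".join(sentence)

print(finite_state_generate(
    ["If", "you", "pass", "the", "Turing", "test",
     "then", "you're", "conscious"]))
# Nothing here counts "if"s against "then"s, so the same rules would
# happily build "If if you pass..." too.

# Type-2 (context-free) rules can wrap matched words *around* a
# placeholder, so the recursion itself keeps the pairs balanced:
#   S -> "if" S "then" S  |  "either" S "or" S  |  (a simple clause)
def cfg_generate(depth):
    if depth == 0:
        return random.choice(["you pass the Turing test",
                              "you have a mind", "you're conscious"])
    rule = random.choice(["if", "either", "clause"])
    if rule == "if":
        return f"if {cfg_generate(depth - 1)} then {cfg_generate(depth - 1)}"
    if rule == "either":
        return f"either {cfg_generate(depth - 1)} or {cfg_generate(depth - 1)}"
    return cfg_generate(0)

s = cfg_generate(3)
toks = s.split()
assert toks.count("if") == toks.count("then")      # always matched
assert toks.count("either") == toks.count("or")    # likewise
print(s)
```

However deep the recursion goes, every “if” gets its “then” and every “either” gets its “or”, which is exactly the bookkeeping the finite-state rules couldn’t do.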
Well, these grammars give us a lot more flexibility. Some of the many rules allowed by a Type-2
grammar look like this. This new set-up looks a lot like our old one,
just with a couple of extra rules that make sure our sentences don’t run amok with too many
ifs or too few ors. With just a couple of extra, more powerful rules, we’re able to keep these
pairs consistently matched up. And we can get a good sense of just how complex language
has to be, too, since now we can compare it to other Type-2 languages! For instance, the
artificial language of sentential logic, which we talked about a while back, falls into this
class, along with most computer programming languages. And our X-bar rule for sentences
does too! So we’re all done! Right? Well… not so fast. As it happens, there
are patterns in language that can’t be generated by these rules, either. Take a sentence like
“Charles says that we helped John paint the house”. Like we saw before, we can keep
extending this sentence indefinitely, so “Charles says that we let the children help John paint
the house”, “Charles says that we let Mary let the children help Peter help John
paint the house”, and so on. It’s not a problem for English, since our existing
rules are more than capable of handling it. But when we translate these sentences into
a dialect of Swiss German, things get tricky. For one, the verbs “let” and “help”
each need their own kind of specially marked noun phrase — an accusative one, in the
case of “let”, and a dative one, in the case of “help”. Overall, there need to
be as many accusative nouns as there are lets, and as many dative nouns as there are helps.
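Abstracting away from the actual words, this counting requirement is usually modeled with the formal language a^m b^n c^m d^n, writing ‘a’ for each accusative noun, ‘b’ for each dative, ‘c’ for each “let” and ‘d’ for each “help” (the stand-in letters and the checker below are illustrative, not from the episode):

```python
import re

# The Swiss German pattern, abstracted: some accusative nouns (a),
# then dative nouns (b), then "let"-type verbs (c), then "help"-type
# verbs (d), with the a-count matching the c-count and the b-count
# matching the d-count: a^m b^n c^m d^n.
def is_cross_serial(s):
    """Check membership in {a^m b^n c^m d^n : m, n >= 1}."""
    match = re.fullmatch(r"(a+)(b+)(c+)(d+)", s)
    if not match:
        return False
    a, b, c, d = (len(g) for g in match.groups())
    return a == c and b == d   # the two counts that must agree

print(is_cross_serial("aabccd"))    # True: 2 a's, 1 b, 2 c's, 1 d
print(is_cross_serial("aabcccd"))   # False: 2 a's but 3 c's
```

A pushdown (Type-2) parser can match one of these pairs of counts with its stack, but not both at once when the dependencies interleave this way, which is the formal heart of the crossing-dependencies argument.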
And there’s no upper limit to how many lets and helps there can be. What really throws a wrench into the works,
though, is that in Swiss German, all of the nouns actually get grouped together at the
front of the embedded clause, before all the verbs. To visualize what’s going on, you can think
of the overall pattern like this: sandwiched in between the beginnings and ends of each
of these sentences, we always find some number of accusative nouns, followed by some other
number of dative nouns, followed by however many lets you need, followed by however many
helps. And because of this bunching together of all
the nouns and all the verbs, when we draw lines connecting each noun to each verb, those
lines cross. For this reason, Swiss German is said to have crossing dependencies. And
that’s a real problem, since it can be mathematically proven that this kind of language can’t
be generated by a Type-2 grammar – check the video description if you want to know how. Because of that, we’ve concluded that natural
language is at least mildly context-sensitive. The rules of the next grammar up the list,
so Type-1, can reference contexts; they generally take the form αAβ → αγβ, where the symbol A can only be rewritten when it shows up between the contexts α and β. And we’ve actually seen these kinds of rules
already. When Chomsky was originally developing his theory of grammar, he proposed adding
context-sensitive transformations to solve the problem posed here. In modern linguistics,
this has taken the form of syntactic movement — something we now see just about everywhere,
from question formation to raising verbs, and more! So since natural language needs
the type of rules that only Type-1 systems or higher have, we know that’s what
we want. But what about Type-0? Well, as it turns out,
Type-0 grammars end up being needlessly powerful for describing human language, like
using a supercomputer to do your taxes. Type-0 grammars have rules which are “unrestricted”,
which means that you can have any combination of symbols to the left of the arrow, and any
combination of symbols on the right. There’s no real template for these sorts of rules,
except something like “A→B”, where “A” and “B” can be literally anything. So
although we can make machines talk to each other with unrestricted grammars, when you
line that up with what we know of natural languages, it just doesn’t compute. That’s why Type-1
has just the right degree of complexity. So, we’ve reached the end of The Ling Space
for this week. If you solved the enigma of language, you learned that we can think of
a language like a set of rules that come together to generate all of its sentences; that we
can rank those rules, using the Chomsky hierarchy; and that human language lands somewhere in
the upper half, in terms of power and sophistication. The Ling Space is produced by me, Moti Lieberman,
and directed by Adèle-Elise Prévost. This week’s episode was written by Stephan Hurtubise.
Our editor is Georges Coulombe, our music is by Shane Turner, and our graphics team
is atelierMUSE. We’re down in the comments below, or you can bring the discussion back
over to our website, where we’ll have some extra material on this topic. Check us out on Tumblr, Twitter, and Facebook,
and try dropping by our store. And if you want to keep expanding your own personal Ling
Space, please subscribe. And we’ll see you next Wednesday. See you soon!

30 Replies to “How Complex is Natural Language? The Chomsky Hierarchy”

  1. I'm sorry but… some of your key terms don't seem very well defined here. One important question: what does the arrow actually mean?

  2. I'd love to say that you take complex concepts and make them easier to understand…I'd love to say that but I can't

  3. The piece starts by mentioning systems but doesn't make clear the difference between syntax and grammar, yet these two terms seem to be used interchangeably throughout the presentation. Can you help?

  4. I watch lots of youtube and I always notice how much quieter your videos are compared to others. With my settings I turn the volume down in crash course videos but here I turn the player to the max and I think it's still not quite enough.

  5. 8:40 "[Using type-0 languages] just doesn't compute" I see what you did there. iirc type-0 is recursively enumerable. I can eventually tell you if it's valid, but I can't tell you if it's invalid in general.

  6. 6:37 I’m currently learning German and I can tell you for sure, that isn’t German. It looks more like French, in some ways. d’Marie, d‘chind…what is that? That’s not German!

  7. Thus we can say that there are some languages more complicated than others. Is Swiss German, with all its rules, more complicated than English? Well, English has an extremely simple grammar. That is probably one of the reasons it has been so successful. Anybody can learn it, and this does not happen with many other languages.

  8. It's funny that the Chomsky hierarchy is directly related to computability and is therefore taught in Computer Science courses.

  9. Nice to see how this applies to actual natural language outside of what we study in computer science. In CS we handle formal languages with "words" being symbols and "phrases" being words, so we usually have an alphabet of a's and b's, or 0's and 1's, which gets abstract really fast. It helps reasoning about these constructs abstractly without getting caught up in the significance of e.g. the variables, though.

  10. Does this "uselessness" of type-0 languages have to do with introducing the possibility of infinite loops when trying to recognize phrases in them? Because, if I'm not mistaken, that's what they introduce in formal languages – they're the languages that can be decided by Turing machines, along with the ones that cannot.

  11. Please make a video on how to actually wire a neural circuit to produce real natural language. I tried to do it, but it was full of language pathologies; the output was not a normal natural language.

  12. There's a lot of negativity in these comments so I wanted to say thanks for helping me decipher the wikipedia page on The Chomsky Hierarchy. This video gave a good explanation of the syntaxes used to explain grammars and primed my learning with an easy-to-understand framework.

  13. Thank you for easy explanations! I know nearly nothing about automata theory and the Chomsky hierarchy, which kept me from understanding what they are talking about. Now I get a sense.
