W4 Corpus Linguistics (part 2)

Continuing our Python/NLTK tutorial

Our goal this week is to learn how to compare different datasets. To get there, however, we will go through several “preparation steps”: learning how to open files, revisiting how list comprehensions work, etc. This week's videos are a little more practical. Hopefully they are useful.
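To give a rough idea of how these preparation steps fit together, here is a minimal sketch in plain Python (no NLTK needed). It opens two text files, uses a list comprehension to clean up the tokens, and compares which words appear in each dataset. The file names, the sample sentences, and the helper names read_tokens and relative_freqs are made up for illustration; they are not taken from the videos.

```python
import tempfile
from collections import Counter
from pathlib import Path


def read_tokens(path):
    """Open a text file and return its lowercased alphabetic tokens."""
    with open(path, encoding="utf-8") as handle:
        text = handle.read()
    # List comprehension: keep only purely alphabetic tokens, lowercased.
    return [word.lower() for word in text.split() if word.isalpha()]


def relative_freqs(tokens):
    """Map each word to its share of the total token count."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


# Two tiny made-up "corpora", written to temporary files for illustration.
tmp_dir = Path(tempfile.mkdtemp())
(tmp_dir / "a.txt").write_text("the cat sat on the mat", encoding="utf-8")
(tmp_dir / "b.txt").write_text("the dog sat on the log", encoding="utf-8")

freqs_a = relative_freqs(read_tokens(tmp_dir / "a.txt"))
freqs_b = relative_freqs(read_tokens(tmp_dir / "b.txt"))

# Words that appear in the first dataset but not in the second.
only_in_a = sorted(word for word in freqs_a if word not in freqs_b)
print(only_in_a)  # ['cat', 'mat']
```

Relative frequencies are used instead of raw counts so that datasets of different sizes can be compared on the same scale. Splitting on whitespace is a simplification; a proper tokenizer would handle punctuation better.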

Language Modeling

In the weekly synchronous meetings, I mentioned that this week would cover some Language Modeling. However, since I feel that I’d be doing redundant work if I created videos on this topic (and also because I want to save time), I decided not to make any videos. Instead, I will present a “curated” list of materials here, which I hope will be useful.

This looks like a lot of material, but you’ll notice that you can go through the videos in 1 to 2 hours. Then you’ll only need to read the two text materials, which may take about 2 more hours.

From all of this, there are a few concepts that I think are crucial to understand and some others that are just interesting. I thought I’d try to “guide” you to pay more attention to certain topics than to others. Here is a list of what I think is important for you to understand from the materials:

Additional Materials

I found this playlist on YouTube, which goes well with the Corpus Linguistics topic. This might be a nice place for you to start if you are interested in the topic.

https://www.youtube.com/watch?v=SZ2RtyKzU6o&list=PLKgdsSsfw-fau4PsTEOCcXsKxSmk6pJTY&index=1