From 9b696addbeadac234763daad0486a064b2ef7306 Mon Sep 17 00:00:00 2001
From: Oliver Sampson
Date: Wed, 6 Oct 2021 15:27:04 +0200
Subject: [PATCH 1/3] Fixed tags for the graphical concordance

The tags for the graphical concordance example were incorrect.
---
 book/ch05.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/book/ch05.rst b/book/ch05.rst
index 4a25d6ac..395f0e3e 100644
--- a/book/ch05.rst
+++ b/book/ch05.rst
@@ -316,7 +316,7 @@ category of the Brown corpus:
 We can use these tags to do powerful searches using a graphical
 POS-concordance tool ``nltk.app.concordance()``. Use it
 to search for any combination of words and POS tags, e.g.
-``N N N N``, ``hit/VD``, ``hit/VN``, or ``the ADJ man``.
+``NOUN NOUN NOUN NOUN``, ``hit/VBD``, ``hit/VBN``, or ``the ADJ man``.
 
 .. Screenshot
 

From 41993c22a1eff79421378749d145272042e73e62 Mon Sep 17 00:00:00 2001
From: Oliver Sampson
Date: Thu, 7 Oct 2021 09:53:07 +0200
Subject: [PATCH 2/3] Fixed the CFD query

Following through the book, checking for VBD or VBN in cfd1 is not
possible, because it is based on the universal tagset. cfd1 has to be
recalculated using the WSJ tagset.
---
 book/ch05.rst | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/book/ch05.rst b/book/ch05.rst
index 395f0e3e..7281d446 100644
--- a/book/ch05.rst
+++ b/book/ch05.rst
@@ -416,8 +416,9 @@ will do this for the WSJ tagset rather than the universal tagset:
 
 To clarify the distinction between ``VBD`` (past tense) and ``VBN``
 (past participle), let's find words which can be both ``VBD`` and
-``VBN``, and see some surrounding text:
+``VBN`` from the WSJ tagset, and see some surrounding text:
 
+    >>> cfd1 = nltk.ConditionalFreqDist(wsj)
     >>> [w for w in cfd1.conditions() if 'VBD' in cfd1[w] and 'VBN' in cfd1[w]]
     ['Asked', 'accelerated', 'accepted', 'accused', 'acquired', 'added', 'adopted', ...]
     >>> idx1 = wsj.index(('kicked', 'VBD'))

From e8ab22c3b84297db0eb2567aef4890ffb07d4edc Mon Sep 17 00:00:00 2001
From: Oliver Sampson
Date: Thu, 7 Oct 2021 11:12:44 +0200
Subject: [PATCH 3/3] Fixed comparison

In order to get the presented output, words whose tag distribution has
length 3 also have to be included.
---
 book/ch05.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/book/ch05.rst b/book/ch05.rst
index 7281d446..8d50d440 100644
--- a/book/ch05.rst
+++ b/book/ch05.rst
@@ -566,7 +566,7 @@ the distinctions between the tags.
     >>> data = nltk.ConditionalFreqDist((word.lower(), tag)
     ...                                 for (word, tag) in brown_news_tagged)
     >>> for word in sorted(data.conditions()):
-    ...     if len(data[word]) > 3:
+    ...     if len(data[word]) >= 3:
     ...         tags = [tag for (tag, _) in data[word].most_common()]
     ...         print(word, ' '.join(tags))
     ...