The Quantum Supposition of Oz #137

spc476 · 2014-11-30T09:11:08Z

An order-3 Markov chain based on the Oz novels written by L. Frank Baum (14 novels in total). The only unusual thing here is that I treated punctuation marks as "words", along with the end-of-paragraph, so that you don't get a "wall of text" but something a bit more readable (even if the punctuation is separated by spaces when it shouldn't be).
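
For anyone who wants to try the idea, here is a minimal sketch of an order-3 chain whose tokens include punctuation marks and an end-of-paragraph marker. It is not the code from the repository: the `<P>` marker, the tokenizing regex, and all function names are assumptions made for illustration.

```python
import random
import re
from collections import defaultdict

ORDER = 3
PARA = "<P>"  # hypothetical end-of-paragraph token


def tokenize(text):
    """Split text into words, single punctuation marks, and paragraph markers."""
    tokens = []
    for para in re.split(r"\n\s*\n", text.strip()):
        # A run of letters/digits/apostrophes is a word; any other
        # non-space character is a one-character punctuation "word".
        tokens.extend(re.findall(r"[A-Za-z0-9']+|[^\sA-Za-z0-9']", para))
        tokens.append(PARA)
    return tokens


def build_chain(tokens, order=ORDER):
    """Map each run of `order` tokens to the tokens seen right after it."""
    chain = defaultdict(list)
    for i in range(len(tokens) - order):
        chain[tuple(tokens[i:i + order])].append(tokens[i + order])
    return chain


def generate(chain, length, rng=random):
    """Random-walk the chain and return a list of `length` tokens."""
    keys = list(chain)
    state = rng.choice(keys)
    out = list(state)
    while len(out) < length:
        followers = chain.get(state)
        if followers:
            nxt = rng.choice(followers)
            out.append(nxt)
            state = state[1:] + (nxt,)
        else:
            # Dead end (a state only seen at the end of the corpus): restart.
            state = rng.choice(keys)
            out.extend(state)
    return out[:length]


def render(tokens):
    """Join tokens with spaces; paragraph markers become blank lines."""
    paragraphs, current = [], []
    for tok in tokens:
        if tok == PARA:
            if current:
                paragraphs.append(" ".join(current))
            current = []
        else:
            current.append(tok)
    if current:
        paragraphs.append(" ".join(current))
    return "\n\n".join(paragraphs)
```

Calling `render(generate(build_chain(tokenize(corpus)), 500))` on a corpus string yields space-separated tokens, so the stray space before each punctuation mark is still there and needs a clean-up pass.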

The code is on GitHub: https://github.com/spc476/NaNoGenMo-2014 and the sample novel can be read at https://github.com/spc476/NaNoGenMo-2014/blob/master/TheQuantumSuppositionOfOz.txt

And my blog entry that goes into more detail about how it works: http://boston.conman.org/2014/11/29.1

cpressey · 2014-11-30T10:58:33Z

Nice technique for handling the punctuation -- it does make it more coherent(-seeming) than a run-of-the-mill Markov chain.

It should be possible to clean up the intervening spaces with a postprocessor... I wrote one (here) for my own novel, but admittedly I didn't have quotation marks to deal with.

ikarth · 2014-12-01T19:19:05Z

I ran into a similar issue with punctuation last year and ended up solving it with a postprocessing step. I'm starting to think that it makes sense to have the generator emit marked-up XML or something and then run clean-up on it as a matter of course.
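
A rough sketch of what "emit markup, then clean up" could look like. The element names (`doc`, `p`, `q`, `w`, `pt`) are invented for this example and are not taken from any generator in this thread; the point is only that quotes come out balanced because they are elements rather than loose tokens.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: <p> paragraph, <q> quotation, <w> word, <pt> punctuation.
SAMPLE = (
    "<doc><p>"
    "<q><w>It's</w><w>just</w><w>nonsense</w><pt>!</pt></q>"
    "<w>declared</w><w>Dorothy</w><pt>.</pt>"
    "</p></doc>"
)


def flatten(elem):
    """Render the children of one element as plain text."""
    parts = []
    for child in elem:
        if child.tag == "w":       # words are preceded by a space
            parts.append(" " + child.text)
        elif child.tag == "pt":    # punctuation attaches to the left
            parts.append(child.text)
        elif child.tag == "q":     # a quotation is balanced by construction
            parts.append(" \u201c" + flatten(child) + "\u201d")
    return "".join(parts).strip()


def render(xml_text):
    """Flatten every <p> element and separate paragraphs with blank lines."""
    root = ET.fromstring(xml_text)
    return "\n\n".join(flatten(p) for p in root.iter("p"))
```

Here `render(SAMPLE)` gives `“It's just nonsense!” declared Dorothy.` with no stray spaces, since the spacing decision is made per element type instead of being patched afterwards.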

cpressey · 2014-12-02T11:04:25Z

Outputting some kind of tree structure (like XML) and then flattening it (sensibly) is a good approach.

On the other hand, this level of punctuation/spacing messiness is nothing a few rewriting rules can't clean up.

Given that this seems to be a "problem" that several participants have encountered, I'm working on generalizing the code I wrote into a proper reusable tool of some sort. (nice change to be doing engineering again after all that ~~hackery~~ science, too)

Here's what it does, so far, on an excerpt from The Quantum Supposition of Oz:

“Please tell Ozma, Dorothy, and when I visit Ozma she sometimes allows
me to ride upon his back, one seat for each member of the council. The”
H. M. “meant Highly Magnified, if you like,” said he.

“I dunno where this tunnel in the mountain he said to himself:

“Do,” said Nikobob, “said the stuffed one, seriously.

“I've forgotten, and I'm surprised that I was not a live thing; you're a
dummy.”

“It's just nonsense!” declared Dorothy.

(I love that last line :)

I don't know how long I'll spend on perfectionistically engineering this, but I'm hoping to end up with something like BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/) except for plain text.

If I'm happy with it before 11 more months have passed, I'll announce it on next year's Resources issue :)

enkiv2 · 2014-12-02T12:00:12Z

I've had good luck in the past treating punctuation as its own token, then normalizing with sed -E 's/ *([.,?!:;]) */\1 /g;s/ *([([]) *([A-Za-z0-9])/\1\2/g;s/([A-Za-z0-9]) *([])]) */\1\2/g' -- in other words, left-aligning all the stops and the right-hand grouping symbols and right-aligning the left-hand grouping symbols. Then, you need another stage for handling quotes -- but without balancing, that's more of a pain.
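
The same left-align/right-align idea can be sketched in Python. This is not a line-for-line port of the sed command: it assumes the tokens were joined with single spaces, so it only deletes the unwanted space on one side of each mark and leaves the other side alone. The function name `normalize` is made up for the example, and quotes are deliberately not handled.

```python
import re


def normalize(text):
    """Tidy spacing around stops and brackets in space-joined tokens."""
    # Stops attach to the word on their left.
    text = re.sub(r" +([.,?!:;])", r"\1", text)
    # Opening brackets attach to the word on their right.
    text = re.sub(r"([(\[]) +", r"\1", text)
    # Closing brackets attach to the word on their left.
    text = re.sub(r" +([)\]])", r"\1", text)
    return text
```

For example, `normalize("Hello , world ( or not ) !")` returns `Hello, world (or not)!`.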


MichaelPaulukonis · 2014-12-02T13:51:27Z

A different approach to Markov tokenization - I've worked with punctuation before in different ways, but for text blobs, so I never had to worry about the spacing. I appreciated the links to Racter/PBiHC, since I hadn't seen the template details before.

spc476 · 2014-12-02T15:22:01Z

You're welcome. It's surprising there's so little information about Racter out there (and according to Google, I appear to be one of the experts about Racter---sigh). The source to Racter is out there, but what is there appears to be the post-processed output from INRAC, a custom language used to write Racter. It's bizarre (http://boston.conman.org/2008/06/18.2).

cpressey · 2014-12-02T16:24:04Z

That... is actually a pretty nifty control structure. "Find all labels that match this pattern, then pick one of those labels at random and call it."
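
To make that concrete, here is a small Python sketch of the control structure as described in the sentence above. It is not INRAC syntax and makes no claim about how Racter implements it; the label registry, the decorator, and the use of shell-style glob patterns are all assumptions for illustration.

```python
import fnmatch
import random

LABELS = {}  # hypothetical registry: label name -> callable


def label(name):
    """Decorator that registers a function under a label name."""
    def register(func):
        LABELS[name] = func
        return func
    return register


@label("greet_formal")
def greet_formal():
    return "Good day."


@label("greet_casual")
def greet_casual():
    return "Hi!"


@label("farewell")
def farewell():
    return "Goodbye."


def call_matching(pattern, rng=random):
    """Find all labels matching the glob pattern, pick one at random, call it."""
    matches = [name for name in LABELS if fnmatch.fnmatchcase(name, pattern)]
    if not matches:
        raise LookupError("no label matches " + repr(pattern))
    return LABELS[rng.choice(matches)]()
```

`call_matching("greet_*")` returns either greeting at random, while `call_matching("farewell")` has only one candidate and always returns `Goodbye.`.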

spc476 closed this as completed Nov 30, 2014

spc476 reopened this Nov 30, 2014

hugovk added the completed label Nov 30, 2014

hugovk mentioned this issue Nov 30, 2014

Languages used in NaNoGenMo2014 #109
