
Intent to participate #29

Open
christiaanw opened this issue Oct 25, 2014 · 7 comments

Comments

@christiaanw

I know NaNoGenMo from scouring GitHub for useful Python code for a project I am (was?) working on. Given that there were some interesting contributions last year, such as In-Dialogue, The-Swallows, and the NovelHarvesterBot, I'm thinking of a hack taking off from one of these approaches to get something interesting.

@christiaanw
Author

OK, so I made this thing that runs through the WordNet ontology starting from a root noun, recursively visiting every hyponym and dumping definitions and names into sentence templates. I started off with animal (after the Celestial Emporium of Beneficial Knowledge), but that does not yield quite enough words, so then I ran it for entity.
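A minimal sketch of that recursive walk might look like the following. The traversal itself is generic; the commented-out usage assumes NLTK is installed and the wordnet corpus has been downloaded:

```python
def walk_hyponyms(synset, depth=0):
    """Depth-first walk: yield (depth, synset) for a synset and every
    hyponym below it. Works with any object exposing .hyponyms(),
    which NLTK WordNet synsets do."""
    yield depth, synset
    for hyp in synset.hyponyms():
        yield from walk_hyponyms(hyp, depth + 1)

# With NLTK installed and the wordnet corpus downloaded, starting from
# 'animal' (or 'entity') looks roughly like:
#
#   from nltk.corpus import wordnet as wn
#   for depth, syn in walk_hyponyms(wn.synset("animal.n.01")):
#       name = syn.lemma_names()[0].replace("_", " ")
#       print(f"A {name} is {syn.definition()}.")
```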

Might need some formatting into a nice PDF, but I'm having issues with pandoc.

I'm not feeling done with NaNoGenMo yet, though.

(Updated this comment after moving some repositories around.)

@hugovk
Collaborator

hugovk commented Nov 4, 2014

Interesting: animal stays almost animally all the way down, but entity quickly spreads out.

Seems to be a problem with capital vowels:

A packhorse is...
An omnivore is...
A Irish water spaniel is...

And is there any chance of detecting mass nouns?

A seawater is...
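The capital-vowel bug suggests the a/an test is looking at the raw first character; lowercasing it first would fix that. A hypothetical helper along those lines (purely letter-based, so "an hour" / "a unicorn" style exceptions and the mass-noun problem still need separate handling):

```python
def indefinite_article(noun):
    """Pick 'a' or 'an' from the noun's first letter, case-insensitively,
    so 'Irish water spaniel' gets 'an' just like 'omnivore' does."""
    return "an" if noun[:1].lower() in "aeiou" else "a"

print(indefinite_article("Irish water spaniel"))  # an
print(indefinite_article("packhorse"))            # a
```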

@christiaanw
Author

I hadn't noticed that capitalization breaks the article choice for capital vowels. I used out-of-the-box functions from Pattern and NLTK for everything, with some workarounds for the most obvious issues. I had noticed the mass nouns:

A matter is...

And I figured I could maybe infer whether to use an article from the use of articles in the usage examples for each synset, but that is not reliable enough. Also, some synset definitions lack an initial 'a/an':

A hearing dog is dog trained to assist the deaf by signaling the occurrence of certain sounds.

But just adding a/an might improperly catch mass nouns.
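That usage-example heuristic could be sketched like this (a hypothetical helper, and unreliable for exactly the stated reasons: many synsets have no examples, or never happen to use the noun with an article):

```python
def takes_article(synset):
    """Guess whether a noun is countable by checking whether WordNet's
    usage examples ever precede it with 'a' or 'an'. Works with any
    object exposing .lemma_names() and .examples(), as NLTK synsets do."""
    noun = synset.lemma_names()[0].replace("_", " ").lower()
    return any(f"a {noun}" in ex.lower() or f"an {noun}" in ex.lower()
               for ex in synset.examples())
```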

I'm satisfied with this for now, as I'm more interested in exploring meaning than getting the details of grammar in order for my entry for NaNoGenMo.

@christiaanw
Author

Forked the repo, moved the print-wordnet thingy into it, and added some other things I'm working on. In extract-phrases I've hacked together the phrase extraction utility from patent-generator to extract two different kinds of text chunks and make a single huge sentence. It could be longer, but I had some kind of error with the Gutenberg header cleaning that halted the extraction process prematurely. It yields something quite similar in tone to @cpressey's poetic inventory.

Extract from the generated novel:

Consider a low man, or a hard-working brother, or a first princess, or an old
kingfisher, or an ingenious master, or a stout woman, or a new tree, or a soft
female, or a sixth man, or a heroic defender, or a slender-culmed grass, or an
unfortunate companion, or an importunate person, or an own partner, or a
black cook, or a third caller, or a head wife, or a christian monarch, or an
unlucky friend, or

Lastly, in segmented-markov I'm trying to mess with a Markov language model based on Peter Norvig's letter n-gram counts to generate a weighted random string of characters, which then gets pushed through his text segmentation functions, yielding stuff like:

st men ag gazon dfesses tura media ls of fork to texputoneculdesoumst on forded for tsr urns or
misha m who ment tea tions t

Python hits its recursion limit fast when segmenting the text with Norvig's code, so strings longer than about 200 characters give a RuntimeError: maximum recursion depth exceeded. I might generate a 250,000-character string and pass it to the segmenter in 100-character chunks. I could also write a weighted monkey script banging out Cicero with it. Or it could be combined with checkerboard-layout, perhaps? Maybe toss out the junk?
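The generation step might look something like this: a sketch using an order-1 model over a hypothetical bigram-count dict (Norvig's published counts would be loaded into the same prev-char → {next-char: count} shape):

```python
import random

def weighted_random_text(bigram_counts, length, seed=None):
    """Generate `length` characters, each drawn with probability
    proportional to its count given the previous character.
    `bigram_counts` maps prev_char -> {next_char: count}."""
    rng = random.Random(seed)
    out = [rng.choice(list(bigram_counts))]
    while len(out) < length:
        nxt = bigram_counts.get(out[-1])
        if not nxt:                      # dead end: restart the chain
            out.append(rng.choice(list(bigram_counts)))
            continue
        chars, weights = zip(*nxt.items())
        out.append(rng.choices(chars, weights=weights)[0])
    return "".join(out)
```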

@cpressey

Interesting stuff. Note that it is possible to increase Python's recursion depth limit, if you think it will help, at your own peril -- a quick web search returned this eye-opening article: http://seriously.dontusethiscode.com/2013/04/14/setrecursionlimit.html

@enkiv2

enkiv2 commented Nov 12, 2014

You could just convert it to a loop with an explicit stack. It would be a lot safer than screwing with the recursion limit, and you could even keep args in order by using a list of tuples as your stack.

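The explicit-stack rewrite @enkiv2 describes, sketched on a generic depth-first traversal (hypothetical `children` accessor; turning Norvig's segmentation recursion into pushed frames follows the same shape):

```python
def walk_with_stack(root, children):
    """Depth-first traversal using an explicit stack of (node, depth)
    tuples, so depth is limited by memory rather than by
    sys.getrecursionlimit()."""
    stack = [(root, 0)]
    out = []
    while stack:
        node, depth = stack.pop()
        out.append((depth, node))
        # Push children reversed so they pop in left-to-right order.
        for child in reversed(children(node)):
            stack.append((child, depth + 1))
    return out
```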

@cpressey

@enkiv2 Yes, rewriting it with an explicit stack would be proper engineering (but I'm all about the science this year you see, and I thought that article was a nice bit of, uh, Python science. shudder)

Plus there's always that certain faint dishearteningness that comes with making edits to third-party code, no matter how nice the code and/or the license. Should I send these upstream? Should I maintain a fork? Etc.

@christiaanw I'd be honoured if you (or anyone) could do something with checkerboard-layout; I have a few more ideas along those lines, but didn't want to do too many "optical" experiments because they seem... slightly out of sync with the rest of NaNoGenMo.
