Word vector text inflector #126

mbwolff opened this issue Nov 27, 2016 · 1 comment

mbwolff commented Nov 27, 2016

Using gensim to build a word2vec model from more than 1,300 nineteenth-century French texts, I am writing code that takes a pair of words (e.g. "homme" and "femme") and a text (Le Père Goriot, by Balzac) as parameters and generates a "modulated" text. Each word in the original text is replaced by the word that is "most similar" to it according to the word pair. For instance, if "roi" appears in the original text, it would be replaced as follows:

>>> model.most_similar(positive=['femme', 'roi'], negative=['homme'], topn=1)
[(u'reine', 0.8085041046142578)]

Handling verb conjugations and adjective agreement in French is tricky, but I aim to produce a mostly readable text. Ideally, the code will be able to "modulate" any French text against any pair of words.
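
A minimal, self-contained sketch of the word-by-word replacement described above, under stated assumptions: a tiny hand-made vector table (`TOY_VECTORS`, hypothetical) stands in for the trained model, and a pure-Python `toy_most_similar` (hypothetical) approximates the analogy query that gensim's `most_similar` performs. It sums the unit-length positive vectors, subtracts the negative ones, and ranks the remaining vocabulary by cosine similarity, with the query words excluded. The tokenisation, capitalisation handling and the choice to leave the pair words themselves unchanged are assumptions of this sketch, not the author's code. It makes no attempt at the conjugation or agreement problems mentioned above.

```python
import math
import re

# Hypothetical toy vectors (dimensions: masculine, feminine, royal, article),
# standing in for the vocabulary of a trained word2vec model.
TOY_VECTORS = {
    "homme": [1.0, 0.0, 0.0, 0.0],
    "femme": [0.0, 1.0, 0.0, 0.0],
    "roi":   [1.0, 0.0, 1.0, 0.0],
    "reine": [0.0, 1.0, 1.0, 0.0],
    "le":    [1.0, 0.0, 0.0, 1.0],
    "la":    [0.0, 1.0, 0.0, 1.0],
}


def unit(v):
    """Scale a vector to length 1."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def toy_most_similar(vectors, positive, negative, topn=1):
    """Hypothetical analogy query: rank words by cosine similarity to the sum
    of the unit positive vectors minus the unit negative ones, skipping the
    query words themselves. Returns (word, score) pairs, best first."""
    dim = len(next(iter(vectors.values())))
    target = [0.0] * dim
    signed = [(w, 1.0) for w in positive] + [(w, -1.0) for w in negative]
    for word, sign in signed:
        for i, x in enumerate(unit(vectors[word])):
            target[i] += sign * x
    target = unit(target)
    excluded = set(positive) | set(negative)
    scored = [
        (word, sum(a * b for a, b in zip(target, unit(vec))))
        for word, vec in vectors.items()
        if word not in excluded
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:topn]


def modulate(text, source, target, vectors):
    """Replace each in-vocabulary word of `text` with the answer to
    'source is to target as word is to ?'. Unknown words, punctuation,
    whitespace and the pair words themselves pass through unchanged."""
    out = []
    # Split into runs of word characters and runs of everything else,
    # so joining the pieces rebuilds the original spacing and punctuation.
    for token in re.findall(r"\w+|\W+", text):
        key = token.lower()
        if key in vectors and key not in (source, target):
            new = toy_most_similar(vectors, positive=[target, key], negative=[source])[0][0]
            # Keep the capitalisation of the original token.
            out.append(new.capitalize() if token[0].isupper() else new)
        else:
            out.append(token)
    return "".join(out)


print(modulate("Le roi dort.", "homme", "femme", TOY_VECTORS))  # → La reine dort.
```

With these toy vectors, "roi" maps to "reine" and "le" to "la", while "dort" is not in the vocabulary and is kept as-is. With a real model, the lookup would go through the model's own vectors rather than a hand-made table.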

@mbwolff mbwolff changed the title Word vector text inflector Madame Bovary déclinée Nov 28, 2016
@mbwolff mbwolff changed the title Madame Bovary déclinée Word vector text inflector Nov 28, 2016

mbwolff commented Nov 28, 2016

And it's more or less done! Here's the repository with the input text, code, vector data and output. The generated novel is Madame Bovary Modulée, based on Flaubert's famous text.
