Almost-but-not-quite #10
Link Dump
Start of crude proof-of-concept code here. It includes some not-quite-as-crude code from another project of mine, which uses the nlp-compromise package instead of natural. I'm going to look into swapping those out.
Sooooooo... the light dawns on Marblehead: I'm using Levenshtein (edit) distance, whereas Kazemi used Word2Vec, which gives a semantic distance. Edit distance is purely an accident of orthography. So what I've got is not nearly as interesting as I was hoping for (as usual). It is of some interest, and I'll post some examples later this week (I'm desperately short on time this year, le sigh).
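For reference, a minimal sketch of plain Levenshtein distance in JavaScript (hand-rolled for illustration; it is not the API of whichever npm package the project actually used). The two sample pairs show the orthography problem: "cat" and "car" are one edit apart despite unrelated meanings, while "cat" and "kitten" are five edits apart despite being closely related.

```javascript
// Levenshtein (edit) distance: the minimum number of single-character
// insertions, deletions, or substitutions that turn `a` into `b`.
// Keeps only one previous row of the DP table, so memory is O(b.length).
function levenshtein(a, b) {
  // Row 0: turning "" into the first j characters of b takes j insertions.
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    // Column 0: turning the first i characters of a into "" takes i deletions.
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // delete a[i - 1]
        curr[j - 1] + 1, // insert b[j - 1]
        prev[j - 1] + cost // substitute (free when the characters match)
      );
    }
    prev = curr;
  }
  return prev[b.length];
}

// Spelling, not meaning, drives the score:
console.log(levenshtein("cat", "car")); // 1
console.log(levenshtein("cat", "kitten")); // 5
```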
If you could normalize both to a scale between 0 and 1, you could multiply … (email reply to Michael Paulukonis, Tue, Nov 8, 2016 at 11:15 AM)
I think I'm going to do some overkill and play with retext and the nodes of its natural language concrete syntax tree (nlcst). It has some charms, such as paragraph and sentence tokenization and the ability to recreate the original text. I find the online examples of using retext and nlcst to be suboptimal. Also, I'm curious why the project works asynchronously when there are no asynchronous sub-elements.
@enkiv2, what would that do? Pretend I'm almost statistically innumerate... There are libs that provide a 0..1 edit distance; I happened to pick a package that didn't. We've got a baby coming in less than 3 weeks, so I'm not going to get into too much craziness. Figuring out how to get …
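The thread doesn't name the libraries that return a 0..1 edit distance, so as an assumption: one common normalization divides the raw Levenshtein distance by the length of the longer string, and a 0..1 similarity is then its complement. A minimal sketch (`normalizedLevenshtein` and `editSimilarity` are hypothetical names):

```javascript
// Levenshtein distance (rolling-row dynamic programming).
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Scale edit distance into [0, 1] by dividing by the longer string's length.
// The raw distance can never exceed that length, so the result is at most 1.
// 0 means identical; 1 means as far apart as strings of those lengths can be.
function normalizedLevenshtein(a, b) {
  const longest = Math.max(a.length, b.length);
  if (longest === 0) return 0; // two empty strings are identical
  return levenshtein(a, b) / longest;
}

// A similarity on the same 0..1 scale is just the complement.
function editSimilarity(a, b) {
  return 1 - normalizedLevenshtein(a, b);
}

console.log(normalizedLevenshtein("cat", "car")); // 0.3333333333333333
```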
If you had the two factors scaled the same way and multiplied them, you … (email reply to Michael Paulukonis, Fri, Nov 11, 2016 at 11:17 AM)
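One reading of the multiply suggestion (the emailed replies are cut off, so this is an interpretation, not a stated formula): put an edit-based similarity and a semantic similarity on the same 0..1 scale, then take their product. A product is high only when both factors are high, so it behaves like a soft AND. The helper names and the rescaling of cosine similarity from [-1, 1] to [0, 1] are assumptions for illustration.

```javascript
// Hypothetical helper: rescale a cosine similarity from [-1, 1] to [0, 1].
function cosineTo01(cos) {
  return (cos + 1) / 2;
}

// Hypothetical helper: combine two scores that are already on a 0..1 scale.
// Multiplying acts like a soft AND: one low factor drags the whole score down.
function combinedScore(editSimilarity, semanticSimilarity) {
  for (const s of [editSimilarity, semanticSimilarity]) {
    // The negated range check also rejects NaN.
    if (!(s >= 0 && s <= 1)) {
      throw new RangeError(`score ${s} is outside [0, 1]`);
    }
  }
  return editSimilarity * semanticSimilarity;
}

// Close in spelling AND close in meaning scores well...
console.log(combinedScore(0.9, cosineTo01(0.8)));
// ...while close in spelling but unrelated in meaning does not.
console.log(combinedScore(0.9, cosineTo01(-0.6)));
```

For "almost but not quite", you might instead multiply edit similarity by semantic *distance* (1 minus the semantic score) to reward sentences that look alike but mean something different. Which direction fits is a design choice the thread leaves open.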
@enkiv2, we're ranking sentences, not words. I'm still not clear on what I would multiply. It only took 11 hours, but that's also because the computer slept for much of that time.
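Since the unit being ranked is the sentence, one option (an assumption, not what the project does) is to run the same edit-distance recurrence over arrays of words instead of characters, then sort candidate sentences by that normalized distance and drop exact matches. The regex tokenizer is a naive stand-in for nlp-compromise or natural, and `rankSentences` is a hypothetical name.

```javascript
// Naive tokenizer (a stand-in for a real NLP library): lowercase words only.
function tokenize(sentence) {
  return sentence.toLowerCase().match(/[a-z']+/g) || [];
}

// Levenshtein distance over any two arrays, so each "symbol" is a whole word.
function editDistance(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Rank candidates by normalized word-level distance to `target`, closest first,
// skipping exact matches (distance 0) since we want "almost", not "the same".
function rankSentences(target, candidates) {
  const t = tokenize(target);
  return candidates
    .map((sentence) => {
      const c = tokenize(sentence);
      const longest = Math.max(t.length, c.length);
      const distance = longest === 0 ? 0 : editDistance(t, c) / longest;
      return { sentence, distance };
    })
    .filter(({ distance }) => distance > 0)
    .sort((x, y) => x.distance - y.distance);
}

const ranked = rankSentences("Call me Ishmael.", [
  "Some years ago I went to sea.",
  "Call me Ishmael.",
  "Call me Ahab.",
]);
// "Call me Ahab." comes first, then "Some years ago I went to sea."
console.log(ranked.map((r) => r.sentence));
```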
I guess if we're ranking sentences, that's a much harder problem. I don't … (email reply to Michael Paulukonis, Fri, Nov 18, 2016 at 10:51 AM)
There's been some work with vectors at the sentence, paragraph, and document level. Look into doc2vec.
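A much simpler stand-in that is sometimes used as a baseline (this is not doc2vec): average the word vectors of a sentence and compare sentences by cosine similarity. The four two-dimensional vectors below are made up purely for illustration; real word2vec vectors have hundreds of dimensions, and `sentenceVector` is a hypothetical name.

```javascript
// Cosine similarity of two equal-length vectors, in [-1, 1].
function cosine(u, v) {
  let dot = 0, nu = 0, nv = 0;
  for (let i = 0; i < u.length; i++) {
    dot += u[i] * v[i];
    nu += u[i] * u[i];
    nv += v[i] * v[i];
  }
  if (nu === 0 || nv === 0) return 0; // treat an all-zero vector as unrelated
  return dot / (Math.sqrt(nu) * Math.sqrt(nv));
}

// Average the vectors of the words we know; returns null if none are known.
function sentenceVector(tokens, vectors) {
  const known = tokens.filter((w) => vectors.has(w)).map((w) => vectors.get(w));
  if (known.length === 0) return null;
  const dim = known[0].length;
  const sum = new Array(dim).fill(0);
  for (const vec of known) {
    for (let i = 0; i < dim; i++) sum[i] += vec[i];
  }
  return sum.map((x) => x / known.length);
}

// Toy, made-up 2-D vectors (real ones come from a trained word2vec model).
const vectors = new Map([
  ["the", [0.5, 0.5]],
  ["cat", [1, 0]],
  ["kitten", [0.9, 0.1]],
  ["car", [0, 1]],
]);

const a = sentenceVector(["the", "cat"], vectors);
const b = sentenceVector(["the", "kitten"], vectors);
const c = sentenceVector(["the", "car"], vectors);
console.log(cosine(a, b) > cosine(a, c)); // true
```

Averaging is order-blind and lets common words like "the" dilute the result, which is part of why dedicated sentence-level models exist.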
Kazemi's project last year used word2vec, which I missed when I started the project. I was trying to do a single-language (NodeJS) solution. Not quite possible.
@enkiv2, you may want to give skip-thought vectors a try.
@ikarth, part of this was NOT using doc2vec, since that's not NodeJS. Another part was thinking that Kazemi had not used it either. Something I did discover is some word vectors as JSON: https://igliu.com/word2vec-json/

I'm going to call it quits for the month. I've got a novel, but I didn't hit my objective of a nicely packaged … We've got another baby due on Dec 1, so I'm going to finish off the month focusing on that! The entire novel has been appended to the gist at https://gist.github.com/MichaelPaulukonis/2b2d47a5e22066e950c39841b9a6c889
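If word vectors are loaded from JSON, finding the nearest *different* word might look like the sketch below. The file layout is an assumption (a single object mapping each word to an array of numbers) and may not match the files at that URL; `parseWordVectors` and `nearestWord` are hypothetical helper names.

```javascript
// ASSUMPTION: the JSON is one object mapping words to equal-length number arrays,
// e.g. {"cat": [0.1, 0.2], "dog": [0.1, 0.3]}. Adjust if the real file differs.
function parseWordVectors(jsonText) {
  const raw = JSON.parse(jsonText);
  const vectors = new Map();
  let dim = null;
  for (const [word, vec] of Object.entries(raw)) {
    if (!Array.isArray(vec) || !vec.every((x) => typeof x === "number")) {
      throw new TypeError(`vector for "${word}" is not an array of numbers`);
    }
    if (dim === null) dim = vec.length;
    if (vec.length !== dim) {
      throw new RangeError(`vector for "${word}" has ${vec.length} dims, expected ${dim}`);
    }
    vectors.set(word, vec);
  }
  return vectors;
}

// Cosine similarity of two equal-length vectors, in [-1, 1].
function cosine(u, v) {
  let dot = 0, nu = 0, nv = 0;
  for (let i = 0; i < u.length; i++) {
    dot += u[i] * v[i];
    nu += u[i] * u[i];
    nv += v[i] * v[i];
  }
  if (nu === 0 || nv === 0) return 0;
  return dot / (Math.sqrt(nu) * Math.sqrt(nv));
}

// Hypothetical helper: the most similar word that is NOT the word itself,
// i.e. an "almost-but-not-quite" substitute. Returns null for unknown words.
// Linear scan over the whole vocabulary, so O(vocabulary size) per lookup.
function nearestWord(word, vectors) {
  const target = vectors.get(word);
  if (target === undefined) return null;
  let best = null;
  let bestScore = -Infinity;
  for (const [other, vec] of vectors) {
    if (other === word) continue;
    const score = cosine(target, vec);
    if (score > bestScore) {
      bestScore = score;
      best = other;
    }
  }
  return best;
}

const vectors = parseWordVectors('{"cat": [1, 0], "kitten": [0.9, 0.1], "car": [0, 1]}');
console.log(nearestWord("cat", vectors)); // kitten
```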
My main project will be to complete an npm module for getting texts that are almost-but-not-quite the same as the source text.
The idea is roughly the same as @dariusk's Harpooners and Sailors (here (source) and here (output+notes)) from last year, but wrapped up into a nice reusable package.
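The core loop of such a package could be small: split the source into sentences, then replace each one with the closest sentence from a candidate pool that is not identical to it. In this sketch the distance function is a parameter, so Levenshtein, word-vector similarity, or anything else can be swapped in; word-set Jaccard distance is the default only because it is short. `almostButNotQuite`, `splitSentences`, and `jaccardDistance` are hypothetical names, and the regex sentence splitter is deliberately naive.

```javascript
// Naive sentence splitter: break after ., !, or ? followed by whitespace.
function splitSentences(text) {
  return text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
}

// Lowercased set of words in a sentence.
function wordSet(sentence) {
  return new Set(sentence.toLowerCase().match(/[a-z']+/g) || []);
}

// Jaccard distance between word sets: 0 = same words, 1 = no words in common.
function jaccardDistance(a, b) {
  const A = wordSet(a);
  const B = wordSet(b);
  const union = new Set([...A, ...B]);
  if (union.size === 0) return 0;
  let shared = 0;
  for (const w of A) if (B.has(w)) shared++;
  return 1 - shared / union.size;
}

// Replace each source sentence with the nearest pool sentence whose distance is
// strictly greater than 0; keep the original if no such candidate exists.
function almostButNotQuite(sourceText, pool, distance = jaccardDistance) {
  return splitSentences(sourceText)
    .map((sentence) => {
      let best = sentence;
      let bestDistance = Infinity;
      for (const candidate of pool) {
        const d = distance(sentence, candidate);
        if (d > 0 && d < bestDistance) {
          bestDistance = d;
          best = candidate;
        }
      }
      return best;
    })
    .join(" ");
}

const pool = ["Call me Ishmael.", "Call me Ahab.", "Some years ago I went to sea."];
console.log(almostButNotQuite("Call me Ishmael.", pool)); // Call me Ahab.
```

With a tiny pool, a sentence whose only match is itself falls back to whatever is least far away, so the quality of the output depends heavily on the size and variety of the candidate corpus.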
I think I would like to use such a module for other projects, so this is a good time to git-r-done.
Plus, I've been holding off the implementation of it until November, anyway.