Successfully deployed this project on Win10 #20

Open
LMY-nlp0701 opened this issue Jul 17, 2020 · 22 comments
LMY-nlp0701 commented Jul 17, 2020

Successful run: steps and screenshots

1. Start the MongoDB service
[Image 1]

2. Start the neo4j service
[Image 2]

3. Visit http://localhost:7474/ in the browser; the following page is shown:
[Image 3]
The username and password must stay consistent with the code: auth=("neo4j", "123")
(A quick connectivity check for the services in steps 1-3 follows this walkthrough.)
[Image 4]

4. Launch PyCharm, enter the run command in the terminal, and start the run
[Image 5]
Note: there are still Warnings, but so far they have not affected the program.

5. Final screenshots; the neo4j knowledge graph updates dynamically as results are extracted
[Image 6]
[Image 7]
[Image 8]
[Image 9]
Note: the generated knowledge graph still looks a bit odd; I need to study this further.
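Before step 4, it can help to confirm that the services from steps 1-3 are actually reachable. This is my own minimal sketch, not part of the project; it assumes the default ports and the credentials above:

from pymongo import MongoClient
from neo4j import GraphDatabase

# raises ServerSelectionTimeoutError if the MongoDB service is not running
MongoClient('localhost', 27017, serverSelectionTimeoutMS=2000).admin.command('ping')

# raises an error if Neo4j is not running or the credentials are wrong
driver = GraphDatabase.driver("bolt://localhost:7687",
                              auth=("neo4j", "123"), encrypted=False)
with driver.session() as session:
    session.run("RETURN 1")
driver.close()

If both checks pass silently, the crawl in step 4 should at least be able to connect.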

Win10 environment setup

Preface: I have uploaded all the packages needed on Windows to Baidu Netdisk; download them directly if you need them.
Link: https://pan.baidu.com/s/1buizBSSuT4wIgPUFtUQW9g
Extraction code: jay1

[Image 10]
The configuration, step by step:
1. Install PyCharm Community Edition + Python 3.7.8
2. Install MongoDB 3.2.22
MongoDB installation guide
Note: remember to follow the guide and install the MongoDB service
3. Install neo4j
neo4j installation guide
Note: as part one of the guide explains, a Java JRE must be installed first; I installed jdk-14.0.2_windows-x64_bin
Just extract neo4j-community-4.1.1-windows to the D: drive, and remember to follow the guide to start the Neo4j program (part four of the guide)
4. Packages installed in PyCharm:
scrapy 1.6.0
pymongo 3.10.1
neo4j 1.7.6
neo4j-driver 1.7.6

One last note: line 29 of WEB_KG-master\baike\spiders\baike.py should be changed to: driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "123"), encrypted=False).
Also, the logging error in baike.py occurs because folder paths are written differently on Linux and Windows; you can either fix the path or just comment the logging out.
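To spell out why that change is needed (a sketch based on the versions listed above, not the project's exact code beyond line 29): the 1.7.x Python driver defaults to an encrypted connection, while a stock Neo4j 4.x community server has no TLS certificate, so the Bolt handshake fails unless encrypted=False is passed explicitly.

from neo4j import GraphDatabase

driver = GraphDatabase.driver(
    "bolt://localhost:7687",   # default Bolt port
    auth=("neo4j", "123"),     # must match the password set in the Neo4j browser
    encrypted=False,           # a stock Neo4j 4.x server serves no TLS certificate
)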

Many thanks to the author for open-sourcing this work; I hope we can all learn from it together. I still have a few bugs to chase, so I'll take my leave!

@lixiang0 (Owner)

@LMY-nlp0701 Thank you so much. Seeing this issue first thing at work, I feel a bit ashamed; you are even more thorough than I am. May I add this issue to the README as a Windows deployment reference? Thanks again.

@LMY-nlp0701 (Author)

@lixiang0 Haha, thank you again for open-sourcing this project for us to learn from. I'm following you now and hope to learn from you again!

@zihao-miao

Why am I getting this error: DeprecationWarning: The 'neo4j.v1' package is deprecated, import from 'neo4j' instead
from neo4j.v1 import GraphDatabase. Could someone explain?

@dorians5689

> Why am I getting this error: DeprecationWarning: The 'neo4j.v1' package is deprecated, import from 'neo4j' instead; from neo4j.v1 import GraphDatabase.

pip install neo4j
to install the neo4j driver, then drop the .v1 and use: from neo4j import GraphDatabase
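To make the fix concrete, only the import line changes; everything that uses GraphDatabase afterwards stays the same:

# from neo4j.v1 import GraphDatabase   # old 1.x layout, triggers the DeprecationWarning
from neo4j import GraphDatabase        # current layout, same class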

@zihao-miao

> pip install neo4j to install the neo4j driver, then drop the .v1 and use: from neo4j import GraphDatabase

Thanks! But after dropping the .v1 it runs without errors, yet it crawls nothing. Can you help?
[Image]

@lixiang0 (Owner)

> Thanks! But after dropping the .v1 it runs without errors, yet it crawls nothing. Can you help?

cd WEB_KG/baike
scrapy crawl baike
Run it this way; launching it directly with python ... does not work.
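A side note on why: scrapy crawl boots the whole crawler process around the spider, which a plain python baike.py does not. If a python-launchable entry point is wanted anyway, a sketch along these lines should work (assuming the spider class in baike/spiders/baike.py is named BaikeSpider; that name is my guess):

from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings
from baike.spiders.baike import BaikeSpider  # assumed class name

process = CrawlerProcess(get_project_settings())  # loads the project's settings.py
process.crawl(BaikeSpider)
process.start()  # blocks until the crawl finishes

Run it from inside WEB_KG/baike so get_project_settings() can find the project.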

@zihao-miao

> cd WEB_KG/baike
> scrapy crawl baike
> Run it this way; launching it directly with python ... does not work.

What I crawl into neo4j looks the same as above, but I cannot get a complete relation network out of it.

@zihao-miao

[image]
Did this problem come up while uploading to neo4j?

@dorians5689

dorians5689 commented Sep 10, 2020 via email

@ryc365

ryc365 commented Oct 11, 2020

Hi everyone, I'm really glad to see this project. Could it be changed to crawl COVID-19 data? I need it for my master's thesis midterm. Thanks for any help; I'm in a hurry.

@hua7448

hua7448 commented Jan 8, 2021

> (quoting @LMY-nlp0701's Win10 deployment walkthrough from the top of this thread)

Thanks a lot!!
Following along, it basically works now, but there is one small error when processing the triples:
--- Logging error ---
Traceback (most recent call last):
File "d:\users\david\appdata\local\continuum\anaconda3\lib\logging_init_.py", line 1028, in emit
stream.write(msg + self.terminator)
UnicodeEncodeError: 'gbk' codec can't encode character '\xb2' in position 176: illegal multibyte sequence
Call stack:
File "d:\users\david\appdata\local\continuum\anaconda3\lib\runpy.py", line 193, in _run_module_as_main
"main", mod_spec)
File "d:\users\david\appdata\local\continuum\anaconda3\lib\runpy.py", line 85, in run_code
exec(code, run_globals)
File "d:\Users\David\AppData\Local\Continuum\anaconda3\Scripts\scrapy.exe_main
.py", line 7, in
sys.exit(execute())
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\cmdline.py", line 145, in execute
_run_print_help(parser, _run_command, cmd, args, opts)
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\cmdline.py", line 100, in _run_print_help
func(*a, **kw)
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\cmdline.py", line 153, in _run_command
cmd.run(args, opts)
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\commands\crawl.py", line 27, in run
self.crawler_process.start()
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\crawler.py", line 327, in start
reactor.run(installSignalHandlers=False) # blocking call
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\twisted\internet\base.py", line 1283, in run
self.mainLoop()
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\twisted\internet\base.py", line 1292, in mainLoop
self.runUntilCurrent()
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\twisted\internet\base.py", line 913, in runUntilCurrent
call.func(*call.args, **call.kw)
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\twisted\internet\task.py", line 671, in _tick
taskObj._oneWorkUnit()
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\twisted\internet\task.py", line 517, in _oneWorkUnit
result = next(self._iterator)
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\utils\defer.py", line 74, in
work = (callable(elem, *args, **named) for elem in iterable)
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\utils\defer.py", line 120, in iter_errback
yield next(it)
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\utils\python.py", line 353, in next
return next(self.data)
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\utils\python.py", line 353, in next
return next(self.data)
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
for r in iterable:
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\spidermiddlewares\offsite.py", line 29, in process_spider_output
for x in result:
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
for r in iterable:
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\spidermiddlewares\referer.py", line 340, in
return (_set_referer(r) for r in result or ())
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
for r in iterable:
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\spidermiddlewares\urllength.py", line 37, in
return (r for r in result or () if _filter(r))
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
for r in iterable:
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\spidermiddlewares\depth.py", line 58, in
return (r for r in result or () if _filter(r))
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\scrapy\core\spidermw.py", line 62, in _evaluate_iterable
for r in iterable:
File "D:\work-place\knowledge Graph\WEB_KG-master\baike\spiders\baike.py", line 108, in parse
self.add_node, entity, attr, value)
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\neo4j\work\simple.py", line 403, in write_transaction
return self._run_transaction(WRITE_ACCESS, transaction_function, *args, **kwargs)
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\neo4j\work\simple.py", line 309, in _run_transaction
result = transaction_function(tx, *args, **kwargs)
File "D:\work-place\knowledge Graph\WEB_KG-master\baike\spiders\baike.py", line 45, in add_node
name1=name1, name2=name2)
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\neo4j\work\transaction.py", line 118, in run
result._tx_ready_run(query, parameters, **kwparameters)
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\neo4j\work\result.py", line 57, in _tx_ready_run
self._run(query, parameters, None, None, None, **kwparameters)
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\neo4j\work\result.py", line 97, in _run
on_failure=on_failed_attach,
File "d:\users\david\appdata\local\continuum\anaconda3\lib\site-packages\neo4j\io_bolt4.py", line 184, in run
log.debug("[#%04X] C: RUN %s", self.local_port, " ".join(map(repr, fields)))
Message: '[#%04X] C: RUN %s'
Arguments: (6392, "'MERGE (a:Node {name: $name1}) MERGE (b:Node {name: $name2}) MERGE (a)-[:占地面积]-> (b)' {'name1': '福州教育学院第二附属中学', 'name2': '73854.2 m²'} {}")
I hope you can look into this when you have time. Many thanks!

@lixiang0 (Owner)

@hua7448 Comment out everything logging-related.
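An alternative sketch, if you'd rather keep the logs: the UnicodeEncodeError comes from a log handler writing with Windows' default 'gbk' codec, so routing records through a UTF-8 file handler also makes it go away. The file name below is illustrative, not the project's actual code:

import logging

handler = logging.FileHandler('spider.log', encoding='utf-8')  # hypothetical log path
logging.basicConfig(level=logging.DEBUG, handlers=[handler])   # avoids the GBK console stream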

@kamisama2b

Damn, thank you so much. I hit every single one of these errors and configured everything by following your walkthrough. Windows is so hard.

@chrislouis0106

Haha, seeing that everyone can reproduce this successfully, I'll give it a try too.

> @hua7448 Comment out everything logging-related.

Haha, it's nothing serious: when the log name is defined, the time (with ":") ends up in the file path, and Windows filenames cannot contain ":".

logfile_name = time.ctime(time.time()).replace(' ', '_').replace(':', '_')

Just make this small change to line 13 of baike.py.
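A small side note: time.strftime can produce a Windows-safe name directly, without the replace chain, for example:

import time
logfile_name = time.strftime('%Y-%m-%d_%H-%M-%S')  # no ':' so it is a valid Windows filename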

@ada-carl

Has anyone run into the text stored in MongoDB being completely empty?

@hua7448

hua7448 commented Dec 29, 2021 via email

@adventurexw

I'd like to ask: what do I need to change to crawl a different source? I tried, and changing only this line doesn't seem to help: start_urls = ['https://baike.baidu.com/item/文汇报']
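Not a definitive answer, but here is a sketch of what usually has to change together when retargeting a Scrapy spider; the attribute names below are Scrapy's standard ones, not necessarily this project's exact code:

import scrapy

class BaikeSpider(scrapy.Spider):  # assumed class name
    name = 'baike'
    # off-site requests are dropped by Scrapy's OffsiteMiddleware unless this matches the new site
    allowed_domains = ['baike.baidu.com']
    start_urls = ['https://baike.baidu.com/item/文汇报']

    def parse(self, response):
        # the extraction rules (XPath/CSS selectors) here are written for Baidu
        # Baike's page layout, so a new target site needs new rules as well
        ...

In other words, start_urls alone is rarely enough: allowed_domains and the parsing logic are tied to the old site too.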

@hua7448

hua7448 commented Jul 21, 2022 via email

@LLMApple

> Has anyone run into the text stored in MongoDB being completely empty?

Same!!! And after that, nothing seems to get saved into db_triples.

@Bruce-Yue

Why does it exit immediately after running?
[Image: exits immediately after running]
