
Tokenizer additional fixes and span method #36

Open: wants to merge 319 commits into master
319 commits
15e8c4a
notebooks and document tweaks
tpeng Oct 16, 2013
246e13e
fix NL/IE openhours train notebooks
tpeng Oct 16, 2013
696cf1b
tweak IE openhours training notebook
tpeng Oct 17, 2013
73d9b65
set encoding argument implicitly
tpeng Oct 18, 2013
37e432e
Merge pull request #3 from scrapinghub/fix-encoding
kmike Oct 18, 2013
ce8ac23
fix some IE training data
tpeng Oct 21, 2013
b9d843d
bump version to 0.1.1
tpeng Oct 22, 2013
876f20c
retrain IE openhours parser and checkin the generated model to make i…
tpeng Oct 22, 2013
0fa6f9a
annotate IE pages for parsing address
tpeng Nov 1, 2013
e98d5e0
add notebook for training IE address parser
tpeng Nov 1, 2013
bbbd394
fix some IE annotated data
tpeng Nov 1, 2013
834ac42
add inside_bold_feature and also tokenize the comma on end of a string
tpeng Nov 1, 2013
44e1e3c
update IE address parser nb
tpeng Nov 1, 2013
84e5d46
add utils to do more cleanups in html
tpeng Nov 1, 2013
e1cb18a
update the IE address parser nb
tpeng Nov 1, 2013
c43e89d
move the htmls cleanups to HTML feature generator's subclass
tpeng Nov 4, 2013
815e0dd
convert the h2 to strong too
tpeng Nov 4, 2013
4962dec
fix typo
tpeng Nov 4, 2013
f487f9a
split on comma after remove the comma in digits
tpeng Nov 4, 2013
3a674d6
fix tokenizer
tpeng Nov 4, 2013
e3616b0
fix GATE broken br elements
tpeng Nov 6, 2013
6b9050b
retrain the IE address parser and check in the models
tpeng Nov 6, 2013
644c486
more fixes on the broken annotated pages by GATE
tpeng Nov 7, 2013
09d194a
Merge pull request #4 from scrapinghub/annotate-ie-address
tpeng Nov 7, 2013
c048a6b
update document and bump version
tpeng Nov 7, 2013
819cfce
training data fixes
kmike Dec 13, 2013
31a85fc
big refactoring
kmike Dec 13, 2013
01949fa
split features into token features and global features
kmike Dec 17, 2013
8e3f7e3
make HtmlToken.token unicode
kmike Dec 17, 2013
61553a6
«tag» now means NER tag
kmike Dec 24, 2013
ccaf723
HtmlLoader
kmike Dec 24, 2013
f7efc8b
added support for WebAnnotator > 1.14 title annotation feature
kmike Dec 25, 2013
848e94b
break interface again: fit/transform methods now accepts multiple seq…
kmike Dec 25, 2013
e1257eb
tokenization changes: split by «|», make tokenizer aware of some unic…
kmike Dec 26, 2013
2f5d34d
one more tokenization fix
kmike Dec 26, 2013
9867983
trainer for Wapiti CRF models
kmike Dec 26, 2013
00c328e
make HtmlTokenizer and HtmlFeatureExtractor work on lists of trees by…
kmike Dec 26, 2013
5a5b455
add load_trees helper for bulk loading data
kmike Dec 26, 2013
7f5ecb1
add WapitiCRF to top-level exports
kmike Dec 26, 2013
02b79f5
update requirements.txt
kmike Dec 26, 2013
00bdf22
remove WapitiChunker; add transform and score methods to WapitiCRF
kmike Dec 26, 2013
a09b0a4
attributes are renamed to fix serialization and __repr__
kmike Dec 27, 2013
8942108
HtmlLoader cleanup
kmike Dec 27, 2013
fcf9840
smart_join utility function
kmike Dec 27, 2013
0bf420e
add support for auto-extracting dev data for wapiti training
kmike Dec 27, 2013
8917fba
move load_trees to the bottom and expose it in webstruct top-level na…
kmike Dec 27, 2013
874a2d5
a couple of helpers for easier training and prediction
kmike Dec 27, 2013
89e5779
smarter smart_join
kmike Dec 27, 2013
04e1728
improved docstring for IobEncoder.group
kmike Jan 9, 2014
113ae9c
extract_raw method for model.NER
kmike Jan 9, 2014
62131d9
heuristic algorithm for grouping entities into clusters
kmike Jan 10, 2014
e809864
minor docstring fix
kmike Jan 10, 2014
3892a7c
[wip] gazetteers support
kmike Dec 17, 2013
81a567c
Drop prebuilt gazetteer features; better utils for creating own gazet…
kmike Jan 13, 2014
a262e0b
minor docstring fix for geonames.read_geonames; extract csv parameter…
kmike Jan 14, 2014
e2eb818
utility for reading zipped geonames files
kmike Jan 14, 2014
b205c04
import pandas only on demand
kmike Jan 14, 2014
ab797de
don’t remove forms and annoying tags by default
kmike Jan 18, 2014
0111de9
support passing LongestMatch instances to LongestMatchGlobalFeature
kmike Jan 29, 2014
6d8793c
split token_shape feature function into several smaller functions
kmike Jan 29, 2014
945f219
split prefix and suffix features
kmike Jan 29, 2014
545ae30
cut-off support for HtmlFeatureExtractor
kmike Jan 30, 2014
9064c58
Merge pull request #6 from kmike/refactor
kmike Feb 12, 2014
5e2a77b
move Ireland address parsing out
tpeng Feb 13, 2014
a609e29
Merge pull request #9 from scrapinghub/moving-ie-address-parser
kmike Feb 13, 2014
0c46c82
Simpler regex for email matching.
kmike Feb 21, 2014
e16d0f0
add dev requirements - they are needed only to run tests and build docs
kmike Feb 25, 2014
d513c33
add functions and classes from webstruct.model to top-level namespace
kmike Feb 25, 2014
a6d7758
rename htmltoken_lists argument to html_token_lists
kmike Feb 25, 2014
deb6aab
make marisa_trie import optional
kmike Feb 25, 2014
2ca2bc8
a lot of documentation improvements
kmike Feb 25, 2014
67867c9
requirements fixes
kmike Feb 25, 2014
8177cb9
allow to build docs without installing lxml and scikit-learn
kmike Feb 25, 2014
1a80ed9
requirements-doc.txt
kmike Feb 25, 2014
c72a4c4
DOC more docs
kmike Feb 26, 2014
e531fca
rename some attributes_ to attributes
kmike Feb 26, 2014
eac30ea
DOC better tutorial (work in progress)
kmike Feb 27, 2014
bafd2c9
DOC tutorial improvements
kmike Feb 27, 2014
4d3b47d
DOC minor tutorial fixes
kmike Feb 27, 2014
e5eec90
DOC split api.rst into several files and other doc improvements
kmike Feb 28, 2014
86d1ade
(backwards-incompatible) move create_wapiti_pipeline to webstruct.wapiti
kmike Feb 28, 2014
8a4c426
DOC tutorial improvements
kmike Feb 28, 2014
0433506
a hook for customizing NER.extract results
kmike Feb 28, 2014
626fc56
NER.extarct_groups method
kmike Feb 28, 2014
3d5efa2
DOC minor tutorial improvements
kmike Feb 28, 2014
a061c33
fix NER extract methods
kmike Feb 28, 2014
01de5be
DOC entity grouping docs
kmike Feb 28, 2014
e3ff287
DOC minor entity grouping docs fixes
kmike Feb 28, 2014
8736a34
DOC move NER and Entity Grouping chapters out of parent section
kmike Mar 1, 2014
bac8829
HtmlToken.root attribute
kmike Mar 3, 2014
b8f9c1f
Don't let __START/END_TAG__ special tokens appear in trees accessible…
kmike Mar 3, 2014
f2c3963
make encoding argument optional for html_document_fromstring
kmike Mar 3, 2014
1ca1761
start webstruct.webannotator module
kmike Mar 3, 2014
c69d9ec
HtmlTokenizer.detokenize_single method for undoing HtmlTokenizer.toke…
kmike Mar 3, 2014
3b1090a
WIP crfsuite support
tpeng Mar 3, 2014
ab7b3de
some small refactor and document improvement
tpeng Mar 5, 2014
b35923e
better stripping regex for __START/END_TAG__ tokens
kmike Mar 6, 2014
8002062
WebAnnotator writer
kmike Mar 8, 2014
36595a0
workaround for lxml < 3.1.2
kmike Mar 8, 2014
9556c2d
fix tostr in previous change
tpeng Mar 8, 2014
59e127b
Less aggressive cleaning: preserve scripts and stylesheets, but don't…
kmike Mar 12, 2014
af239ee
fix webstruct.webannotator handling of attributes that are valid in H…
kmike Mar 12, 2014
1117ef7
make it possible to have consistent colors in to_webannotator function
kmike Mar 18, 2014
5045e60
abandon CRF++ template in CRFsuite backend
tpeng Mar 18, 2014
e791950
fix test failures
tpeng Mar 18, 2014
9637efe
implement ngrams as global feature function
tpeng Mar 24, 2014
f767d7a
allow WebAnnotatorLoader handle nested annotation by introducing know…
tpeng Mar 26, 2014
1a5d66f
change known_tags to known_entities and make it optional for WebAnnot…
tpeng Mar 26, 2014
e404a13
fix error message
tpeng Mar 26, 2014
0e90395
fix error message again
tpeng Mar 26, 2014
e3e5972
DOC: add example for WebAnnotatorLoader
tpeng Mar 26, 2014
7d65b73
change known_entities from list to set in GateLoader too
tpeng Mar 26, 2014
d74a7f6
Merge pull request #11 from tpeng/fix-wa-loader
kmike Mar 26, 2014
c48e748
add us_contact_pages converted to WebAnnotator format
kmike Apr 21, 2014
7c6809d
small fixes to annotation guidelines
kmike Apr 21, 2014
373f62e
delete stale README.rst file which contents was migrated to docs
kmike Apr 21, 2014
70c7408
small cleanup
kmike Apr 21, 2014
f1a860b
DOC nuke old README; minor other documentation fixes.
kmike Apr 21, 2014
f194a1a
improved setup.py
kmike Apr 21, 2014
eea07a6
DOC notes about model development and other tutorial improvements
kmike Apr 21, 2014
1bfc028
DOC installation notes
kmike Apr 21, 2014
dcdbbbe
DOC Python 2.7 is required.
kmike Apr 21, 2014
a8d7f96
hello 0.2
kmike Apr 21, 2014
ccddce0
DOC better wording
kmike Apr 22, 2014
ebaa38c
DOC document webstruct.webannotator
kmike Apr 22, 2014
9f3420e
TST better test coverage for webstruct.utils
kmike Apr 22, 2014
6185516
ignore html coverage reports
kmike Apr 22, 2014
3abe528
TST better test coverage for webstruct.webannotator
kmike Apr 22, 2014
1d6e560
(backwards-incompatible) rename webstruct.tokenizers to webstruct.tex…
kmike Apr 22, 2014
4fb4419
split the PR into 2 parts
tpeng Apr 22, 2014
568862f
missing file in previous change
tpeng Apr 22, 2014
98b3ae4
Remove example notebooks and models from repo.
kmike Apr 22, 2014
e3defff
Merge pull request #10 from tpeng/crfsuite-backend
kmike Apr 22, 2014
40a5415
simplify CombinedFeatures and make it private
kmike Apr 22, 2014
45a005f
features.utils -> feature.global_features
kmike Apr 22, 2014
4ee7f40
TST fix tests
kmike Apr 22, 2014
a636a9a
replace Ngram global feature with Pattern
kmike Apr 22, 2014
04eed65
DOC fix autodocs
kmike Apr 22, 2014
115a5a4
DOC minor fixes
kmike Apr 23, 2014
52759bd
(backwards-incompatible) kill default features:
kmike Apr 23, 2014
a91d1c9
(backwards-incompatible) rename "transform" to "predict" for estimato…
kmike Apr 23, 2014
ab1b589
TST don't require NLTK for tests
kmike Apr 24, 2014
9204eec
simple __repr__ for HtmlToken
kmike Apr 24, 2014
829f708
(backwards-incompatible) all create_wapiti_pipeline wapiti params
kmike Apr 24, 2014
e52ab9e
WordTokenizer.tokenize rewritten
chekunkov May 5, 2014
98a2a0b
doctests indent
chekunkov May 5, 2014
989072c
fix unicode handling for a new tokenizer; add pounds char to rules
kmike May 12, 2014
177ad80
Merge branch 'speed_up_text_tokenizer' of https://github.com/chekunko…
kmike May 12, 2014
5fe04f6
Merge pull request #16 from scrapinghub/speed_up_text_tokenizer
kmike May 12, 2014
226e53f
small tokenizer cleanup
kmike May 13, 2014
24926c5
make min_length and max_length arguments required for utils.substrings
kmike May 14, 2014
b6d60f1
add crfsuite backend base on python-crfsuite
tpeng Apr 23, 2014
e3ef37a
DOC: fix crfsuite docstring
tpeng Apr 24, 2014
f96cae1
DOC fix style and typo
tpeng Apr 24, 2014
383f8b7
fix HtmlTokenizer pickling
kmike May 15, 2014
0adaaf2
WapitiCRF.fit returns self
kmike May 15, 2014
92553b7
train_test_split_noshuffle
kmike May 15, 2014
55598e0
TST runcoverage script
kmike May 15, 2014
a2111d4
python-crfsuite support; tests for NER and crfsuite pipeline
kmike May 15, 2014
01b0ee6
expose CRFsuiteCRF and CCRFsuiteFeatureEncoder
kmike May 16, 2014
0f248b6
rename wapiti_kwargs to crf_kwargs for consistency
kmike May 16, 2014
441ebf4
move tostr to wapiti module because it is wapiti-specific
kmike May 16, 2014
7d12376
NER.annotate and NER.annotate_url methods
kmike May 16, 2014
85e9407
Abstract temporary model files handling; add this feature to wapiti. …
kmike May 16, 2014
9525c46
A corpus (not annotated yet) with 450 pages from business websites in…
kmike May 19, 2014
38730d8
add EMAIL to dtd in order to load annotated files properly
kmike May 19, 2014
4619e8f
annotation fixes
kmike May 19, 2014
be9a91c
Fix html produced by WebAnnotator.
kmike May 19, 2014
591051d
(backwards incompatible) drop existing `load_trees`; rename `load_tre…
kmike May 20, 2014
5bb3768
make it possible to use existing WebAnnotator colors
kmike May 20, 2014
6cd6265
+100 annotated pages
kmike May 20, 2014
2e746c4
annotation fixes
kmike May 21, 2014
223d8f1
annotation fixes
kmike May 21, 2014
8875d3c
more annotation fixes
kmike May 21, 2014
146ad5e
+100 pages
kmike May 21, 2014
448048e
annotation fixes
kmike May 21, 2014
87279df
BUG fix an issue with WebAnnotatorLoader: it shouldn't add extra "Non…
kmike May 21, 2014
2150bda
fix a test after annotation fix
kmike May 21, 2014
79d81c5
easier Trainer customization for CRFsuiteCRF
kmike May 26, 2014
a98431e
X_dev and y_dev support for webstruct.crfsuite
kmike May 26, 2014
1c47f9e
+100 pages
kmike May 27, 2014
e9ebeaa
doctests (failing) for some tokenization gotchas
kmike May 27, 2014
f80c382
expose LongestMatchGlobalFeature
kmike May 27, 2014
1c17e7c
annotations fix
kmike May 27, 2014
17a5d4e
one more failing tokenization example
kmike May 27, 2014
9d8fcdc
webstruct.gazetteers.geonames.read_geonames_zipped: try to handle geo…
kmike May 28, 2014
ce775e6
DAWG gazetteers support (they are much faster than MARISA-based, but …
kmike May 28, 2014
6ee718f
more annotated data
kmike May 28, 2014
ed40e3e
CRFsuiteFeatureEncoder is not needed with python-crfsuite==0.6
kmike May 28, 2014
b2cb0e7
Undocumented HtmlFeatureExtractor post-processing step is removed to …
kmike May 28, 2014
649c814
bias feature
kmike May 28, 2014
12be72e
tiny speedup for BestMatch._find_matches
kmike May 28, 2014
727f61b
NER.extract_groups_from_url
kmike May 30, 2014
cd1860d
export webstruct.smart_join
kmike May 30, 2014
56cd57e
annotation fixes (more locations for about 70 pages)
kmike May 30, 2014
4019595
DOC suggest to use "Save as" in WebAnnotator
kmike Jul 7, 2014
c0448c9
get rid of seqlearn dependency
tpeng Aug 11, 2014
3dc2024
fix document
tpeng Aug 11, 2014
6e36995
Merge pull request #23 from tpeng/remove-seqlearn-deps
kmike Aug 11, 2014
9cfe657
Update requirements so that they will work automatically
Suor Feb 14, 2015
1c4c378
Set up tox to test py27, py33, py34 and docs
Suor Feb 14, 2015
3060950
Add Travis CI config
Suor Feb 14, 2015
ee25440
Use miniconda to test on Travis CI
Suor Feb 26, 2015
225cc76
Merge pull request #28 from Suor/travis
kmike Feb 26, 2015
c7c79b5
Migrate code to support Python 3
Suor Feb 27, 2015
4de7573
Rename cross module to compat
Suor Feb 27, 2015
b5d19c8
Get rid of bprint()/bformat()
Suor Feb 28, 2015
0e66518
Return to more natural doctest in HtmlTokenizer.tokenize_single()
Suor Feb 28, 2015
d2d3d5c
Set ELLIPSIS and IGNORE_UNICODE as default doctest options
Suor Mar 3, 2015
22c27c4
Add Python 3 version modifiers to setup.py
Suor Mar 3, 2015
86d44e6
Update python version requirements in installation docs
Suor Mar 3, 2015
d7e2fae
Merge pull request #29 from Suor/py3-clean
kmike Mar 3, 2015
b3be38c
add Travis badge to readme
kmike Mar 3, 2015
eba8084
fix requirements.txt: cython is no longer needed; bump python-crfsuit…
kmike Mar 3, 2015
a54dae3
Fix setup.py requires
Suor Apr 14, 2015
d21a83f
fixing typo: toolikit -> toolkit
carlosp420 Jul 18, 2015
8133674
Merge pull request #31 from carlosp420/patch-0
kmike Jul 19, 2015
06be1b4
declare Python 3.5 support
kmike Sep 19, 2016
d8f1d0a
bump version to 0.3
kmike Sep 19, 2016
6d3d109
Merge pull request #30 from Suor/master
kmike Nov 16, 2016
005c88b
fixed compatibility with recent scikit-learn
kmike Nov 16, 2016
f8fa440
TST simplify travis.yml. See GH-33.
kmike Nov 16, 2016
d043435
TST don’t test with Python 3.3
kmike Nov 16, 2016
0dfc6ac
TST don’t run tests twice for pull requests
kmike Nov 16, 2016
e7d552e
Merge pull request #34 from scrapinghub/fix-ci
kmike Nov 16, 2016
920df38
(backwards incompatible) remove custom CRFsuite wrapper, use sklearn-…
kmike Nov 16, 2016
c49301f
Merge pull request #35 from scrapinghub/sklearn-crfsuite
kmike Nov 16, 2016
93fc8c2
DOC more documentation for webstruct_data datasets
kmike Nov 16, 2016
db287d2
annotation fixes: emails, org names
kmike Nov 16, 2016
2c611c4
preserve comments in loaded trees
kmike Nov 16, 2016
03a82b4
annotations: remove problematic js code
kmike Nov 17, 2016
c51140a
DOC clarify known_entities of GateLoader
kmike Nov 17, 2016
1d0f4ac
add country names gazetteer
kmike Nov 17, 2016
9000067
TST switch to pytest, check that docs are building without warnings
kmike Nov 25, 2016
71f1e34
gitignore more files
kmike Nov 25, 2016
54b61a6
TST revert strict doc check
kmike Nov 25, 2016
a509bcb
Update codecov.yml
kmike Nov 25, 2016
e0fde7e
add codecov badge
kmike Nov 25, 2016
51684c0
DOC whoops, fix whitespaces in README
kmike Nov 25, 2016
3e05642
fixed NER.extract_groups_from_url `dont_penalize` argument
kmike Nov 25, 2016
d44d6f4
extract_entity_groups utility function
kmike Nov 25, 2016
6000221
move HtmlTokenizer to its own module
kmike Nov 25, 2016
8e5d98c
DOC trying to fix readthedocs build
kmike Nov 26, 2016
786a1f0
DOC try to fix readthedocs, again..
kmike Nov 26, 2016
9b7986b
bump version to 0.4; add changelog
kmike Nov 26, 2016
784fd3a
DOC typo fixes
kmike Nov 26, 2016
071bc78
fixed NER.extract bug
kmike Nov 28, 2016
628c8c2
bump version
kmike Nov 28, 2016
2f8ff81
tokenizer - dot regex fix. WordTokenizer refactoring to be able to re…
chekunkov Jun 7, 2014
2be9325
fixed broken doctests
chekunkov Jun 7, 2014
8 changes: 7 additions & 1 deletion .gitignore
@@ -24,7 +24,9 @@ pip-log.txt
# Unit test / coverage reports
.coverage
.tox
cover
nosetests.xml
.cache

# Translations
*.mo
@@ -35,5 +37,9 @@ nosetests.xml
.pydevproject

# Other
.idea
webstruct_data/datastore

.ipynb_checkpoints
docs/_build
webstruct_data/todo
notebooks/old
33 changes: 33 additions & 0 deletions .travis.yml
@@ -0,0 +1,33 @@
language: python
python: 3.5
sudo: false
branches:
  only:
    - master
    - /^\d\.\d+$/
env:
- TOXENV=py27
- TOXENV=py34
- TOXENV=py35
- TOXENV=docs

addons:
  apt:
    packages:
      - python-numpy
      - python-scipy
      - libatlas-base-dev
      - liblapack-dev
      - gfortran

install:
- pip install -U pip tox codecov

script: tox

after_success:
- codecov

cache:
  directories:
    - $HOME/.cache/pip
29 changes: 29 additions & 0 deletions CHANGES.rst
@@ -0,0 +1,29 @@
Changes
=======

0.4.1 (2016-11-28)
------------------

* fixed a bug in NER.extract().

0.4 (2016-11-26)
----------------

* sklearn-crfsuite_ is used as a CRFsuite wrapper; the CRFsuiteCRF class
  is removed;
* comments are preserved in HTML trees because recent Firefox versions put
  ``<base>`` tags into a comment when saving pages, and this affects
  WebAnnotator;
* fixed the 'dont_penalize' argument of webstruct.NER.extract_groups_from_url;
* new webstruct.model.extract_entity_groups utility function;
* HtmlTokenizer and HtmlToken are moved to their own module
  (webstruct.html_tokenizer);
* test improvements.

.. _sklearn-crfsuite: https://github.com/TeamHG-Memex/sklearn-crfsuite

0.3 (2016-09-19)
----------------

There are many changes from the previous version: the API has changed,
Python 3 is supported, gazetteer support is better, CRFsuite is supported, etc.
35 changes: 35 additions & 0 deletions README.rst
@@ -0,0 +1,35 @@
Webstruct
=========

.. image:: https://travis-ci.org/scrapinghub/webstruct.svg?branch=master
   :target: https://travis-ci.org/scrapinghub/webstruct

.. image:: https://codecov.io/gh/scrapinghub/webstruct/branch/master/graph/badge.svg
   :target: https://codecov.io/gh/scrapinghub/webstruct


Webstruct is a library for creating statistical NER_ systems that work
on HTML data, i.e. a library for building tools that extract named
entities (addresses, organization names, open hours, etc.) from webpages.
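
Statistical NER of this kind commonly labels each token with an IOB tag (``B-`` begins an entity, ``I-`` continues it, ``O`` is outside any entity) and then groups the tagged tokens into entities. The following is a minimal plain-Python sketch of that grouping step, under the assumption of ``TYPE``-suffixed IOB tags; ``group_iob`` and the tag names are hypothetical illustrations, not webstruct's own API.

```python
from typing import List, Optional, Tuple


def group_iob(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Group IOB-tagged tokens into (entity_text, entity_type) pairs.

    Hypothetical helper for illustration only; tags look like
    "B-ORG", "I-ORG" or "O".
    """
    entities: List[Tuple[str, str]] = []
    current_tokens: List[str] = []
    current_type: Optional[str] = None

    def flush() -> None:
        # Emit the entity collected so far (if any) and reset the buffer.
        nonlocal current_tokens, current_type
        if current_tokens and current_type is not None:
            entities.append((" ".join(current_tokens), current_type))
        current_tokens, current_type = [], None

    for token, tag in zip(tokens, tags):
        if tag == "O":
            flush()
            continue
        prefix, _, ent_type = tag.partition("-")
        if prefix == "B" or ent_type != current_type:
            # A new entity starts on a B- tag, or on an I- tag whose type changes.
            flush()
            current_type = ent_type
        current_tokens.append(token)
    flush()
    return entities


print(group_iob(
    ["Acme", "Ltd", "is", "in", "Cork", "Ireland"],
    ["B-ORG", "I-ORG", "O", "O", "B-CITY", "B-COUNTRY"],
))
# [('Acme Ltd', 'ORG'), ('Cork', 'CITY'), ('Ireland', 'COUNTRY')]
```

Treating an ``I-`` tag with a new type as the start of a new entity is one lenient convention for malformed tag sequences; a stricter decoder could reject such input instead.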

Unlike most NER systems, webstruct works on HTML data, not only
on text data. This makes it possible to define features that use the HTML
structure, and to embed annotation results back into HTML.
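
As a minimal sketch of what an HTML-aware token feature can look like, the code below uses only the standard library's ``html.parser`` to split text into whitespace tokens, record which tags enclose each token, and derive a few per-token features such as whether the token sits inside ``<b>``/``<strong>``. ``TaggedTextParser``, ``token_features`` and the feature names are hypothetical; webstruct's own HtmlTokenizer and HtmlFeatureExtractor operate on parsed HTML trees and have a different interface.

```python
from html.parser import HTMLParser
from typing import Dict, List, Optional, Tuple

# Elements that never have an end tag, so they must not stay on the tag stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TaggedTextParser(HTMLParser):
    """Collect whitespace-separated tokens with the open tags around them.

    Note: this sketch does not skip <script>/<style> contents.
    """

    def __init__(self) -> None:
        super().__init__()
        self.stack: List[str] = []
        self.tokens: List[Tuple[str, Tuple[str, ...]]] = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        # Pop up to and including the matching open tag; ignore stray end tags.
        if tag in self.stack:
            while self.stack.pop() != tag:
                pass

    def handle_data(self, data):
        for word in data.split():
            self.tokens.append((word, tuple(self.stack)))


def token_features(html: str) -> List[Dict[str, object]]:
    """Turn an HTML snippet into one feature dict per text token."""
    parser = TaggedTextParser()
    parser.feed(html)
    parser.close()
    features: List[Dict[str, object]] = []
    for word, tags in parser.tokens:
        parent: Optional[str] = tags[-1] if tags else None
        features.append({
            "token": word,
            "parent_tag": parent,
            "inside_bold": any(t in ("b", "strong") for t in tags),
            "is_title_case": word.istitle(),
        })
    return features


feats = token_features("<p>Call <b>Acme Corp</b> today</p>")
print([(f["token"], f["inside_bold"]) for f in feats])
# [('Call', False), ('Acme', True), ('Corp', True), ('today', False)]
```

Per-token feature dicts like these are the usual input format for CRF toolkits such as CRFsuite, which is why the sketch returns a list of dicts rather than a matrix.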

Read the docs_ for more info.

License is MIT.

.. _docs: http://webstruct.readthedocs.org/en/latest/
.. _NER: http://en.wikipedia.org/wiki/Named-entity_recognition

Contributing
------------

* Source code: https://github.com/scrapinghub/webstruct
* Bug tracker: https://github.com/scrapinghub/webstruct/issues

To run tests, make sure tox_ is installed, then run
``tox`` from the source root.

.. _tox: https://tox.readthedocs.io/en/latest/
13 changes: 0 additions & 13 deletions block_model/README.md

This file was deleted.

11 changes: 0 additions & 11 deletions block_model/convert_html.py

This file was deleted.

16 changes: 0 additions & 16 deletions block_model/convert_labeled_data.py

This file was deleted.

132 changes: 0 additions & 132 deletions block_model/data/1.html

This file was deleted.

32 changes: 0 additions & 32 deletions block_model/data/1.txt

This file was deleted.
