Links update (#251)
* Updated desc for odd size in readme

* fixed language

* Added odd size issue to issues description in tutorial

* Removed unnecessary line causing a warning message

* Updated instructions for skipping notebook execution

* Updated absolute links to relative links in documentation

* added hidden tags to dataset download cells

* Updated link checker

* Updated tutorial

* Revert accidentally hidden cells

* Updated tqdm to tqdm.auto

* Updated docs requirements

* Updated tutorial notebooks

* Updated tags
sanjanag authored Feb 13, 2024
1 parent 972f060 commit 8d8ffaf
Showing 11 changed files with 72 additions and 46 deletions.
13 changes: 9 additions & 4 deletions .github/workflows/links.yml
@@ -17,10 +17,15 @@ jobs:
find . -name '*.html' -delete
- run: |
find . -name '*.md' -exec pandoc -i {} -o {}.html \;
- uses: anishathalye/proof-html@v1
- uses: anishathalye/proof-html@v2
with:
directory: .
check_html: false
check_favicon: false
empty_alt_ignore: true
url_ignore_re: |
^https:\/\/twitter.com\/CleanlabAI
ignore_missing_alt: true
ignore_empty_alt: true
tokens: |
{"https://github.com": "${{ secrets.GITHUB_TOKEN }}"}
swap_urls: |
{"^(\\..*)\\.md(#?.*)$": "\\1.md.html\\2",
"^(https://github\\.com/.*)#.*$": "\\1"}
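The `swap_urls` entries above are regex-to-replacement pairs: pandoc writes each `foo.md` to `foo.md.html`, so relative `.md` links are redirected to the generated HTML, and `#fragment` suffixes are stripped from github.com URLs before checking. A minimal Python sketch of the same two rewrites, assuming they are applied in order and that Python's `re` handles these particular patterns the same way as the Ruby engine behind proof-html; `swap_url` is a hypothetical helper, not part of proof-html:

```python
import re

# The two swap_urls rules from links.yml, with the JSON string escaping removed.
SWAPS = [
    (r"^(\..*)\.md(#?.*)$", r"\1.md.html\2"),   # relative .md link -> pandoc's .md.html output
    (r"^(https://github\.com/.*)#.*$", r"\1"),  # drop the #fragment from github.com links
]


def swap_url(url: str) -> str:
    """Apply each rewrite rule in order (hypothetical helper for illustration)."""
    for pattern, replacement in SWAPS:
        url = re.sub(pattern, replacement, url)
    return url


print(swap_url("./DEVELOPMENT.md#docs"))  # -> ./DEVELOPMENT.md.html#docs
print(swap_url("https://github.com/cleanlab/cleanvision/blob/main/CONTRIBUTING.md#setup"))
# -> https://github.com/cleanlab/cleanvision/blob/main/CONTRIBUTING.md
```

Links that match neither rule, such as `../cleanvision/imagelab.rst#...`, pass through unchanged.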
2 changes: 1 addition & 1 deletion DEVELOPMENT.md
@@ -123,7 +123,7 @@ pip install -r docs/requirements.txt
sphinx-build docs/source cleanvision-docs
```

**Note for faster build**: Executing the Jupyter Notebooks (i.e., the .ipynb files) that make up some portion of the docs, such as the tutorials, takes a long time. If you want to skip rendering these, set the environment variable `SKIP_NOTEBOOKS=1`. You can either set this using `export SKIP_NOTEBOOKS=1`
**Note for faster build**: Executing the Jupyter Notebooks (i.e., the .ipynb files) that make up some portion of the docs, such as the tutorials, takes a long time. If you want to skip rendering these, add `nbsphinx_execute = 'never' to [sphinx configuration](docs/source/conf.py)

4. To view the docs open the file `cleanvision-docs/index.html` file in a browser.

2 changes: 1 addition & 1 deletion README.md
@@ -89,7 +89,7 @@ In any collection of image files (most [formats](https://pillow.readthedocs.io/e
| 6 | Light | Irregularly bright images (*over*exposed) | light | ![](https://github.com/cleanlab/assets/master/cleanvision/example_issue_images/light.jpg) |
| 7 | Grayscale | Images lacking color | grayscale | ![](https://github.com/cleanlab/assets/master/cleanvision/example_issue_images/grayscale.jpg) |
| 8 | Odd Aspect Ratio | Images with an unusual aspect ratio (overly skinny/wide) | odd_aspect_ratio | ![](https://github.com/cleanlab/assets/master/cleanvision/example_issue_images/odd_aspect_ratio.jpg) |
| 9 | Odd Size | Images that are abnormally large or small | odd_size | <img src="https://github.com/cleanlab/assets/master/cleanvision/example_issue_images/odd_size.png" width=20% height=20%> |
| 9 | Odd Size | Images that are abnormally large or small compared to the rest of the dataset | odd_size | <img src="https://github.com/cleanlab/assets/master/cleanvision/example_issue_images/odd_size.png" width=20% height=20%> |

CleanVision supports Linux, macOS, and Windows and runs on Python 3.7+.

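The README now states that odd-size images are judged relative to the rest of the dataset rather than against a fixed size. To make "relative" concrete, here is a minimal sketch of one such heuristic: compare each image's side-length proxy, the square root of width times height, with the dataset median. The function name and the `factor=4.0` cutoff are hypothetical; this is not CleanVision's actual odd_size algorithm or threshold:

```python
import math
from statistics import median


def flag_odd_sizes(sizes, factor=4.0):
    """Flag images much larger or smaller than the dataset's typical size.

    sizes: list of (width, height) tuples.
    Returns one bool per image: True when sqrt(w * h) is more than `factor`
    times above or below the median over the whole dataset.
    """
    scales = [math.sqrt(w * h) for w, h in sizes]
    typical = median(scales)
    return [s > typical * factor or s < typical / factor for s in scales]


sizes = [(100, 100)] * 5 + [(1000, 1000), (10, 10)]
print(flag_odd_sizes(sizes))  # the last two images are flagged
```

Because the reference point is the median, the same 1000x1000 image could be normal in one dataset and odd in another.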
29 changes: 11 additions & 18 deletions docs/requirements.txt
@@ -1,19 +1,12 @@
sphinx==5.1.1
sphinx-tabs==3.4.1
nbsphinx==0.8.8
autodocsumm==0.2.9
sphinx==7.1.2
sphinx-tabs==3.4.5
nbsphinx==0.9.3
autodocsumm==0.2.12
sphinx-multiversion==0.2.4
sphinx-copybutton==0.5.0
sphinxcontrib-katex==0.8.6
sphinx-autodoc-typehints==1.19.2
furo==2022.06.21
numpy>=1.20.0
pandas>=1.1.5
Pillow>=9.3
matplotlib>=3.4
tqdm>=4.53.0
imagehash>=4.2.0
datasets>=2.7.0
torchvision>=0.12.0
ipykernel==6.8.0
ipywidgets==7.6.5
sphinx-copybutton==0.5.2
sphinxcontrib-katex==0.9.9
sphinx-autodoc-typehints==1.25.2
furo==2023.09.10
ipykernel==6.29.0
ipywidgets==8.1.1
ipython==8.0.1
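The docs requirements are now exact `==` pins. A quick way to confirm that a local docs environment matches them is to compare the pins with the installed distributions. A minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); `parse_pins` and `find_mismatches` are hypothetical helpers, and only plain `name==version` lines are handled:

```python
from importlib import metadata


def parse_pins(text):
    """Map package name -> pinned version for each `name==version` line."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            pins[name.strip()] = version.strip()
    return pins


def find_mismatches(pins):
    """Return {name: (pinned, installed_or_None)} for every pin that is not satisfied."""
    mismatches = {}
    for name, wanted in pins.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches[name] = (wanted, installed)
    return mismatches
```

For example, `find_mismatches(parse_pins(open("docs/requirements.txt").read()))` lists every package whose installed version differs from the pin.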
1 change: 0 additions & 1 deletion docs/source/conf.py
@@ -77,7 +77,6 @@

html_title = ""
html_theme = "furo"
html_static_path = ["_static"]
html_logo = "https://github.com/cleanlab/assets/master/cleanlab/cleanlab_logo_only.png"

html_theme_options = {
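The removed line is presumably `html_static_path = ["_static"]`, which makes Sphinx warn when `docs/source/_static` does not exist. If static files are added later, one option is to register the folder only when it is present. A minimal sketch for conf.py, relying on Sphinx executing conf.py with the working directory set to the folder that contains it:

```python
import os

# List _static only when the folder exists, so Sphinx does not
# warn about a missing static path during the build.
html_static_path = ["_static"] if os.path.isdir("_static") else []
```

The simpler alternative, used in this commit, is to drop the setting until there are static files to serve.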
2 changes: 1 addition & 1 deletion docs/source/faq.rst
@@ -10,7 +10,7 @@ CleanVision is independent of any machine learning tasks as it directly works on
2. **Can I check for specific issues in my dataset?**


Yes, you can specify issues like ``light`` or ``blurry`` in the issue_types argument when calling ``Imagelab.find_issues``
Yes, you can specify issues like ``light`` or ``blurry`` in the issue_types argument when calling :py:meth:`~cleanvision.imagelab.Imagelab.find_issues`

.. code-block:: python3
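`find_issues` takes `issue_types` as a dict that maps issue keys to per-issue parameter dicts, where an empty dict keeps the defaults (the tutorial uses `{issue_name: {}}`). A small helper like the hypothetical `build_issue_types` below catches misspelled keys before a long run starts. The key set is assumed to be CleanVision's nine built-in issue types as of this commit; custom issue managers can be allowed through `extra_keys`:

```python
# Assumed list of CleanVision's built-in issue keys (see the README issue table).
KNOWN_ISSUE_KEYS = {
    "exact_duplicates", "near_duplicates", "dark", "light", "blurry",
    "grayscale", "odd_aspect_ratio", "low_information", "odd_size",
}


def build_issue_types(keys, extra_keys=()):
    """Return {key: {}} (default parameters) for each key, rejecting unknown keys.

    extra_keys: names of registered custom issue managers to accept as well.
    """
    allowed = KNOWN_ISSUE_KEYS | set(extra_keys)
    unknown = [k for k in keys if k not in allowed]
    if unknown:
        raise ValueError(f"Unknown issue types: {unknown}")
    return {k: {} for k in keys}


issue_types = build_issue_types(["light", "blurry"])
# With an Imagelab instance, as in the FAQ example:
# imagelab.find_issues(issue_types)
```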
8 changes: 4 additions & 4 deletions docs/source/index.rst
@@ -5,7 +5,7 @@
Documentation
=======================================

CleanVision automatically detects various issues in image datasets, such as images that are: (near) duplicates, blurry,
CleanVision automatically detects various issues in your image data, such as images that are: (near) duplicates, blurry,
over/under-exposed, etc. This data-centric AI package is designed as a quick first step for any computer vision project
to find problems in your dataset, which you may want to address before applying machine learning.

@@ -120,9 +120,9 @@ CleanVision works smoothly with Torchvision datasets too:
Additional Resources
--------------------
- Get started with our `Example Notebook <https://cleanvision.readthedocs.io/en/latest/tutorials/tutorial.html>`_
- Explore more `Example Notebooks <https://github.com/cleanlab/cleanvision-examples>`_
- Learn how to contribute in the `Contribution Guide <https://github.com/cleanlab/cleanvision/blob/main/CONTRIBUTING.md>`_
- Get started with `Starter Tutorial <tutorials/tutorial.ipynb>`_.
- View more `code examples <https://github.com/cleanlab/cleanvision-examples>`_ that demonstrate how to use CleanVision on various datasets.
- Interested in contributing to CleanVision? Check out our `Contribution Guide <https://github.com/cleanlab/cleanvision/blob/main/CONTRIBUTING.md>`_ to get started.


.. toctree::
2 changes: 1 addition & 1 deletion docs/source/tutorials/custom_issue_manager.py
@@ -3,7 +3,7 @@
import numpy as np
import pandas as pd
from PIL import Image
from tqdm import tqdm
from tqdm.auto import tqdm

from cleanvision.dataset.base_dataset import Dataset
from cleanvision.issue_managers import register_issue_manager
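`tqdm.auto` selects the Jupyter widget progress bar inside notebooks and the plain text bar elsewhere, which suits code that appears both in the tutorial notebook and as a standalone .py file. A minimal sketch of the same import with a no-op fallback for environments without tqdm; the fallback and the `total_pixels` function are illustrative additions, not part of custom_issue_manager.py:

```python
try:
    from tqdm.auto import tqdm  # notebook widget in Jupyter, text bar elsewhere
except ImportError:
    def tqdm(iterable, **kwargs):
        """No-op stand-in when tqdm is not installed."""
        return iterable


def total_pixels(sizes):
    """Sum width * height over (width, height) pairs, showing a progress bar."""
    total = 0
    for width, height in tqdm(sizes, desc="Scanning images"):
        total += width * height
    return total
```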
23 changes: 19 additions & 4 deletions docs/source/tutorials/huggingface_dataset.ipynb
@@ -44,6 +44,19 @@
"from cleanvision import Imagelab"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {
"nbsphinx": "hidden"
},
"outputs": [],
"source": [
"import warnings\n",
"\n",
"warnings.filterwarnings(\"ignore\")"
]
},
{
"cell_type": "markdown",
"metadata": {},
@@ -60,7 +73,9 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"dataset = load_dataset(\"cats_vs_dogs\", split=\"train\")"
@@ -184,7 +199,7 @@
"metadata": {},
"outputs": [],
"source": [
"imagelab.issues"
"imagelab.issues.head()"
]
},
{
@@ -243,7 +258,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"**For more detailed guide on how to use CleanVision, check the [tutorial notebook](https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/tutorial.ipynb).**"
"**For more detailed guide on how to use CleanVision, check the** [tutorial notebook](tutorial.ipynb)."
]
}
],
@@ -263,7 +278,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.8.5"
"version": "3.11.7"
}
},
"nbformat": 4,
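The new hidden cell calls `warnings.filterwarnings("ignore")`, which silences every warning for the rest of the notebook session. Where only one call is noisy, a scoped alternative is `warnings.catch_warnings()`, which restores the previous filters on exit. A minimal sketch; `noisy_step` is a hypothetical stand-in for a call such as dataset loading:

```python
import warnings


def noisy_step():
    # Hypothetical stand-in for a call that emits a warning.
    warnings.warn("this API is deprecated", DeprecationWarning)
    return 42


def call_without_warnings(fn, *args, **kwargs):
    """Run fn with all warnings ignored, then restore the previous filters."""
    with warnings.catch_warnings():
        warnings.simplefilter("ignore")
        return fn(*args, **kwargs)


result = call_without_warnings(noisy_step)
```

Warnings raised after the `with` block behave normally again, so unrelated problems later in the notebook stay visible.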
11 changes: 7 additions & 4 deletions docs/source/tutorials/torchvision_dataset.ipynb
@@ -70,9 +70,12 @@
"cell_type": "code",
"execution_count": null,
"id": "3d207006",
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"%%capture\n",
"train_set = CIFAR10(root=\"./\", download=True)\n",
"test_set = CIFAR10(root=\"./\", train=False, download=True)"
]
@@ -200,7 +203,7 @@
"metadata": {},
"outputs": [],
"source": [
"imagelab.issues"
"imagelab.issues.head()"
]
},
{
@@ -264,7 +267,7 @@
"id": "75912aea",
"metadata": {},
"source": [
"**For more detailed guide on how to use CleanVision, check the [tutorial notebook](https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/tutorial.ipynb).**"
"**For more detailed guide on how to use CleanVision, check the** [tutorial notebook](tutorial.ipynb)."
]
}
],
@@ -284,7 +287,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.0"
"version": "3.10.0"
}
},
"nbformat": 4,
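`%%capture` is an IPython cell magic that swallows the cell's output, which keeps the CIFAR-10 download messages out of the rendered docs. Outside IPython, a rough equivalent is `contextlib.redirect_stdout` combined with `redirect_stderr`. A minimal sketch; `fake_download` is a hypothetical stand-in for the torchvision download call, and output written directly to file descriptors (for example by C extensions or subprocesses) is not captured this way:

```python
import contextlib
import io
import sys


def fake_download():
    # Hypothetical stand-in for CIFAR10(root="./", download=True).
    print("Downloading cifar-10-python.tar.gz ...")
    print("progress 100%", file=sys.stderr)
    return "dataset"


def call_capturing_output(fn, *args, **kwargs):
    """Run fn and return (result, captured stdout + stderr text)."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer), contextlib.redirect_stderr(buffer):
        result = fn(*args, **kwargs)
    return result, buffer.getvalue()


data, log = call_capturing_output(fake_download)
```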
25 changes: 18 additions & 7 deletions docs/source/tutorials/tutorial.ipynb
@@ -59,6 +59,7 @@
"| 6 | Blurry | Images that are blurry or out of focus | blurry |\n",
"| 7 | Grayscale | Images that are grayscale (lacking color) | grayscale |\n",
"| 8 | Low Information | Images that lack much information (e.g. a completely black image with a few white dots) | low_information |\n",
"| 9 | Odd Size | Images that are abnormally large or small compared to the rest of the dataset | odd_size |\n",
"\n",
"\n",
"The **Issue Key** column specifies the name for each type of issue in CleanVision code. See our examples which use these keys to detect only particular issue types and specify nondefault parameter settings to use when checking for certain issues."
@@ -150,7 +151,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"The main way to interface with your data is via the [Imagelab](https://cleanvision.readthedocs.io/en/latest/cleanvision/imagelab.html#cleanvision.imagelab.Imagelab) class. This class can be used to understand the issues in your dataset at a high level (global overview) and low level (issues and quality scores for each image) as well as additional information about the dataset. It has three main attributes:\n",
"The main way to interface with your data is via the [Imagelab](../cleanvision/imagelab.rst#cleanvision.imagelab.Imagelab) class. This class can be used to understand the issues in your dataset at a high level (global overview) and low level (issues and quality scores for each image) as well as additional information about the dataset. It has three main attributes:\n",
"\n",
"- `Imagelab.issue_summary`\n",
"- `Imagelab.issues`\n",
@@ -645,7 +646,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"You can also create a custom issue type by extending the base class `IssueManager`. CleanVision can then detect your custom issue along with other pre-defined issues in any image dataset! Here's an example of a custom issue manager, which can also be found in the [examples/](https://github.com/cleanlab/cleanvision/blob/main/examples/custom_issue_manager.py) folder of the source code."
"You can also create a custom issue type by extending the base class [IssueManager](../cleanvision/utils/base_issue_manager.rst#cleanvision.utils.base_issue_manager.IssueManager). CleanVision can then detect your custom issue along with other pre-defined issues in any image dataset! Here's an example of a custom issue manager, which can also be found [here](https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/custom_issue_manager.py)"
]
},
{
Expand All @@ -659,7 +660,7 @@
"import numpy as np\n",
"import pandas as pd\n",
"from PIL import Image\n",
"from tqdm import tqdm\n",
"from tqdm.auto import tqdm\n",
"\n",
"from cleanvision.dataset.base_dataset import Dataset\n",
"from cleanvision.issue_managers import register_issue_manager\n",
@@ -778,11 +779,21 @@
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"metadata": {
"tags": []
},
"outputs": [],
"source": [
"issue_types = {issue_name: {}}\n",
"imagelab.find_issues(issue_types)\n",
"imagelab.find_issues(issue_types)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"imagelab.report()"
]
},
@@ -791,7 +802,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"Beyond the collection of image files demonstrated here, you can alternatively run CleanVision on: [Hugging Face datasets](https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/huggingface_dataset.ipynb), [torchvision datasets](https://github.com/cleanlab/cleanvision/blob/main/docs/source/tutorials/torchvision_dataset.ipynb), as well as [files in cloud storage buckets like S3, GCS, or Azure](https://github.com/cleanlab/cleanvision-examples/blob/main/cloud_dataset.ipynb)."
"Beyond the collection of image files demonstrated here, you can alternatively run CleanVision on: [Hugging Face datasets](huggingface_dataset.ipynb), [torchvision datasets](torchvision_dataset.ipynb), as well as [files in cloud storage buckets like S3, GCS, or Azure](https://github.com/cleanlab/cleanvision-examples/blob/main/cloud_dataset.ipynb)."
]
}
],
@@ -811,7 +822,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.11.2"
"version": "3.11.7"
}
},
"nbformat": 4,
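Several tutorial links were changed from absolute GitHub URLs to paths relative to the notebook, so Sphinx and nbsphinx can resolve them inside the built docs. A minimal sketch of that rewrite, assuming the docs live under `docs/source` on the `main` branch; `to_relative_link` is a hypothetical helper, and links outside the repository (such as the cleanvision-examples notebooks) are returned unchanged:

```python
import posixpath

REPO_PREFIX = "https://github.com/cleanlab/cleanvision/blob/main/"


def to_relative_link(url, current_dir="docs/source/tutorials"):
    """Turn a GitHub blob URL for a repo file into a path relative to current_dir."""
    if not url.startswith(REPO_PREFIX):
        return url  # external link: leave as-is
    path, sep, fragment = url[len(REPO_PREFIX):].partition("#")
    return posixpath.relpath(path, current_dir) + sep + fragment


print(to_relative_link(REPO_PREFIX + "docs/source/tutorials/huggingface_dataset.ipynb"))
# -> huggingface_dataset.ipynb
```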