Watermark Remover

This Python script removes watermarks from PDFs using OpenCV.

Usage

Path to your PDF: Set the PDF_PATH global variable or pass the path as a command-line argument.
Area Selection: Draw on the image to create a mask for the watermark.

Median Image Calculation: For each pixel location, the values across all page images are sorted and the middle value is selected. This highlights features that stay constant from page to page, such as the watermark.
Thresholding: Adjust the threshold value to fine-tune the mask.

Mask Adjustment: Use 'd', 'e', or 'r' to dilate, erode, or reset the mask.

Color Range: Set the color range with trackbars to finalize the mask.

Watermark Removal: The script removes the watermark from the images and saves the output as a new PDF file. The area of the removed watermark is filled with the most common color in the image.
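
The median, thresholding, and fill steps above can be illustrated with a minimal NumPy sketch. It is not the code in remover.py: the function names, the use of grayscale pages, and the rule that pixels darker than the threshold belong to the watermark are all assumptions made for the example.

```python
import numpy as np


def median_image(pages):
    """Per-pixel median across a stack of equally sized grayscale pages."""
    stack = np.stack(pages, axis=0)  # shape: (n_pages, height, width)
    return np.median(stack, axis=0).astype(np.uint8)


def threshold_mask(median, threshold):
    """Assumed rule: pixels darker than `threshold` in the median are watermark."""
    return median < threshold  # boolean mask, True where the watermark is


def most_common_value(page):
    """Most frequent gray level on a page, usually the paper background."""
    counts = np.bincount(page.ravel(), minlength=256)
    return int(counts.argmax())


def fill_watermark(page, mask):
    """Return a copy of `page` with masked pixels set to the most common value."""
    cleaned = page.copy()
    cleaned[mask] = most_common_value(page)
    return cleaned


# Three synthetic 4x4 pages: white background (255), a gray watermark (128)
# at the same position on every page, and one dark "text" pixel whose
# position differs from page to page.
pages = []
for i in range(3):
    page = np.full((4, 4), 255, dtype=np.uint8)
    page[1:3, 1:3] = 128  # constant watermark
    page[0, i] = 0        # page-specific content
    pages.append(page)

median = median_image(pages)                   # text pixels vanish, watermark stays
mask = threshold_mask(median, threshold=200)   # True only on the watermark
cleaned = [fill_watermark(p, mask) for p in pages]
```

In this toy input the page-specific pixel appears on only one of three pages, so the median ignores it, and only the watermark region ends up in the mask. Real pages would also need the interactive mask adjustment and color range steps described above.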

Python Version Compatibility

This script is compatible with Python 3.6 and above.

Dependencies

The script requires the following Python libraries:

pdf2image: Used for converting PDF files into images. This library depends on poppler-utils, which is a set of command line tools for working with PDF files. You need to install poppler-utils separately for pdf2image to work. The installation process depends on your operating system.
img2pdf: Used for converting images back into a PDF file.
opencv-contrib-python: A wrapper package for the OpenCV Python bindings along with its extra modules.
tqdm: Used for displaying progress bars in the console.

Installation

Clone the repository

git clone git@github.com:banatibalazs/pdf-watermark-remover.git

cd pdf-watermark-remover

Create a virtual environment and activate it

python -m venv env_name

On Windows:
```
env_name\Scripts\activate
```
On Linux or macOS:
```
source env_name/bin/activate
```

Install the required libraries.

pip install pdf2image img2pdf opencv-contrib-python tqdm

Install poppler-utils
- For Windows: Download a precompiled version of Poppler from this link. Extract the contents of the zip file and add the bin folder to your system's PATH environment variable.
- For Ubuntu/Debian: Install poppler-utils with the following command: sudo apt-get install -y poppler-utils
- For macOS: If you have Homebrew installed, you can install poppler with the following command: brew install poppler
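
To confirm that the installation steps worked, a short check like the following may help. It is a sketch using only the standard library, not part of the repository. The helper names are hypothetical. It assumes the packages are imported as pdf2image, img2pdf, cv2, and tqdm, and that the Poppler tools pdf2image relies on (pdftoppm and pdfinfo) should be on PATH.

```python
import importlib.util
import shutil

# Import names of the pip packages; opencv-contrib-python is imported as cv2.
REQUIRED_MODULES = ["pdf2image", "img2pdf", "cv2", "tqdm"]
# Poppler command-line tools that pdf2image calls.
REQUIRED_TOOLS = ["pdftoppm", "pdfinfo"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


def missing_tools(names=REQUIRED_TOOLS):
    """Return the executables that are not on PATH."""
    return [name for name in names if shutil.which(name) is None]


print("Missing Python modules:", missing_modules() or "none")
print("Missing Poppler tools:", missing_tools() or "none")
```

Run it inside the activated virtual environment. Anything listed as missing points to the installation step that needs repeating.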

Running the Script

python remover.py

The arguments for the script are as follows (all are optional):

pdf_path: The path to the PDF file. Default is input.pdf.
save_path: The path to save the output PDF file. Default is output.pdf.
--dpi: The resolution of the images extracted from the PDF file. Default is 200.
--max_width: The maximum width of the images shown during the mask selection. Default is 1920.
--max_height: The maximum height of the images shown during the mask selection. Default is 1080.
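
The argument list above could be expressed with argparse roughly as follows. This is a sketch that mirrors the documented names and defaults, and the actual parser in remover.py may differ. The preview_scale helper is hypothetical. It assumes that --max_width and --max_height only shrink the preview and keep the aspect ratio.

```python
import argparse


def build_parser():
    """Parser mirroring the documented arguments and defaults."""
    parser = argparse.ArgumentParser(description="Remove watermarks from a PDF.")
    parser.add_argument("pdf_path", nargs="?", default="input.pdf",
                        help="path to the PDF file")
    parser.add_argument("save_path", nargs="?", default="output.pdf",
                        help="path for the output PDF file")
    parser.add_argument("--dpi", type=int, default=200,
                        help="resolution of the images extracted from the PDF")
    parser.add_argument("--max_width", type=int, default=1920,
                        help="maximum preview width during mask selection")
    parser.add_argument("--max_height", type=int, default=1080,
                        help="maximum preview height during mask selection")
    return parser


def preview_scale(width, height, max_width, max_height):
    """Scale factor (never above 1) that fits an image inside the preview bounds."""
    return min(1.0, max_width / width, max_height / height)


# Only the input path and DPI are given; everything else falls back to defaults.
args = build_parser().parse_args(["scan.pdf", "--dpi", "300"])
scale = preview_scale(3840, 1080, args.max_width, args.max_height)
```

Both positional arguments use nargs="?" so they can be left out, which matches the statement that every argument is optional.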

python remover.py input.pdf output.pdf --dpi 300 --max_width 1920 --max_height 1080

Note

Large files or a high DPI setting can slow down reading of the PDF file. If the process is too slow, consider lowering the DPI or using smaller PDFs. This project is for educational purposes and should not be used to infringe copyrights.
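
To see why the DPI setting affects speed, note that the pixel count of a rendered page grows with the square of the DPI. The following sketch estimates the page size in pixels and the uncompressed memory use. The function names, the A4 page size, and 3 bytes per pixel (8-bit RGB) are assumptions made for the example.

```python
def page_pixels(dpi, width_in=8.27, height_in=11.69):
    """Pixel dimensions of a page rendered at `dpi` (defaults: A4 in inches)."""
    return round(width_in * dpi), round(height_in * dpi)


def rgb_bytes(dpi, pages=1):
    """Uncompressed size in bytes of `pages` A4 pages at 3 bytes per pixel."""
    width, height = page_pixels(dpi)
    return width * height * 3 * pages


for dpi in (100, 200, 300):
    width, height = page_pixels(dpi)
    megabytes = rgb_bytes(dpi) / 1e6
    print(f"{dpi} dpi: {width} x {height} px, about {megabytes:.1f} MB per page")

# Going from 200 to 300 DPI multiplies the data per page by roughly 2.25.
ratio = rgb_bytes(300) / rgb_bytes(200)
```

Multiplying the per-page figure by the number of pages gives a rough idea of how much image data the script has to hold and process.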
