This lesson teaches learners with basic Python knowledge the tools and libraries used for web scraping, that is, extracting data from websites. It consists of three episodes.
Episode 1 begins with an introduction to how websites are structured using HTML. You’ll learn how to explore this structure using your browser and how to extract information from it using the BeautifulSoup package.
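As a small preview of Episode 1, the sketch below shows how BeautifulSoup can pull text out of HTML elements. The HTML snippet and the `episode` class name are made up for illustration, not taken from the lesson:

```python
from bs4 import BeautifulSoup

# A made-up HTML snippet standing in for a real webpage.
html = """
<html>
  <body>
    <h1>Workshop schedule</h1>
    <ul>
      <li class="episode">Introduction to HTML</li>
      <li class="episode">Requests and parsing</li>
    </ul>
  </body>
</html>
"""

soup = BeautifulSoup(html, "html.parser")

# find() returns the first matching tag; get_text() extracts its text.
title = soup.find("h1").get_text()

# find_all() returns every matching tag, here filtered by CSS class.
episodes = [li.get_text() for li in soup.find_all("li", class_="episode")]

print(title)     # Workshop schedule
print(episodes)  # ['Introduction to HTML', 'Requests and parsing']
```

The same `find`/`find_all` pattern works on the HTML of any real page once you have retrieved it.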
In Episode 2, you’ll learn how to retrieve the HTML of a webpage using the requests package and continue practicing how to parse and extract specific content with BeautifulSoup.
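The Episode 2 workflow of fetching a page and then parsing it can be sketched roughly as follows. The URL here is just a stand-in (a stable public test page), not a page used in the lesson:

```python
import requests
from bs4 import BeautifulSoup

# Stand-in URL; any static webpage works the same way.
url = "https://example.com"

# Download the page; timeout avoids hanging on an unresponsive server.
response = requests.get(url, timeout=10)
response.raise_for_status()  # raise an error for 4xx/5xx responses

# Hand the HTML text to BeautifulSoup for parsing.
soup = BeautifulSoup(response.text, "html.parser")
print(soup.title.get_text())  # the page's <title> text
```

Calling `raise_for_status()` right after the request is a simple way to catch failed downloads before you try to parse anything.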
Toward the end of the workshop, in Episode 3, you'll explore the difference between static and dynamic webpages and learn how to scrape dynamic content using Selenium.
This workshop is intended for learners who already have a basic understanding of Python. In particular, you should be comfortable with:
- Installing and importing packages and modules
- Using lists and dictionaries
- Using conditional statements (if, elif, else)
- Using for loops
- Calling functions and understanding parameters, arguments, and return values
The rendered version of the lesson is available at: https://ucsbcarpentry.github.io/web-scraping-python/
We'd love to know if you are teaching this lesson and the suggestions you have for improving it!
You can do this by submitting an issue in this repo, or sending an email to dreamlab@library.ucsb.edu or jose_nino@ucsb.edu.
If you want to know more about contributing to this lesson and other Carpentries efforts, please read the CONTRIBUTING guide.
Current maintainer of this lesson: Jose Niño Muriel
Thanks to Noah Spahn, Ronald Lencevičius, and Seth Erickson for their feedback when this workshop was first taught at UCSB.
Please cite this lesson using the information in the CITATION.CFF file when you refer to it in publications, and/or if you re-use, adapt, or expand on the content in your own training material.
This instructional content is made available for use and adaptation under the Creative Commons Attribution license (CC BY 4.0). Review the LICENSE.md file for additional information.