UCSBCarpentry/web-scraping-python

Web Scraping with Python

This lesson introduces learners with basic Python knowledge to the tools and libraries used for web scraping, that is, extracting data from websites. It has three episodes.

Episode 1 begins with an introduction to how websites are structured using HTML. You’ll learn how to explore this structure using your browser and how to extract information from it using the BeautifulSoup package.
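
As a rough illustration of the kind of extraction Episode 1 covers, the sketch below parses a small HTML snippet with BeautifulSoup. The HTML, the `workshop` class name, and the variable names are made up for this example and are not taken from the lesson itself.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML snippet, invented for illustration only.
html = """
<html>
  <body>
    <h1>Workshops</h1>
    <ul>
      <li class="workshop"><a href="/python">Intro to Python</a></li>
      <li class="workshop"><a href="/scraping">Web Scraping</a></li>
    </ul>
  </body>
</html>
"""

# Parse the document with Python's built-in HTML parser.
soup = BeautifulSoup(html, "html.parser")

# Find the first <h1> element and read its text.
heading = soup.find("h1").get_text()

# Collect the link text and target of every list item with class "workshop".
links = []
for item in soup.find_all("li", class_="workshop"):
    anchor = item.find("a")
    links.append((anchor.get_text(), anchor["href"]))

print(heading)
print(links)
```

Here `find` returns the first matching element and `find_all` returns every match, which is usually enough to navigate a page once you have inspected its structure in the browser.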

In Episode 2, you’ll learn how to retrieve the HTML of a webpage using the requests package and continue practicing how to parse and extract specific content with BeautifulSoup.
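
A minimal sketch of the retrieve-then-parse workflow described for Episode 2 might look like the following. The helper names `fetch_html` and `extract_title` and the 10-second timeout are assumptions chosen for this example, not part of the lesson.

```python
import requests
from bs4 import BeautifulSoup


def fetch_html(url, timeout=10):
    """Download a page and return its HTML as text (hypothetical helper)."""
    response = requests.get(url, timeout=timeout)
    # Raise an exception for 4xx/5xx responses instead of parsing an error page.
    response.raise_for_status()
    return response.text


def extract_title(html):
    """Return the text of the <title> element, or None if the page has none."""
    soup = BeautifulSoup(html, "html.parser")
    return soup.title.get_text(strip=True) if soup.title else None
```

With these in place, something like `extract_title(fetch_html("https://ucsbcarpentry.github.io/web-scraping-python/"))` would fetch the rendered lesson and return its page title. Keeping the download separate from the parsing makes it easy to test the parsing step on saved HTML without hitting the network.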

Toward the end of the workshop, in Episode 3, we’ll explore the difference between static and dynamic webpages, and how to scrape dynamic content using Selenium.

This workshop is intended for learners who already have a basic understanding of Python. In particular, you should be comfortable with:

  • Installing and importing packages and modules
  • Using lists and dictionaries
  • Using conditional statements (if, else, elif)
  • Using for loops
  • Calling functions and understanding parameters, arguments, and return values

The rendered version of the lesson is available at: https://ucsbcarpentry.github.io/web-scraping-python/

Teaching and contributing

We'd love to know if you are teaching this lesson and to hear any suggestions you have for improving it!

You can do this by submitting an issue in this repo, or sending an email to dreamlab@library.ucsb.edu or jose_nino@ucsb.edu.

If you want to know more about contributing to this lesson and other Carpentries efforts, please read the CONTRIBUTING guide.

Maintainer

Current maintainer of this lesson:

  • Jose Niño Muriel

Acknowledgements

Thanks to Noah Spahn, Ronald Lencevičius, and Seth Erickson for their feedback the first time this workshop was taught at UCSB.

Citation

Please cite this lesson using the information in the CITATION.CFF file when you refer to it in publications, and/or if you re-use, adapt, or expand on the content in your own training material.

License

This instructional content is available for use and adaptation under the Creative Commons Attribution license (CC BY 4.0). See the LICENSE.md file for additional information.
