layout
course-single

{{ site.description }}

My goal is for you, upon completing this course, to be able to:

  1. Understand methods for encoding and manipulating text, images, and sound.
  2. Use computational tools to visualize and summarize information in text documents.
  3. Calculate similarity metrics for documents in a corpus.
  4. Cluster similar documents with hierarchical algorithms.
  5. Learn topic models for documents and derive semantic closeness of words.
  6. Generate and understand sequence information found in text.
  7. Develop models for sentiment and structure analysis.
  8. Employ computational methods for generating music and art.
  9. Pursue independent explorations of advanced topics.

{% include resources.html content=site.resources %}

We will not use a textbook; instead, we will rely on supplemental material such as relevant web pages. Readings will be assigned before the material is covered in class. You are expected to review the material and come to class prepared. As readings are assigned, they will be posted here.

| Date | Reading |
|:--:|-----|
| Jan 21 | Unicode |
| | UTF-8 |
| | Python Encodings |
| | The Great Newline Schism |
| | Texting in Ancient Mayan Hieroglyphs |
| Jan 23 | Pandas |
| | plotnine |
| | A Layered Grammar of Graphics |
| | english2.txt |
| Jan 28 | Zipf's Law |
| Jan 30 | Understanding the Nuances of Typography Classification |
| | Typeface |
| | Wordle |
| | How the Word Cloud Generator Works |
| | Stop Word List |
| Feb 4 | tf-idf |
| | Why does tf-idf use a log? |
| Feb 6 | Cosine Similarity |
| | UPGMA |
| Feb 11 | Stemming Online |
| | Porter Stemmer |
| | pyporter2 |
| | WordNet Online |
| Feb 20 | Word Embeddings |
| | Word2Vec |
| | Word2Viz |
| | GloVe |
| | Gensim Word2Vec |
| Feb 25 | Beta Distribution Calculator |
| | Multinomial Distribution |
| | Visualizing Dirichlet Distributions |
| | Latent Dirichlet Allocation |
| | Probabilistic Topic Modeling |
| | LDA Details |
| | Gensim Topic Models |
| Feb 27 | Hu and Liu Opinion Lexicon |
| | AFINN Online |
| | VADER Explained |
| | NLTK Sentiment Library |
| | NLTK Sentiment How To |
| Mar 3 | Why you should care about generative text |
| | NaNoGenMo |
| | Botnik Predictive Writer |
| | Harry Potter and the Portrait of what Looked Like a Large Pile of Ash |
| Mar 10 | Huffman Encoding |
| | Color Depth, RGB Color Model |
| | BMP, GIF, PNG |
| Mar 12 | LZ77 Data Compression |
| | LZSS Improvements |
| | Color Model Translator |
| | JPEG Format |
| | YCbCr |

When we write code together in class, it will be posted here!

| Date | Topic | Code |
|:----:|------|-----|
| T Jan 21 | Character Text Encodings | UTF-8 |
| R Jan 23 | Data Visualization Day 1 | Summary Statistics |
| T Jan 28 | Data Visualization Day 2 | Summary Statistics Day 2 |
| R Jan 30 | Fonts and Word Clouds | Word Clouds |
| R Feb 20 | Stemming and Lemmatizing | Stemming and Lemmatizing |
| R Feb 20 | Word2Vec | Word2Vec and Gensim |
| T Feb 25 | Latent Dirichlet Allocation | LDA Example |
| R Feb 27 | Sentiment Analysis | Sentiment Analysis |
| R Mar 5 | Markov Chains | Markov Chain In Class |
| R Apr 16 | Epidemiology | SIR Compartment Modeling |


# Coursework

Each student has four late days to spend throughout the semester as they wish. Simply inform the instructor any time prior to the due date for an assignment that you wish to use a late day; you may then turn in the assignment up to 24 hours late. Multiple late days may be used on the same assignment. There are no partial late days; turning in an assignment 2 hours late or 20 hours late will both use 1 late day. Note that late days are intended to cover both normal circumstances (you simply want more time to work on the assignment) and exceptional circumstances (you get sick, travel for a game or family obligation, etc.). After you have used up your late days, late assignments would ordinarily receive at most half credit, but you will be given more late days, because this is a crazy semester. All work must be completed the day before final grades are due, and you must be in communication with me when assignments are late.
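The late-day arithmetic above can be sketched in a few lines of Python. This is only an illustration of the rounding rule, not an official calculator: the function name `late_days_used` is hypothetical, and it assumes that a submission exactly 24 hours late still counts as one late day.

```python
import math


def late_days_used(hours_late: float) -> int:
    """Return the number of whole late days a submission consumes.

    Assumption (not an official rule): lateness is measured in hours
    past the deadline and always rounded up to a full 24-hour day.
    """
    if hours_late <= 0:
        return 0  # on time or early: no late days spent
    return math.ceil(hours_late / 24)


# Both examples from the policy cost a single late day.
print(late_days_used(2))   # 2 hours late
print(late_days_used(20))  # 20 hours late
print(late_days_used(30))  # more than 24 hours late costs a second day
```

Rounding up with `math.ceil` is what makes "2 hours late" and "20 hours late" cost the same.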

Labs: 500 points

| # | Name | Assigned | Due |
|:--:|-----|:--------:|:---:|
| 0 | Student Survey | Jan 17 | Jan 21 |
| 1 | Creating a Corpus | Jan 21 | Jan 23 |
| 2 | Summary Statistics | Jan 23 | Jan 31 |
| 3 | Clouds and Drawings | Jan 30 | Feb 7 |
| 4 | Document Clustering | Feb 6 | Feb 14 |
| 5 | Topic Modeling | Feb 25 | Mar 4 |
| 6 | Sentiment Analysis | Mar 3 | Mar 7 |
| 7 | Detecting and Generating Language | Mar 6 | Mar 13 |
| 8 | Deep Dreams | Apr 2 | Apr 7 |
| 9 | Lullabies | Apr 9 | Apr 14 |

Much of your experience with the techniques of computational humanities in this course will be through weekly labs. Each lab will be assigned in class with some time allotted to work through the materials, and will be due in approximately one week. All labs are weighted equally within the lab portion of your final grade.

You may work with a partner on the lab assignments. Your partner's name must be listed on any code you hand in as joint work. A partnership should turn in only a single copy of the assignment. If students working as partners wish to turn in a lab late, both students must use a late day.

Class corpus

Many of our labs will be using the corpus below that we collected in Lab 1.

Project: 210 points

| # | Name | Points | Assigned | Due |
|:--:|-----|:------:|:--------:|:---:|
| 1 | Final Project | 210 | Mar 19 | Apr 9 |

You will have a final project in this course. Further details on the grading standards and hand-in instructions for the project will be given when it is assigned.

Exams: 250 points

| # | Topics | Points | Date |
|:--:|-----|:------:|:----:|
| 1 | text encoding, data processing, fonts, word clouds, and clustering | 100 | Feb 18 |
| 2 | stemming, lemmatizing, topic modeling, sentiment analysis, and Markov models | 150 | Mar 17 |

There will be two in-class exams, the first worth 100 points and the second worth 150 points of your final grade. They will consist of short answer questions along with writing and debugging code. There is no final exam; you will complete a final project instead, as described above under Project.

Twice during the semester, you are expected to make an office hours appointment and check in with me about the course. Each check-in meeting will count for 20 points. This will be a conversation about your current progress and understanding, with feedback. These meetings should be scheduled during the weeks shown on the course calendar.

| Score | Grade |
|:-----:|:-----:|
| 750-850 | A |
| 650-749 | B |
| 550-649 | C |
| 450-549 | D |
| 0-449 | F |
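
As a rough illustration, the grade scale above can be written as a lookup in Python. The names `GRADE_CUTOFFS` and `letter_grade` are hypothetical, and two behaviors are assumptions rather than stated policy: any score at or above 750 maps to an A, and a fractional score falls into the band whose lower cutoff it has reached.

```python
# Lower cutoff for each letter grade, taken from the scale above,
# listed from highest to lowest so the first match wins.
GRADE_CUTOFFS = [(750, "A"), (650, "B"), (550, "C"), (450, "D")]


def letter_grade(score: float) -> str:
    """Map a point total to a letter grade.

    Assumption: scores are compared against the lower cutoff of each
    band only; anything below 450 is an F.
    """
    for cutoff, letter in GRADE_CUTOFFS:
        if score >= cutoff:
            return letter
    return "F"


print(letter_grade(700))  # falls in the 650-749 band
```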