Releases: DocNow/twarc

v0.5.1

15 Jan 21:17

I've been seeing intermittent 500 errors from the Twitter API search endpoint. This small update catches them and backs off before retrying; if the errors persist, it eventually logs the error and gives up.
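
The catch, back off, and give up behaviour can be sketched roughly as follows. This is a minimal illustration, not twarc's actual code: the helper name `call_with_backoff`, the doubling delay schedule, and the default of five tries are all assumptions made for the example.

```python
import logging
import time


def call_with_backoff(request, max_tries=5, base_delay=1.0, sleep=time.sleep):
    """Call request() until it returns a non-5xx status or tries run out.

    `request` is a hypothetical callable returning (status_code, body).
    The doubling delay is an assumed schedule, not twarc's documented one.
    """
    delay = base_delay
    for attempt in range(1, max_tries + 1):
        status, body = request()
        if status < 500:
            return status, body
        if attempt < max_tries:
            # Server error: wait, then retry with a longer delay.
            sleep(delay)
            delay *= 2
    # Every try failed: log the error and give up.
    logging.error("giving up after %d tries, last status %d", max_tries, status)
    raise RuntimeError("server kept returning %d" % status)
```

Passing `sleep` as a parameter keeps the sketch easy to exercise without actually waiting.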

v0.5.0

02 Dec 17:06

The --stream option has been split into --track, --follow and
--locations to better match Twitter's filter stream API.

Similarly the twarc.stream function has been renamed to twarc.filter
and it now takes three parameters: track, follow and locations.
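
As a rough illustration of how the three command line options line up with the three parameters, here is a small argparse sketch. The function `parse_filter_args` and the program name are hypothetical and are not taken from twarc's own option parser; only the option names and the parameter names come from the notes above.

```python
import argparse


def parse_filter_args(argv):
    """Map --track/--follow/--locations onto keyword arguments.

    Hypothetical helper for illustration; twarc's real parser may differ.
    """
    parser = argparse.ArgumentParser(prog="twarc-sketch")
    parser.add_argument("--track", help="keywords to track")
    parser.add_argument("--follow", help="user ids to follow")
    parser.add_argument("--locations", help="bounding boxes to match")
    args = parser.parse_args(argv)
    # Keep only the options that were actually supplied.
    return {name: value for name, value in vars(args).items() if value is not None}
```

The resulting dictionary uses the same names as the parameters described above, so under that assumption it could be passed along as keyword arguments (track, follow, locations).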

v0.4.0

09 Nov 17:16

Added --warnings flag to log warnings from the Twitter API about dropped tweets during streaming.

v0.3.4

07 Oct 17:18

In this release the utils/archive.py script has been renamed to utils/twarc-archive.py, and pip install will now make it available on the command line just like twarc.py. See #80 for context.

v0.3.3

03 Aug 10:29

Now handles the unexpected 404 responses that have been observed from the Twitter API.

v0.3.1

03 Jul 21:13
  • handle connection reset errors during hydrate
  • updated utils/archive.py to use config file

v0.3.0

10 Jun 06:23

New functionality for managing keys in a config file .twarc. You can also have multiple sets of credentials in your config which can be used with the --profile command line option.
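
The idea of several named credential sets can be sketched with Python's configparser. The layout of the real .twarc file is not described in these notes, so the INI layout below (one section per profile), the section names, the placeholder values, and the helper `load_profile` are all assumptions made purely for illustration.

```python
import configparser

# Assumed layout: one INI section per profile, holding the four OAuth values.
SAMPLE_CONFIG = """\
[main]
consumer_key = ck-main
consumer_secret = cs-main
access_token = at-main
access_token_secret = ats-main

[research]
consumer_key = ck-research
consumer_secret = cs-research
access_token = at-research
access_token_secret = ats-research
"""


def load_profile(text, profile):
    """Return the credentials stored under the named profile section."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    if not parser.has_section(profile):
        raise KeyError("no such profile: %s" % profile)
    return dict(parser[profile])
```

Under this assumed layout, choosing a profile on the command line would amount to picking which section to read.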

v0.2.7

06 May 15:50
  • handle connection reset errors, which are now occurring during search
  • added Zenodo integration for citing twarc by DOI
  • minor changes to utilities for Python 3

v0.2.2

17 Feb 01:12
  • Python3 support
  • now accepts twitter credentials on the command line

v0.2.0

30 Jan 03:38

v0.2.0 of twarc includes big changes to both the command line API and the programmatic API. You now invoke twarc from the command line using one of three modes:

  • search: twarc.py --search ferguson > tweets.json
  • stream: twarc.py --stream ferguson > tweets.json
  • hydrate: twarc.py --hydrate ids.txt > tweets.json

Notice that twarc no longer decides what filename to use, and no longer attempts to pick up where it left off by reading the last tweet id from a previous file. The reason is that this functionality predated the ability to stream directly. twarc.py now just writes line-oriented JSON to stdout, which you can send wherever you want, including compressing it:

twarc.py --search ferguson | gzip - > tweets.json.gz
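
A compressed file produced this way can be read back one tweet at a time. The following is a minimal sketch using only the Python standard library; the function name `read_tweets` is hypothetical and not part of twarc, and it assumes the file holds one JSON object per line.

```python
import gzip
import json


def read_tweets(path):
    """Yield one parsed tweet per non-empty line of a gzipped JSON-lines file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)
```

Because the function is a generator, a large archive can be processed without loading every tweet into memory at once.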

The three command line modes map directly onto the programmatic usage. You first create a Twarc instance and then call its search, stream and hydrate methods:

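from twarc import Twarc

t = Twarc()

# search for existing tweets that match a query
for tweet in t.search('ferguson'):
    print(tweet)

# stream matching tweets as they are posted
for tweet in t.stream('ferguson'):
    print(tweet)

# hydrate full tweets from a file of tweet ids
for tweet in t.hydrate(open('ids.txt')):
    print(tweet)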

The nice thing about these changes is that they have consolidated and simplified the rate limiting logic, and have removed about 1/3 of the code base. Please give it a try and let us know how it goes!