Information Tracer API Python Library

This GitHub repo provides Python scripts to interact with the Information Tracer API. Information Tracer is a system that collects social media posts and generates intelligence.

Pre-requisite

  • Python 3
  • You must have a valid token. If not, contact us to get a token.

Overview of our different API endpoints

  1. Use the Submit API to submit a query and get a unique identifier called id_hash256
  2. Use the Status API to check the status of a running query, based on id_hash256
  3. Use the Download endpoint to get the result of a query, based on id_hash256

Quick Start

  1. Clone this repository
  2. pip install requests pandas
  3. Update parameters in example.py, including query, start_date, end_date, token
  4. Run informationtracer_token=XXX python example.py (or add informationtracer_token to your .bash_profile); see the sketch below for reading the token in Python
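
For reference, the token set in step 4 can be read back inside your own script. This is only a minimal sketch, assuming that the informationtracer_token environment variable is set as described above:

import os

# Read the API token from the environment variable set in step 4.
token = os.environ.get('informationtracer_token')
if token is None:
    raise RuntimeError('Please set the informationtracer_token environment variable')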

How to build a search query

Rule 1: AND, OR, NOT must be in all caps. Otherwise they are treated as normal English words.

Rule 2: Use parentheses to group multiple words with AND. For example, (Word1 AND Word2).

Rule 3: The query limit is 512 characters. Sending a query above the limit might return empty results.

Example: (Ukraine AND NATO) OR (Ukraine AND EU)
Meaning: Any posts that contain "Ukraine" and "NATO", or "Ukraine" and "EU".

Example: (Ukraine AND NATO) NOT Putin
Meaning: Any posts that contain "Ukraine" and "NATO", without the word "Putin".

Example: from:elonmusk
Meaning: Collect tweets created by user @elonmusk.

Example: from:elonmusk Tesla
Meaning: Collect tweets created by user @elonmusk that contain the word "Tesla".
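
As an illustration, a query string can be assembled and length-checked in Python before submission; this is only a sketch of Rule 3, not part of the API:

# Sketch: build a boolean query and guard against the 512-character limit (Rule 3).
query = '(Ukraine AND NATO) OR (Ukraine AND EU)'
if len(query) > 512:
    raise ValueError('Query exceeds 512 characters and might return empty results')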

Details about Submit API

Input: query, token, start_date, end_date

Optional input: twitter_sort_by: 'time' or 'engagement' (default is 'engagement' if not specified)

  • set twitter_sort_by to 'time' to collect tweets in reverse chronological order (newest to oldest)
  • set twitter_sort_by to 'engagement' to collect tweets in descending like_count order

Output: id_hash256 (a unique string identifier for this search)

Example:

import requests
SUBMIT_URL = 'https://informationtracer.com/submit'

query = 'nvidia AND stock'
token = 'YOUR_TOKEN'
start_date = '2023-11-03'
end_date = '2023-11-08'

response = requests.post(SUBMIT_URL,
                         timeout=10,
                         json={'query': query,
                               'token': token,
                               'start_date': start_date,
                               'end_date': end_date,
                               'twitter_sort_by': 'engagement'})
if 'id_hash256' in response.json():
    id_hash256 = response.json()['id_hash256']
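
If the response does not contain id_hash256, the submission did not succeed. The error format is not documented here, so a minimal defensive sketch (continuing from the example above) simply surfaces the raw payload:

result = response.json()
if 'id_hash256' not in result:
    # The error fields are not documented; print the whole payload for debugging.
    print('Submission failed:', result)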

Details about Status API

Input: id_hash256, token

Output: json (details below)

Example:

import requests
STATUS_URL = 'https://informationtracer.com/status'

url = "{}?token={}&id_hash256={}".format(STATUS_URL, token, id_hash256)
results = requests.get(url).json()

Format of output: Because each collection can take 30-60 seconds, we provide a field called tweet_preview so that partial results reach users as soon as possible. This field is initially empty. Once the system has collected 10 tweets, tweet_preview will contain a list of dictionaries. Please check the Result API v1 details for a detailed explanation of each key-value pair (d, i, l, ...).

{'status': 'started', 
 'status_percentage': '10', 
 'status_text': 'Collecting cross-platform posts...', 
 'tweet_preview': [{'d': '@Apple Unless you buy a MacBook circa 2010',
                    'i': 0, 
                    'l': 'https://twitter.com/heathdollars/status/1721998289388896312', 
                    'n': 'heathdollars', 
                    'p': 'https://pbs.twimg.com/profile_images/1641987731181142018/tECQ8Xy1_normal.jpg', 
                    't': '2023-11-07T21:09:20', 
                    'u_d': 'join your union\n\nhttps://t.co/4sxV02E2aI', 
                    'u_id': '1000720137106866176', 
                    'u_t': '2018-05-27T12:47:27'
                    }, 
                    {...}, 
                    {...}, 
                    ...
                   ]
}
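
Because a collection takes 30-60 seconds, callers typically poll the Status API until the query finishes. The exact completion signal is not documented here, so the sketch below assumes the query is done once status_percentage reaches 100; treat that condition as an assumption rather than the official contract.

import time
import requests

STATUS_URL = 'https://informationtracer.com/status'

# Poll every 5 seconds; assumes completion when status_percentage reaches 100.
while True:
    url = '{}?token={}&id_hash256={}'.format(STATUS_URL, token, id_hash256)
    status = requests.get(url, timeout=10).json()
    if int(status.get('status_percentage', 0)) >= 100:
        break
    time.sleep(5)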

Result API (new, by platform)

Input (required): source, id_hash256, token

source is the data source, which can be 'twitter', 'youtube', 'reddit', or 'all'

Output: a pandas dataframe, which can be converted to csv, json, etc.

  • The columns should be self-explanatory
  • Note that some columns (those with the prefix country_, sentiment_, or account_type_) are only available to premium users.

Example:

import pandas as pd

source = 'twitter'  # one of 'twitter', 'youtube', 'reddit', 'all'
url = 'https://informationtracer.com/download?source={}&type=csv&id={}&token={}'.format(source, id_hash256, token)
df = pd.read_csv(url)
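
Continuing from the example above, the dataframe can then be written out locally; the file names here are arbitrary placeholders:

# Save the results locally; the file names are arbitrary.
df.to_csv('results_{}.csv'.format(id_hash256), index=False)
df.to_json('results_{}.json'.format(id_hash256), orient='records')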

Result API (v1, deprecated)

Result API v1 details.

Web Interface

  • To help people visualize the information, we provide a web interface available at https://informationtracer.com.
  • To visualize a query you searched recently, you can visit https://informationtracer.com/?result={id_hash256}.
  • Login is required. Please contact us and we will help you register an account.

Contact / Bug Report

For bug reports or any inquiries, please contact Zhouhan Chen at zhouhan@safelink.network
