Commit

fetch.py: address tz-related error during pd.concat
See issue #11.

This explicit index construction will create a
tz-aware DatetimeIndex even if the data is empty:


>>> pd.DatetimeIndex(data=[], tz="UTC")
DatetimeIndex([], dtype='datetime64[ns, UTC]', freq=None)
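The empty-data case described above can be sketched as follows. This is a minimal illustration, not the code from fetch.py: `make_frame` and the `views_total` column are hypothetical names, and the exact error raised in issue #11 is not reproduced here. It only shows that an empty list passed as the index gives pandas nothing to infer a datetime dtype from, while the explicit construction keeps the index tz-aware so that it matches the non-empty frames it is later concatenated with.

```python
import pandas as pd


def make_frame(timestamps, counts, explicit_index):
    """Build a one-column frame (hypothetical helper, for illustration only).

    With explicit_index=False the raw list is passed as the index (as before
    this commit); with explicit_index=True a tz-aware DatetimeIndex is built.
    """
    if explicit_index:
        index = pd.DatetimeIndex(data=timestamps, tz="UTC")
    else:
        index = timestamps
    return pd.DataFrame(data={"views_total": counts}, index=index)


# No samples at all, e.g. a repository without any recorded traffic.
implicit_empty = make_frame([], [], explicit_index=False)
explicit_empty = make_frame([], [], explicit_index=True)

# The implicit index is not a DatetimeIndex and carries no timezone.
print(type(implicit_empty.index).__name__, implicit_empty.index.dtype)
# The explicit index is a tz-aware DatetimeIndex despite being empty.
print(type(explicit_empty.index).__name__, explicit_empty.index.dtype)

# Concatenating the explicit empty frame with a non-empty tz-aware frame
# keeps a tz-aware index on the result.
non_empty = make_frame([pd.Timestamp("2021-04-13")], [5], explicit_index=True)
combined = pd.concat([explicit_empty, non_empty])
print(combined.index.dtype)
```

Depending on the pandas version, the concatenation may emit a FutureWarning about empty entries; the index of the result stays tz-aware either way.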
jgehrcke committed Apr 14, 2021
1 parent fd52811 commit 55bc993
Showing 1 changed file with 7 additions and 6 deletions.
13 changes: 7 additions & 6 deletions fetch.py
@@ -234,21 +234,22 @@ def clones_or_views_to_df(items, metric):
     series_count_total = []
     series_count_unique = []
     series_timestamps = []

     for sample in items:
-        # GitHub API docs say
-        # "Timestamps are aligned to UTC"
-        # `sample.timestamp` is a naive datetime object. Make it tz-aware.
-        ts_aware = pytz.timezone("UTC").localize(sample.timestamp)
-        series_timestamps.append(ts_aware)
+        # GitHub API docs say "Timestamps are aligned to UTC".
+        # `sample.timestamp` is a tz-naive datetime object.
+        series_timestamps.append(sample.timestamp)
         series_count_total.append(int(sample.count))
         series_count_unique.append(int(sample.uniques))
+
+    # Attach timezone information to `pd.DatetimeIndex` (make this index
+    # tz-aware, leave actual numbers intact).
     df = pd.DataFrame(
         data={
             f"{metric}_total": series_count_total,
             f"{metric}_unique": series_count_unique,
         },
-        index=series_timestamps,
+        index=pd.DatetimeIndex(data=series_timestamps, tz="UTC"),
     )
     df.index.name = "time_iso8601"

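For non-empty data, the change above should be behavior-preserving: localizing each sample individually and attaching the timezone once at the index level describe the same instants. The sketch below checks this under stated assumptions: the sample timestamps are made up, and the per-sample step uses the standard library's `datetime.timezone.utc` as a stand-in for the `pytz` localization in the removed lines.

```python
from datetime import datetime, timezone

import pandas as pd

# Made-up tz-naive timestamps standing in for `sample.timestamp` values.
naive = [datetime(2021, 4, 12), datetime(2021, 4, 13)]

# Per-sample approach (removed lines): make each timestamp tz-aware first.
# Stand-in: stdlib timezone.utc instead of pytz.timezone("UTC").localize().
per_sample = pd.Index([ts.replace(tzinfo=timezone.utc) for ts in naive])

# Index-level approach (added lines): keep the naive values and attach the
# timezone when the DatetimeIndex is constructed.
index_level = pd.DatetimeIndex(data=naive, tz="UTC")

# Both indexes hold the same instants; wall-clock values are not shifted.
print(list(per_sample) == list(index_level))
print(index_level[0].isoformat())
```

The index-level form has the additional property that the timezone no longer depends on there being at least one sample to infer it from.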