Problem with MultiIndex upsampling / resampling #28313
Comments
So the core request is allowing upsampling in with a MultiIndex and
And probably the rest of the post can be trimmed / removed (things like workarounds using groupby).
It's likely not on anyone's critical path. We can keep this issue open, though, in case someone interested in fixing it comes along.
Thanks for the quick reply. I realize the post was a bit long, but it was important to explain the reason behind the issue. It took me a long time to prepare the post so the problem was easy to understand, and it would take me even more time to "trim it down". Considering you don't think it's a high-priority issue, I would prefer to spend my time on more important and productive things. Besides, the post contains all the information you ask for in a clearly laid-out manner, if you take 5 minutes to read it. Small comment: You have 400 open issues related to
OK. Was
A fair summary?
Well ... Allowing upsampling of a
The following is the workaround I will probably use. It is a wrapper function for upsampling either a Pandas
We can now upsample a
Output:
We can also upsample a
Output:
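A minimal sketch of such a wrapper, assuming a DataFrame indexed either by a plain DatetimeIndex or by a MultiIndex whose outer level is named "Ticker". The function name upsample, its parameters and the sample data are hypothetical; this is not the commenter's original code. The resampling method is passed as a string and looked up with getattr, so one wrapper covers ffill, bfill, asfreq, interpolate and so on.

```python
import pandas as pd


def upsample(df, rule="D", method="ffill", ticker_level="Ticker"):
    """Hypothetical wrapper: upsample a DataFrame indexed either by a
    DatetimeIndex or by a (Ticker, date) MultiIndex.

    `method` is the name of a Resampler method, e.g. "ffill", "bfill",
    "asfreq" or "interpolate"; it is looked up with getattr.
    """
    def _resample(frame):
        return getattr(frame.resample(rule), method)()

    if isinstance(df.index, pd.MultiIndex):
        # Resample each ticker on its own DatetimeIndex; group_keys=True
        # puts the ticker back as the outer level of the result's index.
        return df.groupby(level=ticker_level, group_keys=True).apply(
            lambda group: _resample(group.droplevel(ticker_level))
        )
    # Plain DatetimeIndex: resample directly.
    return _resample(df)


# Hypothetical annual sample data with placeholder values.
idx = pd.MultiIndex.from_tuples(
    [
        ("AAA", pd.Timestamp("2017-12-31")),
        ("AAA", pd.Timestamp("2018-12-31")),
        ("BBB", pd.Timestamp("2017-06-30")),
        ("BBB", pd.Timestamp("2018-06-30")),
    ],
    names=["Ticker", "Report Date"],
)
df = pd.DataFrame({"Value": [1.0, 2.0, 10.0, 20.0]}, index=idx)

daily_all = upsample(df)                    # MultiIndex input
daily_one = upsample(df.loc["BBB"])         # DatetimeIndex input
daily_bfill = upsample(df, method="bfill")  # a different fill method
```

Dispatching on the index type keeps the calling code identical for a single ticker and for all tickers.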
Thanks for the workaround! However, a native solution would be great. The same problem exists for asfreq(), etc.
The solution I am using for now, and which might be of use to others, is to have my own helper-function:
It is then fairly simple to make functions that work for both
I have a bunch of these for
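A minimal sketch of a helper in that spirit, assuming the same (Ticker, date) MultiIndex layout; apply_per_ticker, to_daily and the sample data are hypothetical, not the commenter's code. The helper takes any function that works on a DataFrame with a plain DatetimeIndex and applies it either directly or once per ticker, which also covers the asfreq() case mentioned above.

```python
import pandas as pd


def apply_per_ticker(df, func, ticker_level="Ticker"):
    """Hypothetical helper: call func on DatetimeIndex-ed data, either
    directly or separately for each ticker of a MultiIndex DataFrame."""
    if isinstance(df.index, pd.MultiIndex):
        # Drop the ticker level inside each group; group_keys=True adds it back.
        return df.groupby(level=ticker_level, group_keys=True).apply(
            lambda group: func(group.droplevel(ticker_level))
        )
    return func(df)


def to_daily(frame):
    # DataFrame.asfreq only needs a DatetimeIndex; method="ffill"
    # forward-fills the newly created daily rows.
    return frame.asfreq("D", method="ffill")


# Hypothetical annual sample data with placeholder values.
idx = pd.MultiIndex.from_tuples(
    [
        ("AAA", pd.Timestamp("2017-12-31")),
        ("AAA", pd.Timestamp("2018-12-31")),
        ("BBB", pd.Timestamp("2017-06-30")),
        ("BBB", pd.Timestamp("2018-06-30")),
    ],
    names=["Ticker", "Report Date"],
)
df = pd.DataFrame({"Value": [1.0, 2.0, 10.0, 20.0]}, index=idx)

daily_all = apply_per_ticker(df, to_daily)             # all tickers
daily_one = apply_per_ticker(df.loc["AAA"], to_daily)  # one ticker
```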
Same here. I think the current pandas solution is quite complicated.
I'm not sure whether I'm seeing this problem or a different one. I'm trying to use groupby.resample on a MultiIndex with a datetime index, but return a period index using the resample 'kind' argument. It works fine when resample returns the index as is:
But when I add the kind='period' argument, I get this error:
Hi! I would like this issue to be assigned to me. Thanks!
I have the same problem when I want to upsample a MultiIndex DataFrame with pd.Grouper().
Now, I want to change the frequency of the data. But I find that
Summary
Thanks for making Pandas; I have used it in a lot of projects! But now I have a problem.
I have spent nearly 3 days trying to figure out how to resample / upsample a Pandas MultiIndex elegantly and correctly. I have read and tried numerous posts on StackOverflow and GitHub. My conclusion is that I don't think this is supported very well in Pandas. Let me explain what I want to do.
Background
I am currently building a Python API in collaboration with www.simfin.com that makes it very easy to download and use financial data (share-prices, fundamentals, etc.) for free. This will enable people to conduct and share financial research very easily. It works by downloading bulk-data in CSV files from the SimFin server and loading them into Pandas. The fundamental data such as Income Statements and Balance Sheets is usually indexed by the Ticker and Report Date, which creates a Pandas DataFrame with a MultiIndex.
Data Example
Let us say we have a Pandas DataFrame df with this data:
Resample a single Ticker (DatetimeIndex)
Let us first resample for a single ticker:
This works and the result is:
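A minimal, runnable version of the single-ticker case, assuming a small hypothetical DataFrame in the layout described above (placeholder tickers, dates and values, not the SimFin data). It uses ffill(), which recent pandas versions use in place of the pad() spelling.

```python
import pandas as pd

# Hypothetical annual data indexed by (Ticker, Report Date); values are placeholders.
idx = pd.MultiIndex.from_tuples(
    [
        ("AAA", pd.Timestamp("2017-12-31")),
        ("AAA", pd.Timestamp("2018-12-31")),
        ("BBB", pd.Timestamp("2017-06-30")),
        ("BBB", pd.Timestamp("2018-06-30")),
    ],
    names=["Ticker", "Report Date"],
)
df = pd.DataFrame({"Value": [1.0, 2.0, 10.0, 20.0]}, index=idx)

# Selecting one ticker drops the outer level, leaving a plain DatetimeIndex
# named "Report Date", so upsampling to daily frequency works directly.
bbb = df.loc["BBB"]
bbb_daily = bbb.resample("D").ffill()
print(bbb_daily.head())
```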
Resample multiple Tickers (MultiIndex)
Let us now try to resample all tickers in the DataFrame. The resample() function takes an argument level, which is supposed to work with a MultiIndex DataFrame:
But this apparently doesn't work for upsampling, e.g. from annual to daily data, because we get this error message:
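The failing call can be reproduced with hypothetical data in the same layout. The sketch below assumes the level="Report Date" form of resample() described above; pandas raises a ValueError because upsampling from a level= (or on=) selection is not supported.

```python
import pandas as pd

# Hypothetical annual data indexed by (Ticker, Report Date); values are placeholders.
idx = pd.MultiIndex.from_tuples(
    [
        ("AAA", pd.Timestamp("2017-12-31")),
        ("AAA", pd.Timestamp("2018-12-31")),
        ("BBB", pd.Timestamp("2017-06-30")),
        ("BBB", pd.Timestamp("2018-06-30")),
    ],
    names=["Ticker", "Report Date"],
)
df = pd.DataFrame({"Value": [1.0, 2.0, 10.0, 20.0]}, index=idx)

try:
    # Upsampling annual -> daily through the level= argument raises ValueError.
    df.resample("D", level="Report Date").ffill()
    error = None
except ValueError as exc:
    error = exc

print(error)
```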
One solution is to use groupby() (adapted from e.g. #13699):
This works, but now the Ticker is duplicated, both as an index and as a column:
We can avoid one of them by adding the argument group_keys=False:
This works, but now the Ticker is a data column instead of an index:
To get the original MultiIndex back with both Ticker and Report Date, we need to do:
Which produces the desired result:
But this is so complicated that nobody can be expected to remember how to do it. So I would have to make a small helper-function that does all of this. But because the resampling method (pad, interpolate, etc.) is invoked through a function call on the groupby-object, my helper-function would get big and awkward if I want to allow different methods of resampling.
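One way to get the same end result (daily rows under the original (Ticker, Report Date) MultiIndex) with less ceremony is to group on the index level instead of a column, which never produces the duplicated Ticker. This is an alternative sketch, not a reconstruction of the chain in the post; the sample data and the _ffill_daily name are hypothetical.

```python
import pandas as pd

# Hypothetical annual data indexed by (Ticker, Report Date); values are placeholders.
idx = pd.MultiIndex.from_tuples(
    [
        ("AAA", pd.Timestamp("2017-12-31")),
        ("AAA", pd.Timestamp("2018-12-31")),
        ("BBB", pd.Timestamp("2017-06-30")),
        ("BBB", pd.Timestamp("2018-06-30")),
    ],
    names=["Ticker", "Report Date"],
)
df = pd.DataFrame({"Value": [1.0, 2.0, 10.0, 20.0]}, index=idx)


def _ffill_daily(group: pd.DataFrame) -> pd.DataFrame:
    # Each group still carries the full (Ticker, Report Date) MultiIndex;
    # drop the Ticker level so resample() sees a plain DatetimeIndex.
    return group.droplevel("Ticker").resample("D").ffill()


# Grouping on the index level keeps Ticker out of the columns, and
# group_keys=True puts it back as the outer index level of the result.
daily = df.groupby(level="Ticker", group_keys=True).apply(_ffill_daily)
print(daily.index.names)
```

Swapping ffill() for another Resampler method inside _ffill_daily changes the fill strategy without touching the groupby call.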
Conclusion
It appears that upsampling a MultiIndex is not supported very well in Pandas, unless I have misunderstood how it is supposed to work.
I think that by far the most elegant solution would be if the resample() function supported the level argument for upsampling, because the syntax and semantics would then be very similar for upsampling a DatetimeIndex and a MultiIndex:
I have taken a look at the Pandas source code, but it is complicated and so sparsely documented that it would take me forever to figure out how everything is connected and how it works, so I don't think I will be able to fix this myself. Is this something you could fix? It would make it so much easier to upsample DataFrames with a MultiIndex.
Thanks!
INSTALLED VERSIONS
commit : None
python : 3.6.8.final.0
python-bits : 64
OS : Linux
OS-release : 4.15.0-60-generic
machine : x86_64
processor : x86_64
byteorder : little
LC_ALL : None
LANG : en_US.UTF-8
LOCALE : en_US.UTF-8
pandas : 0.25.1
numpy : 1.16.4
pytz : 2019.2
dateutil : 2.8.0
pip : 19.1.1
setuptools : 41.0.1
Cython : None
pytest : None
hypothesis : None
sphinx : None
blosc : None
feather : None
xlsxwriter : None
lxml.etree : None
html5lib : None
pymysql : None
psycopg2 : None
jinja2 : 2.10.1
IPython : 7.5.0
pandas_datareader: None
bs4 : None
bottleneck : None
fastparquet : None
gcsfs : None
lxml.etree : None
matplotlib : 3.1.0
numexpr : None
odfpy : None
openpyxl : None
pandas_gbq : None
pyarrow : None
pytables : None
s3fs : None
scipy : 1.3.1
sqlalchemy : None
tables : None
xarray : None
xlrd : None
xlwt : None
xlsxwriter : None