flight.phases() returns a Pandas error on some flights #242

Closed
tomschelsen opened this issue Sep 16, 2022 Discussed in #241 · 2 comments

@tomschelsen

Discussed in #241

Using traffic version 2.8.0

I have a Traffic object containing a bunch of flights that I created from custom-parsed data (not one of the formats/sources directly supported by the library). I am trying to keep only the climb phases.

It works fine on many flights (I had tested .phases().query('phase == "CLIMB"') on a bunch of flights from this dataset and visualised the results beforehand), but it fails on one particular flight with this error:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [13], in <cell line: 6>()
      1 # working flight : BEL3631
      2 
      3 # not working flight : BAW45EM
      5 example_flight = traffic_flights["BAW45EM"]
----> 6 example_climb = example_flight.phases().query('phase == "CLIMB"')


File /usr/local/lib/python3.8/dist-packages/traffic/algorithms/openap.py:33, in OpenAP.phases(self, twindow)
     25 fp = FlightPhase()
     26 fp.set_trajectory(
     27     (self.data.timestamp.values - np.datetime64("1970-01-01"))
     28     / np.timedelta64(1, "s"),
   (...)
     31     self.data.vertical_rate.values,
     32 )
---> 33 return self.assign(phase=fp.phaselabel(twindow=twindow)).assign(
     34     phase=lambda df: df.phase.str.replace("GND", "GROUND")
     35     .str.replace("CL", "CLIMB")
     36     .str.replace("DE", "DESCENT")
     37     .str.replace("CR", "CRUISE")
     38     .str.replace("LVL", "LEVEL")
     39 )

File /usr/local/lib/python3.8/dist-packages/traffic/core/mixins.py:304, in DataFrameMixin.assign(self, *args, **kwargs)
    298 def assign(self: T, *args: Any, **kwargs: Any) -> T:
    299     """
    300     Applies the Pandas :meth:`~pandas.DataFrame.assign` method to the
    301     underlying pandas DataFrame and get the result back in the same
    302     structure.
    303     """
--> 304     return self.__class__(self.data.assign(*args, **kwargs))

File /usr/local/lib/python3.8/dist-packages/pandas/core/frame.py:4512, in DataFrame.assign(self, **kwargs)
   4509 data = self.copy()
   4511 for k, v in kwargs.items():
-> 4512     data[k] = com.apply_if_callable(v, data)
   4513 return data

File /usr/local/lib/python3.8/dist-packages/pandas/core/frame.py:3655, in DataFrame.__setitem__(self, key, value)
   3652     self._setitem_array([key], value)
   3653 else:
   3654     # set column
-> 3655     self._set_item(key, value)

File /usr/local/lib/python3.8/dist-packages/pandas/core/frame.py:3832, in DataFrame._set_item(self, key, value)
   3822 def _set_item(self, key, value) -> None:
   3823     """
   3824     Add series to DataFrame in specified column.
   3825 
   (...)
   3830     ensure homogeneity.
   3831     """
-> 3832     value = self._sanitize_column(value)
   3834     if (
   3835         key in self.columns
   3836         and value.ndim == 1
   3837         and not is_extension_array_dtype(value)
   3838     ):
   3839         # broadcast across multiple columns if necessary
   3840         if not self.columns.is_unique or isinstance(self.columns, MultiIndex):

File /usr/local/lib/python3.8/dist-packages/pandas/core/frame.py:4535, in DataFrame._sanitize_column(self, value)
   4532     return _reindex_for_setitem(value, self.index)
   4534 if is_list_like(value):
-> 4535     com.require_length_match(value, self.index)
   4536 return sanitize_array(value, self.index, copy=True, allow_2d=True)

File /usr/local/lib/python3.8/dist-packages/pandas/core/common.py:557, in require_length_match(data, index)
    553 """
    554 Check the length of data matches the length of the index.
    555 """
    556 if len(data) != len(index):
--> 557     raise ValueError(
    558         "Length of values "
    559         f"({len(data)}) "
    560         "does not match length of index "
    561         f"({len(index)})"
    562     )

ValueError: Length of values (971) does not match length of index (1182)

I attach here, as a CSV file, a subset of the dataset (one flight, the result of traffic_flights["BAW45EM"] in this case) for which flight.phases() aborts:
problematic_flight_for_phases_climb.csv

Thank you :)

@xoolive
Owner

xoolive commented Sep 16, 2022

There must be some invalid value somewhere in the data
(I did not have time to dig much further)

If you do f.resample('5s').phases() it just works. (I took 5 seconds because it looks like the sampling rate you already have)

  • If that fix/hack is enough for you, let us close the issue.
  • If you want to dig further in the data, understand what happens or find a further bug, please comment with what you expect from now on.
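
For anyone who does want to dig further, here is a minimal sketch of a first check on the input columns. It assumes the flight's DataFrame exposes the timestamp and vertical_rate columns visible in the traceback. The helper name summarize_input_gaps is hypothetical, the sample frame is made up (it is not the attached CSV), and whether missing values or duplicated timestamps actually explain the 971 vs 1182 mismatch has not been verified here.

```python
from typing import Dict, Sequence

import pandas as pd


def summarize_input_gaps(df: pd.DataFrame, columns: Sequence[str]) -> Dict[str, int]:
    """Count missing values and duplicated timestamps in the given columns.

    Hypothetical helper for illustration; it is not part of the traffic library.
    """
    present = [c for c in columns if c in df.columns]
    # Number of NaN/NaT values per inspected column.
    report = {f"nan_{c}": int(df[c].isna().sum()) for c in present}
    report["rows"] = len(df)
    # Rows where at least one inspected column is missing.
    report["rows_with_any_nan"] = int(df[present].isna().any(axis=1).sum())
    if "timestamp" in df.columns:
        # Repeated timestamps (every occurrence after the first is counted).
        report["duplicated_timestamps"] = int(df["timestamp"].duplicated().sum())
    return report


# Made-up sample: one missing vertical_rate and one repeated timestamp.
sample = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2022-09-16 10:00:00",
                "2022-09-16 10:00:05",
                "2022-09-16 10:00:05",
                "2022-09-16 10:00:15",
            ]
        ),
        "vertical_rate": [1200.0, None, 1100.0, 900.0],
    }
)
report = summarize_input_gaps(sample, ["timestamp", "vertical_rate"])
print(report)
```

On the real flight, the same helper could be called on the flight's underlying DataFrame (the traceback accesses it as self.data), adding whichever other columns phases() passes to the labeller.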

@tomschelsen
Author

The fix is fine by me; part of me would like to understand the why, but hey, I've got to move on to the next steps... ;)
Closing the issue, thank you very much for your help.
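
On the "why", the traceback at least narrows it down: the array returned by fp.phaselabel() had 971 entries while the flight had 1182 rows, and pandas refuses to assign a list-like whose length differs from the index. Why the labeller returned fewer labels is not established in this thread. A minimal sketch of that final step with plain pandas, using made-up data rather than the attached flight:

```python
import pandas as pd

# Three rows of made-up data standing in for a flight's DataFrame.
frame = pd.DataFrame({"vertical_rate": [1200.0, 1100.0, 900.0]})

# A label list that is one element short, mimicking the 971 vs 1182 case.
short_labels = ["CL", "CL"]

message = None
try:
    frame.assign(phase=short_labels)
except ValueError as exc:
    # pandas checks the length of list-like values against the index.
    message = str(exc)
    print(message)

# With one label per row, the same call succeeds.
labelled = frame.assign(phase=["CL", "CL", "CL"])
print(labelled)
```

The resample('5s') workaround presumably sidesteps this by producing a frame for which the labeller returns one label per row, though that is an inference from the thread rather than something confirmed in it.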
