Composite Time Series Design document. #1103
base: develop
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,220 @@ | ||
##################### | ||
Composite Time Series | ||
##################### | ||
|
||
Purpose | ||
======= | ||
|
||
It is a challenge for users to identify the correct authoritative time series for a given measurement at a location. Additionally, these time series often change over time, either being replaced by completely new ones or changing their interval as newer technologies become available. | ||
|
||
Gathering an entire Period of Record (POR) for the value at a location is also rather difficult. The POR and the "authoritative time series" may be one and the same. | ||
|
||
|
||
Need | ||
==== | ||
|
||
#. CWMS and Access-2-Water require a simple mechanism to allow users of data to retrieve the Authoritative Period of Record data for a given measurement without having to understand all of the possible component time series that may be involved. | ||
#. Period-of-Record time series *should* not be created by duplicating data from the component time series and merging them into a new one. | ||
#. The naming of the time series should fit within the existing CWMS Time Series Identifier design and not unreasonably interfere with existing usages. | ||
|
||
|
||
Caveats | ||
======= | ||
|
||
#. It is assumed that CWMS-Vue will, as-always, require updates to handle what is created here. | ||
#. i.e. we're not going to let any current limitations of CWMS-Vue hinder our design. | ||
|
||
|
||
Proposal | ||
======== | ||
|
||
Description | ||
----------- | ||
|
||
CDA should handle a concept of a "Composite Time Series". Whether a Time Series is considered composite will be determined by a specific element of the Time Series Identifier. | ||
Data Administrators will configure which Time Series, and the ranges therein, are part of the composite time series. | ||
CDA will use this stored information to build the Time Series per the request. | ||
|
||
Additional names not used | ||
------------------------- | ||
|
||
#. Virtual Time Series | ||
#. Period of Record Time Series | ||
|
||
Both names have been discarded. We use "Virtual" in too many other places with a more direct meaning of that word. | ||
For Period-of-Record, while that is the primary use-case, the concept is useful in other situations as well. | ||
|
||
Hence, generically, we have a "composite time series". | ||
|
||
Axioms | ||
Is there any way to add a comment or remark to a timeseries (e.g. staff gage readings, gate computations, tailwater rating etc...) or to the composite timeseries itself? If not this could be separately managed in a CLOB, but it would be neat if you could optionally comment on the timeseries as you added them.

I'm expecting that when we add the more direct support for extracting the text timeseries along with a value time series that the time series "notes" will just come along. Alternatively one can also just take the member time series and go retrieve that (definitely not ideal though.) As for the 2nd part of that. There is a "notes" field for each member.
||
------ | ||
|
||
#. Composite Time Series are Irregular | ||
#. The definition of the composite time series is stored within the CWMS database | ||
#. The members of a composite time series define a continuous range | ||
#. The date ranges of a composite time series *MUST* not overlap | ||
This could be worked around by using versioning in the underlying implementation. If there's ambiguity (which I 100% guarantee will happen), then pick whichever is the most recent version.
||
#. The date ranges of a composite time series *MUST* have any gaps | ||
Is this missing a "not"? If so, I think it's reasonable to allow for gaps in the data. For example, if a gage gets washed out and isn't replaced for a month (and the new gage uses a different interval) then there simply wouldn't be data for the missing month. The user could theoretically extend the range of preceding or following record to include the gap, but that wouldn't feel very intuitive to me. I would expect that the interpreter could simply return gaps as missing data?

You are correct, the NOT is missing. I wasn't thinking about the processing returning the missings, but that is a good idea, and would require the gap to be defined... and probably some other information. So it may just be easier to put the information into the time series itself (e.g. in notes) and just let the missings be returned.
||
#. Data may have gaps, an explanation range should be provided. | ||
Is this user defined? Will it allow for definition of a timeseries that may not have extents that cover that time period (e.g. there's a ~2 month gap between timeseries A that ends at 2014-01-03 12:00 and timeseries B which starts at 2014-03-14 12:00). What does an explanation range look like (e.g. "no data, start 2014-01-03 12:00, end 2014-03-14 12:00")? Is that assigned automatically if there is a gap in the timeseries?

Ah, I definitely did not describe this clearly enough in the sample data.
A gap here means that you have an end date for one member and then... and now I've realized if you don't have a defined interval it's hard to determine this. Perhaps that should change to SHOULD, since I don't think the system can meaningfully define what a "gap" in a member is. Does the next start have to be the smallest time unit after the previous end (e.g. nanoseconds)? If not, what is acceptable? Here's what I was thinking: how do we handle known gaps in service, be it accidental destruction (2 different SPK/SPN gauges have suffered alcohol related removals from service)? One site at SPK is removed during most of a year due to no water, and it kept suffering vandalism. So the intent is "there's an oddly large amount of missing data, how do we report that."

Well, this is something that doesn't currently exist in CWMS at all, as I realized when developing a system to read in punch tapes. There are notes on the tapes for station maintenance, but I have nowhere to save that information in a meaningful, easy to access way.
||
#. The members of a composite time series measure the same thing (e.g. all members are Elevation, not some Elevation and some Stage). | ||
|
||
|
||
Time Series Naming | ||
------------------ | ||
|
||
Option 1 | ||
~~~~~~~~ | ||
|
||
`<Location Id>.<Parameter>.<Parameter Type>.Composite.var.<version>` | ||
|
||
+----------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
| Element | Description | | ||
+----------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Location Id |As the normal CWMS TS ID, the location for this measure | | ||
+----------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Parameter |As the normal CWMS TS ID, the measurement (e.g. Stage, Precip, Elevation, flow, etc) | | ||
+----------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Parameter Type |As Normal CWMS TS ID, Instantaneous, average, total, etc | | ||
+----------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Interval -\> Composite| Marker that this time series does not have fixed information and is built from various member time series.             | ||
I'm of the opinion that:
Tying back into an earlier section, that means the composite doesn't have to be irregular, and could potentially be regular, lrts, or prts depending on the sources. Now, that leads to the section I highlighted:
If that's not feasible, then perhaps adding to the interval, like lrts did: Otherwise, how would you specify composite data for different intervals and types, and keep them separate? At SPK we have period-of-record data like this: Currently that's a separate timeseries with duplicate data. But it doesn't have to be. Otherwise, you end up with something like: New Bullards Bar.Elev.Composite.~1Day.0.POR - Is that averaged data, or instantaneous?
||
+----------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Duration -\> var |Duration of average or total may change over time with new members, duration will be indicated in the member definition | | ||
+----------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Version |As Normal CWMS TS ID | | ||
Could a common version name denote it as authoritative? Or does just the existence of a composite timeseries imply that?

That's a possibility. I had put in a placeholder "is-authoritative" flag in the composite definition. Though I agree the version is technically a good place for that, it does seem to get a bit... overused at times. There are certainly arguments to be made in either case, so we'll wait for commentary from others to tip the scales.
||
+----------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|
||
|
||
Option 2 | ||
Option 2 makes more sense to me. I'm unsure how Option 1 would deal with potentially varying parameter types among a set of composite data. I'm a little curious about the implications of doing something like

I think @msweier made a fairly decent case for why you might have more than one composite for a location+measure. It does make sense to me to have a "single authoritative" time series followed by "all data with interval X". Really depends on exactly what you're doing with the data.

As I commented elsewhere, I say make it work like aliases, so if you fetch a composite, it checks the composite list first, if not found there, then regular timeseries, or something like that.
||
~~~~~~~~ | ||
|
||
`<Location Id>.<Parameter>.Composite.0.0.<version>` | ||
|
||
|
||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
| Element | Description | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Location Id |As the normal CWMS TS ID, the location for this measure | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Parameter |As the normal CWMS TS ID, the measurement (e.g. Stage, Precip, Elevation, flow, etc) | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Parameter Type Composite|Marker that this time series does not have fixed information and is built from various member time series.              | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Interval -\> 0          |Interval of data elements. May change over time with new members; interval will be indicated in the member definition   | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Duration -\> 0          |Duration of average or total. May change over time with new members; duration will be indicated in the member definition| ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
|Version |As Normal CWMS TS ID | | ||
+------------------------+------------------------------------------------------------------------------------------------------------------------+ | ||
I like these options, but it would be nice to differentiate between a POR timeseries that includes all best available intervals (e.g. daily inst, 4 hr, 1 hr, 15 minute) and a POR timeseries that includes the best available on a daily interval (e.g. 8 am inst or daily avg). MVP's merged TS denotes these as ~15Minutes and ~1Day but maybe there's a better way. I'm thinking some way like the USGS makes it easy to pick instantaneous value data vs. daily data.

That seems like the type of information to put in the version, if desired. My assumption with the POR time series was that it should be suitable for "this is the full record as we have it" knowing that over time the official record has improved methods of measurement. So like the first few decades could be daily instantaneous, and the next decade 12 hours, then 1 hour, then 15 minutes, and maybe things would change to an average or not. But if you go further down in the document you'll see that the returned time series values also includes the members with their definition. So yes, you could make a composite time series that only included certain intervals and durations, but to the composite system itself it wouldn't care. That said we could open up the definition to allow the interval and duration to be set, we would then need to decide if that is enforced. For example:
I'm not opposed, I don't think that adds too much complexity, but it's another one of those "more feedback from the group would be good" type things.

This is completely different from my understanding of what we were going to have for POR. Honestly, I can't think of any use case where having everything jumbled into a single time series is remotely useful. As it is, I have a hard time wanting to classify readings from two different sensor types (e.g. bubbler vs shaft-encoder) into a single POR. It's not the same data. Yes, it represents the same real-world measurement, but how useful is it to have them together? You can't run any worthwhile scientific/mathematical analysis on the data, since different sensors respond in different ways and can throw off expectations. Also, what if we're actively recording the same measurement with two different sensors? Which do we put into the POR?
||
|
||
The zeros could also be var. | ||
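A minimal sketch of how a reader of a time series identifier might recognize the marker under either naming option. The helper names `split_ts_id` and `is_composite` are hypothetical, the example identifiers are made up for illustration, and the sketch assumes that no element of the identifier contains a literal dot.

```python
COMPOSITE_MARKER = "Composite"  # marker text taken from the two naming options

# Element order of a CWMS time series identifier, as listed in the tables.
ELEMENTS = ("location", "parameter", "parameter_type", "interval", "duration", "version")


def split_ts_id(ts_id: str) -> dict:
    """Split a six-part identifier into named elements (assumes no dots inside an element)."""
    parts = ts_id.split(".")
    if len(parts) != len(ELEMENTS):
        raise ValueError(f"expected {len(ELEMENTS)} dot-separated elements: {ts_id!r}")
    return dict(zip(ELEMENTS, parts))


def is_composite(ts_id: str) -> bool:
    """True when the marker is the interval (Option 1) or the parameter type (Option 2)."""
    elements = split_ts_id(ts_id)
    return COMPOSITE_MARKER in (elements["interval"], elements["parameter_type"])


# Hypothetical identifiers, one per option, plus an ordinary time series.
print(is_composite("New Bullards Bar.Elev.Inst.Composite.var.POR"))
print(is_composite("New Bullards Bar.Elev.Composite.0.0.POR"))
print(is_composite("New Bullards Bar.Elev.Inst.15Minutes.0.Raw"))
```

Checking both positions keeps the sketch neutral between the two options; a real implementation would check only the element the group settles on.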
|
||
Composite Time Series Definition | ||
================================ | ||
|
||
.. code-block:: jsonc | ||
|
||
{ | ||
"office": "<string>", | ||
"name": "<ts id name>", | ||
"is-period-of-record": true, // or is authoritative. to distinguish between other possible use-cases? | ||
I'm a little unclear on the relationship between the "composite time series" and the "authoritative time series". It makes sense that the authoritative TS will generally be a composite TS, but will the functionality for both be handled within this one construct? What would be the implications of setting "is-period-of-record" to true here? I feel like this would be more appropriate as "isAuthoritative" based on my assumptions. Under the CMA paradigm a separate construct exists to link the "authoritative" TS with the parameter for a location. It seems like this will handle that automatically? e.g. if I have a Buckhorn.Flow-Outflow composite record with this set to true, this will automatically be returned when a user requests Flow-Outflow data for Buckhorn (also, is there another endpoint created for this)? Just to clarify -- I don't have strong feelings either way, just trying to understand the intent.

I think Though idea behind this flag is that it makes it easier to decide what should be rendered. And now that I think about it I suspect A2W would always want to render the "authoritative" so it makes sense to actually use that language.
||
"members": [ | ||
{ | ||
"time-series-id": "TS ID for this range", | ||
"start": "start date of this", | ||
"end": "end date of this range", | ||
"notes": "text", | ||
"version": "version date", // maybe not? could just use POR or period-of-record in the ts id version | ||
Not sure what this would be for -- seems like the version for an individual TSID member would just be the version of that TSID?

This is exactly why I'm wary of using "virtual" anywhere in the naming. This is the "date based version" that some districts use, not the last portion of the time series id. Though I really should've called that field "version-date" instead of version. My thought behind this is that we technically have two rather important periods-of-record that may exist
1 can certainly qualify as an official "record" in the full sense of that legal term. Well, and depending on what the student is doing they might want both.

@MikeNeilson that's an interesting take too. I was thinking most people would be interested in the best available which would include the data from your second point as well as recent realtime data. But would you differentiate the two? The USGS uses "Approved" and "Preliminary" qualifiers, but our qualifiers don't quite translate as well. There's a discourse on the qualifier discussion. https://discourse.hecdev.net/t/best-practices-of-cwms-data-qualifier-codes/3805
||
// whether values that equal the start or end timestamp are included | ||
"start-inclusive": true, | ||
"end-inclusive": false | ||
// suggest default of "start-inclusive": true, "end-inclusive": false | ||
// it may also make sense to just make this *always* the above and not let the user set it. | ||
I think just choosing a standard for this would suffice -- seems like a toggle would over-complicate things
||
// alternatively, if this is always [) then only start is required. | ||
// however, the class would require an end field, as the actual Time Series output needs to know the actual end. | ||
} | ||
] | ||
// array above *should* be sorted by start when provided to user. | ||
} | ||
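A sketch of how one member of the definition above could be modeled, assuming the suggested default of an inclusive start and an exclusive end. The `Member` class and its `contains` method are hypothetical names for illustration and are not part of any existing CDA class.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class Member:
    """One entry of the "members" array (hypothetical model, not an existing CDA type)."""
    time_series_id: str
    start: datetime
    end: datetime
    notes: str = ""
    start_inclusive: bool = True   # suggested default from the definition
    end_inclusive: bool = False    # suggested default from the definition

    def contains(self, t: datetime) -> bool:
        """True when timestamp t falls inside this member's range."""
        after_start = t >= self.start if self.start_inclusive else t > self.start
        before_end = t <= self.end if self.end_inclusive else t < self.end
        return after_start and before_end


# Two adjacent members with made-up identifiers sharing a boundary timestamp.
boundary = datetime(2010, 1, 1, tzinfo=timezone.utc)
older = Member("Loc.Elev.Inst.1Day.0.Raw", datetime(2000, 1, 1, tzinfo=timezone.utc), boundary)
newer = Member("Loc.Elev.Inst.15Minutes.0.Raw", boundary, datetime(2020, 1, 1, tzinfo=timezone.utc))
print(older.contains(boundary), newer.contains(boundary))
```

With the half-open default, a value stamped exactly at the boundary belongs to the newer member only, which is one argument for fixing the convention rather than exposing the two flags.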
|
||
|
||
Operations required: | ||
|
||
* Create | ||
* Remove member (ts id + range) | ||
* Add member | ||
* List members | ||
* Replace all members? | ||
* Delete | ||
|
||
|
||
Composite Time Series Response | ||
============================== | ||
|
||
.. code-block:: jsonc | ||
|
||
{ | ||
// ... as current TimeSeries JSON | ||
"composite-members-present": [ | ||
// member definition from above | ||
] | ||
} | ||
|
||
|
||
Supported Operations: | ||
|
||
* Get, through existing TimeSeries classes. | ||
|
||
Does it make sense to support writing directly to a composite time series? While the write of each element *could* be sent to the underlying member, this seems ripe for error when editing or updating any data. It is likely that any edits would always be to the most recent time series, and configured in some other system. | ||
I agree that seems like a bad idea. It's probably better to write to the underlying timeseries.

I'm against allowing writes to the composite time series. That sounds like a recipe for disaster (and likely a pain to implement while handling edge cases). Writes should, in my opinion, be done to the underlying time series.
||
|
||
Storage of member information | ||
================================ | ||
|
||
#. Store in Clob as we refine the design | ||
#. Create appropriate tables once the design is stable. | ||
|
||
System responsibility for "knowing" to process composite. | ||
========================================================= | ||
|
||
Time Series Catalog | ||
------------------- | ||
|
||
Time Series Catalog should show composite time series and allow searching by "authoritative". | ||
just FYI: we'd have to teach CWMS-Vue a new Interval definition or a new Parameter Type in order to show up properly. I know you called up inevitable CWMS-Vue updates elsewhere, but wanted to be explicit.
||
|
||
TimeSeries DTO | ||
-------------- | ||
|
||
Add nullable "members" property. | ||
|
||
TimeSeriesDao | ||
------------- | ||
|
||
If the system sees the "Composite" marker, retrieve the members for the requested range and build the time series. | ||
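A sketch of the member selection step, assuming members are held as dictionaries keyed like the JSON definition and that every range is half-open, i.e. start inclusive and end exclusive. The function name `plan_retrieval` is hypothetical; the actual retrieval of values from each member is left out.

```python
from datetime import datetime, timezone


def plan_retrieval(members, begin, end):
    """Return (ts_id, clipped_begin, clipped_end) for each member intersecting [begin, end)."""
    plan = []
    for member in sorted(members, key=lambda m: m["start"]):
        clipped_begin = max(begin, member["start"])
        clipped_end = min(end, member["end"])
        if clipped_begin < clipped_end:  # skip members entirely outside the window
            plan.append((member["time-series-id"], clipped_begin, clipped_end))
    return plan


def utc(year, month=1, day=1):
    """Small helper for building timezone-aware timestamps in the example."""
    return datetime(year, month, day, tzinfo=timezone.utc)


# Made-up members; only the last two intersect the requested window.
members = [
    {"time-series-id": "Loc.Elev.Inst.1Day.0.Raw", "start": utc(1950), "end": utc(1990)},
    {"time-series-id": "Loc.Elev.Inst.1Hour.0.Raw", "start": utc(1990), "end": utc(2010)},
    {"time-series-id": "Loc.Elev.Inst.15Minutes.0.Raw", "start": utc(2010), "end": utc(2030)},
]
for step in plan_retrieval(members, utc(2005), utc(2015)):
    print(step)
```

Each tuple in the plan corresponds to one ordinary time series retrieval, and the member definitions behind those tuples are what the response could list under "composite-members-present".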
|
||
NOTE: considering the user may request the *entire* Period-of-record, this is a good opportunity to recognize that case, start the retrieval in a job queue, and return a status URL to the user for future download. I have seen such mechanisms for bulk data in other systems. Maybe return an "I'm working on it" variant that the controller knows how to format. | ||
|
||
Error handling and other conditions. | ||
==================================== | ||
|
||
Versioned (date) time series | ||
---------------------------- | ||
|
||
As the composite time series is composed of multiple other time series, should it always be an error to specify a version date? | ||
The marker for always latest or always first may make sense to allow; however, as the time series is supposed to be authoritative, that would add ambiguity. | ||
Comment on lines
+200
to
+201
I can't personally think of any realistic use cases for creating a composite of versioned time series data. Maybe not worth supporting unless someone can give an example?

See my comment on this (below?).

You have to define what "authoritative" means in order to properly answer this question. Do you want the data as it was in the system when operational decisions were made, or what the 'true' system state that data represents was?
||
|
||
Datum conversions | ||
----------------- | ||
|
||
Retrievers of the Period-of-Record *SHOULD* be able to retrieve the data in a single datum. Composite retrieval should respond as described in https://github.com/USACE/cwms-data-api/issues/1102 and convert each member as appropriate. | ||
|
||
|
||
On the saving of a composite definition | ||
--------------------------------------- | ||
|
||
Even if only a single member is added, the full definition needs to be checked to ensure the ranges are still non-overlapping and continuous. | ||
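A sketch of such a check, assuming half-open member ranges and members held as dictionaries keyed like the JSON definition. The function name `validate_members` and the wording of the messages are hypothetical. Whether a gap should be rejected or merely reported is still under discussion in the review, so the sketch only lists what it finds.

```python
from datetime import datetime, timezone


def validate_members(members):
    """Return a list of problems; an empty list means no overlaps and no gaps."""
    problems = []
    ordered = sorted(members, key=lambda m: m["start"])
    for member in ordered:
        if member["start"] >= member["end"]:
            problems.append(f"empty or reversed range in {member['time-series-id']}")
    # Compare each member with the one that follows it in time.
    for prev, nxt in zip(ordered, ordered[1:]):
        if nxt["start"] < prev["end"]:
            problems.append(f"overlap between {prev['time-series-id']} and {nxt['time-series-id']}")
        elif nxt["start"] > prev["end"]:
            problems.append(f"gap between {prev['time-series-id']} and {nxt['time-series-id']}")
    return problems


def utc(*args):
    """Small helper for building timezone-aware timestamps in the example."""
    return datetime(*args, tzinfo=timezone.utc)


# The two-month gap used as an example in the review discussion.
members = [
    {"time-series-id": "A", "start": utc(2000, 1, 1), "end": utc(2014, 1, 3, 12)},
    {"time-series-id": "B", "start": utc(2014, 3, 14, 12), "end": utc(2020, 1, 1)},
]
print(validate_members(members))
```

Running the check over the whole sorted list on every save, rather than only against the neighbors of the new member, keeps the rule simple at the cost of a little extra work.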
|
||
References | ||
========== | ||
|
||
#. https://github.com/USACE/cwms-data-api/discussions/956 | ||
#. https://github.com/USACE/cwms-data-api/issues/955 | ||
#. https://www.hec.usace.army.mil/confluence/spaces/CWMS/pages/290456000/Virtual+Timeseries | ||
#. https://discourse.hecdev.net/t/period-of-record-timeseries/3859/2 |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,12 @@ | ||
################ | ||
Design Documents | ||
################ | ||
|
||
The following pages formally document current and proposed designs | ||
relating to operations and usage of data. | ||
|
||
.. toctree:: | ||
:maxdepth: 2 | ||
:caption: Introduction | ||
|
||
Composite Time Series <./composite-time-series.rst> |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,3 @@ | ||
#### | ||
Java | ||
#### |
I do think having "virtual" somewhere in the name is somewhat helpful in immediately indicating to the user that the time series doesn't actually exist. If I saw "composite time series" without any additional context my first thought would probably be a merged copy of other time series data.
Possibly "virtual composite time series" or "composite virtual time series"? Although there may be some advantage to keeping the name more succinct as in your recommendation. It will likely be in standard enough usage that users will catch on quickly to whatever the terminology is, so I'm not too worried either way.
I disagree. adding both composite and virtual would be redundant.
Additionally, we use virtual in the context of Location and Location levels to mean something calculated that doesn't physically exist somewhere (e.g. a "virtual" gauge between two others on a river.)
Whereas this really is just a composite. It is made of other time series that physically measure something.
@ktarbet came up with the Composite name, he may be able to add more to the argument.
Though as with other things, if enough users say I'm wrong and virtual makes more sense I'll accept the group think.
But I am a little worried about everyone eventually wanting a virtual time series that's more akin to the virtual location levels and then what do we do?