This repository has been archived by the owner on Jan 8, 2020. It is now read-only.

[ZF3] Data Transformer idea #5051

Closed
bakura10 opened this issue Aug 30, 2013 · 18 comments

Comments

@bakura10
Contributor

ZF3 - Data Transformer component

EDIT: This can of course be integrated into a future version of ZF2 if there is interest in it.

Hi everyone. I'd like to share some thoughts about a new component that Ocramius and I came up with on IRC, which would solve some issues more cleanly in Zend Framework 3.

This component would be called "DataTransformer" (I kinda like the name) and would be responsible for… well… transforming data from one format to another. The whole point of this component is to have something "higher level" instead of introducing similar concepts in various components (like normalizers in hydrators).

It looks a bit like hydrators, but this component could also transform one array into another array. Hydrators could be thought of as specialized data transformers (they could even implement a DataTransformer interface, in fact).

Problems

In order to highlight the problems I want to solve, here are two examples:

ZF2 ClassMethods hydrator

The current ClassMethods hydrator has an "underscoreSeparated" option that the hydrator uses to transform keys from underscore_separated to camelCased (or the opposite).

Among other things, this lets people add form elements using the standard ZF2 "underscore_separated" convention for input names and, when hydrating, convert those keys to "camelCased" so that the key "first_name" maps to the "setFirstName" method (and "getFirstName" maps back to "first_name" when extracting).

In my hydrator refactor I've removed those conversions because they make the hydrator harder to read and maintain, and the hydrator seems like the wrong place to do it (what about other components like Input Filters?).

REST API

I'm an Ember-Data user (which is an ORM for JavaScript), and this framework has very strong conventions (made by Ruby on Rails developers). It's better not to change those conventions and instead adapt the backend code to use them for communication.

For instance, let's say we have the following User entity:

class User
{
    protected $id;
    protected $firstName;
    protected $country;
}

When Ember-Data makes a POST request to the server, it sends data in the following form (according to the JSON API format):

{
    "users": [{
        "first_name": "bakura",
        "country": 4
    }]
}

It automatically wraps the payload in a root key that is the pluralized name of the entity ("users"), and converts every key to underscore_separated. Obviously, the server does not understand this format and will fail. To solve this issue we must write a lot of boilerplate code in the input filter, the hydrator…

On the other hand, for Ember-Data to understand a GET response, the server must send the data in this same format.

Some may argue that the client ORM should adapt to the backend, but in this case, it's just a pain in the ass.

Anyway, this example is just to show you one use case of data-transformers.

Solutions

A first, naive solution is to do what we did in the ClassMethods hydrator: embed the transformer directly in the hydrator. However, this does not work: if we use a REST API, a POST request will first validate and filter data using an input filter. At this point, the hydrator has not been executed yet. So for this to work, we would also have to duplicate the normalizing code in the input filter. Obviously not good.

Furthermore, embedding the transformation in the hydrator makes it hard to extend to custom use cases (for instance Ember or any other JS framework).

DataTransformer

The DataTransformer would be a very simple interface that would define only one method:

interface DataTransformer
{
    /**
     * @param mixed $data
     * @return mixed
     */
    public function transform($data);
}
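
To make the interface a bit more concrete, here is a minimal sketch of a transformer that does the underscore_separated to camelCased conversion discussed above. The class name UnderscoreToCamelCaseKeys is hypothetical and not part of the proposal, and the interface is repeated only so the snippet runs on its own:

```php
<?php

// Repeated from the proposal so the sketch is self-contained.
interface DataTransformer
{
    public function transform($data);
}

// Hypothetical class: recursively renames underscore_separated string keys
// to camelCase and leaves values untouched.
class UnderscoreToCamelCaseKeys implements DataTransformer
{
    public function transform($data)
    {
        if (!is_array($data)) {
            return $data; // scalars and objects are passed through as-is
        }

        $result = array();
        foreach ($data as $key => $value) {
            if (is_string($key)) {
                // "first_name" -> "first name" -> "First Name" -> "FirstName" -> "firstName"
                $key = lcfirst(str_replace(' ', '', ucwords(str_replace('_', ' ', $key))));
            }
            $result[$key] = $this->transform($value);
        }

        return $result;
    }
}

$transformer = new UnderscoreToCamelCaseKeys();
$converted   = $transformer->transform(array('first_name' => 'bakura', 'country' => 4));

echo json_encode($converted), PHP_EOL; // {"firstName":"bakura","country":4}
```

Because the transformer knows nothing about hydrators or input filters, the same object could in principle be reused in front of either one.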

Transformations could be chained using a TransformationPipeline:

interface TransformationPipeline extends DataTransformer, Countable
{
    /**
     * @param DataTransformer $transformer
     * @return void
     */
    public function addTransformer(DataTransformer $transformer);
}
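
One possible implementation of this pipeline, as a sketch only: ArrayPipeline and CallbackTransformer are hypothetical names, and both interfaces are repeated so the snippet is self-contained.

```php
<?php

interface DataTransformer
{
    public function transform($data);
}

interface TransformationPipeline extends DataTransformer, Countable
{
    public function addTransformer(DataTransformer $transformer);
}

// Hypothetical implementation: runs transformers in the order they were added,
// feeding the output of each one into the next.
class ArrayPipeline implements TransformationPipeline
{
    private $transformers = array();

    public function addTransformer(DataTransformer $transformer)
    {
        $this->transformers[] = $transformer;
    }

    public function transform($data)
    {
        foreach ($this->transformers as $transformer) {
            $data = $transformer->transform($data);
        }

        return $data;
    }

    public function count(): int
    {
        return count($this->transformers);
    }
}

// Hypothetical helper that adapts any callable to the interface.
class CallbackTransformer implements DataTransformer
{
    private $callback;

    public function __construct(callable $callback)
    {
        $this->callback = $callback;
    }

    public function transform($data)
    {
        return call_user_func($this->callback, $data);
    }
}

$pipeline = new ArrayPipeline();
$pipeline->addTransformer(new CallbackTransformer('trim'));
$pipeline->addTransformer(new CallbackTransformer('strtoupper'));

$result = $pipeline->transform('  marco ');
```

Since the pipeline is itself a DataTransformer, pipelines can be nested inside other pipelines without any extra code.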

How to use it?

REST API example

The idea is that this thing could be used everywhere, for any data transformation. For instance, I may decide to attach one transformer very early in the process, maybe on the "route" event, and convert the POST data from one format to another. On the other hand, I could have another transformer on the "render" event that would convert back to another format.

Here is a workflow, using my previous Ember-Data example:

  1. Ember-Data POSTs the following payload: {"users": [{"first_name": "Marco"}]}
  2. The data transformer attached to the "route" event is executed, and converts this payload to: {"firstName": "Marco"}.
  3. Input filters validate the data.
  4. Hydrators create a new User and hydrate it using the data.
  5. The User is persisted to the database.
  6. The application extracts the user with the hydrator. Because the hydrator is no longer responsible for any data transformation (it just extracts or hydrates), it returns: {"firstName": "Marco", "id": 4}
  7. The data transformer attached to the "render" event is executed, and converts the payload to {"users": [{"id": 4, "first_name": "Marco"}]}
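
Steps 2 and 7 of this workflow could be sketched as two one-way transformers. The class names are hypothetical, the event wiring to "route" and "render" is left out, and the input filter, hydrator and persistence steps are replaced by a stand-in line:

```php
<?php

interface DataTransformer
{
    public function transform($data);
}

// Hypothetical: unwraps {"<root>": [{...}]} and camelCases the keys of the first record.
class EmberRequestTransformer implements DataTransformer
{
    private $rootKey;

    public function __construct($rootKey)
    {
        $this->rootKey = $rootKey;
    }

    public function transform($data)
    {
        $result = array();
        foreach ($data[$this->rootKey][0] as $key => $value) {
            $camel = lcfirst(str_replace(' ', '', ucwords(str_replace('_', ' ', $key))));
            $result[$camel] = $value;
        }

        return $result;
    }
}

// Hypothetical: converts keys back to underscore_separated and wraps the record
// in the pluralized root key.
class EmberResponseTransformer implements DataTransformer
{
    private $rootKey;

    public function __construct($rootKey)
    {
        $this->rootKey = $rootKey;
    }

    public function transform($data)
    {
        $record = array();
        foreach ($data as $key => $value) {
            // Insert "_" before every capital letter except a leading one, then lowercase.
            $snake = strtolower(preg_replace('/(?<!^)[A-Z]/', '_$0', $key));
            $record[$snake] = $value;
        }

        return array($this->rootKey => array($record));
    }
}

// Step 1-2: incoming payload is normalized before the input filter sees it.
$incoming = json_decode('{"users": [{"first_name": "Marco"}]}', true);
$input    = (new EmberRequestTransformer('users'))->transform($incoming);

// Steps 3-6 (validation, hydration, persistence, extraction) are not shown;
// this line is only a stand-in for what the hydrator would extract afterwards.
$extracted = $input + array('id' => 4);

// Step 7: outgoing data is converted back to the client's conventions.
$outgoing = (new EmberResponseTransformer('users'))->transform($extracted);

echo json_encode($outgoing), PHP_EOL; // {"users":[{"first_name":"Marco","id":4}]}
```

Note that neither class is the inverse of the other by contract; they are simply two independent transformations attached at different points.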

Problems

Here are a few problems:

  • Data transformers should have a mechanism to accept or refuse a transformation. For instance, in a hybrid REST app, we may not want to transform some data depending on the request type. Or maybe it's just simpler to embed the accept logic in the "transform" method.
  • EDIT: obviously, transforming from underscore_separated to camelCased in the hydrator had one advantage: people could use underscore_separated keys everywhere, and they were only transformed at the last step of the process. With this solution that cannot be done.
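
For the first point, one option that keeps the interface at a single method is a decorator that decides whether to run the wrapped transformer. This is only a sketch: ConditionalTransformer and WrapInRootKey are hypothetical names, and $isApiRequest stands in for whatever inspection of the real request would be needed.

```php
<?php

interface DataTransformer
{
    public function transform($data);
}

// Hypothetical decorator: the "accept or refuse" decision lives in a predicate,
// so the wrapped transformer does not need to know about request types.
class ConditionalTransformer implements DataTransformer
{
    private $accepts;
    private $inner;

    public function __construct(callable $accepts, DataTransformer $inner)
    {
        $this->accepts = $accepts;
        $this->inner   = $inner;
    }

    public function transform($data)
    {
        if (!call_user_func($this->accepts, $data)) {
            return $data; // refused: hand the data back untouched
        }

        return $this->inner->transform($data);
    }
}

// Hypothetical transformer, here only to have something to wrap.
class WrapInRootKey implements DataTransformer
{
    private $rootKey;

    public function __construct($rootKey)
    {
        $this->rootKey = $rootKey;
    }

    public function transform($data)
    {
        return array($this->rootKey => array($data));
    }
}

$isApiRequest = true; // stand-in for checking the actual request

$transformer = new ConditionalTransformer(
    function ($data) use (&$isApiRequest) {
        return $isApiRequest;
    },
    new WrapInRootKey('users')
);

$wrapped = $transformer->transform(array('id' => 4));
```

The alternative mentioned above, putting the check inside transform() itself, would amount to the same early return written directly in each transformer.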

Thoughts

?

@macnibblet
Contributor

Initial response: fucking brilliant!

@Ocramius
Member

This is just a piece of the puzzle though.

I'm not saying anything new other than what we've already discussed on IRC, but basically, my idea is a bit more widespread.

What we're currently doing can be represented via pipes:

  • incoming: cat request | data-transformer | input-filter | filter | validator | hydrator > data-model
  • outgoing: cat data-model | hydrator | filter | data-transformer | serializer > output

Data Transformer is just one of the missing pieces, but what I see is a replacement for the huge (always hated) Form component.

Huge +1 for data transformer at first, but I'm just not sure about how to deal with non-array data in this "bigger" plan.

@devosc
Contributor

devosc commented Aug 31, 2013

Is a deserializer missing in the incoming? If so, that could make whatever it is an array?

@weierophinney
Member

Initial feedback: s/Pipeline/Chain/ to be consistent with the rest of ZF. I'm also with @Ocramius here -- it's one piece of a full toolchain. I like it!

@devosc DataTransformer is for manipulating data, either incoming or outgoing, from one representation to another. In terms of deserialization, you're likely thinking along the lines of what the current Zend\Stdlib\Hydrator component does via the extract() method.

@devosc
Contributor

devosc commented Sep 3, 2013

At the time I was thinking more along the lines of php unserialize for data coming from session storage (as an example)... Actually I wondered if decode and encode were better names... But I may be missing the point here :)

@bakura10
Contributor Author

bakura10 commented Sep 3, 2013

Yep, it's not really the same. In my mind DataTransformers are not bi-directional. They just convert any format to anything. This anything can be… well… anything. Changing keys, changing how values are ordered… It would be a really lightweight component, in fact.

@devosc
Contributor

devosc commented Sep 3, 2013

So what's the purpose of the serializer after the data transformer in the output?

@weierophinney
Member

@devosc Serializer is primarily for working with binary formats; DataTransformer is targeted at PHP native types.

@flip111

flip111 commented Sep 15, 2013

Hi, it's a good idea. I'd like to take the opportunity of this brainstorm phase to widen the discussion before narrowing it down and moving to implementation. So I looked around and wrote down some things that might need to be considered. Lists come first, scroll down for questions and some code.

Below is a list of possible steps you might want to do with the data.

  • Changing data format -- hydrator, serializer (see the list of formats below)
    • Example: Change an array to an object
  • Changing data
    • Example: Replace non-allowed words with ****
  • Changing identifier -- key / property name
    • Example: change underscore_separated to camelCase
  • Order data -- for array, string
    • Example: modify ordering in a JSON string
  • Validating data
    • Example: work with user input like HTML form
  • Validating data database backed
    • Example: check if a user already exists with this e-mail
  • Filtering data
    • Example: Filter out non-defined (in metadata) user input
  • Persisting data
    • Example: Doctrine2 also uses metadata
  • Merging
    • Example: Take user input and merge it with a timestamp of now into an object
  • Splitting
    • Example: Split one object in two entities for different persistence layers
  • Overwrite (update)
    • Example: Take an entity and overwrite some parts of it with user input

What should happen if a step fails, or completes with errors?

  • Stop the pipe
  • Throw exceptions
  • Have pipe messages
    • Example: Errors that can be shown to the user

Each step can apply at different levels

  • property/method level
    • Example: Validate that the username is at least 4 characters
  • object level
    • Example: Validate that the confirm e-mail is the same as the e-mail
  • object graph level
    • Example: Validate that the user has at least specified 3 books (in a one User to Many Books)

Data formats

  • Flat Array
    • Example: $_POST array
  • Graph Array, Object Graph, String (PHP serialized, JSON, XML, YML)

Datamodels

  • Tree, Directed Acyclic Graph, Graph.
    • Note: full-blown graphs might be too complex and need other solutions, also bubble up will be difficult.
    • Bubble up example: if a property is invalid it can invalidate the entire object.
  • Inheritance.
    • -- No idea if and how this should be taken into account

Metadata formats

Annotations, XML, YML

A list of metadata already out there.

Some will be the same as the steps defined above. But others, like groups, do not apply to one particular step but to all the steps (a group is like a profile of the entire pipe). A second problem is that traditionally this metadata exists at the property level or object level, but so far I have not seen metadata that describes an object graph.

(a lot of annotations are described here: http://jmsyst.com/libs/serializer/master/reference/annotations)

  • Validation: http://symfony.com/doc/current/reference/constraints.html
  • Metadata group. Instead of just choosing which properties will be affected by a transformation, groups can be used (@Groups, @ExclusionPolicy, @Exclude, @Expose) to specify all of the applicable metadata.
  • Renaming identifiers. As suggested by bakura10 and implemented as @SerializedName
  • Versioning (@Since, @Until) (this is for the maintainer of the software)
  • Hydration mode (@AccessType, @Accessor, @VirtualProperty, @HandlerCallback)
  • Data format output mode (@AccessorOrder)
  • Permissions (@ReadOnly)
  • Type hinting (@Type) -- could be used for: serialisation, database, validation, maybe more?

Metadata for Doctrine2

Additional metadata for Doctrine 2 that could be combined with metadata described above.

Possible to use one piece of metadata for multiple purposes:

  • @Column:
    • length, precision, scale, unique, nullable --> could be combined with validation
  • @Entity
    • readOnly --> could be combined with permissions
  • @UniqueConstraint --> could be combined with validation
  • @Version --> This is a different kind of versioning than JMS Serializer's (version on persist)

Questions

Looking at all these possibilities a few questions come to mind:

  • What should this chain/pipe thing actually become and which things should it implement for what reason? (and if not, why not?)
  • Steps can identify which level they work on.
    • Example: a certain filter can only work on a single property
    • Example 2: a certain Hydrator can work on an object graph
  • In a Linux pipe it doesn't matter which component comes after another (or am I making a wrong assumption here, as my Linux knowledge does not go that far). Is it possible to make the order of the steps that arbitrary in PHP as well? If not, should the programmer be responsible for keeping track of that? Or can a step identify which steps can come before or after it?
  • What level of control should the programmer have over the chain?

Code

Some code of how some of this stuff could work:

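class PipeObject {
    // $name identifies the pipe, instead of calling it "group" or "profile"
    public function __construct($name) {}
    public function addStep($stepObject) {}
    // runs $input through every step; used at the end of the example below
    public function getOutput($input) {}
}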

class Obj {
    /**
     * @Assert\NotBlank(
     *   pipes={"pipe1"} // instead of calling it "group" or "profile"
     * )
     * @Assert\Length(
     *   pipes={"pipe2"}, // instead of calling it "group" or "profile"
     *   min=5
     * )
     * @Hydrate(
     *   pipes={"pipe1"},
     *   hydrator="Reflection"
     * )
     */
    protected $name;
    protected $email;

}

$pipe = new PipeObject('pipe1');
$pipe->addStep(new Filter);
$pipe->addStep(new Hydrator); // pipe throws an exception now if Hydrator says it can not come after a filter, or if the filter doesn't want a hydrator behind it
$result = $pipe->getOutput($input);

I hope any of this is useful to get some good ideas ! :)

@bakura10
Contributor Author

Hi,

I'm not sure I fully understood everything you're saying. Sorry about that ;-).

I think the main question is to draw clear boundaries around what each component should or should not do (hydrators, serializers, data transformers).

  • Changing data format -- hydrator, serializer (see the list of formats below)

Data transformer.

  • Changing data

Data transformer.

  • Changing identifier -- key / property name

Data transformer.

  • Order data -- for array, string

Data transformer.

  • Validating data

Input filter.

  • Validating data database backed

Input filter.

Overall, it seems that you are mixing up several things. A lot of what you describe in your post seems to belong to the Input Filter component.

I'm not sure I understand the part about the various JMSSerializer annotations. It will get even more confusing because we already have a Serializer component. I mean, the terminology is getting really confusing. We really should write a clear document about: which components we have, what we call them, what they are used for, what kind of data they operate on...

Anyway, thanks for your comment, it is really helpful :).

@flip111

flip111 commented Sep 15, 2013

Ooh ok, I misunderstood the concept. I was under the assumption that one component does just one thing (just like in Linux). But as I understand it now, it's like this:

  • Data Transformer
    • Changing data format
    • Changing data
    • Changing identifier
    • Order data
  • Input filter
    • Validating data
    • Validating data database backed
    • Filtering data
  • Database Abstraction Layer
    • Persisting data
  • Where does this go?
    • Merging
    • Splitting
    • Overwrite (update)

I certainly agree that boundaries around which component should do what are very important!

@juriansluiman
Contributor

Could this component solve the issue worked out in https://github.com/SamsonIT/DataViewBundle as well? I just learned about this piece, after thinking about the problem for quite some time. Basically, just like a .phtml view script transforms an entity into HTML, a DataView transforms an entity into a configurable array. This array can be used to return JSON from a RESTful server.

The point there is that you don't want a 1:1 mapping between entities and JSON, either because naming can differ between entities and JSON (camel case/underscores etc.) or because not all properties should be converted (hashed user passwords). A data view is able to make this conversion per entity, if desired.
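
As a rough sketch of how such a view could look as a data transformer (this is not the DataViewBundle API; FieldMapTransformer and the field names are made up for illustration), a transformer could take a map of source keys to output keys and drop everything that is not listed:

```php
<?php

interface DataTransformer
{
    public function transform($data);
}

// Hypothetical class: exposes only the mapped fields, under their mapped names.
class FieldMapTransformer implements DataTransformer
{
    private $map; // source key => output key; unlisted keys are dropped

    public function __construct(array $map)
    {
        $this->map = $map;
    }

    public function transform($data)
    {
        $view = array();
        foreach ($this->map as $from => $to) {
            if (array_key_exists($from, $data)) {
                $view[$to] = $data[$from];
            }
        }

        return $view;
    }
}

$userView = new FieldMapTransformer(array(
    'id'        => 'id',
    'firstName' => 'first_name',
));

// Stand-in for an array extracted from a User entity by a hydrator.
$extracted = array('id' => 4, 'firstName' => 'Marco', 'passwordHash' => 'secret-hash');

$view = $userView->transform($extracted);

echo json_encode($view), PHP_EOL; // {"id":4,"first_name":"Marco"}
```

Using a whitelist rather than a blacklist means a newly added entity property stays private until someone explicitly adds it to the map.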

Not sure if this is relevant for this proposal, but it just sounded similar and hopefully we can get something like this for zf2 too.

@bakura10
Contributor Author

Yes, this seems like a sane use case for such a feature. The question is: what should be included by default in the data transformer component? Only some interfaces and a plugin manager, or something more complex?

I'm a bit in the dark :).

@localheinz
Member

Sounds like a great idea to me.

Best regards,

Andreas

@Thinkscape
Member

Data transformers should have a mechanism to accept or refuse a transformation.

I've got mixed feelings about that - it's something that sounds like a job for validators.

Validator and ValidatorChain already work as a pipeline - handle missing values, empty values, can break the chain on failure of select validators, so this sounds like a potential duplication.

In case a transformer doesn't know what to do with the data (e.g. double-deserialization for some reason), or in some other case where transformation is not required, transform() should just return the data untouched. The question is, which component should handle invalid data? TransformerValidator? :-)

@Ocramius
Member

@Thinkscape invalid data should simply halt everything. You don't need to process invalid stuff except if you want to log malicious attacks.
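
Both behaviours can live in the same transformer, as in this sketch. JsonDecodeTransformer is a hypothetical name, and using an exception to "halt everything" is one possible reading of the comment above rather than an agreed design:

```php
<?php

interface DataTransformer
{
    public function transform($data);
}

// Hypothetical class: decodes a JSON string into a PHP array.
class JsonDecodeTransformer implements DataTransformer
{
    public function transform($data)
    {
        if (!is_string($data)) {
            // Nothing to do (for example the payload was already decoded):
            // return the data untouched instead of failing.
            return $data;
        }

        $decoded = json_decode($data, true);

        if (json_last_error() !== JSON_ERROR_NONE) {
            // Invalid input: stop processing instead of passing garbage downstream.
            throw new InvalidArgumentException('Invalid JSON payload: ' . json_last_error_msg());
        }

        return $decoded;
    }
}

$transformer = new JsonDecodeTransformer();

$fromString = $transformer->transform('{"first_name": "Marco"}');
$fromArray  = $transformer->transform($fromString); // second pass is a no-op

try {
    $transformer->transform('{not json');
    $halted = false;
} catch (InvalidArgumentException $e) {
    $halted = true;
}
```

Whatever runs the transformer would still have to decide where that exception is caught and whether it is logged.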

@RalfEggert
Contributor

Please mark it for the 3.0.0 milestone!

@GeeH

GeeH commented Jun 27, 2016

This issue has been closed as part of the bug migration program as outlined here - http://framework.zend.com/blog/2016-04-11-issue-closures.html

@GeeH GeeH closed this as completed Jun 27, 2016