Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[DDC-2825] Fix persistence on a table with a schema on a platform without schema support #881

Closed

Conversation

michaelperrin
Copy link
Contributor

This PR solves two related issues with the use of a database schema on platforms (such as SQLite) that don't support schemas.

I discovered the issues when I generated the schema from my Doctrine entities on SQLite (for unit test purposes of my application) whereas my main application uses PostgreSQL.

This is one of my first PR on Doctrine, so sorry if I made some things in the wrong way and I'm open to discussion.

First problem: table names dots are not converted in the ORM

On a platform like SQLite, DBAL converts table names with dots (ie. a schema is declared) to double underscores.
However, the ORM doesn't do it, and persisting leads to an exception.

Example:

MyNamespace\Mytable:
    type: entity
    table: myschema.mytable
    # ...

And then somewhere in the code:

$myTable = new MyNamespace\Mytable();
$entityManager->persist($myTable);
$entityManager->flush();

This doesn't work in the current version of Doctrine. The table is created as myschema__mytable but entities are unsuccessfully saved in myschema.mytable.

Second problem: table names with reserved keywords in a database schema are not correctly escaped

When a table name is declared as myschema.order (or any other reserved keyword), only the reserved keyword part is escaped when creating the table, leading to the creation of a table name like myschema__order, which is invalid and therefore fails.

How this PR solves the problem

The classmetadata now stores in 2 separated properties the name of the table and the name of the schema. The schema property was partially implemented but I now make a full use of it.

When metadata is read (from Annotations, YAML, ...), if the table name has a dot (myschema.mytable), it's splitted into 2 parts, and myschema is saved in the schema table property, and mytable is saved in the name table property, instead of storing the whole myschema.mytable in the name table property.

This allows to do specific things about schemas everywhere in Doctrine, and not splitting again parts everywhere it's needed.

By the way, the schema property can now fully be used.

For instance, these 2 YAML configurations are valid and do the same thing:

MyNamespace\Mytable:
    type: entity
    table: myschema.mytable

and:

MyNamespace\Mytable:
    type: entity
    table: mytable
    schema: myschema

This was something which was not finished to be implented since Doctrine 2.0.

The Default quote strategy now converts back the schema and table names to a unique table name, depending on the platform (e.g. myschema.mytable if the platform supports schemas, and myschema__mytable otherwise).

As a result, there is no problem anymore and entities can be persisted without getting any exception.
I added some unit tests for this (the same unit tests failed before of course).

There's however a slight tradeoff on performance, as the getTableName of the DefaultQuoteStrategy class adds some tests to return the correct table name.

This solved these Doctrine issues: DDC-2825 and DDC-2636.

Again, this is one of my first PRs on Doctrine, so if there's anything wrong or if you have any question, feel free to comment this PR.

@Ocramius This PR is about what we talked about in DDC-2825

@doctrinebot
Copy link

Hello,

thank you for creating this pull request. I have automatically opened an issue
on our Jira Bug Tracker for you. See the issue link:

http://www.doctrine-project.org/jira/browse/DDC-2861

We use Jira to track the state of pull requests and the versions they got
included in.

@michaelperrin
Copy link
Contributor Author

I didn't notice that a unit test now fails with this PR (the one concerning DBAL-204 on MySQL). I'm going to have a look at it.

@michaelperrin
Copy link
Contributor Author

All unit tests for all platforms are now fixed.

$sequenceName = $class->getTableName() . '_' . $columnName . '_seq';
$definition = array(

if ($schemaName = $class->getSchemaName()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you extract that block and the one above into a private helper method? also don't use else then, but make it an early return for the matching if only.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beberlei That is not necessary. This block is already changed by PR #890 and should be covered by DBAL. Also see comment above.

@beberlei
Copy link
Member

beberlei commented Jan 2, 2014

Very good PR, please fix the issues mentioned, squash all commits and rebase to master.

$sequenceName = $class->getTableName() . '_' . $columnName . '_seq';
}

$definition = array(
'sequenceName' => $this->targetPlatform->fixSchemaElementName($sequenceName)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@beberlei Btw this line looks dangerous to me because it changes and trims the sequence name on the fly if necessary and does not guarantee it to match anymore with the sequence in the database created by DBAL. IMO ORM should not do something like that. It should be coming valid from DBAL in the first place.
The main concern here is about Oracle where sequence names cannot be longer than 30 charachters. It will silently break emulated autoincrementation via triggers as the trigger references the sequence by name which will fail if it was renamed here and return 0 again when calling lastInsertId().
We should think about creating a unique name like for unnamed indexes, unique and foreign key constraints in DBAL to come around this limitation. Please also see PR #890. But I don't know if that would break BC.

@michaelperrin
Copy link
Contributor Author

Thank you @beberlei for your comments, this is now fixed!

As @deeky666 pointed out, I had to make some further changes when rebasing, as some changes have recently been made on PR #890 .

I had to use a little workaround to avoid break compatibility issues on PostgreSQL, so that sequence names have the same before and after this PR changes.

As there is now a DBAL method to get the sequence name, based on table name and column name (see doctrine/dbal#428), I have to prepend the schema name to the table name before delegating the sequence name to DBAL. Maybe there should be a third optional argument for the schema name on the getIdentitySequenceName method on AbstractPlatform.php in DBAL.

Everything works fine, apart from the current build problem on the master branch.

If you have any other comment to improve the code, I'll be glad to help.

@deeky666
Copy link
Member

deeky666 commented Jan 3, 2014

@michaelperrin @beberlei While I definitely like this PR I'm afraid this is a BC break because it will break receiving lastInderstId for existing tables. Tables already created with a sequence in the format <table>_<column>_seq will not be compatible with the new <schema>_<table>_<column>_seq format.
If we should decide that this break is acceptable, I would suggest some further improvements:

  1. Pass the full qualified table name <schema>.<table> to AbstractPlatform::getIdentitySequenceName() and generate a unique hash key for the sequence name like SEQ_<HASH> (as already done for other asset names, see previous comment) to avoid problems with identifier length in Oracle and other platforms.
  2. AbstractPlatform::getIdentitySequenceName() needs to check and split the full qualified table name into schema and table like done in other DBAL methods already.

Otherwise this PR would only fix one problem and probably introduce others for max identifier length and sequence name matching of existing tables.

@michaelperrin
Copy link
Contributor Author

@deeky666 Thank you for your comments. Tell me if I am wrong, but schema name was already included in sequence names before this PR.

As schemas were included as to define a schema, the table name had to be defined as schemaname.tablename, (which is still possible after this PR). The difference is that the getTableName of the ClassMetadataInfo class now only returns tablename instead of schemaname.tablename.

Still, you are right as there is a difference: the sequence name was schemaname.tablename_columnname_seq and it is now schemaname_tablename_columnname_seq (a dot replaced by an underscore).

I like your second idea: AbstractPlatform::getIdentitySequenceName() could be implemented to replace the dot by an underscore (or a hash) for platforms that don't support schemas, and keep the dot for platforms that support schemas.

We keep this way full BC, and the ClassMetadataFactory keeps the same as before this PR.

What do you think? If you think that's the way to go, I update this PR and open a PR on DBAL.

@deeky666
Copy link
Member

deeky666 commented Jan 3, 2014

@michaelperrin Hmm I'm not getting entirely through this but you might be right. But what seems wrong is how the sequence name is now passed to the platform if the table name contans a schema prefix because it now contains an underscore instead of a dot. On PostgreSQL for example it was schema.table before and now is schema_table, right? This would be wrong then. Also I wonder if you should not evaluate AbstractPlatform::supportsSchemas for dotted to underscore notation conversion instead (not entirely sure).

@michaelperrin
Copy link
Contributor Author

@deeky666 Yes, this is what I realized with your previous comment, there is a change on the PostgreSQL platform.
Do you think I should check for platform support in the ORM and decide in my ClassMetadataFactory::getSchemaName method whether to convert the dot to an underscore or not (depending on platform support for schemas), or should this be handled in DBAL? The first solution has the advantage to be easier, the second one might be a bit more flexible.

@deeky666
Copy link
Member

deeky666 commented Jan 3, 2014

@michaelperrin I am not sure if understand the root problem correctly but I'll try to explain my understanding. The problem is that table names that contain a dot do not work on platforms that do not support schemas/schema notation. Then my understanding is that this is a portability fix because why would you define your table names with dots then if you know that the platform does not support that kind of notation? If that is the case I would fix that entirely in ORM as you would not expect platforms that do not support this kind of notation to check for that and fix it by themselves. Then AbstractPlatform::supportsSchemas is the way to check for whether to convert or not in this case IMO. But as I said I am not entirely sure about this as I am not too much into ORM. Maybe someone else can provide feedback on that, too.

@@ -0,0 +1,42 @@
<?xml version="1.0" encoding="utf-8"?>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are you sure you wanted to commit this file into the PR?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aw, sorry, not really used to Github Mac software. Back to command line! This is fixed.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No problem :)

@michaelperrin
Copy link
Contributor Author

@deeky666 Concerning your last comment, I'm going to try to make things a bit clearer.

The short answer: this PR aims to make ORM entities work on all platforms even when schemas are used, as the ORM is an abstraction layer.

Longer answer: Why is it useful to make schemas work on platforms that don't support them?

Here is my use case that made me discover the issue: my web project is developed on PostgreSQL and we make use of schemas (table names are therefore declared with dots).

I want to run unit tests of my project on a clean database. The idea is to create the schema structure in SQLite at the beginning of unit tests, run tests, and destroy the database. This currently doesn't work, as whereas DBAL is able to create the database structure (by emulating schemas), the ORM doesn't manage to read and save entities within this same database.

The ORM should be able to build structure and use entities for any database platform, based on the same database structure configuration (Annotations, YAML files, XML files, etc.). It works actually pretty fine, except this case.

This PR aim is twofold:

  • Fix conversion of dots for platforms that emulate schemas (as done in DBAL) so that entities work
  • Better separation of schema name and table name (instead of declaring {table: myschema.mytable}, it can be declared as {table: mytable, schema: myschema}). The first notation is still available, but things are clearly separated at execution time.

I hope it's a bit more clear.

@deeky666
Copy link
Member

deeky666 commented Jan 3, 2014

@michaelperrin Thank you for that explanation. I think I get it now. So in my opinion you should use dotted notation if the underlying platform supports schemas, use double underscore for platforms that do not support schemas but can emulate. This leaves the question open about platforms that do neither support nor emulate schemas. Is the dotted notation acceptable here on all platforms? As for MySQL it works because it qualifies the database instead. But what about the others? Is that notation acceptable on all platforms then? Another question I have is about the double underscore notation. Is that a convention by Doctrine for all schema emulating platforms or is it SQLite specific?
I don't want to be too picky here. Just make up my mind about the implications of this. IMO the best solution would be to provide some methods on AbstractPlatform to let each platform decide how to deal with each case (but that is out of scope of this PR and might only be a nice extension at some later point).

@michaelperrin
Copy link
Contributor Author

@deeky666 I have fixed the name of sequences for platforms that support schemas. They are now named like myschema.mytable_mycolumn_seq as before, instead of myschema_mytable_mycolumn_seq as the previous version of this PR introduced.

SQLite is currently the only one platform emulating schemas and it uses double underscores (see the SqlitePlatform class in DBAL). A bit off the topic, but the use of PHP traits like "SchemaSupportPlatform" and "SchemaEmulatedPlatform" and so on would be a nice addition in future versions of DBAL (but that wouldn't work on PHP 5.3 anymore).

I agree with you. This PR can be a first step that solves the current problem and gives better support of schemas, and then we should enhance AbstractPlatform on DBAL in an other PR. I would be glad to help on this.

@beberlei What do you think about this PR? Do you think it can be merged and enhancements can be made later on on DBAL?

@michaelperrin
Copy link
Contributor Author

Any news on this?
If there are any concerns, I'd be glad to fix them.

return isset($class->table['quoted'])
? $platform->quoteIdentifier($class->table['name'])
: $class->table['name'];
if ( ! empty($class->table['schema'])) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you switch all these to avoid excessive else?

Something like:

$foo = 'bar';

if ($condition) {
    $foo = 'baz';

    if ($otherCondition) { 
        $foo = 'tab';
    }
}

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Concerning this one, I agree with you about code readability but performance will be slightly affected.
However I think that readability wins on this!

@michaelperrin
Copy link
Contributor Author

@Ocramius I made most of the changes you mentioned.
There are now an extra private method to get the sequence name, and some small code changes have been made accordingly to your comments.

I updated the documentation for all mapping formats. It was not necessary to update the XSD file as it already mentioned the schema attribute.

I agree with you concerning the use of double underscores directly in the ORM, it should be handled directly by platforms in DBAL.

However, do you think we consider this PR as a first step?

I can then open a PR on DBAL to add a method like AbstractPlatform::getSequenceName($columnName, $tableName, $schemaName = null) and make use of it in the ORM.
An other method like this should also be implemented for getting the table name depending on the schema name and table name in DBAL. It would be used in the ORM in DefaultQuoteStrategy::getTableName.

Do you want me to do this already now, or after this PR is merged?

@Ocramius
Copy link
Member

@michaelperrin I think that's up to @deeky666 to decide. DBAL is already in beta, so I'm not sure if it can be added now without messing with the release process.

Is the current failure on travis related with your changes?

@michaelperrin
Copy link
Contributor Author

@Ocramius No, it's not related to my changes. I think it's doctrine/dbal#508 on DBAL that causes Travis failure.

Ocramius added a commit that referenced this pull request Jan 14, 2015
Ocramius added a commit that referenced this pull request Jan 14, 2015
Ocramius added a commit that referenced this pull request Jan 14, 2015
…rimaryTable()` instead of duplicating `explode()` logic
Ocramius added a commit that referenced this pull request Jan 14, 2015
Ocramius added a commit that referenced this pull request Jan 14, 2015
Ocramius added a commit that referenced this pull request Jan 14, 2015
Ocramius added a commit that referenced this pull request Jan 14, 2015
Ocramius added a commit that referenced this pull request Jan 14, 2015
Ocramius added a commit that referenced this pull request Jan 14, 2015
@Ocramius Ocramius closed this in 1d4b96e Jan 14, 2015
@Ocramius
Copy link
Member

I rewrote large chunks of the testing in this PR, as I found various bugs with XML/YAML and PHP mapping types (they were untested).

Other than that, this is merged via 1d4b96e and will land in 2.5.

Thanks @michaelperrin!

@Ocramius Ocramius self-assigned this Jan 14, 2015
@michaelperrin
Copy link
Contributor Author

Thanks @Ocramius! Glad this PR got merged and I'm looking forward to the future 2.5 release and other contributions as well!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants