Single command to initialize, send and end a COPY? #27

Open
odo opened this issue Apr 17, 2015 · 8 comments

odo commented Apr 17, 2015

Hi,

In my code I do a COPY like this (Connection is the PID of a pgsql_connection process):

copy_to(TableName, Data, Connection) ->
    {copy_in, [text]} = pgsql_connection:simple_query(["COPY \"", TableName, "\"(time, key, value) FROM STDIN"], Connection),
    ok                = pgsql_connection:send_copy_data(Data, Connection),
    {copy, Count}     = pgsql_connection:send_copy_end(Connection),
    Count.

Since the Connection is shared by several client processes, there can be some interference here, resulting in errors like "unexpected message type 0x51".

Would it make sense to wrap these three calls into one so they are always executed in sequence?
I can implement this; I just wanted to check whether it makes sense or whether I am using the API incorrectly.
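
For illustration, the wrapped call could look like the sketch below. copy/3 is hypothetical and not part of the current API; the idea is that the whole sequence would run inside a single gen_server call, so other clients of the shared connection could not interleave commands:

%% Hypothetical single-call COPY: the three steps above would execute
%% inside the connection process as one serialized unit.
copy_to(TableName, Data, Connection) ->
    Query = ["COPY \"", TableName, "\"(time, key, value) FROM STDIN"],
    {copy, Count} = pgsql_connection:copy(Query, Data, Connection),
    Count.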

waisbrot commented May 7, 2015

It seems a little odd to me, although I've only used COPY when I'm streaming data through, so this use case hadn't occurred to me outside of tests.

In my code, I always use pgsql_connection:open/1 to get an exclusive connection for myself (actually, I generally use episcina to pool connections and get it from there, but it's still exclusive while I'm holding it).

I don't think you should share a connection among multiple processes simultaneously.
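
A minimal sketch of that exclusive-connection pattern, assuming the usual open/1 and close/1 calls (the connect argument here is illustrative):

copy_exclusively(TableName, Data) ->
    Connection = pgsql_connection:open("my_database"),
    try
        {copy_in, [text]} = pgsql_connection:simple_query(["COPY \"", TableName, "\"(time, key, value) FROM STDIN"], Connection),
        ok                = pgsql_connection:send_copy_data(Data, Connection),
        {copy, Count}     = pgsql_connection:send_copy_end(Connection),
        Count
    after
        %% Always close (or return to the pool), even if the COPY fails.
        pgsql_connection:close(Connection)
    end.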

odo (Author) commented May 8, 2015

It seems my use case is a bit different:

I have a pool of established connections. When picking one, I try to avoid the overhead of acquiring a lock, as episcina does. Instead I rely on pgsql_connection being a gen_server to serialize access to the resource/the database.

In this context it seems odd that in order to use COPY you have to issue a command (simple_query("COPY ...")) which puts the server in a state where the rest of its API becomes meaningless unless you continue with the right sequence of commands (send_copy_data/2, send_copy_end/1).

I think sharing a connection could easily be made possible.

waisbrot commented May 8, 2015

You get the same behavior with transactions, and COPY is pretty much a special-case transaction.

If your code did

pgsql_connection:simple_query("BEGIN", Connection),
pgsql_connection:extended_query(Insert, Bindings, Connection),
pgsql_connection:extended_query(Insert2, Bindings2, Connection),
pgsql_connection:simple_query("COMMIT", Connection),

You would find that it was unreliable, because another process might come in mid-transaction and issue an error-generating command that would abort your transaction.

But I think I understand: you're thinking of COPY in terms of how it works in the psql program (a single client that doesn't share its connection, but one that does make COPY look like a single command). In that context, your suggestion does make sense to me.

What about the delay, though? Does it seem OK to you that sometimes a simple SQL command will block for a long time (because it's stuck in line behind a huge COPY) and other times it will respond instantly?

waisbrot commented

@odo ping. Any opinion on my question about delays?

odo (Author) commented Aug 25, 2015

The example with the transaction makes perfect sense. From my perspective, this use case would also be better served by a pgsql_connection:transaction/2 command taking the connection and a list of queries as arguments.

Regarding delays: I think a simple query like "SELECT COUNT(*) FROM ..." could also take considerably longer than a small COPY. In my case all the queries are very similar, so round-robin works well.

If you consider a scenario where you have many fast queries and some long-running ones, it makes sense to acquire a lock and have exclusive access to the connection.

Considering this, I think the way it currently works makes sense for most users, and I will just continue with my fork 8).

pguyot (Contributor) commented Aug 25, 2015

Unfortunately, transaction probably isn't the best name for such a function, as the word has a very specific meaning: it would have to enclose the calls in a BEGIN/COMMIT pair (or ROLLBACK if something fails), and it would therefore not work for COPY. Also, following mnesia:transaction/1, it should probably take a function to execute instead of a list of queries.
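
A sketch of what that could look like, in the spirit of mnesia:transaction/1 (transaction/2 is hypothetical here, and on a shared connection it would still have to run inside the connection process to be safe):

transaction(Fun, Connection) ->
    pgsql_connection:simple_query("BEGIN", Connection),
    try Fun(Connection) of
        Result ->
            pgsql_connection:simple_query("COMMIT", Connection),
            {atomic, Result}
    catch
        Class:Reason ->
            %% Roll back on any exception raised by the caller's fun.
            pgsql_connection:simple_query("ROLLBACK", Connection),
            {aborted, {Class, Reason}}
    end.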

odo (Author) commented Aug 25, 2015

I was thinking exactly this: a bunch of queries that should be enclosed in BEGIN ... COMMIT.
And COPY implicitly opens a transaction anyway, as far as I remember.

That way COPY and transactions would each be a single call to the pgsql_connection process.

waisbrot commented

COPY does work within a proper transaction, so that could be a good solution. It does not itself implicitly open a transaction. Rather, it puts the current connection into a "copy" state where the only acceptable actions are sending data or ending (one way or another) the copy.
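
A sketch of the COPY sequence wrapped in an explicit transaction, per the suggestion above (on a shared connection this would still need to run as one serialized unit, e.g. inside the connection process):

copy_in_transaction(TableName, Data, Connection) ->
    pgsql_connection:simple_query("BEGIN", Connection),
    {copy_in, [text]} = pgsql_connection:simple_query(["COPY \"", TableName, "\"(time, key, value) FROM STDIN"], Connection),
    ok                = pgsql_connection:send_copy_data(Data, Connection),
    {copy, Count}     = pgsql_connection:send_copy_end(Connection),
    pgsql_connection:simple_query("COMMIT", Connection),
    Count.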
