Single command to initialize, send and end a COPY? #27

Open
odo opened this issue Apr 17, 2015 · 8 comments

odo commented Apr 17, 2015

Hi,

In my code I do a COPY like this (Connection is the PID of a pgsql_connection process):

copy_to(TableName, Data, Connection) ->
    {copy_in, [text]} = pgsql_connection:simple_query(["COPY \"", TableName, "\"(time, key, value) FROM STDIN"], Connection),
    ok                = pgsql_connection:send_copy_data(Data, Connection),
    {copy, Count}     = pgsql_connection:send_copy_end(Connection),
    Count.

Since the Connection is shared by several client processes, there can be some interference here, resulting in errors like "unexpected message type 0x51".

Would it make sense to wrap these three calls into one so they are always executed in sequence?
I can implement this; I just wanted to check whether it makes sense or whether I am using the API incorrectly.
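
For illustration, the wrapped call could look like the sketch below. copy/3 is hypothetical and not part of the current API; the idea is that the whole sequence would run inside a single gen_server call, so other clients of the shared connection could not interleave commands:

%% Hypothetical single-call COPY: the three steps above would execute
%% inside the connection process as one serialized unit.
copy_to(TableName, Data, Connection) ->
    Query = ["COPY \"", TableName, "\"(time, key, value) FROM STDIN"],
    {copy, Count} = pgsql_connection:copy(Query, Data, Connection),
    Count.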

waisbrot commented May 7, 2015

It seems a little odd to me, although I've only used COPY when I'm streaming data through, so this use case hadn't occurred to me outside of tests.

In my code, I always use pgsql_connection:open/1 to get an exclusive connection for myself (actually, I generally use episcina to pool connections and get it from there, but it's still exclusive while I'm holding it).

I don't think you should share a connection among multiple processes simultaneously.
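
A minimal sketch of that exclusive-connection pattern, assuming the usual open/1 and close/1 calls (the connect argument here is illustrative):

copy_exclusively(TableName, Data) ->
    Connection = pgsql_connection:open("my_database"),
    try
        {copy_in, [text]} = pgsql_connection:simple_query(["COPY \"", TableName, "\"(time, key, value) FROM STDIN"], Connection),
        ok                = pgsql_connection:send_copy_data(Data, Connection),
        {copy, Count}     = pgsql_connection:send_copy_end(Connection),
        Count
    after
        %% Always close (or return to the pool), even if the COPY fails.
        pgsql_connection:close(Connection)
    end.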

odo (Author) commented May 8, 2015

It seems my use case is a bit different:

I have a pool of established connections. When picking one, I try to avoid the overhead of acquiring a lock, as episcina does. Instead I rely on pgsql_connection being a gen_server to serialize access to the resource/the database.

In this context it seems odd that in order to use COPY you have to issue a command (simple_query("COPY ...")) which puts the server in a state where the rest of its API becomes meaningless unless you continue with the right sequence of commands (send_copy_data/2, send_copy_end/1).

I think sharing a connection could easily be made possible.

waisbrot commented May 8, 2015

You get the same behavior with transactions, and COPY is pretty much a special-case transaction.

If your code did

pgsql_connection:simple_query("BEGIN", Connection),
pgsql_connection:extended_query(Insert, Bindings, Connection),
pgsql_connection:extended_query(Insert2, Bindings2, Connection),
pgsql_connection:simple_query("COMMIT", Connection),

You would find that it was unreliable, because another process might come in mid-transaction and issue an error-generating command that would abort your transaction.

But I think I understand: you're thinking of COPY in terms of how it works in the psql program (a single client that doesn't share its connection, but one that does make COPY look like a single command). In that context, your suggestion does make sense to me.

What about the delay, though? Does it seem OK to you that sometimes a simple SQL command will block for a long time (because it's stuck in line behind a huge COPY) and other times it will respond instantly?

waisbrot commented

@odo ping. Any opinion on my question about delays?

odo (Author) commented Aug 25, 2015

The example with the transaction makes perfect sense. From my perspective, this use case would also be better served by a pgsql_connection:transaction/2 command taking the connection and a list of queries as arguments.

Regarding delays: I think a simple query like "SELECT COUNT(*) FROM ..." could also take considerably longer than a small COPY. In my case all the queries are very similar, so round-robin works well.

If you consider a scenario where you have many fast queries and some long-running ones, it makes sense to acquire a lock and have exclusive access to the connection.

Considering this, I think the way it currently works makes sense for most users, and I will just continue with my fork 8).

pguyot (Contributor) commented Aug 25, 2015

Unfortunately, transaction probably isn't the best name for such a function, as the word has a very specific meaning: it would have to enclose the calls in a BEGIN/COMMIT pair (or ROLLBACK if something fails), and it would therefore not work for COPY. Also, following mnesia:transaction/1, it should probably take a function to execute instead of a list of queries.
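
A sketch of what that could look like, in the spirit of mnesia:transaction/1 (transaction/2 is hypothetical here, and on a shared connection it would still have to run inside the connection process to be safe):

transaction(Fun, Connection) ->
    pgsql_connection:simple_query("BEGIN", Connection),
    try Fun(Connection) of
        Result ->
            pgsql_connection:simple_query("COMMIT", Connection),
            {atomic, Result}
    catch
        Class:Reason ->
            %% Roll back on any exception raised by the caller's fun.
            pgsql_connection:simple_query("ROLLBACK", Connection),
            {aborted, {Class, Reason}}
    end.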

odo (Author) commented Aug 25, 2015

I was thinking exactly this: a bunch of queries that should be enclosed in BEGIN ... COMMIT.
And COPY implicitly opens a transaction anyway, as far as I remember.

That way COPY and transactions would each be a single call to the pgsql_connection process.

waisbrot commented

COPY does work within a proper transaction, so that could be a good solution. It does not itself implicitly open a transaction. Rather, it puts the current connection into a "copy" state where the only acceptable actions are sending data or ending (one way or another) the copy.
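
A sketch of the COPY sequence wrapped in an explicit transaction, per the suggestion above (on a shared connection this would still need to run as one serialized unit, e.g. inside the connection process):

copy_in_transaction(TableName, Data, Connection) ->
    pgsql_connection:simple_query("BEGIN", Connection),
    {copy_in, [text]} = pgsql_connection:simple_query(["COPY \"", TableName, "\"(time, key, value) FROM STDIN"], Connection),
    ok                = pgsql_connection:send_copy_data(Data, Connection),
    {copy, Count}     = pgsql_connection:send_copy_end(Connection),
    pgsql_connection:simple_query("COMMIT", Connection),
    Count.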
