Cool Stuff in PostgreSQL 10: Transition Table Triggers
Last time, we created a hierarchy of tables for logging, but we had nothing for them to log, and no way to ensure the logging happens automagically. This time, we'll fix both problems.
First, a thing to log:
CREATE TABLE foo(
    id BIGSERIAL PRIMARY KEY,
    t TEXT
);
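In case you don't have the previous post handy, here's a minimal stand-in for the_log. This is a sketch, not the partitioned hierarchy from last time; the column names are inferred from the trigger function below:

```sql
-- A minimal, non-partitioned stand-in for the_log from the previous post.
-- Column names are inferred from the trigger function; adjust to taste.
CREATE TABLE the_log (
    ts           TIMESTAMPTZ NOT NULL DEFAULT now(),
    action       TEXT NOT NULL,
    table_schema TEXT NOT NULL,
    table_name   TEXT NOT NULL,
    old_row      JSONB,
    new_row      JSONB
);
```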
Next, let's make an efficient trigger function. You may have seen one of these before, but it was probably row-based, so it couldn't be super efficient. This one is statement-based, and uses the new transition table feature in PostgreSQL 10. We only need to build one trigger function because our logging infrastructure is schema-agnostic. We could have separated it into three trigger functions, but that's just more code to maintain. It starts off as usual:
CREATE OR REPLACE FUNCTION log()
RETURNS TRIGGER
LANGUAGE plpgsql
AS $$
BEGIN
    IF TG_OP NOT IN ('INSERT', 'UPDATE', 'DELETE') THEN
        RAISE EXCEPTION 'This function should not fire on %', TG_OP;
    END IF;
    IF TG_OP = 'INSERT' THEN
        INSERT INTO the_log (
            action, table_schema, table_name, new_row
        )
        SELECT
            TG_OP, TG_TABLE_SCHEMA, TG_TABLE_NAME, row_to_json(new_table)::jsonb
        FROM
            new_table;
While the rest of it should be familiar, new_table is new. It's a transition table, which contains the rows as they looked after the action; here, that's an INSERT. We've picked an arbitrary name for it, and we'll see how to do that below. One operation done, two to go. The next looks pretty similar:
    ELSIF TG_OP = 'DELETE' THEN
        INSERT INTO the_log (
            action, table_schema, table_name, old_row
        )
        SELECT
            TG_OP, TG_TABLE_SCHEMA, TG_TABLE_NAME, row_to_json(old_table)::jsonb
        FROM
            old_table;
The other transition table, old_table, comes into play here. For UPDATE, we'll use both:
    ELSE
        /*
         * DANGER, WILL ROBINSON! DANGER!
         * This relies on an implementation detail: old_table and new_table
         * are scanned in the same order. Should that detail change, this
         * could get a lot more complicated.
         */
        WITH
        o AS (
            SELECT
                row_to_json(old_table)::jsonb AS old_row,
                row_number() OVER () AS ord
            FROM old_table
        ),
        n AS (
            SELECT
                row_to_json(new_table)::jsonb AS new_row,
                row_number() OVER () AS ord
            FROM new_table
        )
        INSERT INTO the_log (
            action, table_schema, table_name, old_row, new_row
        )
        SELECT
            TG_OP, TG_TABLE_SCHEMA, TG_TABLE_NAME, old_row, new_row
        FROM
            o
        JOIN
            n
            USING (ord);
    END IF;
    RETURN NULL;
END;
$$;
This may look a little intimidating, but it's just a way to make sure we've got corresponding rows without mandating primary keys that are never updated.
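If your tables do have primary keys that are never updated, you can sidestep the ordering assumption entirely by joining the transition tables on the key. A sketch, not part of the original function, assuming an id column like foo's:

```sql
-- Hypothetical alternative for the UPDATE branch: correlate old and new
-- rows by primary key instead of row_number(). Only safe when the key
-- itself is never updated, and it gives up schema-agnosticism by
-- hard-coding the column name.
INSERT INTO the_log (action, table_schema, table_name, old_row, new_row)
SELECT
    TG_OP, TG_TABLE_SCHEMA, TG_TABLE_NAME,
    row_to_json(o)::jsonb, row_to_json(n)::jsonb
FROM old_table AS o
JOIN new_table AS n USING (id);
```

The row_number() version in the article trades that key requirement for a dependency on scan order, which is the point of the warning comment.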
Next, triggers that call the function. Each operation makes a different combination of old_table and new_table available, so we'll need three of them:
CREATE TRIGGER log_insert_public_foo
    AFTER INSERT ON public.foo
    REFERENCING NEW TABLE AS new_table
    FOR EACH STATEMENT
    EXECUTE PROCEDURE public.log();

CREATE TRIGGER log_update_public_foo
    AFTER UPDATE ON public.foo
    REFERENCING OLD TABLE AS old_table NEW TABLE AS new_table
    FOR EACH STATEMENT
    EXECUTE PROCEDURE public.log();

CREATE TRIGGER log_delete_public_foo
    AFTER DELETE ON public.foo
    REFERENCING OLD TABLE AS old_table
    FOR EACH STATEMENT
    EXECUTE PROCEDURE public.log();
Now, we can INSERT (or COPY), UPDATE, and DELETE from the foo table and know we're capturing the change stream.
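To convince yourself it works, run a few statements and inspect the log. A sketch; the exact the_log columns depend on your schema from the previous post:

```sql
INSERT INTO foo (t) VALUES ('one'), ('two');
UPDATE foo SET t = upper(t);
DELETE FROM foo WHERE t = 'ONE';

-- One log row per affected table row:
-- two INSERTs, two UPDATEs, one DELETE.
SELECT action, old_row, new_row FROM the_log;
```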
One little problem, addressed in our third and final episode of this series, is that there was an awful lot of boilerplate code. We'll DRY it up.
As far as you know, this feature is still in for PG10? Last I saw, there was a bit of an argument going on about bugs that had come up and patches not being reviewed in a timely manner. I am really looking forward to this feature going in, so I have my fingers crossed that those issues can be solved.
It's going in as far as I can tell. At this point, something catastrophic would have to occur to remove it for 10.
DeleteSet oriented triggers at last. Wish I could use RETURNING * INTO (new|old)_table in regular SQL statements without involving triggers. This would be a game changer in functions. Today you can do it inside the same SQL statement (using WITH), but not outside it, within the same function scope and transaction.
You can do NEW (essentially) with RETURNING. RETURNING OLD is now a relatively small matter of tuits (the round kind) and could easily be in 11.
The transition tables seem to be like the OLD and NEW tables in Sybase/SQL Server, which contain the before and after images of the rows affected by the current statement and are available to the application.
ReplyDeleteI used it 12 yrs ago in Sybase and always found it cool. I don't think Oracle or DB2 offers something similar.
Nice post, thanks! Useful feature.
Very useful! Currently we have a cache invalidation solution with row triggers (and pg_notify).
ReplyDeleteWe hope/think this new feature will improve the performance.
I hope a lot of people create nice things with it. It's been a long time coming.
Nice post! That's a handy trick of using ROW_NUMBER to correlate the old and new transition relations on UPDATE. I wonder if it is safe to use in production, or could it ever give incorrect results under existing implementations? This has prompted a discussion on introducing some feature to properly enable correlating the old and new versions: https://www.postgresql.org/message-id/CAH8WVsjQ104O3-CNSeFo48ku-qWh0vFCzcvr%2BK93bax30voJFA%40mail.gmail.com.