create a column with array type in postgres

PostgreSQL Array Column Types

Every one of us remembers the moment when you first learned about databases and you were told that if you want to store multiple values in one column you have to split that in a 1:N relationship, because each column can only hold one value. Well, that’s not true in the world of PostgreSQL. You are free to create array columns with arbitrary length. You can even perform array-like queries on them.

But first we need to start by creating a table. We’ll do this with the most obvious example — file sharing among multiple user groups. in googledrive, dropbox… a file can be shared between multiple user groups, so that all the user in that particular user group can be able to see the file.

Let’s first create a new database so that we can play around and drop it at the end of out session, keeping our machine clean.

$ psql -U darth postgres
postgres=# CREATE DATABASE test TEMPLATE template0;
CREATE DATABASE
postgres=# \c test;

Defining array column types

The way we tell PostgreSQL that a column type is an array is by simply appending [] at the end of the type, now lets create posts  table with usergroups column as an array.

CREATE TABLE posts (
  id SERIAL PRIMARY KEY,
  filename VARCHAR(255) NOT NULL,
  username VARCHAR(255) NOT NULL,
  usergroups VARCHAR(255)[] DEFAULT '{}'
);
test=# \d posts
                                  Table "public.posts"
 Column    |           Type           |                     Modifiers
-----------+--------------------------+----------------------------------------------------
 id        | integer                  | not null default nextval('posts_id_seq'::regclass)
 filename  | character varying(255)   | not null
 username  | character varying(255)   | not null
 usergroups| character varying(255)[] | default '{}'::character varying[]
Indexes:
    "posts_pkey" PRIMARY KEY, btree (id)

In our example code we’re also defining a default value, in this case an empty array, which is defined as '{}' (the single quotes are important here). If we wanted a different default value, we can simply put it in the curly brackets '{thing}'. Note that these arrays columns can also be indexed and thus lighting fast:

CREATE INDEX idx_test on things USING GIN(usergroups);

Now that we know the syntax, we can actually start inserting some data.

test=# INSERT INTO posts (filename, username, usergroups) VALUES ('sample post.doc', 'vicky, '{dev,eng}');
INSERT 0 1

test=# INSERT INTO posts (filename, username, usergroups) VALUES ('another post.pdf', 'jhon', '{dev,qa}');
INSERT 0 1

test=# SELECT * FROM posts;
 id |    filename      |  username | usergroups
----+------------------+-----------+---------------
  1 | sample post.doc  | vicky     | {dev,eng}
  2 | another post.pdf | jhon      | {dev,qa}
(2 rows)

The advantage of doing this over just serializing the usergroups as a string (for example “dev,eng”) is that PostgreSQL knows it’s an array and it can perform operations on it. We can easily select posts with a single usergroup.

test=# SELECT title FROM posts WHERE 'dev' = ANY(usergroups);
    title
------------------
 sample post.doc
 another post.pdf
(2 rows)

We can also update a specific element of an array (notice that the indexing starts at 1 by default).

test=# UPDATE posts SET usergroups[2] = 'hr' WHERE id = 1;
UPDATE 1

test=# SELECT * FROM posts;
id |    filename      |  username | usergroups
----+------------------+-----------+---------------
  1 | sample post.doc  | vicky     | {dev,hr}
  2 | another post.pdf | jhon      | {dev,qa}
(2 rows)

There’s a lot operations that are supported right of the box, you can see the full list here.

get the count of each usergroup:

with the help of unnest() method,  we can n order to get the count of files that are present in each group, first we need to know the individual group names present in each row, we can obtain this with the help of unnest() method which expands an array into set of rows. The following query results us in the group name vs count of files.

test=# select unnest(usergroups) as ug, count(unnest(usergroups)) from posts
 group by ug;
 id |   ug   |  count
----+--------+--------
  1 | dev    |   2   
  2 | hr     |   1
  3 | qa     |   1 
(3 rows)

records that match multiple groups:

test=# SELECT * from posts where Array[‘qa’,’hr’]:: varchar[] && usergroups;
id |    filename       |  username | usergroups
----+------------------+-----------+---------------
  1 | sample post.doc  | vicky     | {dev,hr}
  2 | another post.pdf | jhon      | {dev,qa}
(2 rows)
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s