Saturday, January 26, 2008

» Aggregate join results with MySQL using GROUP_CONCAT

An ever recurring task with relational databases is to select data from a table and a list of values coming from another table (an 1-n navigation on an n-m relationship). Say you have a table person and a table role, defined as follows:
CREATE TABLE person (
  id INTEGER NOT NULL PRIMARY KEY,
  name VARCHAR(255) NOT NULL
);

CREATE TABLE role (
  id INTEGER NOT NULL PRIMARY KEY,
  role VARCHAR(255) NOT NULL
);
For the sake of simplicity, I'll omit real-world fields in the person table, as well as UNIQUE constraints. There, you'd typically want to have a relationship table to bind person rows to role rows, as you can't directly model n-m relationships. Let's name it person_role, aggregating 1-n relationships to person and role:
CREATE TABLE person_role (
  person_id INTEGER NOT NULL,
  role_id INTEGER NOT NULL,
  PRIMARY KEY (person_id, role_id),
  FOREIGN KEY (person_id) REFERENCES person.id,
  FOREIGN KEY (role_id) REFERENCES role.id
);
Let's insert some data for the same of the example:
INSERT INTO role VALUES(1, 'admin');
INSERT INTO role VALUES(2, 'developer');
INSERT INTO role VALUES(3, 'user');
INSERT INTO person VALUES(1, 'John Doe');
INSERT INTO person VALUES(2, 'Scott Tiger');
INSERT INTO person_role VALUES(1, 1);
INSERT INTO person_role VALUES(1, 2);
INSERT INTO person_role VALUES(1, 3);
The content of the tables is now:
mysql> select * from person;
+----+-------------+
| id | name        |
+----+-------------+
|  1 | John Doe    | 
|  2 | Scott Tiger |
+----+-------------+
mysql> select * from person_role;
+-----------+---------+
| person_id | role_id |
+-----------+---------+
|         1 |       1 | 
|         1 |       2 | 
|         1 |       3 | 
+-----------+---------+
mysql> select * from role;
+----+-----------+
| id | role      |
+----+-----------+
|  1 | admin     | 
|  2 | developer | 
|  3 | user      | 
+----+-----------+
What we want to do now is to retrieve every person with its associated roles. You basically have two options: 1) Only retrieve person rows and complete them with data from role one by one (needs to be done in the invoking programming language, cannot be done with SQL), here using some sort of python-ish pseudocode:
persons = SELECT id, name FROM person ORDER BY id;
for p in persons:
   p.roles = [ SELECT role FROM person_role
                 INNER JOIN role ON person_role.role_id=role.id
                 WHERE person_id=p.id ]
2) Select everything at once, and reduce duplicates in your invoking code:
SELECT person.id AS id, name, role FROM person
  LEFT OUTER JOIN person_role ON person.id=person_role.person_id
  LEFT OUTER JOIN role ON person_role.role_id=role.id
  ORDER BY person.id; 
That'll give you lots of duplicates and you'll have to sort it out yourself in your Java/PHP/Python/Ruby/C++/C/C# code. The result from the query above would look something like this:
+----+-------------+-----------+
| id | name        | role      |
+----+-------------+-----------+
|  1 | John Doe    | admin     | 
|  1 | John Doe    | developer | 
|  1 | John Doe    | user      | 
|  2 | Scott Tiger | NULL      | 
+----+-------------+-----------+
Note that you need to use LEFT OUTER JOIN to also get results from person rows that don't have any roles assigned through person_role. Neither of these solutions are all that satisfactory. Well, with MySQL's GROUP_CONCAT function, there's a third option:
SELECT person.id AS id, name, GROUP_CONCAT(role.role SEPARATOR ',') AS roles
  FROM person
  LEFT OUTER JOIN person_role ON person.id=person_role.person_id
  LEFT OUTER JOIN role ON person_role.role_id=role.id
  GROUP BY id
  ORDER BY id;
The result of the query above is:
+----+-------------+----------------------+
| id | name        | roles                |
+----+-------------+----------------------+
|  1 | John Doe    | admin,developer,user | 
|  2 | Scott Tiger | NULL                 | 
+----+-------------+----------------------+
Neat ;) When you use EXPLAIN to compare the query optimizer strategy for option 2 and 3, it does exactly the same operations. It's just that you can reduce it into a single statement, and avoid having to reduce the result rows into single objects (you just need to split the roles result column using the separator you specified in the query). Here's what EXPLAIN on solution 2 and 3 gives us:
+----+-------------+-------------+--------+---------------+---------+---------+--------------------------+------+----------------+
| id | select_type | table       | type   | possible_keys | key     | key_len | ref                      | rows | Extra          |
+----+-------------+-------------+--------+---------------+---------+---------+--------------------------+------+----------------+
|  1 | SIMPLE      | person      | ALL    | NULL          | NULL    | NULL    | NULL                     |    2 | Using filesort | 
|  1 | SIMPLE      | person_role | ref    | PRIMARY       | PRIMARY | 4       | play.person.id           |    1 | Using index    | 
|  1 | SIMPLE      | role        | eq_ref | PRIMARY       | PRIMARY | 4       | play.person_role.role_id |    1 |                | 
+----+-------------+-------------+--------+---------------+---------+---------+--------------------------+------+----------------+

Labels:

1 Comments:

Blogger Caleb said...

YAHOO! Thank you thank you thank you! I was getting pretty frustrated with this one. I got so close to your solution, I had almost everything, but it kept adding it all to one record. After hours of googling, I tried googling

mysql join with group_Concat

And you showed up. 15 seconds later I had added GROUP BY id and it was working fine.

Again, thank you so much for this post.

--
techdude

22:52  

Post a Comment

<< Home