Implementing and Improving PostgreSQL Full-Text Search in Ruby Apps
PostgreSQL is a great relational database with a lot of features. This blog post describes how to implement PostgreSQL full-text search and add it to your Ruby-on-Rails app. I will demonstrate how it works on a test app that searches articles by a title, content, the author’s name, and comments. In addition, the article explains how to speed up search by eliminating excessive join queries that greatly slow down the system’s performance.
Preparation
There already exist ready-made solutions for implementing PostgreSQL full-text search. We will be using this one. So, you need to add the following line to your Gemfile.
gem 'pg_search'
Then, add the following line to your application.rb
.
config.active_record.schema_format = :sql
A sample app
My sample app will have three models.
class Article < ActiveRecord::Base
belongs_to :author
has_many :comments
attr_accessible :content, :title
endclass Author < ActiveRecord::Base
attr_accessible :name
has_many :articles
endclass Comment < ActiveRecord::Base
belongs_to :article
attr_accessible :content
end
With these models, search will be implemented for articles. It will become possible to search an article by its title, content, comments, and the author’s name. In addition, search will have other settings about which you can read at gem’s page. So, to enable search for articles you should add the followong to the article
model.
include PgSearch
pg_search_scope :search,
against: [:title, :content],
associated_against: {
author: :name,
comments: :content
},
using: {
tsearch: {
dictionary: 'english',
any_word: true,
prefix: true
}
}
Then, you can perform search using the Article.search
method.
irb(main):011:0> Article.search('title')
=> [
Hooray, it found an article by the word title! However, look at the following query.
Article Load (1.9ms) SELECT "articles".*, ((ts_rank((to_tsvector('simple', coalesce("articles"."title"::text, '')) || to_tsvector('simple', coalesce("articles"."content"::text, '')) || to_tsvector('simple', coalesce(pg_search_1eb533ea18bbbe0846ef24.pg_search_a612c20e7f822205b5b540::text, '')) || to_tsvector('simple', coalesce(pg_search_121ea89914a721445aee70.pg_search_6e76a7a40d9cb3861e7fb2::text, ''))), (to_tsquery('simple', ''' ' || 'title' || ' ''')), 0))) AS pg_search_rank FROM "articles" LEFT OUTER JOIN (SELECT "articles"."id" AS id, string_agg("authors"."name"::text, ' ') AS pg_search_a612c20e7f822205b5b540 FROM "articles" INNER JOIN "authors" ON "authors"."id" = "articles"."author_id" GROUP BY "articles"."id") pg_search_1eb533ea18bbbe0846ef24 ON pg_search_1eb533ea18bbbe0846ef24.id = "articles"."id" LEFT OUTER JOIN (SELECT "articles"."id" AS id, string_agg("comments"."content"::text, ' ') AS pg_search_6e76a7a40d9cb3861e7fb2 FROM "articles" INNER JOIN "comments" ON "comments"."article_id" = "articles"."id" GROUP BY "articles"."id") pg_search_121ea89914a721445aee70 ON pg_search_121ea89914a721445aee70.id = "articles"."id" WHERE (((to_tsvector('simple', coalesce("articles"."title"::text, '')) || to_tsvector('simple', coalesce("articles"."content"::text, '')) || to_tsvector('simple', coalesce(pg_search_1eb533ea18bbbe0846ef24.pg_search_a612c20e7f822205b5b540::text, '')) || to_tsvector('simple', coalesce(pg_search_121ea89914a721445aee70.pg_search_6e76a7a40d9cb3861e7fb2::text, ''))) @@ (to_tsquery('simple', ''' ' || 'title' || ' ''')))) ORDER BY pg_search_rank DESC, "articles"."id" ASC
Above, you can see that the query has several joins, which is not good, because it greatly slows performance. I will show you how to avoid this behavior and improve search performance.
Improving performance
I am going to add a new column to the tables with articles. The new column will have the
tsvector type and store all the words by which an article can be searched. I will also add the GIN index to speed up search. So, we need to generate migration.
class AddTsvectorColumnToArticles < ActiveRecord::Migration
def up
add_column :articles, :search_vector, :tsvector
execute <<-EOS
CREATE INDEX articles_search_vector_idx ON articles USING gin(search_vector);
EOS
end
def down
remove_column :articles, :search_vector
end
end
I’ve added the tsvector
column to the articles named search_vector
. Now, we need to fill up this vector. For the purpose, I am going to write the postgresql
function for filling up the vector and a trigger that will fill up this vector when inserting or updating an article. Below is an example of migration for creating a trigger and a function.
class CreateFunctionAndTriggerForFillingSearchVectorOfArticles < ActiveRecord::Migration
def up
execute <<-EOS
CREATE OR REPLACE FUNCTION fill_search_vector_for_acticle() RETURNS trigger LANGUAGE plpgsql AS $$
declare
article_author record;
article_comments record;
begin
select name into article_author from authors where id = new.author_id;
select string_agg(content, ' ') as content into article_comments from comments where article_id = new.id;
new.search_vector :=
setweight(to_tsvector('pg_catalog.english', coalesce(new.title, '')), 'A') ||
setweight(to_tsvector('pg_catalog.english', coalesce(new.content, '')), 'B') ||
setweight(to_tsvector('pg_catalog.english', coalesce(article_author.name, '')), 'B') ||
setweight(to_tsvector('pg_catalog.english', coalesce(article_comments.content, '')), 'B');
return new;
end
$$;
EOS
execute <<-EOS
CREATE TRIGGER articles_search_content_trigger BEFORE INSERT OR UPDATE
ON articles FOR EACH ROW EXECUTE PROCEDURE fill_search_vector_for_acticle();
EOS
Article.find_each(&:touch)
end
def down
execute <<-EOS
DROP FUNCTION fill_search_vector_for_acticle();
DROP TRIGGER articles_search_content_trigger ON articles;
EOS
end
end
Each time, when an article is created or updated, a new vector will be built for the article. Look at the following example.
select name into article_author from authors where id = new.author_id;
select string_agg(content, ' ') as content into article_comments from comments where article_id = new.id;
I am fetching a name from an author and concatenated content of an article. Then, I create a new vector.
new.search_vector :=
setweight(to_tsvector('pg_catalog.english', coalesce(new.title, '')), 'A') ||
setweight(to_tsvector('pg_catalog.english', coalesce(new.content, '')), 'B') ||
setweight(to_tsvector('pg_catalog.english', coalesce(article_author.name, '')), 'B') ||
setweight(to_tsvector('pg_catalog.english', coalesce(article_comments.content, '')), 'B');
More details about syntax of each function can be found in this documentation.
We also need to update an article each time when a comment is updated. For this purpose, we will add touch true
for association at the comments
model.
belongs_to :article, touch: true
Now, every time when comments are updated, an article will be updated true
. This will call a trigger that will update the articles’ search_vector
column.
The last step is to make search use the new search_vector
column. For the purpose, we need to change pg_search_scope
to the article
model.
pg_search_scope :search,
against: :search_vector,
using: {
tsearch: {
dictionary: 'english',
any_word: true,
prefix: true,
tsvector_column: 'search_vector'
}
}
Results
Now, our articles have the tsvector
column that stores searches by text.
irb(main):051:0> Article.last.search_vector
=> "'astraukh':6B 'comment':9B 'content':4B,10B 'pavel':5B 'titl':1A"
irb(main):052:0> Article.search('title')
=> [
A query looks like this.
Article Load (0.4ms) SELECT "articles".*, ((ts_rank(("articles"."search_vector"), (to_tsquery('english', ''' ' || 'title' || ' ''' || ':*')), 0))) AS pg_search_rank FROM "articles" WHERE ((("articles"."search_vector") @@ (to_tsquery('english', ''' ' || 'title' || ' ''' || ':*')))) ORDER BY pg_search_rank DESC, "articles"."id" ASC
As you see, the query doesn’t have any joins, and this behavior greatly improves search performance. Now, you know how to implement and improve your search using PostgreSQL full-text search.
Further reading
- Deploying a Rails 5 App with MongoDB, Redis, and CarrierWave to IBM Bluemix
- Accessing an External Database Storage from a Bluemix-Based Ruby App
- Using the PostgreSQL Service from an ASP.NET App on GE’s Predix
About the author
Pavel Astraukh developed a number of projects using Ruby on Rails and other technologies. He worked in small and large teams utilizing the Agile methodology. Find him on GitHub.