Organizing Storage in Multiple Fog Containers Using CarrierWave
CarrierWave is one of the most popular Ruby on Rails solutions for file upload and storage. Most of us have used it more than once. In this blog post, I want to dig into how flexible the solution really is. First, we will look a little deeper into how it works at the lower levels (feel free to skip that part if you already know it) and then see how you can take advantage of its flexibility in a few simple scenarios.
How it works
First of all, CarrierWave has a number of modules extending your object-relational mapping (ORM) classes. The gem itself includes the extension only for ActiveRecord, but extensions for other ORMs are readily available. All the extension does is create a class method, mount_uploader, that receives the name of a string attribute of an ORM model.
# == Schema Information
#
# Table name: items
#
#  id   :integer          not null, primary key
#  file :string(255)

class Item < ActiveRecord::Base
  mount_uploader :file
end
Every time you instantiate an object of the Item class, your ORM loads the data from your database and instantiates the object as usual. Now, however, there is an object of the CarrierWave::Uploader::Base class mounted where just a string attribute would have been. Methods such as file and file= don't access the data in the @file variable directly, as there is a middle man now.
The file= setter method of a new object now proceeds through the following steps:
- Accepts a File or a file-like object, including various types of streams, such as Tempfile and ActionDispatch::Http::UploadedFile. The latter is the object you receive from HTTP multipart file uploads.
- Caches the received file in a temporary directory locally.
- Assigns the file's name to our @file variable. Unlike some other upload-and-store gems, CarrierWave doesn't save a full file path to the database, only the original name. The responsibility of building the full path is delegated to the Uploader object.
- Waits for the object to be persisted.
- Optionally conducts file processing, resulting in one or more new files.
- Copies the file or the processed files from temporary storage to persistent storage. Again, the exact location and specifics of the storage are determined by the Uploader object.
When you update an existing object, the process is generally the same, except that Uploader also downloads the previously uploaded version, caches it, and restores it if persisting the object failed.
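The setter flow described above can be illustrated with a small plain-Ruby sketch. It is not CarrierWave's implementation: ToyUpload, ToyUploader, ToyItem, and mount_toy_uploader are hypothetical names made up for this example, and the "database" is just a hash. What it shows is the division of labor: the model keeps only the file name, while the uploader decides where the file is cached and where it finally lives.

```ruby
require 'tmpdir'
require 'fileutils'

# Hypothetical stand-in for an uploaded file (name + content).
ToyUpload = Struct.new(:original_filename, :content) do
  def read
    content
  end
end

# Hypothetical uploader: it owns every path decision.
class ToyUploader
  def initialize(model, column)
    @model = model
    @column = column
  end

  def store_dir
    File.join(@model.storage_root, @column.to_s, @model.id.to_s)
  end

  # The full path is rebuilt from the stored name on every call.
  def path
    File.join(store_dir, @model.attributes[@column])
  end

  # Write the incoming file into a temporary cache directory.
  def cache!(upload)
    @cache_path = File.join(Dir.mktmpdir('toy_cache'), @model.attributes[@column])
    File.binwrite(@cache_path, upload.read)
  end

  # Move the cached file to its persistent location.
  def store!
    return unless @cache_path

    FileUtils.mkdir_p(store_dir)
    FileUtils.mv(@cache_path, path)
    @cache_path = nil
  end
end

# Hypothetical model with a mount_uploader-like macro.
class ToyItem
  attr_reader :id, :attributes, :storage_root

  def self.mount_toy_uploader(column)
    define_method(column) { @uploaders[column] ||= ToyUploader.new(self, column) }
    define_method("#{column}=") do |upload|
      attributes[column] = upload.original_filename # only the name is kept
      public_send(column).cache!(upload)
    end
  end

  mount_toy_uploader :file

  def initialize(storage_root)
    @storage_root = storage_root
    @attributes = {}
    @uploaders = {}
  end

  # "Persisting" assigns an id, then lets the uploader store the cached file.
  def save
    @id ||= 1
    file.store!
    true
  end
end

item = ToyItem.new(Dir.mktmpdir('toy_store'))
item.file = ToyUpload.new('report.txt', 'hello')
item.save
puts item.attributes[:file] # the bare file name
puts item.file.path         # full path, built by the uploader
```

Note that nothing is written to the persistent location until save is called, which mirrors the "waits for the object to be persisted" step.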
So, how exactly does Uploader choose where and how to store the uploaded file and how to retrieve it? CarrierWave::Uploader::Base defines a large number of default methods. Some of them are grouped into the Strategy modules, each specifying one aspect of the process (names for cache/storage directories, specifics of processing, etc.). These methods also have access to the model object Uploader is mounted on, which means we can tweak the handling of each uploaded file.
Of course, there is the CarrierWave.configure method, which accepts a block of configuration settings. However, it merely provides default results for these uploader instance methods.
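The relationship between global configuration and uploader methods can be shown with a toy model. Everything below (the ToyWave module, its Config struct, and AvatarUploader) is hypothetical and only mimics the idea: the configuration block sets defaults, and any uploader class remains free to override the corresponding instance method.

```ruby
module ToyWave
  Config = Struct.new(:store_dir, :cache_dir)

  def self.config
    @config ||= Config.new('uploads', 'tmp/cache')
  end

  def self.configure
    yield config
  end

  class Uploader
    attr_reader :model

    def initialize(model)
      @model = model
    end

    # Instance methods fall back to the configured defaults.
    def store_dir
      ToyWave.config.store_dir
    end

    def cache_dir
      ToyWave.config.cache_dir
    end
  end
end

ToyWave.configure do |config|
  config.store_dir = 'files'
end

# A specific uploader overrides the default and can use its model.
class AvatarUploader < ToyWave::Uploader
  def store_dir
    "#{super}/avatars/#{model[:id]}"
  end
end

puts ToyWave::Uploader.new({ id: 7 }).store_dir
puts AvatarUploader.new({ id: 7 }).store_dir
```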
CarrierWave has two main storage strategies: :file for local storage and :fog for remote storage. A user can add other strategies as long as they support storing and retrieving, of course, but these two already cover most options, because the fog storage strategy itself delegates to the fog gem (fog/fog), which provides a common API for multiple cloud storage solutions. To initialize the CarrierWave fog storage, you should provide a service name, a container name, and credentials.
How to take advantage of CarrierWave's flexibility
All code snippets given here are excerpts from a more complex working application. This means some parts may have been left out, and some may look more complex than necessary, since I was simplifying the code as I went.
1. Specifying upload directories
Both default storage options rely on a number of uploader methods to determine how to handle uploads. The main one is the store_dir method, which by default looks like this.
def store_dir
  "uploads/#{model.class.to_s.underscore}/#{mounted_as}/#{model.id}"
end
Uploaded files will be stored locally in a number of nested folders in the public directory of the application, with each file having its own folder. A few remarks on this default:
- Generally, we try to normalize a database in such a way that a single class has at most one file field, so that the mounted_as parameter could be left out.
- On your development machine, you will probably run the application in different environments (at least test). So, consider adding Rails.env to store_dir.
- Relying on the class name of the model can make stored files unreachable if you refactor the model. The file will still be there, but Uploader will be unable to retrieve it, since it will look in the wrong place. I prefer to explicitly set a folder name in each Uploader.
Anyway, you can now see that by overriding the store_dir method in your uploaders, you can store your uploads in any way you like. For example, you can group files by their creator's identity rather than by their type.
class GeneralApplicationUploader < CarrierWave::Uploader::Base
  def store_dir
    # Use the explicit folder name when the uploader defines one,
    # otherwise fall back to the model's class name.
    folder = respond_to?(:folder_name) ? folder_name : model.class.to_s.underscore
    "#{base_upload_dir}#{folder}/#{model.id}"
  end

  private

  def base_upload_dir
    "uploads/#{Rails.env}/"
  end
end

class ItemUploader < GeneralApplicationUploader
  def folder_name
    'items'
  end
end
There is one thing you should always remember, though: once a file has been uploaded, any change to how store_dir resolves will prevent Uploader from finding that file.
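To make that warning concrete, here is a small plain-Ruby sketch. ToyRecord, underscore, derived_store_dir, and explicit_store_dir are hypothetical helpers written for this illustration (underscore is a simplified imitation of the ActiveSupport method). It compares a directory derived from the class name with one built from a fixed folder name, before and after a model is renamed.

```ruby
# Hypothetical record: a class name plus an id.
ToyRecord = Struct.new(:class_name, :id)

# Simplified imitation of ActiveSupport's String#underscore.
def underscore(name)
  name.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
end

# Default-style directory: derived from the model's class name.
def derived_store_dir(record)
  "uploads/#{underscore(record.class_name)}/#{record.id}"
end

# Explicit directory: a fixed folder name that survives refactoring.
def explicit_store_dir(record, folder_name)
  "uploads/#{folder_name}/#{record.id}"
end

before_rename = ToyRecord.new('Item', 42)
after_rename  = ToyRecord.new('CatalogItem', 42) # same row, renamed model

puts derived_store_dir(before_rename)           # where the file was stored
puts derived_store_dir(after_rename)            # where the uploader now looks
puts explicit_store_dir(before_rename, 'items') # stable location
puts explicit_store_dir(after_rename, 'items')  # same stable location
```

With the derived directory, the two paths differ, so a file stored before the rename is no longer found. With the explicit folder name, both lookups resolve to the same place.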
2. Differences in handling local and cloud storages
Now, if you expect a lot of upload/download activity and you have the option of using remote storage in production, you should definitely do that. Using remote storage for development or test environments, on the other hand, can be troublesome (that is, too expensive).
Ideally, the strategies for handling the file and fog storages should behave in exactly the same way. In most cases, they do. It is up to you to decide whether to develop the application against cloud storage all the time or to cut your expenses and develop against local storage, being prepared to deal with a few differences.
If you, for example, have a number of text documents stored, and you want to show the documents' text on a page, there will be a difference. It arises because the fog storage mostly deals in file URLs meant to be passed to a client browser, not in files the server reads directly.
There is a method you can employ to check whether you are using local or remote storage.
require 'open-uri'

class TextUploader < GeneralApplicationUploader
  def store_local?
    _storage == CarrierWave::Storage::File
  end

  # Then the text file content can be accessed as:
  def body
    # Local files are read from disk; remote files are fetched by URL.
    store_local? ? File.read(path) : URI.open(url).read
  end
end

class Item < ActiveRecord::Base
  mount_uploader :file, TextUploader
end
This way, Item.first.file.body will return the same text regardless of whether a file is stored remotely or locally.
3. Safe file names
By default, CarrierWave already sanitizes the name of a file it receives, keeping only English letters, digits, and a few safe punctuation characters, which also helps you avoid file path injection vulnerabilities. There is also a configuration option that lets you keep all Unicode characters. However, storing a file with its name unchanged still has some disadvantages. For example, one can upload a file with a name so long that saving it causes an exception at the file system level. To avoid this, we rename all files that are saved to the file system, hashing the old file names.
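require 'digest/md5'

def full_filename(for_file)
  # for_file is nil on retrieval, so fall back to the name stored in the database.
  original_name = for_file || model.read_attribute(mounted_as)
  [Digest::MD5.hexdigest(original_name), File.extname(original_name)].join
end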
The full_filename method above is used both when storing a file (where for_file is the file name of an incoming upload) and when retrieving a file (when it is nil). The good part is that the name is stored in the database in its original form, while on disk it is replaced by its hash.
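The effect of the renaming can be checked in isolation. The sketch below uses a standalone helper, hashed_filename, which is a hypothetical name that simply repeats the hashing expression outside of any uploader. It shows that the stored name has a fixed length no matter how long the original is, and that the same original name always maps to the same stored name, which is what makes retrieval from the database value possible.

```ruby
require 'digest/md5'

# Hypothetical standalone version of the hashing expression.
def hashed_filename(original_name)
  [Digest::MD5.hexdigest(original_name), File.extname(original_name)].join
end

long_name = "#{'a' * 300}.pdf"
stored_name = hashed_filename(long_name)

puts long_name.length   # 304 characters in the original name
puts stored_name.length # 32 hex characters plus ".pdf"
puts hashed_filename(long_name) == stored_name # deterministic
```

The trade-off is that the name on disk is no longer human-readable, so the original name kept in the database becomes the only link back to what the user uploaded.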
Now, you can offer this file for download under its original file name using this link.
= link_to 'Download', item.file.url, download: item[:file], target: '_blank'
Cached files are also saved to disk, so we will probably have to hash their names too.
def cache_name
  if cache_id && original_filename
    name = Digest::MD5.hexdigest(full_original_filename + cache_id)
    extension = File.extname(full_original_filename)
    [name, extension].join
  end
end
4. Switching remote storage containers of a file
In some cloud storage services, for example Rackspace, a file can be stored in one of two types of containers.
- Public. A file is readily available for download via HTTP, sometimes even with content delivery network (CDN) support. Content is delivered quickly, but you can't even dream of many security features, such as hotlinking protection.
- Private. A file is available only via an SSL-secured temporary link. However, the file is difficult to download other than through your application, as CDN is unavailable.
Let's imagine that we need to store some type of files, but it is the user who decides whether a file should be publicly available or hidden. Naturally, we will want the interface to be as simple as possible.
# == Schema Information
#
# Table name: items
#
#  id     :integer          not null, primary key
#  file   :string(255)
#  hidden :boolean          not null, default(FALSE)

class Item < ActiveRecord::Base
  mount_uploader :file, SwitchingStoragesUploader
end
CarrierWave::Uploader::Base has two methods, fog_public and fog_directory, that decide where to store uploads.
class SwitchingStoragesUploader < GeneralApplicationUploader
  def fog_public
    !(model.respond_to?(:hidden) && (model.hidden_changed? ? model.hidden_was : model.hidden))
  end

  def fog_directory
    fog_public ? 'public_container_name' : 'private_container_name'
  end
end
This way, when you create a new object of the class that Uploader is mounted on, CarrierWave chooses a container depending on the persisted value of the hidden field. If you change that field on an existing object without touching the file field, though, the uploaded file won't move, and the reference to it will be lost.
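class SwitchingStoragesUploader < GeneralApplicationUploader
  def initialize(*)
    super # the model is only available after the base initializer has run
    return unless model.respond_to?(:hidden)

    column = mounted_as # captured here, because self is the model inside the block
    model.define_singleton_method(:hidden=) do |new_value|
      # Mark the file column as dirty so that CarrierWave re-stores the file.
      send("#{column}_will_change!")
      super(new_value)
    end
  end
end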
That redefinition, though hacky, will make Uploader work as expected: when you change the value of the hidden parameter, Uploader downloads the file from the old container, deletes it from the storage, and then uploads the cached file to the new container after the object is persisted.
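The redefinition trick itself can be tried outside of Rails. In the sketch below, ToyItem and track_hidden_changes are hypothetical names, and file_will_change! is a stand-in for the dirty-tracking method ActiveRecord would generate. It shows that a setter redefined on a single object can mark another column as changed and still call the original setter through super.

```ruby
# Hypothetical model with a plain setter and a stand-in for dirty tracking.
class ToyItem
  attr_reader :hidden, :dirty_columns

  def initialize
    @hidden = false
    @dirty_columns = []
  end

  def hidden=(value)
    @hidden = value
  end

  # Stand-in for ActiveRecord's generated file_will_change!.
  def file_will_change!
    @dirty_columns << :file
  end
end

# Redefines hidden= on this one object only.
def track_hidden_changes(model, column)
  model.define_singleton_method(:hidden=) do |new_value|
    send("#{column}_will_change!")
    super(new_value) # explicit arguments are required inside define_*method blocks
  end
end

tracked = ToyItem.new
track_hidden_changes(tracked, :file)
tracked.hidden = true

untouched = ToyItem.new
untouched.hidden = true

p tracked.dirty_columns   # the file column was marked as changed
p untouched.dirty_columns # other instances are not affected
```

Because the method is defined on the object's singleton class, only models that actually have an uploader instantiated for them get the modified setter, which is part of what makes the approach hacky.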
Do we really want to burden our application server with upload/download routines all the time? I think we don't. As I mentioned before, all interactions with cloud storage are conducted through the fog gem, which provides a common API for these storage services. This API is, in fact, wider than the part CarrierWave uses.
class SwitchingStoragesUploader < GeneralApplicationUploader
  # Copies the stored file to the other container and deletes the original.
  # Returns false (after adding a model error) if a storage API call fails.
  def copy_with_fog
    return if store_local? || mounted_column_changed?

    source_container = fog_directory
    target_container = fog_public ? 'private_container_name' : 'public_container_name'
    object_key = path # key of the stored object inside the container
    fog_api = Fog::Storage.new(fog_credentials)
    fog_api.copy_object(source_container, object_key, target_container, object_key)
    fog_api.delete_object(source_container, object_key)
  rescue Fog::Errors::Error
    model.errors.add(mounted_as, 'Error occurred while migrating file in storage!')
    false
  end

  def mounted_column_changed?
    cached?.present?
  end
end

class Item < ActiveRecord::Base
  mount_uploader :file, SwitchingStoragesUploader

  # ActiveRecord has no per-instance callbacks, so the hook is registered on the class.
  before_save :migrate_file, if: -> { persisted? && hidden_changed? }

  private

  def migrate_file
    # Rails 5+ cancels the save on throw(:abort); older versions cancel on a false return value.
    throw :abort if file.copy_with_fog == false
  end
end
The mounted_column_changed? method skips the copying if a new file is provided. In that case, the new file will be stored in the new container anyway. That's also why we check whether the model has already been persisted.
As you can see, all we do is initialize an object of Fog::Storage, use it to copy a file from one container to another, and then delete the file in the old container. If this somehow fails, we add an error to the model and cancel the save, providing a level of strong exception safety to the process.
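The copy-then-delete sequence and its failure handling can be simulated without any cloud account. In this sketch, ToyStorage, ToyStorageError, and migrate are hypothetical: the storage is an in-memory hash of containers, and its two methods only imitate the shape of the copy and delete calls used above. It illustrates one property: when the copy fails, the source container is left untouched and the caller gets false plus an error message.

```ruby
class ToyStorageError < StandardError; end

# Hypothetical in-memory stand-in for a cloud storage API.
class ToyStorage
  attr_reader :containers

  def initialize(containers, fail_on_copy: false)
    @containers = containers
    @fail_on_copy = fail_on_copy
  end

  def copy_object(source, key, target, target_key)
    raise ToyStorageError, 'copy failed' if @fail_on_copy

    @containers[target][target_key] = @containers[source].fetch(key)
  end

  def delete_object(container, key)
    @containers[container].delete(key)
  end
end

# Copy first, delete second; report failure instead of raising.
def migrate(storage, key, from:, to:, errors:)
  storage.copy_object(from, key, to, key)
  storage.delete_object(from, key)
  true
rescue ToyStorageError
  errors << 'Error occurred while migrating file in storage!'
  false
end

errors = []

healthy_data = { 'public' => { 'a.txt' => 'data' }, 'private' => {} }
healthy = ToyStorage.new(healthy_data)
p migrate(healthy, 'a.txt', from: 'public', to: 'private', errors: errors)
p healthy.containers # the file now lives only in the private container

broken_data = { 'public' => { 'a.txt' => 'data' }, 'private' => {} }
broken = ToyStorage.new(broken_data, fail_on_copy: true)
p migrate(broken, 'a.txt', from: 'public', to: 'private', errors: errors)
p broken.containers  # unchanged, since the copy never happened
p errors
```

Deleting only after a successful copy is what keeps a failed migration from losing the file. A failure between the two calls would leave a duplicate rather than a missing file, which is the safer of the two outcomes.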
Conclusion
The approach is simple: control as much of the process as possible in application code rather than by storing full paths in the database. For this purpose, nearly every aspect of that process is arranged as a method defined on the Uploader object, giving a developer a good level of flexibility. CarrierWave is a nice tool to help you store your files, but, like any tool, it requires some knowledge to handle it well.