Hello everyone, my name is Mark.
I am currently in the Fulfillment team,
but I was in the search team for a while.
I thought it would be beneficial to give
a presentation on the Elasticsearch integration
before my memory fades away.
For those of you who don't know Elasticsearch,
it offers full-text search.
I believe it is currently accessible only through the search field at the top right.
There is no demo today, because all the changes I am discussing are behind-the-scenes changes
and would not affect the user.
So today's session will be split into two parts.
The first part briefly introduces the elasticsearch-rails architecture.
The second part focuses on how we use the library in a slightly different way, and the reasons behind it.
Since the word "elasticsearch" is so long,
I'll often abbreviate it as "es"
so that it fits on the slides.
So, part one.
elasticsearch-rails is a Ruby gem
maintained by Elastic, the company.
The gem consists of three parts.
elastic persistence allows Rails to save data on an Elasticsearch server instead of in SQL databases.
We don't use this.
elastic rails provides some useful utilities for Rails. We use part of it for instrumentation.
The most important part is es model.
This enables Active Model to talk to the Elasticsearch server.
We will only be covering this part today.
The core of es model is a proxy that bridges the Rails model and the server.
All the search-related logic resides inside the proxy.
All search commands go through the proxy.
There are two different proxies,
one for the class level and one for the instance level.
The instance-level proxy is closely coupled with a single Active Record object. For example, it generates the data from the record for storage.
The class-level proxy handles higher-level commands, such as search or import.
In the official README, you will see that
the simplest setup is to include the model module like this.
By doing so, we gain all the search functionality.
The `__es__` method becomes available at both the class level and the instance level.
The class-level `__es__` method returns a `ClassMethodsProxy`,
and the instance-level `__es__` method returns an `InstanceMethodsProxy`.
Both proxies point back to their source through the `target` method.
We call all search-related functionality through these proxies.
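To make the two-proxy idea concrete, here is a minimal plain-Ruby sketch. It is a simplified stand-in, not the gem's code: `SearchableSketch`, `ClassProxySketch` and `InstanceProxySketch` are made-up names, and the proxy methods only return strings instead of talking to a server.

```ruby
# Simplified stand-in for the proxy pattern; NOT the gem's implementation.
module SearchableSketch
  # Shared behaviour: every proxy remembers the object it wraps.
  class BaseProxy
    attr_reader :target

    def initialize(target)
      @target = target
    end
  end

  # Class-level commands such as search or import would live here.
  class ClassProxySketch < BaseProxy
    def search(query)
      "searching #{target.name} for #{query.inspect}"
    end
  end

  # Record-level commands such as indexing would live here.
  class InstanceProxySketch < BaseProxy
    def index_document
      "indexing #{target.class.name} with id=#{target.id}"
    end
  end

  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Class-level entry point to the proxy.
    def __es__
      @__es__ ||= ClassProxySketch.new(self)
    end
  end

  # Instance-level entry point to the proxy.
  def __es__
    @__es__ ||= InstanceProxySketch.new(self)
  end
end

class Issue
  include SearchableSketch
  attr_reader :id

  def initialize(id)
    @id = id
  end
end

issue = Issue.new(1)
puts Issue.__es__.search("bug")
puts issue.__es__.index_document
```

Both proxies keep a `target`, so the search logic stays inside the proxy while still being able to reach the model or the record.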
Here I'll use indexing as an example to show how the gem works.
When an issue is created, what happens behind the scenes?
Previously we included the `Callbacks` module,
which sets up three after_commit callbacks,
for create, update, and delete.
Since we are creating a record,
the first callback is triggered. It calls the proxy's `index_document` method.
`index_document` first prepares the data for indexing.
Here at point 1 we see it calls the `as_indexed_json` method.
By default this method serializes all the attributes.
Then a client is used to send this data to the server.
At this point, everything is done.
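The flow from callback to server can be sketched in plain Ruby. This is only an illustration under assumptions: `RecordingClient` and `IndexingProxySketch` are hypothetical names, `save` stands in for the after_commit callback, the index name "issues" is a placeholder, and the client stores requests in memory instead of sending HTTP.

```ruby
# Hypothetical client: records requests in memory instead of sending HTTP.
class RecordingClient
  attr_reader :requests

  def initialize
    @requests = []
  end

  def index(index:, id:, body:)
    @requests << { index: index, id: id, body: body }
    { "result" => "created" }
  end
end

# Simplified instance-level proxy (not the gem's implementation).
class IndexingProxySketch
  def initialize(target, client)
    @target = target
    @client = client
  end

  def index_document
    # Point 1: prepare the data for indexing.
    document = @target.as_indexed_json
    # Then hand the document to the client.
    @client.index(index: "issues", id: @target.id, body: document)
  end
end

class Issue
  attr_reader :id, :title, :description

  def initialize(id:, title:, description:, client:)
    @id = id
    @title = title
    @description = description
    @client = client
  end

  # Default in this sketch: serialize every attribute.
  def as_indexed_json
    { "id" => id, "title" => title, "description" => description }
  end

  def __es__
    @__es__ ||= IndexingProxySketch.new(self, @client)
  end

  # Stand-in for the after_commit callback that fires on create.
  def save
    __es__.index_document
  end
end

client = RecordingClient.new
issue = Issue.new(id: 1, title: "Crash on login", description: "Steps to reproduce", client: client)
issue.save
p client.requests.first[:body]
```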
But sometimes we don't necessarily want every attribute
to be searchable.
We might only want to search the title.
Here we can override the default by defining our own
`as_indexed_json` in our model.
It just returns a hash of the things we need indexed.
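A small sketch of such an override, in plain Ruby so it runs without the gem. `DefaultSerializationSketch` is a made-up module that plays the role of the gem's default, and the attribute names are invented for the example.

```ruby
# Hypothetical stand-in for the default behaviour: index every attribute.
module DefaultSerializationSketch
  def as_indexed_json(_options = {})
    attributes
  end
end

class Issue
  include DefaultSerializationSketch
  attr_reader :attributes

  def initialize(attributes)
    @attributes = attributes
  end

  # Override the default: only the title is sent to the index.
  def as_indexed_json(_options = {})
    { "title" => attributes["title"] }
  end
end

issue = Issue.new("title" => "Crash on login", "internal_note" => "do not index")
p issue.as_indexed_json
```

Because the model's own method is found before the included module's, the override wins without touching the proxy.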
The method name `__elasticsearch__` is very long.
Typing it all the time is tiresome.
es rails also provides convenience methods to get around this.
A few of the methods are delegated to the proxy objects.
So now we can just type `Book.search` directly.
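The delegation idea can be sketched like this. It is an assumption-laden illustration, not the gem's code: `ClassProxySketch` and `DELEGATED_METHODS` are hypothetical, and the shortcut is only defined when the model does not already respond to that name.

```ruby
class ClassProxySketch
  def initialize(target)
    @target = target
  end

  def search(query)
    "#{@target.name} results for #{query.inspect}"
  end
end

class Book
  # Hypothetical list of methods that get a shortcut on the model.
  DELEGATED_METHODS = [:search].freeze

  def self.__elasticsearch__
    @__elasticsearch__ ||= ClassProxySketch.new(self)
  end

  DELEGATED_METHODS.each do |name|
    # Do not clobber a method the model already defines.
    next if respond_to?(name)

    define_singleton_method(name) do |*args, &block|
      __elasticsearch__.public_send(name, *args, &block)
    end
  end
end

puts Book.search("ruby")
puts Book.__elasticsearch__.search("ruby")
```

Both calls reach the same proxy method; the shortcut only saves typing.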
The proxies are composed of many modules. This means we can cherry-pick only the things we need.
The proxy itself uses a client to send HTTP requests to the server.
By default, all proxies share the same client,
which is just a Faraday client.
The client holds the information on
where it can find the server, such as the URL or the port.
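A rough sketch of "shared by default, overridable per proxy". Everything here is hypothetical: `ClientSketch` only holds connection settings, and the URLs are placeholders.

```ruby
module SearchSketch
  # Hypothetical connection settings; a real client would also hold a transport.
  ClientSketch = Struct.new(:url, :port)

  # Default client shared by every proxy.
  def self.client
    @client ||= ClientSketch.new("http://localhost", 9200)
  end

  class ProxySketch
    attr_writer :client

    # Fall back to the shared client unless this proxy was given its own.
    def client
      @client || SearchSketch.client
    end
  end
end

issues = SearchSketch::ProxySketch.new
notes = SearchSketch::ProxySketch.new
puts issues.client.equal?(notes.client)

# Give one proxy its own client; the other keeps the shared default.
notes.client = SearchSketch::ClientSketch.new("http://search.example", 9200)
puts issues.client.equal?(notes.client)
```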
To summarize, the simple setup gives us two methods to access the proxy objects,
and from there we can talk to the server.
In GitLab we use Elasticsearch in a slightly different way.
Until last year, we had issues with reindexing.
Every time we reindexed, it could take as long as a week.
During this time, search results would be incomplete.
Yet we need to reindex, since we do change the data schema from time to time.
So our goal was to allow zero-downtime search
when the data schema changes.
The path we decided on was to allow multiple versions of the search code to coexist,
and to determine which version to call at run time.
We used to put all snippet-related search logic
in the snippet search module, which was included by the Snippet model.
This is less flexible if we want to have multiple versions.
On the other hand, classes and objects are very suitable for this kind of task.
To keep things DRY,
we also extracted the common logic into a superclass.
We have a switchboard,
which redirects commands to the desired version.
Recall that earlier, in the simplest setup, `__es__` would take us from the record to the proxy.
Now we can have `__es__` take us to a switchboard, which then forwards calls to the correct version class.
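Here is a minimal sketch of the idea: versioned proxy classes that share a superclass, plus a switchboard that picks one at run time. All names (`SwitchSketch`, `V1`, `V2`, `Switchboard`) are invented for the example and are not the real class names.

```ruby
module SwitchSketch
  # Logic common to every version lives in the superclass.
  class BaseSnippetProxy
    def initialize(target)
      @target = target
    end

    def search(query)
      "#{self.class.name}: #{@target} matching #{query.inspect}"
    end
  end

  module V1
    class SnippetProxy < BaseSnippetProxy; end
  end

  module V2
    class SnippetProxy < BaseSnippetProxy; end
  end

  # The switchboard picks a version at run time and forwards the call.
  class Switchboard
    VERSIONS = { v1: V1::SnippetProxy, v2: V2::SnippetProxy }.freeze

    attr_accessor :current_version

    def initialize(target, current_version:)
      @target = target
      @current_version = current_version
    end

    def search(query)
      VERSIONS.fetch(current_version).new(@target).search(query)
    end
  end
end

board = SwitchSketch::Switchboard.new("Snippet", current_version: :v1)
puts board.search("hello")
board.current_version = :v2
puts board.search("hello")
```

The caller only ever talks to the switchboard, so changing versions does not require touching the model.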
At the time I named the switchboard classes like this,
which seems kind of wordy now.
The names of the switchboards are
not great. I didn't have a good name when I wrote these.
So how is the table represented in code?
The `generate_forwarding` class method is responsible
for setting up method forwarding at boot time.
Let's look at the first step.
For each method in `methods_for_all_write_targets`,
we call `forward_to_all_write_targets`.
This dynamically defines a method
that calls each write target (as indicated by circle 2).
Lastly, we collect all the responses.
We return the unsuccessful one if it exists;
otherwise the call is considered successful.
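The write step might look roughly like the sketch below. The method names `generate_forwarding`, `methods_for_all_write_targets` and `forward_to_all_write_targets` come from the talk; everything else is assumed for illustration, including the `"ok"` flag on the response and the choice to return the last response when every target succeeds.

```ruby
# Hypothetical stand-in for one version's proxy.
class VersionProxySketch
  def initialize(name, healthy: true)
    @name = name
    @healthy = healthy
  end

  def index_document
    { "version" => @name, "ok" => @healthy }
  end
end

class WriteSwitchboardSketch
  def self.methods_for_all_write_targets
    [:index_document]
  end

  def self.forward_to_all_write_targets(*names)
    names.each do |name|
      define_method(name) do |*args|
        # Call every write target and collect the responses.
        responses = @write_targets.map { |target| target.public_send(name, *args) }
        # Return the first failure if there is one (assumption: otherwise the last response).
        responses.find { |response| !response["ok"] } || responses.last
      end
    end
  end

  def self.generate_forwarding
    forward_to_all_write_targets(*methods_for_all_write_targets)
  end

  def initialize(write_targets)
    @write_targets = write_targets
  end

  # Runs once, when the class is loaded.
  generate_forwarding
end

targets = [VersionProxySketch.new("v1", healthy: false), VersionProxySketch.new("v2")]
p WriteSwitchboardSketch.new(targets).index_document
```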
That's it for write operations.
For read operations, we only need to forward the call to one version that is in sync.
How do we determine which methods are read methods?
We first take all the methods
and filter out the write methods.
We also filter out the instance methods here.
Lastly we filter out `method_missing`.
The remaining methods are considered read methods,
and we call `forward_read_method` on each of them (see circle 3).
The second step of `generate_forwarding` is to set up forwarding for the methods that only need one version.
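The filtering for read methods could be sketched as follows. This is my reading of the steps, not the actual code: I assume "the instance methods" means the methods the switchboard itself already has, and `VersionProxySketch`, `ReadSwitchboardSketch` and `WRITE_METHODS` are made-up names.

```ruby
# Hypothetical proxy with one read method and one write method.
class VersionProxySketch
  def search(query)
    "results for #{query.inspect}"
  end

  def index_document
    "indexed"
  end
end

class ReadSwitchboardSketch
  WRITE_METHODS = [:index_document].freeze

  def self.forward_read_method(*names)
    names.each do |name|
      define_method(name) do |*args|
        # Reads go to a single target only.
        @reading_target.public_send(name, *args)
      end
    end
  end

  def self.generate_forwarding(proxy_class)
    read_methods = proxy_class.public_instance_methods # start with all the methods
    read_methods -= WRITE_METHODS                      # filter out the write methods
    read_methods -= instance_methods                   # filter out what the switchboard already has
    read_methods.delete(:method_missing)               # never forward method_missing
    forward_read_method(*read_methods)
  end

  def initialize(reading_target)
    @reading_target = reading_target
  end

  # Runs once, when the class is loaded.
  generate_forwarding(VersionProxySketch)
end

board = ReadSwitchboardSketch.new(VersionProxySketch.new)
puts board.search("bug")
puts board.respond_to?(:index_document)
```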
Let's take a detour here and talk about the `real_class` method.
The Elasticsearch gem tries to be smart and overrides the `class` method on the instance proxy to point to the class proxy.
`SnippetInstanceProxy#class` would be `SnippetClassProxy`.
This can result in cryptic errors.
To obtain the actual class, I had to create a method of my own,
and I called it `real_class`.
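To show the problem and one possible way around it, here is a small sketch. The class names are placeholders, and going through `singleton_class.superclass` is just one way to recover the class without calling `class`; treat it as an illustration rather than the exact code.

```ruby
class ClassProxySketch; end

class InstanceProxySketch
  # Mimics the override described above: `class` no longer returns the real class.
  def class
    ClassProxySketch
  end

  # One way to recover the real class without calling `class`:
  # the superclass of an object's singleton class is its actual class.
  def real_class
    singleton_class.superclass
  end
end

proxy = InstanceProxySketch.new
puts proxy.class
puts proxy.real_class
```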
So when we call the `version` method, what do we get?
We first get the namespace,
and then we call `proxy_class_name` to get the name of the proxy we want.
Lastly we fetch the proxy with that name from the namespace.
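Those three steps can be sketched with `const_get`. The names `version` and `proxy_class_name` come from the talk; the `ElasticSketch` namespace, the `V1` module, the naming rule, and returning a new proxy instance are assumptions for the example.

```ruby
class Snippet; end

module ElasticSketch
  module V1
    class SnippetInstanceProxy
      attr_reader :target

      def initialize(target)
        @target = target
      end
    end
  end

  class InstanceSwitchboardSketch
    def initialize(target)
      @target = target
    end

    def version(name)
      # Step 1: get the namespace for the requested version.
      namespace = ElasticSketch.const_get(name, false)
      # Steps 2 and 3: work out the proxy name, then fetch it from the namespace.
      proxy_class = namespace.const_get(proxy_class_name, false)
      proxy_class.new(@target)
    end

    private

    # Assumed naming rule: model name followed by "InstanceProxy".
    def proxy_class_name
      "#{@target.class.name}InstanceProxy"
    end
  end
end

board = ElasticSketch::InstanceSwitchboardSketch.new(Snippet.new)
puts board.version("V1").class
```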
That's all for the switchboards. Now we are at the final part:
connecting the Rails model to the switchboard.
Each searchable model includes the `ApplicationVersionedSearch` module.
It provides access to the switchboard,
as well as permission checks and various other things.
Here we can see that it simply instantiates the switchboard classes,
one for the class level and one for the instance level.
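As a rough picture of that wiring, here is a stripped-down sketch. `VersionedSearchSketch` and the two switchboard classes are stand-ins with invented names, and the permission checks are left out.

```ruby
# Hypothetical switchboards; each one just remembers what it wraps.
class ClassSwitchboardSketch
  attr_reader :target

  def initialize(target)
    @target = target
  end
end

class InstanceSwitchboardSketch
  attr_reader :target

  def initialize(target)
    @target = target
  end
end

module VersionedSearchSketch
  def self.included(base)
    base.extend(ClassMethods)
  end

  module ClassMethods
    # Class-level entry point now returns a switchboard, not a proxy.
    def __elasticsearch__
      @__elasticsearch__ ||= ClassSwitchboardSketch.new(self)
    end
  end

  # Instance-level entry point, one switchboard per record.
  def __elasticsearch__
    @__elasticsearch__ ||= InstanceSwitchboardSketch.new(self)
  end
end

class Snippet
  include VersionedSearchSketch
end

puts Snippet.__elasticsearch__.class
puts Snippet.new.__elasticsearch__.class
```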
This is an overview graph of where each class sits.
We have Active Model on the top left, which
has access to the switchboards,
and the switchboards determine which version of the proxy to pass
the command on to.
So we have explained all three layers of the integration.
Now we can have two versions of the search logic.
This means we can have separate client settings,
pointing to different indices.
Theoretically we can even point to two different clouds.
We can migrate from one cloud provider to the next.
Currently we only have one version.
We hard-code `elastic_reading_target` and `elastic_writing_targets` to that version.
This is because we switched our focus to enabling global search on GitLab.com.
* In the past, we included many ES-related methods in our model.
* Now the model is slimmer.