class ClassMethodsProxy
  include ES::Model::Client::ClassMethods
  include ES::Model::Naming::ClassMethods
  include ES::Model::Indexing::ClassMethods
  include ES::Model::Searching::ClassMethods
  include ES::Model::Importing::ClassMethods
end

class InstanceMethodsProxy
  include ES::Model::Client::InstanceMethods
  include ES::Model::Naming::InstanceMethods
  include ES::Model::Indexing::InstanceMethods
  include ES::Model::Serializing::InstanceMethods
end
Anatomy of a proxy
Each proxy has access to a client.
def client
  @client ||= Elasticsearch::Model.client
end
just a Faraday client.
knows server url, port, etc.
by default, all proxies share the same client
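A minimal sketch of the sharing behaviour described above. `FakeModel` is a hypothetical stand-in for `Elasticsearch::Model`, a plain `Object` stands in for the HTTP client, and the `client=` writer is included only to show how one proxy could be pointed at a different client.

```ruby
# Hypothetical stand-in for Elasticsearch::Model; the real gem builds an HTTP client here.
module FakeModel
  def self.client
    @client ||= Object.new
  end
end

class SharedClientProxy
  # Allow a per-proxy override; otherwise fall back to the shared default.
  attr_writer :client

  def client
    @client ||= FakeModel.client
  end
end

a = SharedClientProxy.new
b = SharedClientProxy.new
puts a.client.equal?(b.client) # both proxies memoize the same default client

b.client = Object.new
puts a.client.equal?(b.client) # b now talks through its own client
```

Memoizing with `||=` means the override only takes effect if it is assigned explicitly; untouched proxies keep sharing the default.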
Part 2:
How does GitLab use elasticsearch-rails?
Problem
ES indexing took a long time (up to many days)
Some schema changes required re-indexing
During reindexing, search results can be incomplete
We used to have the Snippet model include a SnippetSearch module
containing the search-related logic.
However, a module does not allow dynamic swapping.
Rails model
Switchboard
ES-rails proxy
The proxy design is flexible in that we can have separate classes.
Instead of using the same class for all kinds of searches,
we can subclass these proxies:
SnippetClassProxy < ClassMethodsProxy
SnippetInstanceProxy < InstanceMethodsProxy
And then we can have different versions of Snippet proxies:
V12p1::SnippetClassProxy < SnippetClassProxy
V13p0::SnippetClassProxy < SnippetClassProxy
Common logic is extracted into a common superclass
V12p1::SnippetClassProxy
is a subclass of
V12p1::ApplicationClassProxy
is a subclass of
ClassMethodsProxy
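The inheritance chain above can be sketched in plain Ruby. The method bodies and the index names (`gitlab-v12p1`, `gitlab-v13p0`) are invented for illustration and are not taken from the GitLab code base; only the class names follow the slides.

```ruby
# Stand-in for the gem's class-level proxy; the search body is hypothetical.
class ClassMethodsProxy
  def search(query)
    "#{index_name}: #{query}"
  end
end

module V12p1
  # Logic shared by every proxy of this version lives in one superclass.
  class ApplicationClassProxy < ClassMethodsProxy
    def index_name
      "gitlab-v12p1" # hypothetical index name
    end
  end

  class SnippetClassProxy < ApplicationClassProxy
  end
end

module V13p0
  class ApplicationClassProxy < ClassMethodsProxy
    def index_name
      "gitlab-v13p0" # hypothetical index name
    end
  end

  class SnippetClassProxy < ApplicationClassProxy
  end
end

puts V12p1::SnippetClassProxy.new.search("foo")
puts V13p0::SnippetClassProxy.new.search("foo")
puts V12p1::SnippetClassProxy.ancestors.take(3).inspect
```

Because each version lives in its own namespace, two versions of the same proxy can be loaded side by side without clashing.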
How do we choose which version to use?
Switchboard
Switchboard
Previously:
model ---__es__---> proxies
Now:
model ---__es__---> switchboard ------> proxies
Switchboard Classes
MultiVersionClassProxy
MultiVersionInstanceProxy
Q:
How do we choose which version to route to?
A:
This is decided case by case.
For example, suppose we have two indices, v1 and v2:
Assuming v1 is in sync, v2 is still indexing:

| method | version |
| --- | --- |
| searching | v1 |
| indexing | v1 & v2 |
| removing index | manually selected |
elastic_reading_target
returns one version, the synced version, e.g.:
def elastic_reading_target
  version('V12p1')
end
elastic_writing_targets
returns an array of all versions
def elastic_writing_targets
  [
    version('V12p1'),
    version('V12p2')
  ]
end
methods_for_all_write_targets
Array of methods to be forwarded to all versions:
def methods_for_all_write_targets
  [:index_document, :delete_document,
   :update_document, :update_document_attributes]
end
methods_for_one_write_target
Array of methods that are not forwarded automatically; the caller specifies which version to call:
def methods_for_one_write_target
  [:import, :create_index!, :delete_index!]
end
Switchboard Recap
| method(s) | versions to delegate to |
| --- | --- |
| *methods other than below | elastic_reading_target |
| methods_for_all_write_targets | elastic_writing_targets |
| methods_for_one_write_target | *user defined |
Forwarding to multiple write versions
def generate_forwarding
  methods_for_all_write_targets.each do |method|
    self.class.forward_to_all_write_targets(method) # ①
  end
  # ...
end

def forward_to_all_write_targets(method)
  return if respond_to?(method)

  define_method(method) do |*args|
    # ②
    responses = elastic_writing_targets.map do |elastic_target|
      elastic_target.public_send(method, *args)
    end
    responses.find { |response|
      response['_shards']['successful'] == 0
    } || responses.last # ③
  end
end
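A self-contained sketch of the write fan-out, so the response-selection rule can be tried without an Elasticsearch server. `WriteSwitchboard` and `FakeTarget` are hypothetical names, the fake targets return hand-built responses, and the guard uses `method_defined?` instead of the `respond_to?` shown on the slide so that it checks instance methods in this standalone setting.

```ruby
class WriteSwitchboard
  def self.forward_to_all_write_targets(method)
    return if method_defined?(method)

    define_method(method) do |*args|
      # Call the same method on every write target and collect the responses.
      responses = elastic_writing_targets.map do |target|
        target.public_send(method, *args)
      end
      # Surface the first failed response, otherwise the last one.
      responses.find { |r| r['_shards']['successful'] == 0 } || responses.last
    end
  end

  attr_reader :elastic_writing_targets

  def initialize(targets)
    @elastic_writing_targets = targets
    [:index_document].each { |m| self.class.forward_to_all_write_targets(m) }
  end
end

# Fake version proxy that reports a fixed number of successful shards.
FakeTarget = Struct.new(:name, :successful) do
  def index_document
    { 'target' => name, '_shards' => { 'successful' => successful } }
  end
end

all_ok = WriteSwitchboard.new([FakeTarget.new('v1', 1), FakeTarget.new('v2', 1)])
one_failed = WriteSwitchboard.new([FakeTarget.new('v1', 0), FakeTarget.new('v2', 1)])

puts all_ok.index_document['target']     # response of the last target
puts one_failed.index_document['target'] # response of the failed target
```

Returning a failed response in preference to a successful one lets the caller notice a partial failure even when most versions accepted the write.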
Forwarding to single read version
def generate_forwarding
  # ... continue from earlier
  read_methods = elastic_reading_target
    .real_class.public_instance_methods # ①
  read_methods -= methods_for_all_write_targets
  read_methods -= methods_for_one_write_target # ②
  read_methods -= self.class.instance_methods
  read_methods.delete(:method_missing)
  read_methods.each do |method|
    self.class.forward_read_method(method) # ③
  end
end
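The slide does not show `forward_read_method` itself, so the version below is an assumption about its shape: it defines a method that delegates to the single reading target. `ReadSwitchboard` and `ReadProxy` are hypothetical names, and the sketch uses `class` where the slide uses `real_class`, because nothing overrides `class` here.

```ruby
# Hypothetical version proxy with one read method and one write method.
class ReadProxy
  def search(query)
    "v1 result for #{query}"
  end

  def index_document
    "indexed"
  end
end

class ReadSwitchboard
  # Assumed implementation: delegate the method to the reading target.
  def self.forward_read_method(method)
    return if method_defined?(method)

    define_method(method) do |*args|
      elastic_reading_target.public_send(method, *args)
    end
  end

  def elastic_reading_target
    @elastic_reading_target ||= ReadProxy.new
  end

  def methods_for_all_write_targets
    [:index_document]
  end

  def generate_forwarding
    read_methods = elastic_reading_target.class.public_instance_methods
    read_methods -= methods_for_all_write_targets # drop write methods
    read_methods -= self.class.instance_methods   # drop what the switchboard already has
    read_methods.each { |m| self.class.forward_read_method(m) }
  end
end

board = ReadSwitchboard.new
board.generate_forwarding
puts board.search("foo")
puts board.respond_to?(:index_document)
```

Subtracting the switchboard's own instance methods is what keeps inherited methods such as `to_s` or `inspect` from being redefined as forwards.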
class and real_class
Elasticsearch tries to be smart, and overrides class method on InstanceProxy.
SnippetInstanceProxy#class would be SnippetClassProxy
This can result in cryptic errors.
To obtain the actual class, real_class is defined.
def real_class
  self.singleton_class.superclass
end
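A small sketch of why `singleton_class.superclass` recovers the actual class even when `class` has been overridden. `OverridingInstanceProxy` and its subclass are hypothetical, and the symbol returned by the overridden `class` is a placeholder for whatever the gem returns.

```ruby
class OverridingInstanceProxy
  # Mimic the override described above: `class` no longer returns the real class.
  def class
    :not_the_real_class
  end

  # The singleton class of an object always has the object's actual class
  # as its superclass, regardless of what `class` has been redefined to return.
  def real_class
    singleton_class.superclass
  end
end

class SnippetProxySketch < OverridingInstanceProxy
end

proxy = SnippetProxySketch.new
puts proxy.class.inspect
puts proxy.real_class.inspect
```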
How are targets specified?
def version(version)
  version = Elastic.const_get(version, false) if version.is_a?(String)
  # Now version is Elastic::V12p1
  version.const_get(proxy_class_name, false).new(data_target)
  # Now we return an instance of Elastic::V12p1::IssueInstanceProxy
end

def proxy_class_name
  "#{@data_class.name}InstanceProxy"
  # @data_class is the model class, e.g. `Issue`
end
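The lookup can be exercised on its own with stub classes. Everything except `version` and `proxy_class_name` is a hypothetical scaffold: the `Elastic::V12p1::IssueInstanceProxy` stub, the empty `Issue` model and the `InstanceSwitchboardSketch` wrapper exist only to make the two methods runnable.

```ruby
module Elastic
  module V12p1
    # Stub proxy that just remembers the record it wraps.
    class IssueInstanceProxy
      attr_reader :target

      def initialize(target)
        @target = target
      end
    end
  end
end

class Issue; end

class InstanceSwitchboardSketch
  def initialize(data_target)
    @data_target = data_target
    @data_class = data_target.class
  end

  def version(version)
    # Accept either a namespace module or its name as a string.
    version = Elastic.const_get(version, false) if version.is_a?(String)
    version.const_get(proxy_class_name, false).new(@data_target)
  end

  def proxy_class_name
    "#{@data_class.name}InstanceProxy"
  end
end

issue = Issue.new
proxy = InstanceSwitchboardSketch.new(issue).version('V12p1')
puts proxy.class
puts proxy.target.equal?(issue)
```

Passing `false` as the second argument to `const_get` restricts the lookup to the given namespace, so a missing proxy raises `NameError` instead of silently resolving to a same-named top-level constant.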
2.3 Rails Model
Elastic::ApplicationVersionedSearch
provides the __elasticsearch__ methods
(returning switchboard)
Now we can have two versions of the search code.
Each can have its own client, pointing to a different index.
We can even point the two versions to two different cloud providers.
Current status: pending
Currently we only have one version.
We hard-code that version in elastic_reading_target and elastic_writing_targets.
I still think there are some benefits:
Cleaner model
Testing can be done on proxies
Next step
maybe we can rename the class to be actually "switchboard"?
maybe you would prefer to remove the switchboard?
generate_forwarding should not be done per initialization
Special Thanks
Markus Koller
Marcel van Remmerden
Denys Mishunov
Darva Satcher
Kai Armstrong
James Lopez
Q&A
Hello everyone, my name is Mark.
I am currently on the Fulfillment team,
but I was on the search team for a while.
I thought it would be beneficial to give
a presentation on the Elasticsearch integration
before my memory fades away.
For those of you who don't know Elasticsearch,
it offers full-text search.
I believe it is currently accessed only through the search field at the top right.
There is no demo today, because all the changes I am discussing are behind-the-scenes changes
and would not affect the user.
So today the session will be split into two parts.
The first part will briefly introduce the elasticsearch-rails architecture.
The second part will focus on how we used the library in a slightly different way, and the reason behind it.
Since the word "elasticsearch" is so long,
I'll often abbreviate it
in order to fit it on the slides.
So, part one.
elasticsearch-rails is a Ruby gem
maintained by Elastic, the company.
The gem consists of three parts.
ES persistence allows Rails to save data on an Elasticsearch server instead of a SQL database.
We don't use this.
ES rails provides some useful utilities for Rails. We use part of it for instrumentation.
The most important part is ES model.
This enables Active Model to talk to the Elasticsearch server.
We will only be covering this part today.
The core of ES model is a proxy that bridges the Rails model and the server.
All the search-related logic resides inside the proxy.
All search commands go through the proxy.
There are two different proxies,
one for the class level and one for the instance level.
The instance-level proxy is closely coupled with a single Active Record object. For example, it generates the data to be stored from the record.
The class-level proxy handles higher-level commands, such as search or import.
In the official README, you will see that
the simplest setup is to include the model module like this.
By doing so, we gain all the search functionality.
After including it,
the `__es__` methods become available at both the class level and the instance level.
The class-level `__es__` method returns a ClassMethodsProxy,
and the instance-level `__es__` method returns an InstanceMethodsProxy.
Both proxies point back to the source through the `target` method.
We call all search-related functionality through these proxies.
Here I'll use indexing as an example to show how the gem works.
When an issue is created, what happens behind the scenes?
Previously we included the `callback` module,
which sets up three after_commit callbacks,
for create, update and delete.
Since we are creating a record,
the first callback is triggered. It calls the proxy's index_document method.
index_document first prepares the data for indexing.
Here at point 1 we see it call the as_indexed_json method.
By default this method serializes all the attributes.
Then a client is used to send this data to the server.
At this point, everything is done.
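The flow just described can be sketched without the gem. `RecordingClient`, `IndexingProxySketch` and `IssueRecord` are hypothetical stand-ins, and the index name `issues` is invented; the real proxy and client do considerably more.

```ruby
# Fake client that records requests instead of sending them over HTTP.
class RecordingClient
  attr_reader :requests

  def initialize
    @requests = []
  end

  def index(index:, id:, body:)
    @requests << { index: index, id: id, body: body }
    { 'result' => 'created' }
  end
end

class IndexingProxySketch
  def initialize(target, client)
    @target = target
    @client = client
  end

  def index_document
    document = @target.as_indexed_json # 1. prepare the data
    @client.index(index: 'issues', id: @target.id, body: document) # 2. send it
  end
end

# Record whose default serialization includes every attribute.
IssueRecord = Struct.new(:id, :title) do
  def as_indexed_json
    to_h.transform_keys(&:to_s)
  end
end

client = RecordingClient.new
record = IssueRecord.new(1, 'Search is slow')

# In Rails this call would be made by the after_commit callback on create.
IndexingProxySketch.new(record, client).index_document
puts client.requests.inspect
```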
But sometimes we don't necessarily want every attribute
to be searchable.
We might only want to search the title.
Here we can override the default by defining our own
as_indexed_json in our model.
It just returns a hash of the things we need indexed.
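As an illustration, a model that exposes only its title could look like the sketch below. `Article` is a hypothetical plain Ruby struct rather than an Active Record model, so it runs without Rails.

```ruby
require 'json'

Article = Struct.new(:title, :description, :author_email) do
  # Override: index only the title instead of every attribute.
  def as_indexed_json(_options = {})
    { 'title' => title }
  end
end

article = Article.new('Fix login', 'Steps to reproduce...', 'dev@example.com')
puts article.as_indexed_json.to_json
```

Attributes left out of the hash are never sent to the server, so they cannot be matched by a search.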
The method name `__elasticsearch__` is very long.
Typing it all the time is tiresome.
ES rails also provides convenience methods to bypass this.
A few of the methods are delegated to the proxy objects.
So now we can just type book.search directly.
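The idea of delegating a few methods to the proxy can be sketched with Ruby's standard `Forwardable` module. This is a stand-in technique chosen for the example, not necessarily how the gem wires up its shortcuts, and `Book` and `BookClassProxy` are hypothetical.

```ruby
require 'forwardable'

# Hypothetical class-level proxy.
class BookClassProxy
  def search(query)
    "searching books for #{query}"
  end
end

class Book
  class << self
    extend Forwardable

    def __elasticsearch__
      @__elasticsearch__ ||= BookClassProxy.new
    end

    # Book.search(...) is forwarded to Book.__elasticsearch__.search(...)
    def_delegators :__elasticsearch__, :search
  end
end

puts Book.__elasticsearch__.search('ruby')
puts Book.search('ruby')
```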
Proxies consist of many modules. This means we can cherry-pick only the things we need.
The proxy itself uses a client to send HTTP requests to the server.
By default, all proxies share the same client,
which is just a Faraday client.
The client has the information on
where it can find the server, such as the URL or the port.
In summary, the simple setup gives us two methods to access the proxy objects,
and from there we can talk to the server.
At GitLab we use Elasticsearch in a slightly different way.
Until last year, we had issues with reindexing.
Every time we reindexed, it could take as long as a week.
During this time, search results would be incomplete.
Yet we require reindexing, since we do change the data schema from time to time.
So our goal was to allow zero-downtime search
when the data schema changes.
The approach we decided on was to allow multiple versions of the search code to co-exist,
and to determine which version to call at run time.
For example,
we used to put all snippet-related search logic
in the snippet search module, which is included by the Snippet model.
This is less flexible if we want to have multiple versions.
On the other hand, using classes and objects is very suitable for this kind of task.
To keep things DRY,
we also extract the common logic into a superclass.
We have a switchboard,
which redirects commands to the desired version.
Recall that earlier, in the simplest setup, `__es__` would take us from the record to the proxy.
Now we can have `__es__` take us to a switchboard, which then forwards calls to the correct version class.
At the time I named the switchboard classes like this,
which seems kind of wordy now.
The names of the switchboards are
* `MultiVersionClassProxy`
* `MultiVersionInstanceProxy`
Not great names. I didn't have a good name when I wrote these.
To summarize, each cell of the table I listed at the beginning
can be expressed using the four methods shown earlier.
So how is the table represented in code?
The `generate_forwarding` method is responsible
for setting up method forwarding at boot time.
Let's look at the first step.
For each method in methods_for_all_write_targets
we call `forward_to_all_write_targets`.
This dynamically defines a method
that calls each of the writing targets (as indicated by circle 2).
Lastly, we collect all the responses.
We return the unsuccessful one if it exists;
otherwise the call is considered successful.
That's for write operations.
For read operations, we only need to forward the call to one version, the one that is in sync.
How do we determine which methods are read methods?
We first take all the methods,
then filter out the write methods.
We also filter out the switchboard's own instance methods here.
Lastly we filter out method_missing.
The remaining methods are considered read methods,
and we call forward_read_method on each of them (see circle 3).
The second step of `generate_forwarding` is to set up forwarding for the methods that only need one version.
Let's take a detour here and talk about the `real_class` method.
Elasticsearch tries to be smart and overrides the `class` method on InstanceProxy to point to the class proxy.
`SnippetInstanceProxy#class` would be `SnippetClassProxy`.
This can result in cryptic errors.
To obtain the actual class, I had to create a method of my own,
and I call it `real_class`.
So when we call the version method, what do we get?
We first get the namespace,
and then we call proxy_class_name to get the name of the proxy we want.
Lastly we fetch the proxy with that name from the namespace.
That's all for the switchboards. Now we are at the final part:
connecting the Rails model to the switchboard.
Each searchable model includes the ApplicationVersionedSearch module.
It provides access to the switchboard,
and it provides permission checks and various other things.
Here we can see that it simply instantiates the switchboard classes,
one for the class level and one for the instance level.
This is an overview graph of where each class sits:
we have Active Model at the top left, which
has access to the switchboards,
and the switchboards determine which version of the proxy to pass
the command on to.
So we have explained all three layers of the integration.
Now we can have two versions of the search logic.
This means we can have separate client settings,
pointing to different indices.
Theoretically we can even point to two different clouds,
so we can migrate from one cloud provider to the next.
Currently we only have one version.
We hard-code `elastic_reading_target` and `elastic_writing_targets` to that version.
This is because we switched our focus to enabling global search on gitlab.com
first.
* In the past, we included many ES-related methods in our models
* Now the models are slimmer