Elasticsearch PHP Clients test drive
Update, November 26: a few days after publishing this article, we got some very good feedback, especially on the benchmark part. The SearchSuggestion slowness of Elastica disappeared, and elasticsearch-php got some configuration and code love from Zachary Tong, improving its performance (the Guzzle and systematic ping issues have been fixed). We also ran the tests in a controlled Vagrant box, so the whole benchmark is more robust and accurate; here are the new graphs! We can see that Sherlock is slower, nervetattoo is no longer the fastest everywhere, and elasticsearch-php is now closer to Elastica’s very good performance.
My assertion about Elastica not being able to run raw JSON queries has been corrected by Nicolas Ruflin himself, and new pages have been added to the documentation.
As for Sherlock, there is no plan for its future; if development starts again, it will be about putting its fluent interface on top of the official client.
I’m very glad to see clients moving forwards thanks to this article: thanks everyone!
Two months ago, the Elasticsearch team released four new official clients for Perl, Python, Ruby and PHP. This is both awesome and unexpected, as the community had already built a lot of clients for those languages (Tire for Ruby and Elastica for PHP are at the top of my list).
Tire was a 1,500-star project on GitHub and has since been renamed retire, which put an end to it. In the PHP world, another client suffered from this release: SherlockPHP. The project was alpha-ish and has not been actively developed since.
The common point between those two clients? Both Zachary Tong and Karel Minarik are now Elasticsearch employees and work full time on the respective new clients! And that’s awesome, congrats to them :-)
The point of this blog post is to test the new PHP client against three existing ones: Elastica, Nervetattoo and SherlockPHP.
I have a list of commonly needed requests to perform on an Elasticsearch cluster; the idea is to write them for the four clients and extract statistics and feedback.
Elastica, community leader
Elastica is the most popular client at the moment: with 560 stars, 80 contributors, and adoption by the Symfony2 community thanks to FOSElasticaBundle, it’s here to stay.
It’s a fully object-oriented client where everything you manipulate in a cluster has a class.
$this->client->getIndex("my_index")
->getType("my_type")
->search("Pony"); // Return a ResultSet object
You can manipulate your indexes, documents and queries easily via objects, but you can’t copy and paste a JSON example query from the internet and use it directly with your client. This is the hard part when you start with Elastica: translating JSON into a usable query is hard and can take several lines of code! The official documentation doesn’t cover all the possibilities, and I recommend reading the unit tests, as everything is fully tested.
Nervetattoo, the first one
This is the oldest client (July 2010), and it has 183 stars and 5 contributors. The style is much less complicated, as you use PHP arrays directly to build everything, from the document to the search request.
$this->client->search("Pony"); // Return an array
At first sight, Nervetattoo lets you write less code than Elastica, but you may be constrained if you want to deal with multiple types, as they are configured on the client and not at the request level.
My tests include searching on a disconnected / unreachable node when there’s also a valid node configured. This client wasn’t able to switch to the second configured node when the first one failed, making the whole application fail. This is because the configuration only uses the first declared node:
$server = is_array($config['servers']) ? $config['servers'][0] : $config['servers'];
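For comparison, here is a minimal sketch of what a fallback over several configured nodes could look like. It is not part of nervetattoo's client: `requestWithFallback` and the `$send` callable are hypothetical names, and the transport is a stand-in so the example runs without a cluster.

```php
<?php
// Hypothetical helper, not part of nervetattoo's client: try each configured
// server in order and return the first successful response.
function requestWithFallback(array $servers, callable $send)
{
    $lastError = null;
    foreach ($servers as $server) {
        try {
            return $send($server);
        } catch (RuntimeException $e) {
            $lastError = $e; // remember the failure and try the next node
        }
    }
    throw new RuntimeException('No reachable Elasticsearch node', 0, $lastError);
}

// Stand-in transport for illustration: the first node is "down".
$send = function ($server) {
    if ($server === 'dead-node:9200') {
        throw new RuntimeException("Cannot connect to $server");
    }
    return array('served_by' => $server);
};

$result = requestWithFallback(array('dead-node:9200', 'localhost:9200'), $send);
```

With the first node failing, `$result` comes from the second one instead of the whole request failing.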
Another misfortune is that some API endpoints are not supported, although we can call them manually. Compare this manual _stats call with the supported refresh one:
// Refresh
$this->client->refresh(); // The index is configured in the client
// Stats
$this->client->request('/my_index/_stats', "GET", false, true);
There is no documentation outside the README, and the tests aren’t as complete as Elastica’s, but with only 12 classes the API is quick to learn.
SherlockPHP, precursor of the official one
This is the client Zachary Tong wrote before joining Elasticsearch. It has 97 stars and 11 contributors after less than a year of life, and a nice-looking website (also, the name is kind of awesome).
It exposes a fluent interface but dealing with it is a bit painful, due to some consistency issues. For example, the QueryString object does not have a setter for the “query” option (it’s a magic method) but all the other ones (like auto_generate_phrase_queries) are present – how can you or your IDE guess that?
Sherlock::queryBuilder()->QueryString()->query("Pony");
My “two nodes test” was compromised by this line of code in the Cluster configuration:
$this->nodes[$host] = array('host' => $host, 'port' => $port);
If the two nodes have the same host, only the second one will be stored. Also, they are used randomly and not removed from the list if they do not answer.
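The overwrite is plain PHP array behaviour and can be reproduced in isolation. The sketch below uses made-up host and port values; keying by "host:port" is only an assumed alternative for illustration, not Sherlock's code.

```php
<?php
// Two nodes on the same host, different ports (hypothetical values).
$configured = array(
    array('host' => 'localhost', 'port' => 9200),
    array('host' => 'localhost', 'port' => 9201),
);

// Keyed by host only, as in the line quoted above: the second node overwrites the first.
$byHost = array();
foreach ($configured as $node) {
    $byHost[$node['host']] = $node;
}

// Keyed by "host:port" (an assumed alternative, not Sherlock's code): both nodes survive.
$byHostAndPort = array();
foreach ($configured as $node) {
    $byHostAndPort[$node['host'] . ':' . $node['port']] = $node;
}

echo count($byHost), " vs ", count($byHostAndPort), "\n"; // prints "1 vs 2"
```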
The documentation looks good at first but is missing some important pages, and some Elasticsearch features aren’t supported (e.g., suggestions, …).
On the good side, we can use JSON or arrays to build queries, and the HTTP layer is based on rolling-curl.
Elasticsearch-PHP, the new player
We finish with the official client: 108 stars and 6 contributors after only 2 months of public life, and the only one with real, complete documentation.
It uses Guzzle as its HTTP client and has good ConnectionPool management (random node selection, with pings to remove dead nodes).
The syntax may be verbose in some cases. Here is how we do a QueryString, the simplest query in Elasticsearch:
$params = array(
    'index' => "my_index",
    'type'  => "my_type",
    'body'  => array(
        'query' => array(
            'query_string' => array(
                'query' => "Pony",
            ),
        ),
    ),
);
$docs = $this->client->search($params);
There is no shortcut for it as in other clients, but you are not forced to build such PHP arrays, as JSON is also supported in the body part.
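To make the equivalence concrete, the sketch below writes the same query_string query once as a JSON string and once as a PHP array, and checks that they decode to the same structure. The client call is left as a comment because it assumes a configured `$client`, which is not created here.

```php
<?php
// The same query_string query written as JSON, e.g. pasted from an example.
$json = '{"query": {"query_string": {"query": "Pony"}}}';

// The array form of the request body.
$array = array(
    'query' => array('query_string' => array(
        'query' => "Pony",
    )),
);

// Both describe the same request body.
$sameBody = (json_decode($json, true) == $array);

// With the official client, the JSON string would go in 'body' as-is
// (assuming a configured $client, not created here):
// $docs = $client->search(array('index' => 'my_index', 'type' => 'my_type', 'body' => $json));
```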
All the API endpoints are supported and easy to use:
$this->client->indices()->stats(array('index' => "my_index"));
And results are always presented as decoded JSON (PHP arrays).
Performance and memory comparison
Let’s run a bunch of classic queries hundreds of times with each client to extract some not so useful statistics!
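This kind of measurement can be sketched in a few lines. The following is not the benchmark code used for this article: `benchmark` is a hypothetical helper, and `$fakeQuery` stands in for a real client call so that the example runs without a cluster.

```php
<?php
// Hypothetical harness: run a callable $iterations times and report
// total wall-clock time (seconds) and peak memory (bytes).
function benchmark(callable $query, $iterations)
{
    $start = microtime(true);
    for ($i = 0; $i < $iterations; $i++) {
        $query();
    }
    return array(
        'iterations' => $iterations,
        'seconds'    => microtime(true) - $start,
        'peak_bytes' => memory_get_peak_usage(),
    );
}

// Stand-in for a real client call such as $client->search($params).
$fakeQuery = function () {
    return json_decode('{"hits": {"total": 1}}', true);
};

$stats = benchmark($fakeQuery, 500);
printf("%d runs in %.4f s, peak memory %d bytes\n", $stats['iterations'], $stats['seconds'], $stats['peak_bytes']);
```

Note that `memory_get_peak_usage()` reports the peak for the whole PHP process, so comparing clients this way would mean running each one in its own process.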
Those queries were implemented on all clients:
- getDocument: fetch a document by Id;
- searchDocument: perform a QueryString search;
- searchDocumentWithFacet: perform a MatchAll query with a Term facet on the name;
- searchOnDisconnectNode: perform a QueryString search with one disconnected node and one valid node configured;
- searchSuggestion: perform a QueryString search with a Term suggest;
- indexRefresh: refresh an index;
- indexStats: get the statistics of an index.
The code is open source; you can run it yourself.
The smaller the better.
As you can see, the official client (in blue) is slower on every request except Suggestion, where Elastica (orange) is very slow (this looks like a bug). Sherlock and Nervetattoo are missing statistics for the queries they couldn’t run; otherwise Nervetattoo (green) is always the fastest.
For memory usage and total running time, the official client also needs some optimisation, as it uses almost three times the memory used by Nervetattoo. The time on the y-axis isn’t meaningful, as some tests were skipped for the two fastest clients.
The right client for the right job
The fluent interface of Sherlock is incomplete and inconsistent, making it quite hard to use. There are some bugs, and 4 of my 7 queries are simply not possible with it. Development also appears to have stopped. You should avoid it.
Nervetattoo is really fast. If you need a quick and simple client into which you can paste a query array and get a response array, it gets the job done: no node fallback and no documentation, but a good memory footprint.
If you need more, you have two choices: the object API of Elastica or the array API of the official client. The latter is maybe a bit too young, as it consumes a lot of memory and isn’t so fast, but the whole network layer looks very good apart from the missing Thrift support (Elastica supports it).
Elastica is my favourite, but I miss the ability to simply send copy/pasted JSON instead of writing complex object graphs. The huge time consumed in the Suggest query test is certainly just a pull request away, and I have no doubt that the official client will get some performance improvements in the near future. Stick with them and choose the one most appropriate to your needs!