Introducing Elastically, our Elastica Ally
Sorry for the pun 😅
In March, I got the chance to share my knowledge about Elasticsearch and PHP with hundreds of developers at Symfony Live Paris. While building this talk, I tried to make sense of all the PHP implementations I came across, either while auditing third-party applications or while building from scratch for our clients.
In this article, I would like to introduce Elastically, a thin wrapper on top of Elastica we use to bootstrap our Elasticsearch implementations.
Building PHP and Elasticsearch application
When a project needs Elasticsearch, most of the time we build our own indexing and search components on top of Elastica. This library is really convenient as it exposes every Query DSL clause and API endpoint as PHP classes, and is very well maintained. Our experience has also led us to adopt a few good practices that we now impose on ourselves.
Do not tie mapping and document together
The JSON document you send to Elasticsearch and the actual Mapping – the fields in Lucene – should not be correlated.
The JSON document should contain:
- the data needed for search;
- the data needed for the view, the manipulation, etc.
But the Mapping only needs one:
- the data needed for search.
As an example, if you index a product:
{"name": "WashWash 3000", "picture": "https://cdn.example.com/toothpaste-cropped.jpg"}
You need the picture for display obviously, so it makes sense to have it in the JSON. But you should not index this field, because you are never going to search for a product by picture! And guess what, by default Elasticsearch will index this data.
So firstly, you should not use the dynamic mapping as it’s a very good way to compromise data and store useless data in Lucene.
Secondly, your Mapping should only consist of one field, the name. So it has to be explicitly written, and is not the same as the data structure.
In Elastically, this is the default behavior.
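As a rough sketch, an explicit mapping for the product above could look like the following. Setting `dynamic: false` is a standard Elasticsearch mapping option: unknown fields such as `picture` stay in the stored JSON but are not indexed. Whether Elastically relies on this exact option internally is an assumption here, not something this article confirms.

```yaml
mappings:
    # Fields that are not declared below are kept in _source but never indexed
    dynamic: false
    properties:
        name:
            type: text
```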
Use YAML instead of JSON or array for configuration
Elasticsearch Mappings are JSON formatted – but as humans, writing JSON is just a massive pain.
{
    "settings": {
        "number_of_replicas": 1,
        "number_of_shards": 3,
        "analysis": {},
        "refresh_interval": "1s"
    },
    "mappings": {
        "properties": {
            "title": {
                "type": "text",
                "analyzer": "english"
            }
        }
    }
}
In Elastica, we can set up the index mapping as an array. PHP arrays are verbose and not much easier to write and maintain:
[
    'settings' => [
        'number_of_replicas' => 1,
        'number_of_shards' => 3,
        'analysis' => [],
        'refresh_interval' => '1s',
    ],
    'mappings' => [
        'properties' => [
            'title' => [
                'type' => 'text',
                'analyzer' => 'english',
            ],
        ],
    ],
];
So what we do now is always use YAML. This format has some downsides but also lots of perks:
- comments;
- anchors and merge keys (article in French) to reuse parts of the configuration in multiple places;
- IDE support…
settings:
    number_of_replicas: 1
    number_of_shards: 3
    analysis: {}
    refresh_interval: 1s
mappings:
    properties:
        title:
            type: text
            analyzer: english
In Elastically, the use of YAML is mandatory.
Use DTO: Data Transfer Object
On top of Elastica, we add some logic to write and read DTOs in Elasticsearch, instead of plain old arrays.
The advantages are:
- The code is easier to read and manipulate;
- It’s closer to what we already do with Doctrine ODM;
- Interoperability with other storage is easier to manage;
- Data is always consistent and we can pass the DTO as a type-hinted argument, so there is no need to guess the structure of an associative array.
As Elastica only talks JSON or array, Elastically introduces a custom Indexer and ResultBuilder that let you pass and retrieve PHP objects (via a Serializer).
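To make the idea concrete, here is a minimal sketch in plain PHP (7.4+ for typed properties). The `Beer` properties and the helper functions are hypothetical and hand-written for illustration; in Elastically this conversion is delegated to a Serializer, not to functions like these.

```php
<?php

declare(strict_types=1);

// Hypothetical DTO: the property names are assumptions made for this example.
final class Beer
{
    public string $name = '';
    public string $style = '';
    public float $abv = 0.0;
}

// Stand-in for a normalizer: DTO to the array that gets JSON-encoded and indexed.
function beerToArray(Beer $beer): array
{
    return [
        'name' => $beer->name,
        'style' => $beer->style,
        'abv' => $beer->abv,
    ];
}

// Stand-in for a denormalizer: the _source of a search hit back to a typed DTO.
function beerFromArray(array $source): Beer
{
    $beer = new Beer();
    $beer->name = (string) ($source['name'] ?? '');
    $beer->style = (string) ($source['style'] ?? '');
    $beer->abv = (float) ($source['abv'] ?? 0.0);

    return $beer;
}

// Consumers can type-hint the DTO instead of guessing array keys.
function describeBeer(Beer $beer): string
{
    return sprintf('%s (%s, %.1f%%)', $beer->name, $beer->style, $beer->abv);
}
```

The point of the design is the last function: the rest of the application depends on `Beer`, not on the shape of an Elasticsearch response.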
Indexes should be versioned
When talking to an Index, we do it via its name. That’s good, unless we want to update the mapping of that index, because we have to rebuild it. To avoid downtime, we use aliases on top of our indexes.
In Elastically, this is forced and transparent.
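The mechanics behind this can be sketched in plain PHP. The two functions below are hypothetical helpers, not Elastically's API: one derives a versioned index name, the other builds the body of a `POST /_aliases` request, which Elasticsearch applies atomically so that the alias never points at nothing. The naming scheme (alias plus timestamp) is an assumption chosen for the example.

```php
<?php

declare(strict_types=1);

// Hypothetical naming scheme: "<alias>_<timestamp>", e.g. "beers_2019-05-14-101500".
function versionedIndexName(string $alias, DateTimeImmutable $now): string
{
    return sprintf('%s_%s', $alias, $now->format('Y-m-d-His'));
}

// Builds the body of a POST /_aliases request that points $alias at $newIndex,
// detaching it from $oldIndex (if any) in the same atomic call.
function aliasSwapBody(string $alias, string $newIndex, ?string $oldIndex = null): array
{
    $actions = [];

    if (null !== $oldIndex) {
        $actions[] = ['remove' => ['index' => $oldIndex, 'alias' => $alias]];
    }

    $actions[] = ['add' => ['index' => $newIndex, 'alias' => $alias]];

    return ['actions' => $actions];
}
```

The application only ever queries the alias (`beers`), so a rebuilt index can be filled in the background and swapped in once it is ready.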
Tools for better integration
Some tools are also implemented (or on their way!) to ease application development:
- The Indexer: lets you use the Bulk API properly;
- (TBD) A reindexing command: leveraging the Reindex API to rebuild your entire index automagically when you update your Mapping configuration (think about deployment);
- (TBD) An updater helper to ease real-time updates even while the reindexing command is building the “next” index;
- (TBD) A custom healthchecker: giving you tailor-made insight into your cluster health (Are there enough documents in that index?)…
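For context on the first item: the Bulk API expects a newline-delimited JSON body, with one action line followed by one source line per document, and a trailing newline. The plain-PHP sketch below illustrates the queue-then-flush idea. `BulkBuffer` is a hypothetical class, not Elastically's Indexer, and it only builds the request body without sending anything (PHP 7.4+).

```php
<?php

declare(strict_types=1);

// Hypothetical buffer that accumulates "index" operations and renders them
// as the NDJSON body expected by POST /_bulk.
final class BulkBuffer
{
    /** @var string[] */
    private array $lines = [];

    public function scheduleIndex(string $index, string $id, array $source): void
    {
        // Action line, then the document itself on the next line
        $this->lines[] = json_encode(['index' => ['_index' => $index, '_id' => $id]], JSON_THROW_ON_ERROR);
        $this->lines[] = json_encode($source, JSON_THROW_ON_ERROR);
    }

    // Returns the request body and empties the queue.
    public function flush(): string
    {
        if ([] === $this->lines) {
            return '';
        }

        // The Bulk API requires the body to end with a newline
        $body = implode("\n", $this->lines)."\n";
        $this->lines = [];

        return $body;
    }
}
```

Grouping documents this way means one HTTP request for many documents instead of one request per document.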
How to use?
Elastically is not released yet as I still want to add some features, but you can already use it for the core functionalities (DTO, Indexer…).
composer require "jolicode/elastically:dev-master"
Then you can use JoliCode\Elastically\Client instead of the Elastica Client; they are 100% compatible, as the Elastica Client is simply its parent class.
// Building the Index from a mapping config
use App\Dto\Beer;
use Elastica\Document;
use JoliCode\Elastically\Client;

// New Client object with new options
$client = new Client([
    // Where to find the mappings
    Client::CONFIG_MAPPINGS_DIRECTORY => __DIR__.'/configs',
    // What object to find in each index
    Client::CONFIG_INDEX_CLASS_MAPPING => [
        'beers' => Beer::class,
    ],
]);

// Class to build Indexes
$indexBuilder = $client->getIndexBuilder();

// Create the Index in Elasticsearch
$index = $indexBuilder->createIndex('beers');

// Set proper aliases
$indexBuilder->markAsLive($index, 'beers');

// Class to index DTO in an Index
$indexer = $client->getIndexer();

$dto = new Beer();
$dto->bar = 'American Pale Ale';
$dto->foo = 'Hops from Alsace, France';

// Add a document to the queue
$indexer->scheduleIndex('beers', new Document('123', $dto));
// Send the queued documents to Elasticsearch
$indexer->flush();

// Force index refresh if needed
$indexer->refresh('beers');
The Serializer
By default, Elastically will leverage the ObjectNormalizer from Symfony to transform your DTO to an array. That’s easy and fast, but you can also set up your own.
At JoliCode, we use Jane PHP to generate super fast Normalizers based on a JSON Schema. We declare our Model and Jane generates the PHP code: the DTO, the Normalizer and a factory.
Less time on the basics, more time on the business value
Elastically is not meant to be a fully featured implementation like FOSElasticaBundle, for example. I want it to be an opinionated framework for building Elasticsearch-based features in PHP applications.
I would be glad to hear about different approaches when dealing with Elasticsearch via PHP, so feel free to compare and share your experiences!
Code is available on GitHub as always: https://github.com/jolicode/elastically