Battle log: a deep dive in Symfony stack in search of optimizations 1/n
My team and I are working on a big project that keeps getting bigger. It is split into microservices, and each end-user call generates an increasing number of HTTP API calls.
Performance was becoming a problem and the Developer eXperience (DX) suffered as a result. Note that we already have a perfect stack on paper. Everything is up to date, all the default optimizations are already in place.
It’s time for a deep dive in search of NEW performance optimizations.
This adventure will lead us through many Blackfire profiles and two big concepts of Symfony’s stack:
- the container compilation: how it works and what it means for both dev and prod environments;
- the serializer and its metadata extraction.
These are the two hot paths for API developers and the ones I worked on the most.
This article is the first in a new series that will explain what we learned and how we discovered new performance improvements. We’ll recap all the performance tips you can use on your projects, and the next steps we expect to follow.
The second article is out and focuses on performance in the Symfony dev environment.
The stack
- Symfony 4.4
- API Platform 2.5
- Jane 5.2
- React ??
- Phpstan at the maximum level 😎
- Cypress / Panther
- PHPUnit
- CircleCI
- dev stack uses our Docker starter which uses Docker and Alpine based images.
At the beginning of the project, developers were mainly using macOS, but many switched to Linux for a lot of reasons, one of them being the poor performance of the stack. For those who remained on macOS, we created pomdok, which uses the Symfony binary on the host (French blog post presenting it 🇫🇷). Even then, performance was not good enough to work efficiently.
So I started digging.
The baseline
On a normal Symfony stack, there are three Symfony environments (a confusing name): dev, prod and test. If it’s slow in prod, it will definitely be worse in dev. So let’s go see what we can optimize there first.
The prod environment should follow these principles (best described in this wonderful documentation):
- use a composer optimized autoload;
- don’t check that caches are stale;
- warmup everything;
- cache everything;
- use a fast cache (apcu > opcache > the rest).
The dev environment is the opposite:
- always check if caches are stale and if something has changed, rebuild what is necessary to reflect the change in cache (without calling the cache:clear command manually);
- on some paths, bypass the cache because we are not capable of handling it;
- collect / log / profile what we can, so the developer can be made aware of what happened during the request very easily.
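The difference between the two modes can be sketched in a few lines of plain PHP. This is only an illustration under stated assumptions, not how Symfony implements it: the function name loadCompiled, its parameters and the use of strtoupper as a stand-in for an expensive build are all hypothetical.

```php
<?php
declare(strict_types=1);

/**
 * Returns the cached build of $source, rebuilding it when needed.
 * In debug mode the cache is compared with the source on every call;
 * otherwise an existing cache file is trusted blindly.
 */
function loadCompiled(string $source, string $cacheFile, bool $debug, callable $build): string
{
    clearstatcache(); // make sure filemtime() reads fresh values

    $usable = is_file($cacheFile);
    if ($usable && $debug) {
        // dev: pay a stat call per resource to detect changes
        $usable = filemtime($cacheFile) >= filemtime($source);
    }
    if (!$usable) {
        file_put_contents($cacheFile, $build(file_get_contents($source)));
    }

    return file_get_contents($cacheFile);
}

// Usage with throwaway files; "strtoupper" stands in for an expensive build.
$source = tempnam(sys_get_temp_dir(), 'src');
$cacheFile = $source . '.cache';

file_put_contents($source, 'v1');
$first = loadCompiled($source, $cacheFile, false, 'strtoupper'); // no cache yet: builds it

touch($cacheFile, time() - 100);  // pretend the cache was written a while ago
file_put_contents($source, 'v2'); // the developer edits the source

$prod = loadCompiled($source, $cacheFile, false, 'strtoupper'); // trusts the old cache
$dev = loadCompiled($source, $cacheFile, true, 'strtoupper');   // notices the change, rebuilds

unlink($source);
unlink($cacheFile);
```

The prod call is cheaper because it never looks at the source, which is also why it keeps serving the old build until the cache is cleared.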
Everything works pretty smoothly and a search for the tag “performance” in the GitHub Symfony repository does not surface many complaints.
Let’s start with the prod environment.
Optimize the prod
Before looking for problems in our dependencies, be sure to audit your own code.
TL;DR: there are two things to check: event subscribers and normalizers. They are always loaded, always checked. They should have no dependencies, or they should be lazy.
Chase the default costs
Let’s create a new controller:
// src/Controller/DebugController.php
/**
 * For debug purposes only.
 *
 * @Route("/empty", methods={"GET"}, name="debug_empty")
 */
public function empty(): Response
{
    return new Response('ok');
}
$ cat .env.local
APP_ENV=prod
$ composer dump-autoload -a
$ bin/console cache:warmup
App is loaded, let’s go profiling!
blackfire curl http://api/empty
Yep, I created a route that does nothing more than saying “ok”. The goal is to see the cost of booting the app. Chase the usual suspects, stuff that should not be there:
- external calls (HTTP or SQL);
- services booted that are not supposed to be used in this blank page.
If they are present here, they will be everywhere. So we need to deal with them.
In my case, I had some event subscribers that had a Doctrine Repository as a constructor dependency, as well as a few other services. That resulted in booting a lot of services for nothing.
I wasn’t in the mood for refactoring, so I just made them lazy.
# config/services.yaml
services:
    # required by MediaNormalizer
    Lib\Core\Media\Cloudinary:
        lazy: true
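To see why laziness helps, here is a minimal plain-PHP sketch. It does not reproduce Symfony's proxy mechanism; the classes ExpensiveClient, LazyClient and MediaSubscriber are hypothetical and only illustrate the idea that the costly object is built on first use instead of at construction time.

```php
<?php
declare(strict_types=1);

// Hypothetical expensive dependency: counts how many times it is built.
final class ExpensiveClient
{
    public static int $instances = 0;

    public function __construct()
    {
        ++self::$instances; // stands in for connecting, loading config, etc.
    }

    public function upload(string $file): string
    {
        return "uploaded:$file";
    }
}

// Hypothetical lazy wrapper: the real object is only built on first use.
final class LazyClient
{
    private ?ExpensiveClient $inner = null;

    public function upload(string $file): string
    {
        $this->inner ??= new ExpensiveClient();

        return $this->inner->upload($file);
    }
}

// A subscriber is constructed on every request, whether it is used or not.
final class MediaSubscriber
{
    private LazyClient $client;

    public function __construct(LazyClient $client)
    {
        $this->client = $client;
    }

    public function onMediaCreated(string $file): string
    {
        return $this->client->upload($file);
    }
}

$subscriber = new MediaSubscriber(new LazyClient());
$before = ExpensiveClient::$instances; // nothing expensive was built yet

$result = $subscriber->onMediaCreated('cat.png');
$after = ExpensiveClient::$instances;  // built on the first real call
```

On the empty route the subscriber is constructed but never called, so the expensive client is never built at all.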
Cost of the serialization stack
Second test, the serializer. The serialization stack in Symfony is always improving and is good enough for a lot of cases, but there is room for improvement.
Let’s add some new methods to our controller:
// src/Controller/DebugController.php
/**
 * For debug purposes only
 *
 * @Route("/serializer", methods={"GET"}, name="debug_serializer")
 */
public function serializer(SerializerInterface $serializer)
{
    $object = new Dto();
    $object->setTitle('test');

    return JsonResponse::fromJsonString($serializer->serialize($object, 'json'));
}

/**
 * For debug purposes only
 *
 * @Route("/serializer-entity", methods={"GET"}, name="debug_serializer_entity")
 */
public function entitySerialized(SerializerInterface $serializer, MyEntityRepository $repository)
{
    $entity = $repository->find(1);

    return JsonResponse::fromJsonString($serializer->serialize($entity, 'json'));
}
blackfire curl http://api/serializer
blackfire curl http://api/serializer-entity
The basic algorithm of the serializer is:
1/ what is this data? => gather information about it
2/ iterate all properties and methods => should I normalize it? => how should I normalize it?
3/ normalize
4/ serialize
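As a rough illustration of these four steps, here is a toy version in plain PHP. It is a sketch only, far simpler than the Symfony Serializer: the Dto class and the functions normalize and serialize_json are hypothetical, and only argument-less getters are considered.

```php
<?php
declare(strict_types=1);

// Hypothetical DTO used only for this illustration.
final class Dto
{
    private string $title = '';

    public function setTitle(string $title): void
    {
        $this->title = $title;
    }

    public function getTitle(): string
    {
        return $this->title;
    }
}

/** Steps 1 to 3: inspect the object, pick the getters, build a plain array. */
function normalize(object $data): array
{
    $normalized = [];
    $class = new ReflectionClass($data); // 1/ what is this data?
    foreach ($class->getMethods(ReflectionMethod::IS_PUBLIC) as $method) {
        $name = $method->getName();
        // 2/ should I normalize it? Only argument-less "getXxx" methods here.
        if (strlen($name) <= 3 || strncmp($name, 'get', 3) !== 0 || $method->getNumberOfParameters() > 0) {
            continue;
        }
        // 3/ normalize: "getTitle" becomes the "title" key.
        $normalized[lcfirst(substr($name, 3))] = $method->invoke($data);
    }

    return $normalized;
}

/** Step 4: encode the normalized array. */
function serialize_json(object $data): string
{
    return json_encode(normalize($data), JSON_THROW_ON_ERROR);
}

$object = new Dto();
$object->setTitle('test');
$json = serialize_json($object); // {"title":"test"}
```

Even in this toy version, the reflection work of steps 1 and 2 is repeated on every call, which is the kind of side job that metadata caching is meant to remove.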
The normalization and the serialization can certainly be optimized, but this is not where we will dedicate our effort. We will work on all the side jobs that lead to it.
The things I fixed on our side:
1/ reduce the serialization groups to the minimum to avoid working on useless data (the fastest code is code not executed);
2/ don’t forget to configure the max depth level. An indicator we use for that is the weight of the response in kilobytes: we had some in the megabyte range;
3/ BE CAREFUL WITH CUSTOM normalizers. They are all constructed at the start (so the same rule as for event subscribers applies: no dependencies) and they are ALL checked for all data. You can reduce it to “all data types” by making them Cacheable: a normalizer is cacheable if it only depends on the object’s type, which is almost always the case. It can’t be cacheable if it depends on the context.
Having just one non-cacheable normalizer costs a lot.
Our proposal to Symfony maker-bundle now makes all generated normalizers Cacheable by default.
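The cost difference can be sketched with a toy normalizer chain in plain PHP. In Symfony 4.x the opt-in goes through CacheableSupportsMethodInterface; the code below does not use it, and TinyNormalizer, TypeNormalizer and TinyChain are hypothetical names that only mimic the idea of remembering the supports() answer per class.

```php
<?php
declare(strict_types=1);

// Hypothetical, simplified contract: not the Symfony Serializer API.
interface TinyNormalizer
{
    public function supports(object $data): bool;

    public function isCacheable(): bool; // true when supports() depends only on the type
}

final class TypeNormalizer implements TinyNormalizer
{
    public int $calls = 0;
    private string $type;
    private bool $cacheable;

    public function __construct(string $type, bool $cacheable)
    {
        $this->type = $type;
        $this->cacheable = $cacheable;
    }

    public function supports(object $data): bool
    {
        ++$this->calls;

        return is_a($data, $this->type);
    }

    public function isCacheable(): bool
    {
        return $this->cacheable;
    }
}

final class TinyChain
{
    /** @var TinyNormalizer[] */
    private array $normalizers;
    /** @var array<string, array<int, bool>> supports() answers, per class */
    private array $cache = [];

    public function __construct(TinyNormalizer ...$normalizers)
    {
        $this->normalizers = $normalizers;
    }

    public function find(object $data): ?TinyNormalizer
    {
        $class = get_class($data);
        foreach ($this->normalizers as $i => $normalizer) {
            if ($normalizer->isCacheable()) {
                // Asked once per class, then answered from memory.
                $this->cache[$class][$i] ??= $normalizer->supports($data);
                $supported = $this->cache[$class][$i];
            } else {
                // Asked again for every single piece of data.
                $supported = $normalizer->supports($data);
            }
            if ($supported) {
                return $normalizer;
            }
        }

        return null;
    }
}

$contextual = new TypeNormalizer(ArrayObject::class, false); // not cacheable
$cacheable = new TypeNormalizer(ArrayObject::class, true);
$target = new TypeNormalizer(stdClass::class, true);

$chain = new TinyChain($contextual, $cacheable, $target);
for ($i = 0; $i < 100; ++$i) {
    $found = $chain->find(new stdClass());
}
```

After 100 lookups, the two cacheable normalizers were each asked once, while the non-cacheable one was asked 100 times even though it never matched.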
The third test makes SQL queries but does not reveal any problems: the top time consumer is the SQL query itself, and that is what we expect.
New optimizations
I detected a few issues in dependencies that resulted in new PRs that got merged:
1/ Symfony: the serialization metadata is calculated by Symfony\Component\Serializer\Mapping\Factory\ClassMetadataFactory, it is warmed up by Symfony\Bundle\FrameworkBundle\CacheWarmer\SerializerCacheWarmer and the cache is handled by Symfony\Component\Serializer\Mapping\Factory\CacheClassMetadataFactory. Funnily enough, there is a local cache in ClassMetadataFactory but not in the decorator, CacheClassMetadataFactory. So the cache is hit pretty heavily.
A Blackfire profile led me to it. I had forgotten to run the warmup beforehand (so apcu was used instead of PhpArrayAdapter), and it clearly showed the hit count problem (the cardinality is the same with PhpArrayAdapter, but each hit costs less).
I just added a local cache in CacheClassMetadataFactory and that gives us a nice performance improvement in prod: 10% with warmup, 40% without.
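The idea of a local cache in a decorator can be sketched as follows. This is not the code of the merged PR: MetadataFactory, CountingPoolFactory and LocallyCachedFactory are hypothetical stand-ins, and the metadata is reduced to a plain array.

```php
<?php
declare(strict_types=1);

// Hypothetical stand-ins: these are not the Symfony classes named above.
interface MetadataFactory
{
    public function getMetadataFor(string $class): array;
}

// Pretends to be the shared cache pool and counts every lookup.
final class CountingPoolFactory implements MetadataFactory
{
    public int $hits = 0;

    public function getMetadataFor(string $class): array
    {
        ++$this->hits; // each call stands for a round trip to the cache backend

        return ['class' => $class, 'attributes' => []];
    }
}

// Decorator that keeps what it already fetched in a plain array.
final class LocallyCachedFactory implements MetadataFactory
{
    private MetadataFactory $decorated;
    /** @var array<string, array> */
    private array $loaded = [];

    public function __construct(MetadataFactory $decorated)
    {
        $this->decorated = $decorated;
    }

    public function getMetadataFor(string $class): array
    {
        // The metadata is never null here, so ??= is a safe "fetch once".
        return $this->loaded[$class] ??= $this->decorated->getMetadataFor($class);
    }
}

$pool = new CountingPoolFactory();
$factory = new LocallyCachedFactory($pool);
for ($i = 0; $i < 1000; ++$i) {
    $metadata = $factory->getMetadataFor('App\\Dto');
}
$poolHits = $pool->hits; // one hit instead of a thousand
```

The backing pool is hit once per class and per request, however many objects of that class are serialized.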
The PR was merged into Symfony 5.1, but a patch for 4.x and 5.0 is very easy to apply thanks to the DependencyInjection component:
// XML file loaded from config/services.yaml
<!-- Performance in Serialization cachemetadata, should be removed after upgrade to Symfony 5.1 -->
<!-- See https://github.com/symfony/symfony/pull/35046/files -->
<service id="serializer.mapping.cache.symfony.file" class="Psr\Cache\CacheItemPoolInterface">
    <factory class="Symfony\Component\Cache\Adapter\PhpArrayAdapter" method="create" />
    <argument>%serializer.mapping.cache.file%</argument>
    <argument type="service" id="cache.serializer" />
</service>
<service id="serializer.mapping.cache.symfony" class="Symfony\Component\Cache\Adapter\ChainAdapter">
    <argument type="collection">
        <argument type="service">
            <service class="Symfony\Component\Cache\Adapter\ArrayAdapter">
                <argument>0</argument>
                <argument>false</argument>
            </service>
        </argument>
        <argument type="service" id="serializer.mapping.cache.symfony.file" />
    </argument>
</service>
2/ Symfony: there is a way to override the serialized property name. There was a nice bug that prevented it from being cached, and I fixed it for another 6% performance gain \o/. This was merged and released in 4.4.3 and 5.0.3.
It’s not the first time I have encountered that bug, so let’s explain it: we are used to adding array properties to a class to cache things for the duration of the request. Usually, we check with isset whether the cached value exists, to avoid recomputation. Here, the computation results in an array or a null value.
isset — Determine if a variable is declared and is different than NULL
So if the array value is null, isset will always return false. Confirmation:
// code
$cache = [
    'not-null-result' => ['hello'],
    'null-result' => null,
];
foreach ($cache as $key => $value) {
    printf(
        "with %s:\n isset result: %s\n array_key_exists: %s\n",
        $key,
        (int) isset($cache[$key]),
        (int) array_key_exists($key, $cache)
    );
}
// results:
with not-null-result:
 isset result: 1
 array_key_exists: 1
with null-result:
 isset result: 0 // oops, cache miss, even though the result is cached
 array_key_exists: 1
So the optimization is not an isset vs array_key_exists matter, but a misuse of isset. It’s a frequent one and hard to spot without the right tooling.
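Applied to a memoized computation, the difference looks like this. The class below is a hypothetical sketch, not the Symfony code that was patched: NameResolver and its methods only illustrate a cache whose legitimate values include null.

```php
<?php
declare(strict_types=1);

// Hypothetical resolver: returns null for properties without an override.
final class NameResolver
{
    public int $computations = 0;
    /** @var array<string, string|null> */
    private array $cache = [];

    public function buggy(string $property): ?string
    {
        // isset() is false for a cached null, so null results are recomputed.
        if (!isset($this->cache[$property])) {
            $this->cache[$property] = $this->compute($property);
        }

        return $this->cache[$property];
    }

    public function fixed(string $property): ?string
    {
        // array_key_exists() only asks whether the key is there, null or not.
        if (!array_key_exists($property, $this->cache)) {
            $this->cache[$property] = $this->compute($property);
        }

        return $this->cache[$property];
    }

    private function compute(string $property): ?string
    {
        ++$this->computations; // stands in for the expensive metadata lookup

        return $property === 'createdAt' ? 'created_at' : null;
    }
}

$buggy = new NameResolver();
$fixed = new NameResolver();
for ($i = 0; $i < 3; ++$i) {
    $buggy->buggy('title'); // no override: the result is null
    $fixed->fixed('title');
}
```

With three calls on a property that has no override, the buggy version computes three times and the fixed one computes once. Both behave the same for non-null results, which is what makes the bug easy to miss.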
3/ API Platform: a big computation had no local cache, so the same value was computed again and again. This PR adds a local cache and has been merged and released as of 2.5.4.
4/ Jane: we use Jane to generate HTTP API clients. We wrote it, love it and maintain it. However, over the last few months, one of our micro services with a large HTTP API added a lot of data types, resulting in a large Swagger file. This Swagger file (30000 lines, 600 schemas and 500 routes) is computed by Jane to generate DTOs and highly optimized normalizers.
The problem was that there were too many of them: more than 600. That means 600 classes to instantiate and to ask whether they can normalize a given data type (of course these normalizers are Cacheable). We rewrote the way these normalizers are declared, to have a parent normalizer that only checks if the object is one it can handle. Hard to beat a simple array_key_exists.
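A minimal sketch of that parent normalizer idea, in plain PHP. This is not the code Jane generates: Pet, Owner, their normalizers and ParentNormalizer are hypothetical, and the map holds two entries instead of several hundred.

```php
<?php
declare(strict_types=1);

// Hypothetical generated classes, standing in for DTOs and their normalizers.
final class Pet
{
    public string $name = 'Rex';
}

final class Owner
{
    public string $name = 'Ana';
}

final class PetNormalizer
{
    public function normalize(Pet $pet): array
    {
        return ['pet' => $pet->name];
    }
}

final class OwnerNormalizer
{
    public function normalize(Owner $owner): array
    {
        return ['owner' => $owner->name];
    }
}

// Single entry point declared to the serializer instead of one per DTO.
final class ParentNormalizer
{
    /** @var array<string, string> DTO class => normalizer class */
    private array $map = [
        Pet::class => PetNormalizer::class,
        Owner::class => OwnerNormalizer::class,
    ];
    /** @var array<string, object> child normalizers built so far */
    private array $instances = [];

    public function supports($data): bool
    {
        return is_object($data) && array_key_exists(get_class($data), $this->map);
    }

    public function normalize(object $data): array
    {
        $class = $this->map[get_class($data)];
        // Child normalizers are instantiated on demand, and only once.
        $this->instances[$class] ??= new $class();

        return $this->instances[$class]->normalize($data);
    }

    public function instantiated(): int
    {
        return count($this->instances);
    }
}

$parent = new ParentNormalizer();
$supportsPet = $parent->supports(new Pet());
$supportsOther = $parent->supports(new stdClass());
$normalized = $parent->normalize(new Pet());
$built = $parent->instantiated(); // only the Pet normalizer was needed
```

The serializer now asks a single class, and a child normalizer is only built when an object of its type actually shows up.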
That’s all for today
That concludes our first run of performance improvements. We learned a lot about how the serializer works and the hot paths of a Symfony stack.
I hope that you learned something today and the performance of your application will benefit from it as much as mine. Ben Davies (@bendavies) tried a few of these patches on his prod API and it turned out ok for him:
Spent a few minute chatting to @bastnic about his recent performance related PRs to @symfony & @ApiPlatform, applied them all (some are unreleased), and…💪
— Ben Davies (@benjamindavies) January 8, 2020
The next article will focus on the dev stack: how it works, what we found and what we learned.