{"id":2529,"date":"2022-12-01T16:25:30","date_gmt":"2022-12-02T00:25:30","guid":{"rendered":"https:\/\/babypengu.in\/blog\/?p=2529"},"modified":"2022-12-03T12:41:01","modified_gmt":"2022-12-03T20:41:01","slug":"php-efficiently-storing-object-data","status":"publish","type":"post","link":"https:\/\/babypengu.in\/blog\/tech\/php-efficiently-storing-object-data\/","title":{"rendered":"PHP at Scale: Efficiently Storing Object Data"},"content":{"rendered":"\n<p class=\"wp-block-paragraph\">This post seeks to discourage two patterns in any PHP code that may ever need to scale:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>use of a $properties array to store the attributes of an object<\/li>\n\n\n\n<li>setting dynamic properties (i.e. setting $obj-&gt;foo where &#8220;foo&#8221; is undeclared)<\/li>\n<\/ul>\n\n\n\n<p class=\"wp-block-paragraph\">Both may seem like innocent, sensible additions when they are introduced, without it being clear to developers how grave the consequences will be down the line. (<a href=\"#benchmark\">skip to a dynamic-property benchmark<\/a>)<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Starting Simple<\/h4>\n\n\n\n<pre class=\"wp-block-code\"><code>class Company {\n  public $name;\n  public $address;\n}<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">This is fine for an introduction to OOP, but it ignores real-world needs like representing relational data. Consider that the consumer of a Company object may want to know the IDs of employees at the company, and that those may need to be loaded from a database. To avoid repeated queries, it makes sense to cache that data. 
This leads to the addition of:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>  protected $employee_ids;\n  public function employeeIds() {\n    if (!isset($this-&gt;employee_ids)) {\n      \/\/ fetchEmployeeIds() is a hypothetical helper that queries the DB\n      $this-&gt;employee_ids = $this-&gt;fetchEmployeeIds();\n    }\n    return $this-&gt;employee_ids;\n  }<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">An unrelated feature may then require the ability to iterate over the attributes of a company, and perhaps only those that would be stored in the &#8220;companies&#8221; table of a SQL database. That becomes more complicated when the object&#8217;s set of properties includes both attributes of the company itself (name, address) and something derived at runtime from another dataset (employee_ids).<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">A (Non-) Solution<\/h4>\n\n\n\n<pre class=\"wp-block-code\"><code>class Company {\n  \/\/ with the expectation that $this-&gt;properties&#91;'name']\n  \/\/ replaces $this-&gt;name\n  public $properties;\n}<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">This is by far the simplest solution in that it isolates attributes that need to be iterated over inside a data structure that&#8217;s already iterable.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\"><strong>The issue:<\/strong> Hash tables are memory hogs, and every instance of Company now requires a hash table.<\/p>\n\n\n\n<h5 class=\"wp-block-heading\">Understanding the Impact on Memory<\/h5>\n\n\n\n<p class=\"wp-block-paragraph\">To oversimplify things, if a class has <em>n<\/em> properties, the interpreter can allocate space for <em>n<\/em> pointers for each instance of the object. It then needs only a single, shared map from each property name to a particular pointer slot. 
Ignoring other details such as property visibility, the layout can be illustrated roughly like so:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ mapping symbols to indices\n{\n  'foo': 0,\n  'bar': 1,\n  'baz': 2\n}\n\n\/\/ 3 blocks of memory for object 1\n&#91;*ptr_to_foo]&#91;*ptr_to_bar]&#91;*ptr_to_baz]\n\n\/\/ repeat for objects 2 to n-1...\n\n\/\/ 3 blocks of memory for object n\n&#91;*ptr_to_foo]&#91;*ptr_to_bar]&#91;*ptr_to_baz]<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">To find the baz property of object <em>n<\/em>, the interpreter needs only to look up that the index of baz is 2, then add 2<em>b<\/em>, where <em>b<\/em> is the size of a pointer, to the address of the first slot of <em>n<\/em> to find the pointer to the data. Most critically, the memory overhead of mapping property names is incurred only once. That is, where <em>m<\/em> is the size of the map, the total allocation is <em>m<\/em> + 3<em>nb<\/em>. Although <em>m<\/em> is larger than <em>b<\/em>, it is paid only once, so each additional object costs just 3<em>b<\/em>, making it minimally painful to scale <em>n<\/em>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">When an associative $properties array is introduced, PHP implements it as a hash table <em>for every instance<\/em>. The data structure may be represented as:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>\/\/ $properties of object 1\n{\n  'foo': *ptr_to_foo_1,\n  'bar': *ptr_to_bar_1,\n  'baz': *ptr_to_baz_1\n}\n\n\/\/ repeat for objects 2 to n-1...\n\n\/\/ $properties of object n\n{\n  'foo': *ptr_to_foo_n,\n  'bar': *ptr_to_bar_n,\n  'baz': *ptr_to_baz_n\n}<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">Each of these hash tables carries mapping overhead that was needed only once in the previous example. Where <em>s<\/em> is the size of one such table, the total becomes <em>sn<\/em>. 
In plain language, this is the relevant comparison:<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">1 big thing plus <em>n<\/em> small things <strong>vs.<\/strong> <em>n<\/em> big things<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">With all of this considered, it becomes clear why per-instance $properties arrays are a terrible idea for any sufficiently large set of objects.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">An Efficient Solution<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">The goal is an isolated, iterable container for a specific set of data related to the company. A separate object that implements interfaces such as Iterator is arguably the best way to handle this in PHP. A basic implementation is <a href=\"https:\/\/github.com\/brandon-detty\/php-dynamic-prop-pitfall\/blob\/main\/src\/Data.php\" target=\"_blank\" rel=\"noreferrer noopener\">available on GitHub<\/a> as part of a test setup that demonstrates just how severe the memory cost of convenience can be (more on that later). 
To illustrate the solution at a high level:<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>abstract class Data implements \\ArrayAccess, \\Iterator, \\Countable {\n  \/\/ refer to the GitHub link above for the body\n}\n\nclass CompanyData extends Data {\n  public $name;\n  public $address;\n}\n\nclass Company {\n  public $properties;\n\n  public function __construct() {\n    \/\/ an object, not an array, holds the attributes\n    $this-&gt;properties = new CompanyData();\n  }\n}\n\n$c = new Company();\n\n\/\/ works thanks to ArrayAccess interface\n$c-&gt;properties&#91;'name'] = 'Widgets Unlimited';\n$c-&gt;properties&#91;'address'] = '123 Main St.';\n\n\/\/ works thanks to Iterator interface\nforeach ($c-&gt;properties as $prop =&gt; $value) {}<\/code><\/pre>\n\n\n\n<p class=\"wp-block-paragraph\">This isolates a company&#8217;s attributes in $properties and provides convenient iteration and access, while avoiding the overhead of an array for each object, because each $properties is an efficiently stored instance of the CompanyData class.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">Danger Ahead<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">PHP&#8217;s permissiveness can destroy that memory advantage. Specifically, allowing dynamic property assignment has a terrible side effect.<\/p>\n\n\n\n<pre class=\"wp-block-code\"><code>class Animal {\n  public $type;\n}\n\n$cow = new Animal();\n\/\/ PHP allows this**\n$cow-&gt;color = 'brown';<\/code><\/pre>\n\n\n\n<p class=\"has-small-font-size wp-block-paragraph\" style=\"margin-top:0\"><em>** Prior to PHP 8.2, this was allowed silently, with no diagnostic at all. As of 8.2, it emits a deprecation notice (E_DEPRECATED).<\/em><\/p>\n\n\n\n<p class=\"wp-block-paragraph\">When PHP encounters this code, it can no longer rely on the map from properties to pointers that makes objects efficient, because &#8220;color&#8221; is not in that map. Instead, a new map is created just for the $cow instance. 
This is effectively a return to having the overhead of a $properties array for every instance.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">Fortunately, version <a rel=\"noreferrer noopener\" href=\"https:\/\/wiki.php.net\/rfc\/deprecate_dynamic_properties\" target=\"_blank\">8.2 deprecated dynamic properties<\/a>, but removal is planned for version 9.0, which has no announced release date, so the end of the feature (except in classes marked with the <code>#&#91;\\AllowDynamicProperties]<\/code> attribute) is far in the future. For now, developers must remember that something being allowed doesn&#8217;t make it a good idea.<\/p>\n\n\n\n<h4 class=\"wp-block-heading\">By the Numbers<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">That&#8217;s a nice bit of theory, but how bad could it really be? <em>Bad<\/em>. <em>Really bad<\/em>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">As an illustrative example, two classes (MunicipalityData, PersonData) were created that extend Data. Each is suitable for use as a properties container as outlined earlier in the post. Whereas MunicipalityData has only three properties, PersonData has 32, to illustrate how things scale with more properties.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">A benchmark script creates 100,000 objects of a particular class, records the memory usage, then repeats the process, this time setting a dynamic property on each object. 
It does so for both classes and reports the results.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">All of the code is <a rel=\"noreferrer noopener\" href=\"https:\/\/github.com\/brandon-detty\/php-dynamic-prop-pitfall\" target=\"_blank\">available on GitHub<\/a>.<\/p>\n\n\n\n<p class=\"wp-block-paragraph\">The output:<\/p>\n\n\n\n<pre id=\"benchmark\" class=\"wp-block-code\"><code>  Testing MunicipalityData (w\/ 3 public properties)\n    No Dynamic Props: 14.15 MiB, 0.018 seconds\n    w\/ Dynamic Props: 49.02 MiB, 0.032 seconds\n    <strong>Memory Penalty: 3.5x<\/strong>\n    Time Penalty: 1.8x\n  Testing PersonData (w\/ 32 public properties)\n    No Dynamic Props: 65.05 MiB, 0.048 seconds\n    w\/ Dynamic Props: 314.52 MiB, 0.138 seconds\n    <strong>Memory Penalty: 4.8x<\/strong>\n    Time Penalty: 2.9x<\/code><\/pre>\n\n\n\n<h4 class=\"wp-block-heading\">&#8220;Don&#8217;t Use PHP Then&#8221;<\/h4>\n\n\n\n<p class=\"wp-block-paragraph\">This is a silly response. Legacy code exists, and some new projects <em>are<\/em> using PHP, even if usage is sliding. Engineers also rarely know how a product is going to evolve. What works with a small number of objects may someday need to scale by orders of magnitude. Using best practices early on spares a lot of pain. This post was inspired by a real example where a piece of enterprise software was crashing after exhausting a 1 GB memory budget. RAM is a precious resource, especially in high-concurrency environments. Code accordingly.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>This post seeks to discourage two patterns in any PHP code that may ever need to scale: Both of these may seem like innocent, sensible additions at one time without it being clear to developers how grave the consequences are down the line. 
(skip to a dynamic-property benchmark) Starting Simple This is fine for an [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"closed","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[5],"tags":[15],"class_list":["post-2529","post","type-post","status-publish","format-standard","hentry","category-tech","tag-php"],"_links":{"self":[{"href":"https:\/\/babypengu.in\/blog\/wp-json\/wp\/v2\/posts\/2529","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/babypengu.in\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/babypengu.in\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/babypengu.in\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/babypengu.in\/blog\/wp-json\/wp\/v2\/comments?post=2529"}],"version-history":[{"count":21,"href":"https:\/\/babypengu.in\/blog\/wp-json\/wp\/v2\/posts\/2529\/revisions"}],"predecessor-version":[{"id":2553,"href":"https:\/\/babypengu.in\/blog\/wp-json\/wp\/v2\/posts\/2529\/revisions\/2553"}],"wp:attachment":[{"href":"https:\/\/babypengu.in\/blog\/wp-json\/wp\/v2\/media?parent=2529"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/babypengu.in\/blog\/wp-json\/wp\/v2\/categories?post=2529"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/babypengu.in\/blog\/wp-json\/wp\/v2\/tags?post=2529"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}