Million dollar site in CakePHP …

Is that possible? Yes, and it has been done.


— CakePHP is slow. I’ve seen benchmarks on “Hello world” apps, and Cake falls behind other frameworks.
— What’s your caching strategy?
— [crickets] …

  1. Cache content
    CakePHP comes with a built-in view caching mechanism. Granted, there is always a trade-off between real-time data and performance, but I’ve yet to come across a project where at least some content could not be cached. Even a five- or ten-minute cache can be quite helpful if you’ve got millions of hits a day on your site. I’ve described some strategies in a post a while back, so give it a read if you are after the implementation details.
    If you need something faster, consider adding a front-end cache solution, like Varnish or nginx as a reverse proxy. (If you have a choice, consider replacing Apache, which requires a lot of tweaking, with nginx as your web server; it is quite fast out of the box.)
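As a minimal sketch of the built-in view caching (CakePHP 1.3-era syntax; the controller and action names here are made up, and `Configure::write('Cache.check', true);` must be enabled in core.php):

```php
<?php
// app/controllers/posts_controller.php — illustrative controller.
class PostsController extends AppController {

    // The Cache helper is required for full-page view caching.
    var $helpers = array('Cache');

    // Cache the rendered "index" view for 10 minutes and "view" for 1 hour;
    // during that window the pages are served without hitting the DB.
    var $cacheAction = array(
        'index' => '10 minutes',
        'view'  => '1 hour'
    );
}
?>
```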

  2. Cache DB queries
    Optimizing queries is important. You should always use Containable and limit the fields being returned by each query. This optimization will only take you so far, however… and I would argue that before you go head-first into the nitty-gritty details of each query, you should invest in setting up memcached (nicely supported by Cake) to alleviate some load on your DB.
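A hedged sketch of the memcached setup, assuming the memcache PHP extension is installed and memcached is running locally (CakePHP 1.3-era syntax; the cache key, fields and limit are illustrative):

```php
<?php
// app/config/core.php: point Cake's Cache class at memcached.
Cache::config('default', array(
    'engine'   => 'Memcache',
    'duration' => '+10 minutes',
    'servers'  => array('127.0.0.1:11211')
));

// In a model: wrap an expensive find() so repeat hits skip the DB.
$posts = Cache::read('recent_posts');
if ($posts === false) {
    $posts = $this->find('all', array(
        'contain' => array('User' => array('fields' => array('username'))),
        'fields'  => array('Post.id', 'Post.title', 'Post.created'),
        'limit'   => 20
    ));
    Cache::write('recent_posts', $posts);
}
?>
```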

  3. Index your DB fields
    Do you have some JOINs? Do you perform searches on certain fields (e.g. “username”)? Then you’d better remember to index any fields in your DB that are used in a JOIN or being searched on. There are a few ways to properly use indexes, but one thing is for sure: without them, your DB and your app performance are going to suffer greatly.
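For instance (the table and column names below are illustrative, borrowing the users/user_profiles example):

```sql
-- Index a column that is frequently searched on:
CREATE INDEX idx_users_username ON users (username);

-- Index the foreign key used when JOINing profiles to users:
CREATE INDEX idx_user_profiles_user_id ON user_profiles (user_id);
```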

  4. De-normalize your DB structure, better yet use an appropriate DB system
    Speaking of JOINs, no matter how you slice it… they are costly for performance. Sometimes it helps to de-normalize your DB structure in order to avoid such expensive operations. An example would be a users table and a user_profiles table. I love to keep minimal information in the users table, but if I find that I keep JOINing user_profiles to get additional information, it is probably time to consider de-normalizing the data and moving whatever piece of info I need into the users table to avoid an extra JOIN. (Counter cache is another prime example of this.)
    That being said, if you find yourself de-normalizing your DB quite heavily, perhaps it is time to consider an alternative to an RDBMS. Of course, I am going to suggest MongoDB. Where applicable, it is quite alright to use a mixture of DB systems… always use the right tool for the right job (a simple, but powerful statement).
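The counter cache mentioned above is a minimal built-in form of de-normalization. A sketch, with made-up model names, assuming the posts table has an INT `comment_count` column that Cake keeps up to date:

```php
<?php
// With counterCache enabled, listing posts together with their comment
// counts needs no JOIN and no COUNT() query.
class Comment extends AppModel {
    var $belongsTo = array(
        'Post' => array('counterCache' => true)
    );
}
?>
```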

  5. Create read/write DB replicas
    Do evaluate your read and write queries. Do you have a ton of admin features which require complex find() operations? Offload them to a read replica, so that your users’ (front-end) write queries do not get in the way. Increasing performance through replication is a nice trick, but you should be cautious not to offload mission-critical reads to a replica, because its data might be a little behind your master server.
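A hedged sketch of separate read/write connections in CakePHP 1.x (host names and credentials are made up):

```php
<?php
// app/config/database.php
class DATABASE_CONFIG {
    var $default = array( // master: all writes go here
        'driver' => 'mysql', 'host' => 'db-master.example.com',
        'login' => 'app', 'password' => 'secret', 'database' => 'shop'
    );
    var $read = array(    // replica: offload heavy SELECTs here
        'driver' => 'mysql', 'host' => 'db-replica.example.com',
        'login' => 'app', 'password' => 'secret', 'database' => 'shop'
    );
}

// Later, in an admin controller or shell, point the model at the
// replica before running an expensive report:
// $this->Order->useDbConfig = 'read';
// $report = $this->Order->find('all', array(/* complex conditions */));
?>
```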

  6. Offload heavy tasks to a background process
    I do hear this pretty often… “I need to export data to an Excel file, and it is taking forever”. This is a perfect example of a job that can be offloaded to a background operation. There is simply no need for a user to sit in front of the screen and wait for X minutes for the Excel file to build and download. When a user requests an Excel report, add the task to your job queue manager and notify the user when the job is complete. (Have you looked at gearman?)
    — Well, I need to have this real-time.
    — Sorry, but this is not going to happen. Having a user sit and wait is already not “real-time”, IMO. Not to mention the unnecessary stress this kind of operation will put on your DB. (This kind of requirement is a perfect time to consider replication as well).
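A hedged sketch using PHP’s gearman extension, assuming a gearmand server on localhost; the job name and payload are made up for illustration:

```php
<?php
// In the controller: queue the export and return to the user immediately.
$client = new GearmanClient();
$client->addServer('127.0.0.1', 4730);
$client->doBackground('excel_export', json_encode(array(
    'user_id' => 42,
    'report'  => 'orders'
)));

// In a long-running worker process (e.g. a Cake shell):
$worker = new GearmanWorker();
$worker->addServer('127.0.0.1', 4730);
$worker->addFunction('excel_export', function ($job) {
    $params = json_decode($job->workload(), true);
    // ... build the Excel file, then notify the user it is ready ...
});
while ($worker->work());
?>
```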

  7. Use AJAX, when needed
    The whole point of AJAX was to minimize the number of heavy requests to the server. Imagine you have an e-commerce site, where the page of the most popular t-shirts (purchased by users in the last week) is cached… for a week. Makes sense: you only rebuild this list once a week, based on the updated data in the DB. For a week, this popular page on your site is served statically, just like good ol’ HTML.
    But there is a gotcha… the shopping cart info is also part of this page. Well, AJAX to the rescue. While the rest of the page is cached, the one little div with the shopping cart summary is easily updated by AJAX.
    Granted, this is a very simple example, but think about how well this applies to other situations where you need to mix and match static and dynamic content.
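The server side of that cart div could be as small as this sketch (controller, session key and field names are made up; the cached t-shirt page stays static while the AJAX call hits only this tiny action):

```php
<?php
class CartsController extends AppController {

    // Returns just the cart summary as JSON for the one dynamic <div>.
    function summary() {
        $this->autoRender = false; // no view to render, just JSON
        $cart = $this->Session->read('Cart');
        echo json_encode(array(
            'items' => count((array)$cart),
            'total' => isset($cart['total']) ? $cart['total'] : 0
        ));
    }
}
?>
```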

A few other things worth looking into: APC, hiphop-php, and the Yahoo! performance rules.

Overall, I suggest not getting too hung up on trying to squeeze milliseconds out of your PHP code, unless you’ve exhausted all other options. Spend your time on proper architecture and caching strategies. Avoid premature optimization, and do not trust “hello world” benchmarks when measuring something as complex as an entire framework.
Oh, and before you invest hours into setting up various caching and optimization tools, do not forget to run load and stress tests to help you identify bottlenecks early.

Would love to hear your experience with optimization, caching and improving performance.
