Published in blog / webtech

As you know from a recent post, I care a lot about the page speed of my websites. Good page speed is quite easy to achieve for sites built from static html like this one. But what about sites that are powered by a content management system like Joomla?

Joomla is great - but slow

For Joomla sites, one usually relies on plugins to make the site respond faster. Caching plugins and file mergers for css and js files usually help. But their impact is limited, since every request to a Joomla site runs through a long chain of steps before the final output is presented to the user: an initial php call, queries to an SQL database and a re-assembly of the page. This chain can take up to several seconds on low-cost webspace, even for highly optimized sites.

The workaround: use of static html files

To make my Joomla sites perform (almost) as static websites, I use a dirty but effective workaround made of a simple three-step procedure:

  1. Download all accessible pages from the website as static html files,
  2. Upload these static html pages back to the host server, and
  3. Use htaccess-redirects to let the server deliver the static html files upon request

The cool thing is: thanks to some tricks here and there, a lot of Joomla's dynamic functionality, like feeds and the built-in search, keeps working. So we get the best of both worlds: a fully functional Joomla with page load times below one second. Don't believe me? Check out an article I wrote back in 2013.

I believe my programming skills could be improved a lot. This is the reason why I never published the code needed to achieve this dramatic performance boost for Joomla sites. But I believe it is better to share this code in a somewhat functional state than never to share it at all. So please: use the following scripts at your own risk. Technically, I provide the code under the GPL v3.0 licence. So if you improve the code and would like to distribute it, you need to share it under the same licence.

The Code

Prerequisites

To use the code you need access to a linux machine that can execute shell scripts. This may even be the server of your webspace provider. Make sure you can run python scripts there as well. The scripts below copy files by scp, so you also need SSH access to your webspace and the sshpass tool. If you have no SSH access, please use FTP instead. With lftp, the scp calls in the code below could be replaced by something like

lftp ftp://${USER}@${HOST} -e "mirror -R static ${FOLDER} ; quit"

In addition you will need the .htaccess file of your Joomla installation. Yes, I am assuming an Apache server. For other web servers, htaccess converters are available online.

Please download the .htaccess file, save its content as htaccess_vorlage.txt and insert, shortly after the line "RewriteEngine On", a line with

#STATIC_REPLACEMENTS

After all the downloads and file manipulation to come, the redirects will be placed exactly at this point in the .htaccess that is uploaded to the server. Before you run the scripts on your Joomla site, you may want to try the pingdom tools to see how fast it loads in its current state. Usually you should see load times of about 2 to 5 seconds here.
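Since everything that follows depends on the marker sitting in the right place, a quick check of the template before the first run may save some trouble. The following is only a sketch: the function name check_template is made up for illustration and is not part of the scripts in this post, and it assumes the marker stands on a line of its own.

```python
def check_template(text, marker="#STATIC_REPLACEMENTS"):
    """Return True if `marker` is on its own line somewhere after 'RewriteEngine On'."""
    seen_engine_on = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.lower() == "rewriteengine on":
            seen_engine_on = True
        elif stripped == marker:
            # the marker only counts if the rewrite engine was switched on before it
            return seen_engine_on
    return False


# A tiny made-up template, just to show the call.
sample = "RewriteEngine On\n#STATIC_REPLACEMENTS\nRewriteBase /\n"
print(check_template(sample))  # True
```

To check the real file, read htaccess_vorlage.txt into a string and pass it to the function.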

Script for complete Download of a Joomla site as static html files

Below you find the main script. Simply copy its content, save it as a shell file (.sh) and make it executable (e.g. chmod +x mycoolscript.sh). Then fill in your Joomla website and hosting parameters at the top of the script.

#!/bin/bash
##################################
#  Required files: htaccess_vorlage.txt with '#STATIC_REPLACEMENTS' after RewriteEngine On
#  handle with care!
#
##################################

## Please include your parameters 
URL=        #your website url, e.g. http://my-awesome-site.tld
HOST=       #ssh url of your host, e.g. ssh.my-awesome-provider.com
USER=       #your ssh user, e.g. awesomeguy
FOLDER=     #your Joomla website's folder on the host server 
##

# From here on it's generic.

echo "Enter host password, please"
read -s pw

echo "Purely dynamic CMS goes online."
cp htaccess_vorlage.txt .htaccess
touch .htaccess
sshpass -p "$pw" scp .htaccess "$USER@$HOST:$FOLDER/.htaccess"

sleep 5

# We now work in the folder 'static' to which we download all of the content

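mkdir -p static
# abort if we cannot enter 'static', otherwise the rm below would wipe the current folder
cd static || exit 1
rm -rf ./*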

# File download - note that it is also css and js

echo "Downloading $URL"
wget -N -nH -q --mirror --reject='php,*format=*,png,jpg,pdf,1,PNG' -p --html-extension -e robots=off --base=./ -P ./ "$URL"

#insert straight away if you would like to maintain a certain url handled by Joomla
#rm joomla-should-handle-this-url.html

sleep 1

# getting rid of ".1.html"-files - duplicates by wget

find -L . -type f -name "*.1.html" | while IFS= read -r FNAME; do
    mv "$FNAME" "${FNAME%.1.html}.html"
done
echo "Please check if the html-files are proper etc.! If you are confident, press enter, otherwise abort operation! (Press Enter to proceed)"
read -s blub

# Now we make a list that will be used in our python script to process the html files

echo "Providing all static html files for further processing"
#provide list of html files for last python script, remove beginning ./
# the parentheses make '-type f' apply to all three name patterns, not only to '*.js'
find . -type f \( -iname '*.html' -o -iname '*.css' -o -iname '*.js' \) | sed 's|^\./||' > ../html-files.txt

sleep 1

cd ..

# Processing files and writing .htaccess
echo "Preparing html files and writing htaccess."
python write_htaccess.py

# Finally

echo "uploading files"
sshpass -p "$pw" scp -r static "$USER@$HOST:$FOLDER"

sshpass -p "$pw" scp .htaccess "$USER@$HOST:$FOLDER/.htaccess"

echo "Success?!"

In the above script you have seen a call to a file called write_htaccess.py. Its content follows in a moment. But first we should clarify one important aspect of the script above:

Why download CSS and JS as well?

CSS and JS files are downloaded by the script as well. Why? With suitable plugins like JCH Optimize you can merge these kinds of files. This comes in handy if you have a lot of such files to load. But the problem with these tools is that the files they generate and link to in the html pages expire some day. And if your static html files still point to expired CSS and JS files, your Joomla website might be seriously broken. So we make static copies of these files as well.
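To see which CSS and JS files a downloaded page actually depends on, the links can be pulled out of the html with a few lines of python. This is just a sketch using the standard html.parser module. The names AssetCollector and list_assets are made up for illustration, and the sample page is invented.

```python
from html.parser import HTMLParser


class AssetCollector(HTMLParser):
    """Collect the CSS and JS files a page links to."""

    def __init__(self):
        super().__init__()
        self.assets = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        # ignore a query string like '?v=2' when looking at the file extension
        href = (attrs.get("href") or "").split("?")[0]
        src = (attrs.get("src") or "").split("?")[0]
        if tag == "link" and href.endswith(".css"):
            self.assets.append(attrs["href"])
        elif tag == "script" and src.endswith(".js"):
            self.assets.append(attrs["src"])


def list_assets(html):
    """Return the CSS/JS references found in an html string, in document order."""
    collector = AssetCollector()
    collector.feed(html)
    return collector.assets


page = '<head><link rel="stylesheet" href="/media/a.css"><script src="/media/b.js?v=2"></script></head>'
print(list_assets(page))  # ['/media/a.css', '/media/b.js?v=2']
```

Every path in the returned list should have a counterpart in the 'static' folder. If one is missing, the page would still point to a file that only Joomla can provide.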

A Python script for html file manipulation and .htaccess preparation

Please save this python script as 'write_htaccess.py' in the same directory.

# -*- coding: utf-8 -*-
import sys
import re

htmlfilesfile = 'html-files.txt'
replacementstring = '#STATIC_REPLACEMENTS'
RewriteCond = 'RewriteCond %{QUERY_STRING} !format=feed\n'
startstring = '#################################################################\n#begin to match statics\n\n#<IfModule justadummymodule.c>\n'
endstring = '#</IfModule>\n\n##end static\n################################################################\n'

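# Helper functions. The German parameter names mean: Zielfile = target file,
# Vorlage = template, ZuErsetzen = pattern to replace, Ersetzung = replacement.

def replace_to_new_file(Zielfile,Vorlage,ZuErsetzen,Ersetzung):
    # read the template, substitute the pattern and write the result to the target file
    with open(Vorlage) as f:
        data = f.read()
    with open(Zielfile, 'w') as o:
        o.write( re.sub(ZuErsetzen,Ersetzung,data) )
    return 0

def replace_in_file(File,ZuErsetzen,Ersetzung):
    # the file is read completely before it is reopened for writing
    with open(File) as f:
        data = f.read()
    with open(File, 'w') as o:
        o.write( re.sub(ZuErsetzen,Ersetzung,data) )
    return 0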


with open(htmlfilesfile) as f:
    htmlfiles = f.readlines()
zeilennr = len(htmlfiles)

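# mark the html files in the header to see it straight away in the browser,
# and point the search form back to Joomla. CSS and JS files are left untouched.

for line in htmlfiles:
    name = line.rstrip('\n')
    if not name.endswith('.html'):
        continue
    replace_in_file('static/' + name,'<head>','<head>\n<!-- static version -->')
    # non-greedy match, so only the action attribute itself is replaced
    replace_in_file('static/' + name,'action=.*?metho','action="/component/search" metho')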

# prepare htaccess from htaccess_vorlage.txt

htmlfiles_htaccess = ''

for line in htmlfiles:
    name = line.rstrip('\n')
    if name.endswith('.html'):
        # the front page is matched by the empty path, any other page by its name without '.html'
        pattern = '' if name == 'index.html' else name[:-len('.html')]
        htmlfiles_htaccess += RewriteCond + 'RewriteRule ^' + pattern + '$' + '    ' + 'static/' + name + '\n'
    else:
        # css and js files are requested under their full name
        htmlfiles_htaccess += 'RewriteRule ^' + name + '$' + '    ' + 'static/' + name + '\n'

Ersetzung = startstring + htmlfiles_htaccess + endstring
replace_to_new_file('.htaccess','htaccess_vorlage.txt',replacementstring,Ersetzung)
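To get a feeling for what ends up in the .htaccess, here is a small stand-alone sketch that repeats the rule-building logic of the script for three invented file names. The function name build_rules and the file names are assumptions for illustration only. The RewriteCond line in front of each html rule is what keeps feed requests (format=feed) going to Joomla instead of the static copy.

```python
REWRITE_COND = "RewriteCond %{QUERY_STRING} !format=feed\n"


def build_rules(files):
    """Build rewrite rules for a list of paths relative to the 'static' folder."""
    rules = ""
    for name in files:
        if name.endswith(".html"):
            # front page: empty path; other pages: name without '.html'
            pattern = "" if name == "index.html" else name[: -len(".html")]
            rules += REWRITE_COND + "RewriteRule ^" + pattern + "$    static/" + name + "\n"
        else:
            # css and js files keep their full name
            rules += "RewriteRule ^" + name + "$    static/" + name + "\n"
    return rules


example = build_rules(["index.html", "blog/my-post.html", "media/site.css"])
print(example)
```

For these three names the sketch prints one guarded rule each for the front page and for blog/my-post, plus an unguarded rule for media/site.css.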

That's it.

If you now call the main script, ./mycoolscript.sh, your site should be served from static html files after some processing time and the necessary input. If you like, try the pingdom tools again. Does your Joomla website now load in clearly under one second (at least from a server close to you ;)? I hope this is the case and you are truly happy with the provided piece of software.
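If you prefer to measure from your own machine rather than with an online tool, a few lines of python are enough for a rough before-and-after comparison. This is only a sketch: measure_load_time and its fetch parameter are names made up for illustration, and it times just the html document itself, not the images, CSS or JS that a browser would load on top.

```python
import time
from urllib.request import urlopen


def _default_fetch(url):
    # download the complete response body
    with urlopen(url, timeout=30) as response:
        return response.read()


def measure_load_time(url, fetch=_default_fetch, runs=3):
    """Return the fastest of `runs` wall-clock timings (in seconds) for fetching url."""
    timings = []
    for _ in range(runs):
        start = time.perf_counter()
        fetch(url)
        timings.append(time.perf_counter() - start)
    return min(timings)


# Example call (commented out because it needs network access):
# print(measure_load_time("http://my-awesome-site.tld/"))
```

Taking the fastest of several runs reduces the influence of one-off network hiccups. Run it once with the purely dynamic .htaccess and once with the static redirects in place.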

Conclusions and Final Remarks

I don't know what the performance increase for your website is, or even whether the code works for you. I usually see the load times of Joomla websites decrease by a factor of 5 to 10, which is a huge increase in pagespeed.

PS.: You may also consider enabling compression and caching through your .htaccess:

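## Compression (the IfModule guard avoids a server error if mod_gzip is not installed)
<IfModule mod_gzip.c>
mod_gzip_on Yes
</IfModule>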

## CACHING
AddType font/otf .otf
<IfModule mod_expires.c>
ExpiresActive On
ExpiresByType image/jpg "access 1 week"
ExpiresByType image/jpeg "access 1 week"
ExpiresByType image/gif "access 1 week"
ExpiresByType image/ico "access 1 month"
ExpiresByType image/png "access 1 month"
ExpiresByType text/css "access 1 week"
ExpiresByType application/pdf "access 1 month"
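ExpiresByType text/x-javascript "access 1 month"
ExpiresByType application/javascript "access 1 month"
ExpiresByType text/javascript "access 1 month"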
ExpiresByType application/x-shockwave-flash "access 1 month"
ExpiresByType image/x-icon "access 1 year"
ExpiresByType font/otf "access 1 month"
ExpiresDefault "access 2 days"
</IfModule>
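Whether compression and caching are really active can be read from the response headers. The following sketch takes a plain mapping of header names to values and reports what it finds. The function name check_headers is made up for illustration, and "cacheable" here only means that an Expires header or a max-age value is present.

```python
def check_headers(headers):
    """Report whether compression and caching headers are present in a response."""
    lowered = {key.lower(): value for key, value in headers.items()}
    encoding = lowered.get("content-encoding", "").lower()
    cache_control = lowered.get("cache-control", "").lower()
    return {
        "compressed": encoding in ("gzip", "deflate", "br"),
        "cacheable": "expires" in lowered or "max-age" in cache_control,
    }


print(check_headers({"Content-Encoding": "gzip", "Cache-Control": "max-age=604800"}))

# Against a live site (commented out because it needs network access):
# from urllib.request import Request, urlopen
# request = Request("http://my-awesome-site.tld/", headers={"Accept-Encoding": "gzip"})
# with urlopen(request) as response:
#     print(check_headers(dict(response.headers.items())))
```

Note that a server only compresses if the client asks for it, which is why the live example sends an Accept-Encoding header.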