I have three web sites that I use internally – they’re part of the development environment. One is a rails app that we use for tracking defects, the second is a mediawiki that we use for documentation, the third is gitweb. We’ve decided to do some travel, but still want to access that development environment over the internet.
First step was to harden the server that hosts this – which I documented a bit here. I also needed to set security on all these servers: the rails app already had a user logon; mediawiki needed changes to make it only accessible with a userid, following the instructions at the mediawiki site; and gitweb I put behind simple apache authentication using a username and password (we only have 2 users at the moment, so not too onerous).
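For reference, simple apache authentication of the kind I put in front of gitweb can be sketched roughly as below. This is an illustration rather than my exact configuration: the /gitweb path, the realm name and the password file location are all assumptions, so adjust them to your own setup.

```apache
# Hypothetical sketch: path, realm name and password file location are assumptions.
<Location "/gitweb">
    AuthType Basic
    AuthName "Development"
    # Password file created beforehand, e.g.: htpasswd -c /etc/apache2/gitweb.htpasswd someuser
    AuthUserFile /etc/apache2/gitweb.htpasswd
    # Any user listed in the password file may log in
    Require valid-user
</Location>
```

With only a couple of users, a flat password file like this is easy enough to maintain by hand.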
The next step was to proxy all these from a single web server, so that they all appear as if they were one web site (I only have one domain name). This was surprisingly hard, so I thought I’d write up some instructions for what I did. I’m running Debian Wheezy, so all these instructions are relevant for Debian and probably Ubuntu.
Firstly, the aim was to get the three sites all accessible from the same page. I created a directory called “Development”, and created an index page within that. The body of that index page looks like:
    <a href="gitweb/">gitweb/</a>
    <a href="wiki/">wiki/</a>
    <a href="rails/">rails/</a>
Nothing flash, but as you can see the aim is that we have three virtual subdirectories, each of those is going to call out to the server that hosts that particular function.
Next, we need to configure our apache to have the modules we need. There are a handful of them:
a2enmod proxy proxy_html proxy_connect proxy_http
Then we need to configure our front-end web server to proxy the sites. We’re using apache in what is known as reverse proxy mode. The difference between a forward proxy and a reverse proxy is that in a forward proxy the calling machine tells the proxy where it wants to go. This is usually used when you’re inside a corporate network and want to connect to the outside internet – you call the proxy server, tell it what web page you want to go to, and it gets it for you (along with logging your activity so you can be sacked later for looking at dodgy web sites). A reverse proxy is used when you want some subset of the pages on a particular apache host to actually be served from another server – the first apache is going to go and get those pages from another apache. Basically you’re making the pages on that other server look like they reside on the first server. Importantly, you’re not letting the calling browser ask you to retrieve arbitrary web sites – this isn’t an open proxy.
Firstly, we configure our sites to be proxied in our site file (found in sites-enabled or sites-available) as follows:
    # this is important to avoid creating an open proxy - when off it means reverse proxy
    ProxyRequests Off

    <Location "/Development/gitweb">
        ProxyPass http://git-server/gitweb
        ProxyPassReverse http://git-server/gitweb
    </Location>

    <Location "/Development/wiki">
        ProxyPass http://wiki-server/wiki
        ProxyPassReverse http://wiki-server/wiki
    </Location>

    <Location "/Development/rails">
        ProxyPass http://rails-server
        ProxyPassReverse http://rails-server
    </Location>
Then we restart apache, and you should find that you can now get to the home page of each of your applications. However, you’ll still have a problem where some of the links within the application don’t work – they’re absolute URLs that mediawiki or rails are giving out, and they don’t like being moved to be inside a directory. In order to fix these links we use mod_proxy_html, which will allow us to change the links to something more suitable.
The configuration for this looks like:
    ProxyRequests Off
    ProxyHTMLExtended On
    # LogLevel debug
    # ProxyHTMLLogVerbose On
    SetOutputFilter INFLATE;proxy-html;DEFLATE

    <Location "/Development/gitweb">
        ProxyPass http://git-server/gitweb
        ProxyPassReverse http://git-server/gitweb
        ProxyHTMLURLMap /gitweb/ /Development/gitweb/
    </Location>

    <Location "/Development/wiki">
        ProxyPass http://wiki-server/wiki
        ProxyPassReverse http://wiki-server/wiki
        ProxyHTMLURLMap /wiki/ /Development/wiki/
        ProxyHTMLURLMap ^/ /Development/wiki/
        ProxyHTMLURLMap http://wiki-server/wiki /Development/wiki
    </Location>

    <Location "/Development/rails">
        ProxyPass http://rails-server
        ProxyPassReverse http://rails-server
        ProxyHTMLURLMap ^/ /Development/rails/ R
    </Location>
Finally, some tricks and tips.
I spent about 3 hours staring at this attempting to get it to work – but nothing at all seemed to be happening. I finally gave up and switched to using mod_substitute instead. Which also didn’t work. But when I re-enabled mod_proxy_html with the exact same configuration file, it started working. The documentation isn’t good, and frankly something flaky was going on. So if you’re tearing your hair out, don’t be afraid to disable then re-enable. (And yes, I restarted Apache a number of times, so don’t think it’s that).
I also spent a long time trying to get the debug messages to work. I finally worked out that LogLevel appears in a number of places in the configuration – it’s in /etc/apache2/apache2.conf, and it’s also at the bottom of the site configuration file – and if any of them are set to warn, you don’t get debug messages.
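As a rough sketch of what turning on the debug output looks like in the site file: this simply uncomments the two logging lines from the configuration above, and assumes the Apache 2.2 and mod_proxy_html 3.x combination I’m running on Wheezy (other versions may handle logging differently).

```apache
# Sketch, assuming Apache 2.2 with mod_proxy_html 3.x.
# Goes in the site file alongside the proxy configuration. Check that no other
# LogLevel line (for example the one at the bottom of the same file, or the one
# in /etc/apache2/apache2.conf) is still set to warn.
LogLevel debug
ProxyHTMLLogVerbose On
```

Remember to set these back once things are working, as debug logging is very noisy.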
Hopefully this will be useful for someone trying to do the same thing I did. I note that I still have a couple of gremlins that I plan to iron out later, and come back and update this post when I do so.