Introduction

When I first started this blog back in 2008 I hosted it myself using an Arch Linux instance with Linode. For the blogging platform I used WordPress MU because WordPress was, and still is, one of the most popular blogging platforms. At that time the MU version was how multiple domains (subdomains) were supported on a single installation; it was later merged into standard WordPress. It was easy to use and administer. For a long time I was happy with this setup but at a certain point upkeep was taking too much of my time.

In December 2017 I decided to move to WordPress.com and let them handle all of the system maintenance. The idea was that it would give me more free time and I wouldn’t have to worry about security issues. Also, if I ran into a problem they have people on staff to help.

The only plugins I was using were ones provided by JetPack so I wasn’t going to lose any functionality. I made the switch, moved off of my Arch Linux instance, and shut it down.

WordPress.com: A Bad Decision

Using WordPress.com has proven to be a huge mistake. For domain mapping, WordPress.com requires you to use their name servers. This isn’t a problem but at the time I was using a subdomain (john.) for my blog and the main site was a landing page. There used to be multiple subdomains hosting different content but at that time the only thing I had left was this blog.

I wanted to have the main site be my blog but I still needed to maintain the subdomain because that’s what had been crawled by search engines. Any existing links would just redirect to the main site. Getting this set up proved challenging and the “Happiness Engineer” couldn’t help me with setting up DNS records to do this. I was hoping something like a CNAME could be used for mapping but the support person didn’t really understand what I was trying to do. Eventually, I gave up with them and ended up purchasing a second domain mapping for john. to map to the main site.

Mangled Posts

Over the past year I’ve been finding and correcting broken posts. People have been emailing me about it as they’ve run into issues trying to run some of the code I’ve posted. Whole sections of code were missing; symbols such as <, &, and > were just gone. Others were converted to entities. Some posts even had paragraphs or parts of sentences missing.

Images are another thing the import apparently had problems with. There are a handful of images that never got imported. It’s pretty random because some posts had many images and only one or two from the post are missing. Unfortunately, I don’t have backups from the self hosted instance from that far back.

What’s really strange is that images I never linked to in any posts were imported into WordPress.com. I have no idea how that’s possible because the image import is supposed to scan posts and pull down the referenced images. Meanwhile, there are posts where the image links are still present but the images themselves never made it over.

Recently, the Gutenberg editor fiasco happened. Again the “Happiness Engineer” wasn’t able to help me and gave me an unacceptable workaround for one of the many issues I was facing. They also ended the support session saying they needed to research further and would get back to me via email. I got an email saying they couldn’t reproduce the issue and that was the end of it. The screenshots and explanations I posted clearly show the issues.

For the past 10 years I’ve been using WordPress and for the majority of that time it’s been a good choice. However, times change and WordPress is no longer providing me the benefits it once did. I became so fed up with WordPress.com that I decided to move elsewhere. The whole point of using WordPress.com was to make things easier on me but that’s not what happened. I decided to move my blog back to a self hosted Linode, but this time using the static website generator Jekyll.

Self Hosting Rationale

I decided to go back to self hosting on my “own” server. I used Linode for years and I’ve been happy with their VPS service. I figured if I’m moving I might as well do it myself, because I had fewer issues over the 9 years I was hosting it myself than the past year with WordPress.com.

Some of the issues with WordPress.com are with WordPress itself so I didn’t want to go back to using it. When I was deciding where to move my blog last time I had narrowed my choices down to Jekyll or WordPress.com. I could either self host Jekyll or use GitHub Pages. Self hosting Jekyll would remove half of the system administration I needed to do because I wouldn’t need to worry about WordPress, PHP, or MySQL. I ended up going with WordPress.com because I would lose built in search with Jekyll.

The High Level Setup

Since Jekyll was in the running previously I decided to go with it this time. The big thing I’ve lost with the change is search. I can live without it, and if I really want I can set it up with a third party provider in the future.

I decided to use Arch Linux again for the same reason I did originally: it has very good support for rolling updates. With CentOS you essentially have to do a complete system rebuild when a new release comes out. I’ve experienced many major upgrades go sideways with Ubuntu, so that one is out of the running too. There aren’t too many other distros I’m comfortable using, and other than not having SELinux support in the base install there isn’t anything Arch Linux can’t do that the others can.

After 9 years of running the same Arch Linux server I feel like it’s a good option. Also, Arch Linux has a very minimal base installation so the focus is installing and enabling what I need, not disabling what I don’t need.

Jekyll

I’m using a handful of Jekyll plugins which give me all of the public functionality (minus search) I had with WordPress.

  • jekyll-feed
  • jekyll-archives
  • jekyll-paginate
  • jekyll/tagging

I’m also using jekyll-amp, but I have that in my ‘_plugins’ directory because I made a few changes. I wanted to keep the AMP paths the same as WordPress so I had to tweak the plugin.
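For reference, the plugin section of my “_config.yml” ends up looking something like this (a sketch; the exact key name varies between Jekyll versions):

plugins:
  - jekyll-feed
  - jekyll-archives
  - jekyll-paginate
  - jekyll/tagging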

Right now all of the paths are the same as they were with WordPress. No links were harmed in this move (I hope).

Getting everything out of WordPress wasn’t hard with its export functionality. Getting the posts into Jekyll’s format was very easy. I used Exitwp which worked beautifully. I then took a long time going through every post to fix everything I saw that was broken. This took about 6 hours in total.
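If you want to do the same, the Exitwp workflow is roughly this (a sketch based on the project’s README; it expects the WordPress export XML in its wordpress-xml directory and writes the converted posts under build/):

git clone https://github.com/thomasf/exitwp.git
cd exitwp
cp ~/wordpress-export.xml wordpress-xml/
python exitwp.py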

I’m not using an existing Jekyll theme and opted to do my own. I don’t recommend this because it was tedious and took a long time. But the site looks like I want and I shouldn’t have to touch it again for a while.

Linode

I decided to use a Linode Nanode instance which gives me 25 GB of storage and 1 GB of RAM. This is plenty for my low traffic blog. With everything set up, the OS and all files, I’m using 2.1 GB of disk space. That leaves me just shy of 23 GB, plenty of space. Current RAM shows 805 MB free, so I don’t think I’m in danger of running out of memory.

Since I needed to move my DNS away from WordPress.com I decided to use Linode’s DNS servers. They have a nice interface, and certbot works with their API for automating TLS certificate renewals.

I set up the DNS records but only changed the name servers once the server was nearly fully set up. I didn’t want people trying to go to the server before anything was being served.

Arch Linux Setup

Linode makes deployment very easy. I told it to deploy Arch Linux, and it spun up a base installation. I didn’t have to worry about disk partitioning, boot loaders, or pretty much anything in the Arch Linux installation guide. Linode will even auto configure static networking for your distro of choice.

Now let’s get into the nitty-gritty of how I configured the OS.

Time sync

The first thing I did was set up time syncing. Some of the things I want to set up are time dependent and it’s essential the system has the correct time.

# systemctl enable systemd-timesyncd.service
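After it’s started (the reboot below takes care of that), timedatectl will confirm the clock is actually being synchronized:

# timedatectl status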

Update

The base image Linode deploys isn’t up to date. So the next thing I did was update the OS and reboot. The reboot is necessary in case the kernel is upgraded: setting up a firewall won’t work if the module corresponding to the running kernel can’t be loaded.

# pacman -Syu
# reboot

Packages

With only a base image I needed to install the few packages I need to host the blog. I also wanted to remove some that I will never use.

# pacman -S vim nginx ack git libpam-google-authenticator rsync ufw wget certbot certbot-dns-linode
# pacman -R nano

git, ack, and wget are only installed to aid with setting up the system; they make it easier for me to pull down and edit config files. Once I’ve completed setup they could be removed.
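If I ever do remove them it’s a one liner (-Rs also removes their no longer needed dependencies):

# pacman -Rs ack git wget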

One of the requirements I have is 2-Factor authentication (time based OTP) for remote logins. The libpam-google-authenticator PAM module provides this. certbot is the application that works with Let’s Encrypt to provide TLS certificates for the web server. More on setting these up later.

Firewall

Obviously, this is a public server so I needed to secure it with a firewall. I decided to use ufw instead of writing my own iptables rules. ufw is easy to configure and handles IPv4 and IPv6 automatically.

# systemctl enable ufw.service
# ufw default deny
# ufw limit SSH
# ufw allow http
# ufw allow https
# ufw enable

I only need to expose the few services that will ever be accessed from the public internet. SSH is rate limited to mitigate brute force attacks. I’m not setting up something like fail2ban or SSHGuard because I’m using public key plus 2 Factor login for SSH. I feel rate limiting connections and having a secure login method is enough protection.

A quick reboot to verify the rules are applied at startup and this step is complete.
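After the reboot, ufw can report what’s actually active, which should match the rules above:

# ufw status verbose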

Environment

I pulled down my dot files and got them put in place. wget was used to pull down the files and git was needed to set up the vim plugins.

A quick logout and back in and I could actually get some work done.

Adding a user

It’s one thing to be the root user while setting up the system but at some point I need an actual user to run as.

# useradd -m -G users,wheel -s /bin/bash USER
# passwd USER

With the user created I can copy over my SSH public key from my laptop. I needed to use the host IP and not the domain name because I don’t have DNS pointing at the server yet.

ssh-copy-id USER@<host ip>

I also installed my dot file for the user because I’ll be running as this user the majority of the time.

2 Factor Authentication

Before setting up SSH and logging in as my new user I wanted to get 2 Factor authentication setup. This is where the Google Authenticator PAM module comes into play.

I set it up for both root and USER.

# google-authenticator

This asks you a series of questions about how it should be configured. I just turned on all the protection features it offered.
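If you’d rather skip the prompts, the same answers can be passed as flags. Something like this should be roughly equivalent to answering yes to everything (check google-authenticator --help for your version):

# google-authenticator -t -d -f -r 3 -R 30 -w 3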

/etc/pam.d/su and /etc/pam.d/su-l

A few PAM files needed to be changed to enable it.

First I added this as the second line of “/etc/pam.d/su” and “/etc/pam.d/su-l”.

auth      required  pam_google_authenticator.so

Now anytime su is used, root’s verification code will need to be entered. This doesn’t require restarting any service. Just su to USER, then run su, and it should prompt for the verification code in addition to the password. I recommend doing this with a second terminal already open before making the changes so they can be reverted if something goes wrong.
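A successful test looks something like this:

# su - USER
$ su
Password:
Verification code: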

/etc/pam.d/sshd

The “/etc/pam.d/sshd” file is a little different. In this one I added the authenticator module line but I also commented out the “system-remote-login” auth and password entries. This is necessary to prevent prompting for the user’s password; I’ll explain why in the SSH config section. The auth entry is what asks for the password and the password entry handles updating passwords when they’ve expired.

auth      required  pam_securetty.so     #disable remote root
auth      required  pam_google_authenticator.so
#auth      include   system-remote-login
account   include   system-remote-login
#password  include   system-remote-login
session   include   system-remote-login

SSH

Here are all the options I used. I’m only going to go into detail on the less common ones and the 2 Factor options.

/etc/ssh/sshd_config

Port 22
AddressFamily any
ListenAddress 0.0.0.0
ListenAddress ::
SyslogFacility AUTH
LogLevel INFO
LoginGraceTime 2m
PermitRootLogin no
StrictModes yes
MaxAuthTries 3
MaxSessions 3
PubkeyAuthentication yes
AuthorizedKeysFile  .ssh/authorized_keys
PasswordAuthentication no
PermitEmptyPasswords no
ChallengeResponseAuthentication yes
AuthenticationMethods publickey,keyboard-interactive:pam
UsePAM yes
PrintMotd no # pam does that
ClientAliveInterval 360
ClientAliveCountMax 0

Options worth mentioning

No one should be allowed to login as root so PermitRootLogin is set to no.

MaxSessions limits the number of open SSH sessions. I’m the only person using the server so 3 lets me have a few sessions open. That said, I really should be using screen if I need multiple terminals.

ClientAliveInterval sets a 6 minute idle timeout. If the session is idle for the specified number of seconds it will be closed by the server. ClientAliveCountMax set to 0 says to never send keep alive messages; I want the client to be the one to send data to say the session is still in use.

I want public key authentication enabled (PubkeyAuthentication) and password authentication (PasswordAuthentication) turned off.

2 factor authentication

A few options need to be set for 2 Factor authentication.

First, ChallengeResponseAuthentication needs to be enabled so ssh will prompt for the verification code. This allows for keyboard interactive prompting.

sshd needs to know that it must use public key first and PAM authentication second, and only if public key succeeds. This is handled by AuthenticationMethods set to “publickey,keyboard-interactive:pam”. Also, UsePAM needs to be set to “yes” in order for PAM to be used for the 2 factor code verification.

By using PAM keyboard interaction in AuthenticationMethods, the rules in the sshd PAM config will be used. Even with PasswordAuthentication disabled, ssh would still prompt for the user’s password; it’s not sshd that’s prompting, it’s PAM requesting it. By commenting out password auth and only having 2 factor enabled in the PAM config, it will only ask for the verification code. We don’t need to worry about having password auth disabled in the sshd PAM config because sshd requires public key auth before handing further authentication off to PAM.

nginx

This is the big one. nginx is the main program running and the heart of this server. I relied on this gist for setting up a secure configuration. It has many of the options commented and references the sources that recommend them. I do have some of my own tweaks though. For example, I wasn’t happy with the cipher list and removed some of the less secure ones.

dhparam

nginx by default uses a weak dhparam, so I generated a stronger one.

# openssl dhparam -out /etc/nginx/ssl/dhparam.pem 4096

Directory layout

I went with the common pattern of a “/etc/nginx/sites-available” and “/etc/nginx/sites-enabled” directory layout, where each file in sites-enabled is a symlink to a config file in sites-available.

“/srv/http/DOMAIN/” is used for content, where DOMAIN is a subdirectory for every domain being hosted. In my case I only have “/srv/http/nachtimwald.com/”.
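Setting up the layout is just a matter of creating the directories and symlinking the site config into place, along these lines:

# mkdir -p /etc/nginx/sites-available /etc/nginx/sites-enabled /etc/nginx/conf.d
# mkdir -p /srv/http/nachtimwald.com
# ln -s /etc/nginx/sites-available/nachtimwald.com /etc/nginx/sites-enabled/nachtimwald.com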

nginx.conf

The main configuration:

user http;
worker_processes auto;
worker_cpu_affinity auto;

error_log /var/log/nginx/error.log info;

events {
    worker_connections  1024;
    multi_accept on;
    use epoll;
}

http {
    server_tokens off;
    charset       utf-8;
    types_hash_max_size 4096;
    server_names_hash_bucket_size 128;
    include       mime.types;
    default_type  application/octet-stream;

    add_header X-Frame-Options DENY;
    add_header X-Content-Type-Options nosniff;
    add_header X-XSS-Protection "1; mode=block";
    add_header Content-Security-Policy "default-src 'self'; script-src 'self' https://cdn.ampproject.org; img-src 'self' data:; style-src 'self'; font-src 'self'; object-src 'none'";

    access_log  /var/log/nginx/access.log;

    sendfile       on;
    tcp_nopush     on;

    keepalive_timeout  65;
    gzip on;
    gzip_types
        text/plain
        text/css
        text/xml
        application/json
        application/xml
        application/xml+rss
        application/javascript;

    include sites-enabled/*;
}

I’ve set a number of security headers but the Content-Security-Policy is the most interesting. I’m hosting almost all of the content directly. JavaScript, for example, is all being served by me. I’m not using a CDN for Bootstrap or jQuery and I don’t see any need to have readers pull in external content.

The one exception to external JavaScript is AMP. I need to have AMP pages pull in the AMP JavaScript from the AMP Project. This is why you see that CDN in the “script-src” entry.

Another interesting directive is “img-src”. The Bootstrap nav bar uses an icon with 3 horizontal lines on a button when the screen is small. This button opens the menu with the page links. For some reason Bootstrap made this an inline SVG image, and the ‘data:’ option is necessary to allow it to display.

/etc/nginx/sites-available/nachtimwald.com

Even though I only have one site I still put it in a separate file because I might want to host something else; I have done so in the past. Or I might want a dedicated subdomain for something other than this blog in the future.

There are a total of 3 server blocks.

server {
    listen 80 default_server;
    listen [::]:80 default_server;
    server_name nachtimwald.com *.nachtimwald.com;
    return 301 https://nachtimwald.com$request_uri;
}

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name *.nachtimwald.com;

    include conf.d/ssl_nachtimwald.conf;

    return 301 https://nachtimwald.com$request_uri;
}

server {
    listen 443 ssl http2;
    listen [::]:443 ssl http2;
    server_name nachtimwald.com;

    error_log  /var/log/nginx/nachtimwald.com.error.log;
    access_log /var/log/nginx/nachtimwald.com.access.log;

    include conf.d/ssl_nachtimwald.conf;

    root /srv/http/nachtimwald.com;
    location / {
        index      index.html;
        error_page 404 /404.html;
    }
}

The first block (listen 80) redirects anything that comes in on port 80 to https. Normally you’d use return 301 https://$host$request_uri; however, I also want to convert the host to the main domain. I have a few additional domains, such as niw.cx, with DNS pointing to this server, so anything coming in needs to be converted to the main domain.

The “server_name” is set to ‘nachtimwald.com’ and ‘*.nachtimwald.com’ which implies this only converts nachtimwald.com requests, but don’t let that fool you. This block is also marked as the ‘default_server’, which is used when an unrecognized domain, such as niw.cx, is requested.

The next block is very similar but converts any subdomains of nachtimwald.com to the main domain. This requires a wildcard TLS certificate to work. A redirect over HTTPS will only be honored by a browser if the connection is secure, so a certificate covering every subdomain is necessary. Hence needing a wildcard one.

Finally, the server block for the actual blog. This is pretty simple and points to the files on disk.

conf.d/ssl_nachtimwald.conf

This is the TLS configuration for the domain. It’s in a separate file because both the HTTPS redirect and site blocks need these parameters.

ssl_certificate /etc/letsencrypt/live/nachtimwald.com/fullchain.pem;
ssl_certificate_key /etc/letsencrypt/live/nachtimwald.com/privkey.pem;

ssl_session_cache shared:SSL:50m;
ssl_session_timeout 1d;
ssl_session_tickets off;

ssl_dhparam /etc/nginx/ssl/dhparam.pem;

ssl_prefer_server_ciphers on;
ssl_protocols TLSv1.1 TLSv1.2;
ssl_ciphers 'ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256:ECDHE-ECDSA-AES128-SHA:ECDHE-RSA-AES256-SHA384:ECDHE-RSA-AES128-SHA:ECDHE-ECDSA-AES256-SHA384:ECDHE-ECDSA-AES256-SHA:ECDHE-RSA-AES256-SHA:DHE-RSA-AES128-SHA256:DHE-RSA-AES128-SHA:DHE-RSA-AES256-SHA256:DHE-RSA-AES256-SHA:ECDHE-ECDSA-DES-CBC3-SHA:ECDHE-RSA-DES-CBC3-SHA:EDH-RSA-DES-CBC3-SHA:DES-CBC3-SHA:!DSS';
ssl_ecdh_curve secp384r1;

resolver 8.8.8.8 8.8.4.4;
ssl_stapling on;
ssl_stapling_verify on;
ssl_trusted_certificate /etc/letsencrypt/live/nachtimwald.com/fullchain.pem;

add_header Strict-Transport-Security "max-age=31536000; includeSubdomains; preload";

The Strict-Transport-Security header must be in the TLS configuration and cannot be set globally in the http block! This tells the browser to only connect via HTTPS. If a browser connects via HTTP on port 80 it will evaluate this directive before the redirect and not process the redirect. I’m not sure about a good way to have a redirect on port 80 while still telling browsers they should only use HTTPS.

Single site config option

I’m using a wildcard certificate and a subdomain to domain redirect, but not everyone wants to do that, especially considering Let’s Encrypt wildcard certificates require DNS verification.

If you don’t want to use a wildcard certificate and don’t want 443 subdomain redirects, then remove the 443 redirect block and add the following file, referenced with an include in the site block.

conf.d/letsencrypt.conf

location ^~ /.well-known/acme-challenge/ {
  allow all;
  root /var/lib/letsencrypt/;
  default_type "text/plain";
  try_files $uri =404;
}

Also create the directory “/var/lib/letsencrypt/.well-known/acme-challenge/”. This will allow certbot to verify the site using a file on the web server (the webroot method). I don’t want to use the nginx auto config plugin because I don’t want my config changed.
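Creating it is a single command:

# mkdir -p /var/lib/letsencrypt/.well-known/acme-challenge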

Again, I do not have this configured because I’m using a wildcard certificate for subdomain redirection.

Enable and start nginx

Finally we need to get nginx up and running.

# systemctl enable nginx
# systemctl start nginx

At this point nginx will refuse to start. If you run the config validation:

# nginx -t

It will complain that files (the TLS certificates) are missing. Right now you either need to disable the TLS support and redirects, or generate a self signed certificate. This is temporary until real certificates can be generated by certbot.
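If you go the self signed route, something like this works. I’d generate it outside of “/etc/letsencrypt” and temporarily point ssl_certificate and ssl_certificate_key at it, so certbot can create its own directories later (the paths here are just examples):

# openssl req -x509 -nodes -newkey rsa:2048 -days 30 -keyout /etc/nginx/ssl/temp.key -out /etc/nginx/ssl/temp.crt -subj "/CN=nachtimwald.com"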

However, realize that at this point DNS needs to point to the server, especially if you’re using the webroot method where Let’s Encrypt will try to read a file from the web server.

TLS (Certbot)

Setting up certbot is straightforward.

It will ask you a variety of questions the first time you run it. You can also use --dry-run to see what it will do without having it make any changes.

Note: Let’s Encrypt only supports wildcard certificates using DNS verification.

Generate the certificate

Before using the Linode DNS plugin you need to get an API key. Log into Linode and under account generate an API key with read and write access to domains. The key will only be allowed to make changes to DNS.

It’s very important to get a key from the old interface. Do not generate a key with the new cloud manager. The certbot Linode DNS plugin uses the v3 API interface. The classic Linode interface generates v3 keys and the cloud manager generates v4 keys; a v4 key cannot be used with the v3 API. The v3 API is deprecated, so hopefully the plugin will be updated to the new v4 API soon.

Then the plugin needs to be configured. I put the plugin ini file at “/etc/letsencrypt/linode.ini”. Be sure to set the permissions to 600 because the file contains a sensitive API key.
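The ini file itself is just the API key; for the certbot-dns-linode plugin the option is dns_linode_key (the value below is a placeholder):

dns_linode_key = 0123456789abcdef0123456789abcdef01234567

Then lock it down:

# chmod 600 /etc/letsencrypt/linode.ini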

Next I generated the certificate.

certbot certonly --email domain@nachtimwald.com --dns-linode --dns-linode-credentials /etc/letsencrypt/linode.ini -d nachtimwald.com -d "*.nachtimwald.com"

Linode applies DNS record changes every 15 minutes. The challenge record changes with every certificate request, so there is a roughly 20 minute delay when running this command while it ensures the record change has been applied. If you have session timeouts enabled with SSH, either use something like screen or press space every few minutes so there is activity and the session isn’t closed.

If using webroot

Webroot is a bit simpler to use because there are no API keys.

certbot certonly --email domain@nachtimwald.com --webroot -w /var/lib/letsencrypt/ -d nachtimwald.com

The -d option can be specified multiple times for multiple domains. However, each of those domains has to be served by this server: DNS has to point to the server and there has to be a sites-enabled config for each one.

Auto renew

Let’s Encrypt certificates have a short life; they expire after 90 days. Thankfully certbot has a renew option which will essentially run the same commands used to generate the original certificate, without interaction, and pull down new certificates.

First a systemd service file needs to be created. Also, since nginx is being used, a deploy hook is needed to reload nginx so the new certificate will be picked up. The hook will only run if certificates are actually updated.

/etc/systemd/system/certbot.service

[Unit]
Description=Let's Encrypt renewal

[Service]
Type=oneshot
ExecStart=/usr/bin/certbot renew --quiet --agree-tos --deploy-hook "systemctl reload nginx.service"

/etc/systemd/system/certbot.timer

Let’s Encrypt recommends checking for expiring certificates twice a day. When certbot runs it checks the expiration and won’t do anything unless a certificate is within the renewal window (30 days before expiry by default). So this isn’t going to constantly go out to Let’s Encrypt; it will only do so when the certificate is about to expire and actually needs to be renewed.

[Unit]
Description=Twice daily renewal of Let's Encrypt's certificates

[Timer]
OnCalendar=0/12:00:00
RandomizedDelaySec=1h
Persistent=true

[Install]
WantedBy=timers.target

Finally, the timer needs to be enabled and started.

# systemctl enable certbot.timer
# systemctl start certbot.timer
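You can confirm the timer is scheduled with:

# systemctl list-timers certbot.timer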

rsync

On my computer, after generating my Jekyll site, I use rsync to push the files to the server.

rsync -r --delete --checksum -e ssh _site/* USER@nachtimwald.com:/srv/http/nachtimwald.com/

Oftentimes I blow away the Jekyll build to be sure development builds don’t interfere with production ones. This changes the timestamp on all of the files. I’m using the --checksum option which tells rsync not to use the file time and size, and instead use a calculated checksum to determine if a file has changed.
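So the whole publish step ends up being something along these lines (assuming a Bundler managed Jekyll install):

bundle exec jekyll clean
bundle exec jekyll build
rsync -r --delete --checksum -e ssh _site/* USER@nachtimwald.com:/srv/http/nachtimwald.com/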

Conclusion

Overall I’m happy with my new setup. Everything appears to be working and with such a simple setup I don’t think I’ll have to spend much time with system administration.

Getting everything set up properly was a very time consuming process but thankfully I don’t have to do it again anytime soon. And I have detailed notes on how I set up the system in case I ever do. If I can get 9 years out of this Arch Linux install (applying regular updates) I’ll be very happy.

Moving my blog back to Arch Linux on a Linode instance feels like meeting an old friend. Things have changed, but it’s still familiar and pleasant. I shouldn’t have moved off of this setup a year ago.