The author selected the Internet Archive to receive a donation as part of the Write for DOnations program.
GoAccess is a tool for monitoring web server logs in real time. It’s written in C and uses the popular ncurses library for its dashboard interface, which can be accessed directly from the command line.
This is great because you’re able to SSH into any web server you control and view or analyze relevant statistics quickly and securely. Apart from the command-line dashboard interface, it’s also capable of displaying the statistics in other formats such as HTML, JSON, and CSV, which you can use in other contexts or share with others.
GoAccess could also be a great alternative to client-side analytics tools depending on your needs. It analyzes your server logs directly, so you don’t need to load any additional scripts, and your data is completely under your control.
In this tutorial, you’ll install and configure GoAccess for Apache on an Ubuntu 20.04 web server. You’ll access the Apache log files with GoAccess before reviewing the modules available and navigation shortcuts on the command-line interface.
For this tutorial, you’ll need the following:
One Ubuntu 20.04 server. You can set it up by following the initial server setup for Ubuntu 20.04 tutorial, including a non-root user with sudo privileges and a firewall.
Apache installed by following How To Install Apache on Ubuntu 20.04.
Step 1 — Installing GoAccess
In this step you’ll install the GoAccess tool and its dependencies.
Start by ensuring that the package database and system are up to date:
- sudo apt update
- sudo apt full-upgrade
Now it’s time to install GoAccess. A version of the tool is available in the Ubuntu repos, but this is not usually the latest stable version. For example, the latest version of GoAccess at the time of writing is 1.4, while the version available from the Ubuntu 20.04 repos is 1.3.
To ensure that you have the latest stable version of GoAccess installed on your server, you can compile from source or use the official GoAccess repository on Ubuntu.
Method 1 — Compiling from source
First, install the dependencies required to compile GoAccess from source:
- sudo apt install libncursesw5-dev libgeoip-dev libtokyocabinet-dev build-essential
You install the following dependencies:
build-essential: installs many packages, including
gcc, the compilers for C, C++, and other programming languages, and
make, for building GoAccess from its makefile.
libncursesw5-dev: installs the ncurses library that GoAccess uses for its command-line interface.
libgeoip-dev: includes the necessary files for the GeoIP library.
libtokyocabinet-dev: provides database dependencies for higher performance.
Next, download the latest version of GoAccess from its official website with the following command:
- wget http://tar.goaccess.io/goaccess-1.4.tar.gz
Once the download completes, extract the archive with:
- tar -xzvf goaccess-1.4.tar.gz
Change into the newly unpacked directory like this:
- cd goaccess-1.4/
Run the configure script found inside this directory:
- ./configure --enable-utf8 --enable-geoip=legacy
The --enable-utf8 flag ensures GoAccess compiles with wide character support, while --enable-geoip=legacy enables geolocation support with the original GeoIP databases. You can replace legacy with mmdb to use the enhanced GeoIP2 databases instead. You can find other configuration options on the GoAccess website.
You’ll receive output similar to the following:
Output
. . .
Your build configuration:

  Prefix         : /usr/local
  Package        : goaccess
  Version        : 1.4
  Compiler flags : -pthread
  Linker flags   : -lnsl -lncursesw -lGeoIP -lpthread
  UTF-8 support  : yes
  Dynamic buffer : no
  Geolocation    : GeoIP Legacy
  Storage method : In-Memory with On-Disk Persitance Storage
  TLS/SSL        : no
  Bugs           : [email protected]
Next, run the make command to compile GoAccess using the makefile that the configure script generated:
- make
Finally, install the compiled program to the system:
- sudo make install
Ensure that the program was installed successfully by running:
- goaccess --version
You will receive the following output:
Output
GoAccess - 1.4.
For more details visit: http://goaccess.io
Copyright (C) 2009-2020 by Gerardo Orellana

Build configure arguments:
  --enable-utf8
  --enable-geoip=legacy
Method 2 — Using the Official GoAccess Repos
Another way to install GoAccess is by using the official GoAccess repository for Ubuntu. This method is preferable if you’d like GoAccess to be updated to a newer version automatically during system upgrades, without having to compile from source for each new release. You need to add the repository to your server first:
- echo "deb http://deb.goaccess.io/ $(lsb_release -cs) main" | sudo tee -a /etc/apt/sources.list.d/goaccess.list
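The lsb_release -cs command gets the release codename of your distribution, which is substituted into the repository line. That line is then piped to tee, which appends it to the /etc/apt/sources.list.d/goaccess.list file.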
With the repository in your sources list, you can now download the GPG key to verify the signature:
- wget -O - https://deb.goaccess.io/gnugpg.key | sudo apt-key --keyring /etc/apt/trusted.gpg.d/goaccess.gpg add -
Next, update the package database with the following command:
- sudo apt update
Finally, install GoAccess:
- sudo apt install goaccess
GoAccess is now installed on your Ubuntu server. In the next step, you’ll access and edit its configuration file so that you can make changes to how the program runs.
Step 2 — Editing the GoAccess Configuration
GoAccess comes with a configuration file where you can make permanent changes to the behavior of the program. You’ll edit this file to specify the time, date, and log format so that GoAccess knows how to parse the server logs.
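The configuration file can live in different places depending on how GoAccess was installed. The GoAccess documentation calls its directory %sysconfdir%, which can be, for example, /usr/local/etc/. To find out where the config file is located on your server, run the following command: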
- goaccess --dcf
Edit this config file using nano, substituting the path reported by the previous command if it differs:
- sudo nano /etc/goaccess/goaccess.conf
Note: If this file does not exist on the server, create it first and populate it with the contents of the goaccess.conf file on GitHub.
Many of the lines in the file are commented out. To enable an option, remove the first # character in front of it. Let’s enable the time-format setting for Apache first. This setting specifies the log-format time and allows GoAccess to parse any plain-text Apache log files that meet the supported formatting criteria.
# The following time format works with any of the
# Apache/NGINX's log formats below.
#
time-format %H:%M:%S
Next, you’ll uncomment the Apache date-format setting that specifies the log-format date:
# The following date format works with any of the
# Apache/NGINX's log formats below.
#
date-format %d/%b/%Y
Finally, uncomment the log-format setting. Several lines in the file set this option, and the one to uncomment depends on how your web server is set up. If you do not use virtual hosts, uncomment the following log-format line:
# NCSA Combined Log Format
log-format %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
Otherwise, if you have virtual hosts set up, uncomment the following line instead:
# NCSA Combined Log Format with Virtual Host
log-format %v:%^ %h %^[%d:%t %^] "%r" %s %b "%R" "%u"
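To see how the specifiers in the non-virtual-host format line up with a real entry, here is a rough sketch that splits one log line with awk. The sample line is made up for illustration, and awk is only a stand-in: GoAccess does its own parsing. The two fields that the format skips with %^ (identity and remote user) are simply not printed.

```shell
#!/usr/bin/env bash
# Hypothetical sample entry in NCSA Combined Log Format (not from a real server).
line='203.0.113.5 - - [10/Oct/2020:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326 "https://example.com/" "Mozilla/5.0"'

# Splitting on double quotes isolates the request, referrer, and user agent.
parsed=$(printf '%s\n' "$line" | awk -F'"' '{
  split($1, head, " ")               # host, identity, user, [date:time, zone]
  ts = substr(head[4], 2)            # drop the leading "["
  colon = index(ts, ":")             # first colon separates date from time
  split($3, resp, " ")               # status code and response size
  print "%h " head[1]
  print "%d " substr(ts, 1, colon - 1)
  print "%t " substr(ts, colon + 1)
  print "%r " $2
  print "%s " resp[1]
  print "%b " resp[2]
  print "%R " $4
  print "%u " $6
}')
printf '%s\n' "$parsed"
```

Each printed line pairs a specifier with the piece of the entry it would capture, which can help when checking whether your own logs match the format you uncommented.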
At this point, you can save the file and exit the editor. You are now ready to run the GoAccess program and analyze some Apache plain-text log files.
Step 3 — Accessing Apache’s Log Files with GoAccess
The Apache server grants access to your website and keeps an access log for all incoming HTTP traffic. These records, or log files, are stored on the system and can be a valuable source of information about your website’s usage and audience.
On Ubuntu, the Apache log files are stored in the /var/log/apache2 directory by default. To inspect the contents of this directory, run the following command:
- sudo ls /var/log/apache2
Sample output
access.log error.log other_vhosts_access.log
If your server has been running for a long time, you may find compressed .gz files in this directory containing past log files as a result of log rotation. The most recent logs are placed in an access.log file. For web servers with virtual hosts, you may have to cd into sub-directories from within the /apache2 directory to locate each host’s log files.
Let’s go ahead and run GoAccess against the Apache access logs to gain insight into what type of traffic is being handled by the web server. Run the following command to analyze your access.log file with GoAccess:
- sudo goaccess /var/log/apache2/access.log
This will launch the GoAccess command-line dashboard.
Note: If you see a Log Format Configuration prompt instead, it means that the changes you made to the GoAccess config file in the previous step are not taking effect. Ensure that the config file is in the right place and that you have uncommented the necessary settings.
As mentioned previously, you will sometimes have several compressed log files on a long-running web server. To run GoAccess on all these files without extracting them first, you can pipe the output of the zcat command to goaccess:
- zcat /var/log/apache2/access.log.*.gz | goaccess -a
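If it is unclear why this works, the following sketch shows zcat decompressing several rotated files into one continuous stream, which is what the program on the right-hand side of the pipe receives on standard input. The temporary directory and the two sample entries are invented for the demonstration, and wc -l stands in for goaccess.

```shell
#!/usr/bin/env bash
# Build two tiny compressed "rotated logs" in a throwaway directory (sample data only).
tmpdir=$(mktemp -d)
printf '%s\n' '203.0.113.5 - - [09/Oct/2020:10:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.68.0"' \
  | gzip > "$tmpdir/access.log.2.gz"
printf '%s\n' '198.51.100.7 - - [08/Oct/2020:11:30:00 +0000] "GET /about HTTP/1.1" 200 734 "-" "curl/7.68.0"' \
  | gzip > "$tmpdir/access.log.3.gz"

# zcat decompresses every matching file and writes the results back to back,
# so the consumer sees a single plain-text log with both entries.
line_count=$(zcat "$tmpdir"/access.log.*.gz | wc -l | tr -d ' ')
echo "lines in combined stream: $line_count"

# Clean up the throwaway directory.
rm -r "$tmpdir"
```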
Next you’ll learn how to quickly navigate through the dashboard interface with keyboard shortcuts.
Step 4 — Navigating the Terminal Dashboard
At the top of the dashboard is a summary of several key metrics. These include total requests for the reporting period, unique visitors, 404 not found errors, requested files, size of the parsed log file, HTTP referrers, name of the log source, time taken to process the log file, and more.
Below the top panel, you will find all the available modules which provide more details on the aforementioned metrics and other data points supported by GoAccess. To navigate the interface, use the following keyboard shortcuts:
TAB to move forward through the available modules and SHIFT+TAB to move backwards.
F5 to refresh the dashboard.
g to move to the top of the dashboard screen and G to move to the last item in the dashboard.
o or ENTER to expand the selected module.
j and k to scroll down and up within the active module.
s to display the sort options for the active module.
/ to search across all modules and n to move to the next match.
0-9 and SHIFT+0 to quickly activate the respective numbered module.
? to view the quick help dialog.
q to quit the program.
Let’s examine each of the available modules on the dashboard next. Each one has a number and a title, and an indication of the total number of lines present. The > character indicates the active panel, which is also reflected at the top of the dashboard.
Here’s a brief explanation of each of the panels. Each section below corresponds to the panel number and title in the program.
1 — Unique Visitors per Day
This panel displays the hits, unique visitors, and cumulative bandwidth for each reported date. Requests that share the same IP address, date, and user agent are counted as one unique visitor. The count includes web crawlers and spiders by default.
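As a rough illustration of that counting rule, the sketch below tallies hits and unique visitors from four made-up log entries using awk. It only mirrors the rule as stated here (IP address, date, and user agent); GoAccess's own implementation may differ in its details.

```shell
#!/usr/bin/env bash
# Invented sample entries: the first two share IP, date, and user agent.
sample_log=$(cat <<'EOF'
203.0.113.5 - - [10/Oct/2020:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2020:14:01:02 +0000] "GET /about HTTP/1.1" 200 734 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2020:14:05:00 +0000] "GET / HTTP/1.1" 200 512 "-" "curl/7.68.0"
198.51.100.7 - - [11/Oct/2020:09:00:00 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
EOF
)

summary=$(printf '%s\n' "$sample_log" | awk -F'"' '{
  split($1, head, " ")
  ip = head[1]
  day = substr(head[4], 2, index(head[4], ":") - 2)   # date part of [dd/Mon/yyyy:HH:MM:SS
  key = ip SUBSEP day SUBSEP $6                        # IP + date + user agent
  hits++
  if (!(key in seen)) { seen[key] = 1; visitors++ }
}
END { print hits " hits, " visitors " unique visitors" }')
echo "$summary"
```

The second entry adds a hit but not a visitor, while the third counts as a new visitor because its user agent differs even though the IP address and date are the same.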
2 — Requested Files (URLs)
This panel provides the statistics concerning the most highly requested non-static files on your web server. It displays the request path, HTTP protocol and method, unique visitors, number of hits, and cumulative bandwidth.
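3 — Static Requests
This panel displays the same metrics as the previous one, but for static files such as images, CSS, and JavaScript.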
4 — Not Found URLs (404s)
This panel displays the same metrics as panels 2 and 3, but for paths that were not found on the server (404s).
5 — Visitor Hostnames and IPs
This panel provides detailed information on the hosts that connect to your web server. You can find their IP address, the number of visits, and the amount of bandwidth consumed. This is a great way to identify who is eating up all your bandwidth and block them if necessary.
If you expand this panel by pressing o, you will see more info about each host such as its country of origin, city, and reverse DNS lookup result.
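To make the idea of bandwidth per host concrete, here is a small sketch that sums the response size for each client IP in a few invented log entries and lists the heaviest consumer first. It is an approximation of what the panel reports, not GoAccess's own code.

```shell
#!/usr/bin/env bash
# Invented sample entries; the second one is a single large download.
sample_log=$(cat <<'EOF'
203.0.113.5 - - [10/Oct/2020:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2020:13:58:10 +0000] "GET /video.mp4 HTTP/1.1" 200 900000 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2020:14:01:02 +0000] "GET /about HTTP/1.1" 200 734 "-" "Mozilla/5.0"
EOF
)

# Sum the response size (the %b field) and count hits per client IP,
# then sort numerically so the largest byte total comes first.
top_hosts=$(printf '%s\n' "$sample_log" | awk -F'"' '{
  split($1, head, " ")
  split($3, resp, " ")
  bytes[head[1]] += resp[2]
  hits[head[1]]++
}
END { for (ip in bytes) print bytes[ip], hits[ip], ip }' | sort -rn)
echo "$top_hosts"
```

Each output line shows total bytes, number of hits, and the IP address, so a host with few requests but a large transfer still rises to the top.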
6 — Operating Systems
This panel reports the different operating systems used by the hosts to connect to your web server. Expanding this panel will display specific versions of each operating system.
7 — Browsers
Similar to the previous panel, this reports the browsers used by each unique visitor to your web server and lists specific versions for each browser once expanded.
8 — Time Distribution
Here, you will find an hourly report for the number of hits, unique visitors, and bandwidth consumed. This is a great way to spot periods of peak traffic on your server.
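The hourly grouping can be sketched with awk as well. The entries below are invented, and the snippet only counts hits per hour; the panel itself also reports visitors and bandwidth.

```shell
#!/usr/bin/env bash
# Invented sample entries: two requests in hour 13, one in hour 14.
sample_log=$(cat <<'EOF'
203.0.113.5 - - [10/Oct/2020:13:55:36 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
198.51.100.7 - - [10/Oct/2020:13:58:10 +0000] "GET /about HTTP/1.1" 200 734 "-" "Mozilla/5.0"
203.0.113.5 - - [10/Oct/2020:14:01:02 +0000] "GET / HTTP/1.1" 200 512 "-" "Mozilla/5.0"
EOF
)

# The fourth whitespace-separated field looks like [10/Oct/2020:13:55:36,
# so splitting it on ":" leaves the hour in the second piece.
hourly=$(printf '%s\n' "$sample_log" | awk '{
  split($4, stamp, ":")
  per_hour[stamp[2]]++
}
END { for (h in per_hour) print h, per_hour[h] }' | sort)
echo "$hourly"
```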
9 — Virtual Hosts
This panel displays the virtual hosts parsed from the log file. It becomes active only if %v is included in the log-format configuration.
10 — Referrer URLs
The URLs that referred the visiting hosts to your web server are reflected here. This panel is disabled by default and can only be enabled by commenting out the ignore-panel REFERRERS line in the GoAccess config file, as in the following excerpt:
#ignore-panel VISIT_TIMES
#ignore-panel VIRTUAL_HOSTS
#ignore-panel REFERRERS
#ignore-panel REFERRING_SITES
11 — Referring Sites
This panel displays only the host part of each referrer, not the whole URL.
12 — Keyphrases
Here, the keywords used on Google search, Google cache, and Google translate that led to your website are reported. This panel is also disabled by default and must be enabled by commenting out the ignore-panel KEYPHRASES line in the config file:
#ignore-panel REFERRERS
#ignore-panel REFERRING_SITES
#ignore-panel KEYPHRASES
#ignore-panel STATUS_CODES
13 — HTTP Status Codes
This panel reflects the overall statistics for HTTP status codes returned by your web server when responding to a request. Expanding the panel will display the aggregated stats for each status code.
14 — Remote User (HTTP Authentication)
This panel displays the user ID of the person requesting a document on your server as determined by HTTP authentication. For documents that are not password protected, this part will be -. Note that this panel is only enabled if %e is part of the log-format configuration.
15 — Cache Status
This panel allows you to determine if a request is being cached and served from the cache. It is enabled if %C is part of the log-format variable, and the status could be, for example, MISS or HIT.
16 — Geo Location
This panel provides a summary of the geographical locations derived from visiting IP addresses. Expanding this panel will display the aggregated stats for each country of origin.
You’ve reviewed the panels available in the dashboard. Next, you’ll generate reports in different formats.
Step 5 — Generating Reports
Aside from displaying the data in the terminal, GoAccess also allows you to generate HTML, JSON, or CSV reports. Make sure that you’re in the home directory before running any of the commands in this section:
- cd ~
To output the report as static HTML, specify an HTML file as the argument to the -o flag. This flag also accepts filenames that end in .json and .csv:
- sudo goaccess /var/log/apache2/access.log -o stats.html
A stats.html file should now appear in your home directory. You can confirm this by listing the directory contents:
- ls
Output
goaccess-1.4 goaccess-1.4.tar.gz snap stats.html
You can copy this file to the home directory on your local machine using scp. Run this command from your local machine, not the remote server, replacing user and your_server_ip with your own values:
- scp user@your_server_ip:stats.html ~/stats.html
Once the file has been copied over, you can open it in your browser with the open command on macOS:
- open ~/stats.html
Or if you’re using a Linux distribution on your local machine:
- xdg-open ~/stats.html
You’ve generated an HTML report and viewed it in your browser.
In this article, we covered the GoAccess command-line tool and discussed how to use it for analyzing server logs. Although we only considered how GoAccess may be used with Apache logs, the tool also supports other log formats such as Nginx, Amazon S3, Elastic Load Balancing, and CloudFront.
To learn more, you can check the full GoAccess documentation or run man goaccess in your terminal.