Web Log Analysis Tutorial – Lesson 3: Mining Gold with Filters

Table of Contents

  1. Introduction
  2. Why should I filter data?
  3. Hit Filter & Visit Filter
  4. Wildcards
  5. How to exclude spiders and bots data?
  6. How do I exclude my internal traffic from reports?

I. Introduction

Filters hold the key to unlocking the power of Nihuo Web Log Analyzer. Understanding how filters work will help you get the most out of your reports. Filters let you limit the scope of the analysis to specific parts of your site, so that reports contain only the most relevant information and are easier to read.

II. Why should I filter data?

Filtering certain data out of your reports is important for keeping them accurate. If you want to measure the performance of a promotion plan and you do not set up any filters, all the visits from your own web team will skew the reported data. It is generally a good idea to filter out as many known people and domains as possible so that the reported data is as accurate as possible.

You can also create additional profiles for the same website, each with its own filters, so that only the selected traffic appears in the reports of that profile.

III. Hit Filter & Visit Filter

Nihuo Web Log Analyzer provides two filter types: Hit Filter and Visit Filter.

Hit filters include or exclude raw data generated by individual actions on a web site.

Visit filters include or exclude all the data in a visitor session.

Multiple filters can be combined in a profile using the boolean operators AND, OR and NOT.
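To make the difference between the two filter types and the boolean combination more concrete, here is a minimal Python sketch. It is not how Nihuo Web Log Analyzer implements filters; the data layout and the helper names (`requested_file`, `return_code`, `AND`, `OR`, `NOT`, `apply_hit_filter`, `apply_visit_filter`) are assumptions made purely for illustration.

```python
from fnmatch import fnmatchcase

# Illustrative data: two visits, each a list of hits (requested file, return code).
visits = {
    "visit-1": [("/index.html", 200), ("/admin/login.php", 200), ("/order.html", 200)],
    "visit-2": [("/index.html", 200), ("/missing.html", 404)],
}

# Hit-level predicates (hypothetical helpers, one per filter type).
def requested_file(pattern):
    return lambda hit: fnmatchcase(hit[0], pattern)

def return_code(code):
    return lambda hit: hit[1] == code

# Boolean operators for combining predicates.
def AND(*preds):
    return lambda hit: all(p(hit) for p in preds)

def OR(*preds):
    return lambda hit: any(p(hit) for p in preds)

def NOT(pred):
    return lambda hit: not pred(hit)

def apply_hit_filter(pred):
    """Keep or drop each hit on its own."""
    return {v: [h for h in hits if pred(h)] for v, hits in visits.items()}

def apply_visit_filter(pred):
    """Keep a whole visit (all of its hits) if any hit in it matches."""
    return {v: hits for v, hits in visits.items() if any(pred(h) for h in hits)}

# Hit filter: successful requests that are not under /admin/.
hit_result = apply_hit_filter(AND(return_code(200), NOT(requested_file("/admin/*"))))

# Visit filter: visitors who accessed /order.html.
visit_result = apply_visit_filter(requested_file("/order.html"))
```

In this sketch the hit filter removes individual hits from both visits, while the visit filter keeps visit-1 in full and drops visit-2 entirely.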

Here is the list of all currently available filters and how Nihuo Web Log Analyzer categorizes them.

1. Hit Filter

  • Advertising
  • Agent
  • Authenticate User
  • Browser
  • Client Host Country
  • Client Host Domain
  • Client Host IP
  • Cookie
  • Day Of Week
  • File Type
  • HTTP Method
  • OS
  • Referrer
  • Requested File
  • Return Code
  • Spider
  • Stolen Object
  • Time
  • URL Parameter
  • Virtual Domain

2. Visit Filter

  • Visitors with specified entry page
  • Visitors with specified exit page
  • Visitors who came from specified referrer
  • Visitors who accessed specified file
  • Visitors who accessed specified file type
  • Visit Depth
  • Visitors who came from specified search phrase

For step-by-step instructions on creating filters, please refer to http://loganalyzer.net/tutorial.html#filter.

IV. Wildcards

Nihuo Web Log Analyzer supports wildcards in filter parameters.

Here are the wildcards supported by Nihuo Web Log Analyzer:

Wildcard   Matches
?          any single character (exactly one)
*          zero or more characters (any characters)

1. An example using *

Let’s say you want to exclude all files and subdirectories below /admin/ from reports.

Just create an exclude Requested File hit filter and enter the following parameter:

/admin/*

2. An example using ?

Let’s say you have several files on your website:

file1.htm

file2.htm

file3.htm

file35.htm

Let’s say we use this wildcard file name:

file?.htm

The ? matches a single character. The wildcard file name above means “match any file name that starts with file, followed by a single character, followed by the .htm extension”. This wildcard will select:

file1.htm

file2.htm

file3.htm

… but it will NOT select:

file35.htm

because it has two characters, instead of one, between file and .htm.

But if you specify the wildcard:

file??.htm

then only

file35.htm

will be selected, because the wildcard file name specifies that there must be exactly 2 characters between file and .htm.

For more detailed information about wildcards, please refer to http://www.loganalyzer.net/log-analysis-tutorial/how-to-use-wildcards.html
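If you want to try these patterns outside the tool, Python's standard `fnmatch` module uses the same meaning for `?` and `*`, so it can serve as a quick sandbox. This is only a sketch for experimenting: `fnmatch` also understands `[...]` character sets, which the table above does not mention, and the helper name `select` is made up for this example.

```python
from fnmatch import fnmatchcase

files = ["file1.htm", "file2.htm", "file3.htm", "file35.htm"]

def select(pattern, names):
    """Return the names that match the wildcard pattern (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]

one_char = select("file?.htm", files)    # ['file1.htm', 'file2.htm', 'file3.htm']
two_chars = select("file??.htm", files)  # ['file35.htm']

# The /admin/* example: * also matches zero characters and subdirectories.
paths = ["/admin/", "/admin/users/list.php", "/index.html"]
under_admin = select("/admin/*", paths)  # ['/admin/', '/admin/users/list.php']
```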

V. How to exclude spiders and bots data?

If you do not want spider and bot traffic to appear in your reports, you can use the Spider hit filter to filter out visits from spiders and bots.

To exclude spiders and bots:

  1. Right-click the profile and select Edit from the menu.
  2. Select the Hit Filter page.
  3. Click the Template button and select Exclude all spiders from the menu.
  4. Click the OK button.
  5. Clear the database (right-click the profile and select Clear database from the menu).
  6. Re-analyze (right-click the profile and select Analyze from the menu).

You may also use an Agent filter to filter out visits from particular spiders and bots that Nihuo Web Log Analyzer does not recognize.

For example, to exclude the agent string “Microsoft Data Access Internet Publishing Provider DAV 1.1”:

  1. Right-click the profile and select Edit from the menu.
  2. Select the Hit Filter page.
  3. Click the And button and select Agent from the menu.
  4. Enter "Microsoft Data Access Internet Publishing Provider DAV 1.1" (including the double quotes).
  5. Click the OK button.
  6. Right-click the filter node and select Not from the context menu.
  7. Click the OK button.
  8. Clear the database.
  9. Re-analyze.
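Conceptually, the two steps above amount to dropping every hit whose user agent is either a known spider or one of the extra agent strings you listed. The following Python sketch illustrates that idea only; the marker list is a small, incomplete set of examples, and the names `KNOWN_SPIDER_MARKERS`, `EXTRA_EXCLUDED_AGENTS` and `is_excluded` are hypothetical rather than part of the product.

```python
# Substrings that identify a few well-known crawlers (illustrative, incomplete).
KNOWN_SPIDER_MARKERS = ("Googlebot", "bingbot", "Baiduspider")

# The extra agent string excluded in the example above.
EXTRA_EXCLUDED_AGENTS = {"Microsoft Data Access Internet Publishing Provider DAV 1.1"}

def is_excluded(agent):
    """Return True if a hit with this user agent should be left out of reports."""
    if agent in EXTRA_EXCLUDED_AGENTS:
        return True
    lowered = agent.lower()
    return any(marker.lower() in lowered for marker in KNOWN_SPIDER_MARKERS)

agents = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) Firefox/115.0",
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Microsoft Data Access Internet Publishing Provider DAV 1.1",
]
kept = [a for a in agents if not is_excluded(a)]  # only the Firefox agent remains
```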

VI. How do I exclude my internal traffic from reports?

If you want to exclude internal traffic from appearing in your reports, you can filter out a specific IP address or a range of IP addresses. You can also use cookies to filter out visits from particular users. We’ll explain how below.

To exclude by IP address:

  1. Right-click the profile and select Edit from the menu.
  2. Select the Hit Filter page.
  3. Click the And button and select Client Host IP from the menu.
  4. Enter the IP address or IP range to exclude and click the OK button.
  5. Right-click the filter node and select Not from the context menu.
  6. Click the OK button.
  7. Clear the database.
  8. Re-analyze.
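Before entering a range, it can help to check which of your logged client addresses it would actually cover. Here is a small Python sketch using the standard `ipaddress` module. The networks shown are placeholders (a private range and a documentation range), the CIDR notation is Python's and not necessarily the syntax the Client Host IP filter expects, and `is_internal` is a made-up helper name.

```python
import ipaddress

# Hypothetical internal ranges; replace them with your own office or VPN networks.
INTERNAL_NETWORKS = [
    ipaddress.ip_network("192.168.0.0/16"),
    ipaddress.ip_network("203.0.113.0/24"),  # documentation range used as a stand-in
]

def is_internal(client_ip):
    """Return True if the client IP falls inside one of the internal networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in INTERNAL_NETWORKS)

client_ips = ["192.168.1.20", "198.51.100.7", "203.0.113.45"]
external = [ip for ip in client_ips if not is_internal(ip)]  # ['198.51.100.7']
```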

To exclude traffic by cookie content:

If your internal computers use dynamic IP addresses, you can instead use a JavaScript function to set a cookie on them and then filter out all visitors that carry this cookie, so that they do not appear in your reports:

  1. Create a new page on your domain, containing the following code:
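    <script>
    // Set a cookie that expires after nDays days (defaults to 1 day).
    function SetCookie(cookieName, cookieValue, nDays) {
      var today = new Date();
      var expire = new Date();
      if (nDays == null || nDays == 0) {
        nDays = 1;
      }
      expire.setTime(today.getTime() + 3600000 * 24 * nDays);
      document.cookie = cookieName + "=" + encodeURIComponent(cookieValue) +
        "; expires=" + expire.toUTCString();
    }

    // Assumption: the original snippet never calls the function. The cookie
    // name "internal_user" and the 365-day lifetime are placeholders;
    // "test_value" is the value that the Cookie filter in step 6 looks for.
    SetCookie("internal_user", "test_value", 365);
    </script>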

    (Please note: make sure the cookie field is written to your log files.)

  2. To set the cookie, visit your newly created page from every computer that you would like to exclude from your reports.
  3. Launch Nihuo Web Log Analyzer, right-click the profile and select Edit from the menu.
  4. Select the Hit Filter page.
  5. Click the And button and select Cookie from the menu.
  6. Enter “test_value” and click the OK button.
  7. Right-click the filter node and select Not from the context menu.
  8. Click the OK button.
  9. Clear the database.
  10. Re-analyze.


Web Log Analysis Tutorial – Lesson 2: Basic concept of web log analysis

Table of Contents

  1. Hits and Visits
  2. Page views
  3. Bandwidth
  4. Web Spider
  5. Stolen object
  6. Unique Visitors
  7. Session
  8. Referrer
  9. Bounce rate

I. Hits and Visits

Every request for a file on the web server is recorded as a log entry and counted as a “hit”. This can include pages, images, animations, audio, video, downloads, PDF or Word documents or anything else that you allow visitors to access. When a web browser loads a page, it also loads all the components referenced by that page. For example, if a web page contains 5 images, a visit to that page will generate 6 hits on the web server: one hit for the web page and 5 hits for the images.

A unique visitor is identified by IP address or cookie. By default, a visit (session) ends when the visitor has been inactive for more than 30 minutes, so one unique visitor may visit your web site twice and be reported as two visits.

If the visitor leaves the web site and comes back more than 30 minutes later, Nihuo Web Log Analyzer reports 2 visits. If the visitor comes back within 30 minutes, Nihuo Web Log Analyzer still reports 1 visit.
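The 30-minute rule can be sketched in a few lines of Python. This is an illustration of the idea, not the tool's actual algorithm: it assumes you already have the hit timestamps of one visitor, it treats a gap strictly longer than the timeout as the start of a new visit, and the names `SESSION_TIMEOUT` and `count_visits` are invented for the example.

```python
from datetime import datetime, timedelta

SESSION_TIMEOUT = timedelta(minutes=30)

def count_visits(hit_times):
    """Count the visits of one visitor, given the timestamps of their hits."""
    visits = 0
    previous = None
    for t in sorted(hit_times):
        # The first hit, or a gap longer than the timeout, starts a new visit.
        if previous is None or t - previous > SESSION_TIMEOUT:
            visits += 1
        previous = t
    return visits

day = datetime(2024, 1, 1)
short_break = [day.replace(hour=9, minute=0), day.replace(hour=9, minute=20)]
long_break = [day.replace(hour=9, minute=0), day.replace(hour=9, minute=45)]

visits_short = count_visits(short_break)  # 1: came back within 30 minutes
visits_long = count_visits(long_break)    # 2: came back after more than 30 minutes
```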

II. Page views

A page is any file or content delivered by a web server that would generally be considered a web document. This includes HTML pages (.html, .htm, .shtml) and script-generated pages (.cgi, .asp, .cfm, etc.). Image files (.jpeg, .gif, .png), JavaScript (.js) and style sheets (.css) are generally not considered to be pages.

A page view (PV), or page impression, is a request to load a single page of a web site. On the World Wide Web, a page request typically results from a visitor clicking a link on another HTML page that points to the page in question. This contrasts with a hit, which is a request for any file from a web server. There may therefore be many hits per page view, since a page can be made up of multiple files.
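As a rough illustration of the difference, the following Python sketch counts hits and page views for the example from the Hits and Visits section (one page with five images). The set of page extensions simply follows the examples given above; a real analyzer is likely to use a longer, configurable list and to handle directory URLs such as `/`, which this sketch does not. The names `PAGE_EXTENSIONS` and `is_page` are assumptions.

```python
import os

# Extensions treated as pages, following the examples in the text above.
PAGE_EXTENSIONS = {".html", ".htm", ".shtml", ".cgi", ".asp", ".cfm"}

def is_page(path):
    """Return True if the requested path has one of the page extensions."""
    ext = os.path.splitext(path)[1].lower()
    return ext in PAGE_EXTENSIONS

# One page plus the five images it references.
requests = ["/gallery.html"] + [f"/img/photo{i}.jpg" for i in range(1, 6)]

hits = len(requests)                                 # 6
page_views = sum(1 for r in requests if is_page(r))  # 1
```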

III. Bandwidth

A measure (in kilobytes of data transferred) of the traffic on a site. If you are billed for bandwidth usage on a monthly basis, the General Statistics report gives an estimate of the amount of bandwidth your web site used.

IV. Web Spider

A web spider, also known as a crawler or robot, is a program used by search engines to scan web pages across the internet for inclusion in the search engine’s index. All activity caused by web spiders is also recorded in web log files.

V. Stolen object

The Stolen Object report reveals cases in which your images and other non-page objects have been embedded in, or directly linked to by, pages on other web sites. This does NOT mean that the files have been stolen in any legal sense. It does, however, mean that your content is being displayed, heard or shown outside the context of your own web pages.

For example, if an outside site places this code in a popular web page:

<img src="http://www.yoursite.com/yourpicture.jpg">

Then your image will be displayed thousands of times, possibly without any attribution or permission on your part. This report is extremely valuable in identifying such situations.
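One plausible way to spot such requests in a log is to look for non-page objects whose referrer points to a host other than your own. The Python sketch below shows that heuristic under stated assumptions: the host names, the extension list and the function name `is_stolen_object` are illustrative, and this is not necessarily the rule the Stolen Object report applies.

```python
from urllib.parse import urlparse

OWN_HOSTS = {"www.yoursite.com", "yoursite.com"}
OBJECT_EXTENSIONS = (".jpg", ".jpeg", ".gif", ".png")

def is_stolen_object(path, referrer):
    """True for a non-page object requested from a page on another site."""
    if not path.lower().endswith(OBJECT_EXTENSIONS):
        return False
    host = urlparse(referrer).hostname
    # A missing referrer (logged as "-" or empty) gives no evidence either way.
    return host is not None and host not in OWN_HOSTS

embedded_elsewhere = is_stolen_object(
    "/yourpicture.jpg", "http://www.othersite.example/popular.html"
)
embedded_on_own_site = is_stolen_object(
    "/yourpicture.jpg", "http://www.yoursite.com/index.html"
)
```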

VI. Unique Visitors

The number of individuals who visit a web site during a specific period of time. The same person visiting twice is counted only once.

VII. Session

A period of interaction between a visitor’s browser and a particular web site, ending when the browser is closed or shut down, or when the user has been inactive on that site for a specified period of time.

For the purpose of Nihuo Web Log Analyzer reports, a session is considered to have ended if the user has been inactive on the site for 30 minutes. You can change this setting in the Option dialog.

VIII. Referrer

An HTTP referrer, or simply referrer, is anything online that drives visits and visitors to your web site. This can include:

  • search engines
  • blogs
  • link lists
  • banner ads
  • email
  • affiliate links
  • links built into software

Technically, even offline sources like print ads or references in books or magazines are referrers, but these aren’t specifically captured in the server referrer log. When a web developer uses the term “referrer”, she means those sites or services that are referenced in the web server logs.

IX. Bounce rate

Bounce rate is essentially the percentage of visitors who enter a site and then “bounce” away to a different site, rather than continuing on to other pages within the same site.

The formula used to calculate bounce rate is:

Bounce Rate = Total Number of Single-Page Visitors / Total Number of Visitors

A bounce occurs when a web site visitor only views a single page on a website, that is, the visitor leaves a site without visiting any other pages before a specified session-timeout occurs. There is no industry-standard minimum or maximum time by which a visitor must leave in order for a bounce to occur. Rather, this is determined by the session timeout of the analytics tracking software.
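The formula above is easy to check by hand. In the Python sketch below, each visit is represented simply as the list of pages viewed during that visit; that representation and the name `bounce_rate` are assumptions for the example, and the sketch counts visits where the formula speaks of visitors.

```python
def bounce_rate(visits):
    """visits: list of visits, each a list of the pages viewed in that visit."""
    if not visits:
        return 0.0
    single_page = sum(1 for pages in visits if len(pages) == 1)
    return single_page / len(visits)

visits = [
    ["/index.html"],
    ["/index.html", "/products.html"],
    ["/blog/post.html"],
    ["/index.html", "/about.html", "/contact.html"],
]
rate = bounce_rate(visits)  # 2 single-page visits out of 4 = 0.5
```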


Web Log Analysis Tutorial – Lesson 1: Getting Started with Nihuo Web Log Analyzer

Table of Contents

  1. Introduction
  2. Download and install
  3. Creating your 1st analysis task
  4. Web Log Format
  5. Related learning resources

I. Introduction

This tutorial is your starting point for learning web log analysis. It shows you some of the things you can discover about your visitors by analyzing your web site logs. The examples use the Windows version of Nihuo Web Log Analyzer, but the knowledge gained can be applied to the Linux version and to any other traffic analysis tool.

II. Download and install

If you have not yet downloaded Nihuo Web Log Analyzer, please download and install the latest version from http://www.loganalyzer.net/download.html before proceeding with this tutorial.

III. Creating your 1st analysis task

1. Where can I find my IIS log files?

To determine where your IIS log files are stored, follow these steps on your server:

  1. Go to Start -> Control Panel -> Administrative Tools.
  2. Run Internet Information Services (IIS).
  3. Find your web site in the tree on the left.
  4. If your server runs IIS 7:
    1. Click the Logging icon on the right.
    2. At the bottom of the Logging page, you will see a box that contains the log file directory.
  5. If your server runs IIS 6:
    1. Right-click the web site and choose Properties.
    2. On the Web site tab, you will see an option near the bottom that says “Active Log Format”. Click the Properties button.
    3. At the bottom of the General Properties tab, you will see a box that contains the log file directory and the log file name.

2. Where can I find my Apache access log files?

The location and content of the access log are controlled by the CustomLog directive. Default Apache access log file locations:

  • RHEL / Red Hat / CentOS / Fedora Linux: /var/log/httpd/access_log
  • Debian / Ubuntu Linux: /var/log/apache2/access.log
  • FreeBSD: /var/log/httpd-access.log

To find the exact Apache log file location, you can run the grep command on your Apache configuration file:

  • grep CustomLog /usr/local/etc/apache22/httpd.conf
  • grep CustomLog /etc/apache2/apache2.conf
  • grep CustomLog /etc/httpd/conf/httpd.conf

The output is a CustomLog directive such as one of the following:

CustomLog "/var/log/httpd-access.log" common

CustomLog "/var/log/httpd-access.log" combined

3. How to create my first analysis task?

Please see the online step-by-step Flash tutorial at http://loganalyzer.net/log-analysis-tutorial/creating-project.html.

IV. Web Log Format

It is critical to set up your web server logging in a format that allows
Nihuo Web Log Analyzer to properly interpret the data and produce fully
detailed reporting.

1. Apache

By default, Apache generally logs in what’s called the common log format, and also provides an option to log in a more detailed format known as the NCSA extended/combined log format. For optimal reporting, Nihuo strongly recommends the NCSA extended/combined format. Custom NCSA log formats can also be analyzed by Nihuo Web Log Analyzer.
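To see why the combined format gives richer reports, it helps to look at what each format actually carries. The Python sketch below parses one log line with a regular expression: a combined-format line yields the referrer and user agent, while a common-format line leaves both empty. The regular expression is a simplified assumption (it does not handle escaped quotes inside fields), the sample line is the one used in the Apache documentation, and the names `LOG_LINE` and `parse_line` are made up for this example.

```python
import re

# Common log format, with the two extra combined-format fields made optional.
LOG_LINE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+)'
    r'(?: "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)")?'
)

def parse_line(line):
    """Return the fields of one access log line as a dict, or None if it does not match."""
    match = LOG_LINE.match(line)
    return match.groupdict() if match else None

common = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
          '"GET /apache_pb.gif HTTP/1.0" 200 2326')
combined = common + (' "http://www.example.com/start.html"'
                     ' "Mozilla/4.08 [en] (Win98; I ;Nav)"')

common_entry = parse_line(common)      # referrer and agent are None
combined_entry = parse_line(combined)  # referrer and agent are filled in
```

With only the common format, reports that depend on the referrer or the browser have nothing to work with.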

2. Microsoft Internet Information Server (IIS)

Nihuo Web Log Analyzer can provide very basic reporting if your IIS log
files have, at the very least, the following fields:

  • date
  • time
  • c-ip
  • cs-uri-stem
  • sc-status
  • sc-bytes

However, this minimal logging does not provide enough information for
Referral and Browser reporting. Therefore it is advisable to set more
detailed logging properties for your IIS server.

For more detailed reports, please log the following fields in your IIS log files:

  • c-ip
  • cs-method
  • cs-host
  • cs-uri-stem
  • cs-uri-query
  • sc-status
  • sc-bytes
  • time-taken
  • cs(referer)
  • cs(user-agent)
  • cs(cookie)
  • cs-username
  • date
  • time
  • s-ip
  • s-port
  • sc-win32-status
  • sc-substatus
  • s-sitename
  • s-computername
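IIS log files in the W3C extended format normally start with a `#Fields:` directive that names the logged fields, so you can check quickly whether a log contains the minimum fields listed above. The Python sketch below does that; the header lines are an illustrative example rather than output from a specific server, field names are compared case-insensitively, and `REQUIRED_FIELDS` and `missing_fields` are hypothetical names.

```python
# Minimum fields listed above for basic reporting.
REQUIRED_FIELDS = {"date", "time", "c-ip", "cs-uri-stem", "sc-status", "sc-bytes"}

def missing_fields(log_lines):
    """Return the required fields absent from the first #Fields: directive."""
    prefix = "#Fields:"
    for line in log_lines:
        if line.startswith(prefix):
            present = {name.lower() for name in line[len(prefix):].split()}
            return sorted(REQUIRED_FIELDS - present)
    return sorted(REQUIRED_FIELDS)  # no directive found at all

# Illustrative header lines from an IIS log file.
header = [
    "#Software: Microsoft Internet Information Services 7.5",
    "#Version: 1.0",
    "#Fields: date time s-ip cs-method cs-uri-stem cs-uri-query s-port "
    "cs-username c-ip cs(User-Agent) sc-status sc-substatus sc-win32-status time-taken",
]
missing = missing_fields(header)  # ['sc-bytes']
```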

V. Related learning resources
