Bayesian Counters 0.1.0

Development Environment Tutorial For CentOS 6.3 x86_64 Workstation

By Alex Kozlov and Chris Poulin
Testing by Daniel Rule and Ken Krugler
January 31, 2013

1.0 Introduction

Bayesian counters (B-counts) is a framework for on-line, near-real-time model building and prediction. It can be used to identify correlations in data and, as a library, to respond to unusual or rare events. The underlying technology for B-counts is HBase, a highly scalable and fault-tolerant key-value storage engine. The solution can scale to thousands of nodes and billions of features. The initial prediction algorithm is Naïve Bayes (NB). The framework is currently being extended to incorporate Nearest Neighbors (NN) and general Bayesian Network (BN) learning algorithms.
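To illustrate why a store of counters is enough for on-line Naïve Bayes, the following is a minimal sketch in plain Python. It is not the B-counts API: the class name CountingNaiveBayes, its methods, the smoothing constant alpha, and the toy features are all hypothetical, and in-memory dictionaries stand in for the HBase tables. The point is only that learning reduces to incrementing counts and prediction reduces to reading them.

```python
import math
from collections import defaultdict


class CountingNaiveBayes:
    """Toy Naive Bayes driven purely by counters (illustrative only, not the B-counts API)."""

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant (an assumption)
        self.class_counts = defaultdict(int)    # label -> number of examples seen
        self.pair_counts = defaultdict(int)     # (label, feature, value) -> count
        self.values = defaultdict(set)          # feature -> distinct values seen

    def update(self, features, label):
        """On-line learning step: increment the counters for one example."""
        self.class_counts[label] += 1
        for name, value in features.items():
            self.pair_counts[(label, name, value)] += 1
            self.values[name].add(value)

    def predict(self, features):
        """Return the label with the highest smoothed log-posterior."""
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            score = math.log(n / total)         # log prior from class counts
            for name, value in features.items():
                k = len(self.values.get(name, ())) or 1
                c = self.pair_counts.get((label, name, value), 0)
                score += math.log((c + self.alpha) / (n + self.alpha * k))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Usage with made-up data: three observations, then one prediction.
model = CountingNaiveBayes()
model.update({"weather": "rain", "wind": "high"}, "stay")
model.update({"weather": "sun", "wind": "low"}, "go")
model.update({"weather": "sun", "wind": "high"}, "go")
prediction = model.predict({"weather": "sun", "wind": "low"})
```

Because every update is a set of independent increments, the same pattern maps naturally onto a distributed key-value store, which is presumably why a counter-oriented design pairs well with HBase.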

2.0 The Audience

The steps in this tutorial are highly detailed and aim for optimal repeatability at the time of this writing. However, readers must be Linux-literate, whether through experience, formal training, or education, and must have a strong understanding of computer and network security. This tutorial does not cover the statistical analysis aspects of the solution.

3.0 The Goal

Preparing a development environment is usually a complex task, but it leads to powerful results and strong capabilities. This tutorial attempts to make the task as painless and repeatable as possible.

4.0 Provisioning

4.1 Virus Risk Warning

It is the responsibility of the customer to check every download mentioned in this document: verify signatures, run MD5 checks and virus scans, and take any other steps needed to ensure that no download poses a risk to the customer’s trusted network. It is also the customer’s responsibility to ensure that network security, firewalls, and network-level port blocking are correctly configured for the trusted network specified in this document. Both the servers and the network referenced in this tutorial must be provisioned solely for the purpose of learning from, experimenting with, and completing this tutorial, and must never be adjacent to, or in any way share resources with, production or otherwise mission-critical environments.
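As one way to carry out the checksum step mentioned above, here is a minimal sketch using Python's standard hashlib module. The function names file_digest and matches_published_digest are hypothetical helpers, and the expected digest must come from the publisher of each download; none is supplied here. A matching checksum only shows that the file was not corrupted in transit, so it complements signature verification and virus scanning rather than replacing them.

```python
import hashlib
import hmac


def file_digest(path, algorithm="md5", chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need not fit in memory."""
    h = hashlib.new(algorithm)
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def matches_published_digest(path, expected_hex, algorithm="md5"):
    """Compare the computed digest with the one the publisher lists."""
    actual = file_digest(path, algorithm)
    return hmac.compare_digest(actual, expected_hex.strip().lower())
```

Where a publisher also lists a SHA-256 digest, passing algorithm="sha256" is generally preferable, since MD5 is no longer considered collision-resistant.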

4.2 Software Archive Warning

It is the responsibility of the customer to maintain archives of all software specified in this document, as the URLs, URIs, IP addresses, or other external references specified in this document may become invalid at any time without notice.