
Monitor: Graphite, Collectd, Statsd and Grafana

Dear Reader,

Need to monitor many computers at the same time? Are you worried about a pegged CPU, memory paging, lots of swap space activity or low disk space? What about high network traffic? These things are readily available on Amazon through AWS CloudWatch.

I decided to implement one of two major monitoring stacks. The first stack is Logstash, Elasticsearch and Kibana (the ELK stack). The second group of technologies is Collectd/Statsd, Carbon/Whisper and Graphite/Grafana.  I implemented the latter.

My preliminary conclusion is:

  1. ELK stack: great for text-heavy documents.  Elasticsearch is built on Lucene, a popular open-source search library.
  2. Grafana/Carbon: great for time series analysis, with an emphasis on near-real-time data feeds.  Collectd provides a convenient plugin interface to operating system statistics.

Below is the front-end component, monitoring the monitoring computer itself (not WordPress):

Grafana Server


The components of the system are:

  1. Collectd – data collection software with a plugin architecture.  Common plugins, including those seen above, are: CPU, Memory, Disk Usage, Processes, Network Traffic, Apache metrics and many more.
  2. Statsd – used more for application monitoring.  Your application sends custom metrics over UDP, and Statsd aggregates them and flushes the results at a set interval (10 seconds by default).  The four common metric types I saw: gauges, counters, sets and timers.  A gauge reports the last value measured within the interval.  Counters sum the values received during the interval.  Sets return the count of unique values encountered.  Timers are time-based calculations over durations (such as means and rates).
  3. Carbon/Whisper – a data store focused on time series aggregation.  Data is aggregated based on two configuration files.  The first (storage-schemas.conf) sets up regex patterns that are matched against incoming metric names to categorize them, each followed by a data retention policy that specifies the resolution and how long data is kept (resolutions are typically in seconds or minutes).  The second (storage-aggregation.conf) specifies how data is aggregated when it is rolled up to a coarser resolution: average, sum, min, max or last.
  4. Graphite/Grafana – Graphite is a front-end dashboard tool for Carbon.  Grafana serves a similar role, but with a sleek black UI (featured above).  Both are similar to Kibana: they provide dashboard and graphing utilities for their respective data stores, with Grafana also able to access other sources (like Elasticsearch).
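
To make the Statsd metric types above more concrete, here is a minimal Python sketch of the plain-text line format a Statsd server accepts over UDP. The metric names are hypothetical, and the host and port are assumptions (8125 is the conventional Statsd port); this is an illustration, not the setup used for the dashboard above.

```python
import socket

# Type suffixes used by the StatsD line format.
TYPE_CODES = {"counter": "c", "gauge": "g", "timer": "ms", "set": "s"}


def format_metric(name: str, value, kind: str) -> str:
    """Build one StatsD payload, e.g. 'app.logins:1|c'."""
    return f"{name}:{value}|{TYPE_CODES[kind]}"


def send_metric(payload: str, host: str = "127.0.0.1", port: int = 8125) -> None:
    """Fire-and-forget UDP send; host and port are assumptions."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload.encode("ascii"), (host, port))


# Hypothetical metric names, one per metric type.
examples = [
    format_metric("app.logins", 1, "counter"),
    format_metric("app.queue_depth", 42, "gauge"),
    format_metric("app.request_time", 320, "timer"),  # milliseconds
    format_metric("app.unique_users", "user_17", "set"),
]
```

Calling `send_metric(examples[0])` would send a single counter increment. Because the transport is UDP, the call does not report whether a server was actually listening.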

For my Grafana localhost dashboard, I used the CPU, Memory, DF and Processes plugins from Collectd.
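
The retention policies mentioned for Carbon/Whisper are written as comma-separated `resolution:duration` pairs. As a rough sketch, the Python below works out how many data points each archive would hold for a given policy. The policy string `10s:1d,1m:7d` is a hypothetical example, not the configuration used for this dashboard.

```python
# Seconds per unit suffix used in retention strings.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "d": 86400, "w": 604800, "y": 31536000}


def to_seconds(token: str) -> int:
    """Convert a token such as '10s' or '7d' into seconds."""
    return int(token[:-1]) * UNIT_SECONDS[token[-1]]


def parse_retentions(spec: str):
    """Return a list of (seconds_per_point, number_of_points) per archive."""
    archives = []
    for part in spec.split(","):
        step, history = part.strip().split(":")
        step_seconds = to_seconds(step)
        archives.append((step_seconds, to_seconds(history) // step_seconds))
    return archives


# Hypothetical policy: 10-second points for 1 day, then 1-minute points for 7 days.
archives = parse_retentions("10s:1d,1m:7d")
```

The point counts show why coarser archives are cheap: a week of 1-minute data needs only slightly more points than a single day of 10-second data.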

If you are interested in trying out the service, I can recommend the following 4 articles (Ubuntu):

DigitalOcean: metric tracking tutorial

For Grafana, you have to follow these instructions:

Grafana Install




Interactive Programming Diagrams

Python Tutor


Philip J. Guo is a professor of Cognitive Science at UC San Diego who focuses on teaching programming interactively online.  He has produced great resources for both programming beginners and those interested in Python internals (a 10-hour lecture).

He impressed me with the Python Tutor website: Python Tutor.  Python Tutor supports Python, Java, JavaScript, TypeScript, Ruby, C and C++.  It converts the Python written on the left into a diagram of data structures on the right.  This provides a deeper view of what Python is actually doing.  The diagrams are step-based, which means you can see the execution of each line of code and what happens beneath the covers.  Red and green arrows show you the next line to be executed and the line that has just executed.  An example is below.
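
A small snippet written here for illustration (it is not the example from the original post) shows the kind of code that benefits from being stepped through this way. It contrasts a second name for the same list with a real copy, a distinction that is easy to miss in text but obvious once the objects and the arrows pointing to them are drawn.

```python
def append_item(items, value):
    # Mutates the list object the caller passed in; no copy is made.
    items.append(value)
    return items


original = [1, 2]
alias = original        # a second name for the same list object
copy = list(original)   # a new, separate list object
append_item(alias, 3)   # changes the object shared by original and alias
```

Stepping through this line by line, the diagram should show `original` and `alias` pointing at one list while `copy` points at its own.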

Example (click image to go to interactive page):

Related Posts:

introduction to python – health innovators class

great python books – beginner to intermediate

Happy Holidays,


Data as a Fitness Tool

Recently, I’ve been focusing on improving my life by becoming more fit and eating healthily.  I’ve read somewhere that those who record what they eat and dedicate at least 4-5 hours of time in the gym tend to lose more weight and maintain their weight for longer periods of time.  I think it’s more about changing your attitude and realizing that being healthy is a lifestyle.

Since I am a data nerd of sorts, I’ve come to the conclusion that it would be fun to approach this as a data management problem.  The first thing I did was get an app called MyFitnessPal, which you can download on both Android and iPhone (I have an iPhone).  It provides a relatively simple interface to log food, as well as a database/search utility to look up food.  The app can also track exercise and water intake.  I decided to keep track of everything I consumed since mid-November and have kept it up for around 50-odd days now.

To make this more interesting, I also produced a Google spreadsheet containing projections (a technique I learned as a fraud analyst) and used this to project my weight in the future.  The good news is that the calorie deficit I’ve logged, around 40,000 kcal, translates to a projected loss (11 lb) that is significantly lower than the 17 lb I’ve actually lost so far.  The bad news is that the projections I built are definitely off, typically by about 1-2 weeks.
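
The 11 lb figure is consistent with the common rule of thumb that a pound of body weight corresponds to roughly 3,500 kcal. That constant is an assumption here, since the post does not say which conversion the spreadsheet used. A quick check in Python:

```python
KCAL_PER_LB = 3500  # rule-of-thumb assumption, not a figure from the spreadsheet


def deficit_to_pounds(deficit_kcal: float) -> float:
    """Convert a cumulative calorie deficit into projected pounds lost."""
    return deficit_kcal / KCAL_PER_LB


projected_lb = deficit_to_pounds(40_000)  # roughly 11.4 lb
gap_lb = 17 - projected_lb                # actual loss minus projected loss
```

Under that assumption, the actual loss is running about 5.6 lb ahead of the projection.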

The way I built the projections is to use a BMR (basal metabolic rate) calculation to get a base burn rate (before exercise of any sort).  I then add food and exercise calories to this.  I take an average over a few days, updated every day, to get a general sense of the net loss rate, and then apply that rate to the future, starting from the last weight measurement.  It’s been off by one week and one pound, which isn’t too bad.
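
The steps above can be sketched in Python roughly as follows. Several things here are assumptions rather than details from the spreadsheet: the Mifflin-St Jeor equation as the BMR formula, the 3,500 kcal per pound conversion, and every number in the sample log.

```python
from statistics import mean

KCAL_PER_LB = 3500  # rule-of-thumb assumption


def bmr_mifflin_st_jeor(weight_kg, height_cm, age, male=True):
    """Base daily burn in kcal (assumed formula; the post names none)."""
    base = 10 * weight_kg + 6.25 * height_cm - 5 * age
    return base + 5 if male else base - 161


def daily_net_kcal(bmr, food_kcal, exercise_kcal):
    """Calories eaten minus calories burned; negative means a deficit."""
    return food_kcal - (bmr + exercise_kcal)


def project_weight_lb(last_weight_lb, recent_net_kcal, days_ahead):
    """Apply the average recent net rate to the last measured weight."""
    avg_net = mean(recent_net_kcal)
    return last_weight_lb + avg_net * days_ahead / KCAL_PER_LB


# Hypothetical person and three-day log of (food kcal, exercise kcal).
bmr = bmr_mifflin_st_jeor(weight_kg=90, height_cm=180, age=30)
log = [(1800, 400), (2000, 300), (1700, 500)]
recent = [daily_net_kcal(bmr, food, exercise) for food, exercise in log]
projected = project_weight_lb(200.0, recent, days_ahead=14)
```

Averaging over a few days smooths out single unusual days, at the cost of reacting slowly when habits change, which may be one reason a projection like this lags reality.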

Of course, as with any good business or goal, good-quality data matters, and so I have extra motivation to, as my nutritionist puts it, treat the matter like an accountant.  That motivation ends up translating into more accountability.

I am liberal when estimating food calories and conservative when estimating exercise calories.  It’s better to be safe than sorry when considering margins of error.  One thing I try not to compromise on is getting to the gym or some physically intense event (like Salsa dancing) 3 times a week for about 1-1.5 hours.  I force myself to do the activity even if I end up just walking for the duration.

Right now, I’m hoping to keep up this habit and see the results in a few months.  I’ve already changed how I view exercise and am looking at potential programs or new ways to exercise (in a more social manner) to reduce things like fatigue (too much cardio in between two days).  It seems that by making exercise a habit I’m forced to deal with new problems, which require new solutions and experiences.  I am definitely seeing benefits from doing this.