Category Archives: Online Tutorials

Data Camp: Week Adventure into Python Data Science

Image result for data camp

Introduction:

Noah, a former TechTarget colleague, mentioned  datacamp.com  to me, while we were discussing an issue with remotely hosted files (probably rdp associated) and pandas DataFrames.  On February 10th, I decided to give it a try and today, 6 days later, I finished up the Python Programmer Track (10 classes)!

What is Data Camp?  Data Camp is an online education company that offers data science specific courses.  They focus mostly on two lines of technology: R and Python.  They offer some auxiliary courses in other topics: SQL, Linux and git (as well as a few derivatives).  All the courses are hosted on their online platform, which overall is beautiful to interact with.  A bit more about the topics:

Data Camp Topics:

R – Statistical Programming Language

A statistical programming language inspired by LISP and developed by professors at the University of Auckland.  R has become famous amongst statistics and machine learning researchers where they prototype cutting-edge research and provide it as packages to CRAN, a online index for hosting R code.

Python – General Purpose Language

A general purpose programming language developed by Guido van Rossum.  It’s a dynamically typed, interpreted language that has been used for rapid prototyping and scripting.  Python as a general purpose programming language that can be used for: web development, networking, devops, data analysis and machine learning.  Python has gained popularity over the last few years with increased interest in it’s scientific computing platform.  Pandas, a popular Data Science framework, was inspired by R’s data frame.

SQL – Structured Query Language

Structured Query Language, SQL, is a domain-specific language typically associated with relational database management systems (RDBMS aka SQL databases), but has been coopted by other solutions (BI platforms, Hadoop and Spark).  SQL is easy to learn and revolves around a few common entities: Servers, Databases and Tables (Views etc).  Tables can be viewed as tabular data where rows are entities (employee) and columns are attributes of the entity (salary).  Most SQL is used to retrieve data hosted in tables mediated by RDBMS.

Context

R and Python are used to manipulate data in a procedural way, typically through the use of things like Data Frames (R), Pandas (Python) or Numpy (Python).  These libraries create tabular, matrix, vector or scalar data often rely on vectorized operations to do computation.  Both R and Python have extensive data visualization frameworks to create graphs.  They also have great libraries for statistics, machine learning and artificial intelligence.  SQL is primary associated with data retrieval and is used to get “data” back from a hard drive (through the database).  Most database are more limited in functionality when it concerns more finite data manipulation, graphing and machine learning (though extensions do exist).  They make up for it by being able to store large quantities of data without relying extensively on memory (volatile and limited).  Generally, most software engineers and analysts use both a programming language like (R/Python) and SQL (to access data).

Data Camp:

Summary:

Data Camp breaks down Python/R courses on career tracks: Python Programmer, Data Analyst and Data Scientist.  Each track is composed of a number of courses: 10, 13 and 20 respectively.  Topics covered: basic programming, data manipulation techniques, graphing, statistics, machine learning, network analysis and ai (1 course).

Each course is composed of 3-5 segments.  Each segment has a set of lectures followed by exercises.  You can expect around 3 lectures and 10 exercises per segment.  Most courses build on themselves intuitively beginning with the basics and gradually building up in complexity.  I was surprised to find lectures on generators, closures and how they relate to data frames within the first 4 classes.

Lectures:

datacamp lecture

Lecture portion of the website is beautiful.  Presentations have nice transitions and the website background is not distracting.  Lecturers were clear and easy to follow.  You get a sense that al to of effort was put into the curriculum.  There were multiple lecturers in the 10 courses I took (around 5-6).  Some of these lecturers work for esteemed companies like Anaconda, published books on the topic they spoke on or had a background in software engineering/consulting.  Overall, the lectures were high quality with very few mistakes.

Practice Sets:

datacamp practice problems.

The practice problems were conducted in what looks like a modified ipython notebook embedded in the website.  They have 4 panels, the 2 on the left: exercise and instructions and 2 on the right: Scrapt.py and ipython Shell.

The exercise and instructions provide guidance on how to complete the exercise.  The Exercise section explains the topic.  The instructions tell you what steps need to be completed before submission is excepted.

Script.py is where you write your code.  The run code button let’s you execute script.py and see the output in the IPYTHON SHELL.  It’s pretty interactive.  When you submit answer, it checks if the solution matches the instruction section.  For the most part, there are few cases where code I submitted was marked wrong when it was in fact right.  That’s great!

What if you get stuck on a programming problem?  There is a button called hint.  This provides some extra guidance typically in the form of small chunks of code.  If you press the hint button, a new button called show solution will appear.  Clicking the show solution button will overwrite the SCRIPT.PY window with the correct solution, which you can then run and submit.

In rare occasions, you might end up with a multiple choice question.  They offer an interactive shell in this case.  It always proceeds the more free-form question variant.

Gamification:

Data Camp has one really great feature that I liked.  Each exercise has an amount of xp that you can collect.  Each lecture is worth 50xp, multiple choice is worth 50xp and problem sets are worth 100xp.  If you click the show hint button, the problem set xp is reduced to 70xp.  Show solution gives you 0xp.  When logging in, you will see the total xp you got that day as well as how many days in a row you have utilized data camp (streak). This is a great way to motivate you to do the exercises.

Another layer of gamification comes from certificates you can collect (and post on linkedin) as well as career tracks you can complete.

Cost/Summary:

I think data camp is very friendly to beginners interested in learning about data analysis in Python and R, both marketable skills.  I think the course layouts, lectures and practice problems are well thought out.  I would suggest that beginners also read books and online documentation on subjects like Pandas and Numpy.  I found data camp focused more on practicing skills and less on implementation details (which is a good thing).

Data Camp is currently on sale for $180/year (usually $300/year).  You can also buy it as a monthly subscription for $30/month.  Data Camp is similar to Udemy.  There are 3 advantages Data Camp has over Udemy:

  1. Practice problems make up a larger percent of the curriculum.
  2. $30/month you have access to any one of 100+ courses.  The Udemy equivalent is $15/course.  A Udemy course is equivalent to 3 Data Camp courses (in material)
  3. Data Camp specializes in data science and has really thought about how to naturally progress through the data science material.
  4. If you like structured programs, this is better.

Advantages of Udemy:

  1. Overall larger selection of content, variety and topics.  If you want to practice Python, but want to tackle multiple topics like: web programming, networking, penetration testing or a specific subset of machine learning.  It might be a better option.
  2. There is no subscription fee.  It’s fixed cost.  If you are not sure you want to invest a lot of time into python programming this is a better choice.
  3. Lectures are not as uniform.  You might find some lectures that are more theoretical.  Others are more practical.  That means you can try out different lecture styles to see what works.

Best,

Chris

 

 

Powerful Accessories to Excel – Introduction

Dear Readers,

I wanted to start a quick tutorial targeted at Excel users and highlighting cool technologies that are relatively easy to learn and can provide a lot of benefit.  Basically, they are technologies that provide a lot of bang for their “effort” bucks.  The technologies I want to cover in order of ease to learn are:

Bash/Batch – Bash and Batch are shells for Linux/Mac and Windows respectively.  Shells interact with your operating system to allow you to do things like read files, manipulate data, create remote login sessions and schedule tasks on your computer.  One of the great things about bash and batch is that if you study it long enough, you can start using “cloud” computers, which allow computers to do work, while you are sleeping.

SQL/RDBMS – Relational databases allow you to store data compactly on your computer and provide a convenient interface for doing data manipulation.  One great reason you should pick up this tool is that it allows you to specify relationships between data: Student – Class -Teacher – School being a good example and then make statements like: I want only class taught by John Naughton that are considered math classes and only students whose first name starts with A.  One great thing about databases is scale.  You can work with 100s of millions of rows. Since SQL is a standard for data access, there are SQL-esque access to big data tools like Hadoop and Spark, which is software that allows you to take dozens of computer and have them process things in parallel.

Python – Python is a general purpose programming language that provides countless libraries for different purposes.  Do you want to develop a website or automate networking?  Python can do this.  Do you want to scrape a website, travel through a social network or do analysis on wikipedia?  Python is a great tool for it.  Do you want to be on the cutting edge of machine learning and AI by having direct interface to more advanced libraries?  Python…  Python can also be used for data manipulation and processing, which I will focus on.

I focus on the above three based on my knowledge.  Other noteworthy technologies include R, which I hope to cover in a future article.  During these tutorials, I might also mention other cool concepts or technologies.

If you are interested in this lecture series, I will put it under tutorial categories.

Best,

Chris