Category Archives: Languages

Fall 2018 – Readings and Online Courses

Dear Reader,

I decided to write quickly about 6 programs/books I took: AI Programming with Python (Udacity), Google Cloud Platform Architecting (Coursera), Creating Kivy Apps by Dusty Philips, Linux Kernel Development by Robert Love, Terraform: Up and Running by Yevgeniy Brikman and the Design of Everyday Things by Donald A. Norman.

Udacity: AI Programming with Python

This is the entry-level course in AI for Udacity.  Udacity is an online course-ware provider that focuses on providing content on cutting-edge topics: AI, Deep Learning, Self-driving cars and Node/React.  It also has classes on data analysis and marketing.  AI program starts by teaching you python, brushing up on some linear algebra and finishing off with 4 sections on AI: Two theoretical units on gradient descent and neural networks, one on PyTorch and a final project that classifies flowers.  PyTorch is Facebook’s AI framework built in Python.

AI Class

The course is well prepared overall and should be manageable for someone with intermediate programming skills.  Best in Python.  It’s a bit on the expensive side for online material: $500-$1000, but Udacity tends to make up for the price tag with more interesting content and topics.  Overall the quality is good.  I’ve heard FastAI is a free alternative.

My Repo: https://github.com/Silber8806/udacity-AI-programming-with-python

Coursera: Architecting with Google Cloud Platform

Offered by Google Cloud

Google Cloud Platform, GCP, is googles answer to AWS, Amazon Web Services.  Overall, the platform has many of the same services as Amazon including Cloud Functions, which looks to be AWS Lambda for Google Cloud.  The only glaring issues I’ve noticed so far: 1. No equivalent to SNS for e-mail and some UI issues in the console concerning Google Cloud Storage (S3).  The first part is painful in that Google by default blocks the smtp ports forcing you to use a 3rd party provider (with limited support in stackdriver).

Google outsourced most of it’s training for GCP to coursera.  The courses are broken into 6 parts.  The first course covers all of the services on a 40,000 foot level, 4 courses proceeding this are in-depth technical reviews of virtual machines, networking, storage, database technologies, container services and autoscaling solutions.  It also includes a host of managed services including the very neat sounding: “Spanner”, a supposedly horizontally scaling RDBMS service.

The last course, Reliable Cloud Infrastructure: Design and Process,  was the most interesting.  It provides a practical overview of how to construct infrastructure as well as going over Google’s philosophy on designing scalable software.

Something that really stood out about this program is the labs.  They provide you a temporary Google Cloud Account for your tutorials.  You get to practice on the actual Google console and shell prompts.  This program is reasonably priced at $49.99 per month.  You can take the material without labs for free, but I’d recommend the labs!

Creating Apps in Kivy:

Kivy is Python’s Mobile Application development framework.  It’s used to make cross-platform applications for: Android, IPhone and Ipads.  It’s something I’ve wanted to try out for a while just out of curiosity.  I picked up Creating Apps in Kivy by Dusty Phillips, a book on the subject.

Creating Apps in Kivy

The book covers two applications: A weather application using an open weather api and a video game involving spaceships.  The weather app takes the block of the book with significant amount of time spent on UI and UI interactions.  Kivy generally is split into two files: a main file in python that is event-driven and a .kv file that describes the component layout.  A single chapter is dedicated to graphics.  The best part of the graphics section was developing animated snowflakes that fell from a single component.  The book also covers databases and advanced UI components like a carousel.

Overall, I’m happy with the quality of this book with the exception of 2 things.  It was hard to read the indentation between pages and there were a few chapters where I had to add imports or features he didn’t mention to get the examples to run.  For this reason, I’d recommend being somewhat well versed in Python before attempting the sample projects in the book.

Github Repository: https://github.com/Silber8806/kivy_tutorial

Linux Kernel Development (3rd Edition) by Robert Love:

I started reading this book after my colleague at Salesforce.com, a DevOps Engineer, began talking to me more about the Linux Kernel and strange behaviors in Bash.  I read a previous book to get a better understanding of the OS and am really happy that I skimmed through this book.

Kernel Book

Linux Kernel Development covers the Linux source code by going through different subsystems and explaining how it works.  It’s a deep-dive into things like: process management, memory, I/O, Virtual File System, Cacheing, Timesharing… It’s inspirational to see how much thought has gone into the Operating System and how many improvements are mentioned.  You really feel like you are standing on the shoulders of giants after reading this book.

Linux Kernel Development feels like a technical reference and I’d recommend reading a book on Operating Systems before getting too deep into it (or taking courses in C).  I skimmed through this book, because I use Linux daily and feel a bit clueless about what is happening under the covers and wanted to learn more about it.

Terraform: Up and Running by Yevgeniy Brikman:

Terraform: Up and Running covers… Terraform, the cloud provisioning software by HCL corp.  .  Terraform is a framework for deploying IT services to AWS, GCP, Azure and Digital Oceans in an automated fashion.

Terraform is cool in that you write your infrastructure as a series of templates.   These templates are in the form of a declarative language called HCL, which describes your cloud infrastructure typically in a 1-to-1 fashion with the services you have deployed.  What’s special about Terraform is that after you apply a HCL template to your cloud environment, it writes the current state into a file (immutable data structure).  Every additional change begins with your last state and modifies it.  A lot of devops systems are not aware of their previous configurations.  Terraform reminds me a little of Redux, but applied to Infrastructure instead of web development (I might be wrong).

Things I noticed from the book.  They suggested deploying these state files to S3, expect strong concurrency controls around the Terraform library and advice that you don’t mix other devops tools with Terraform.  That way if a user manual creates an account that Terraform isn’t aware of, it won’t create an error (since that manually created user isn’t part of Terraform’s state).  It seems Terraform works best as an isolated service and the “main” provisioning tool for a specific subspace of your infrastructure.

I skimmed through this book as I was interested in Terraform and have heard of a lot of adoptions in industry of this technology.  I didn’t get to test the code in this book and can’t really speak about it.

The Design of Everyday Things by Donald Norman:

The Design of Everyday Things by Donald A. Norman is a book about usability and user design.  It’s the book that I’ve been listening to on Audible, an audio book app by Amazon.com.

The book goes over the design of everyday things: ovens, doors, cars, computers, phone systems, radios and details both design flaws and positive things about different objects.  He focuses a lot on understanding differences between knowledge in the world vs in the mind, how we map controls (buttons etc) to actions and how overloading them can lead to confusion and how to constrict user settings to prevent too many options flooding the users consciousness.

He’s a big proponent of mapping controls to physical aspects of the world.  For example, a car seat control should move the seat forward if a switch is moved forward and backwards if a switch is flipped backwards.  A switch shouldn’t move forward and backwards if a button is pressed left or right as that will confuse a user.  He also favors designs were cues exist in the world that indicate what to do and where each control (button) does one and only one thing.  A button on a watch should only turn off the alarm not adjust the alarm or answer phone calls as well.  If these aspects of designs can’t be applied, he suggests standardization across an industry.  A water facet for example should have cold water knob always to the right of the hot water knob.

This is a great book.  I would suggest reading it on Kindle or as a paperback book. My big issue with Audible and Audio books in general is that I typically listen to them while walking or multi-tasking.  That means I lose focus on certain sections (last 2 chapters).  Overall, I really enjoyed the content I heard from this book.

Best,

Chris

Data Camp: Week Adventure into Python Data Science

Image result for data camp

Introduction:

Noah, a former TechTarget colleague, mentioned  datacamp.com  to me, while we were discussing an issue with remotely hosted files (probably rdp associated) and pandas DataFrames.  On February 10th, I decided to give it a try and today, 6 days later, I finished up the Python Programmer Track (10 classes)!

What is Data Camp?  Data Camp is an online education company that offers data science specific courses.  They focus mostly on two lines of technology: R and Python.  They offer some auxiliary courses in other topics: SQL, Linux and git (as well as a few derivatives).  All the courses are hosted on their online platform, which overall is beautiful to interact with.  A bit more about the topics:

Data Camp Topics:

R – Statistical Programming Language

A statistical programming language inspired by LISP and developed by professors at the University of Auckland.  R has become famous amongst statistics and machine learning researchers where they prototype cutting-edge research and provide it as packages to CRAN, a online index for hosting R code.

Python – General Purpose Language

A general purpose programming language developed by Guido van Rossum.  It’s a dynamically typed, interpreted language that has been used for rapid prototyping and scripting.  Python as a general purpose programming language that can be used for: web development, networking, devops, data analysis and machine learning.  Python has gained popularity over the last few years with increased interest in it’s scientific computing platform.  Pandas, a popular Data Science framework, was inspired by R’s data frame.

SQL – Structured Query Language

Structured Query Language, SQL, is a domain-specific language typically associated with relational database management systems (RDBMS aka SQL databases), but has been coopted by other solutions (BI platforms, Hadoop and Spark).  SQL is easy to learn and revolves around a few common entities: Servers, Databases and Tables (Views etc).  Tables can be viewed as tabular data where rows are entities (employee) and columns are attributes of the entity (salary).  Most SQL is used to retrieve data hosted in tables mediated by RDBMS.

Context

R and Python are used to manipulate data in a procedural way, typically through the use of things like Data Frames (R), Pandas (Python) or Numpy (Python).  These libraries create tabular, matrix, vector or scalar data often rely on vectorized operations to do computation.  Both R and Python have extensive data visualization frameworks to create graphs.  They also have great libraries for statistics, machine learning and artificial intelligence.  SQL is primary associated with data retrieval and is used to get “data” back from a hard drive (through the database).  Most database are more limited in functionality when it concerns more finite data manipulation, graphing and machine learning (though extensions do exist).  They make up for it by being able to store large quantities of data without relying extensively on memory (volatile and limited).  Generally, most software engineers and analysts use both a programming language like (R/Python) and SQL (to access data).

Data Camp:

Summary:

Data Camp breaks down Python/R courses on career tracks: Python Programmer, Data Analyst and Data Scientist.  Each track is composed of a number of courses: 10, 13 and 20 respectively.  Topics covered: basic programming, data manipulation techniques, graphing, statistics, machine learning, network analysis and ai (1 course).

Each course is composed of 3-5 segments.  Each segment has a set of lectures followed by exercises.  You can expect around 3 lectures and 10 exercises per segment.  Most courses build on themselves intuitively beginning with the basics and gradually building up in complexity.  I was surprised to find lectures on generators, closures and how they relate to data frames within the first 4 classes.

Lectures:

datacamp lecture

Lecture portion of the website is beautiful.  Presentations have nice transitions and the website background is not distracting.  Lecturers were clear and easy to follow.  You get a sense that al to of effort was put into the curriculum.  There were multiple lecturers in the 10 courses I took (around 5-6).  Some of these lecturers work for esteemed companies like Anaconda, published books on the topic they spoke on or had a background in software engineering/consulting.  Overall, the lectures were high quality with very few mistakes.

Practice Sets:

datacamp practice problems.

The practice problems were conducted in what looks like a modified ipython notebook embedded in the website.  They have 4 panels, the 2 on the left: exercise and instructions and 2 on the right: Scrapt.py and ipython Shell.

The exercise and instructions provide guidance on how to complete the exercise.  The Exercise section explains the topic.  The instructions tell you what steps need to be completed before submission is excepted.

Script.py is where you write your code.  The run code button let’s you execute script.py and see the output in the IPYTHON SHELL.  It’s pretty interactive.  When you submit answer, it checks if the solution matches the instruction section.  For the most part, there are few cases where code I submitted was marked wrong when it was in fact right.  That’s great!

What if you get stuck on a programming problem?  There is a button called hint.  This provides some extra guidance typically in the form of small chunks of code.  If you press the hint button, a new button called show solution will appear.  Clicking the show solution button will overwrite the SCRIPT.PY window with the correct solution, which you can then run and submit.

In rare occasions, you might end up with a multiple choice question.  They offer an interactive shell in this case.  It always proceeds the more free-form question variant.

Gamification:

Data Camp has one really great feature that I liked.  Each exercise has an amount of xp that you can collect.  Each lecture is worth 50xp, multiple choice is worth 50xp and problem sets are worth 100xp.  If you click the show hint button, the problem set xp is reduced to 70xp.  Show solution gives you 0xp.  When logging in, you will see the total xp you got that day as well as how many days in a row you have utilized data camp (streak). This is a great way to motivate you to do the exercises.

Another layer of gamification comes from certificates you can collect (and post on linkedin) as well as career tracks you can complete.

Cost/Summary:

I think data camp is very friendly to beginners interested in learning about data analysis in Python and R, both marketable skills.  I think the course layouts, lectures and practice problems are well thought out.  I would suggest that beginners also read books and online documentation on subjects like Pandas and Numpy.  I found data camp focused more on practicing skills and less on implementation details (which is a good thing).

Data Camp is currently on sale for $180/year (usually $300/year).  You can also buy it as a monthly subscription for $30/month.  Data Camp is similar to Udemy.  There are 3 advantages Data Camp has over Udemy:

  1. Practice problems make up a larger percent of the curriculum.
  2. $30/month you have access to any one of 100+ courses.  The Udemy equivalent is $15/course.  A Udemy course is equivalent to 3 Data Camp courses (in material)
  3. Data Camp specializes in data science and has really thought about how to naturally progress through the data science material.
  4. If you like structured programs, this is better.

Advantages of Udemy:

  1. Overall larger selection of content, variety and topics.  If you want to practice Python, but want to tackle multiple topics like: web programming, networking, penetration testing or a specific subset of machine learning.  It might be a better option.
  2. There is no subscription fee.  It’s fixed cost.  If you are not sure you want to invest a lot of time into python programming this is a better choice.
  3. Lectures are not as uniform.  You might find some lectures that are more theoretical.  Others are more practical.  That means you can try out different lecture styles to see what works.

Best,

Chris

 

 

Interactive Programming Diagrams

Python Tutors

Introduction:

Philip J. Guo is a professor of Cognitive Science at UC San Diego, who focuses on teaching programming at interactively online.  He has produced great products for both programming beginners and those interested in Python internals (10-hour lecture).

He impressed me with Python Tutors website: python tutors. Python Tutors is: Python, Java, JavaScript, TypeScript, Ruby, C and C++.  Python tutors converts Python written on the left into data structures on the right.  This provides a deeper view of what Python is actually doing.  The diagrams are step-based, which means you can see the execution of each line of code and what happens beneath the covers.  Red and Green arrows show you the line to be executed and the line being executed.  An example below.

Example (click image to go to interactive page):


Related Posts:

introduction to python – health innovators class

great python books – beginner to intermediate

Happy Holidays,

Chris

Introduction to Python Courses

Dear Reader,

I am providing free lectures on intro-to-python.  Salesforce supporting me through their voluntary time off program.  Salesforce.com under it’s 1-1-1 program gives each employee 7 days to work on voluntary projects.

You can find the lectures here:

Lectures

Code

Application Class

The 4th class will be held at Cambridge Innovation Center, 1 Broadway, Cambridge Ma on Saturday November 18th 2017.  They are held every 2nd Saturday afterwards.  Possible projects:

  1. A mini Q/A program.  It introduces the concept of regex using re.
  2. Opening a file in python, reading it’s lines and analyzing words.
  3. Utilizing SimpleHttp server host a basic webpage with <p>, <h1> and <div> tags.  This is a simple introduction to a one-line web server.

The curriculum is free and I encourage people to submit practice problems to the GitHub repository.

Moodle Platform

Moodle is an open-source learning platform. Often, Moodle is used in universities.  I plan on implementing a Moodle instance to host lectures online.

Spammy E-mails, Great!

Moodle implementation has stalled.  I decided to host e-mail service myself.  I did get smtp and e-mail server up.  The obstacle now is getting Google and other e-mail providers to realize my E-mail isn’t spam.

Why is it considered SPAM?

Evidently, you can send an E-mail from chriskottmyer.com, but claim it originated from john smith.com.  Web industry has developed two processes to prevent this: SPF and DKIM.  SPF creates a guarantee that a message from johnsmith.com originates from johnsmith.com.  DKIM encrypts SMTP header preventing snoopers from changing that in transit.

Apache and VPL!

After resolving the spam crisis, I will have to deal with an annoying URL issue.  Moodle loves my IP address.  It loves it so much, it’s bound it to all the URLs.  I don’t like!  I’ll have to make either application-level change or re-route in Apache to resolve.

Having Moodle is great.  It doesn’t support programming assignments out of the box.  Luckily, some wonderful academics invented a plugin called the VPL.  It takes code presented to the web, submits it to a restrictive JVM-based sandbox and runs it.  It should prevent any malicious hackers from hijacking the server (crossing fingers).  It also supports automated grading of coding exercises (yay!).

Both issues aren’t blocking the lectures!  Hopefully everyone can enjoy those!

Best,

Chris

 

Learn Web Development: Backend Certificate

Dear Readers,

In the last few posts, I explored freecodecamp.org, a non-profit that assists people in learning how to develop websites for free.  I also show cased 10 quickly developed front-end prototypes.  Why is free code camp awesome, it’s a free curriculum that prepares you for a Junior Web Developer role (Salary).  It’s also great practice if you want to learn JavaScript, HTML and CSS.  As a database engineer, I really wanted to learn more about these technologies as almost everything touches the web in some way.  In this post, I will talk a bit more about backend-certificate and my current progress towards getting the certificate.

FreeCodeCamp.org provides 3 certificates.  The first one is called the front-end certificate.  It is the one that I would recommend starting with.  It begins with very simple elements of a webpage like text, images and buttons.  It then progresses through 200 lessons until you can develop TicTacToe and the Simon Game.  The first 150 or so lessons will take the average person approximately 5 minutes each and so can be done during your daily commute (train).  The remaining ones take significantly more time, but are well worth going through.

The next certificate I’m attempting is the backend certificate.  Front-end development involves the actual webpage, the buttons, text and images that are presented in your browser.  Front-end applications tend to utilize your computer and the resources tied to it.  The backend certificate focuses more on sending and processing data stored on your a computer to your browser.  Why would anyone want to store data on a separate computer, process it and send it to your web browser?  First, you can process things in one location preventing a bunch of other computers from doing the work.  This is the focus of cacheing.  Second, you want a guarantee that your data doesn’t change, which is near impossible to do if you don’t have control of the computer.  Third, you can take all this data and often aggregate it to look at interesting group behavior.  These are just a few examples why you’d want a “backend” server.

The current curriculum focuses on a framework called Node.js.  Node.js is written in JavaScript and is a framework for dealing with network connections.  A good example of a network connection is going to your facebook profile and making a post.  It involves, getting your profile page from a Facebook backend server called a http get request and then providing data to facebook in the form of your post called a http post request.  Both these requests need to be handled by some form of network software, one to get your profile information and the other to store the new data in the database, permanent place for your data.  Node.js specializes in handling these types of connections.  In fact, it can handle hundreds of connections in a few seconds from many different people.

What other types of technology are covered and what would you learn from this curriculum:

 

 

npm – Node Package Manager, when you are dealing with websites there are a lot of moving pieces.  You want to be able to set up node.js to handle connections.  You also want to build logic around those connections, are you a bank processing credit card balances or is it just retrieving your blog post.  NPM allows you to save a copy of that logic and makes useful logic available for others to use.  It also provides a way to do automated tests.  If you decide to for example change the functionality of your blog, you can run tests to see if it breaks your websites.  Broken website often results in a blank page or missing elements.  In the course, you do one whole module consisting of about 12 tasks that teach you how to manage your code.

node.js – Node as mentioned above is a library that deals with network connections.  Many people use Node specifically for websites, but you can use it for other situations as well (maybe forwarding database connections).  In this curriculum, there are two lessons that utilize node.  One of the things you will end up building is a service where you type in a url with a date and it will convert it into different timezones.  These backend services can of course become significantly more complicated.

express.js – Express is another JavaScript library, but focuses more on building websites.  Express.js would set up for example a template of how a Facebook profile would look like (Picture on the top left, a background picture, short summary and posts below) and bind it to a specific url.  It would also create separate urls for your facebook profile vs your activity feed.  The lessons focus on very simple webpages and templating.

mongodb – mongodb is a database and is not written in JavaScript.  It focuses on storing data on the backend server and is optimized for quick retrieval.  Mongo itself is called a document store, which means that it does not store things in tables like a spreadsheet, but rather as a set of keys and collections.  One neat thing about this is that you can store a set of collections within a collection, which makes it very flexible.  An example of this type of data can be seen here (it is called JSON):

Example Google Maps JSON File

MongoDB section has about a dozen tasks and focuses on storing data, retrieving it and using it in a web application.  This is important in that you can retrieve the data for a specific person and just put it in a template (or webpage).  Hence, your name, summary and posts in facebook are all stored in the database and are unique to the individual (database like Mongo), but are presented in the same way as a profile page (node.js/express.js provide template and logic).

In the next few posts, I will talk about the second part of the backend certificate, which are the 10 projects that utilize the tutorials mentioned previously to produce products people can use.  After doing those 10 projects, a person at freecodecamp.org could potentially develop backends similar to quora.com or facebook.com.  One thing to note, even though that website would look similar to facebook, it would be missing the messaging and not operate at the scale of Facebook.  Handling webpages for billions of users is a much harder problem.

Best,

Chris