Category Archives: Networking

Learn Web Development: Backend Certificate

Dear Readers,

In the last few posts, I explored freeCodeCamp.org, a non-profit that helps people learn how to develop websites for free.  I also showcased 10 quickly developed front-end prototypes.  Why is freeCodeCamp awesome?  It’s a free curriculum that prepares you for a junior web developer role.  It’s also great practice if you want to learn JavaScript, HTML and CSS.  As a database engineer, I really wanted to learn more about these technologies, as almost everything touches the web in some way.  In this post, I will talk a bit more about the backend certificate and my current progress towards earning it.

FreeCodeCamp.org provides 3 certificates.  The first one is the front-end certificate, and it is the one I recommend starting with.  It begins with very simple elements of a webpage, like text, images and buttons, then progresses through roughly 200 lessons until you can develop Tic-Tac-Toe and the Simon Game.  The first 150 or so lessons take the average person about 5 minutes each, so they can be done during a daily train commute.  The remaining ones take significantly more time, but are well worth going through.

The next certificate I’m attempting is the backend certificate.  Front-end development involves the actual webpage: the buttons, text and images presented in your browser.  Front-end applications tend to use your computer and the resources tied to it.  The backend certificate focuses more on sending and processing data stored on a separate computer and delivering it to your browser.  Why would anyone want to store data on a separate computer, process it there and send it to your web browser?  First, you can process things in one location, preventing a bunch of other computers from repeating the work; this is the focus of caching.  Second, you want a guarantee that your data doesn’t change, which is nearly impossible if you don’t control the computer.  Third, you can take all this data and aggregate it to look at interesting group behavior.  These are just a few examples of why you’d want a “backend” server.

The current curriculum focuses on Node.js.  Node.js is a runtime for executing JavaScript outside the browser, and it is particularly good at handling network connections.  A good example of a network connection is going to your Facebook profile and making a post.  It involves getting your profile page from a Facebook backend server (an HTTP GET request) and then providing data to Facebook in the form of your post (an HTTP POST request).  Both of these requests need to be handled by some form of network software: one to fetch your profile information and the other to store the new data in the database, a permanent place for your data.  Node.js specializes in handling these types of connections.  In fact, it can handle hundreds of connections within a few seconds from many different people.

What other technologies are covered, and what would you learn from this curriculum?

npm – the Node Package Manager.  When you are dealing with websites, there are a lot of moving pieces.  You want to set up Node.js to handle connections, but you also want to build logic around those connections: are you a bank processing credit card balances, or just retrieving a blog post?  npm lets you save a copy of that logic and makes useful logic available for others to use.  It also provides a way to run automated tests.  If you decide, for example, to change the functionality of your blog, you can run tests to see if the change breaks your website.  A broken website often shows up as a blank page or missing elements.  In the course, you do a whole module of about 12 tasks that teach you how to manage your code.

node.js – Node, as mentioned above, is a runtime that deals with network connections.  Many people use Node specifically for websites, but you can use it in other situations as well (maybe forwarding database connections).  In this curriculum, there are two lessons that utilize Node.  One of the things you end up building is a service where you type in a URL containing a date and it converts the date into different timezones.  These backend services can, of course, become significantly more complicated.

express.js – Express is another JavaScript library, but it focuses on building websites.  Express.js would, for example, set up a template for what a Facebook profile looks like (picture on the top left, a background picture, a short summary and posts below) and bind it to a specific URL.  It would also create separate URLs for your Facebook profile versus your activity feed.  The lessons focus on very simple webpages and templating.

mongodb – MongoDB is a database and is not written in JavaScript.  It focuses on storing data on the backend server and is optimized for quick retrieval.  Mongo itself is called a document store, which means it does not store things in tables like a spreadsheet, but rather as collections of documents, where each document is a set of key-value pairs.  One neat thing about this is that you can nest documents within documents, which makes it very flexible.  An example of this type of data (the format is called JSON) can be seen here:

Example Google Maps JSON File
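As a quick illustration (a made-up document, not the actual Google Maps format), a nested JSON document might look like this:

```json
{
  "name": "Chris",
  "summary": "Database engineer learning web development",
  "posts": [
    { "title": "Learn Web Development", "tags": ["backend", "node"] },
    { "title": "Managing Networks", "tags": ["python", "aws"] }
  ]
}
```

The posts field holds a list of documents nested inside the outer document, which is what makes the format so flexible.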

The MongoDB section has about a dozen tasks and focuses on storing data, retrieving it and using it in a web application.  This matters because you can retrieve the data for a specific person and just put it in a template (or webpage).  Hence, your name, summary and posts on Facebook are all stored in a database (something like Mongo) and are unique to you, but they are presented in the same way as everyone else’s profile page (Node.js/Express.js provide the template and logic).

In the next few posts, I will talk about the second part of the backend certificate: the 10 projects that use the tutorials mentioned above to produce products people can actually use.  After completing those 10 projects, a freeCodeCamp student could potentially develop backends similar to quora.com or facebook.com.  One thing to note: even though such a website might look similar to Facebook, it would be missing features like messaging and would not operate at Facebook’s scale.  Handling webpages for billions of users is a much harder problem.

Best,

Chris

Managing Networks – Trial and Error

In my free time, I’ve been playing around with automating connections between different AWS instances as a way to learn more about networking.  So far, it’s been pretty fun.  In my last post, I mentioned a series of libraries just for networking.

This post covers user-friendly interfaces in the form of CLI libraries, as well as some interesting topics regarding asynchronous processing in Python.  A bit about the CLI libraries that I really like:

Click –  This is a really great library in that it offers an almost natural way of building a simple CLI quickly and efficiently.

You start by creating a group, which acts as a container for all your commands.  You then write Python functions and decorate them with the @cli.command() decorator, adding @click.argument() to declare arguments.  The argument types available are pretty extensive, including a File type (which checks that the file exists).  The nice thing about this interface is that it generates the help menu for you and, if the commands ever get more complex, provides ways to subdivide commands into smaller groups.  This library is great for centralizing a bunch of commands.  Create a setup.py file with an entry point to make the CLI available anywhere within Linux under a custom command name (I use something like dbops as a prefix).
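A minimal sketch of that pattern (the group, command and argument names here are my own invention):

```python
import click

@click.group()
def cli():
    """Container for all the subcommands."""

@cli.command()
@click.argument("name")
def greet(name):
    """Say hello to NAME."""
    click.echo(f"Hello, {name}!")

@cli.command()
@click.argument("config", type=click.File("r"))
def show(config):
    """Print the contents of an existing file (Click checks it exists)."""
    click.echo(config.read())

if __name__ == "__main__":
    cli()
```

Running the script with no arguments prints the auto-generated help menu listing greet and show.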

Cmd – This lets you create a command line utility by defining a single class with a few methods.  The cmd.Cmd class provides a shell, which takes user input and matches it against a set of commands (if they exist).  Commands are specified with def do_<command name>(self, line):, where line is the string that follows the command name (parse this to get arguments).  To make sure the enter key doesn’t re-execute the previous command, define a method emptyline() that returns a falsy value such as 0 (a falsy return re-prompts the command line for a new command; a truthy return from a command stops the loop).  I played around with this shell as a front end to a network management utility and thought it was pretty effective (you can run the cmdloop in its own thread, which frees you up to develop other services within the application).  I recommend this if you need to take user input and use it within the context of a program.
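A small sketch of that class shape (the command names here are hypothetical):

```python
import cmd

class NetShell(cmd.Cmd):
    """Tiny interactive shell: `greet bob` says hello, `quit` exits."""
    prompt = "(net) "

    def do_greet(self, line):
        """greet NAME -- say hello to NAME."""
        print(f"Hello, {line or 'stranger'}!")

    def do_quit(self, line):
        """quit -- exit the shell."""
        return True  # a truthy return value stops cmdloop()

    def emptyline(self):
        return 0  # falsy: re-prompt instead of repeating the last command

if __name__ == "__main__":
    NetShell().cmdloop()
```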

Argparse – argparse (and its older cousin optparse, not covered here) is another option you can use.  It works by declaring a set of rules for handling the arguments of a specific command and then parsing them into a namespace object.  The good part of argparse is that the argument handling is very flexible: you can add things like flags.  I think overall it’s a bit harder to implement than the two options above (but more flexible), and it’s mostly used within a single script.
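A short sketch (the argument names are made up for illustration):

```python
import argparse

# Declare the rules for a hypothetical port-forwarding command.
parser = argparse.ArgumentParser(description="Forward a local port to a remote host.")
parser.add_argument("host", help="remote host name or IP address")
parser.add_argument("-p", "--port", type=int, default=22, help="remote port")
parser.add_argument("-v", "--verbose", action="store_true", help="print extra detail")

# parse_args returns a namespace; passing a list here instead of reading sys.argv.
args = parser.parse_args(["db1.example.com", "--port", "5432", "-v"])
print(args.host, args.port, args.verbose)  # db1.example.com 5432 True
```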

sys/os – The sys and os libraries are well worth getting to know.  They provide a great way to interact with the operating system, from checking on files and directories to running a stat on a file.  One of the great uses of sys and os is the ability to manipulate stdin, arguments and stdout; I’ve used this to write Python scripts that accept piped input.  Another interesting library to check out here is the subprocess module, which allows you to run commands in the background and provides file-like objects for stdin, stdout and stderr (with subprocess.PIPE allowing you to pipe results between subprocesses).
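A minimal sketch of piping between subprocesses (assumes a Unix-like system with echo and tr available):

```python
import subprocess

# Equivalent of the shell pipeline: echo hello | tr a-z A-Z
echo = subprocess.Popen(["echo", "hello"], stdout=subprocess.PIPE)
tr = subprocess.Popen(["tr", "a-z", "A-Z"],
                      stdin=echo.stdout, stdout=subprocess.PIPE)
echo.stdout.close()  # let echo receive SIGPIPE if tr exits early
out, _ = tr.communicate()
print(out.decode().strip())  # HELLO
```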

The parallel processing part of my project was pretty cool.  I worked mostly with the multiprocessing and threading libraries.  Multiprocessing produces new processes (via fork on Linux), while threading runs multiple threads inside a single process with shared memory.

multiprocessing – I really like this library.  You can create a set of workers and give them a function so they do the work in parallel.  The join method (similar to wait in bash) blocks until they are all done, and then the main process continues.  The overall pattern is pretty easy to pick up: you create a multiprocessing Process, provide a target function and a set of arguments for the function (typically as a tuple or list), then call the start method on the process object and it begins running in the background.  Other cool things about multiprocessing are the ability to set up queues, pipes (bi-directional communication) and a proxy shared-memory Manager for dictionaries and lists (I didn’t get that to work, but see the docs).  One thing I did run into was working around shared-memory issues (my initial fault in not researching threading vs multiprocessing).
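The basic pattern might look like this (a toy worker function, with a queue carrying results back to the parent):

```python
import multiprocessing as mp

def square(n, queue):
    """Worker: compute n squared and send the result back over the queue."""
    queue.put((n, n * n))

if __name__ == "__main__":
    queue = mp.Queue()
    workers = [mp.Process(target=square, args=(n, queue)) for n in range(4)]
    for w in workers:
        w.start()   # begins running in the background
    for w in workers:
        w.join()    # like `wait` in bash: block until each worker finishes
    print(sorted(queue.get() for _ in range(4)))  # [(0, 0), (1, 1), (2, 4), (3, 9)]
```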

threading – The setup commands are similar to multiprocessing, but the work runs in a thread instead of a process.  You’ll see threading used a lot inside other libraries; the TCPServer from the previous post (the SocketServer library) uses it in its ThreadingMixIn.
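Since threads share memory, a lock is needed when they touch the same data; a minimal sketch:

```python
import threading

counter = 0
lock = threading.Lock()

def work(times):
    """Each thread increments the shared counter; the lock prevents races."""
    global counter
    for _ in range(times):
        with lock:
            counter += 1

threads = [threading.Thread(target=work, args=(1000,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000
```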

Celery – I didn’t get into Celery as much as I’d have liked, mostly due to not wanting to set up RabbitMQ or Redis for a tiny application (I used sqlite to keep the footprint small and the setup easy).  It’s still a great tool to look into, as it runs a queue (or set of queues) for you and lets you execute things asynchronously (it can be used for messaging too).  I will probably look more into this library and the associated products in the future.

The application I developed was a tool for managing database connections.  It was split into 3 parts: a process that polls AWS for connection information, a database for storing that information (sqlite), and a process that manages SQL connections for me (through port forwarding).  This was all controlled via a CLI based on the Cmd library.  Messages were sent to the polling and SQL connection managers via multiprocessing queues, with each component running in a separate process.  Within the SQL connection manager, I created a TCPServer (SocketServer), which I ran in a different thread and wrapped in a class to manage connections.  The threading was done partially to isolate failures caused by a remote computer shutting down or refusing a connection; this prevents the entire application from failing due to the actions of a single TCPServer.  Overall, I’ve liked the experiment so far, but I don’t intend to do much more with it.  It was an experiment to test out a lot of these libraries and to get a deeper understanding of things like ssh.

Quick Experiment with Networks (Python)

Python has some really great libraries for networking, ranging from multi-threaded asynchronous services, to one-off scripts, to more general tools for running commands across many servers.  With several hundred virtualized instances to manage, utilities that handle the more common tasks from the command line become a lot more useful.  So, a list of cool libraries I’ve recently looked into:

fabric – A cool Python library that abstracts hosts into a single managed list and then lets you execute commands over the entire list.  It is similar in spirit to devops tools like Salt and uses paramiko under the covers (more on that later).  It has a single entry point called a fabfile.  Great for running a set of tasks from a central computer.

http://www.fabfile.org

paramiko – A library that makes establishing an SSH connection relatively easy through its SSHClient class.  SSHClient lets you set up policies for dealing with unknown hosts and so on.  Connecting is pretty easy: you call the connect method and pass in some basic information such as port, IP address, username and key file.  To execute a command, you then call exec_command on the instance, and it returns 3 file-like objects (stdin, stdout and stderr) with typical file-reading operations.  Great for setting up single remote connections.

http://www.paramiko.org

sshtunnel – A library for creating SSH tunnels quickly.  It provides a class just for port forwarding.  Worth checking out if you do this occasionally, as it offers a shorthand solution.  Great for forwarding traffic, like a database connection.

https://sshtunnel.readthedocs.io/en/latest/

socket – A library for socket manipulation.  It is lower level than the other libraries: you have to do things like send and receive on a given socket yourself.  It supports a bunch of different protocols and is more extensible.  Great for dealing with lower-level problems or making network behavior more customizable.

https://docs.python.org/2/library/socket.html
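A minimal loopback round trip to show the send/receive style (binding to port 0 asks the OS for a free port):

```python
import socket

# Server side: listen on a free port on localhost.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))
server.listen(1)
port = server.getsockname()[1]

# Client side: connect and send a few bytes.
client = socket.create_connection(("127.0.0.1", port))
conn, _ = server.accept()
client.sendall(b"ping")

# Server echoes the data back upper-cased.
conn.sendall(conn.recv(1024).upper())
reply = client.recv(1024)
print(reply)  # b'PING'

for s in (client, conn, server):
    s.close()
```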

socketserver – A library that lets you create a socket server which handles networking events via a handler class.  A series of mixins and servers are available, including ones that make the handling asynchronous.  I used this to implement my own version of a port-forwarding service.  Great for setting up a quick server to handle connections.

https://docs.python.org/2/library/socketserver.html
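A sketch of the handler pattern, using the threading mixin so each connection gets its own thread (the echo behavior is just for illustration):

```python
import socket
import socketserver
import threading

class EchoHandler(socketserver.BaseRequestHandler):
    """Reply to each connection with an upper-cased copy of its data."""
    def handle(self):
        data = self.request.recv(1024)
        self.request.sendall(data.upper())

# ThreadingTCPServer = TCPServer + ThreadingMixIn: one thread per connection.
server = socketserver.ThreadingTCPServer(("127.0.0.1", 0), EchoHandler)
port = server.server_address[1]
threading.Thread(target=server.serve_forever, daemon=True).start()

with socket.create_connection(("127.0.0.1", port)) as client:
    client.sendall(b"hello")
    reply = client.recv(1024)
print(reply)  # b'HELLO'
server.shutdown()
```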

Conch, twisted framework – Twisted is an asynchronous network framework in Python.  It’s a pretty cool project and is similar to Tornado.  Twisted has a client called Conch that lets you handle SSH traffic.  I’m a big fan of the project, but I’ve only been through the tutorials so far and haven’t done much with it.

http://twistedmatrix.com/documents/current/conch/howto/conch_client.html