Category Archives: Python

Managing Networks – Trial and Error

I’ve been playing around in my free time with automating connections between different AWS instances as a way to learn more about networking.  So far, it’s been pretty fun.  In my last post I mentioned a series of libraries just for networking.

This post talks more about user-friendly interfaces in the form of CLI libraries, as well as some interesting topics in asynchronous processing in Python.  A bit about the CLI libraries that I really like:

Click –  This is a really great library in that it provides a natural way of building up a simple CLI quickly and efficiently.

You start by creating a group with the @click.group() decorator, which gives you an object that holds all your commands.  You then write functions in Python and decorate them with @cli.command() (optionally passing help text) to register them, and add @click.argument() decorators to declare arguments.  The set of argument types available is pretty extensive, including a File type (which can check that the file exists).  The nice thing about this interface is that it generates the help menu for you and, if the commands ever get more complex, provides ways to subdivide commands into smaller groups.  This library is great for centralizing a bunch of commands.  Create a setup.py file with an entry point to make the CLI available anywhere within Linux under a custom command name (I use something like dbops as a prefix).
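
A minimal sketch of that pattern (the group, command, and argument names here are hypothetical):

import click

@click.group()
def cli():
    """Top-level group that holds all the commands."""
    pass

@cli.command(help="Show connection info for a host.")
@click.argument("hostname")
def describe(hostname):
    click.echo("Looking up %s..." % hostname)

if __name__ == "__main__":
    cli()

In setup.py, an entry point along the lines of entry_points={"console_scripts": ["dbops=mytool:cli"]} (names hypothetical) exposes the group as a dbops command once the package is installed.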

Cmd – This allows you to create a command line utility using a single class and defining a few methods.  The cmd.Cmd class provides a shell, which takes user input and matches it against a set of commands (if they exist).  Commands are specified with def do_<command name>(self, line):, where line is the string following the command name (parse this to get arguments).  To make sure the enter key doesn’t re-execute the previous command, override the emptyline() method to return a falsy value (a falsy return re-prompts for a new command; a truthy one stops the loop).  I played around with this command prompt as a front end to a network management utility and thought it was pretty effective (the command loop can run alongside your other threads or processes, which frees you up to develop other services within the application).  I recommend this if you need to get user input and use it within the context of a program.
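
A minimal sketch of that setup (the shell name and commands are hypothetical):

import cmd

class NetShell(cmd.Cmd):
    prompt = "(net) "

    def do_connect(self, line):
        """connect <host> -- open a connection to the given host."""
        print("Connecting to %s..." % line.strip())

    def do_quit(self, line):
        """quit -- exit the shell."""
        return True  # a truthy return value stops the loop

    def emptyline(self):
        # The default repeats the last command; do nothing instead.
        return False

if __name__ == "__main__":
    NetShell().cmdloop()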

Argparse – Argparse (and, not listed here, the older optparse) is another option you can use.  It works by defining a set of rules for handling the arguments of a specific command, then parsing them into a namespace object.  The good part of argparse is that the argument handling is very flexible: you can add things like flags.  I think overall it’s a bit harder to implement than the above two cases (but more flexible).  I think this is used mostly within a single script.
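
A quick sketch (the argument names are hypothetical):

import argparse

parser = argparse.ArgumentParser(description="Manage database connections.")
parser.add_argument("host", help="host to connect to")
parser.add_argument("-p", "--port", type=int, default=5432, help="port number")
parser.add_argument("-v", "--verbose", action="store_true", help="flag: verbose output")
args = parser.parse_args()  # parsed into a namespace object

if args.verbose:
    print("Connecting to %s:%d" % (args.host, args.port))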

Sys/Os – The sys and os libraries are well worth getting to know.  They provide a great way to interact with the operating system, from checking on files and directories to doing a stat on a file.  One of the great uses of sys and os is the ability to manipulate stdin, arguments, and stdout; I’ve used this to write Python scripts that accept piped results.  Another interesting library to look at here is the subprocess module, which allows you to run commands in the background and provides file-like objects for stdin, stdout, and stderr (with subprocess.PIPE allowing you to pipe results between subprocesses).
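
A sketch of both patterns — reading piped input, and chaining subprocesses with PIPE (the commands themselves are arbitrary examples):

import subprocess
import sys

# Pattern 1: accept piped input, e.g. `cat hosts.txt | python script.py`.
if not sys.stdin.isatty():
    for line in sys.stdin:
        sys.stdout.write(line.strip().upper() + "\n")

# Pattern 2: chain subprocesses, the equivalent of `ls | wc -l` in the shell.
ls = subprocess.Popen(["ls"], stdout=subprocess.PIPE)
wc = subprocess.Popen(["wc", "-l"], stdin=ls.stdout, stdout=subprocess.PIPE)
ls.stdout.close()  # let ls receive SIGPIPE if wc exits first
print(wc.communicate()[0].decode().strip())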

The parallel processing part of my project was pretty cool.  I worked mostly with the multiprocessing and threading libraries.  Multiprocessing spawns new processes (via fork on Linux), while threading runs multiple threads that share memory within a single process.

multiprocessing – I really like this library.  You can create a set of workers and give them a function to do work in parallel.  The join method (similar to wait in bash) blocks until they are all done, then the program continues.  The overall pattern is pretty easy to pick up: you create a multiprocessing Process, provide a target function and a set of arguments for the function (typically as a tuple or list).  You then just call the start method on the process and it begins running in the background.  Other cool things about multiprocessing are the ability to set up queues, pipes (bi-directional communication), and a Manager that provides proxy-based shared dictionaries and lists (I didn’t get that to work, but see the docs).  One thing I did run into is working around shared-memory issues (an initial fault of not researching threading vs. multiprocessing).
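
A minimal sketch of the start/join pattern, with a queue for results (the worker function is a hypothetical stand-in):

import multiprocessing

def worker(name, queue):
    # Stand-in for real work; report a result back through the queue.
    queue.put("%s done" % name)

if __name__ == "__main__":
    queue = multiprocessing.Queue()
    procs = [multiprocessing.Process(target=worker, args=("worker-%d" % i, queue))
             for i in range(4)]
    for p in procs:
        p.start()   # begins running in the background
    for p in procs:
        p.join()    # block until every worker finishes
    while not queue.empty():
        print(queue.get())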

threading – The commands are similar to multiprocessing (in terms of setting up), but run things in a thread instead of a process.  You’ll see threading used a lot in libraries; the TCPServer in the previous post (SocketServer library) uses it in its ThreadingMixIn.
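
The setup mirrors multiprocessing almost exactly (the poll function and host are hypothetical):

import threading

def poll(host):
    print("Polling %s" % host)

t = threading.Thread(target=poll, args=("db-1.example.com",))
t.start()  # same start/join pattern as multiprocessing.Process
t.join()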

Celery – I didn’t get into Celery as much as I’d like to, mostly due to not wanting to set up RabbitMQ or Redis for a tiny application (I used sqlite to keep the footprint small and setup easy).  It’s still a great tool worth looking into, as it runs a queue (or set of queues) for you and allows you to execute things asynchronously (it can be used for messaging too).  I will probably look more into this library and the associated products in the future.

The application I developed was a tool for managing database connections.  It was split into three parts: a process that polls AWS for connection information, a database for storing that information (sqlite), and a process that managed SQL connections for me (through port forwarding).  This was all controlled via a CLI based on the Cmd library.  Messages were sent to the polling and SQL connection manager processes via multiprocessing queues, with each component running in its own process.  Within the SQL connection manager, I created a TCPServer (SocketServer), which I ran in a separate thread and wrapped in a class to manage connections.  The threading was done partly to isolate failures caused by a computer shutting down or refusing a connection, which prevents the entire application from failing due to the actions of a single TCPServer.  Overall, I’ve liked the experiment so far, but I don’t intend to do much more with it.  It was an experiment to test out a lot of these libraries and get a deeper understanding of things like ssh.
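
For a rough picture of the wiring, something like the following (the worker bodies are hypothetical placeholders, not the actual tool):

import multiprocessing

def poller(inbox):
    # Placeholder: poll AWS for connection info until told to stop.
    while inbox.get() != "stop":
        pass  # poll and write results to sqlite here

def conn_manager(inbox):
    # Placeholder: manage port-forwarded SQL connections.
    while inbox.get() != "stop":
        pass  # start/stop TCPServer threads here

if __name__ == "__main__":
    poll_q = multiprocessing.Queue()
    conn_q = multiprocessing.Queue()
    workers = [
        multiprocessing.Process(target=poller, args=(poll_q,)),
        multiprocessing.Process(target=conn_manager, args=(conn_q,)),
    ]
    for w in workers:
        w.start()
    # The Cmd-based CLI would put messages on these queues to drive the workers.
    poll_q.put("stop")
    conn_q.put("stop")
    for w in workers:
        w.join()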

Quick Experiment with Networks (Python)

Python has some really great libraries for networking.  They range from multi-threaded asynchronous services to one-time use cases to more generalized tools for running commands over many servers.  With several hundred virtualized instances to manage, utilities that handle some of the more common tasks from the command line become a lot more useful.  So, a list of cool libraries I’ve recently looked into:

fabric – A cool Python library that abstracts hosts into a single managed list and then allows you to execute commands over the entire host list.  It’s used in the Salt (devops) library and uses paramiko under the covers (more on that below).  It has a single entry point called a fabfile.  Great for developing a set of tasks run from a central computer.

http://www.fabfile.org
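
A minimal fabfile sketch using the classic Fabric 1.x API (the hosts are hypothetical); running `fab uptime` executes the task on every host in the list:

# fabfile.py
from fabric.api import env, run

env.hosts = ["web1.example.com", "web2.example.com"]  # hypothetical hosts

def uptime():
    # Fabric runs this task once per host in env.hosts, over ssh.
    run("uptime")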

paramiko – A library that makes an ssh connection relatively easy through its SSHClient class.  The SSHClient class allows you to set up policies for dealing with unknown hosts, etc.  Connecting is pretty easy: you call the connect method and pass in some basic information about the port, IP address, username, and key files.  To execute a command, you can then just call exec_command on the instance, and it returns 3 file-like objects (stdin, stdout, and stderr) with the typical file reading operations.  Great for setting up single remote connections.

http://www.paramiko.org
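
A sketch of that flow (the host address and key path are hypothetical):

import paramiko

client = paramiko.SSHClient()
# Policy for unknown hosts: automatically add them to known_hosts.
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect("203.0.113.10", port=22, username="ec2-user",
               key_filename="/path/to/key.pem")

stdin, stdout, stderr = client.exec_command("uname -a")
print(stdout.read())
client.close()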

sshtunnel – A library for creating ssh tunnels quickly.  It provides a class just for port forwarding.  Worth checking out if you do this occasionally, as it offers a ready-made solution.  Great for forwarding something like a database connection.

https://sshtunnel.readthedocs.io/en/latest/
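
A sketch of forwarding a remote database port to localhost (the bastion and database addresses are hypothetical):

from sshtunnel import SSHTunnelForwarder

with SSHTunnelForwarder(
    ("bastion.example.com", 22),
    ssh_username="ec2-user",
    ssh_pkey="/path/to/key.pem",
    remote_bind_address=("db.internal", 5432),
) as tunnel:
    # Point a database client at localhost:<local_bind_port>.
    print("Forwarding localhost:%d -> db.internal:5432" % tunnel.local_bind_port)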

sockets – A library for doing socket manipulation.  It’s lower level than the other libraries here: you have to do things like send and receive on a given socket yourself.  It provides a bunch of different protocols you can use and is more extensible.  Great for dealing with lower-level problems or making network behavior more customizable.

https://docs.python.org/2/library/socket.html
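
A minimal TCP client sketch (the address and payload are arbitrary, and it assumes something is listening on that port):

import socket

s = socket.socket(socket.AF_INET, socket.SOCK_STREAM)  # TCP
s.connect(("localhost", 9000))
s.sendall(b"ping")   # you handle the send...
print(s.recv(1024))  # ...and the receive yourself
s.close()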

socket server – A library that allows you to create a socket server that handles networking events via a handler class.  A series of mixins and servers are available, including ones that make the handler asynchronous.  I used this to implement my own version of a port-forwarding service.  Great for setting up a quick server to handle connections.

https://docs.python.org/2/library/socketserver.html
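
A sketch of a threaded server using the ThreadingMixIn (the module is named socketserver in Python 3 and SocketServer in Python 2; the echo handler here is a trivial example):

import socketserver

class EchoHandler(socketserver.BaseRequestHandler):
    def handle(self):
        # Called once per connection; echo back whatever arrives.
        data = self.request.recv(1024)
        self.request.sendall(data)

class ThreadedTCPServer(socketserver.ThreadingMixIn, socketserver.TCPServer):
    pass  # the mixin runs each handler in its own thread

if __name__ == "__main__":
    server = ThreadedTCPServer(("localhost", 9000), EchoHandler)
    server.serve_forever()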

Conch, twisted framework – twisted is an asynchronous network framework in Python; it’s a pretty cool project and is similar to Tornado.  Twisted has a client called conch that allows you to handle ssh traffic.  I’m a big fan of the project, but I’ve only been through the tutorials and haven’t done that much with it yet.

http://twistedmatrix.com/documents/current/conch/howto/conch_client.html


Getting Hands Dirty…

After about six months of reading 2-3 books on Django and going through the tutorials, I’m finally at the point where I can comfortably pursue a website with confidence.  Over the last two days, I’ve been setting up a small Amazon instance for demos.  Doing all the server, network, and security configuration was awesome, as I’ve learned most of that over the last six-plus months.  I’ve got nginx, postgres, and django set up; tonight I’ll build the first URLs and templates.  Exciting!  Once I have a domain set up and production web pages, I’ll post a URL on this blog so people can test out the application (a nature app).

Machine Learning in Action: Part 1


I’m interested in learning more about computer programming.  Recently, I’ve picked up a book on algorithms and data structures, as well as looked into Greenplum and Postgres.  I wanted a slight change of focus, so I picked up Machine Learning in Action over the weekend.

The book has been great so far.  It’s written using Python and implements many common machine learning algorithms from scratch.  Currently, I’ve gone through two chapters, one on kNN and the other on ID3 trees.  The latter was a bit more challenging than the first, requiring quite a bit of recursion due to the tree structure involved in that methodology.  I like this book so far in that it does a lot of the implementations from scratch, which makes them easier to understand.  I still want to read up on Shannon entropy to get a better understanding of it.
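
For reference, Shannon entropy is the quantity ID3 uses to choose splits; here’s a quick sketch of the calculation (not the book’s exact code):

import math
from collections import Counter

def shannon_entropy(labels):
    """Entropy of a list of class labels; ID3 picks the attribute
    whose split yields the largest drop in this value (information gain)."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

print(shannon_entropy(["yes", "yes", "no"]))  # ~0.918 bits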

For those interested in the code behind the book, it can be found here:

https://github.com/pbharrin/machinelearninginaction

Hopefully, I will get to try out the next few chapters.  Chapter 4 covers Bayesian methodology.

Best,

Chris