In the world of ‘Data Science’ there has been a simmering debate over the relative advantages of R and Python. Ultimately, this debate is futile: each language has the tools needed to produce high-quality data analysis. Hoping to expand and complement my existing toolkit, I’ve spent the last couple of months learning Python.
The bulk of my learning has been through Coursera’s Python For Everybody Specialization. Developed by Dr. Charles Severance at the University of Michigan, the material is presented in dynamic, detailed lectures and is accompanied by a textbook, Python For Informatics. The course sequence is broken into four parts and focuses heavily on text manipulation.
The first two parts of the sequence cover general programming topics. They teach the basics of loops, conditional statements, and custom functions, along with Python’s basic data structures. Anyone with a programming background should be able to move through these lessons quickly.
The third course in the sequence, “Accessing Web Data”, is where I really started learning new things. Before diving into the meat of the material, the course takes a very useful detour into regular expressions. It spends the rest of the time teaching students how to work with HTML, XML and JSON. In my opinion, Python excels at working with web data. Built-in tools allow programmers and analysts to quickly access data and manipulate all three formats.
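To give a flavor of what those built-in tools look like, here is a minimal sketch using only the standard library. It is not taken from the course materials: the sample strings and the helper names (extract_emails, collect_links, xml_names, json_names) are made up for illustration, and the email pattern is deliberately simplistic.

```python
import json
import re
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Hypothetical sample inputs, invented for this illustration.
text = "From: alice@example.com\nFrom: bob@example.org\n"
html_doc = '<p><a href="http://example.com/a">A</a> <a href="http://example.com/b">B</a></p>'
xml_doc = "<people><person><name>Alice</name></person><person><name>Bob</name></person></people>"
json_doc = '{"people": [{"name": "Alice"}, {"name": "Bob"}]}'


def extract_emails(s):
    """Regular expressions: pull simple email-like tokens out of raw text."""
    return re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", s)


class LinkCollector(HTMLParser):
    """HTML: collect the href attribute of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def collect_links(s):
    parser = LinkCollector()
    parser.feed(s)
    return parser.links


def xml_names(s):
    """XML: walk the tree and read the text of each <name> element."""
    root = ET.fromstring(s)
    return [person.findtext("name") for person in root.findall("person")]


def json_names(s):
    """JSON: parse into plain dicts and lists, then index as usual."""
    return [person["name"] for person in json.loads(s)["people"]]
```

In a real script the input strings would come from a web request (for example via urllib.request) rather than being hard-coded, and a third-party HTML parser may be more forgiving of messy pages than the standard-library one.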
The final course in the sequence is about using databases in Python programs. The course uses SQLite and sqlitebrowser to teach the basics. It starts with an elementary introduction to database architecture and the basics of SQL. Programmers with experience using relational databases can get through the SQL introduction very quickly. The more important part of the course is incorporating SQL into a Python script. The homework assignments require students to use the techniques learned earlier to access and parse data from the web. The extracted data is then stored in a database.
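The parse-then-store pattern can be sketched in a few lines with Python's built-in sqlite3 module. This is my own illustration rather than one of the course assignments: the JSON payload, the people table, and the store_people helper are all hypothetical, and an in-memory database stands in for a file on disk.

```python
import json
import sqlite3

# Hypothetical payload standing in for data fetched from the web.
json_doc = '[{"name": "Alice", "city": "Ann Arbor"}, {"name": "Bob", "city": "Detroit"}]'


def store_people(conn, records):
    """Create the table if needed and insert each record once."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS people (name TEXT UNIQUE, city TEXT)"
    )
    # Parameterized queries (the ? placeholders) avoid building SQL by hand.
    conn.executemany(
        "INSERT OR IGNORE INTO people (name, city) VALUES (?, ?)",
        [(r["name"], r["city"]) for r in records],
    )
    conn.commit()


# ":memory:" keeps the example self-contained; a file path would persist the data.
conn = sqlite3.connect(":memory:")
store_people(conn, json.loads(json_doc))
rows = conn.execute("SELECT name, city FROM people ORDER BY name").fetchall()
```

The UNIQUE constraint combined with INSERT OR IGNORE means the script can be re-run on the same data without creating duplicate rows, which is handy when a web source is polled repeatedly.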
After completing the four courses in Python For Everybody, I came away with the foundation needed to start using Python for data analysis. Since then, I’ve continued my learning by working through Python for Data Analysis. I’ve also been working on a project that has me building functions in Python to access and parse real estate data. Hopefully, I’ll turn that into a blog post in the near future.