Data Analysis and Visualization in Ruby

By Shekhar Prasad Rajak

Elevator Pitch

Looking for a gem to import and export data in various formats, and to analyse, manipulate and visualise it in a web application, an IRuby notebook or the console? Let’s learn daru and its plugin gems daru-io and daru-view from the Ruby Science Foundation (SciRuby).

Description

Outline

We will use our GSoC 2017 blog to show the features of the daru plugin gems, and the SciRuby blog to show the cool new features of the daru gem itself.

The main points are as follows:

  • Is Ruby powerful enough for data analysis and visualisation?

Ruby has NArray, NMatrix (new but promising), Statsample, Ruby/GSL, libsvm, ruby-fann (neural networks), tlearn (neural networks), ai4r, algorithms (k-d tree to do KNN), bayes_motel and others. These tools are powerful enough for data science and for building data analysis tools. Ruby is also a popular language for DevOps-related tasks (e.g. Puppet and Chef).

Most things are online today, so people want to analyse data and view the results quickly through plots. Ruby has very popular frameworks such as Rails, Sinatra, Nanoc and Hanami for building dynamic websites.

So doesn’t Ruby have everything required for data science, data analysis and visualisation?

  • Why do fewer people use Ruby for data analysis and visualisation?

Data scientists need a lower-level statistics / programming language and tools that can do numerical and statistical analysis: inspecting, cleansing, transforming and modelling data. A few languages produced good tools for this before Ruby did, and people are already using them.

Gone are the days when people said Ruby had a long way to go before it could compete with existing data analysis tools. Still, a tool is useless unless people use it (the network effect). The Ruby Science Foundation has produced powerful tools that are really helpful for data analysis and visualisation; among them are daru and the newly created plugin gems daru-io and daru-view.

  • Why use daru, and what makes it a powerful gem for data analysis and visualisation?

In this section, you will learn how to use daru to analyse large data sets, and get a tour of daru coupled with other tools such as pry, IRuby, Nyaplot, Google Charts, Highcharts and DataTables for interactive and standalone data analysis and plotting. The goal is quick insight into your data, all with a few lines of Ruby code.

  • Examples demonstrating import, manipulation and export operations in an IRuby notebook using daru and daru-io.

  • A demo of the interactive plotting system in web applications (mainly the Rails, Sinatra and Nanoc frameworks) and in an IRuby notebook using daru-view.

Apps we have already created to demonstrate the power of daru-view and daru-io:

Notes

  • Ruby developers have to switch to other languages for data analysis and visualisation if they are not aware of the cool gems available in Ruby. In this talk we will demonstrate how they can do similar things, with more features, effectively in Ruby.

  • The audience will learn about the two newly created plugin gems for daru: daru-view and daru-io.

  • Few tools in other languages make it this easy to produce highly interactive charts and tables. daru-view is useful for plotting charts and tables in an IRuby notebook as well as in web applications (Rails/Sinatra/Nanoc). It uses Nyaplot, Highcharts and Google Charts for plotting, and DataTables for generating tables with features such as searching, pagination and sorting.


Requirement

Basic knowledge of Ruby.


Outcomes

The audience will learn about the powerful data analysis and visualisation gem daru and the newly created gems daru-view and daru-io, and how they improve on currently popular tools. Attendees will be able to create a small web application (like this) and run examples in an IRuby notebook. We already have some good examples in the daru-view and daru-io GitHub repos, wiki pages and documentation.

Attendees will also be introduced to crunching and cleaning data read from a variety of sources and generating beautiful plots of that data with just a few lines of Ruby code. Prior experience with data analysis is not a necessity but having some exposure to basic data analysis is recommended. This is an entry/intermediate level talk that will focus more on introducing listeners to the new world of data analysis in Ruby than elaborating on various methodologies that are used for data analysis.


Long description of daru:

The Daru Family

The daru family is a family of Ruby gems that aims to smooth the integration between data analysis and web development. The family, now three gems strong, began when the parent gem, daru, was first released in Oct 2014 under the guidance of the Ruby Science Foundation. Ever since, daru has been evolving continuously into a large and increasingly battle-tested codebase. With more and more web applications being developed using Ruby web frameworks like Rails, Sinatra and Nanoc, it seems only logical to have plugin gems that make daru easier to integrate with web frameworks.

Inception of daru-io and daru-view

During Google Summer of Code 2017, the Ruby Science Foundation was selected as a mentoring organization with a list of prospective ideas for students to work on, one of which was “Make Daru more ready for integration with modern Web framework”. This project produced the daru plugin gems daru-io and daru-view.

Why daru-io?

Importers

When talking about data analysis frameworks, the use case will most probably involve thousands of records stored in a file or database. These thousands of records are not going to be fed manually into a Ruby program, right? This is where Importers come in. Previously, the data had to be extracted from the file or database with a parser, and then the dataframe had to be created. Now you just need to require daru-io and start reading dataframes from files.

Exporters

The other half of daru-io comes in the form of Exporters. Exporters are necessary when working with a team of diverse skill sets. Say one of your team members is an R programmer who can’t quite get data out of the GitHub Graph API. Using the JSON Importer, you can provide JSON paths and import the dataframe, run preliminary cleaning with daru, and then export to a common format (csv / xls / json / rds / rdata) for your friend to pick up where you left off.
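
To make the import, clean and export flow concrete, here is a minimal plain-Ruby sketch that uses only the standard json library. It is not daru-io's API: the sample response, its field names and the helper methods import_rows, clean and export_csv are all hypothetical, and in the talk a daru dataframe plays the role of the array of hashes used here.

```ruby
require 'json'

# Hypothetical nested API response; names and values are invented.
RESPONSE = <<~JSON
  {"items": [
    {"name": "alpha", "stats": {"score": 10}},
    {"name": "beta",  "stats": {"score": null}},
    {"name": "gamma", "stats": {"score": 30}}
  ]}
JSON

# "Import": pull the nested values of each record into a flat row.
def import_rows(json_text)
  JSON.parse(json_text).fetch('items').map do |item|
    { 'name' => item['name'], 'score' => item.dig('stats', 'score') }
  end
end

# "Clean": drop rows that contain a missing value.
def clean(rows)
  rows.reject { |row| row.values.any?(&:nil?) }
end

# "Export": build CSV text with a header line.
# Assumes at least one row and values without commas or quotes.
def export_csv(rows)
  header = rows.first.keys
  lines = [header.join(',')]
  lines += rows.map { |row| header.map { |key| row[key] }.join(',') }
  lines.join("\n") + "\n"
end

CSV_TEXT = export_csv(clean(import_rows(RESPONSE)))
puts CSV_TEXT
# name,score
# alpha,10
# gamma,30
```

The row with the missing score is dropped before export, which is the kind of preliminary cleaning the R programmer in the example would otherwise have to do by hand.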

Without Data Analysis?

In fact, you don’t even need a data analysis use case. Even for building a web application dashboard with various “Download as” options, daru-io can be used as a general-purpose conversion library with the dataframe as its intermediate data structure.

Why daru-view?

Charts

Charts are an integral part of visualization. Multiple gems provide many kinds of charts, from simple ones such as pie, bar and line charts to complex ones such as geo-spatial charts, temperature charts and so on. daru-view is built on the adapter principle: support is provided for many plotting libraries, and the adapter to be used has to be set before plotting.
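
The adapter principle can be illustrated with a small plain-Ruby sketch. This is not daru-view's implementation or API: the SketchPlot module, its two toy adapters and the adapter names are hypothetical, chosen only to show how one plotting call can be routed to whichever backend was selected beforehand.

```ruby
# Hypothetical sketch of the adapter principle; not daru-view code.
module SketchPlot
  # Each adapter turns the same label => value data into its own output.
  class TextBarAdapter
    def render(data)
      data.map { |label, value| "#{label} #{'#' * value}" }.join("\n")
    end
  end

  class HtmlTableAdapter
    def render(data)
      rows = data.map do |label, value|
        "<tr><td>#{label}</td><td>#{value}</td></tr>"
      end
      "<table>#{rows.join}</table>"
    end
  end

  ADAPTERS = { text_bar: TextBarAdapter, html_table: HtmlTableAdapter }.freeze

  class << self
    attr_reader :adapter

    # Select the backend; raises KeyError for an unknown name.
    def adapter=(name)
      @adapter = ADAPTERS.fetch(name).new
    end

    # The caller's code stays the same whichever adapter is active.
    def plot(data)
      raise 'set SketchPlot.adapter before plotting' unless adapter

      adapter.render(data)
    end
  end
end

SketchPlot.adapter = :text_bar
puts SketchPlot.plot({ 'a' => 3, 'b' => 1 })
# a ###
# b #
```

Switching the adapter to :html_table changes the output format without touching the plot call, which is the benefit the adapter design is after.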

Table

Tables are another important part of visualizing data; they enable viewing individual data points in plain tabular form. DataTables offers features to paginate, search and sort on any column of the table. The daru-view table uses it to present a dataframe easily in tabular form.
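
As a rough illustration of what searching, sorting and paginating a table means, here is a plain-Ruby sketch over an array of rows. DataTables performs these operations in the browser; the table_view helper, its parameters and the sample rows are hypothetical and are not part of daru-view or DataTables.

```ruby
# Hypothetical sample rows; values are invented for illustration.
ROWS = [
  { name: 'delta',   score: 4 },
  { name: 'alpha',   score: 1 },
  { name: 'charlie', score: 3 },
  { name: 'bravo',   score: 2 }
].freeze

# Returns one page of rows after optional searching and sorting.
#   search:   keep rows where any cell contains this substring
#   sort_by:  column (hash key) to sort on, ascending
#   page:     1-based page number
#   per_page: rows per page
def table_view(rows, search: nil, sort_by: nil, page: 1, per_page: 2)
  result = rows
  if search
    result = result.select do |row|
      row.values.any? { |cell| cell.to_s.include?(search) }
    end
  end
  result = result.sort_by { |row| row[sort_by] } if sort_by
  result.each_slice(per_page).to_a[page - 1] || []
end

first_page = table_view(ROWS, sort_by: :name)
puts first_page.map { |row| row[:name] }.join(', ')
# alpha, bravo
```

Asking for a page past the end returns an empty array rather than raising, so a caller can request pages without checking the row count first.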

Without data analysis?

As a web developer, you might have to present an admin dashboard to your data analyst colleague / boss with different charts and tables across various features and labels. When building such a dashboard, you can simply use daru-view rather than researching all of the different plotting libraries.

Daru family usage

We saw the individual use cases of the daru family: daru, daru-io and daru-view. Here’s what I think could be a typical, interesting use case for the whole family: import dataframe(s) from deeply nested JSON API response(s) (with a daru-io Importer), pre-process and compute on them (using daru), build a dashboard with visualizations (using daru-view), and finally export the results to a format of your choice (using a daru-io Exporter).


Why I am the best person to speak on this topic

We are two speakers:

About Shekhar Prasad Rajak:

I am one of the contributors to the daru GitHub repository, and my Google Summer of Code 2017 project is related to the daru gem. I am the author of daru-view, a plugin gem for daru that provides interactive charts in web applications as well as in IRuby notebooks, so I know this topic in depth and can convey my thoughts clearly. My work during GSoC 2017 is described here.


About Athitya Kumar:

Having worked with the daru family for about 9 months now, I feel that I have a good grasp of its current state and am familiar with the codebase. Being the author of daru-io, I think I’d be the best person to represent it during a talk along with the rest of the daru family.


Talking about our work at SciRuby in front of the very people who have made the Ruby language what it is now will prove very valuable, both for developers wanting such a tool in Ruby and for us as an organization looking to bring more contributors and users on board.