Let's take the machines house hunting!

By Atma Mani

Elevator Pitch

We all go through house hunting at some point. Be it renting or buying, most of the factors to consider are influenced by location. In this talk, we’ll perform Machine Learning using Python, Jupyter notebooks, Pandas, scikit-learn and other data science libraries to identify prospective homes.

Description

One of the fundamental questions in real estate is the question of ‘where’. Numerous studies indicate that the place you live can affect a multitude of wellness factors, including your life expectancy. Home buyers weigh several factors, such as cost, distance to major facilities, noise, air quality, community, neighborhood, school district, and risk of natural calamities, while looking for a place to live. Such analysis is not limited to house hunting: business analysts and entrepreneurs run a similar multi-criteria analysis for problems such as finding a suitable spot for a new grocery store, dentist office, or coffee shop.

In this talk, using house-hunting as an example spatial analysis problem, we will explore how to read spatial and non-spatial data in Python as Pandas DataFrame objects, perform exploratory and statistical analysis and visualize them on a map in a Jupyter notebook. We then score properties based on the criteria. We will finally teach a machine learning model (in scikit-learn) to understand our preferences and let it predict for us in the future. We will use the free ArcGIS API for Python to perform spatial analysis and learn how it can easily interoperate with popular data analysis libraries in the scientific Python and geospatial Python ecosystems.
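As a rough illustration of the scoring and learning steps described above, here is a minimal sketch that assumes a tiny, made-up table of listings. The column names, the weights, the median cut-off for the shortlist, and the choice of logistic regression are all assumptions for illustration; they are not the data or the model used in the talk.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Hypothetical listings; every column name and value is illustrative only.
homes = pd.DataFrame({
    "price_usd": [350_000, 420_000, 510_000, 390_000, 610_000, 450_000],
    "commute_min": [45, 30, 15, 35, 10, 25],
    "school_rating": [6, 8, 9, 5, 9, 7],
})

# Assumed preference weights that sum to 1.
weights = {"price_usd": 0.5, "commute_min": 0.3, "school_rating": 0.2}
# For these criteria a lower raw value is the better one.
lower_is_better = {"price_usd", "commute_min"}


def min_max(col):
    """Rescale a column to the 0..1 range."""
    return (col - col.min()) / (col.max() - col.min())


# Put every criterion on a common 0..1 scale where 1 is best.
scaled = homes[list(weights)].apply(min_max)
for name in lower_is_better:
    scaled[name] = 1.0 - scaled[name]

# Weighted sum of the scaled criteria gives one score per property.
homes["score"] = sum(scaled[name] * w for name, w in weights.items())

# Assumed rule: properties at or above the median score make the shortlist.
homes["shortlisted"] = (homes["score"] >= homes["score"].median()).astype(int)

# Fit a simple classifier so it can imitate the shortlist decision later.
model = LogisticRegression(max_iter=1000)
model.fit(scaled, homes["shortlisted"])
predicted = model.predict(scaled)

print(homes.sort_values("score", ascending=False))
```

With only six rows this is a toy, but the shape of the workflow is the same at scale: normalize the criteria, combine them with weights, label a shortlist, then let a model learn the labels.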

Notes

I am proposing this talk for just about anyone who is interested in the following topics:

- A systems way of thinking and decision making, and building software that acts as a decision support system (DSS). I demonstrate a way to perform multi-criteria analysis using the analytic hierarchy process. This process weighs multiple parameters through ranks and weights and helps sort the available candidates (choices) in a data-driven and objective manner.
- Exploring, visualizing and solving problems that have a location component. This talk teaches the fundamental concepts of thinking spatially.
- Reading, exploring and visualizing geospatial data in Python and Jupyter notebooks.
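To make the ranks-and-weights idea concrete, here is a small sketch of how criterion weights can be derived from pairwise comparisons. It uses the row geometric-mean approximation rather than the principal-eigenvector calculation, and the criteria names and comparison values are hypothetical, chosen only for illustration.

```python
import math

# Hypothetical criteria for a house hunt.
criteria = ["price", "commute", "schools"]

# Pairwise comparison matrix: entry [i][j] states how much more important
# criterion i is than criterion j. The values below are made up, and the
# matrix is reciprocal (entry [j][i] is 1 / entry [i][j]).
pairwise = [
    [1.0, 3.0, 5.0],
    [1 / 3, 1.0, 2.0],
    [1 / 5, 1 / 2, 1.0],
]


def ahp_weights(matrix):
    """Approximate priority weights using the geometric mean of each row."""
    geo_means = [math.prod(row) ** (1.0 / len(row)) for row in matrix]
    total = sum(geo_means)
    return [g / total for g in geo_means]


weights = dict(zip(criteria, ahp_weights(pairwise)))
for name, weight in weights.items():
    print(f"{name}: {weight:.3f}")
```

The resulting weights sum to 1 and can be fed into a weighted score for each candidate property.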

I have delivered a previous version of this talk at a GeoDev meetup in Portland, Oregon. You can read it as a blog post here.

Outline of the talk

Length of the talk is 20 minutes.

- Self-introduction
- An overview of data analysis libraries available in the Python ecosystem (pandas, the SciPy stack, Jupyter notebooks, with some examples of how popular these are)
- Introduce the free ArcGIS API for Python: an overview of its architecture and how it interoperates with popular Python libraries
- Introduce the problem statement: house hunting
- Body of the talk:
  - Read and visualize data
  - Perform exploratory data analysis
  - Talk about spatial criteria
  - Enrich housing data with information on access to facilities
  - Create a shortlist, then save the map in the notebook as a web map that can be taken on field visits
  - Train a machine learning model to do the same
- Conclude with a call to action
- Q&A
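For the "access to facilities" step in the outline, one simple stand-in is the straight-line (great-circle) distance from each home to its nearest facility. The sketch below uses the haversine formula in plain Python; it is not the ArcGIS API workflow from the talk, and the home names, facility names, and coordinates are made up for illustration.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two lat/lon points."""
    radius_km = 6371.0  # mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Hypothetical homes and facilities as (latitude, longitude) pairs.
homes = {"home_a": (45.52, -122.68), "home_b": (45.60, -122.60)}
facilities = {"hospital": (45.50, -122.69), "grocery": (45.59, -122.61)}

# For each home, record the closest facility and its distance.
nearest = {}
for home, (hlat, hlon) in homes.items():
    name, dist = min(
        ((fname, haversine_km(hlat, hlon, flat, flon))
         for fname, (flat, flon) in facilities.items()),
        key=lambda pair: pair[1],
    )
    nearest[home] = (name, dist)

for home, (name, dist) in nearest.items():
    print(f"{home}: nearest facility is {name} at {dist:.1f} km")
```

Straight-line distance ignores roads and travel time, so it is only a first approximation of access.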