Time to choose a boosting algorithm for your task

By Artem Seleznev

Elevator Pitch

There are plenty of boosting algorithms, yet sometimes you see no difference between their results. Why?

The problem is that the data is not prepared properly: each algorithm requires preparation that is suited to it.

I’m going to explain this with examples, in the form of a competition among *boosting algorithms.

Description

There are plenty of boosting algorithms, yet sometimes you see no difference between their results. Why? The problem is that the data is not prepared properly: each algorithm requires preparation that is suited to it. I’m going to explain this with examples, in the form of a competition between CatBoost and XGBoost.
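As a rough illustration of what algorithm-specific preparation can look like, here is a minimal sketch on made-up rows (not the competition data). It assumes the common setup where XGBoost is given a purely numeric matrix, so categorical columns are one-hot encoded, while CatBoost receives the raw values plus the positions of the categorical columns (the kind of list its `cat_features` argument accepts). The column names and the helpers `one_hot_rows` and `categorical_indices` are hypothetical, and the sketch uses only the standard library, so no model is actually trained.

```python
# Hypothetical toy rows: (tariff, region, monthly_spend). Not real data.
rows = [
    ("basic", "north", 12.5),
    ("premium", "south", 40.0),
    ("basic", "south", 9.0),
]
columns = ["tariff", "region", "monthly_spend"]
categorical = ["tariff", "region"]


def one_hot_rows(rows, columns, categorical):
    """Numeric-only preparation: expand categorical columns into 0/1 columns."""
    # Collect the sorted set of levels per categorical column for a stable layout.
    levels = {
        name: sorted({row[columns.index(name)] for row in rows})
        for name in categorical
    }
    header = []
    for name in columns:
        if name in levels:
            header += [f"{name}={level}" for level in levels[name]]
        else:
            header.append(name)
    encoded = []
    for row in rows:
        out = []
        for name, value in zip(columns, row):
            if name in levels:
                out += [1.0 if value == level else 0.0 for level in levels[name]]
            else:
                out.append(float(value))
        encoded.append(out)
    return header, encoded


def categorical_indices(columns, categorical):
    """Raw-value preparation: keep the data as is, only report which columns are categorical."""
    return [i for i, name in enumerate(columns) if name in categorical]


# Preparation for a library that expects a numeric matrix.
xgb_header, xgb_matrix = one_hot_rows(rows, columns, categorical)

# Preparation for a library that handles categorical columns itself.
cat_idx = categorical_indices(columns, categorical)
```

The same three rows end up as a five-column numeric matrix in the first case and stay untouched in the second, which is one reason identical "default" preparation can hide the difference between the two libraries.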

With no clearly preferred implementation, I decided to set up a competition as a test. The competition shows what each boosting implementation is well suited to. It starts with feature preparation, moves through data extraction and fitting, and ends with feature explanation (how the features were separated).

As a result, the competition gives a road map for working with the algorithms correctly and shows which one is better.

The competition and the explanation are based on real-world data and a real business problem.

This talk is aimed mainly at people who work with data and use ML algorithms (and also at those who are starting to work with data and finding their own way). Nevertheless, I’ll show several visualization methods that will also be interesting to general Python programmers.

Main parts of the talk:

  • Introduce the algorithms and choose 2 of them for the competition

  • State the business problem that is going to be solved

  • Reveal data preparation problems

  • Explain the XGBoost pipeline

  • Explain the CatBoost pipeline

  • Compare and explain the results

  • Conclusion

Notes

I work as a big data analyst at MegaFon, the best Russian mobile operator. There is huge scope for using boosting there.

I use CatBoost in my daily work, and it usually provides better results. It helps me tackle tasks such as “find a place for outdoor advertising” and “builders relevance scoring”.