Teaching a .NET developer new tricks: machine learning with ML.NET

Picking a topic

If you are looking for a dataset with which to start investigating a new machine learning technology, Kaggle offers a wealth of structured data that is available to download and use. I picked two datasets: one a set of country-based geographic and demographic data, the other the results of a survey of how happy people in each country claim to be, which yields a ‘happiness score’ and rank for each country.

Data preparation

Before opening Visual Studio, I first did some data manipulation in good old Excel. Steps involved importing the two datasets and a list of country ISO codes from Wikipedia, adding some look-up formulae and making some manual updates — matching up country names and steering clear of various political issues such as disputed territories — to end up with a single sheet containing the combined dataset that I could output as a CSV file.
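The same look-up could also be scripted rather than done in Excel. The following is a minimal sketch of that idea using a LINQ join on the ISO code, not the approach used for this article; the row types, field names and sample values are all hypothetical placeholders.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical row shapes; the real datasets have many more columns.
public class Demographics { public string IsoCode; public float GdpPerCapita; }
public class Happiness { public string IsoCode; public float Score; }
public class Combined { public string IsoCode; public float GdpPerCapita; public float Score; }

public static class DatasetMerger
{
    // Inner join on ISO code: a country missing from either source is dropped.
    public static List<Combined> Merge(
        IEnumerable<Demographics> demographics, IEnumerable<Happiness> happiness)
    {
        return demographics
            .Join(happiness,
                  d => d.IsoCode,
                  h => h.IsoCode,
                  (d, h) => new Combined
                  {
                      IsoCode = d.IsoCode,
                      GdpPerCapita = d.GdpPerCapita,
                      Score = h.Score
                  })
            .ToList();
    }
}

public static class MergeDemo
{
    public static void Main()
    {
        // Placeholder values for illustration only, not real country data.
        var demographics = new[]
        {
            new Demographics { IsoCode = "AAA", GdpPerCapita = 10f },
            new Demographics { IsoCode = "BBB", GdpPerCapita = 20f },
        };
        var happiness = new[]
        {
            new Happiness { IsoCode = "BBB", Score = 5.5f },
            new Happiness { IsoCode = "CCC", Score = 6.1f },
        };

        // Only "BBB" appears in both sources, so only that row survives the join.
        foreach (var row in DatasetMerger.Merge(demographics, happiness))
            Console.WriteLine($"{row.IsoCode},{row.GdpPerCapita},{row.Score}");
    }
}
```

Because this is an inner join, countries that appear in only one source silently disappear, which is the scripted equivalent of the manual name matching described above.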

Using ML.NET

With the input data prepared, I created a new console app project in Visual Studio using .NET Core and added the Microsoft.ML NuGet package. At the time of writing the latest version available was 0.11. This is still a zero-point release, so the code shown in this article and held in the code repository could change, but as there’s already been some churn in the APIs, it’s likely they are settling down now.

Training a model

The next step is to train the model which, when using ML.NET, involves constructing a training pipeline with the various steps necessary for the process. This might include further data cleansing if you haven’t been able to do it in the previous stages of preparing the input data file.
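As an illustration of what such a pipeline can look like, here is a minimal sketch. It is written against the ML.NET 1.x API, so names may differ from the 0.11 release used in this article. The CountryRow columns, the synthetic data generator and the choice of the SDCA trainer are all assumptions made so that the example runs on its own.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

// Hypothetical row shape: column names are illustrative, not the article's schema.
public class CountryRow
{
    public float GdpPerCapita { get; set; }
    public float InfantMortality { get; set; }
    public float NetMigration { get; set; }
    public float HappinessScore { get; set; } // the label to predict
}

public static class SampleData
{
    // Synthetic rows from a made-up formula, so the sketch runs without a CSV file.
    public static List<CountryRow> Create(int count = 200, int seed = 1)
    {
        var rng = new Random(seed);
        return Enumerable.Range(0, count).Select(_ =>
        {
            float gdp = (float)(rng.NextDouble() * 50);
            float mortality = (float)(rng.NextDouble() * 80);
            float noise = (float)(rng.NextDouble() - 0.5);
            return new CountryRow
            {
                GdpPerCapita = gdp,
                InfantMortality = mortality,
                NetMigration = (float)(rng.NextDouble() * 20 - 10),
                HappinessScore = 4f + 0.05f * gdp - 0.02f * mortality + noise
            };
        }).ToList();
    }
}

public static class TrainingSketch
{
    public static ITransformer Train(MLContext mlContext, IDataView trainingData)
    {
        var pipeline = mlContext.Transforms
            // Step 1: gather the input columns into a single "Features" vector.
            .Concatenate("Features",
                nameof(CountryRow.GdpPerCapita),
                nameof(CountryRow.InfantMortality),
                nameof(CountryRow.NetMigration))
            // Step 2: rescale the features so no column dominates by magnitude.
            .Append(mlContext.Transforms.NormalizeMinMax("Features"))
            // Step 3: the regression trainer (SDCA chosen arbitrarily here).
            .Append(mlContext.Regression.Trainers.Sdca(
                labelColumnName: nameof(CountryRow.HappinessScore),
                featureColumnName: "Features"));

        return pipeline.Fit(trainingData);
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);
        IDataView data = mlContext.Data.LoadFromEnumerable(SampleData.Create());

        // Hold back 20% of the rows for later evaluation.
        var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

        ITransformer model = TrainingSketch.Train(mlContext, split.TrainSet);
        Console.WriteLine("Model trained.");
    }
}
```

Any additional cleansing step would be appended to the chain in the same way, before the trainer.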

Evaluating a model

To evaluate the model, we use the portion of the dataset held back after the initial load as a testing set. We apply the model to it via the Transform method to get back another dataset (IDataView), this time containing predictions. Evaluating that gives us some statistics indicating how well the model predicted the score compared to the actual score recorded in the testing set.
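The evaluation step can be sketched as follows, again using ML.NET 1.x names, which may differ from those in 0.11. The CountryRow type, the synthetic data and the SDCA trainer are assumptions for illustration only, so the numbers printed say nothing about the real dataset.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row shape: column names are illustrative, not the article's schema.
public class CountryRow
{
    public float GdpPerCapita { get; set; }
    public float InfantMortality { get; set; }
    public float NetMigration { get; set; }
    public float HappinessScore { get; set; } // the label to predict
}

public static class SampleData
{
    // Synthetic rows from a made-up formula, so the sketch runs without a CSV file.
    public static List<CountryRow> Create(int count = 200, int seed = 1)
    {
        var rng = new Random(seed);
        return Enumerable.Range(0, count).Select(_ =>
        {
            float gdp = (float)(rng.NextDouble() * 50);
            float mortality = (float)(rng.NextDouble() * 80);
            float noise = (float)(rng.NextDouble() - 0.5);
            return new CountryRow
            {
                GdpPerCapita = gdp,
                InfantMortality = mortality,
                NetMigration = (float)(rng.NextDouble() * 20 - 10),
                HappinessScore = 4f + 0.05f * gdp - 0.02f * mortality + noise
            };
        }).ToList();
    }
}

public static class EvaluationSketch
{
    private const string Label = nameof(CountryRow.HappinessScore);

    // Trains on the training split, then scores the held-back testing split.
    public static RegressionMetrics TrainAndEvaluate(MLContext mlContext, IDataView data)
    {
        var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

        var pipeline = mlContext.Transforms
            .Concatenate("Features",
                nameof(CountryRow.GdpPerCapita),
                nameof(CountryRow.InfantMortality),
                nameof(CountryRow.NetMigration))
            .Append(mlContext.Transforms.NormalizeMinMax("Features"))
            .Append(mlContext.Regression.Trainers.Sdca(labelColumnName: Label));

        ITransformer model = pipeline.Fit(split.TrainSet);

        // Transform adds a "Score" column holding the prediction for each test row.
        IDataView predictions = model.Transform(split.TestSet);

        // Evaluate compares the "Score" column against the known label values.
        return mlContext.Regression.Evaluate(predictions, labelColumnName: Label);
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);
        IDataView data = mlContext.Data.LoadFromEnumerable(SampleData.Create());

        RegressionMetrics metrics = TrainAndEvaluate(mlContext, data);
        Console.WriteLine($"R-squared: {metrics.RSquared:0.###}");
        Console.WriteLine($"Root mean squared error: {metrics.RootMeanSquaredError:0.###}");
    }
}
```

R-squared closer to 1 and a lower root mean squared error both indicate predictions closer to the known scores.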

Using the model for predictions

Once we have the model trained, evaluated to our satisfaction and saved to a file, we can then use it to make individual predictions (likely the use case for predictive models in practice). To do that, we construct an instance of our data object, populating the fields we have available to drive the prediction — in our case the country’s demographic data.
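A minimal sketch of the save, load and predict cycle is shown below, using ML.NET 1.x names that may differ from those in 0.11. The CountryRow and ScorePrediction types, the file name, the synthetic training data and the input values are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using Microsoft.ML;
using Microsoft.ML.Data;

// Hypothetical row shape: column names are illustrative, not the article's schema.
public class CountryRow
{
    public float GdpPerCapita { get; set; }
    public float InfantMortality { get; set; }
    public float NetMigration { get; set; }
    public float HappinessScore { get; set; } // left unset when predicting
}

// Output shape: regression trainers write their prediction to a "Score" column.
public class ScorePrediction
{
    [ColumnName("Score")]
    public float Score { get; set; }
}

public static class SampleData
{
    // Synthetic rows from a made-up formula, so the sketch runs without a CSV file.
    public static List<CountryRow> Create(int count = 200, int seed = 1)
    {
        var rng = new Random(seed);
        return Enumerable.Range(0, count).Select(_ =>
        {
            float gdp = (float)(rng.NextDouble() * 50);
            float mortality = (float)(rng.NextDouble() * 80);
            float noise = (float)(rng.NextDouble() - 0.5);
            return new CountryRow
            {
                GdpPerCapita = gdp,
                InfantMortality = mortality,
                NetMigration = (float)(rng.NextDouble() * 20 - 10),
                HappinessScore = 4f + 0.05f * gdp - 0.02f * mortality + noise
            };
        }).ToList();
    }
}

public static class PredictionSketch
{
    public static float PredictScore(MLContext mlContext, string modelPath, CountryRow input)
    {
        // Load the saved model and wrap it for single-row predictions.
        ITransformer model = mlContext.Model.Load(modelPath, out var inputSchema);
        var engine = mlContext.Model.CreatePredictionEngine<CountryRow, ScorePrediction>(model);
        return engine.Predict(input).Score;
    }

    public static string TrainAndSave(MLContext mlContext)
    {
        IDataView data = mlContext.Data.LoadFromEnumerable(SampleData.Create());
        var pipeline = mlContext.Transforms
            .Concatenate("Features",
                nameof(CountryRow.GdpPerCapita),
                nameof(CountryRow.InfantMortality),
                nameof(CountryRow.NetMigration))
            .Append(mlContext.Transforms.NormalizeMinMax("Features"))
            .Append(mlContext.Regression.Trainers.Sdca(
                labelColumnName: nameof(CountryRow.HappinessScore)));

        string path = Path.Combine(Path.GetTempPath(), "happiness-model.zip");
        mlContext.Model.Save(pipeline.Fit(data), data.Schema, path);
        return path;
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);
        string path = TrainAndSave(mlContext);

        // Made-up input values; HappinessScore is what we want the model to supply.
        var input = new CountryRow { GdpPerCapita = 30f, InfantMortality = 10f, NetMigration = 2f };
        Console.WriteLine($"Predicted score: {PredictScore(mlContext, path, input):0.##}");
    }
}
```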

Further investigations into the results

Regression algorithm

As mentioned earlier, there are several regression algorithms available for selection when tackling this type of statistical problem with ML.NET. Someone better versed in the mathematics than I could probably justify in advance which would perform better, but my method was simply to try them out and see.
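The "try them out and see" approach can be sketched as a loop over trainers that shares one featurization pipeline. The three trainers below are an arbitrary selection from the core Microsoft.ML package (1.x API names) and are not necessarily the ones compared in this article; the CountryRow type and synthetic data are likewise assumptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

// Hypothetical row shape: column names are illustrative, not the article's schema.
public class CountryRow
{
    public float GdpPerCapita { get; set; }
    public float InfantMortality { get; set; }
    public float NetMigration { get; set; }
    public float HappinessScore { get; set; } // the label to predict
}

public static class SampleData
{
    // Synthetic rows from a made-up formula, so the sketch runs without a CSV file.
    public static List<CountryRow> Create(int count = 200, int seed = 1)
    {
        var rng = new Random(seed);
        return Enumerable.Range(0, count).Select(_ =>
        {
            float gdp = (float)(rng.NextDouble() * 50);
            float mortality = (float)(rng.NextDouble() * 80);
            float noise = (float)(rng.NextDouble() - 0.5);
            return new CountryRow
            {
                GdpPerCapita = gdp,
                InfantMortality = mortality,
                NetMigration = (float)(rng.NextDouble() * 20 - 10),
                HappinessScore = 4f + 0.05f * gdp - 0.02f * mortality + noise
            };
        }).ToList();
    }
}

public static class AlgorithmComparison
{
    // Returns R-squared on the test set for each trainer, keyed by trainer name.
    public static Dictionary<string, double> Compare(
        MLContext mlContext, IDataView trainSet, IDataView testSet)
    {
        const string label = nameof(CountryRow.HappinessScore);

        var trainers = new Dictionary<string, IEstimator<ITransformer>>
        {
            ["Sdca"] = mlContext.Regression.Trainers.Sdca(labelColumnName: label),
            ["OnlineGradientDescent"] =
                mlContext.Regression.Trainers.OnlineGradientDescent(labelColumnName: label),
            ["LbfgsPoissonRegression"] =
                mlContext.Regression.Trainers.LbfgsPoissonRegression(labelColumnName: label),
        };

        var results = new Dictionary<string, double>();
        foreach (var entry in trainers)
        {
            // Same feature preparation for every trainer, so only the algorithm varies.
            var pipeline = mlContext.Transforms
                .Concatenate("Features",
                    nameof(CountryRow.GdpPerCapita),
                    nameof(CountryRow.InfantMortality),
                    nameof(CountryRow.NetMigration))
                .Append(mlContext.Transforms.NormalizeMinMax("Features"))
                .Append(entry.Value);

            var model = pipeline.Fit(trainSet);
            var metrics = mlContext.Regression.Evaluate(
                model.Transform(testSet), labelColumnName: label);
            results[entry.Key] = metrics.RSquared;
        }
        return results;
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);
        IDataView data = mlContext.Data.LoadFromEnumerable(SampleData.Create());
        var split = mlContext.Data.TrainTestSplit(data, testFraction: 0.2);

        foreach (var result in Compare(mlContext, split.TrainSet, split.TestSet))
            Console.WriteLine($"{result.Key}: R-squared = {result.Value:0.###}");
    }
}
```

Evaluating every trainer against the same held-back test set keeps the comparison like for like.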

Feature importance

Although I had a model that was predicting results, it was acting rather as a black box, and it would be interesting to know a bit more about how it was reaching its predictive decisions. In particular, the question of which factors were considered most important was relevant.
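One technique ML.NET offers for this question is permutation feature importance: shuffle the values of one feature at a time and measure how much the model's metrics degrade. The sketch below assumes the ML.NET 1.x API, the SDCA trainer and a hypothetical CountryRow schema fed with synthetic data, so it may not match the approach or the names used in this article.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using Microsoft.ML;

// Hypothetical row shape: column names are illustrative, not the article's schema.
public class CountryRow
{
    public float GdpPerCapita { get; set; }
    public float InfantMortality { get; set; }
    public float NetMigration { get; set; }
    public float HappinessScore { get; set; } // the label to predict
}

public static class SampleData
{
    // Synthetic rows from a made-up formula, so the sketch runs without a CSV file.
    public static List<CountryRow> Create(int count = 200, int seed = 1)
    {
        var rng = new Random(seed);
        return Enumerable.Range(0, count).Select(_ =>
        {
            float gdp = (float)(rng.NextDouble() * 50);
            float mortality = (float)(rng.NextDouble() * 80);
            float noise = (float)(rng.NextDouble() - 0.5);
            return new CountryRow
            {
                GdpPerCapita = gdp,
                InfantMortality = mortality,
                NetMigration = (float)(rng.NextDouble() * 20 - 10),
                HappinessScore = 4f + 0.05f * gdp - 0.02f * mortality + noise
            };
        }).ToList();
    }
}

public static class FeatureImportanceSketch
{
    // Order matters: it defines the slot order inside the "Features" vector.
    public static readonly string[] FeatureNames =
    {
        nameof(CountryRow.GdpPerCapita),
        nameof(CountryRow.InfantMortality),
        nameof(CountryRow.NetMigration)
    };

    // Returns, per feature, the mean change in R-squared when that feature is shuffled.
    public static Dictionary<string, double> Rank(MLContext mlContext, IDataView data)
    {
        const string label = nameof(CountryRow.HappinessScore);

        // Featurize separately so the trained predictor can be passed on its own.
        var featurizer = mlContext.Transforms.Concatenate("Features", FeatureNames)
            .Append(mlContext.Transforms.NormalizeMinMax("Features"));
        IDataView featurized = featurizer.Fit(data).Transform(data);

        var predictor = mlContext.Regression.Trainers
            .Sdca(labelColumnName: label, featureColumnName: "Features")
            .Fit(featurized);

        var importance = mlContext.Regression.PermutationFeatureImportance(
            predictor, featurized, labelColumnName: label, permutationCount: 5);

        // One entry per slot of the feature vector, in FeatureNames order.
        return FeatureNames
            .Select((name, index) => (name, change: importance[index].RSquared.Mean))
            .ToDictionary(pair => pair.name, pair => pair.change);
    }

    public static void Main()
    {
        var mlContext = new MLContext(seed: 0);
        IDataView data = mlContext.Data.LoadFromEnumerable(SampleData.Create());

        // The largest drop in R-squared (most negative change) comes first.
        foreach (var entry in Rank(mlContext, data).OrderBy(e => e.Value))
            Console.WriteLine($"{entry.Key}: change in R-squared = {entry.Value:0.###}");
    }
}
```

A feature whose shuffling barely moves R-squared is one the model is not relying on much.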

Direction of correlations

At this point we can see which features are considered most important by the model, but not the direction in which they act. We might expect some to be obvious, such as greater GDP and lower infant mortality leading to an increase in the reported happiness score. But we can’t tell that for sure from the metrics, and for some (net migration, to take a hot political topic) the direction of influence may not be obvious.
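One simple way to probe direction, independent of the model, is the sign of the Pearson correlation between a feature column and the score. This is offered as an assumption about how one might check, not necessarily the method used in this article; the sketch is plain C# with no ML.NET dependency, and the sample values are made up.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class Correlation
{
    // Pearson correlation coefficient: +1 means the two series rise together,
    // -1 means one falls as the other rises, values near 0 mean no linear link.
    // Returns NaN if either series is constant (zero variance).
    public static double Pearson(IReadOnlyList<double> x, IReadOnlyList<double> y)
    {
        if (x.Count != y.Count || x.Count < 2)
            throw new ArgumentException("Series must have equal length of at least 2.");

        double meanX = x.Average();
        double meanY = y.Average();

        double covariance = 0, varianceX = 0, varianceY = 0;
        for (int i = 0; i < x.Count; i++)
        {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            covariance += dx * dy;
            varianceX += dx * dx;
            varianceY += dy * dy;
        }

        return covariance / Math.Sqrt(varianceX * varianceY);
    }
}

public static class DirectionDemo
{
    public static void Main()
    {
        // Made-up values for illustration only, not taken from the dataset.
        var feature = new double[] { 1, 2, 3, 4, 5 };
        var score = new double[] { 6.8, 6.1, 5.9, 5.2, 4.7 };

        double r = Correlation.Pearson(feature, score);
        string direction = r > 0 ? "positive" : r < 0 ? "negative" : "none";
        Console.WriteLine($"Correlation: {r:0.###} ({direction})");
    }
}
```

A correlation only describes how a single feature moves with the score across the data; it does not account for interactions between features, so it complements the importance metrics rather than explaining the model.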

Conclusions

I found this an interesting exercise and was pleased to see how straightforward it was to use ML.NET for these types of problems for someone with a .NET developer background but more of a layman’s interest in data science problems. Learning Python or R is of course a possibility for someone in this position, but it’s a lot to take on, especially when you consider all the mathematical subject matter that’s also on the learning path. It’s nice to see this is something Microsoft is rapidly developing, as well as supporting the use of models generated from other platforms like TensorFlow.

Zone

We write about customer experience, employee experience, design, content & technology to share our knowledge with the wider community.