Machine Learning for STA with AutoGluon

Christopher Nelson

February 1, 2023

In our recent book, Methods of Strategic Trade Analysis, we delve into how various machine learning methods can be used to uncover potential illicit trade in strategic goods:

"The challenges presented by the modern strategic trade environment lend themselves to machine learning techniques. There is an ever-increasing volume of international trade, which increases the amount of data collected. This huge data stream can be used to screen for strategic trade transactions. There is also a myriad of avenues for illegality, subversion, and diversion of strategic goods – misclassification, smuggling, illicit transshipment/re-exporting, fake transaction parties, etc. Compounding all of this are the limited resources state authorities have to investigate illicit strategic trade and the competing economic priority to move goods as quickly as possible. Machine learning models are intended to tackle exactly these situations, i.e. where we have a massive amount of data, need to find new or shifting patterns in that data, and cannot manually review all the data in any practical way. It is easier now more than ever to apply machine learning to strategic trade due to more robust data collection, greater storage capabilities, increased computing power and decentralized computing, and broader accessibility of machine learning techniques."

The book presents several techniques for approaching strategic trade analysis (STA) via machine learning, but it covers only a few of the many algorithms available to analysts for building a model, with a particular focus on the random forest algorithm. While there are few "bad" choices in algorithm selection, it can be difficult to

know where to start;
easily compare the results to find the best choice between algorithms; and
know when a combination (or ensemble) of algorithms might yield better results.

One potential tool is the AutoGluon package for Python. From its website:

"AutoGluon enables easy-to-use and easy-to-extend AutoML with a focus on automated stack ensembling, deep learning, and real-world applications spanning image, text, and tabular data. Intended for both ML beginners and experts, AutoGluon enables you to:

Quickly prototype deep learning and classical ML solutions for your raw data with a few lines of code.
Automatically utilize state-of-the-art techniques (where appropriate) without expert knowledge.
Leverage automatic hyperparameter tuning, model selection/ensembling, architecture search, and data processing.
Easily improve/tune your bespoke models and data pipelines, or customize AutoGluon for your use-case."

With relatively few lines of code, AutoGluon automatically prepares your data and trains all eligible machine learning models on it. It can pre-process the data, set aside part of the training data for validation, and find the best-performing models and combinations of models (an ensemble approach). Beyond this, it has the key advantages of being easy to use and open source.
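To make the ensemble idea concrete, here is a minimal sketch of blending the outputs of two models with a weighted average. It is not AutoGluon's implementation, which builds and weights its ensembles automatically and is considerably more involved. The function name `weighted_ensemble`, the model names, the scores, and the 0.5 threshold are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def weighted_ensemble(
    model_probs: Dict[str, List[float]],
    weights: Dict[str, float],
) -> List[float]:
    """Blend per-model probabilities into one score per row."""
    total = sum(weights[name] for name in model_probs)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    n_rows = len(next(iter(model_probs.values())))
    blended = []
    for i in range(n_rows):
        # Weighted sum of every model's probability for row i.
        score = sum(weights[name] * probs[i] for name, probs in model_probs.items())
        blended.append(score / total)
    return blended


# Hypothetical scores: each model's probability that a shipment is suspicious.
probs = {
    "random_forest": [0.9, 0.2, 0.6],
    "gradient_boosting": [0.7, 0.4, 0.2],
}
weights = {"random_forest": 1.0, "gradient_boosting": 1.0}

blended = weighted_ensemble(probs, weights)
# Flag a shipment for review when the blended score reaches the threshold.
flags = [p >= 0.5 for p in blended]
```

In this toy case the random forest alone would flag the third shipment, but the blended score does not, which shows how an ensemble can temper a single model's judgment. Whether that improves results depends on the data, and comparing such combinations is the part AutoGluon automates.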

For example, the following few lines of code create a model using AutoGluon for tabular data (the type of data we would most likely use for STA):

from autogluon.tabular import TabularDataset, TabularPredictor

# Load the example training and test sets hosted by AutoGluon
train_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/train.csv')
test_data = TabularDataset('https://autogluon.s3.amazonaws.com/datasets/Inc/test.csv')

# Train on the labeled data; 'class' is the column to predict
predictor = TabularPredictor(label='class').fit(train_data=train_data)

# Predict the label for each row of the test set
predictions = predictor.predict(test_data)

This creates an ensemble model using all of AutoGluon's available algorithms. You can then examine the contribution of each to the overall performance and decide whether to exclude some models, focus on a single one for simplicity, or tune further.
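One way to do that comparison is sketched below. The helper names `build_fit_kwargs` and `train_and_rank` and the 600-second default are hypothetical. The `leaderboard` method and the `time_limit` and `excluded_model_types` arguments to `fit` come from the AutoGluon tabular API as we understand it, so check them against the documentation for your installed version.

```python
from typing import Any, Dict, Optional, Sequence


def build_fit_kwargs(
    time_limit: Optional[int] = 600,
    excluded_model_types: Sequence[str] = (),
) -> Dict[str, Any]:
    """Collect optional arguments for TabularPredictor.fit (hypothetical helper)."""
    kwargs: Dict[str, Any] = {}
    if time_limit is not None:
        kwargs["time_limit"] = time_limit  # training budget in seconds
    if excluded_model_types:
        # Model-type keys such as "KNN" or "NN_TORCH" are skipped during training.
        kwargs["excluded_model_types"] = list(excluded_model_types)
    return kwargs


def train_and_rank(train_data, test_data, label: str, **fit_kwargs):
    """Fit a predictor and return it together with a per-model leaderboard."""
    # Imported here so build_fit_kwargs can be used without AutoGluon installed.
    from autogluon.tabular import TabularPredictor

    predictor = TabularPredictor(label=label).fit(train_data=train_data, **fit_kwargs)
    # One row per trained model, including the ensemble, scored on test_data.
    leaderboard = predictor.leaderboard(test_data)
    return predictor, leaderboard
```

With the data loaded as in the example above, a call might look like `predictor, board = train_and_rank(train_data, test_data, label='class', **build_fit_kwargs(excluded_model_types=['KNN']))`. To rely on a single model for simplicity, pass one of the names from the leaderboard's model column to `predictor.predict(test_data, model=...)`.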

AutoGluon might not be the solution for every user or application, but depending on your knowledge of machine learning, it might give you an efficient way to move on from the intensive work of modeling to the core task of analyzing your data for indicators of illicit strategic trade.

For more details, see the AutoGluon website and its cheat sheet.
