How can a financial fraud detection model trained in one country be applied in another? How does mastery of C++ lead to rapid mastery of Java and C#?
Associate Professor Sinno Jialin Pan from Nanyang Technological University cited the first as an application of transfer learning, and the second as an analogy for how he would like to take the field further.
Prof Pan believes that a machine can be said to be intelligent only if it has the ability to transfer learning. This is because the ability to learn and transfer skills or knowledge to a new situation or context is a particularly strong aspect of human intelligence.
He first heard the term “transfer learning” in 2006 as a PhD student working on a Wi-Fi-based indoor localisation system using machine learning (ML) techniques. It referred to an ML paradigm motivated by human beings’ ability to transfer learning.
Guided by intuition
Intuition told Prof Pan that transfer learning could hold the answer to the Wi-Fi localisation problem he was working on. In his experiments, he found that the distributions of Wi-Fi signals changed over time due to the dynamic environment and the use of different mobile devices. To ensure that a localisation system performs accurately, he had to figure out how to adapt a machine-learning-based model to the changing environment and to different types of mobile devices.
Prof Pan set out to develop general transfer learning methodologies that would give machines the ability to transfer knowledge across different tasks automatically. Unlike heuristic transfer learning methods, which are designed for specific applications (such as image classification or sentiment classification), general transfer learning methodologies require two fundamental research issues to be addressed: how to automatically measure the “distance” between any pair of domains or tasks, and how to design learning objectives based on domain/task-invariant information derived from that measurable distance.
Through his research, he found that kernel embedding of distributions was ideal for measuring the distance between domains or tasks. Based on this non-parametric technique, he developed several transfer learning methods to train a model on domain/task-invariant information and build a bridge between different domains/tasks for knowledge transfer.
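The distance induced by kernel embeddings of distributions is commonly known as the Maximum Mean Discrepancy (MMD): each sample set is mapped to its mean embedding in a kernel space, and the distance between the two embeddings is measured. The sketch below is a minimal, self-contained (biased) MMD estimator with an RBF kernel on synthetic data; the data and parameter choices are illustrative and not drawn from Prof Pan’s actual systems.

```python
import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    # Pairwise RBF (Gaussian) kernel matrix between the rows of X and Y.
    sq_dists = (np.sum(X**2, axis=1)[:, None]
                + np.sum(Y**2, axis=1)[None, :]
                - 2.0 * X @ Y.T)
    return np.exp(-gamma * sq_dists)

def mmd2(X, Y, gamma=1.0):
    # Biased estimator of the squared Maximum Mean Discrepancy:
    # the distance between the kernel mean embeddings of the two samples.
    return (rbf_kernel(X, X, gamma).mean()
            + rbf_kernel(Y, Y, gamma).mean()
            - 2.0 * rbf_kernel(X, Y, gamma).mean())

rng = np.random.default_rng(0)
source = rng.normal(0.0, 1.0, (200, 5))   # e.g. signals observed at one time
same = rng.normal(0.0, 1.0, (200, 5))     # drawn from the same distribution
shifted = rng.normal(2.0, 1.0, (200, 5))  # distribution after a shift

print(mmd2(source, same))     # near zero: the domains match
print(mmd2(source, shifted))  # clearly larger: the domains have drifted
```

A transfer learning objective can then penalise this quantity, pushing a model towards features that look similar across source and target domains.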
Transfer learning in fraud detection
One of the many potential applications of this was in fraud detection. Prof Pan noted that ML techniques have been widely used to capture patterns in customers’ behaviours and build fraud detection models based on historical data. However, as behaviours are region-dependent, a fraud detection model trained with historical data from one region or country may fail to make accurate detections in another region or country.
At the same time, it requires a lot of historical data to train an accurate fraud detection model, and this may not be available in, for example, a new market. In this case, transfer learning is a promising technique to help adapt a well-trained fraud detection model to new regions or countries with only limited historical data.
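One simple way to realise this kind of adaptation is to pre-train a model on the data-rich source region and then fine-tune it on the small labelled set from the new region, rather than training from scratch. The sketch below uses a hand-rolled logistic regression on synthetic stand-in data; the features, labels, and the fine-tuning recipe are illustrative assumptions, not Prof Pan’s actual fraud detection methods.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def train_logreg(X, y, w=None, lr=0.1, epochs=200):
    # Gradient-descent logistic regression. Passing an existing weight
    # vector w fine-tunes it instead of starting from scratch.
    if w is None:
        w = np.zeros(X.shape[1])
    for _ in range(epochs):
        grad = X.T @ (sigmoid(X @ w) - y) / len(y)
        w -= lr * grad
    return w

def accuracy(w, X, y):
    return np.mean((sigmoid(X @ w) > 0.5) == y)

rng = np.random.default_rng(1)
true_w = np.array([1.0, -1.0, 0.5, 0.0])  # hypothetical fraud signal

# Source region: plenty of labelled historical transactions.
Xs = rng.normal(0.0, 1.0, (2000, 4))
ys = (Xs @ true_w > 0).astype(float)

# Target region: same underlying signal, shifted features, few labels.
Xt = rng.normal(0.5, 1.2, (60, 4))
yt = (Xt @ true_w > 0).astype(float)

w_src = train_logreg(Xs, ys)                             # pre-train on source
w_ft = train_logreg(Xt, yt, w=w_src.copy(), epochs=50)   # fine-tune on target
```

Starting from the source-region weights lets the 60 target-region labels adjust an already-reasonable model, instead of having to determine all of its parameters on their own.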
But Prof Pan is still not satisfied. “Though many promising transfer learning methods have been developed, most existing methods fail to accumulate knowledge when performing transfer learning,” he said. In other words, for each specific pair of domains or tasks, the transfer learning procedure has to be run from scratch.
Reuse of knowledge
What Prof Pan is now embarking on is an attempt to develop a continual transfer learning framework, in which the machine gets progressively “smarter” as it solves more and more transfer learning tasks. He likens this to a computer science student who spends six months mastering the C++ programming language; when the student then wants to learn the Java programming language, he or she may need less than three months to master it.
If the student further wants to learn the C# programming language, he or she may need only days to master it. “The reason behind this is that with the transfer of learning, the student’s understanding or knowledge of object-oriented languages becomes deeper after he/she learns Java, which also helps him/her to learn C# faster,” he explained.
To translate this learning behaviour into transfer learning algorithms, knowledge needs to be distilled and accumulated after each transfer learning task. A key research issue is how to represent knowledge in a more compact form after the “learning”, so that it can be refined and reused in the next transfer learning task. “In this way, knowledge can be accumulated, which makes machines’ transfer learning ability more powerful,” he said.
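As a toy illustration of the idea (the representation and update rule here are invented for illustration, not Prof Pan’s framework), one could keep a compact summary of everything learned so far, such as a running average of past task solutions, and use it to warm-start each new transfer task:

```python
import numpy as np

class KnowledgeStore:
    """Accumulates a compact knowledge summary across tasks: here simply a
    running average of learned weight vectors, reused as the starting point
    ("warm start") for the next task instead of starting from zero."""

    def __init__(self, dim):
        self.w_avg = np.zeros(dim)
        self.n_tasks = 0

    def warm_start(self):
        # Initialisation for the next task, informed by all previous tasks.
        return self.w_avg.copy()

    def absorb(self, w):
        # Fold the newly learned weights into the running average.
        self.n_tasks += 1
        self.w_avg += (w - self.w_avg) / self.n_tasks

store = KnowledgeStore(dim=3)
store.absorb(np.array([1.0, 0.0, 2.0]))   # knowledge from task 1
store.absorb(np.array([3.0, 0.0, 4.0]))   # knowledge from task 2
print(store.warm_start())                 # [2. 0. 3.]
```

Each new task then starts closer to a good solution, so the machine needs less effort per task as its accumulated knowledge grows, much like the student who learns C# in days after mastering C++ and Java.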