A learning theory of meta learning

This article gives a brief introduction to recent theoretical advances in meta learning.


INFORMATION SCIENCE
Special Topic: Machine Learning Automation

Fang Yao
A key feature of human intelligence is the ability to transfer knowledge learned from multiple learning tasks to other similar but different ones. Namely, humans attempt to learn how to generalize, and can thus tackle a continuous stream of learning tasks. Correspondingly, in real-world applications it is crucial to improve learning in a new task by leveraging knowledge transferred from related tasks that have already been learned. Meta learning, or learning to learn, aims to tackle this problem. As meta learning has become an exciting research direction, there has been significant interest in developing a theory that formulates it.
Early works mainly focused on meta learning via representation learning [1] or meta-representation learning [2]. A fixed representation or meta representation, however, lacks the flexibility to deal with changing task distributions.
Recently, a simulating learning methodology (SLeM) approach [3] treats meta learning as learning an explicit hyperparameter prediction meta learner h : T → Ψ, mapping from the learning task space T to the hyperparameter space Ψ, which can potentially achieve better task generalizability than the earlier fixed-representation approaches. Formally, one can use LM(D; ψ) to denote the learner f ∈ F extracted by running the corresponding machine learning process, equipped with the hyperparameter configuration ψ ∈ Ψ, on the input dataset D of the task at hand. This formulation allows one to apply standard tools of statistical learning theory to investigate meta-learning theory.
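To make this formulation concrete, the following is a minimal sketch in a toy setting, assuming ridge regression as the learning machine LM(D; ψ) with its penalty as the single hyperparameter, and a meta learner h that conditions on simple task statistics; the names lm, task_features and MetaLearner are hypothetical illustrations, not notation or code from [3].

```python
# A minimal sketch of the SLeM formulation under illustrative assumptions:
# the learning machine LM(D; psi) is ridge regression whose penalty is the
# hyperparameter psi, and the meta learner h maps simple task statistics
# to psi. All helper names here are hypothetical, not the API of [3].
import numpy as np

def lm(D, psi):
    """LM(D; psi): run the inner learning process on dataset D = (X, y)
    with hyperparameter configuration psi, returning a learner f in F."""
    X, y = D
    w = np.linalg.solve(X.T @ X + psi * np.eye(X.shape[1]), X.T @ y)
    return lambda X_query: X_query @ w  # the extracted task-specific learner f

def task_features(D):
    """Embed a task dataset into a small vector so that h can condition on it."""
    X, y = D
    return np.array([np.log(len(y)), X.var(), y.var()])

class MetaLearner:
    """h : T -> Psi, predicting a hyperparameter configuration from a task."""
    def __init__(self, theta):
        self.theta = theta  # parameters of h, fitted on many training tasks

    def __call__(self, D):
        z = task_features(D) @ self.theta
        return np.logaddexp(0.0, z)  # softplus keeps the predicted penalty positive

# On a new test task, h predicts psi and LM extracts the task-specific learner.
rng = np.random.default_rng(0)
X, y = rng.normal(size=(50, 3)), rng.normal(size=50)
h = MetaLearner(theta=rng.normal(size=3))
psi = h((X, y))      # hyperparameter configuration predicted for this task
f = lm((X, y), psi)  # learner f extracted by the learning machine
y_hat = f(X)         # predictions for query samples
```

Meta training would fit the parameters of h so that the predicted ψ yields low validation loss across the training tasks; the sketch only shows how a fitted h is used on a new task.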
The following theorem presents an upper bound on the task transfer risk for SLeM meta learning, where ĥ and h* denote the estimated and true meta learners, respectively.

Theorem 1 [3]. The task transfer risk of the estimated meta learner ĥ is upper bounded by terms involving the complexity of learning the task-specific learners f ∈ F, the complexity of learning the meta learner h ∈ H, a parameter α expressing the similarity between training and test tasks, and the divergences d_F(μ_t^s, μ_t^q) between support and query sets; see [3] for the precise statement.
In Table 1 we list the main related notations of machine learning and SLeM meta learning, so that their main differences can be easily compared. As is well known, most procedures of conventional statistical machine learning are designed to learn a model from one single task/distribution. Differently, SLeM aims to construct a meta learner from multiple tasks, i.e. {D_t = (D_t^tr, D_t^val)}_{t=1}^T, and then to use it transferably to help fulfil new test tasks μ ∼ η. In addition to the complexity of learning the task-specific learners f ∈ F, as in machine learning, the bound for SLeM contains a parameter α that expresses the similarity between training tasks and test tasks, the complexity of learning the meta learner h ∈ H, and a pre-defined divergence metric d_F(μ_t^s, μ_t^q) between the support (training) and query (test) sets of each task. The last metric especially tells us that the requirement of machine learning that the distributions of the training and test sets be consistent is not necessary for SLeM meta learning. All of these elements provide a fresh understanding and extension of conventional machine learning.

Besides, similar to the structural risk minimization (SRM) principle in machine learning [4], it is hoped that SLeM can develop proper meta-regularization techniques based on the theory of Theorem 1 to enhance generalization capability at the task level, as sketched below. This should be beneficial to the meta-learning field, just as the SRM principle has been significant in the conventional machine learning field.
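To make the roles of these quantities explicit, the bound of Theorem 1 can be written schematically as follows. This display is a paraphrase assembled from the ingredients named above, with C(·) standing for unspecified complexity measures; the exact terms and constants are those of [3], not of this sketch.

```latex
% Schematic form of the task transfer risk bound (a paraphrase of the
% ingredients named in the text, not the exact statement of Theorem 1 in [3]).
\[
  R(\hat{h}) - R(h^{*})
  \;\lesssim\;
  \alpha \, \big( C(\mathcal{F}) + C(\mathcal{H}) \big)
  \;+\;
  \frac{1}{T} \sum_{t=1}^{T} d_{\mathcal{F}}\!\big(\mu_t^{s}, \mu_t^{q}\big).
\]
% An SRM-style meta-regularization would then trade the empirical meta risk
% against a surrogate of the complexity terms, e.g.
\[
  \hat{h} \in \arg\min_{h \in \mathcal{H}} \; \widehat{R}(h) + \lambda \, C(\mathcal{H}).
\]
```

Just as SRM balances empirical risk against hypothesis-class capacity, such a penalty would control the capacity of H at the task level.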

Table 1. Comparison of the main notations used in machine learning and SLeM meta learning [3].

|                            | Machine learning                              | SLeM meta learning                                           |
| Training data              | a single dataset D from one task/distribution | multiple task datasets {D_t = (D_t^tr, D_t^val)}_{t=1}^T     |
| Learned object             | a task-specific learner f ∈ F                 | a meta learner h : T → Ψ, h ∈ H                              |
| Goal of the generalization | predicting the label y for a query sample x   | predicting the hyperparameter configuration ψ for a query task |