12/12/2021 Computer Vision
Ha Bui (Research Resident)
This paper proposes a novel method to tackle the domain-shift problem within a domain generalization (DG) framework. DG has recently become a critical yet challenging research topic in machine learning, owing to its real-world applicability and its close connection to the way humans learn to generalize to new domains. In a DG setting, the learner is trained on multiple datasets collected under different environments, without access to any data from the target domain, yet must still perform well on that domain.
One of the most notable approaches to this problem is to learn "Domain-Invariant (DI)" features across the training datasets, which relies on the assumption that these invariant representations also hold in unseen target domains. While this has been shown to work well in practice, a key drawback of the dominant DI approach is that it can completely ignore "Domain-Specific (DS)" information that could aid generalization performance, especially as the number of source domains increases.
To address these shortcomings, we propose a novel, theoretically sound DG approach that extracts label-informative DS representations and then explicitly disentangles the DI and DS representations in an efficient way, without training multiple networks for DS. We develop a rigorous framework that formalizes the DI/DS representations; our key insight is an effective meta-optimization training framework that learns the DS representation from multiple training domains. Without accessing any data from unseen target domains, the meta-training procedure provides a suitable mechanism to self-learn the DS representation.
The domain-invariant representation Zi is obtained via an adversarial training framework: the domain discriminator D tries to maximize the predicted probability of the domain label given the latent Zi, while the encoder Q aims to map the sample X to a latent Zi such that D cannot identify the domain of X. This task can be performed by solving the following min-max game:
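The game itself is missing from this text; a plausible reconstruction from the definitions above (a sketch, not the paper's exact equation), writing d for the domain label of x, is:

    \min_{Q} \max_{D} \; \mathbb{E}_{(x,d)} \big[ \log D(Q(x))_{d} \big], \quad \text{with } Z_i = Q(x),

so D maximizes the log-probability of the correct domain label while Q is trained to make the domains indistinguishable from Zi.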
To extract the domain-specific representation Zs from the encoder R, we use a domain classifier, trained to predict the domain label from Zs. The parameters of this classifier and of R are therefore optimized with the objective function below:
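The objective is not reproduced here; a hedged sketch consistent with the text is a standard cross-entropy minimization (the classifier is written D_s below to distinguish it from the adversarial discriminator D; this naming is ours):

    \min_{R, D_s} \; -\mathbb{E}_{(x,d)} \big[ \log D_s(R(x))_{d} \big], \quad \text{with } Z_s = R(x).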
The disentanglement condition between the two random vectors Zi and Zs is enforced by pushing their cross-covariance matrix, denoted Cov(Zi, Zs), toward 0. The relevant parameters of Q and R are then updated via the following optimization problem:
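The optimization problem is not shown here; a common realization is to minimize the squared Frobenius norm of the batch cross-covariance, min_{Q,R} ||Cov(Z_i, Z_s)||_F^2. A minimal NumPy sketch of that penalty (the function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def cross_cov_penalty(z_i, z_s):
    """Squared Frobenius norm of the cross-covariance between two
    batches of latent codes, each of shape [batch, dim]."""
    z_i = z_i - z_i.mean(axis=0, keepdims=True)   # center each code
    z_s = z_s - z_s.mean(axis=0, keepdims=True)
    cov = z_i.T @ z_s / (z_i.shape[0] - 1)        # cross-covariance matrix
    return float(np.sum(cov ** 2))                # ||Cov||_F^2

rng = np.random.default_rng(0)
a = rng.normal(size=(512, 8))
b = rng.normal(size=(512, 8))            # independent of a
c = a + 0.1 * rng.normal(size=(512, 8))  # strongly correlated with a
print(cross_cov_penalty(a, b) < cross_cov_penalty(a, c))  # True
```

Independent codes give a penalty near zero, while correlated codes are heavily penalized; minimizing this term on Q and R is the disentanglement pressure described above.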
Sufficiency of the DS and DI representations w.r.t. the classification task. The goal of the classifier F is to predict the label of the original sample X based on the domain-invariant Zi and the domain-specific Zs, i.e.,
where ⊕ denotes the concatenation operation. The training of F is then performed by solving:
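The prediction rule and training objective are reconstructed below from the surrounding text (a sketch, not the paper's exact equation; ℓ denotes a standard cross-entropy loss):

    \hat{Y} = F(Z_i \oplus Z_s), \qquad \min_{Q, R, F} \; \mathbb{E}_{(X,Y)} \big[ \ell\big( F(Q(X) \oplus R(X)),\, Y \big) \big].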
To encourage the domain-specific representation Zs to transfer information learned from the source domains to the unseen target domain, we introduce a meta-learning framework targeting robust generalization. Note that the domain-invariant feature Zi remains fixed during the meta-learning procedure. In particular, each source domain is split into two subdomains, namely a meta-train set Smr and a meta-test set Sme. The domain-specific parameters of R and the classifier parameters of F are then jointly optimized as follows:
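The joint optimization is absent from this text; a MAML-style sketch consistent with the description (α and β are a hypothetical inner step size and meta-weight, not values from the paper) is:

    \theta_R' = \theta_R - \alpha \nabla_{\theta_R} \mathcal{L}_{cls}(S_{mr}; \theta_R, \theta_F)
    \min_{\theta_R, \theta_F} \; \mathcal{L}_{cls}(S_{mr}; \theta_R, \theta_F) + \beta\, \mathcal{L}_{cls}(S_{me}; \theta_R', \theta_F),

i.e., an inner gradient step on the meta-train split whose updated parameters are then evaluated on the held-out meta-test split.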
The pseudo-code for training and inference processes of our proposed mDSDI framework is presented in the Algorithm below. Each iteration of the training process consists of two steps:
First, we integrate the objective functions (1), (2), (3), and (5) to construct an objective function La defined as follows:
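The combined objective is not reproduced here; a plausible weighted form (the λ coefficients are an assumption of this sketch) is:

    \mathcal{L}_a = \mathcal{L}_{cls} + \lambda_1 \mathcal{L}_{adv} + \lambda_2 \mathcal{L}_{ds} + \lambda_3 \mathcal{L}_{dis},

covering the classification, adversarial-invariance, domain-specific, and disentanglement terms, respectively.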
The second step employs meta-training to adapt the task-related DS representation from the source domains to unseen domains. In each mini-batch, the data are split into meta-train and meta-test sets, and the gradient-transfer step from the meta-train domains to the meta-test domain is performed by solving optimization problem (6).
Assumption 1 indicates that, for the source domain S1, we can learn ZS1 = R(X1) such that I(ZS1; Y | X2) is strictly positive and equal to I(X1; Y | X2), where I(X1; Y | X2) is the specific information that correlates with the label in domain S1 but not in domain S2. For instance, in the example mentioned in the introduction, if domain S1 is photo while S2 is sketch, the value of epsilon1 should be positive, because background information such as a house or the ocean also helps predict whether the object is a dog or a fish, while a conceptual sketch does not convey that background.
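In symbols, the assumption described above can be written (a reconstruction from the prose, not a verbatim quote) as:

    I(Z_{S_1}; Y \mid X_2) = I(X_1; Y \mid X_2) = \epsilon_1 > 0,

i.e., domain S1 carries label-relevant information about Y that domain S2 does not.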
Following this plausible assumption, the following theorem proves that our model generalizes better, and that learning only domain-invariant features is insufficient: