I noticed that the code computes loss1 and loss2 in the same computational graph and updates them in a single step, which is hard to reconcile with the minimax strategy. Moreover, the test metrics from single-step training do not match those reported in the paper (F1 -0.0035).
I tried asynchronous alternating updates instead, updating on loss1 first and then on loss2, i.e. the max-min strategy (Explanation here). This gave results close to those in the paper (F1 +0.0001), better than single-step training.
I also tried updating on loss2 first and then on loss1, i.e. the min-max strategy. This was better than single-step training (F1 +0.0008) but worse than the max-min strategy (F1 -0.0028).
*All runs were stopped manually after only two epochs, on the SMD dataset. I can't rule out that single-step training ends up better after more epochs, but so far the max-min strategy converges faster while taking longer wall-clock time to train.
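
To make the comparison concrete, here is a minimal PyTorch sketch of the two update schemes. The model and loss functions are toy placeholders (not the repo's actual implementation); only the update order is the point:

```python
import torch
import torch.nn as nn

# Toy stand-ins: a placeholder model and losses. The real loss1/loss2 are
# the repo's objectives; every name here is illustrative only.
model = nn.Linear(8, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
x = torch.randn(4, 8)

def losses(out):
    loss1 = out.pow(2).mean()    # placeholder "minimize" objective
    loss2 = -out.abs().mean()    # placeholder negated "maximize" objective
    return loss1, loss2

# --- Single-step (joint) update: one graph, one optimizer step ---
loss1, loss2 = losses(model(x))
optimizer.zero_grad()
(loss1 + loss2).backward()       # gradients of both losses mixed in one step
optimizer.step()

# --- Asynchronous alternating (max-min) update: loss1 step, then loss2 step ---
loss1, _ = losses(model(x))
optimizer.zero_grad()
loss1.backward()
optimizer.step()                 # parameters already move before loss2 is computed

_, loss2 = losses(model(x))      # fresh forward pass on the updated weights
optimizer.zero_grad()
loss2.backward()
optimizer.step()
```

The min-max variant is the same alternating loop with the two loss steps swapped, and single-step training needs only one forward pass per batch, which is why it is cheaper per epoch.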
Can someone explain this observation and the intuition behind using single-step training?