
Config of SVTR-CPPD (large) #11198

Closed · trantuankhoi opened this issue Nov 5, 2023 · 7 comments
@trantuankhoi

I have successfully trained the SVTR-CPPD (base) model and achieved excellent results. To improve the metrics further, I am now experimenting with SVTR-large as the backbone. I have tried searching for and applying existing SVTR-large (original) configs, like the one here, but the accuracy is still 0% after a period of training, whereas SVTR-base reached about 10% in the same time frame. I suspect the problem lies in the head config. If anyone has tried SVTR-CPPD (large) before, could you let me know whether any of my configs are wrong?

Here is my config:

Global:
  use_gpu: True
  epoch_num: 20
  log_smooth_window: 20
  print_batch_step: 10
  save_model_dir: /HDD/kedanhcaptraitim/CPPD/test
  save_epoch_step: 1
  # evaluation is run every 5000 iterations, starting from iteration 0
  eval_batch_step: [0, 5000]
  cal_metric_during_train: True
  pretrained_model:
  checkpoints: /HDD/kedanhcaptraitim/CPPD/cppd_v1.2.0/best_accuracy
  save_inference_dir: ./resources/
  rec_model_dir:
  use_visualdl: True
  visualdl_file_name: vdlrecords
  infer_img: doc/imgs_words_en/word_10.png
  # for data or label process
  character_dict_path: ppocr/utils/dict/vietnamese_dict.txt
  character_type: korean
  max_text_length: 128
  infer_mode: False
  use_space_char: True
  save_res_path: ./output/rec/predicts_svtr_cppd_base.txt


Optimizer:
  name: AdamW
  beta1: 0.9
  beta2: 0.99
  epsilon: 1.e-8
  weight_decay: 0.05
  no_weight_decay_name: norm pos_embed char_node_embed pos_node_embed char_pos_embed vis_pos_embed
  one_dim_param_no_weight_decay: True
  lr:
    name: Cosine
    learning_rate: 0.000375 # 4gpus 256bs
    warmup_epoch: 4

Architecture:
  model_type: rec
  algorithm: CPPD
  Transform:
  Backbone:
    name: SVTRNet
    img_size: [32, 768]
    patch_merging: 'Conv'
    embed_dim: [192, 256, 512]
    depth: [6, 6, 9]
    num_heads: [6, 8, 16]
    mixer: ['Conv','Conv','Conv','Conv','Conv','Conv', 'Conv','Conv', 'Conv', 'Conv', 'Global','Global','Global','Global','Global','Global','Global','Global','Global','Global', 'Global']
    local_mixer: [[7, 11], [7, 11], [7, 11]]
    last_stage: False
    prenorm: True
  Head:
    name: CPPDHead
    dim: 512
    vis_seq: 384
    num_layer: 3
    max_len: 128

Loss:
  name: CPPDLoss
  ignore_index: &ignore_index 100 # must be greater than the number of character classes
  smoothing: True
  sideloss_weight: 1.0

PostProcess:
  name: CPPDLabelDecode

Metric:
  name: FschoolMetricEvaluation
  main_indicator: acc
  # Save prediction's log
  prediction_log: True # save JSON log
  # DOTS AND COMMAS config
  dots_and_commas: False
  max_of_difference_character: 2 # allow at most 2 wrong characters
  # FILL THE BLANK config
  fill_the_blank: True
  threshold_variance: 0.5 # the ratio of the prediction's length to the target's length must lie within [threshold_variance, 1]

Train:
  dataset:
    name: LMDBDataSet
    data_dir: /HDD/kedanhcaptraitim/Data/train/train_lmdb_v2.21.0
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CPPDLabelEncode: # Class handling label
          ignore_index: *ignore_index
      - SVTRRecResizeImg:
          image_shape: [3, 32, 768]
          padding: False
      - KeepKeys:
          keep_keys: ['image', 'label', 'label_node', 'length'] # dataloader will return list in this order
  loader:
    shuffle: True
    batch_size_per_card: 8
    drop_last: True
    num_workers: 2
    use_shared_memory: True

Eval:
  dataset:
    name: SimpleDataSet
    data_dir: /HDD/kedanhcaptraitim/Data/test/private_test/FQA_v1_final_19.10
    label_file_list: [ "/HDD/kedanhcaptraitim/Data/test/private_test/FQA_v1_final_19.10/test.txt" ]
    transforms:
      - DecodeImage: # load image
          img_mode: BGR
          channel_first: False
      - CPPDLabelEncode: # Class handling label
          ignore_index: *ignore_index
      - SVTRRecResizeImg:
          image_shape: [3, 32, 768]
          padding: True
      - KeepKeys:
          keep_keys: ['image', 'label', 'label_node','length'] # dataloader will return list in this order
  loader:
    shuffle: False
    drop_last: False
    batch_size_per_card: 8
    num_workers: 2
    use_shared_memory: True
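For reference, the head has to stay consistent with the backbone. Below is a minimal sanity-check sketch of the values above; it assumes the usual SVTR downsampling (4x per side in patch embedding, plus a 2x height reduction at each of the two patch-merging stages), and the script itself is illustrative, not part of PaddleOCR:

```python
# Hypothetical consistency check for the config above (not PaddleOCR code).
# Assumption: SVTR patch embedding downsamples H and W by 4x, and each of
# the two 'Conv' patch-merging stages halves the height again, so the
# visual token count reaching the head is (H // 16) * (W // 4).
img_h, img_w = 32, 768                    # Backbone.img_size
embed_dim = [192, 256, 512]               # Backbone.embed_dim
depth = [6, 6, 9]                         # Backbone.depth
mixer = ['Conv'] * 10 + ['Global'] * 11   # Backbone.mixer (21 entries)

vis_seq = (img_h // 16) * (img_w // 4)    # 2 * 192 = 384
assert vis_seq == 384, "Head.vis_seq must match the visual token count"
assert len(mixer) == sum(depth), "mixer needs one entry per block"
assert 512 == embed_dim[-1], "Head.dim must equal the last embed_dim"
print("config is self-consistent; vis_seq =", vis_seq)
```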
@trantuankhoi
Author

Hi @Topdu, could you take a look here, please? Thanks in advance.

@Topdu
Collaborator

Topdu commented Nov 5, 2023

Please check that the following configs are right:

learning_rate: 0.000375 # if using small batchsize, lr should be demoted.

ignore_index: &ignore_index 100 # must be greater than the number of character classes

  - SVTRRecResizeImg:
      image_shape: [3, 32, 768]
      padding: False # should be True, to match the Eval config (see the sketch below)

local_mixer: [[7, 11], [7, 11], [7, 11]] # should be [[5, 5], [5, 5], [5, 5]] if using Conv mixer
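For illustration, here is what resize-with-padding does compared to plain stretching for a [3, 32, 768] input. This is a generic sketch, not the actual SVTRRecResizeImg code; if Train uses padding: False while Eval uses padding: True, the model sees differently scaled characters at eval time:

```python
import math

import cv2
import numpy as np

def resize_norm_img(img, image_shape=(3, 32, 768), padding=True):
    """Resize an HWC uint8 image to normalized CHW float32.
    Generic sketch, not the exact SVTRRecResizeImg implementation."""
    c, h, w = image_shape
    img_h, img_w = img.shape[:2]
    if padding:
        # Keep the aspect ratio, then zero-pad on the right to width w.
        resized_w = min(w, int(math.ceil(h * img_w / img_h)))
    else:
        # Stretch to the full target width (what the Train config above does).
        resized_w = w
    resized = cv2.resize(img, (resized_w, h)).astype("float32")
    resized = resized.transpose((2, 0, 1)) / 255.0
    resized = (resized - 0.5) / 0.5
    out = np.zeros((c, h, w), dtype="float32")
    out[:, :, :resized_w] = resized
    return out
```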

@trantuankhoi
Author

trantuankhoi commented Nov 5, 2023

Thanks for your reply and suggestions.

My batch size is 96, so I will set lr = 0.000046875 (based on the paper).
My language dictionary has 233 characters, so I will set ignore_index: &ignore_index 234 from now on. But I also used ignore_index 100 in my base config and it worked, although I don't know why.

The other configs look correct to me. Are there any notes about the head config that I might have missed?
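To double-check my understanding of the ignore_index rule, a small sketch (the padding behaviour here is my assumption about CPPDLabelEncode, not verified against the code):

```python
# Why ignore_index must exceed the number of character classes (sketch).
# Padded label positions carry ignore_index and are skipped by the loss;
# if ignore_index collides with a real class id, genuine characters of
# that class would be silently ignored too.
num_chars = 233                    # entries in vietnamese_dict.txt
ignore_index = num_chars + 1       # 234, outside the valid class id range
max_text_length = 128

label = [5, 17, 42]                # hypothetical encoded text
padded = label + [ignore_index] * (max_text_length - len(label))
assert all(c != ignore_index for c in label)
```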

@Topdu
Collaborator

Topdu commented Nov 5, 2023

The head config is correct. The lr may be too small; from experience it should be set to no less than 0.0001.

local_mixer: [[7, 11], [7, 11], [7, 11]] # should be [[5, 5], [5, 5], [5, 5]] if using Conv mixer
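Putting the lr advice together with the linear scaling rule, a sketch (scaling against the config's 4 GPUs x 256 batch-size reference is an assumption, not something verified for CPPD):

```python
# Linear LR scaling plus the empirical floor from this thread (sketch).
base_lr, base_batch = 0.000375, 4 * 256   # reference from the config comment
my_batch = 96
lr = base_lr * my_batch / base_batch      # ~3.5e-05 after linear scaling
lr = max(lr, 1e-4)                        # empirical lower bound
print(f"lr to try: {lr:.1e}")             # 1.0e-04
```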

@trantuankhoi
Author

Thanks a lot. I will experiment once more following your suggestions.

@trantuankhoi
Author

trantuankhoi commented May 21, 2024

Hi @Topdu

I'm currently training the SVTR_CPPD (base) model and noticed that the train/loss_edge metric is significantly higher than the train/loss_node metric. I'm not sure if this is normal behavior, and I was curious to see if you observed this phenomenon in your experiments as well.

[screenshot: training curves showing train/loss_edge above train/loss_node]

@Topdu
Collaborator

Topdu commented Jun 1, 2024

Yes! We observed this phenomenon in our experiments as well.
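For intuition, a minimal sketch of how a node/edge loss pair like this is typically combined; the function below is an illustration only, not the actual CPPDLoss implementation, and the tensor shapes are assumed:

```python
import paddle
import paddle.nn.functional as F

def cppd_style_loss(node_logits, node_tgt, edge_logits, edge_tgt,
                    sideloss_weight=1.0, ignore_index=234):
    # Sketch: a node (character-counting) term plus an edge (character
    # sequence) term. The edge term is a per-position cross entropy over
    # the whole sequence, so it often sits on a larger scale than the
    # node term -- consistent with loss_edge > loss_node in the curves.
    loss_node = F.cross_entropy(node_logits, node_tgt)
    loss_edge = F.cross_entropy(edge_logits, edge_tgt,
                                ignore_index=ignore_index)
    return sideloss_weight * loss_node + loss_edge
```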

@PaddlePaddle PaddlePaddle locked and limited conversation to collaborators Jun 10, 2024
@SWHL SWHL converted this issue into discussion #12949 Jun 10, 2024

This issue was moved to a discussion.

You can continue the conversation there. Go to discussion →
