Nucleotide-specific fields are empty when importing AMRFinderPlus results based on nucleotide sequences #87

Closed
m3hdad opened this issue Jun 4, 2024 · 3 comments
Labels
bug Something isn't working

Comments


m3hdad commented Jun 4, 2024

I have the following results from AMRFinderPlus, produced by passing --name.

--name NAME
    Text to be added as the first column "name" to all rows of the report, for example it can be an assembly name

Clearly, "Protein identifier" is NA. However, when the data is processed by hAMRonization, NA values are also passed for the nucleotide-specific columns in nucleotide_field_mapping, such as Contig id, Start, Stop, etc.

I was wondering whether the if statement in this line should include a clause that accounts for whether the user passed --name, since index 0 might not always correspond to the "Protein identifier" column.

Name    Protein identifier  Contig id   Start   Stop    Strand  Gene symbol Sequence name   Scope   Element type    Element subtype Class   Subclass    Method  Target length   Reference sequence length   % Coverage of reference sequence    % Identity to reference sequence    Alignment length    Accession of closest sequence   Name of closest sequence    HMM id  HMM description
Thauera-sp_2A1  NA  NZ_SSXV01000004.1   210683  213541  +   clpK    heat shock survival AAA family ATPase ClpK  plus    STRESS  HEAT    NA  NA  BLASTX  953 949 100.00  94.02   953 ASF80763.1  heat shock survival AAA family ATPase ClpK  NA  NA
Thauera-sp_2A1  NA  NZ_SSXV01000008.1   103983  104330  -   merT    mercuric transport protein MerT plus    STRESS  METAL   MERCURY MERCURY BLASTX  116 116 100.00  93.97   116 AAA98222.1  mercuric transport protein MerT NA  NA
Thauera-sp_2A1  NA  NZ_SSXV01000143.1   103856  104203  -   merT    mercuric transport protein MerT plus    STRESS  METAL   MERCURY MERCURY BLASTX  116 116 100.00  93.97   116 AAA98222.1  mercuric transport protein MerT NA  NA

This is quite misleading when two copies of a resistance gene match the same reference, since the distinguishing information in the Contig id, Start, and Stop columns is dropped. See rows 2 and 3 in this example.
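To make the ambiguity concrete, here is a minimal sketch using only the two merT rows from the report above, reduced to a handful of columns. The helper `distinct_hits` is hypothetical and is not part of hAMRonization; it just counts how many distinct records remain for a given set of columns.

```python
# The two merT hits from the report above, reduced to the relevant columns.
hits = [
    {
        "Gene symbol": "merT",
        "Accession of closest sequence": "AAA98222.1",
        "Contig id": "NZ_SSXV01000008.1",
        "Start": "103983",
        "Stop": "104330",
    },
    {
        "Gene symbol": "merT",
        "Accession of closest sequence": "AAA98222.1",
        "Contig id": "NZ_SSXV01000143.1",
        "Start": "103856",
        "Stop": "104203",
    },
]


def distinct_hits(rows, columns):
    """Hypothetical helper: the set of distinct value tuples for `columns`."""
    return {tuple(row[column] for column in columns) for row in rows}


# Without coordinates, both hits collapse into a single record.
without_coords = distinct_hits(
    hits, ["Gene symbol", "Accession of closest sequence"]
)
# With coordinates, the two copies stay distinguishable.
with_coords = distinct_hits(
    hits,
    ["Gene symbol", "Accession of closest sequence", "Contig id", "Start", "Stop"],
)

print(len(without_coords), len(with_coords))
```

With the coordinates dropped, the two copies cannot be told apart; with them kept, both hits survive as separate records.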

m3hdad added the bug label Jun 4, 2024
Member

fmaguire commented Jun 5, 2024

Thanks for pointing this out. It looks like Name was not previously in the output, so I'll update this now.

Author

m3hdad commented Jun 5, 2024

Thanks @fmaguire!

AMRFinderPlus has a --name option that adds the sample name to the output as the first column.
You might want to consider another approach (maybe looking columns up by name?) that covers both behaviors, whether or not users pass that option.

I updated the bug report to describe the behavior.
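As a rough illustration of the column-name idea, the sketch below reads the report by header instead of by position. It is only an assumption about how such a check could look, not the actual hAMRonization code or the eventual fix. The names `read_rows` and `is_nucleotide_hit` are hypothetical, and the two sample reports are cut down to a few columns.

```python
import csv
import io

# Two cut-down reports: one produced with --name, one without.
WITH_NAME = (
    "Name\tProtein identifier\tContig id\tStart\tStop\n"
    "Thauera-sp_2A1\tNA\tNZ_SSXV01000004.1\t210683\t213541\n"
)
WITHOUT_NAME = (
    "Protein identifier\tContig id\tStart\tStop\n"
    "NA\tNZ_SSXV01000004.1\t210683\t213541\n"
)


def read_rows(report_text):
    """Parse a tab-separated report into one dict per row, keyed by header."""
    return list(csv.DictReader(io.StringIO(report_text), delimiter="\t"))


def is_nucleotide_hit(row):
    """Hypothetical check: treat a row as nucleotide-based when its
    "Protein identifier" column, looked up by name, is missing or NA."""
    return (row.get("Protein identifier") or "NA").strip() == "NA"


for report in (WITH_NAME, WITHOUT_NAME):
    for row in read_rows(report):
        # The first column differs between the two layouts,
        # but the name-based lookup gives the same answer for both.
        first_value = next(iter(row.values()))
        print(repr(first_value), is_nucleotide_hit(row))
```

In the --name layout the first column holds the sample name rather than "NA", so a check on index 0 would misclassify the row, while the lookup by column name treats both layouts the same way.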

@fmaguire
Member

Finally got around to fixing this issue... apologies for the delay @m3hdad
