Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Output Not Escaped #15

Closed
kadler opened this issue Oct 19, 2018 · 32 comments
Closed

Output Not Escaped #15

kadler opened this issue Oct 19, 2018 · 32 comments
Labels
enhancement New feature or request

Comments

@kadler
Copy link
Member

kadler commented Oct 19, 2018

Original report by Danny Roessner (Bitbucket: droessner, GitHub: danny-hcs).


I have this test RPG program:

#!rpg
**free

dcl-pi *N;
  output char(10);
end-pi;

output = '&<>"''';

return;         

I use XMLSERVICE to call the program:

#!SQL

call QXMLSERV.iPLUG4K(
    'NA',
    '*here',
    '<pgm name="ESCAPETEST" lib="DROESSNER" error="on"><parm io="out"><data name="OUTPUT" type="10a"></data></parm></pgm>',
    'NULL'
);

The output is:

#!XML
<pgm name="ESCAPETEST" lib="DROESSNER" error="on">
  <parm io="out">
    <data name="OUTPUT" type="10a">&<>"'</data>
  </parm>
  <success><![CDATA[+++ success DROESSNER ESCAPETEST ]]></success>
</pgm>

None of the special XML characters are being escaped.

@kadler
Copy link
Member Author

kadler commented Oct 29, 2018

Original comment by Ji Mo (Bitbucket: JiMoJiMo, GitHub: Unknown).


CALL XMLSERVICE.iPLUG4K('*NA', '*here', '

<pgm name=''XMLTEST'' lib=''JIMOXML'' error=''fast''>
<parm io=''out''>
<data type=''10A''>


', ?)

Would that be possible to work? XMLService can understand the CDATA passed in. Then user can choose the raw output format. The output is:

**]]>**

@kadler
Copy link
Member Author

kadler commented Oct 29, 2018

Original comment by Danny Roessner (Bitbucket: droessner, GitHub: danny-hcs).


@jimojimo That does seem to work for the example provided. Is there a reason for making the developer specify when to wrap the raw data in the CDATA tags instead of just always escaping the data? I can't see a case where I would want to pass back the raw data since there is a chance it could create invalid XML. Shouldn't it either always escape the data or wrap character data with the CDATA tag by default?

@kadler
Copy link
Member Author

kadler commented Oct 29, 2018

Original comment by Brian Jerome (Bitbucket: bjerome, GitHub: brianmjerome).


@jimojimo
Adding the to every char output field adds a lot of unnecessary byte size to the output, especially inside . Is there another solution?

@kadler
Copy link
Member Author

kadler commented Oct 30, 2018

Original comment by Ji Mo (Bitbucket: JiMoJiMo, GitHub: Unknown).


Not sure if changing the raw output format may beak parsers of some clients without escape chars. I will turn to another option to see if this can be control by users with a flag.

@kadler
Copy link
Member Author

kadler commented Nov 2, 2018

Original comment by Brian Jerome (Bitbucket: bjerome, GitHub: brianmjerome).


@jimojimo Any progress on this?

@kadler
Copy link
Member Author

kadler commented Nov 5, 2018

Original comment by Ji Mo (Bitbucket: JiMoJiMo, GitHub: Unknown).


Sorry.. got busy last week. I am going to look at if we can leverage the ipc flag doCDdata to surround all values.

@kadler
Copy link
Member Author

kadler commented Nov 5, 2018

Original comment by Danny Roessner (Bitbucket: droessner, GitHub: danny-hcs).


@jimojimo I have to also say that surrounding all values with a CDATA tag is not the way to go here. That adds unnecessary bytes to the output. The right way to fix this is to actually escape the reserved characters in the output so that valid XML is always returned. I was very surprised that this wasn't already being done.

@kadler
Copy link
Member Author

kadler commented Nov 6, 2018

Original comment by Ji Mo (Bitbucket: JiMoJiMo, GitHub: Unknown).


-------------------------- cdata to turn global CDATA on ----------------------------------------------------------------
CALL XMLSERVICE.iPLUG4K('NA', 'here
cdata**', '  <pgm name=''XMLTEST'' lib=''JIMOXML'' error=''fast''>    <parm io=''out''>        <data type=''10A''>       ', ?)  

===== output ============>

  ]]> ****  

There is already a control flag to turn the CData flag on/off in the original design. Form the code, '*cdata' in the control parm to control global CData enabled; '<!CDATA[]>' specified in input xml to enable specified output fields.
Compared with scanning each output char and escape the performance should be better. Just a few more extra chars injected but leave controls to users without reserved char output. That might the initial intent that I tend to agree with. :-)

@kadler
Copy link
Member Author

kadler commented Nov 6, 2018

Original comment by Danny Roessner (Bitbucket: droessner, GitHub: danny-hcs).


@jimojimo Here's an example of why I disagree with this approach. We have a use case where we send grid data back to the client. This is a data structure array in the pgm. Let's say for example, we send back 500 records of 10 fields. That would add 5000 potentially unnecessary CDATA tags. At 12 bytes each, we are looking at 60KB of unnecessary data in the output. Is there a counter argument against just correctly escaping the xml characters?

@kadler
Copy link
Member Author

kadler commented Nov 7, 2018

Original comment by Ji Mo (Bitbucket: JiMoJiMo, GitHub: Unknown).


@danny-hcs
Just had a look at procedure xmlWrkData. There is no flag/code to do escape. That is a brand new function. :-)

@kadler
Copy link
Member Author

kadler commented Nov 7, 2018

Original comment by Jesse G (Bitbucket: ThePrez, GitHub: ThePrez).


Changing the "kind" of this issue to "enhancement" (which I think is technically correct since the existing function allows for a *cdata option for this specific reason, presumably).

I'm guessing the underlying request here would be to have xmlservice automatically escape & as &amp as per https://www.w3.org/TR/xml/#syntax
Please verify that to be correct.

I think that's feasible to pursue, but we must proceed with caution when changing the default behavior to ensure that existing toolkits (or other software that may be using a non-compliant XML parser) will not be broken.

@kadler
Copy link
Member Author

kadler commented Nov 7, 2018

Original comment by Danny Roessner (Bitbucket: droessner, GitHub: danny-hcs).


I agree it's probably best to put this on a flag to allow for backward compatibility.

I think the service should convert:

& to &amp;

< to &lt;

> to &gt;

' to &apos;

" to &quot;

@kadler
Copy link
Member Author

kadler commented Nov 8, 2018

Original comment by Ji Mo (Bitbucket: JiMoJiMo, GitHub: Unknown).


Agree with a new flag. I will look into the changes for reserved char escape. Looks like one more thing need to be mentioned that the output data length like '10A' may be changed to reflect the length of the escaped output, since escape injects more chars. Also seems if *cdata and escapse both specified, *cdata overrides escapse option.

@kadler
Copy link
Member Author

kadler commented Nov 8, 2018

Original comment by Ji Mo (Bitbucket: JiMoJiMo, GitHub: Unknown).


@ThePrez if user requests like <data type=''1A''>, the output char is '&'. Then xmlservice escape it to '&amp' with 4 chars back. In theory, it's still just 1 char to user.
Should xmlservice set the ouput to <data type=''4A''>? Not sure if users would parse xml output depending on the length. I am not sure which way to go.

@kadler
Copy link
Member Author

kadler commented Nov 8, 2018

Original comment by Brian Jerome (Bitbucket: bjerome, GitHub: brianmjerome).


I think any modern xml parser wouldn't care what the length was.

@kadler
Copy link
Member Author

kadler commented Nov 12, 2018

Original comment by Ji Mo (Bitbucket: JiMoJiMo, GitHub: Unknown).


OK. I will leave the length as is.. just escape the reserved chars. That's kind of reasonable that the length indicates the raw output.

@kadler
Copy link
Member Author

kadler commented Nov 14, 2018

Original comment by Ji Mo (Bitbucket: JiMoJiMo, GitHub: Unknown).


@danny-hcs @brianmjerome I just pushed the changes into repository. Would you have a test please?
A new IPC flag '*escp' is added, which turn escape on.

#!xml
CALL XMLSERVICE.iPLUG512K( '*NA', 
'*here*escp', 
'<?xml version=''1.0''?> 
<myscript>  
<pgm name=''XMLTEST'' lib=''JIMOXML'' error=''fast''>
<parm io=''out'' >
<data type=''5A''></data>
</parm>
</pgm> 
</myscript>', ?)   
Return Code = 0 Output Parameter #4 (CO) = 
<?xml version='1.0'?> 
<myscript>  
<pgm name='XMLTEST' lib='JIMOXML' error='fast'> 
<parm io='out' > 
<data type='5A'>&amp;&lt;&apos;&quot;&gt;</data> 
</parm> 
<success><![CDATA[+++ success JIMOXML XMLTEST ]]></success> 
</pgm> 
</myscript>

@kadler
Copy link
Member Author

kadler commented Nov 15, 2018

Original comment by Danny Roessner (Bitbucket: droessner, GitHub: danny-hcs).


@jimojimo I tested with several different RPG programs with various output and the output was always escaped correctly. I think this looks good. Thanks for the quick turn-around on this one!

@kadler
Copy link
Member Author

kadler commented Nov 16, 2018

Original comment by Ji Mo (Bitbucket: JiMoJiMo, GitHub: Unknown).


Thanks @danny-hcs

@kadler
Copy link
Member Author

kadler commented Nov 16, 2018

Original comment by Ji Mo (Bitbucket: JiMoJiMo, GitHub: Unknown).


Added a new IPC control flag *escp to turn xml escape on.

@kadler
Copy link
Member Author

kadler commented Dec 11, 2018

Original comment by Brian Jerome (Bitbucket: bjerome, GitHub: brianmjerome).


@jimojimo When escaping a large string this flag seems to add a huge chunk of time before getting a response. Like 1s without *escp vs 15s with *escp for only about 64KB passed to XMLSERVICE. Can you please look into this?

@kadler
Copy link
Member Author

kadler commented Dec 12, 2018

Original comment by Ji Mo (Bitbucket: JiMoJiMo, GitHub: Unknown).


@brianmjerome Can I have your test xml sample and output sample of your program called? As mentioned in early discussion, this could leads to performance downgrade, as for each char in the output, there is 5 times comparison for <'&">.

@kadler
Copy link
Member Author

kadler commented Dec 12, 2018

Original comment by Brian Jerome (Bitbucket: bjerome, GitHub: brianmjerome).


@jimojimo For each char in the output, does that include looking in the xml tags??

You can just take the example program and make OUTPUT io="both" and type="64000a", then input/output a 64000 character string.

@kadler
Copy link
Member Author

kadler commented Dec 13, 2018

Original comment by Ji Mo (Bitbucket: JiMoJiMo, GitHub: Unknown).


@brianmjerome Nope. Just scan the data of output for escape.. not scanning any tags. I will take a test to how it goes on my system.

@kadler
Copy link
Member Author

kadler commented Dec 14, 2018

Original comment by Ji Mo (Bitbucket: JiMoJiMo, GitHub: Unknown).


@brianmjerome Sorry I took back what I just commented. 1024 bytes increased about 300ms on my box. Looks like use *escp took 300 x 64 ms like your test.

@kadler
Copy link
Member Author

kadler commented Dec 14, 2018

Original comment by Ji Mo (Bitbucket: JiMoJiMo, GitHub: Unknown).


@brianmjerome @danny-hcs Please refresh your code to see if it works now. I made typo's in declaration but rpg is so tolerant... without errors prompted when assigning values to const variables.. and comparisons ( = ) on those took more time (don't know why.. interesting..) Went bunches of wrong ways to locate the performance issue.. but it should be the right spot.. Sorry!
Tried with 64k, 640k, 512k.. looks not much time increased now.

@kadler
Copy link
Member Author

kadler commented Dec 17, 2018

Original comment by Danny Roessner (Bitbucket: droessner, GitHub: danny-hcs).


@jimojimo I installed the latest code and then tested a string of ~60k characters that contained many escape characters and the speed with and without the *escp flag was almost the same. This looks good to me.

@kadler
Copy link
Member Author

kadler commented Dec 18, 2018

Original comment by Ji Mo (Bitbucket: JiMoJiMo, GitHub: Unknown).


@danny-hcs Thank you for the test!

@kadler
Copy link
Member Author

kadler commented Dec 31, 2018

Original comment by Brian Jerome (Bitbucket: bjerome, GitHub: brianmjerome).


@jimojimo Thank you for addressing this issue. I verified the speed as well and looks much better!

@kadler kadler closed this as completed Dec 31, 2018
@bjeromeHCS
Copy link

@kadler Was this change put into 2.0.1 or is there a newer version? We re-installed 2.0.1 from the .zip available on the youngiprofessionals.com but it seems the output is not escaping these characters again even with the *escp flag.

@kadler
Copy link
Member Author

kadler commented Apr 15, 2019

This change was made after 2.0.1, so it wouldn't be in there and we have yet to release a version since then. You can either download the zip from that release or the latest from the master branch.

@bjeromeHCS
Copy link

Okay thanks. We'll work on switching our scripts to use the git repo instead.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement New feature or request
Projects
None yet
Development

No branches or pull requests

2 participants