Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

total is undefined in LoadingTask.onProgress #9103

Closed
VadimDez opened this issue Nov 4, 2017 · 17 comments
Closed

total is undefined in LoadingTask.onProgress #9103

VadimDez opened this issue Nov 4, 2017 · 17 comments
Labels

Comments

@VadimDez
Copy link

VadimDez commented Nov 4, 2017

Attach (recommended) or Link to PDF file here:
https://jsfiddle.net/p3ybwp7d/1/

Configuration:

  • Web browser and its version: Google Chrome - Version 62.0.3202.75
  • Operating system and its version: Mac OS 10.13
  • PDF.js version: 1.10.97
  • Is a browser extension: No

Steps to reproduce the problem:

Create loading task and add onProgress callback:

let loadingTask: any = PDFJS.getDocument(this.src);

loadingTask.onProgress = (progressData) => {
  // progressData won't contain "total", only "loaded"
};

What is the expected behaviour? (add screenshot)
In previous versions, onProgress did return both total and loaded.

What went wrong? (add screenshot)
total field is undefined in loadingTask.onProgress callback.

Link to a viewer (if hosted on a site other than mozilla.github.io/pdf.js or as Firefox/Chrome extension):
https://jsfiddle.net/p3ybwp7d/1/

@timvandermeij
Copy link
Contributor

timvandermeij commented Nov 4, 2017

Is this Chrome-specific? In Firefox I get the following output, which looks good:

PDF.js ProgressData
{"loaded":14480,"total":1016315}
{"loaded":836400,"total":1016315}
{"loaded":1016315,"total":1016315}

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Nov 4, 2017

It's related to fetch, since you can observe the same issue in Firefox too (with the dom.streams.enabled and javascript.options.streams prefs set in about:config).

@himanish-star
Copy link
Contributor

@timvandermeij , Can I try this issue ?

@Snuffleupagus
Copy link
Collaborator

Snuffleupagus commented Jan 15, 2018

Since this issue can be observed in two different browsers, are we sure that this isn't a problem with the Fetch standard[1] itself?
If it's a Fetch standard limitation, or perhaps a browser one, then I'm not sure if we'd be reasonably able to fix this in the PDF.js library.


[1] This looks like the relevant part of the specification: https://fetch.spec.whatwg.org/#terminology-headers

@himanish-star
Copy link
Contributor

himanish-star commented Jan 15, 2018

Basically while using the Firefox browser, total is defined in this manner

total: data.lengthComputable ? data.total : this._contentLength,

and while using Chrome browser, totalis defined in this way because it uses the fetch_stream
total: this._contentLength,

So basically this._contentLength is undefined in both cases but in the first case data.total contains the total count so it doesn't cause a problem. So we can try to do something similar in the second case too.


Shall I give this a try ?

@mukulmishra18
Copy link
Contributor

So basically this._contentLength is undefined in both cases but in the first case data.total contains the total count so it doesn't cause a problem.

In fetch_stream we are setting this._contentLength = source.length here and both source.length and data.total are logically same things. So I don't think doing the way you are thinking going to solve the problem.

@himanish-star
Copy link
Contributor

@mukulmishra18 , ok let me look into this.

@himanish-star
Copy link
Contributor

himanish-star commented Jan 16, 2018

@mukulmishra18 , could you explain when is the PDFFetchStream constructor called.

class PDFFetchStream {
I just want to see from where is the argument source passed. As source.length turns out to be undefined

@mukulmishra18
Copy link
Contributor

First we are setting PDFNetworkStream at

pdf.js/src/pdf.js

Lines 35 to 45 in 237bc2e

if (isNodeJS()) {
var PDFNodeStream = require('./display/node_stream.js').PDFNodeStream;
pdfjsDisplayAPI.setPDFNetworkStreamClass(PDFNodeStream);
} else if (typeof Response !== 'undefined' && 'body' in Response.prototype &&
typeof ReadableStream !== 'undefined') {
var PDFFetchStream = require('./display/fetch_stream.js').PDFFetchStream;
pdfjsDisplayAPI.setPDFNetworkStreamClass(PDFFetchStream);
} else {
var PDFNetworkStream = require('./display/network.js').PDFNetworkStream;
pdfjsDisplayAPI.setPDFNetworkStreamClass(PDFNetworkStream);
}

based on the environment and support for stream. So if streaming is supported by the browser then PDFNetworkStream is PDFFetchStream.

After this we are calling this constructor with all the provided params from:

networkStream = new PDFNetworkStream(params);

If you want to see what these params are, you can read here:

pdf.js/src/display/api.js

Lines 98 to 140 in 237bc2e

* Document initialization / loading parameters object.
*
* @typedef {Object} DocumentInitParameters
* @property {string} url - The URL of the PDF.
* @property {TypedArray|Array|string} data - Binary PDF data. Use typed arrays
* (Uint8Array) to improve the memory usage. If PDF data is BASE64-encoded,
* use atob() to convert it to a binary string first.
* @property {Object} httpHeaders - Basic authentication headers.
* @property {boolean} withCredentials - Indicates whether or not cross-site
* Access-Control requests should be made using credentials such as cookies
* or authorization headers. The default is false.
* @property {string} password - For decrypting password-protected PDFs.
* @property {TypedArray} initialData - A typed array with the first portion or
* all of the pdf data. Used by the extension since some data is already
* loaded before the switch to range requests.
* @property {number} length - The PDF file length. It's used for progress
* reports and range requests operations.
* @property {PDFDataRangeTransport} range
* @property {number} rangeChunkSize - Optional parameter to specify
* maximum number of bytes fetched per range request. The default value is
* 2^16 = 65536.
* @property {PDFWorker} worker - The worker that will be used for the loading
* and parsing of the PDF data.
* @property {string} docBaseUrl - (optional) The base URL of the document,
* used when attempting to recover valid absolute URLs for annotations, and
* outline items, that (incorrectly) only specify relative URLs.
* @property {string} nativeImageDecoderSupport - (optional) Strategy for
* decoding certain (simple) JPEG images in the browser. This is useful for
* environments without DOM image and canvas support, such as e.g. Node.js.
* Valid values are 'decode', 'display' or 'none'; where 'decode' is intended
* for browsers with full image/canvas support, 'display' for environments
* with limited image support through stubs (useful for SVG conversion),
* and 'none' where JPEG images will be decoded entirely by PDF.js.
* The default value is 'decode'.
* @property {Object} CMapReaderFactory - (optional) The factory that will be
* used when reading built-in CMap files. Providing a custom factory is useful
* for environments without `XMLHttpRequest` support, such as e.g. Node.js.
* The default value is {DOMCMapReaderFactory}.
* @property {boolean} stopAtErrors - (optional) Reject certain promises, e.g.
* `getOperatorList`, `getTextContent`, and `RenderTask`, when the associated
* PDF data cannot be successfully parsed, instead of attempting to recover
* whatever possible of the data. The default value is `false`.
*/

@himanish-star
Copy link
Contributor

@mukulmishra18 , Thanks for all this explanation and now I can understand things better.


In fetch_stream we are setting this._contentLength = source.length here and both source.length and data.total are logically same things.

But when I debugged this I found out that in both the cases i.e. be it fetchStream or not , the source.length is always undefined. I also found out that in the source object there exists no length parameter

total: data.lengthComputable ? data.total : this._contentLength,

In this line data.total contains the value of total size

onProgress: function NetworkManager_onProgress(xhrId, evt) {

In this line evt is actually the data object used above. So should I try doing something like this in the fetchStream case too.
Or
Can you suggest me something ? Because I can see that @Snuffleupagus was right on this as I think this is a fetch API limitation

@mukulmishra18
Copy link
Contributor

Can you suggest me something ?

I will suggest you to create a simple PDF.js app and try to run in Chrome and check if you are getting right headers(especially Content-Length) somewhere:

return response.headers.get(name);

I also think it may be a problem of fetch standard or browser as mentioned in #9103 (comment). If that is the case, we can't do a lot in PDF.js to fix this.

@himanish-star
Copy link
Contributor

I will suggest you to create a simple PDF.js app

I have been using the app all the time, mentioned in #9103 (comment)

check if you are getting right headers(especially Content-Length) somewhere:

That's what I'm trying to say that I can't find the content.length and it always comes out to be undefined. So even I now think that this is a short coming of the Fetch API

@Rob--W
Copy link
Member

Rob--W commented Feb 6, 2018

FYI, _contentLength is set at:

const getResponseHeader = (name) => {
return response.headers.get(name);
};
let { allowRangeRequests, suggestedLength, } =
validateRangeRequestCapabilities({
getResponseHeader,
isHttp: this._stream.isHttp,
rangeChunkSize: this._rangeChunkSize,
disableRange: this._disableRange,
});
this._contentLength = suggestedLength;

and the value originates from

let length = parseInt(getResponseHeader('Content-Length'), 10);
if (!Number.isInteger(length)) {
return returnValues;
}
returnValues.suggestedLength = length;

But this value is guarded behind range requests. Before the above snippet, the function returns early if Range requests are disabled:

function validateRangeRequestCapabilities({ getResponseHeader, isHttp,
rangeChunkSize, disableRange, }) {
assert(rangeChunkSize > 0, 'Range chunk size must be larger than zero');
let returnValues = {
allowRangeRequests: false,
suggestedLength: undefined,
};
if (disableRange || !isHttp) {
return returnValues;
}
if (getResponseHeader('Accept-Ranges') !== 'bytes') {
return returnValues;
}
let contentEncoding = getResponseHeader('Content-Encoding') || 'identity';
if (contentEncoding !== 'identity') {
return returnValues;
}
let length = parseInt(getResponseHeader('Content-Length'), 10);

I don't see an immediate reason for blocking that, so perhaps it makes sense to unconditionally use the value of the Content-Length header (unless Transfer-Encoding is specified but not starting with identity).

@GKich
Copy link

GKich commented Apr 5, 2018

Hi! My first comment in GitHub, sorry if I make a mistake.
I kind of found a solution:
The suggestedLength of returnValues is never really updated with the length calculated.

So I did:

returnValues.suggestedLength = length ; ,
before any "if" .

Also the Http header must match, so attention with Content-Length and Content-length (case sensitive)

@mwakerman
Copy link

mwakerman commented Apr 6, 2018

We just updated to 1.10.97 which broke our loading task. As a (hopefully temporary) workaround, we do a header-only fetch for the content-length header beforehand:

const total = await fetch(new Request(documentUrl, { method: 'HEAD', credentials: 'include' }))
    .then(res => parseInt(res.headers.get('content-length'), 10));
    
const loadingTask = window.PDFJS.getDocument({url: documentUrl, withCredentials: true});
loadingTask.onProgress = ({ loaded }) => {
    // do stuff with `loaded` and `total`
};

@Rob--W
Copy link
Member

Rob--W commented Apr 8, 2018

Adding good-beginner-bug label because I've already explained the issue and how it can be fixed in #9103 (comment)

@AliSubhan
Copy link

can anybody help me with the architecture of pdf.js . i'm working on the project.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

No branches or pull requests

9 participants