Add support for Harvard case.law #3225

adam3smith · 2024-01-13T18:51:10Z

https://cite.case.law/

Requested: https://forums.zotero.org/discussion/110768/zotero-connector-and-case-law-courtlistener/p1

It looks like you can easily get to API-based JSON results, which should work well.

franklindyer · 2024-01-14T22:19:13Z

I was tinkering with this and noticed some really strange query selector behavior; maybe someone can point out what I'm doing wrong. I'm using this entry as my test page.

The first step was to extract the api.case.law URL corresponding to the given cite.case.law page. At first glance it might look like we can take the case ID (e.g. 9903854) directly from the URL, but this isn't reliably possible: see, for example, this page, where the cite page URL doesn't contain the ID used in the API page URL. So it will be necessary to extract the API page URL from an HTML element's href attribute.

When I use the following query selector in my scrape function, it yields no results:

attr(doc, "a[href*='api.case.law/v1/cases/']", 'href');

But when I open the page in another browser, the corresponding query selector does yield results:

document.querySelectorAll("a[href*='api.case.law/v1/cases/']")

A regex-based text search of the entire document's innerHTML (which is awful, but I needed a sanity check) also yields a match:

doc.body.innerHTML.match(/api\.case\.law\/v1\/cases\/([0-9]+)/)[0];

Any ideas what's going wrong here? Can anyone reproduce this? Maybe there is a silly mistake in my query selector. But I can't see why the same query selector that fails in scrape would succeed in another browser (I also have "defer": true, so I don't think it's a timing issue) even when the desired element is actually present in the raw text of the page's HTML.
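
For side-by-side comparison, the two lookups described above can be combined into one helper. This is only a minimal sketch: `findApiUrl` is a hypothetical name, not part of the Zotero translator framework, and it assumes `doc` exposes the standard DOM `querySelector` and `body.innerHTML`. The `https://` prefix and trailing slash added in the fallback are assumptions based on the URL form quoted in this thread.

```javascript
// Hypothetical helper (not a Zotero API): returns the api.case.law case URL
// found in `doc`, or null if neither lookup finds one.
function findApiUrl(doc) {
	// Preferred lookup: read the href from the matching anchor element.
	const link = doc.querySelector("a[href*='api.case.law/v1/cases/']");
	if (link && link.getAttribute('href')) {
		return link.getAttribute('href');
	}
	// Fallback: search the serialized markup with the same regex as above.
	// Assumption: the API URL uses https and ends with a trailing slash.
	const match = doc.body.innerHTML.match(/api\.case\.law\/v1\/cases\/([0-9]+)/);
	return match ? 'https://' + match[0] + '/' : null;
}
```

If the selector branch returns nothing while the fallback still finds a URL, that points at the selector engine rather than at page loading.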

adam3smith · 2024-01-15T01:54:43Z

How exactly are you testing this? I just loaded https://cite.case.law/am-samoa/2/3/ into the Scaffold browser, created a new translator with the Web Translator template, changed detect so it always detects as a case, and then put Z.debug(attr(doc, "a[href*='api.case.law/v1/cases/']", 'href')) into the scrape function and ran doWeb. That returned https://api.case.law/v1/cases/206939/ as expected.
It sounds like you're starting with the test cases (otherwise defer wouldn't matter)? I wouldn't recommend that. Test-driven development is not an effective way to develop Zotero translators. Figure out what works with the built-in detect and do buttons first, then handle page loading issues (which can come up with tests in general) later.
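
The debugging setup described above might look roughly like the following. This is a sketch rather than the actual template code: it assumes `Z` and `attr` are globals supplied by the Zotero translator sandbox, so it only runs inside Scaffold or with stand-ins for those two names. Returning the URL from `scrape` is an addition for illustration only.

```javascript
// Debugging sketch for Scaffold. `Z` and `attr` are assumed to be provided
// by the Zotero translator sandbox; they are not defined in this snippet.

function detectWeb(doc, url) {
	// Temporarily detect every page as a case so that doWeb can always be run.
	return 'case';
}

function doWeb(doc, url) {
	scrape(doc, url);
}

function scrape(doc, url) {
	// Look up the API link on the page and print it to the Scaffold debug pane.
	const apiUrl = attr(doc, "a[href*='api.case.law/v1/cases/']", 'href');
	Z.debug(apiUrl);
	// Returned for illustration only; a finished translator would build an item.
	return apiUrl;
}
```

Once the selector prints the expected URL through the detect and do buttons, the always-true `detectWeb` can be replaced with real detection logic.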

franklindyer · 2024-01-15T18:01:06Z

Okay, thanks for the tip. I've tried again using just doWeb and added the exact same line of code Z.debug(attr(doc, "a[href*='api.case.law/v1/cases/']", 'href')) to my scrape function. It still gives no hits, despite the fact that the body plainly contains the anchor tag I'm looking for, and the regex match continues to find it successfully.

So, at the very least, we can say this isn't a page loading issue now... but I'm not sure what local misconfiguration could be causing this only on my end. (I'm not behind on any Zotero updates.)

adam3smith · 2024-01-15T18:04:00Z

This is in Scaffold?

franklindyer · 2024-01-15T18:04:32Z

That's correct.

adam3smith · 2024-01-15T20:15:45Z

Hmm, at this point it's hard to say anything without seeing more code. Could you put this either into a draft PR or on a gist?

franklindyer · 2024-01-15T21:04:53Z

Here's a gist containing the code I currently have for the translator, and I'm testing on the /am-samoa/2/3/ test case using the Run do* button as per your suggestion. The output is

14:02:48 Running doWeb
14:02:48 
14:02:48 Translation successful

adam3smith · 2024-01-15T21:18:11Z

Works for me. I'm in Zotero 7, but I can't really see how that matters.
Are you sure you have the page open in the browser in Scaffold? You should also be getting a monster return from the doc.body debug call you have in there.

franklindyer · 2024-01-15T21:42:24Z

Oops, I commented out the doc.body dump locally but forgot to do so in the gist. In any case, yes, with that line uncommented I get a huge return for the body and nothing for the href.

Turns out I was using Zotero 6, and I just installed the Zotero 7 beta as a last resort... and the code works for me there! Perplexing...

adam3smith · 2024-01-15T22:22:44Z

Cool. I'm guessing it might be the old Firefox version running underneath Zotero 6? I don't think I have seen this before. We obviously want to test with Z6, but I'm guessing it'll work outside of Scaffold.

adam3smith added the New Translator and Difficulty: Easy labels on Jan 13, 2024

This was referenced Jan 16, 2024

Fixes zotero/translators#3225 by adding Harvard Caselaw Access Project translator #3230

Merged

Fixes zotero/translators#3124 by adding translator for TinRead #3223

Merged

adam3smith closed this as completed in #3230 Feb 18, 2024

adam3smith pushed a commit that referenced this issue Feb 18, 2024

Adding Harvard Caselaw Access Project translator (#3230)

c07d2df

Fixes #3225
