Fixes zotero/translators#3225 by adding Harvard Caselaw Access Project translator #3230

Merged
merged 13 commits into from
Feb 18, 2024

Conversation

franklindyer
Contributor

@franklindyer franklindyer commented Jan 15, 2024

Closes #3225.

Collaborator

@adam3smith adam3smith left a comment

Some questions, some requests for changes

"translatorID": "2a1cafb9-6f61-48d3-b621-c3265fde9eba",
"label": "Harvard Caselaw Access Project",
"creator": "Franklin Pezzuti Dyer",
"target": "^https://(.*\\.)*case\\.law",
Collaborator

Is this ever anything other than cite.case.law? If not, let's just use (cite\\.)? (Also, don't use * where ? suffices.)

Contributor Author

It can also be just case.law, namely on the search pages. I can make this more specific by narrowing it down to (cite\\.)?case\\.law.
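
A quick sketch of how the narrowed pattern would behave. The sample URLs and the `matchesTarget` helper are made up for illustration and are not taken from the translator; the pattern is the one proposed above, written as it looks after the JSON escaping is decoded.

```javascript
// The "target" pattern from the comment above, after JSON decoding:
// ^https://(cite\.)?case\.law
const target = new RegExp('^https://(cite\\.)?case\\.law');

// Hypothetical helper, only for this illustration.
function matchesTarget(url) {
  return target.test(url);
}

// Hypothetical sample URLs (assumptions, not real page addresses).
const samples = [
  'https://case.law/search/',       // bare domain, e.g. a search page
  'https://cite.case.law/some/1/',  // cite. subdomain, e.g. a case page
  'https://other.case.law/',        // any other subdomain is no longer matched
  'https://example.com/case.law',   // pattern is anchored at the start
];

for (const url of samples) {
  console.log(url, matchesTarget(url));
}
```

Since the pattern is only anchored at the start, it is a prefix match; anything after `case.law` is accepted.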

var rows = doc.querySelectorAll('div.result-title > div > a');
for (let row of rows) {
let href = row.href;
Z.debug(href);
Collaborator

comment out all but the most essential debug statements. Most translators won't have any.

"creators": [],
"dateDecided": "1947-02-07",
"court": "High Court of American Samoa",
"docketNumber": "No. 2-1944",
Collaborator

I'd be inclined to strip [Nn]o\\.?\s* from docketNumber, but it is typically included in citations, so I could go either way.

Contributor Author

I have no preference, but I'll strip them for the sake of consistency with your new CourtListener translator and with this one.
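
For reference, a minimal sketch of what the stripping could look like. `stripDocketPrefix` is a hypothetical name, not a function from the translator. The pattern is a slightly tightened variant of the one suggested above: anchoring it at the start and requiring a dot or whitespace after "No" are my assumptions, so that a value merely beginning with "No" (e.g. "Northern ...") is left alone.

```javascript
// Hypothetical helper: drop a leading "No." / "no " prefix from a docket number.
// Assumption: the prefix only ever appears at the start of the string.
function stripDocketPrefix(docketNumber) {
  return docketNumber.replace(/^\s*[Nn]o(?:\.\s*|\s+)/, '');
}

console.log(stripDocketPrefix('No. 2-1944')); // "2-1944"
console.log(stripDocketPrefix('2-1944'));     // unchanged: "2-1944"
```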

"items": [
{
"itemType": "case",
"caseName": "PASA of FAGATOGO, Plaintiff v. FAIISIOTA of FAGANEANEA, Defendant",
Collaborator

We'll probably have to leave this as is, but I'm not loving the all caps here. Can we get rid of them? The problem is that I don't see a way to do that without also lowercasing acronyms that are properly all caps.

Contributor Author

@franklindyer franklindyer Jan 16, 2024

One brute-force solution occurs to me:

  1. Split the title into words and detect which ones are all-caps.
  2. For each of those words, search the case text body for the first (case-insensitive) match.
  3. If an all-caps match is found, chances are that it's an acronym and should be left capitalized.
  4. Otherwise, capitalize the word like a name. (I'm betting on the likelihood that if it's an acronym, it will appear again within the case text.)

This should deal with acronyms properly, but it involves searching an arbitrary amount of the case's text. Matches can probably be found early when they're present at all, so it may suffice to just look at the first couple paragraphs. Overkill?
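
Roughly, the four steps could be sketched like this. `fixTitleCaps` and `escapeRegExp` are hypothetical names, and the sketch assumes the body text is passed in as a plain string that does not itself start with the all-caps title (otherwise every word would look like an acronym). Single-letter words are left untouched, which is an extra assumption on my part.

```javascript
// Escape regex metacharacters so a title word can be searched for literally.
function escapeRegExp(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical helper implementing the heuristic described above.
function fixTitleCaps(title, bodyText) {
  return title.split(' ').map((word) => {
    // Step 1: only consider words with 2+ letters and no lowercase letters.
    const letters = word.replace(/[^A-Za-z]/g, '');
    if (letters.length < 2 || word !== word.toUpperCase()) return word;

    // Step 2: find the first case-insensitive whole-word match in the body.
    const core = word.replace(/^[^A-Za-z]+|[^A-Za-z]+$/g, '');
    const match = new RegExp('\\b' + escapeRegExp(core) + '\\b', 'i').exec(bodyText);

    // Step 3: if that match is itself all caps, treat the word as an acronym.
    if (match && match[0] === match[0].toUpperCase()) return word;

    // Step 4: otherwise capitalize like a name (first letter up, rest down).
    return word.toLowerCase().replace(/[a-z]/, (c) => c.toUpperCase());
  }).join(' ');
}

// Made-up body text, for illustration only.
const body = 'Pasa of Fagatogo brought this action against Faiisiota.';
console.log(fixTitleCaps(
  'PASA of FAGATOGO, Plaintiff v. FAIISIOTA of FAGANEANEA, Defendant', body));
```

Words that never appear in the body (here FAGANEANEA) fall through to step 4 and get name-style capitalization, which matches the bet described above.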

Collaborator

@adam3smith adam3smith Jan 16, 2024

That's clever -- and this is on triggered actions (i.e. within doWeb) so we don't need to go crazy chasing efficiency (and it doesn't make additional requests to the site). Let's try it if you don't mind?

@adam3smith adam3smith merged commit c07d2df into zotero:master Feb 18, 2024
1 check failed
@adam3smith
Collaborator

Great, thanks!

Successfully merging this pull request may close these issues.

Add support for Harvard case.law