Fixes zotero/translators#3225 by adding Harvard Caselaw Access Project translator #3230
Some questions, some requests for changes
Harvard Caselaw Access Project.js
Outdated
"translatorID": "2a1cafb9-6f61-48d3-b621-c3265fde9eba",
"label": "Harvard Caselaw Access Project",
"creator": "Franklin Pezzuti Dyer",
"target": "^https://(.*\\.)*case\\.law",
Is this ever anything other than cite.case.law? If not, let's just use (cite\\.)? (also, don't use * where ? suffices).
It can also be just case.law, namely on the search pages. I can make this more specific by narrowing it down to (cite\\.)?case\\.law.
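A quick way to sanity-check the narrowed pattern is to run it against a few URLs. This is only a sketch: the sample URLs are illustrative and not taken from the translator's test cases, and the pattern is written as a JavaScript regex literal rather than the JSON-escaped string used in the translator metadata.

```javascript
// Narrowed target pattern from the discussion, as a regex literal.
const target = /^https:\/\/(cite\.)?case\.law/;

// Illustrative URLs (assumed for this example, not from the PR).
const samples = [
  "https://cite.case.law/example-reporter/1/",
  "https://case.law/search/",
  "https://api.case.law/v1/cases/",
];

// One boolean per sample: only the bare domain and the cite. subdomain match.
const matches = samples.map((url) => target.test(url));
console.log(matches); // [ true, true, false ]
```

Unlike the original `(.*\\.)*`, the optional group no longer accepts arbitrary subdomains, which is the narrowing the review asked for.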
Harvard Caselaw Access Project.js
Outdated
var rows = doc.querySelectorAll('div.result-title > div > a');
for (let row of rows) {
let href = row.href;
Z.debug(href);
Comment out all but the most essential debug statements. Most translators won't have any.
Harvard Caselaw Access Project.js
Outdated
"creators": [],
"dateDecided": "1947-02-07",
"court": "High Court of American Samoa",
"docketNumber": "No. 2-1944",
I'd be inclined to strip [Nn]o\\.?\s* from docketNumber, but it is typically included in citations, so I could go either way.
I have no preference, but I'll strip them for the sake of consistency with your new CourtListener translator and with this one.
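For reference, stripping the prefix could look roughly like the following. The function name `stripDocketPrefix` is hypothetical, and the start anchor and word boundary are assumptions added here so that only a leading "No." or "no" is removed; the pattern suggested in the review has neither.

```javascript
// Hypothetical helper: remove a leading "No." / "no" from a docket number.
// The ^ anchor and \b are assumptions, not part of the reviewer's pattern.
function stripDocketPrefix(docket) {
  return docket.replace(/^\s*[Nn]o\b\.?\s*/, "");
}

console.log(stripDocketPrefix("No. 2-1944")); // "2-1944"
console.log(stripDocketPrefix("2-1944"));     // "2-1944" (unchanged)
```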
Harvard Caselaw Access Project.js
Outdated
"items": [
{
"itemType": "case",
"caseName": "PASA of FAGATOGO, Plaintiff v. FAIISIOTA of FAGANEANEA, Defendant",
We'll probably have to leave this as is, but I'm not loving the all caps here. Can we get rid of them? The problem is that I don't see a way to do that without also lowercasing acronyms that are properly all caps.
One brute-force solution occurs to me:
- Split the title into words and detect which ones are all-caps.
- For each of those words, search the case text body for the first (case-insensitive) match.
- If an all-caps match is found, chances are that it's an acronym and should be left capitalized.
- Otherwise, capitalize the word like a name. (I'm betting on the likelihood that if it's an acronym, it will appear again within the case text.)
This should deal with acronyms properly, but it involves searching an arbitrary amount of the case's text. Matches can probably be found early when they're present at all, so it may suffice to just look at the first couple paragraphs. Overkill?
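The steps above can be sketched as follows. This is a minimal illustration, not the code that ended up in the translator: `fixTitleCaps` is a hypothetical name, words are limited to ASCII letters for simplicity, and the body text in the usage example is invented.

```javascript
// Hypothetical sketch of the heuristic: de-capitalize all-caps title words
// unless their first occurrence in the case text is also all caps.
function fixTitleCaps(title, bodyText) {
  return title.replace(/[A-Za-z]+/g, (word) => {
    // Only consider words of two or more letters that are entirely upper case.
    if (word.length < 2 || word !== word.toUpperCase()) return word;
    // First case-insensitive whole-word match in the body text, if any.
    const first = bodyText.match(new RegExp("\\b" + word + "\\b", "i"));
    // An all-caps first match suggests an acronym, so leave it capitalized.
    if (first && first[0] === word) return word;
    // Otherwise capitalize the word like a name.
    return word[0] + word.slice(1).toLowerCase();
  });
}

// Invented body text, for illustration only.
const body = "The NLRB issued an order against Jones.";
console.log(fixTitleCaps("NLRB v. JONES", body)); // "NLRB v. Jones"
```

One caveat of this approach: if the case text itself opens with the all-caps title, the first match will be all caps and the word will be treated as an acronym, so the searched text may need to skip such a header.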
That's clever -- and this is on triggered actions (i.e. within doWeb), so we don't need to go crazy chasing efficiency (and it doesn't make additional requests to the site). Let's try it if you don't mind?
Co-authored-by: Sebastian Karcher <karcher@u.northwestern.edu>
…capitalization inferred from context
Great, thanks!
Closes #3225.