API: Log some general use metrics for APIs. #1104

Closed
kcondon opened this issue Nov 10, 2014 · 11 comments
Labels
Feature: API Type: Feature a feature request User Role: Superuser Has access to the superuser dashboard and cares about how the system is configured

Comments

@kcondon
Contributor

kcondon commented Nov 10, 2014

Since the usage profile for APIs may differ significantly from that of the UI, it may be useful to record some general metrics for API use: time of day, function/request, user, and IP address.

We could (should?) also consider adding Google Analytics calls when the API is accessed, as we do in the UI.

@kcondon kcondon added this to the In Review - Dataverse 4.0 milestone Nov 10, 2014
@scolapasta scolapasta modified the milestones: Beta 13 - Dataverse 4.0, In Review - Dataverse 4.0 Jan 23, 2015
@scolapasta scolapasta modified the milestones: Beta 13 - Dataverse 4.0, In Review - Dataverse 4.0 Feb 6, 2015
@eaquigley
Contributor

@pdurbin I reviewed the four links you have listed in the above comment. To clarify, the comments in the Trello card are the suggested implementation, yes? Is that only for the Data Deposit SWORD API or for all APIs?

@pdurbin
Member

pdurbin commented Feb 9, 2015

@eaquigley the Trello comments suggest tracking who deposited something based on who authenticated, which makes sense.

I would imagine we would want tracking not just for SWORD but any API.

It seems like @posixeleni has defined an initial use case/user story at https://lists.iq.harvard.edu/pipermail/pkp-dataverse-integration/2014-April/000059.html

@eaquigley eaquigley modified the milestones: In Review - Dataverse 4.0, Beta 13 - Dataverse 4.0, Post 4.0 Feb 9, 2015
@eaquigley
Contributor

@kcondon
Contributor Author

kcondon commented Feb 9, 2015

@pdurbin @eaquigley
My use case was to understand how much load is coming from API use. I don't believe API calls are covered by Google Analytics but we can just look at the access logs. Not sure if there is some better way or other use cases such as Eleni mentioned.

@posixeleni
Contributor

@kcondon @eaquigley @pdurbin On top of my use case, I think, like Kevin said, it's important to know how many people are uploading, updating (generally managing) datasets via an API vs. the GUI. This would help us prioritize new API features in the future.

@michbarsinai
Member

I'd steer clear of IP addresses and users - these have privacy implications, and I'm not sure we want to go there unless we have some good reason to.
Other than that - great idea. We can also expand a bit, and have the engine store statistics about the commands it executes.
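
As a rough illustration of the engine keeping statistics about the commands it executes without touching users or IP addresses, here is a minimal Python sketch. The `CommandStats` class, its methods, and the command names passed to it are hypothetical and only for illustration; they are not taken from the Dataverse code base. Only a count and a total duration per command name are kept.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass
class CommandStat:
    """Running totals for one command name (hypothetical structure)."""
    count: int = 0
    total_ms: float = 0.0

    @property
    def mean_ms(self) -> float:
        # Avoid dividing by zero when nothing has been recorded yet.
        return self.total_ms / self.count if self.count else 0.0


class CommandStats:
    """Aggregate per-command counters; no user or IP address is stored."""

    def __init__(self) -> None:
        self._stats = defaultdict(CommandStat)

    def record(self, command_name: str, elapsed_ms: float) -> None:
        stat = self._stats[command_name]
        stat.count += 1
        stat.total_ms += elapsed_ms

    def summary(self) -> Dict[str, Tuple[int, float]]:
        # Map each command name to (number of executions, mean duration in ms).
        return {name: (s.count, s.mean_ms) for name, s in sorted(self._stats.items())}


# Illustrative usage with made-up timings.
stats = CommandStats()
stats.record("CreateDatasetCommand", 120.0)
stats.record("CreateDatasetCommand", 80.0)
stats.record("PublishDatasetCommand", 300.0)
```

Because the aggregate never sees who issued a command, it sidesteps the privacy question, at the cost of not being able to answer per-user questions later.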

@scolapasta scolapasta modified the milestones: In Review - Long Term, In Review - Short Term May 8, 2015
@pdurbin
Member

pdurbin commented Dec 15, 2015

> I'd steer clear of IP addresses and users - these have privacy implications, and I'm not sure we want to go there unless we have some good reason to.

@michbarsinai #2729 (comment) has IP, username, country, city, etc.

@scolapasta scolapasta removed this from the Not Assigned to a Release milestone Jan 28, 2016
@pdurbin pdurbin removed the zTriaged label Jun 30, 2017
@pdurbin pdurbin added User Role: Superuser Has access to the superuser dashboard and cares about how the system is configured and removed zEffort 3: Large labels Jul 12, 2017
@pdurbin
Member

pdurbin commented Mar 5, 2018

> I don't believe API calls are covered by Google Analytics but we can just look at the access logs.

Right. Google Analytics depends on cookies, so downloads done by scripts hitting Dataverse APIs won't be included. Over in #4481 I'm starting to look into server-side solutions that could give us metrics for all HTTP traffic, regardless of whether it's made through the GUI, via a script, or whatever. As @kcondon indicates, every request is logged to a Glassfish access log (if it's enabled) or an Apache access log (if Apache is being used).
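
To make the access-log idea concrete, here is a minimal Python sketch that counts API requests per hour of day. It assumes log lines in Apache's Common Log Format (a Glassfish access log may be configured differently) and assumes API paths start with `/api/`, which is exposed as a parameter. The function name `api_requests_per_hour` and the sample lines are made up for illustration. The client address is parsed past but never stored.

```python
import re
from collections import Counter
from datetime import datetime

# Assumed layout: host ident user [time] "METHOD path PROTO" status size
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>[A-Z]+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)


def api_requests_per_hour(lines, api_prefix="/api/"):
    """Count requests whose path starts with api_prefix, keyed by hour of day."""
    per_hour = Counter()
    for line in lines:
        match = LOG_LINE.match(line)
        if match is None:
            continue  # skip lines that are not in the assumed format
        if not match.group("path").startswith(api_prefix):
            continue  # not an API request (for example, a GUI page)
        stamp = datetime.strptime(match.group("time"), "%d/%b/%Y:%H:%M:%S %z")
        per_hour[stamp.hour] += 1
    return per_hour


# Made-up sample lines using documentation-reserved IP addresses.
sample = [
    '192.0.2.1 - - [05/Mar/2018:14:02:11 -0500] "GET /api/datasets/1 HTTP/1.1" 200 512',
    '192.0.2.1 - - [05/Mar/2018:14:30:00 -0500] "POST /api/dataverses/root/datasets HTTP/1.1" 201 128',
    '192.0.2.2 - - [05/Mar/2018:15:01:00 -0500] "GET /dataset.xhtml?id=1 HTTP/1.1" 200 2048',
]
```

The hour used here is the one written in the log line itself, so it reflects the server's configured time zone.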

@djbrooke
Contributor

I'm going to close this, as we have Splunk Cloud at Harvard to generate reports/notifications about API usage and other groups could set up the log monitoring software of their choice.

@kcondon
Contributor Author

kcondon commented Jul 27, 2020

@djbrooke I'm fine with whatever we decide, but the intent was for IQSS Dataverse, and by extension other installations, to have a quick and easy dashboard for understanding site usage and load. APIs were and are an area where batch access, and therefore high load, is likely.


8 participants