Features in SolrWayback
This wiki site contains an overview of features available in SolrWayback. The following features are described below:
- Text Search
- Link Graphs
- Wordclouds
- N-grams
- Search result visualisation by domain
- Image Search
- Image Geo Search
- Search by upload
- Export
- Alternative playback
- Memento API
SolrWayback offers many ways to discover content. One of these is free-text search across all resources (HTML pages, PDFs, metadata for different media types, URLs, etc.).
List of search results with facets.
The SolrWayback toolbox contains a number of tools. One of these is the interactive link graph tool, which visualises ingoing and outgoing links.
Interactive domain link graph
Another tool found in the toolbox is the wordcloud generator. This tool can generate wordclouds from text on single domains.
A wordcloud for the domain youtube.com
The toolbox also contains a tool for visualising search results as an n-gram graph.
n-gram visualization of results by year, relative to the number of results that year.
Another way to visualise search results is by domain over time. SolrWayback also has a feature to analyse and visualise statistics at domain level. These statistics include the size of the domain and the number of ingoing and outgoing links.
Visualization of results by domain over time.
Ticking a single checkbox restricts the search to image results only and presents them in a layout similar to Google Image Search.
Image search, show only images as results.
Another image search capability is the image geo search, which searches images based on their GPS location.
Search images by GPS location, for images that contain EXIF location information.
Another way to search in SolrWayback is by uploading a file (e.g., image, PDF). By doing this you can check whether the file has been harvested and find the HTML pages that use the uploaded file.
SolrWayback can export search results in multiple ways.
- Search results can be exported to WARC files, which is done through a streaming download. This means that there is no limit to the size of the downloaded WARC file.
- Text from search results can also be exported to CSV, where the fields to export are customisable.
- Link graphs can be exported at large scale in Gephi format. (See https://labs.statsbiblioteket.dk/linkgraph/)
In SolrWayback it is possible to configure an alternative playback engine. This makes it possible to use SolrWayback for search and discovery while another engine, such as OpenWayback or pywb, handles playback.
SolrWayback supports the Memento protocol. Mementos of a given URL can be found at timegates like this: /solrwayback/services/memento/{date}/url. The date can be left out to retrieve the newest memento in the archive. Dates can be specified as wayback dates in the following format: 20170101120000. Shorter dates with only year and month are also accepted, for example: 201712.
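The timegate URL pattern described above can be sketched as a small helper. This is only an illustration: the function name, the host http://localhost:8080, and the date validation are assumptions, and the sketch assumes that leaving out the date simply drops that path segment.

```python
from typing import Optional

# Path taken from the timegate pattern /solrwayback/services/memento/{date}/url
MEMENTO_PATH = "/solrwayback/services/memento"


def timegate_url(base: str, target: str, date: Optional[str] = None) -> str:
    """Build a timegate URL for `target` (hypothetical helper).

    `date` is a wayback-style date such as 20170101120000 or 201712.
    When it is None, the date segment is omitted (assumed behaviour)
    to ask for the newest memento.
    """
    parts = [base.rstrip("/") + MEMENTO_PATH]
    if date is not None:
        # Assumption: dates are ASCII digits, at most 14 characters long.
        if not (date.isascii() and date.isdigit() and len(date) <= 14):
            raise ValueError(f"not a wayback-style date: {date!r}")
        parts.append(date)
    parts.append(target)
    return "/".join(parts)


if __name__ == "__main__":
    # The host below is a placeholder, not a real SolrWayback installation.
    print(timegate_url("http://localhost:8080", "http://example.com/", "201712"))
    print(timegate_url("http://localhost:8080", "http://example.com/"))
```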
The memento timemap API is also supported. The timemap API supports the following response formats:
- link
- json
- spec
Link is the original format specified by Memento. The result returned with type=json uses the JSON format used by PyWB instances and archive.org. This format is a JSON array of arrays, which looks a lot like the response from a CDX index API call. With the spec format, a JSON response following the format specified by Memento is returned.
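As a rough illustration of working with an array-of-arrays response, the sketch below turns such a payload into a list of records. It assumes that the first inner array is a header row naming the columns, as in CDX-style JSON output; this is not confirmed for SolrWayback, and the field names in the sample payload are hypothetical.

```python
import json
from typing import Dict, List


def rows_to_records(payload: str) -> List[Dict[str, str]]:
    """Convert a JSON array of arrays into a list of dicts.

    Assumption: the first inner array is a header row with column names.
    If the real response has no header row, this helper does not apply.
    """
    rows = json.loads(payload)
    if not rows:
        return []
    header, *data = rows
    # Pair each value with the column name at the same position.
    return [dict(zip(header, row)) for row in data]


if __name__ == "__main__":
    # Hypothetical sample payload, made up for this example.
    sample = json.dumps([
        ["timestamp", "original"],
        ["20170101120000", "http://example.com/"],
        ["20171201000000", "http://example.com/"],
    ])
    for record in rows_to_records(sample):
        print(record["timestamp"], record["original"])
```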
The timemap API can be queried at: /solrwayback/services/memento/timemap/{type}/url. The link and spec types support paging of the result. If the result is paged, the pages are available at the following URL: /solrwayback/services/memento/timemap/page/{type}/url
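The two timemap URL patterns can be sketched in the same way. The helper name and the host are assumptions; the sketch only builds the URL and does not guess how a specific page is selected, since that is not described here.

```python
# Path taken from /solrwayback/services/memento/timemap/{type}/url
TIMEMAP_PATH = "/solrwayback/services/memento/timemap"
ALL_TYPES = {"link", "json", "spec"}
PAGED_TYPES = {"link", "spec"}  # only these types support paging


def timemap_url(base: str, target: str, fmt: str = "link", paged: bool = False) -> str:
    """Build a timemap URL for `target` (hypothetical helper)."""
    if fmt not in ALL_TYPES:
        raise ValueError(f"unknown timemap type: {fmt!r}")
    if paged and fmt not in PAGED_TYPES:
        raise ValueError(f"type {fmt!r} does not support paging")
    # The paged variant inserts a /page segment before the type.
    prefix = TIMEMAP_PATH + ("/page" if paged else "")
    return f"{base.rstrip('/')}{prefix}/{fmt}/{target}"


if __name__ == "__main__":
    # The host below is a placeholder, not a real SolrWayback installation.
    print(timemap_url("http://localhost:8080", "http://example.com/", "json"))
    print(timemap_url("http://localhost:8080", "http://example.com/", "spec", paged=True))
```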