feat(repo): update readme with extraction methods and how to use IDictionary overload

feat(repo): add trivial change to test actions

fix(repo): add missing awaits to tests
AJCJ1 authored and cjroebuck committed Jan 21, 2025
1 parent 4965ba8 commit 3363422
Showing 4 changed files with 165 additions and 26 deletions.
92 changes: 81 additions & 11 deletions README.md
@@ -15,6 +15,7 @@ Check out our [blog](https://urlbox.com/blog) for more insights on everything sc
#### Check out [OneMillionScreenshots](https://onemillionscreenshots.com/) - A site that uses Urlbox to show over 1 million of the web's homepages!
***


# Table Of Contents

<!-- TOC -->
@@ -27,6 +28,7 @@ Check out our [blog](https://urlbox.com/blog) for more insights on everything sc
* [Configuring Options](#configuring-options-)
* [Using the options builder](#using-the-options-builder)
* [Using the `new` keyword, setting during initialization](#using-the-new-keyword-setting-during-initialization)
* [What to do if an option isn't available in the builder](#what-to-do-if-an-option-isnt-available-in-the-builder)
* [Render Links - `GenerateRenderLink()`](#render-links---generaterenderlink)
* [Sync Requests - `Render()`](#sync-requests---render)
* [Async Requests - `RenderAsync()`](#async-requests---renderasync)
@@ -35,16 +37,23 @@ Check out our [blog](https://urlbox.com/blog) for more insights on everything sc
* [Handling Errors](#handling-errors)
* [Dependency Injection](#dependency-injection)
* [Utility Functions](#utility-functions)
* [`TakeScreenshot(options)`](#takescreenshotoptions)
* [`TakePdf(options)`](#takepdfoptions)
* [`TakeMp4(options)`](#takemp4options)
* [`TakeFullPage(options)`](#takefullpageoptions)
* [`TakeMobileScreenshot(options)`](#takemobilescreenshotoptions)
* [`TakeScreenshotWithMetadata(options)`](#takescreenshotwithmetadataoptions)
* [`ExtractMetadata(options)`](#extractmetadataoptions)
* [`ExtractMarkdown(options)`](#extractmarkdownoptions)
* [`ExtractHtml(options)`](#extracthtmloptions)
* [`ExtractMhtml(options)`](#extractmhtmloptions)
* [`DownloadAsBase64(options)`](#downloadasbase64options-)
* [`DownloadToFile(options, filePath)`](#downloadtofileoptions-filepath-)
* [`GeneratePNGUrl(options)`](#generatepngurloptions-)
* [`GenerateJPEGUrl(options)`](#generatejpegurloptions-)
* [`GeneratePDFUrl(options)`](#generatepdfurloptions-)
* [Popular Use Cases](#popular-use-cases)
* [Taking a Full Page Screenshot](#taking-a-full-page-screenshot)
* [Example MP4 (Full Page)](#example-mp4--full-page-)
* [Taking a Mobile view screenshot](#taking-a-mobile-view-screenshot)
* [Failing a request on 4XX-5XX](#failing-a-request-on-4xx-5xx)
* [Extracting Markdown/Metadata/HTML](#extracting-markdownmetadatahtml)
* [Generating a Screenshot Using a Selector](#generating-a-screenshot-using-a-selector)
@@ -67,9 +76,9 @@ Check out our [blog](https://urlbox.com/blog) for more insights on everything sc
* [`SyncUrlboxResponse`](#syncurlboxresponse)
* [`AsyncUrlboxResponse`](#asyncurlboxresponse)
* [`UrlboxException`](#urlboxexception)
* [`UrlboxMetadata`](#urlboxmetadata)
* [Available Enums](#available-enums)
* [Examples](#examples)
* [Example MP4 (Full Page)](#example-mp4--full-page-)
* [Example HTML](#example-html)
* [Example PDF](#example-pdf)
* [Example PDF Highlighting](#example-pdf-highlighting)
@@ -146,7 +155,7 @@ namespace MyNamespace

If you use the above with your own keys, it will give you back an object with a `renderUrl`. Making a GET request to that `renderUrl` will give you back a PNG like this:

![](Images/urlbox-png.png)
![](../Images/urlbox-png.png)

***

@@ -214,6 +223,28 @@ options.FullPage = true;

AsyncUrlboxResponse response = await urlbox.TakeScreenshot(options);
```

### What to do if an option isn't available in the builder

Our [latest](https://urlbox.com/docs/options#engine_version) engine is updated regularly, and new options are released with it to help you render screenshots.

If you can't find an option in the builder because the SDK isn't yet in sync with the latest changes, use the overloads of `Render` and `RenderAsync` that take an `IDictionary<string, object>` instead of a `UrlboxOptions` instance.

Here's an example:

```CS
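// Option names are passed as plain string keys, so new options work before the builder supports them.
IDictionary<string, object> options = new Dictionary<string, object>
{
    { "click_accept", true },
    { "url", "https://urlbox.com" },
    { "theOption", "YouCouldntFind" }
};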
SyncUrlboxResponse response = await urlbox.Render(options);

Console.WriteLine(response);
```
Please bear in mind that options passed this way won't have the benefit of pre-validation.
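
Because dictionary options skip pre-validation, it may help to run a small check of your own before calling `Render`. The sketch below is a hypothetical helper and not part of the Urlbox SDK: the class and method names are made up for illustration, and the one rule it enforces (a non-empty `url` or `html` entry) is an assumption about what a request needs.

```CS
using System.Collections.Generic;

// Hypothetical helper for illustration only; not part of the Urlbox SDK.
public static class DictionaryOptionsCheck
{
    // Returns the problems found; an empty list means the basic checks passed.
    public static List<string> FindProblems(IDictionary<string, object> options)
    {
        var problems = new List<string>();

        // Assumption: a render needs a non-empty "url" or "html" entry.
        if (!HasText(options, "url") && !HasText(options, "html"))
        {
            problems.Add("Either 'url' or 'html' must be a non-empty string.");
        }

        // A null value usually points to a mistake made while building the dictionary.
        foreach (KeyValuePair<string, object> pair in options)
        {
            if (pair.Value is null)
            {
                problems.Add($"Option '{pair.Key}' has a null value.");
            }
        }

        return problems;
    }

    // True when the key exists and holds a non-blank string.
    private static bool HasText(IDictionary<string, object> options, string key)
    {
        return options.TryGetValue(key, out var value)
            && value is string text
            && !string.IsNullOrWhiteSpace(text);
    }
}
```

Calling `DictionaryOptionsCheck.FindProblems(options)` before `urlbox.Render(options)` lets you log or throw on an obviously malformed dictionary without sending a request. It does not check option names or value types against the API.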

***

## Render Links - `GenerateRenderLink()`
@@ -405,30 +436,43 @@ app.Run();

To make capturing and rendering screenshots even simpler, we’ve created several methods for common scenarios. Use these methods to quickly generate specific types of screenshots or files based on your needs:

### `TakeScreenshot(options)`
Our simplest method to take a screenshot. It uses the Urlbox `/async` endpoint and polls until the render is ready, which reduces the time network requests stay open.

### `TakePdf(options)`
Convert any URL or HTML into a PDF.

### `TakeMp4(options)`
Turn any URL or HTML into an MP4 video. For a scrolling effect over the entire page, set `FullPage = true` to capture the full length of the content.

### `DownloadAsBase64(options)`
### `TakeScreenshotWithMetadata(options)`
Takes a screenshot of any URL or HTML, and also returns a [UrlboxMetadata](#urlboxmetadata) object with more information about the site.

### `ExtractMetadata(options)`
Takes a screenshot of any URL or HTML, but extracts only the metadata from the render. Useful when you only need the `UrlboxMetadata` object from the render.

### `ExtractMarkdown(options)`
Takes a screenshot of any URL or HTML, downloads it and gives back the extracted markdown file as a string.

### `ExtractHtml(options)`
Takes a screenshot of any URL or HTML, downloads it and gives back the extracted HTML file as a string.

### `ExtractMhtml(options)`
Takes a screenshot of any URL or HTML, downloads it and gives back the extracted MHTML file as a string.

### `DownloadAsBase64(options)`
Gets a render link, runs a GET to that link to render your screenshot, then downloads the screenshot file as a Base64 string.

### `DownloadToFile(options, filePath)`

Gets a render link, runs a GET to that link to render your screenshot, then downloads and stores the screenshot to the given filePath.

### `GeneratePNGUrl(options)`

Gets a render link for a screenshot in PNG format.

### `GenerateJPEGUrl(options)`

Gets a render link for a screenshot in JPEG format.

### `GeneratePDFUrl(options)`

Gets a render link for a screenshot in PDF format.

# Popular Use Cases
@@ -480,7 +524,7 @@ SyncUrlboxResponse response = await urlbox.Render(options);

Which should render you something like the below example:

![](../Examples/mobile.png)
![](/Examples/mobile.png)

## Failing a request on 4XX-5XX

@@ -562,7 +606,7 @@ SyncUrlboxResponse response = await urlbox.Render(options);

This will take the class selector ".octicon-mark-github", and return a screenshot that looks like this:

![](./Images/gh.png)
![](../Images/gh.png)

## Uploading to the cloud via an S3 bucket

@@ -841,6 +885,32 @@ Properties:
- **`Code`** - The error code for the request. See a list [here](https://urlbox.com/docs/api#error-codes).
- **`Errors`** - A more detailed list of errors that occurred in the request.

#### `UrlboxMetadata`

Properties:

- **`UrlRequested`** - The original URL requested for rendering.
- **`UrlResolved`** - The final resolved URL after any redirects.
- **`Url`** - The canonical URL of the rendered page.
- **`Author`** - The author of the content, if available.
- **`Date`** - The publication date of the content, if available.
- **`Description`** - The meta description of the page.
- **`Image`** - The primary image of the page, if available.
- **`Logo`** - The logo associated with the page or publisher.
- **`Publisher`** - The name of the publisher of the content.
- **`Title`** - The title of the page.
- **`OgTitle`** - The Open Graph title of the page.
- **`OgImages`** - A list of Open Graph images found on the page.
- **`OgDescription`** - The Open Graph description of the page.
- **`OgUrl`** - The Open Graph URL of the page.
- **`OgType`** - The Open Graph type of the page (e.g., article, website).
- **`OgSiteName`** - The Open Graph site name of the page.
- **`OgLocale`** - The locale specified by Open Graph metadata.
- **`Charset`** - The character encoding used by the page.
- **`TwitterCard`** - The Twitter card type for the page.
- **`TwitterSite`** - The Twitter site associated with the page.
- **`TwitterCreator`** - The Twitter creator associated with the page.

### Available Enums

A number of options accept only one of a fixed set of values. We have made enums for these, which can be accessed directly from the UrlboxOptions namespace:
8 changes: 4 additions & 4 deletions UrlboxSDK.MsTest/UrlboxTest.cs
@@ -1221,7 +1221,7 @@ public async Task ExtractMetadata_Throws()

UrlboxOptions options = new(url: "https://urlbox.com");

Assert.ThrowsExceptionAsync<System.Exception>(async () => await urlbox.ExtractMetadata(options));
await Assert.ThrowsExceptionAsync<System.Exception>(async () => await urlbox.ExtractMetadata(options));
}

[TestMethod]
@@ -1313,7 +1313,7 @@ public async Task ExtractMarkdown_result_null_throws()

UrlboxOptions options = new(url: "https://urlbox.com");

Assert.ThrowsExceptionAsync<System.Exception>(async () => await urlbox.ExtractMarkdown(options));
await Assert.ThrowsExceptionAsync<System.Exception>(async () => await urlbox.ExtractMarkdown(options));
}

[TestMethod]
@@ -1405,7 +1405,7 @@ public async Task ExtractHtml_throws()

UrlboxOptions options = new(url: "https://urlbox.com");

Assert.ThrowsExceptionAsync<System.Exception>(async () => await urlbox.ExtractHtml(options));
await Assert.ThrowsExceptionAsync<System.Exception>(async () => await urlbox.ExtractHtml(options));
}

[TestMethod]
@@ -1522,6 +1522,6 @@ public async Task ExtractMhtml_throws()

UrlboxOptions options = new(url: "https://urlbox.com");

Assert.ThrowsExceptionAsync<System.Exception>(async () => await urlbox.ExtractMhtml(options));
await Assert.ThrowsExceptionAsync<System.Exception>(async () => await urlbox.ExtractMhtml(options));
}
}
2 changes: 1 addition & 1 deletion UrlboxSDK/Options/Resource/UrlboxOptions.cs
@@ -1,7 +1,7 @@
// <auto-generated />
//
// To parse this JSON data, add NuGet 'System.Text.Json' then do:
//
// TEST - trivial change
// using UrlboxSDK.Options.Resource;
//
// var urlboxOptions = UrlboxOptions.FromJson(jsonString);