
Initial SDK #2

Merged
merged 27 commits on Jan 3, 2024
Changes from 23 commits
Commits
27 commits
6bb1b50
init commit
maaz-munir Nov 23, 2023
62e861a
some improvements + added bing and baidu
maaz-munir Nov 24, 2023
7ada257
Apply suggestions from code review
maaz-munir Nov 28, 2023
37300b9
added google_search + adjusted or multiple return types
maaz-munir Dec 1, 2023
6e66508
added google source and check for empty url
maaz-munir Dec 1, 2023
fb3c840
Apply suggestions from code review v2
maaz-munir Dec 7, 2023
e72bf48
added remaining google serp sources
maaz-munir Dec 7, 2023
2ffd8e4
comments + some more checks
maaz-munir Dec 8, 2023
f2f2749
check for async runtime models
maaz-munir Dec 11, 2023
14e3d24
Apply suggestions from code review v3 + yandex
maaz-munir Dec 12, 2023
f7dd301
bing and baidu async models + some improvements
maaz-munir Dec 12, 2023
905ed7c
2 google funcs + better error handling with channels
maaz-munir Dec 12, 2023
1420df5
rest of google sources for async polling model
maaz-munir Dec 12, 2023
d14237b
parse checks in google_async + some comment fixes
maaz-munir Dec 16, 2023
d3c3c64
proxy endpoint integration method
maaz-munir Dec 16, 2023
ace829f
send custom headers with proxy endpoint
maaz-munir Dec 16, 2023
0d54a22
make GeoLocation param a ptr
maaz-munir Dec 18, 2023
1a9a06b
refactor async functions
maaz-munir Dec 18, 2023
179164d
update creating payload in google_search funcs
maaz-munir Dec 19, 2023
ce1b8ce
update public func comments
maaz-munir Dec 19, 2023
8fda0b6
Apply suggestions from code review v4
maaz-munir Dec 20, 2023
608084f
update readme
maaz-munir Dec 20, 2023
c689f01
comment
maaz-munir Dec 20, 2023
9270fd9
comments + spelling fixes
maaz-munir Dec 20, 2023
8a4b9c8
update readme
maaz-munir Dec 20, 2023
fe7dda5
update readme
maaz-munir Dec 21, 2023
c250f36
fmt
maaz-munir Dec 21, 2023
Empty file added CHANGELOG.md
Empty file.
Empty file added CODE_OF_CONDUCT.md
Empty file.
279 changes: 278 additions & 1 deletion README.md
@@ -1 +1,278 @@
# oxylabs-sdk-go
# Oxylabs SDK Go

Welcome to the official SERP API SDK for [Oxylabs](https://oxylabs.io).

The Oxylabs SERP SDK simplifies interaction with the Oxylabs SERP API, giving developers a seamless way to retrieve search engine results page (SERP) data.

- [Features](#features)
- [Getting Started](#getting-started)
- [Requirements](#requirements)
- [Setting Up](#setting-up)
- [Quick Start](#quick-start)
- [General Information](#general-information)
- [Integration Methods](#integration-methods)
- [Sources](#sources)
- [Query Parameters](#query-parameters)
- [Configurable Options](#configurable-options)
- [Context Options for Google Sources](#context-options-for-google-sources)
- [Integration Methods](#integration-methods-1)
- [Realtime Integration](#realtime-integration)
- [Push-Pull (Polling) Integration](#push-pull)
- [Proxy Endpoint](#proxy-endpoint)

## Features

- **Simplified Interface:** Abstracts away complexities, offering a straightforward user interface for interacting with the Oxylabs SERP API.

- **Automated Request Management**: Streamlines the handling of API requests and responses for enhanced efficiency and reliability.

- **Error Handling:** Provides meaningful error messages and handles common API errors, simplifying troubleshooting.

- **Result Parsing:** Streamlines the process of extracting relevant data from SERP results, allowing developers to focus on application logic.

## Getting Started
You will need an Oxylabs API username and password, which you can get by signing up at https://oxylabs.io. A one-week free trial is available at https://oxylabs.io/products/scraper-api/serp.


### Requirements
```bash
go 1.21.0 or above
```

### Setting Up

Start a local Go project if you don't have one:

```bash
go mod init <your-module-path>
```

Install the package:

```bash
go get github.com/mslmio/oxylabs-sdk-go
```

### Quick Start
Basic usage of the SDK.

```go
package main

import (
	"fmt"

	"github.com/mslmio/oxylabs-sdk-go/serp"
)

func main() {
	// Set your Oxylabs API credentials.
	const username = "username"
	const password = "password"

	// Initialize the SERP realtime client with your credentials.
	c := serp.Init(username, password)

	// Use `google_search` as a source to scrape Google with "adidas" as the query.
	res, err := c.ScrapeGoogleSearch(
		"adidas",
	)
	if err != nil {
		panic(err)
	}

	fmt.Printf("Results: %+v\n", res)
}
```

## General Information

### Integration Methods
There are three integration methods for the Oxylabs SERP API:

- Realtime (Sync)
- Push-Pull (Async)
- Proxy Endpoint

To use any of them, call the corresponding init function:

- `serp.Init(username,password)`

- `serp.InitAsync(username,password)`

- `proxy.Init(username,password)`

Learn more about integration methods in [the official documentation](https://developers.oxylabs.io/scraper-apis/getting-started/integration-methods) and about how this SDK uses them [here](#integration-methods-1).

### Sources
The Oxylabs SERP API scrapes according to the source provided to the API. There are currently four search engines you can scrape with the Oxylabs SERP API, each with its own set of sources.


| Search Engine | Sources
| ------------- | --------------
| **Google** | `google`, `google_search`, `google_ads`, `google_hotels`, `google_travel_hotels`, `google_images`, `google_suggest`, `google_trends_explore`
| **Yandex** | `yandex`, `yandex_search`
| **Bing** | `bing`, `bing_search`
| **Baidu** | `baidu`, `baidu_search`


The SDK maps each source to a client function, so you just call the relevant function on the client. For example, to scrape Yandex with `yandex_search` as the source, invoke:

```go
res, err := c.ScrapeYandexSearch(
	"football",
)
```

### Query Parameters
Each source accepts different query parameters. For a detailed list of the parameters each source accepts, see https://developers.oxylabs.io/scraper-apis/serp-scraper-api.

This SDK lets you query with default parameters by omitting the second argument, as in the example above. Sending query parameters is as simple as:

```go
res, err := c.ScrapeYandexSearch(
	"football",
	&serp.YandexSearchOpts{
		StartPage: 1,
		Pages:     3,
		Limit:     4,
		Domain:    "com",
		Locale:    "en",
	},
)
```

### Configurable Options
For consistency and ease of use, this SDK provides pre-defined constants for commonly used parameter values.

Currently these are available for the `Render` and `UserAgent` parameters. For the full list, see `oxylabs/types.go`. You can also send these values as plain strings.

These can be used like this:

```go
res, err := c.ScrapeGoogleSearch(
	"adidas",
	&serp.GoogleSearchOpts{
		UserAgent: oxylabs.UA_DESKTOP_CHROME, // "desktop_chrome"
		Render:    oxylabs.HTML,              // "html"
		Domain:    oxylabs.DOMAIN_COM,        // "com"
	},
)
```

### Context Options for Google Sources

The SDK makes it easy to send context options relevant to Google sources.

Here is an example of how you could send context options for Google Search:

```go
res, err := c.ScrapeGoogleSearch(
	"adidas",
	&serp.GoogleSearchOpts{
		Parse: true,
		Context: []func(serp.ContextOption){
			serp.ResultsLanguage("en"),
			serp.Filter(1),
			serp.Tbm("isch"),
			serp.LimitPerPage([]serp.PageLimit{{Page: 1, Limit: 1}, {Page: 2, Limit: 6}}),
		},
	},
)
```
## Integration Methods

### Realtime Integration
Realtime is a synchronous integration method. This means that upon sending your job submission request, **you will have to keep the connection open** until we successfully finish your job or return an error.


The **TTL** of Realtime connections is **150 seconds**. In rare cases your connection may time out before you receive a response from us, for example if our system is under heavier-than-usual load or the job you submitted was extremely hard to complete.


### Push-Pull (Polling) Integration <a id="push-pull"></a>
Push-Pull is an asynchronous integration method. This SDK implements it by polling the endpoint for results at a set interval.

Using it is as straightforward as using the realtime integration. The only difference is that it returns a channel that will deliver the response. Below is an example of this integration method:


```go
package main

import (
	"fmt"

	"github.com/mslmio/oxylabs-sdk-go/oxylabs"
	"github.com/mslmio/oxylabs-sdk-go/serp"
)

func main() {
	const username = "username"
	const password = "password"

	// Initialize the SERP push-pull client with your credentials.
	c := serp.InitAsync(username, password)

	ch, err := c.ScrapeGoogleAds(
		"adidas shoes",
		&serp.GoogleAdsOpts{
			UserAgent: oxylabs.UA_DESKTOP,
			Parse:     true,
		},
	)
	if err != nil {
		panic(err)
	}

	res := <-ch
	fmt.Printf("Results: %+v\n", res)
}
```

### Proxy Endpoint
This method is also synchronous (like Realtime), but instead of using our service via a RESTful interface, you **can use our endpoint like a proxy**. Use Proxy Endpoint if you've used proxies before and would just like to get unblocked content from us.

Since the parameters in this method are sent as headers, this integration method accepts only a few of them. You can find the accepted parameters at
https://developers.oxylabs.io/scraper-apis/getting-started/integration-methods/proxy-endpoint#accepted-parameters.

The proxy endpoint integration is very open-ended, allowing many different use cases. To cater to this, the user is provided a pre-configured `http.Client` to use as they see fit:

```go
package main

import (
	"fmt"
	"io"
	"net/http"

	"github.com/mslmio/oxylabs-sdk-go/oxylabs"
	"github.com/mslmio/oxylabs-sdk-go/proxy"
)

func main() {
	const username = "username"
	const password = "password"

	// Init returns an http client pre-configured with the proxy settings.
	c, _ := proxy.Init(username, password)

	request, _ := http.NewRequest(
		"GET",
		"https://www.example.com",
		nil,
	)

	// Add relevant headers.
	proxy.AddGeoLocationHeader(request, "Germany")
	proxy.AddUserAgentHeader(request, oxylabs.UA_DESKTOP)
	proxy.AddRenderHeader(request, "html")
	proxy.AddParseHeader(request, "google_search")

	request.SetBasicAuth(username, password)
	response, _ := c.Do(request)

	resp, _ := io.ReadAll(response.Body)
	fmt.Println(string(resp))
}
```
3 changes: 3 additions & 0 deletions go.mod
@@ -0,0 +1,3 @@
module github.com/mslmio/oxylabs-sdk-go

go 1.21.0
46 changes: 46 additions & 0 deletions oxylabs/common.go
@@ -0,0 +1,46 @@
package oxylabs

import (
	"fmt"
	"net/url"
	"strings"
	"time"
)

var (
	DefaultTimeout  = 50 * time.Second
	DefaultWaitTime = 2 * time.Second
)

func ValidateURL(
	inputURL string,
	host string,
) error {
	// Check if the url is empty.
	if inputURL == "" {
		return fmt.Errorf("url parameter is empty")
	}

	// Parse the URL.
	parsedURL, err := url.ParseRequestURI(inputURL)
	if err != nil {
		return fmt.Errorf("failed to parse URL: %v", err)
	}

	// Check if the scheme (protocol) is present and non-empty.
	if parsedURL.Scheme == "" {
		return fmt.Errorf("url is missing scheme")
	}

	// Check if the Host is present and non-empty.
	if parsedURL.Host == "" {
		return fmt.Errorf("url is missing a host")
	}

	// Check if the Host matches the expected domain/host.
	if !strings.Contains(parsedURL.Host, host) {
		return fmt.Errorf("url does not belong to %s", host)
	}

	return nil
}