
Commit 71db9c5

customization docs
1 parent a461443 commit 71db9c5


2 files changed (+122, -0 lines)


docs/source/customization.md

+114
@@ -0,0 +1,114 @@
# Customization

Papercast is designed to be extensible: you can change its behavior by writing your own pipeline components.

## Creating a Pipeline Component

The most straightforward customization is to create a new Pipeline Component. Pipeline Components are the building blocks of Papercast pipelines, and each one inherits from one of three base classes: `BaseProcessor`, `BaseSubscriber`, or `BasePublisher`.

### BaseProcessor

`BaseProcessor` is the base class for Pipeline Components that take a document or document identifier and return an updated document. It has two class attributes, `input_types` and `output_types`, which declare the component's expected input and output.

Subclasses of `BaseProcessor` must implement the abstract `process` method, which receives a `Production` instance and returns a `Production` instance. This method holds the logic that turns the input document into the updated document.

```python
@abstractmethod
def process(self, input: Production, *args, **kwargs) -> Production:
    ...
```

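The contract can be tried out without Papercast installed. The sketch below uses simplified stand-ins for `Production` and `BaseProcessor` (assumptions that mirror the description above, not Papercast's actual classes) and a hypothetical `UppercaseTitle` processor that reads a `title` attribute. Because `process` is abstract, a subclass that forgets to override it cannot be instantiated; in this stand-in, `ABC` enforces that with a `TypeError`.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class Production:
    """Hypothetical stand-in for Papercast's Production: attributes set from kwargs."""

    def __init__(self, **attributes: Any) -> None:
        self.__dict__.update(attributes)


class BaseProcessor(ABC):
    """Hypothetical stand-in that mirrors the documented BaseProcessor contract."""

    input_types: Dict[str, type] = {}
    output_types: Dict[str, type] = {}

    @abstractmethod
    def process(self, input: Production, *args, **kwargs) -> Production:
        ...


class UppercaseTitle(BaseProcessor):
    """Hypothetical processor: reads "title" and adds "upper_title"."""

    input_types = {"title": str}
    output_types = {"upper_title": str}

    def process(self, input: Production, *args, **kwargs) -> Production:
        # Attach the new attribute to the incoming document and return it,
        # so attributes set by earlier components are preserved.
        input.upper_title = input.title.upper()
        return input


doc = UppercaseTitle().process(Production(title="a study of widgets"))
print(doc.upper_title)  # A STUDY OF WIDGETS
```

This sketch updates the incoming `Production` in place rather than constructing a new one, which keeps attributes set by earlier components; check which style your pipeline expects.
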
### BaseSubscriber

`BaseSubscriber` is the base class for Pipeline Components that start document processing in response to external events, such as a webhook. Subclasses of `BaseSubscriber` must implement the abstract `subscribe` coroutine, which should listen for the external event and, per its signature, return a `Production` that triggers document processing.

```python
@abstractmethod
async def subscribe(self) -> Production:
    ...
```

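A minimal sketch of the subscriber pattern, assuming simplified stand-ins for `Production` and `BaseSubscriber` (not Papercast's actual classes). The hypothetical `QueueSubscriber` waits on an `asyncio.Queue`, which plays the role of the external event source; in a real component a webhook handler or similar would put events on the queue. The `document_id` field is also hypothetical.

```python
import asyncio
from abc import ABC, abstractmethod
from typing import Any


class Production:
    """Hypothetical stand-in for Papercast's Production: attributes set from kwargs."""

    def __init__(self, **attributes: Any) -> None:
        self.__dict__.update(attributes)


class BaseSubscriber(ABC):
    """Hypothetical stand-in that mirrors the documented BaseSubscriber contract."""

    @abstractmethod
    async def subscribe(self) -> Production:
        ...


class QueueSubscriber(BaseSubscriber):
    """Hypothetical subscriber that waits for events pushed onto an asyncio.Queue."""

    def __init__(self) -> None:
        self.events: asyncio.Queue = asyncio.Queue()

    async def subscribe(self) -> Production:
        # Suspend until an external event arrives, then wrap it in a Production.
        event = await self.events.get()
        return Production(document_id=event["document_id"])


async def main() -> None:
    # Create the subscriber inside the running event loop.
    subscriber = QueueSubscriber()
    # Simulate an incoming webhook payload.
    await subscriber.events.put({"document_id": "example-001"})
    production = await subscriber.subscribe()
    print(production.document_id)  # example-001


asyncio.run(main())
```

Awaiting `Queue.get()` suspends the coroutine without blocking the event loop, so other coroutines can keep running while the subscriber waits for an event.
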
### BasePublisher

`BasePublisher` is the base class for Pipeline Components that publish documents to an external system. It has one class attribute, `input_types`, which declares the component's expected input.

Subclasses of `BasePublisher` must implement the abstract `process` method, which receives a `Production` instance and publishes it to the external system. Unlike `BaseProcessor.process`, it returns `None`.

```python
@abstractmethod
def process(self, input: Production, *args, **kwargs) -> None:
    ...
```

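As a sketch of a publisher, the example below treats a local directory as the "external system". `Production` and `BasePublisher` are simplified stand-ins (assumptions, not Papercast's actual classes), and `DirectoryPublisher` with its `title` and `summary` inputs is hypothetical.

```python
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path
from typing import Any, Dict


class Production:
    """Hypothetical stand-in for Papercast's Production: attributes set from kwargs."""

    def __init__(self, **attributes: Any) -> None:
        self.__dict__.update(attributes)


class BasePublisher(ABC):
    """Hypothetical stand-in that mirrors the documented BasePublisher contract."""

    input_types: Dict[str, type] = {}

    @abstractmethod
    def process(self, input: Production, *args, **kwargs) -> None:
        ...


class DirectoryPublisher(BasePublisher):
    """Hypothetical publisher whose "external system" is a local directory."""

    input_types = {"title": str, "summary": str}

    def __init__(self, output_dir: str) -> None:
        self.output_dir = Path(output_dir)

    def process(self, input: Production, *args, **kwargs) -> None:
        self.output_dir.mkdir(parents=True, exist_ok=True)
        # Name the file after the title; a publisher returns nothing.
        path = self.output_dir / f"{input.title.replace(' ', '_')}.txt"
        path.write_text(input.summary, encoding="utf-8")


with tempfile.TemporaryDirectory() as tmp:
    DirectoryPublisher(tmp).process(Production(title="My Paper", summary="A short summary."))
    print(sorted(p.name for p in Path(tmp).iterdir()))  # ['My_Paper.txt']
```

Swapping the file write for an HTTP upload or a message-queue call changes only the body of `process`; the component's interface stays the same.
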
## Step 1: Choose a Base Class

**Processor:** Choose `BaseProcessor` if you want to process a document or document identifier and return an updated document.

**Subscriber:** Choose `BaseSubscriber` if you want to start document processing in response to external events, such as a webhook.

**Publisher:** Choose `BasePublisher` if you want to publish documents to an external system.

## Step 2: Create the Component

To create a new Pipeline Component, write a Python class that inherits from one of the base classes: `BaseProcessor`, `BaseSubscriber`, or `BasePublisher`.

For example, a new Processor is a class that inherits from `BaseProcessor`:

```python
# Assumes BaseProcessor and Production are already imported from Papercast.
class MyProcessor(BaseProcessor):
    input_types = {"my_input": str}
    output_types = {"my_output": str}

    def process(self, input: Production, *args, **kwargs) -> Production:
        """
        Processes a document by appending " processed" to the input string.

        Args:
            input (Production): The input document containing a "my_input" attribute.

        Returns:
            Production: The processed document containing a "my_output" attribute.
        """
        input_string = input.my_input
        processed_string = f"{input_string} processed"
        output = Production(my_output=processed_string)
        return output
```

In this example, `MyProcessor` inherits from `BaseProcessor`, declares its expected input and output with the `input_types` and `output_types` class attributes, and implements `process` to build and return the processed document.

Adapt the body of `process` to your use case. Once the component is defined, you can instantiate it and add it to a Pipeline to perform your document processing.

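Because a component is an ordinary Python class, you can exercise it on its own before adding it to a Pipeline. The sketch below re-creates the `MyProcessor` example against minimal stand-ins for `BaseProcessor` and `Production` (assumptions, not Papercast's actual classes) and checks it with a pytest-style test function; the test name is hypothetical.

```python
from abc import ABC, abstractmethod
from typing import Any, Dict


class Production:
    """Hypothetical stand-in for Papercast's Production: attributes set from kwargs."""

    def __init__(self, **attributes: Any) -> None:
        self.__dict__.update(attributes)


class BaseProcessor(ABC):
    """Hypothetical stand-in that mirrors the documented BaseProcessor contract."""

    input_types: Dict[str, type] = {}
    output_types: Dict[str, type] = {}

    @abstractmethod
    def process(self, input: Production, *args, **kwargs) -> Production:
        ...


class MyProcessor(BaseProcessor):
    # Same behavior as the MyProcessor example in this step.
    input_types = {"my_input": str}
    output_types = {"my_output": str}

    def process(self, input: Production, *args, **kwargs) -> Production:
        return Production(my_output=f"{input.my_input} processed")


def test_my_processor_appends_suffix() -> None:
    result = MyProcessor().process(Production(my_input="hello"))
    assert result.my_output == "hello processed"


if __name__ == "__main__":
    test_my_processor_appends_suffix()
    print("ok")
```

Saved in a file named `test_*.py`, the function is collected by pytest, whose default discovery picks up functions whose names start with `test`.
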
## Step 3: Define Input and Output Types

When defining a new Pipeline Component, you need to specify its expected input and output types. Use the `input_types` and `output_types` class attributes: dictionaries that map attribute names to their expected data types.

For example, a custom Processor that takes a string as input and returns an integer as output can be defined like this:

```python
# Assumes BaseProcessor and Production are already imported from Papercast.
class MyProcessor(BaseProcessor):
    input_types = {"input_string": str}
    output_types = {"output_int": int}

    def process(self, input: Production, *args, **kwargs) -> Production:
        """
        Processes a document by converting the input string to an integer.

        Args:
            input (Production): The input document containing an "input_string" attribute.

        Returns:
            Production: The processed document containing an "output_int" attribute.

        Raises:
            ValueError: If "input_string" is not a valid integer literal.
        """
        output_int = int(input.input_string)
        return Production(output_int=output_int)
```

Here, `input_types` specifies that the input document should have an attribute named `input_string` of type `str`, and `output_types` specifies that the processed document will have an attribute named `output_int` of type `int`.

Adjust the input and output types to match the attributes your component reads and writes.

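Because the types dictionaries are plain data, a document can be checked against them directly. The sketch below is a hypothetical helper, `type_errors`, written against a stand-in `Production`; it is not part of Papercast, and Papercast's own validation, if any, may behave differently.

```python
from typing import Any, Dict, List


class Production:
    """Hypothetical stand-in for Papercast's Production: attributes set from kwargs."""

    def __init__(self, **attributes: Any) -> None:
        self.__dict__.update(attributes)


def type_errors(production: Production, expected: Dict[str, type]) -> List[str]:
    """Hypothetical helper: list mismatches between a Production and a types dict."""
    errors = []
    for name, expected_type in expected.items():
        if not hasattr(production, name):
            errors.append(f"missing attribute {name!r}")
        elif not isinstance(getattr(production, name), expected_type):
            actual = type(getattr(production, name)).__name__
            errors.append(f"{name!r} is {actual}, expected {expected_type.__name__}")
    return errors


# The same dictionaries as the MyProcessor example in this step.
input_types = {"input_string": str}
output_types = {"output_int": int}

print(type_errors(Production(input_string="42"), input_types))  # []
print(type_errors(Production(input_string=42), input_types))
# ["'input_string' is int, expected str"]
print(type_errors(Production(), output_types))  # ["missing attribute 'output_int'"]
```

The helper uses `isinstance`, so subclasses pass (for example, a `bool` satisfies an `int` entry); a stricter check could compare `type(value) is expected_type` instead.
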
## Step 4: Share Your Pipeline Component (Optional)

To share your Pipeline Component with the Papercast community, follow the steps in the [contributing guide](./contributing.md).

docs/source/index.md

+8
@@ -34,6 +34,14 @@ Papercast is designed around 3 types of modules:

Customize the behavior at each of these steps by writing your own modules.

```{toctree}
:caption: Customization
:hidden:
./customization.md
```

```{toctree}
:caption: Examples
:hidden:
