Skip to content

Commit

Permalink
Merge pull request #55 from arthurprevot/packaging
Browse files Browse the repository at this point in the history
Packaging for pypi
  • Loading branch information
arthurprevot authored Nov 1, 2021
2 parents 19378a9 + 800ecbf commit c8bd23d
Show file tree
Hide file tree
Showing 3 changed files with 213 additions and 2 deletions.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# pyspark_aws_etl
# Yaetos

This is a framework to write ETLs on top of [spark](http://spark.apache.org/) (the python binding, pyspark) and deploy them to Amazon Web Services (AWS). It can run locally (using local datasets and running the process on your machine), or on AWS (using S3 datasets and running the process on an AWS cluster). The emphasis was on simplicity while giving access to the full power of spark for processing large datasets. All job input and output definitions are in a human readable yaml file.
This is a framework to write ETLs on top of [spark](http://spark.apache.org/) (the python binding, pyspark) and deploy them to Amazon Web Services (AWS). It can run locally (using local datasets and running the process on your machine), or on AWS (using S3 datasets and running the process on an AWS cluster). The emphasis was on simplicity while giving access to the full power of spark for processing large datasets. All job input and output definitions are in a human readable yaml file. It's name stands for "Yet Another ETL Tool on Spark".
- In the simplest cases, an ETL job can consist of an SQL file only. No need to know any programming for these.
- In more complex cases, an ETL job can consist of a python file, giving access to Spark dataframes, RDDs and any python library.

Expand Down
7 changes: 7 additions & 0 deletions setup.cfg
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
[metadata]
# This includes the license file(s) in the wheel.
# https://wheel.readthedocs.io/en/stable/user_guide.html#including-license-files-in-the-generated-wheel-file
license_files = LICENSE.md

[bdist_wheel]
universal=1
204 changes: 204 additions & 0 deletions setup.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,204 @@
"""A setuptools based setup module.
See:
https://pypi.org/project/yaetos
https://github.com/arthurprevot/yaetos
"""

# Always prefer setuptools over distutils
from setuptools import setup, find_packages
import pathlib

here = pathlib.Path(__file__).parent.resolve()

# Get the long description from the README file
long_description = (here / 'README.md').read_text(encoding='utf-8')

# Arguments marked as "Required" below must be included for upload to PyPI.
# Fields marked as "Optional" may be commented out.

setup(
# This is the name of your project. The first time you publish this
# package, this name will be registered for you. It will determine how
# users can install this project, e.g.:
#
# $ pip install sampleproject
#
# And where it will live on PyPI: https://pypi.org/project/sampleproject/
#
# There are some restrictions on what makes a valid project name
# specification here:
# https://packaging.python.org/specifications/core-metadata/#name
name='yaetos', # Required

# Versions should comply with PEP 440:
# https://www.python.org/dev/peps/pep-0440/
#
# For a discussion on single-sourcing the version across setup.py and the
# project code, see
# https://packaging.python.org/guides/single-sourcing-package-version/
version='0.9.0', # Required

# This is a one-line description or tagline of what your project does. This
# corresponds to the "Summary" metadata field:
# https://packaging.python.org/specifications/core-metadata/#summary
description='Data Pipelines with Spark on AWS', # Optional

# This is an optional longer description of your project that represents
# the body of text which users will see when they visit PyPI.
#
# Often, this is the same as your README, so you can just read it in from
# that file directly (as we have already done above)
#
# This field corresponds to the "Description" metadata field:
# https://packaging.python.org/specifications/core-metadata/#description-optional
long_description=long_description, # Optional

# Denotes that our long_description is in Markdown; valid values are
# text/plain, text/x-rst, and text/markdown
#
# Optional if long_description is written in reStructuredText (rst) but
# required for plain-text or Markdown; if unspecified, "applications should
# attempt to render [the long_description] as text/x-rst; charset=UTF-8 and
# fall back to text/plain if it is not valid rst" (see link below)
#
# This field corresponds to the "Description-Content-Type" metadata field:
# https://packaging.python.org/specifications/core-metadata/#description-content-type-optional
long_description_content_type='text/markdown', # Optional (see note above)

# This should be a valid link to your project's main homepage.
#
# This field corresponds to the "Home-Page" metadata field:
# https://packaging.python.org/specifications/core-metadata/#home-page-optional
url='https://github.com/arthurprevot/yaetos', # Optional

# This should be your name or the name of the organization which owns the
# project.
author='Arthur Prevot', # Optional

# This should be a valid email address corresponding to the author listed
# above.
author_email='prevota@gmail.com', # Optional

# Classifiers help users find your project by categorizing it.
#
# For a list of valid classifiers, see https://pypi.org/classifiers/
classifiers=[ # Optional
# How mature is this project? Common values are
# 3 - Alpha
# 4 - Beta
# 5 - Production/Stable
'Development Status :: 4 - Beta',

# Indicate who your project is intended for
'Intended Audience :: Developers',
'Topic :: Scientific/Engineering',

# Pick your license as you wish
'License :: OSI Approved :: Apache Software License',

# Specify the Python versions you support here. In particular, ensure
# that you indicate you support Python 3. These classifiers are *not*
# checked by 'pip install'. See instead 'python_requires' below.
'Programming Language :: Python :: 2.7',
'Programming Language :: Python :: 3',
'Programming Language :: Python :: 3.6',
'Programming Language :: Python :: 3.7',
'Programming Language :: Python :: 3.8',
'Programming Language :: Python :: 3.9',
"Programming Language :: Python :: 3.10",
# 'Programming Language :: Python :: 3 :: Only',
],

# This field adds keywords for your project which will appear on the
# project page. What does your project relate to?
#
# Note that this is a list of additional keywords, separated
# by commas, to be used to assist searching for the distribution in a
# larger catalog.
keywords='etl,data pipelines,spark,aws', # Optional

# When your source code is in a subdirectory under the project root, e.g.
# `src/`, it is necessary to specify the `package_dir` argument.
# package_dir={'': 'src'}, # Optional

# You can just specify package directories manually here if your project is
# simple. Or you can use find_packages().
#
# Alternatively, if you just want to distribute a single Python file, use
# the `py_modules` argument instead as follows, which will expect a file
# called `my_module.py` to exist:
#
# py_modules=["my_module"],
#
packages=find_packages(where='core'), # Required

# Specify which Python versions you support. In contrast to the
# 'Programming Language' classifiers above, 'pip install' will check this
# and refuse to install the project if the version does not match. See
# https://packaging.python.org/guides/distributing-packages-using-setuptools/#python-requires
python_requires='>=3.6, <4',

# This field lists other packages that your project depends on to run.
# Any package you put here will be installed by pip when your project is
# installed, so they must be valid existing projects.
#
# For an analysis of "install_requires" vs pip's requirements files see:
# https://packaging.python.org/discussions/install-requires-vs-requirements/
# install_requires=['peppercorn'], # Optional

# List additional groups of dependencies here (e.g. development
# dependencies). Users will be able to install these using the "extras"
# syntax, for example:
#
# $ pip install sampleproject[dev]
#
# Similar to `install_requires` above, these must be valid existing
# projects.
# extras_require={ # Optional
# 'dev': ['check-manifest'],
# 'test': ['coverage'],
# },

# If there are data files included in your packages that need to be
# installed, specify them here.
# package_data={ # Optional
# 'sample': ['package_data.dat'],
# },

# Although 'package_data' is the preferred approach, in some case you may
# need to place data files outside of your packages. See:
# http://docs.python.org/distutils/setupscript.html#installing-additional-files
#
# In this case, 'data_file' will be installed into '<sys.prefix>/my_data'
# data_files=[('my_data', ['data/data_file'])], # Optional

# To provide executable scripts, use entry points in preference to the
# "scripts" keyword. Entry points provide cross-platform support and allow
# `pip` to create the appropriate form of executable for the target
# platform.
#
# For example, the following would provide a command called `sample` which
# executes the function `main` from this package when invoked:
# entry_points={ # Optional
# 'console_scripts': [
# 'sample=sample:main',
# ],
# },

# List additional URLs that are relevant to your project as a dict.
#
# This field corresponds to the "Project-URL" metadata fields:
# https://packaging.python.org/specifications/core-metadata/#project-url-multiple-use
#
# Examples listed include a pattern for specifying where the package tracks
# issues, where the source is hosted, where to say thanks to the package
# maintainers, and where to support the project financially. The key is
# what's used to render the link text on PyPI.
project_urls={ # Optional
'Bug Reports': 'https://github.com/arthurprevot/yaetos/issues',
# 'Funding': 'https://donate.pypi.org',
# 'Say Thanks!': 'http://saythanks.io/to/example',
'Source': 'https://github.com/arthurprevot/yaetos/',
},
)

0 comments on commit c8bd23d

Please sign in to comment.