DM-43722 Implement Transformed EFD service #72
base: main
This may be planned, but you will need to clean up the commit history. Commit messages should be like "Add something," etc. (Assuming you are familiar with DM standards on this, and I realize this is still a draft.) You should also not be doing merges from main. This should be done with
Still working through this, but posting some initial thoughts.
# Create and populate the data directory
RUN mkdir -p /opt/lsst/software/stack/data
COPY --chown=lsst:lsst tmp/efd_transform/*.db /opt/lsst/software/stack/data/
I'm not seeing where this tmp directory comes from. Should this be removed for production?
Yes, it should be removed. It was the database used before we had PostgreSQL available.
Dockerfile.efdtransform
PGUSER="rubin" \
CONSDB_URL="sqlite:////opt/lsst/software/stack/data/test.db" \
TIMEDELTA="5" \
LOG_FILE="/opt/lsst/software/stack/data/transform.log" \
Writing to stdout or stderr is more Kubernetes-friendly, and it is less likely to fill the disk and crash the job. But if jobs only cover short time periods, this is less of an issue.
It currently writes logs to both stdout/stderr and a log file. The log file can be removed or made optional.
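A minimal sketch of making the log file optional, assuming the service uses Python's standard logging module and keeps the LOG_FILE environment variable from the Dockerfile: records always go to stderr, and a file handler is attached only when LOG_FILE is set to a non-empty value. The function name configure_logging and the default logger name are hypothetical.

```python
import logging
import os
import sys


def configure_logging(name: str = "transform_efd") -> logging.Logger:
    """Send logs to stderr, plus a file only when LOG_FILE is set (sketch)."""
    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)

    # Drop handlers from any earlier call so records are not emitted twice.
    for handler in list(logger.handlers):
        logger.removeHandler(handler)
        handler.close()

    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")

    # Container stderr is collected by Kubernetes, so this handler is always on.
    stream_handler = logging.StreamHandler(sys.stderr)
    stream_handler.setFormatter(formatter)
    logger.addHandler(stream_handler)

    # Optional file output; skipped when LOG_FILE is unset or empty.
    log_file = os.environ.get("LOG_FILE")
    if log_file:
        file_handler = logging.FileHandler(log_file)
        file_handler.setFormatter(formatter)
        logger.addHandler(file_handler)

    return logger
```

With this shape, leaving LOG_FILE unset in the Kubernetes deployment gives stderr-only logging, while local runs can still opt into a file.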
@@ -0,0 +1,39 @@
"""Provides a structured framework for processing and transforming data from the Engineering and Facilities Database (EFD).
The documentation here does not exactly conform to https://developer.lsst.io/python/numpydoc.html, though it is close. A later ticket can be used to clean this up.
Sure. I'll double check it.
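For reference, here is a hypothetical helper whose docstring uses the numpydoc sections (Parameters, Returns, Raises) that the DM guide builds on. The function mean_value is illustrative only, and finer details such as how types are written in the Parameters section should be checked against the guide itself.

```python
def mean_value(values: list[float]) -> float:
    """Compute the arithmetic mean of a sequence of values.

    Parameters
    ----------
    values : `list` [`float`]
        Values to average. Must not be empty.

    Returns
    -------
    mean : `float`
        Arithmetic mean of ``values``.

    Raises
    ------
    ValueError
        Raised if ``values`` is empty.
    """
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)
```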
5047fa4 to a9bfb54
…method. Introduced mutable tables for better data modifications. Added logging for memory usage to track performance. Allowed handling of duplicated idle tasks. Developed a new query method for improved data retrieval.
… errors across the codebase. Renaming the con attribute to connexion improves readability. Additionally, flake8-reported issues were fixed to maintain code quality.
a9bfb54 to bad8fbd
Overview
This pull request implements a structured framework for processing and transforming data from the Engineering and Facilities Database (EFD). It enables data retrieval, transformation, schema generation, and integration within the LSST ecosystem.
This implementation supports data processing for instruments such as LATISS, LSSTComCam, and LSSTComCamSim.
Main components:
Configuration Framework
- Models (Field, Topic, Column, ConfigModel) for YAML validation
- YAML configuration (config_latiss.yaml)
Data Transformation Pipeline
- Transform class
- InfluxDbDao
- ExposureEfdDao/VisitEfdDao
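The division of labor among these components could look roughly like the following sketch, which uses in-memory stand-ins instead of the real InfluxDbDao and ExposureEfdDao. Everything in it is hypothetical: the method names select_time_series and upsert, the Exposure record, the half-open [start, end) window, and the choice of the mean as the reduction. It only illustrates the pattern of reading EFD samples for each exposure's time window and writing one value per exposure.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Exposure:
    exposure_id: int
    start: float  # window start; units assumed to be seconds
    end: float    # window end (exclusive)


class InMemorySource:
    """Hypothetical stand-in for an InfluxDB DAO returning (time, value) samples."""

    def __init__(self, samples: dict[str, list[tuple[float, float]]]) -> None:
        self._samples = samples

    def select_time_series(self, field: str, start: float, end: float) -> list[tuple[float, float]]:
        return [(t, v) for t, v in self._samples.get(field, []) if start <= t < end]


class InMemorySink:
    """Hypothetical stand-in for an exposure-level PostgreSQL DAO."""

    def __init__(self) -> None:
        self.rows: dict[int, dict[str, float]] = {}

    def upsert(self, exposure_id: int, column: str, value: float) -> None:
        self.rows.setdefault(exposure_id, {})[column] = value


def run_transform(source, sink, exposures, field: str, column: str) -> None:
    """Reduce each exposure's window of `field` to a single value in `column`."""
    for exp in exposures:
        samples = source.select_time_series(field, exp.start, exp.end)
        if samples:  # exposures with no EFD data in their window are skipped
            sink.upsert(exp.exposure_id, column, mean(v for _, v in samples))
```

Keeping the DAOs behind small interfaces like these is what lets the transformation logic be exercised without a live InfluxDB or PostgreSQL instance.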
Summary Statistics
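To illustrate the kind of statistical operations a module like summary.py might provide, here is a sketch that reduces one time-series window to a few summary values. The function name, the returned keys, the NaN handling, and the use of the sample standard deviation are assumptions, not the PR's actual implementation.

```python
import math
import statistics


def summarize(values: list[float]) -> dict[str, float]:
    """Reduce one time-series window to simple summary statistics (sketch)."""
    # Treat NaN samples as missing data rather than letting them poison the mean.
    clean = [v for v in values if not math.isnan(v)]
    if not clean:
        return {"mean": math.nan, "stddev": math.nan,
                "min": math.nan, "max": math.nan, "count": 0.0}
    return {
        "mean": statistics.fmean(clean),
        # Sample standard deviation is undefined for a single point.
        "stddev": statistics.stdev(clean) if len(clean) > 1 else math.nan,
        "min": min(clean),
        "max": max(clean),
        "count": float(len(clean)),
    }
```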
Schema Generation & Alembic Migrations
- generate_schema_from_config.py creates database schemas from configuration files
- Tables (exposure_efd, visit1_efd) and key-value tables (exposure_efd_unpivoted, visit1_efd_unpivoted)
- transformed_efd_scheduler table for task tracking
Task Management
- QueueManager handles task creation, retries, and status tracking
Code Structure
- config_model.py – Configuration validation models
- summary.py – Statistical operations on time-series data
- transform.py – Core transformation logic
- transform_efd.py – CLI entry point and workflow orchestration
- generate_schema_from_config.py – Schema generation
- dao/*.py – Database access layer (PostgreSQL/InfluxDB)
- queue_manager.py – Task queue management
Validation & Error Handling
Testing
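A rough sketch of the retry and status-tracking behavior described for QueueManager, kept in memory rather than in the transformed_efd_scheduler table. The Status values, the Task fields, and the retry policy are assumptions for illustration. Under this policy, a failed task returns to pending until it has used max_retries retries, and after that it is marked failed.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Optional


class Status(Enum):
    PENDING = "pending"
    RUNNING = "running"
    COMPLETED = "completed"
    FAILED = "failed"


@dataclass
class Task:
    task_id: int
    start: float  # time window covered by the task (assumed representation)
    end: float
    status: Status = Status.PENDING
    retries: int = 0


class InMemoryQueue:
    """Hypothetical stand-in for a QueueManager backed by a scheduler table."""

    def __init__(self, max_retries: int = 3) -> None:
        self.max_retries = max_retries
        self.tasks: list[Task] = []

    def create_task(self, start: float, end: float) -> Task:
        task = Task(task_id=len(self.tasks) + 1, start=start, end=end)
        self.tasks.append(task)
        return task

    def next_pending(self) -> Optional[Task]:
        # Oldest pending task first, in creation order.
        return next((t for t in self.tasks if t.status is Status.PENDING), None)

    def run(self, process: Callable[[Task], None]) -> None:
        while (task := self.next_pending()) is not None:
            task.status = Status.RUNNING
            try:
                process(task)
                task.status = Status.COMPLETED
            except Exception:
                # Count the failure; give up once the retry budget is spent.
                task.retries += 1
                if task.retries > self.max_retries:
                    task.status = Status.FAILED
                else:
                    task.status = Status.PENDING
```

With max_retries=2, a task that always fails is attempted three times (the initial try plus two retries) before it is marked failed.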