improved Alpaca data process time #1123

Merged

mikazlopes (Contributor) commented Oct 23, 2023

I changed the Alpaca download_data, clean_data, add_technical_indicators, and add_vix methods, which reduces the processing time by 80%. To check that the output is unchanged, I ran both data collection and cleaning processes side by side and compared the results using this code:


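        def compare_data(data1, data2):
            # DataFrame.equals treats NaNs in the same location as equal
            return data1.equals(data2)


        # data and data_downloaded hold the outputs of the two processes being compared
        is_equal = compare_data(data, data_downloaded)
        print(f"The data is equal: {is_equal}")
        # Check the row counts first; == below requires identically labeled frames
        print(len(data) == len(data_downloaded))
        # Element-wise check; unlike equals(), NaN == NaN is False here
        print((data == data_downloaded).all().all())
        # Show the rows of data that differ from data_downloaded
        different_rows = data[data != data_downloaded].dropna(how='all')
        print(different_rows)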

It returned True. I also changed a line in data_processor.py so that it works with my trainer code at https://github.com/mikazlopes/training-farm, which can cache the cleaned data to avoid repeating the download and cleaning every time the same dataset is used. I also avoided df.copy wherever possible, since each copy increases RAM usage and adds to the processing time.
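The caching idea can be sketched roughly as follows. This is a minimal illustration, not the actual training-farm implementation: the names cache_key, load_or_build, and fake_pipeline are hypothetical, as are the parameter names, and the pipeline here is a stand-in for the real download and cleaning steps. It assumes the dataset is fully determined by its parameters, so a hash of those parameters can serve as the cache file name.

```python
import hashlib
import json
import tempfile
from pathlib import Path

import pandas as pd


def cache_key(params: dict) -> str:
    """Stable short hash of the dataset parameters (hypothetical helper)."""
    payload = json.dumps(params, sort_keys=True, default=str)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def load_or_build(cache_dir, params, build_fn):
    """Return the cached DataFrame for params, building and storing it on a miss."""
    cache_dir = Path(cache_dir)
    cache_dir.mkdir(parents=True, exist_ok=True)
    path = cache_dir / f"clean_{cache_key(params)}.pkl"
    if path.exists():
        return pd.read_pickle(path)
    df = build_fn(**params)
    df.to_pickle(path)
    return df


# Hypothetical stand-in for the download + clean + indicators pipeline.
calls = {"n": 0}


def fake_pipeline(ticker_list, start_date, end_date, time_interval):
    calls["n"] += 1  # count how often the expensive step actually runs
    return pd.DataFrame({"tic": ticker_list, "close": [1.0] * len(ticker_list)})


params = {
    "ticker_list": ["AAPL", "MSFT"],
    "start_date": "2023-01-01",
    "end_date": "2023-06-30",
    "time_interval": "1Min",
}

with tempfile.TemporaryDirectory() as tmp:
    first = load_or_build(tmp, params, fake_pipeline)   # cache miss: runs the pipeline
    second = load_or_build(tmp, params, fake_pipeline)  # cache hit: reads the pickle

print(f"Pipeline runs: {calls['n']}")
print(f"Cached copy matches: {first.equals(second)}")
```

Any change to the parameters produces a different file name, so a stale cache is never returned for a different dataset. Pickle files should only be loaded from a trusted cache directory.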

zhumingpassional (Collaborator)

Good code.

Thanks for your work.

zhumingpassional merged commit 75f1b87 into AI4Finance-Foundation:master on Oct 25, 2023