improved Alpaca data process time #1123

mikazlopes · 2023-10-23T14:20:12Z

I made some changes to the Alpaca download_data, clean_data, add_technical_indicators, and add_vix, which reduce the processing time by 80%. To check that the output is unchanged, I ran the original and the modified data collection and cleaning processes side by side and compared the results with this code:


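        def compare_data(data1, data2):
            # DataFrame.equals requires the same shape, dtypes, and values;
            # NaNs in the same position count as equal
            return data1.equals(data2)


        # data and data_downloaded are the DataFrames produced by the two processes
        is_equal = compare_data(data, data_downloaded)
        print(f"The data is equal: {is_equal}")
        # Element-wise check; unlike equals(), == treats NaN as not equal to NaN
        print((data == data_downloaded).all().all())
        # Rows with at least one differing cell (empty when the frames match)
        different_rows = data[data != data_downloaded].dropna(how='all')
        print(different_rows)
        print(len(data) == len(data_downloaded))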

The comparison returned True. I also changed a line in data_processor.py so that it works with my trainer code at https://github.com/mikazlopes/training-farm, which can cache the cleaned data to avoid repeating the operation on the same dataset. I also avoided df.copy wherever possible to reduce RAM usage and processing time.
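For readers unfamiliar with the caching idea mentioned above, here is a minimal sketch of how cleaned data could be cached on disk. It is not taken from training-farm or from this PR; the helper names `cache_key` and `load_or_build`, the parameter names, and the pickle file layout are all assumptions made for illustration.

```python
import hashlib
import json
from pathlib import Path

import pandas as pd


def cache_key(tickers, start_date, end_date, time_interval):
    """Hypothetical helper: derive a stable key from the dataset parameters."""
    payload = json.dumps(
        {
            "tickers": sorted(tickers),  # ticker order should not change the key
            "start": start_date,
            "end": end_date,
            "interval": time_interval,
        },
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


def load_or_build(cache_dir, key, build_fn):
    """Hypothetical helper: return the cached DataFrame, or build and save it."""
    path = Path(cache_dir) / f"{key}.pkl"
    if path.exists():
        # Cache hit: skip the download/clean/indicator steps entirely
        return pd.read_pickle(path)
    df = build_fn()  # e.g. a function that downloads and cleans the data
    path.parent.mkdir(parents=True, exist_ok=True)
    df.to_pickle(path)
    return df
```

With this shape, the expensive download and cleaning steps would run only on the first call for a given set of tickers, dates, and interval; later runs with the same parameters would read the pickle instead. Pickle files should only be loaded from a location you trust.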

zhumingpassional · 2023-10-25T03:57:39Z

Good code.

Thanks for your work.

mikazlopes and others added 2 commits October 23, 2023 10:12

improved Alpaca data process time

f84126c

[pre-commit.ci] auto fixes from pre-commit.com hooks

273c8f2

for more information, see https://pre-commit.ci

zhumingpassional merged commit 75f1b87 into AI4Finance-Foundation:master Oct 25, 2023
