
ARROW-467: [Python] Run Python parquet-cpp unit tests in Travis CI #311

Closed
wesm wants to merge 2 commits from the ARROW-467 branch

Conversation

@wesm (Member) commented Feb 2, 2017

This means we'll have to tolerate broken builds whenever APIs change (a good incentive to avoid changing them as much as possible)

@wesm (Member Author) commented Feb 2, 2017

I can reproduce the Linux failure locally... trying to figure out what's going on.

@wesm (Member Author) commented Feb 2, 2017

Yikes, well this is not great. Here we have

  • libarrow etc. built in Travis CI
  • libparquet from conda-forge, built with devtoolset-2
  • pyarrow built linking to both of the libraries above

This yields a segfault; valgrind reports:

==4159== Process terminating with default action of signal 11 (SIGSEGV)
==4159==  Bad permissions for mapped region at address 0x0
==4159==    at 0x0: ???
==4159==    by 0xDA6AFC4: parquet::AllocateBuffer(arrow::MemoryPool*, long) (???:539)
==4159==    by 0xDA6B9E8: parquet::InMemoryOutputStream::InMemoryOutputStream(arrow::MemoryPool*, long) (???:459)
==4159==    by 0xDA0297A: parquet::ColumnWriter::InitSinks() (???:59)
==4159==    by 0xDA0300C: parquet::ColumnWriter::ColumnWriter(parquet::ColumnChunkMetaDataBuilder*, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, long, bool, parquet::Encoding::type, parquet::WriterProperties const*) (???:55)
==4159==    by 0xDA1396E: parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2> >::TypedColumnWriter(parquet::ColumnChunkMetaDataBuilder*, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, long, parquet::Encoding::type, parquet::WriterProperties const*) (???:188)
==4159==    by 0xDA042B8: construct<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> >, parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, long int&, parquet::Encoding::type&, const parquet::WriterProperties*&> (???:120)
==4159==    by 0xDA042B8: _S_construct<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> >, parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, long int&, parquet::Encoding::type&, const parquet::WriterProperties*&> (???:254)
==4159==    by 0xDA042B8: construct<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> >, parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, long int&, parquet::Encoding::type&, const parquet::WriterProperties*&> (???:393)
==4159==    by 0xDA042B8: _Sp_counted_ptr_inplace<parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, long int&, parquet::Encoding::type&, const parquet::WriterProperties*&> (???:399)
==4159==    by 0xDA042B8: construct<std::_Sp_counted_ptr_inplace<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> >, std::allocator<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> > >, (__gnu_cxx::_Lock_policy)2u>, const std::allocator<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> > >, parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, long int&, parquet::Encoding::type&, const parquet::WriterProperties*&> (???:120)
==4159==    by 0xDA042B8: _S_construct<std::_Sp_counted_ptr_inplace<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> >, std::allocator<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> > >, (__gnu_cxx::_Lock_policy)2u>, const std::allocator<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> > >, parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, long int&, parquet::Encoding::type&, const parquet::WriterProperties*&> (???:254)
==4159==    by 0xDA042B8: construct<std::_Sp_counted_ptr_inplace<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> >, std::allocator<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> > >, (__gnu_cxx::_Lock_policy)2u>, const std::allocator<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> > >, parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, long int&, parquet::Encoding::type&, const parquet::WriterProperties*&> (???:393)
==4159==    by 0xDA042B8: __shared_count<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> >, std::allocator<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> > >, parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, long int&, parquet::Encoding::type&, const parquet::WriterProperties*&> (???:502)
==4159==    by 0xDA042B8: __shared_ptr<std::allocator<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> > >, parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, long int&, parquet::Encoding::type&, const parquet::WriterProperties*&> (???:957)
==4159==    by 0xDA042B8: shared_ptr<std::allocator<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> > >, parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, long int&, parquet::Encoding::type&, const parquet::WriterProperties*&> (???:316)
==4159==    by 0xDA042B8: allocate_shared<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> >, std::allocator<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> > >, parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, long int&, parquet::Encoding::type&, const parquet::WriterProperties*&> (???:598)
==4159==    by 0xDA042B8: make_shared<parquet::TypedColumnWriter<parquet::DataType<(parquet::Type::type)2u> >, parquet::ColumnChunkMetaDataBuilder*&, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, long int&, parquet::Encoding::type&, const parquet::WriterProperties*&> (???:614)
==4159==    by 0xDA042B8: parquet::ColumnWriter::Make(parquet::ColumnChunkMetaDataBuilder*, std::unique_ptr<parquet::PageWriter, std::default_delete<parquet::PageWriter> >, long, parquet::WriterProperties const*) (???:281)
==4159==    by 0xDA45E90: parquet::RowGroupSerializer::NextColumn() (???:172)
==4159==    by 0xD258BF6: parquet::arrow::FileWriter::Impl::WriteFlatColumnChunk(arrow::PrimitiveArray const*, long, long) (???:604)
==4159==    by 0xD258F6E: parquet::arrow::FileWriter::WriteFlatColumnChunk(arrow::Array const*, long, long) (???:702)
==4159==    by 0xD2593AF: parquet::arrow::WriteFlatTable(arrow::Table const*, arrow::MemoryPool*, std::shared_ptr<parquet::OutputStream> const&, long, std::shared_ptr<parquet::WriterProperties> const&) (???:740)
==4159==    by 0xD259A8D: parquet::arrow::WriteFlatTable(arrow::Table const*, arrow::MemoryPool*, std::shared_ptr<arrow::io::OutputStream> const&, long, std::shared_ptr<parquet::WriterProperties> const&) (???:752)
==4159== 

From what I can tell in gdb, the segfault is happening when parquet-cpp tries to use the arrow::PoolBuffer vtable. My conclusion is that, at least where virtual functions are involved at shared library boundaries, we need to use the same compiler.
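
To make the failure mode concrete, here is a minimal, self-contained C++ sketch of a virtual call crossing a shared-library boundary. The class and function names only echo Arrow's for readability; this is illustrative code, not Arrow's actual implementation.

// A minimal sketch of a virtual call crossing a shared-library boundary.
// Names echo Arrow's (Buffer, PoolBuffer) but the code is illustrative only.
#include <cstdint>
#include <cstdio>
#include <memory>

// Stand-in for the abstract interface whose vtable layout both sides of the
// boundary must agree on.
class Buffer {
 public:
  virtual ~Buffer() = default;
  virtual int64_t Resize(int64_t new_size) = 0;  // resolved through the vtable
};

// "libarrow" side: provides the concrete implementation and emits the vtable.
class PoolBuffer : public Buffer {
 public:
  int64_t Resize(int64_t new_size) override {
    size_ = new_size;
    return size_;
  }

 private:
  int64_t size_ = 0;
};

// "libparquet" side: only sees Buffer* and dispatches through the vtable.
// If this translation unit was compiled against a header whose vtable layout
// differs from what libarrow actually emitted (different compiler, different
// ABI, reordered or added virtual methods), the slot indexed here can hold
// garbage or null -- which is what a jump to address 0x0 looks like.
int64_t AllocateThroughVtable(Buffer* buffer, int64_t size) {
  return buffer->Resize(size);  // virtual dispatch across the .so boundary
}

int main() {
  std::unique_ptr<Buffer> buf(new PoolBuffer());
  std::printf("resized to %lld bytes\n",
              static_cast<long long>(AllocateThroughVtable(buf.get(), 64)));
  return 0;
}

When both translation units live in one binary, as in this sketch, everything lines up; the problem only appears when the two sides come from shared libraries built with mismatched toolchains, which is exactly the Travis-built libarrow / devtoolset-2-built libparquet combination above.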

@xhochy I think we have 2 ways forward:

  • Build parquet-cpp from source in the Arrow CI build
  • Set up a separate integration test repository on GitHub where we use Circle CI (where we can use Docker) / Travis CI (OS X) to validate the stack

@xhochy (Member) commented Feb 2, 2017

For now I would prefer to build parquet-cpp from source in the Arrow CI build. In certain situations you should still be able to mix compilers, but that may not hold here, as parquet-cpp is sandwiched in between the two Arrow artefacts.

If you want to dig further into the problem, you could compare the Arrow builds from Travis with those from conda using https://github.com/lvc/abi-compliance-checker

@wesm (Member Author) commented Feb 2, 2017

I'm working on this; I'll update this patch when I have the build working in Travis CI

wesm force-pushed the ARROW-467 branch 2 times, most recently from 73a73d4 to 89e3540 on February 2, 2017 21:02
@wesm (Member Author) commented Feb 2, 2017

I think I have it working now. Not sure what AppVeyor is cranky about, though.

wesm added 2 commits February 2, 2017 16:15
@wesm (Member Author) commented Feb 2, 2017

Rebased after PARQUET-834, ARROW-523

@wesm (Member Author) commented Feb 3, 2017

Passing build: https://travis-ci.org/wesm/arrow. Will merge in the morning.

@xhochy (Member) left a comment

+1, LGTM

@asfgit closed this in 720d422 on Feb 3, 2017
wesm deleted the ARROW-467 branch on February 3, 2017 16:15
wesm added a commit to wesm/arrow that referenced this pull request Sep 2, 2018
I am working on ARROW-865 which exposes these to Python users

Closes apache#270

Author: Wes McKinney <wes.mckinney@twosigma.com>

Closes apache#311 from wesm/PARQUET-915 and squashes the following commits:

0a89639 [Wes McKinney] Add test for time64[ns]
6331d8c [Wes McKinney] Cast time32[second] to time32[millisecond]
37c1b42 [Wes McKinney] cpplint
5167a7a [Wes McKinney] Add unit test for date64->date32 cast
440b40f [Wes McKinney] Add unit test for date/time types that write without implicit casts
e626ebd [Wes McKinney] Use inline visitor in LevelBuilder
2ab7f12 [Wes McKinney] Plumbing and expansions for rest of Arrow date/time types
3aa64fa [Wes McKinney] Add conversion for TIMESTAMP_MICROS