Shape "Tech-debt payment #1" project #1550

binarylogic · 2020-01-20T18:13:21Z

I would like to discuss and obtain consensus on what 2 weeks of tech debt work would look like.

LucioFranco · 2020-01-20T21:51:26Z

Two big things stand out to me that we should work on as part of this issue:

Clean up unused code

Right now because we make literally everything public, meaning that all sink structs/types are exposed publicly via a public module nested in a public module. We never get warnings about dead/unused code because the compiler can't check future uses. I suggest that we make almost everything private except certain required types to allow the compiler to infer what parts are used/not used.

Add more doc comments

As the code base gets larger and we have more contributors it will become increasingly hard to keep track of specific implementation details. I want us to get into the habit of adding more doc comments explaining high level things related to the source. This could mean for batching to explain concepts around batching. This should enable any engineer to read the comment and ramp up on the actual code without having to jump from place to place.

MOZGIII · 2020-02-06T21:40:57Z

I'd be happy if we try to address #1514. Editor responsiveness with rust-analyzer is not as bad as with rls, but it's still a bit painful.

LucioFranco · 2020-02-07T16:48:33Z

The new rust-analyzers in the past week have been very good for me 👍 If you haven't tried it out I do recommend it!

lukesteensen · 2020-02-09T22:50:27Z

Proposal

I've gone through all the tech debt issues we've amassed and attempted to triage them. There are a few larger groupings that we should be able to knock out with well-planned changes, and a lot of one-off smaller things that can be tackled independently.

Below, I've broken down the larger items I think we should tackle.

Simplifying Shutdown

I believe this may be a partial blocker to the tokio/futures piece below, but it's worth discussing on its own.

Our current shutdown strategy is very difficult to understand. It's dealt with both at the topology level and within each source, and the implementations have no direct link, making them both more confusing than they should be.

While there are proposals floating around about various forms of structured concurrency that would be very useful for this kind of problem, none are far enough along that we should take the risk of adopting them. Instead, we should find the smallest, simplest change we can make that gets the code to a reasonable level of understandability.

This will require some exploration and a proposal process, but I'm reasonably optimistic that something as simple as passing a shutdown signal into each source at startup would make things much easier to follow.

Plan of attack

Explore options for simplifying shutdown
Write up proposal with spike-level code
Discuss and come to a consensus on how to proceed
Implement the agreed-upon changes

Related issues: #1091, #1195, #1702

Tokio 0.2 / Futures 0.3

This is the big one. We need to keep up with the ecosystem here to avoid becoming blocked on upgrading important dependencies.

Our goal should be to set ourselves up for easy, incremental change. Currently, we have some futures 0.3 code that we map back to futures 0.1 via the compat layer. Success here means flipping that around. We'll still have plenty of futures 0.1 code, but it should be compat-mapped into futures 0.3, which is then the default for all new code.

Actually moving totally to futures 0.3 will be a long, incremental journey. This piece of work should lay the foundation and plot a rough course.

Plan of attack

Investigate and clear potential blockers (Investigate file descriptor use with tokio 0.2 #1695, Verify that tokio 0.2 fixes concurrency bottleneck #1696)
Make the minimal change onto tokio-compat as described in Upgrade to tokio v0.2 and std::future::Future #1142
Test heavily
Triage existing futures 0.1 code

Other related issues: #1639, #716, #990

Simplifying Sinks

We've built up a lot of layers for contributors to use when building sinks. This has gotten to the point where the number of moving parts is much higher than it should be, making it difficult to work on and understand sink code.

The goal here should be to collapse and simplify the conceptual pieces involved in sink building, particularly around batching. There should be a few simple, reusable pieces for contributors to reach for, with clear names and obvious APIs.

Plan of attack

Survey sinks for patterns of both behavior and existing component use
Based on those observed patterns, write up a proposal for improved design of the shared components and a plan for migrating existing sinks, including spike-level code
Discuss and come to a consensus on how to proceed
Implement the agreed-upon changes

Related issues: #1666, #1713, #1659, #1408, #1676

Code Organization

As @LucioFranco noted above, our pervasive use of pub modules means that we miss out on warnings for things like unused code. This largely falls out of how we organize our code between main.rs and everything else.

We currently implement an enormous amount of logic directly in main. This was a simple way to start but has become a bit overwhelming. By carefully breaking down and factoring out that logic, we can have a much smaller, more understandable main that requires far less to be made public. We can then scale back what's exposed and regain those useful warnings.

Plan of attack

Survey main.rs for coherent chunks of logic that could effectively be factored out, either as functions or some Application-like struct with methods
Write up a high-level proposal for restructuring (i.e. not method-by-method, but a rough sketch to help others see direction)
Discuss and come to a consensus on how to proceed
Implement the agreed-upon changes

Related issues: #1383

There's an unending list of other tech debt items I'd like to tackle, but these four are (in my opinion) the most strategically important and well defined. They focus on keeping our component code easy to understand, modern, and clean.

I chose not to include things like improving compile times because I don't believe we have a good enough idea of what exactly needs to be done there and it's unclear how big an impact we could actually have.

There are others like adding doc comments that I think are important but make less sense as part of a one-off, concerted effort. These are the types of things we should start to do as part of our normal work, striving to always leave code more understandable than when we found it.

Let me know what you think!

binarylogic · 2020-02-10T03:02:47Z

Nice write up! Those look like good projects. Based on what you said:

I believe this may be a partial blocker to the tokio/futures piece below, but it's worth discussing on its own.

It sounds like "Tokio 0.2 / Futures 0.3" is the only project blocked, and we could start on the other 3 in parallel (if we wanted).

binarylogic · 2020-02-10T16:26:38Z

As part of the "Simplifying Sinks" work we should look at #1376. Nothing, we don't have to solve this as part of the project, it's just an interesting PR to look at.

lukesteensen · 2020-02-14T16:30:16Z

#299 is also relevant to the "Simplifying Sinks" work.

LucioFranco · 2020-02-19T18:30:27Z

I'd like to add another note here, this issue #1402 seems like a good candidate that we should clean up. I'd like to also propose that we move some of our http stuff outside of sinks::util into crate::http since we now have two transforms using that code.

binarylogic added the type: task Generic non-code related tasks label Jan 20, 2020

binarylogic added this to the Tech-debt payment #1 milestone Jan 20, 2020

binarylogic assigned lukesteensen Jan 20, 2020

binarylogic changed the title ~~Outline and plan tech-debt payment #1~~ Sahpe "Tech-debt payment #1" project Jan 27, 2020

binarylogic changed the title ~~Sahpe "Tech-debt payment #1" project~~ Shape "Tech-debt payment #1" project Jan 27, 2020

binarylogic assigned binarylogic and unassigned lukesteensen Feb 11, 2020

binarylogic removed this from the Tech-debt payment #1 milestone Feb 22, 2020

binarylogic mentioned this issue Feb 22, 2020

Spec out the "Simplify Shutdown" project #1894

Closed

binarylogic closed this as completed Mar 10, 2020

binarylogic mentioned this issue Sep 24, 2020

Collect logs from files v3 RFC #3480

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Shape "Tech-debt payment #1" project #1550

Shape "Tech-debt payment #1" project #1550

binarylogic commented Jan 20, 2020

LucioFranco commented Jan 20, 2020

MOZGIII commented Feb 6, 2020

LucioFranco commented Feb 7, 2020

lukesteensen commented Feb 9, 2020

binarylogic commented Feb 10, 2020

binarylogic commented Feb 10, 2020 •

edited

Loading

lukesteensen commented Feb 14, 2020

LucioFranco commented Feb 19, 2020

Shape "Tech-debt payment #1" project #1550

Shape "Tech-debt payment #1" project #1550

Comments

binarylogic commented Jan 20, 2020

LucioFranco commented Jan 20, 2020

Clean up unused code

Add more doc comments

MOZGIII commented Feb 6, 2020

LucioFranco commented Feb 7, 2020

lukesteensen commented Feb 9, 2020

Proposal

Simplifying Shutdown

Plan of attack

Tokio 0.2 / Futures 0.3

Plan of attack

Simplifying Sinks

Plan of attack

Code Organization

Plan of attack

binarylogic commented Feb 10, 2020

binarylogic commented Feb 10, 2020 • edited Loading

lukesteensen commented Feb 14, 2020

LucioFranco commented Feb 19, 2020

binarylogic commented Feb 10, 2020 •

edited

Loading