Pipeline Team

People

Team members

- James Greenhill: Team lead, Software Engineer
- Yakko Majuri: Software Engineer
- Karl-Aksel Puulmann: Software Engineer
- Tiina Turban: Software Engineer
- Harry Waye: Software Engineer
- Luke Harries: Head of Product
- Xavier Vello: Software Engineer

Mission

Provide the best events pipeline in the world.

Objectives: Q1 2023

Objective: Performance
- Key Results: We have wrapped up the person-on-event project and have deprecated the old non-person-on-events queries
  - Why? Faster queries
- Key Results: We have reduced the cost per event for capture by an order of magnitude
  - Why? Saves infrastructure costs and improves performance
Objective: Reliability
- Key Results: We have converted all current US dashboards into IaC dashboards configured in Terraform and made all necessary migrations from StatsD to Prometheus to support this.
  - Why? Brings US and EU monitoring to parity
- Key Results: All of our alerts have runbooks
  - Why? Improve incident recovery times and share knowledge with all engineers, so that most incidents can be resolved without escalating to the team
- Key Results: Backfills do not slow us down or take down the system. We have tests for this.
  - Why? Improves service quality and protects against bad actors
- Key Results: Erroring apps fail gracefully, do not take down anything else, and are re-enabled after temporary unavailability. We have tests to prove this.
  - Why? Improves service quality and tackles customer annoyance of apps turning off when there's an error
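
The last key result above describes behaviour that can be sketched in a few lines. This is a minimal illustration only, not the plugin server's actual implementation: `AppGuard`, `max_failures`, `cooldown_seconds`, and the two example apps are hypothetical names, and the thresholds are arbitrary. It shows an erroring app being skipped after repeated failures, events passing through untouched in the meantime, and the app being retried once a cooldown has elapsed.

```python
import time
from typing import Callable, Optional

Event = dict


class AppGuard:
    """Hypothetical wrapper that isolates one app's failures from the pipeline."""

    def __init__(
        self,
        max_failures: int = 3,
        cooldown_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_failures = max_failures
        self.cooldown_seconds = cooldown_seconds
        self.clock = clock  # injectable so tests do not need to sleep
        self.failures = 0
        self.disabled_at: Optional[float] = None

    def is_enabled(self) -> bool:
        if self.disabled_at is None:
            return True
        if self.clock() - self.disabled_at >= self.cooldown_seconds:
            # Cooldown elapsed: re-enable the app and reset its failure count.
            self.disabled_at = None
            self.failures = 0
            return True
        return False

    def run(self, app: Callable[[Event], Event], event: Event) -> Event:
        if not self.is_enabled():
            return event  # app is skipped; the event passes through untouched
        try:
            result = app(event)
        except Exception:
            # The failure is contained here and never reaches the caller.
            self.failures += 1
            if self.failures >= self.max_failures:
                self.disabled_at = self.clock()
            return event
        self.failures = 0  # a success clears earlier failures
        return result


def failing_app(event: Event) -> Event:
    raise RuntimeError("temporary outage")


def tagging_app(event: Event) -> Event:
    return {**event, "tagged": True}
```

Injecting the clock is a design choice that keeps the "we have tests to prove this" part cheap: a test can advance a fake clock instead of waiting for the real cooldown.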

Responsibilities

Team Ingestion owns our ingestion pipeline end-to-end. That means we own the Django server ingestion API, the ingestion (plugin) server, and our client libraries, as well as the Kafka and ClickHouse setup where it pertains to event ingestion.

Our work generally falls into one of three categories:

Scaffolding to support core PostHog features

In order to achieve company goals or introduce new features (often owned by other teams), changes to our ingestion pipeline may be required.

An example of this is the work to remodel our events to store person and group data, which is essential to ensuring we can provide fast querying for users.

While querying data is not owned by this team, the change to enable faster queries inevitably requires a large restructuring of our events pipeline, and thus we are owners of that component of the project.

In short, a core responsibility of our team is to enable other teams to be successful.

Ingestion robustness

On the road to providing the best events pipeline in the world, we need to build a system that is robust.

To do so, we must ensure:

- Reliability: We should not lose events and events ingested should be correct
- Scalability: We should be able to scale to massive event volumes
- Maintainability: It should be easy to debug and contribute to our ingestion pipeline

Thus, it is our responsibility to consistently revise our past decisions and improve processes where we see fit, from client library behaviors to ClickHouse schemas.

Extensibility

Our ingestion pipeline is powerful because it allows plugins to be built on top of it, to do things like transform and export events, as well as import data from third parties.

It is our responsibility to ensure that the extensibility of the pipeline does not interfere with ingestion robustness, as well as to:

- Build new features to support plugin developers in building more powerful tools
- Ensure a delightful experience for plugin developers
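
As a rough illustration of extensibility that does not interfere with ingestion, the sketch below runs transform plugins and then export plugins over a single event. It is an assumption-laden example rather than PostHog's plugin API: `process_event`, the example plugins, and the error-list return value are all hypothetical. Each plugin receives its own copy of the event, and a plugin that raises is recorded and skipped so the event is still ingested.

```python
import copy
from typing import Callable, Iterable, List, Tuple

Event = dict


def _name(plugin: Callable) -> str:
    # Fall back to repr() for callables without a __name__ (e.g. partials).
    return getattr(plugin, "__name__", repr(plugin))


def process_event(
    event: Event,
    transforms: Iterable[Callable[[Event], Event]],
    exporters: Iterable[Callable[[Event], None]],
) -> Tuple[Event, List[str]]:
    """Apply transforms, then exporters; plugin errors never abort ingestion."""
    errors: List[str] = []
    current = copy.deepcopy(event)  # never mutate the caller's event
    for transform in transforms:
        try:
            # Each plugin works on a copy, so a crash cannot leave
            # a half-modified event behind.
            current = transform(copy.deepcopy(current))
        except Exception as exc:
            errors.append(f"{_name(transform)}: {exc}")
    for exporter in exporters:
        try:
            exporter(copy.deepcopy(current))
        except Exception as exc:
            errors.append(f"{_name(exporter)}: {exc}")
    return current, errors


def add_source_tag(event: Event) -> Event:
    event["source"] = "example"
    return event


def broken_transform(event: Event) -> Event:
    raise ValueError("bad payload")


def failing_exporter(event: Event) -> None:
    raise ConnectionError("destination unavailable")
```

Calling `process_event({"event": "pageview"}, [add_source_tag, broken_transform], [failing_exporter])` returns the tagged event together with two error strings, one per failing plugin.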

How do we work?

We run a quick 15-minute standup on Mondays, Wednesdays, and Fridays, and extend the slot if we feel the need to have a longer synchronous discussion about a specific topic. We document every standup on this doc.

We are happy to sync anytime if we feel it is important to do so. This is generally coordinated on Slack where someone will spontaneously drop a Zoom link. Some of the reasons we sync include: debugging outages, sharing context (including shadowing), making decisions when there's been a deadlock, and pairing sessions.

We work as a team. Our priorities are owned by the team, and we work together towards the same overall goal every sprint. It is inevitable that sometimes tasks will fall on one person or another, but we try hard to share context and collaborate as much as possible.

Slack channel

#team-ingestion

What we're building

PostHog Customer Data Platform
We cover some of the functionality of existing CDP solutions, such as Segment. By creating a UI which specifically encompasses the ideas of "Sources" and "Destinations", along with building out more of the integrations, we can turn PostHog into a leading CDP solution.
Progress
- PostHog CDP
