People
Team members
James Greenhill
Team lead, Software Engineer
Yakko Majuri
Software Engineer
Karl-Aksel Puulmann
Software Engineer
Tiina Turban
Software Engineer
Harry Waye
Software Engineer
Luke Harries
Head of Product
Xavier Vello
Software Engineer
Mission
Provide the best events pipeline in the world.
Objectives: Q1 2023
- Objective: Performance
- Key Results: We have wrapped up the person-on-event project and have deprecated the old non-person-on-events queries
- Why? Speeds up querying
- Key Results: We have reduced the cost per event for capture by an order of magnitude
- Why? Saves on infrastructure costs and improves performance
- Objective: Reliability
- Key Results: We have converted all current US dashboards into IaC dashboards configured in Terraform and made all necessary migrations from StatsD to Prometheus to support this.
- Why? Brings US and EU monitoring to parity
- Key Results: All of our alerts have runbooks
- Why? Improve incident recovery times and share knowledge with all engineers, so that most incidents can be resolved without escalating to the team
- Key Results: Backfills do not slow us down or take down the system. We have tests for this.
- Why? Improves service quality and protects against bad actors
- Key Results: Erroring apps fail gracefully, do not take down anything else, and are re-enabled after temporary unavailability. We have tests to prove this.
- Why? Improves service quality and tackles customer annoyance of apps turning off when there's an error
Responsibilities
Team Ingestion owns our ingestion pipeline end-to-end. That means we own the Django server ingestion API, the ingestion (plugin) server, as well as our client libraries, Kafka and ClickHouse setup, where it pertains to event ingestion.
Our work generally falls into one of three categories:
Scaffolding to support core PostHog features
In order to achieve company goals or introduce new features (often owned by other teams), changes to our ingestion pipeline may be required.
An example of this is the work to remodel our events to store person and group data, which is essential to ensuring we can provide fast querying for users.
While querying data is not owned by this team, the change to enable faster queries inevitably requires a large restructuring of our events pipeline, and thus we are owners of that component of the project.
In short, a core responsibility of our team is to enable other teams to be successful.
Ingestion robustness
On the road to providing the best events pipeline in the world, we need to build a system that is robust.
To do so, we must ensure:
- Reliability: We should not lose events, and ingested events should be correct
- Scalability: We should be able to scale to massive event volumes
- Maintainability: It should be easy to debug and contribute to our ingestion pipeline
Thus, it is our responsibility to consistently revise our past decisions and improve processes where we see fit, from client library behaviors to ClickHouse schemas.
Extensibility
Our ingestion pipeline is powerful because it allows plugins to be built on top of it, to do things like transform and export events, as well as import data from third parties.
It is our responsibility to ensure that the extensibility of the pipeline does not interfere with ingestion robustness, as well as to:
- Build new features to support plugin developers in building more powerful tools
- Ensure a delightful experience for plugin developers
How do we work?
We run a quick 15-minute standup on Mondays, Wednesdays, and Fridays, and extend the slot if we feel the need to have a longer synchronous discussion about a specific topic. We document every standup on this doc.
We are happy to sync anytime if we feel it is important to do so. This is generally coordinated on Slack where someone will spontaneously drop a Zoom link. Some of the reasons we sync include: debugging outages, sharing context (including shadowing), making decisions when there's been a deadlock, and pairing sessions.
We work as a team. Our priorities are owned by the team, and we work together towards the same overall goal every sprint. It is inevitable that sometimes tasks will fall on one person or another, but we try hard to share context and collaborate as much as possible.
Slack channel
What we're building
PostHog Customer Data Platform
We cover some of the functionality of existing CDP solutions, such as Segment. By creating a UI which specifically encompasses the ideas of "Sources" and "Destinations", along with building out more of the integrations, we can turn PostHog into a leading CDP solution.
Progress