Machine learning is changing how we process and learn from data—but there’s a growing need to do it privately. This project explores how we can train powerful models without centralizing sensitive data, using a combination of federated learning and random forests.
At a high level, federated learning is a way to train machine learning models across multiple devices or clients without moving the data. Instead of uploading all your data to a central server, each device trains the model locally and shares only the model updates.
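To make that concrete, here is a minimal sketch of one federated setup in Python. It is illustrative only, not code from this project: the toy model, the single gradient step used as "local training", and names like `local_update` are all assumptions for the sketch, in the spirit of federated averaging.

```python
import numpy as np

def local_update(global_weights, X, y, lr=0.1):
    """One local gradient-descent step on squared error."""
    residual = X @ global_weights - y
    grad = X.T @ residual / len(y)
    return global_weights - lr * grad  # only this update leaves the device

# Fake data: 3 clients, each holding 20 private samples with 4 features.
rng = np.random.default_rng(0)
clients = [(rng.normal(size=(20, 4)), rng.normal(size=20)) for _ in range(3)]

global_weights = np.zeros(4)
for _ in range(5):  # five federated rounds
    # Each client trains locally; the raw data never leaves the client.
    updates = [local_update(global_weights, X, y) for X, y in clients]
    # The server averages the clients' updates (FedAvg-style).
    global_weights = np.mean(updates, axis=0)
```

The key property: the server only ever sees weight vectors, never the samples they were trained on.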
Why is this cool? Because it helps:
- Keep raw data on the device, so sensitive information never leaves its owner.
- Cut down on data transfer: model updates are usually much smaller than the data itself.
- Learn from data that can't practically or legally be centralized in the first place.
This idea is already in use—Google uses federated learning in Gboard, their Android keyboard, to improve next-word predictions without uploading your messages to the cloud. Your phone trains locally and shares only what it learns (not what you typed).
Random forests are one of the most popular machine learning algorithms. They're made up of many decision trees, each trained on a slightly different bootstrap sample of the data. By combining the votes of these trees, we get a model that's more accurate and robust than any single tree.
Key features of random forests (illustrated in the sketch after this list):
- Each tree is trained on a bootstrap sample of the data (bagging), so the trees see slightly different views of the dataset.
- At each split, only a random subset of features is considered, which decorrelates the trees.
- Predictions are combined by majority vote (classification) or averaging (regression).
- The ensemble is far less prone to overfitting than any single decision tree.
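Here's what those properties look like in practice with scikit-learn (a quick illustration, not code from this project). `max_features="sqrt"` is the per-split feature subsampling mentioned above, and bootstrap sampling is on by default:

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# 100 trees; each sees a bootstrap sample of the training data and
# considers a random subset of features (sqrt of the total) per split.
forest = RandomForestClassifier(n_estimators=100, max_features="sqrt", random_state=0)
forest.fit(X_train, y_train)
print(f"Test accuracy: {forest.score(X_test, y_test):.2f}")
```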
This project combines the best of both worlds: the predictive power of random forests and the privacy-preserving setup of federated learning. Each client trains on its own local data, and only the resulting models are synced to a central server.
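A common aggregation strategy for federated random forests is to have the server simply concatenate the clients' trees into one larger ensemble (whether this project uses exactly this scheme is an assumption here). A minimal sketch with scikit-learn, assuming every client observes the same set of class labels:

```python
import copy

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Simulate three clients, each holding its own private shard of the data.
X, y = make_classification(n_samples=600, n_features=10, random_state=0)
shards = [(X[i::3], y[i::3]) for i in range(3)]

# Each client trains a small forest locally; only the trees are "synced".
local_forests = []
for X_c, y_c in shards:
    clf = RandomForestClassifier(n_estimators=20, random_state=0)
    clf.fit(X_c, y_c)
    local_forests.append(clf)

def merge_forests(forests):
    """Server-side aggregation: concatenate the clients' trees.

    Assumes all clients observed the same set of class labels.
    """
    merged = copy.deepcopy(forests[0])
    for other in forests[1:]:
        merged.estimators_ += other.estimators_
    merged.n_estimators = len(merged.estimators_)
    return merged

global_forest = merge_forests(local_forests)
print(global_forest.n_estimators)  # 60 trees, trained without pooling any raw data
```

Unlike gradient-based models, trees can't be averaged, which is why concatenation is the natural way to merge them: the global forest just votes over everyone's trees.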
This setup mimics real-world scenarios, like phones training locally and syncing models to a cloud service. Sharing models instead of raw data doesn't guarantee privacy on its own (model updates can still leak information about the data they were trained on), but it opens the door to pairing this setup with private processing methods and encrypted communication, moving us toward more privacy-friendly technologies.
As concerns around data privacy grow, projects like this show a possible path forward: learn from user data without ever storing it in one place. Whether you're training a next-word predictor on your phone or analysing private sensor data, federated learning could be a key part of the solution.
Thanks for reading! If you're into machine learning, privacy, or building scalable systems, feel free to reach out.