- What is a repository?
- What is a monorepository?
- "Polyrepo" or "Multirepo"
- Monorepo vs monolithic application: The misconception
- Advantages and Challenges of using Monorepositories
- When to use a Monorepo
- How we use Monorepos for our projects
- What Monorepo tools to use
- How to start using Monorepos
Monorepos (mono-repositories) have been used in the software industry for many years, long before the term became trendy over the last two years. Most notably, large companies such as Google, Stripe and Microsoft use monorepos because of the many advantages they offer.
If you want to know more about what monorepos are and how to use them in practice, keep reading. By the end of this article you will have a detailed overview of everything related to monorepositories and, for developers, an in-depth tutorial on how to apply them in practice, plus which tooling to use to make everyday work with monorepos as smooth as possible.
Watch this talk on mono-repositories by Kari Meling Johannessen at NDC Conferences in Oslo.
A repository, often abbreviated as "repo", is a place where data can be stored in a version-controlled way. It serves as a centralized location for storing and managing digital assets, primarily source code, but it can also include documentation, images, configuration files, and more.
Today, most of the source code for the software we use daily is stored in a repository, enabling teams to collaborate easily no matter how much code there is or how big the team is. Check out this article from freeCodeCamp about GitHub and how to set up your own private or public repository there. The most widely used version control software today is Git.
There are multiple ways to explain what a mono-repository (often referred to as a "monorepo") is. The most common description is: a single repository containing multiple projects separated by directories. It is also sometimes referred to as a one-repo or unirepo. Here is an example of multiple projects separated into folders in one single repository.
A polyrepo or multirepo is the approach of storing each project in its own dedicated repository, usually a Git repository, since Git is the most used version control system in the industry. Each project therefore does not share its change history with the other projects and has its own settings, issues, merge requests, etc. To orchestrate the interaction between these projects, for example a backend project containing a Django application that provides an API for a Vue.js frontend website, we have to build, test and deploy each of them individually using pipelines in a CI/CD system. The backend team can then publish a link to the API for the frontend team to consume, connecting the two systems. However, an update to the API can leave the frontend no longer working correctly. Keeping both sides in sync is a balancing and collaboration challenge between the teams and can lead to friction and inefficiencies.
There is a huge misconception that using a monorepository is the same as building monolithic software. This is not the case! A monolithic application is a centralized system that is built, tested and deployed as one unit, usually on a single server. The opposite of a monolithic app is a set of microservices: multiple projects that can be deployed separately and independently. Both approaches can be developed in a monorepository. Read more in “Misconceptions about Monorepos: Monorepo != Monolith”.
Access management with Monorepo
Restricting access to a project in GitLab is very simple. With a monorepository it becomes more complicated, because all projects live in the same repository. This is usually not a problem for smaller teams. For larger enterprise teams, GitLab offers a feature called "Code Owners" that lets you assign specific users or groups as owners of certain files and folders and require their approval for changes to them. Note that it controls who must approve changes, not who can read the code. This is a paid feature, so you need a license for gitlab.com or your self-hosted GitLab instance.
Dependency management becomes both easier and more complicated at the same time. Packages that previously needed their own repository can be moved into the monorepo, each in its own folder, and installed via a relative path from there. On the other hand, dependencies that have to be built before they can be used, such as a Rust CLI app, need special rules so that they are only rebuilt when their underlying code actually changed. If this is not taken care of, it can lead to very long build times (we are talking days for large projects). This is less of a problem for Node packages, since these dependencies are usually built in the same build workflow as the main app. So if you use dependencies that need their own build/compile flow, make sure they are not rebuilt when the related code did not change, especially when every push to the repository triggers a new version.
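To make "only rebuild what changed" concrete, here is a minimal Python sketch. It assumes a hypothetical layout where each top-level folder (`backend`, `frontend`, `cli`) is one project, and it takes the list of changed file paths as input, which in CI could come from `git diff --name-only <base> HEAD`. The helper name `affected_projects` is illustrative, not part of any tool.

```python
from pathlib import PurePosixPath

# Hypothetical monorepo layout: each top-level folder is one project.
PROJECT_DIRS = ("backend", "frontend", "cli")


def affected_projects(changed_files: list[str], project_dirs=PROJECT_DIRS) -> set[str]:
    """Return the projects whose folder contains at least one changed file."""
    affected = set()
    for path in changed_files:
        parts = PurePosixPath(path).parts
        # The first path component identifies the project folder;
        # files at the repository root (e.g. README.md) belong to no project.
        if parts and parts[0] in project_dirs:
            affected.add(parts[0])
    return affected


if __name__ == "__main__":
    changed = ["frontend/src/App.vue", "README.md", "frontend/package.json"]
    print(sorted(affected_projects(changed)))  # ['frontend']
```

Only the projects in the returned set would then need their build step, so an expensive Rust CLI build is skipped when only frontend files changed.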
Keeping the code clean can become much easier, because all the code is stored in one repository and everybody clones it locally. This encourages the team to make sure unused code is removed and old code is refactored. With multiple repositories, the likelihood of code becoming old and forgotten is much higher. The team still needs some rules and guidelines when working with a monorepository to keep things up to date and refactoring happening often. The barrier to actually doing it is just much lower, because everything is already set up locally and people can simply open a new merge request or pull request and push their changes for review.
With a monorepository, we don't want to encourage developers to tightly couple software together, as that leads to monolithic software. Using microservices is the way to go, and it also requires less refactoring overall because the systems do not depend on each other.
Cloning huge repositories can be very slow
Git operations on very large repositories can be slow. Keeping large and unneeded files out of the repository is the way to go. If you still end up with a very large repository, you can shallow clone it (fetching only the most recent commits, by default of a single branch), which allows much faster onboarding for developers. To find out more about handling large repositories, check out this article from Atlassian: How to handle big repositories with Git.
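As a minimal sketch of a shallow clone, the Python helper below builds and runs a `git clone` command using Git's real `--depth`, `--single-branch` and `--branch` options. It assumes `git` is installed and on PATH; the function names and the example URL are hypothetical.

```python
import subprocess


def shallow_clone_cmd(url: str, branch: str, dest: str, depth: int = 1) -> list[str]:
    """Build a `git clone` command that fetches only the latest `depth` commits of one branch."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return [
        "git", "clone",
        "--depth", str(depth),   # truncate history to the most recent commits
        "--single-branch",       # fetch only the named branch (already implied by --depth)
        "--branch", branch,
        url, dest,
    ]


def shallow_clone(url: str, branch: str, dest: str, depth: int = 1) -> None:
    # Assumes `git` is on PATH; raises subprocess.CalledProcessError if the clone fails.
    subprocess.run(shallow_clone_cmd(url, branch, dest, depth), check=True)


if __name__ == "__main__":
    # Hypothetical repository URL, for illustration only.
    print(" ".join(shallow_clone_cmd("https://gitlab.example.com/acme/monorepo.git", "main", "monorepo")))
```

If a developer later needs the full history, `git fetch --unshallow` inside the clone retrieves it.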
Encourages tight coupling
Developers need to know what they are doing when working with monorepos so they don't fall into the trap of tight coupling and end up producing software with monolithic tendencies. This can be a real risk, but in general, having one or two experienced developers who onboard the team and review merge requests is sufficient to avoid it.
The pipeline needs to be smart. That is why we use GitLab and its CI/CD system for our projects. Out of the box, you get a lot of features that help keep your project continuously built, tested and deployed. Keep reading to learn how to set up your own monorepository with GitLab and use its CI/CD system with very little setup time.
Dependencies can get very complicated and hard to curate over time. This is when a helper tool becomes useful: one that can visualize the dependencies in the project and build only what is needed when something changes, saving build time. Read more about which tools can help in "What Monorepo tools to use".
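The core of what such helper tools do can be sketched in a few lines of Python. Assuming a hypothetical dependency graph where each project lists the projects it depends on, the sketch finds every project affected by a change (the changed ones plus all direct and transitive dependents) and orders them so dependencies are built first, using the standard library's `graphlib.TopologicalSorter` (Python 3.9+). Project names and helper names are illustrative.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each project maps to the projects it depends on.
DEPENDS_ON = {
    "shared-lib": set(),
    "cli": {"shared-lib"},
    "backend": {"shared-lib"},
    "frontend": {"backend"},
}


def dependents_closure(changed: set[str], graph: dict[str, set[str]]) -> set[str]:
    """Return the changed projects plus every project that depends on them, directly or transitively."""
    affected = set(changed)
    grew = True
    while grew:
        grew = False
        for project, deps in graph.items():
            if project not in affected and deps & affected:
                affected.add(project)
                grew = True
    return affected


def build_order(changed: set[str], graph: dict[str, set[str]]) -> list[str]:
    """Order the affected projects so every dependency is built before its dependents."""
    affected = dependents_closure(changed, graph)
    # Keep only edges between affected projects; unaffected dependencies are
    # assumed to be reusable from an earlier (cached) build.
    sub = {p: graph.get(p, set()) & affected for p in affected}
    return list(TopologicalSorter(sub).static_order())


if __name__ == "__main__":
    print(build_order({"backend"}, DEPENDS_ON))  # ['backend', 'frontend']
```

A change to `shared-lib` would rebuild all four projects, while a change to `cli` rebuilds only `cli`.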
Merge request/Pull request management
This is sometimes an issue when teams have many merge requests piling up and developers lose track of what is important to them. We have not experienced this as an issue; rather, it is an advantage that all merge requests are in one spot, because the entire team can see what is going on. This was already possible before, since we use GitLab groups and GitLab aggregates all merge requests from all projects in a group into a single place. So this is likely only an issue for large teams, and only when there is no clear structure for naming merge requests and keeping them as small as possible. Our ticketing system, Pivotal Tracker, has a dedicated section in each ticket that links to the related merge requests, which reduces the overview problem even further. Another advantage is that a ticket for a feature that impacts multiple projects, like frontend and backend, can be handled in one merge request, with both teams working together on one branch.
Broken master issue
When the master branch is broken, the whole team is blocked from new deployments until the issue is fixed. This can delay feature delivery and is overall not a pleasant experience. Personally, I like that this can be an issue, since it keeps the teams accountable and the overall process cleaner. To make sure this rarely happens, the team needs solid work processes in place, like merge request reviewers and deployment permissions.
Using a monorepo (short for "mono-repository") can be a beneficial approach for managing your software projects in certain situations. However, it's not a one-size-fits-all solution, and you should consider using a monorepo when the following conditions apply:
1. Interdependencies: Your projects or components have significant interdependencies. If changes in one part of your codebase frequently require corresponding changes in other parts, a monorepo can help ensure consistent versioning and easier coordination.
2. Shared Code: You have shared code or libraries that multiple projects rely on. A monorepo makes it easier to manage and version shared code, reducing duplication and ensuring consistent updates across all projects.
3. Continuous Integration/Continuous Deployment (CI/CD): Your development and release processes are highly automated and involve continuous integration and deployment. A monorepo can simplify CI/CD pipelines by providing a centralized repository to trigger builds, tests, and deployments.
4. Team Collaboration: Multiple teams or development groups work on related projects. A monorepo can enhance collaboration by providing a single source of truth for the entire organization, reducing the risk of duplication and conflicts.
5. Code Ownership: Your organization has a strong code ownership model, and teams are responsible for specific sections of the codebase. Monorepos allow teams to maintain ownership of their code while still providing visibility and access to other parts of the codebase.
6. Version Consistency: Ensuring version consistency across all your projects is critical. With a monorepo, you can easily manage dependencies and enforce version compatibility.
7. Build and Test Efficiency: If your projects share a similar development environment and have common build and test processes, a monorepo can simplify these tasks and improve overall efficiency.
8. Monolithic Architecture: Your software architecture tends toward a monolithic style, where different components are tightly integrated. In such cases, a monorepo can reflect the architecture more accurately.
9. Historical Context: If your organization has historically used a monorepo and migrating to separate repositories would be a significant undertaking, it might make sense to continue with the existing setup.
However, it's essential to consider the potential downsides of using a monorepo, including increased complexity, longer repository histories, and potential performance issues as the codebase grows. Additionally, if your projects have very different release cycles, technologies, or development teams, a monorepo might not be the best choice, and separate repositories may be more appropriate.
Ultimately, the decision to use a monorepo should align with your specific organizational needs and the nature of your software projects. It's worth carefully evaluating the pros and cons before committing to this approach.
From the beginning, we decided to host our code on a self-hosted system to have full control over everything. The best way to do this was GitLab CE. From the start, GitLab offered a very useful feature called groups, which lets you group related project repositories together. This is very useful if you want to share certain development aspects across multiple projects, like environment variables, GitLab runners, members, permissions, access tokens and integrations, to name just a few of the important features we relied on and still rely on. The GitLab group is also a place where many common project features are pulled together, and groups can be nested inside other groups (up to 20 levels deep). This allowed us to go a long time without considering a switch to a monorepo, because these group features cover most of the pain points that a monorepository solves on other development platforms like GitHub, Bitbucket, Azure and more.
Our struggle with versions
We became interested in monorepositories about three months ago, when we noticed a problem with the way we versioned our software projects. Each project consisted of multiple repositories, each set up to be versioned with Git tags by our automated CI pipeline. This led to situations where, for example, the frontend of a piece of software was at version 1.2.1 while the backend was at 1.3.2. Keeping the frontend and backend on matching versions was a big headache, added overhead and led to issues. This is where a monorepository comes in very handy, and it was the main reason we switched. In a monorepository, every Git tag captures all the code, frontend and backend, in the exact state it was in when the version was applied, eliminating the problem of a frontend version that does not fit or work with the backend version.
Pyango's current state with monorepos
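In a monorepo, one tag versions every project at once, so the CI pipeline only has to compute a single next version. Here is a minimal Python sketch of that step, assuming a hypothetical `vMAJOR.MINOR.PATCH` tag convention and that the existing tags are available as a list (for example from `git tag --list`). The function name is illustrative; creating and pushing the tag is left to `git tag` and `git push` in the pipeline.

```python
import re

# Assumed tag convention: vMAJOR.MINOR.PATCH, e.g. "v1.3.2" (hypothetical).
TAG_RE = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def next_version(tags: list[str], bump: str = "patch") -> str:
    """Return the next tag for the whole monorepo, ignoring tags that don't match the convention."""
    versions = [tuple(int(n) for n in m.groups()) for t in tags if (m := TAG_RE.match(t))]
    # Compare versions numerically (so 1.10.0 > 1.9.0); start from 0.0.0 if no tags exist.
    major, minor, patch = max(versions, default=(0, 0, 0))
    if bump == "major":
        return f"v{major + 1}.0.0"
    if bump == "minor":
        return f"v{major}.{minor + 1}.0"
    if bump == "patch":
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown bump type: {bump!r}")


if __name__ == "__main__":
    print(next_version(["v1.2.1", "v1.3.2"]))  # v1.3.3
```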
Now that we have successfully moved multiple projects to a monorepository, the team and I can already see the benefits. We solved the problem of making major and public version releases for the entire project: it now takes a single Git tag. It's so much easier and more efficient. Additionally, collaboration has clearly improved because the entire team now has the full source code of the project; this has sparked interest in fixing problems across boundaries, for example a frontend developer fixing something in the backend. We also have all the merge requests in one place, which simplifies project management.
Although there could be many downsides to using a monorepository, we have only experienced one so far, and it is not a big deal at all.
The drawback we encountered is that there is only one place for environment variables: the variables of all projects end up in the same settings, which forces us to give them more specific names so we know which project each one is meant for.
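One way to live with a single variable store is a naming convention. As a minimal sketch, assuming a hypothetical convention where every variable is prefixed with its project name (e.g. `BACKEND_DATABASE_URL`), each project can select its own variables and strip the prefix at startup. The helper name and variable names are illustrative.

```python
import os
from typing import Mapping


def project_env(env: Mapping[str, str], prefix: str) -> dict[str, str]:
    """Return variables named PREFIX_<NAME> as {NAME: value}, e.g. BACKEND_DATABASE_URL -> DATABASE_URL."""
    marker = prefix.upper() + "_"
    return {
        key[len(marker):]: value
        for key, value in env.items()
        # Skip variables of other projects and a bare "PREFIX_" with no name after it.
        if key.startswith(marker) and len(key) > len(marker)
    }


if __name__ == "__main__":
    # In a CI job or container, the variables would come from the process environment.
    print(sorted(project_env(os.environ, "BACKEND")))
```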
What's next for our monorepo love story?
There are a few possible next steps for our team. We could move all the code we produce, for any project, into a single repository. This would improve the way we use the continuous integration system and reduce the amount of work across multiple projects, because fixing a problem once would fix it everywhere. On the other hand, managing environment variables would become very overwhelming, since GitLab does not allow grouping or namespacing variables within a single project. We are still exploring this step, but it could be a reasonable decision that greatly simplifies new project setups and deployments.
Hybrid repository structure
About five years ago, I heard that the industry was moving toward monorepositories for its codebases, so naturally I started investigating. At the time, I was sure that for our purposes a monorepository would risk overengineering things, which would lead to longer development times, especially for prototypes and smaller projects. I discarded the idea for the time being, even though big players like Google, Facebook, Netflix, Uber and Microsoft all work with monorepositories. We had always used a self-hosted GitLab instance, and working on a project/group basis was totally sufficient. Recently, our projects became more and more standardised, especially our CI/CD processes, to the point where we shared pipeline code from project to project with a nice GitLab feature called include.
We used this feature to import common CI/CD configuration from a separate repository on the same GitLab instance. This allowed us to share code between projects, and any change we made was automatically picked up by all projects that used this code.
This was the point where I started to think again about the benefits of a monorepository and investigated the topic once more. After some more research, I decided to discard the idea again because of specific drawbacks that I did not want to risk at the time.
Slow build times
In a naive monorepository setup, every push a developer makes spins up a GitLab runner that builds, tests and pushes everything at once. This can be very slow, because the entire codebase needs to be rebuilt. Overcoming this problem requires a smarter pipeline, which I describe in more detail below.
What problems did we solve using a monorepo?
When making the radical switch from a multirepo approach to a monorepo approach, we asked ourselves what problem we were trying to solve. The main issue we had, working on a project as a small team, was version management across all systems. This was the single most important reason we switched from a group-based multirepo approach to a monorepository.
Other advantages we saw were:
- Version management (collaboration between PO/PM and development team)
- DRY (Don't repeat yourself)
- Combined Infrastructure
When I first heard about monorepo tools, I was confused about what this actually meant. Using GitLab, we are already spoiled with a great CI/CD system that offers lots of possibilities. So using monorepo tooling can be as simple as using GitLab and restricting pipeline execution with rules based on file changes in the repository. This alone hugely improves build times. A number of other tools were created specifically for the issues monorepos have, like slow build times, difficult dependency management and caching. Most of these issues can be solved with GitLab's excellent CI/CD system alone. Here are some other tools that are interesting to look into:
- npm, Yarn and pnpm workspaces
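The GitLab-only approach of restricting pipeline execution by changed files relies on the `rules:changes` keyword, which runs a job only when files matching the given paths changed. As a minimal sketch, assuming a hypothetical layout with one top-level folder per project and a hypothetical `make build` command, this Python script generates one build job per project. It prints the configuration as JSON, which YAML parsers generally accept, so the output could serve as a `.gitlab-ci.yml` or a child pipeline file.

```python
import json

# Hypothetical project folders in the monorepo; adjust to your layout.
PROJECTS = ["backend", "frontend"]


def build_job(project: str) -> dict:
    """One GitLab CI job that only runs when files under the project's folder change."""
    return {
        "stage": "build",
        # Script lines run in the same shell, so the `cd` applies to the build command.
        "script": [f"cd {project}", "make build"],  # hypothetical build command
        "rules": [{"changes": [f"{project}/**/*"]}],
    }


def pipeline(projects: list[str]) -> dict:
    """Assemble the full pipeline configuration: stages plus one build job per project."""
    config = {"stages": ["build"]}
    for project in projects:
        config[f"build-{project}"] = build_job(project)
    return config


if __name__ == "__main__":
    print(json.dumps(pipeline(PROJECTS), indent=2))
```

Keep in mind that GitLab can only evaluate `changes` when there is a push to compare against; for pipelines without one, such as tag or scheduled pipelines, the rule matches and every job runs.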
A comprehensive guide to using monorepos efficiently. This tutorial assumes you have some experience with Git and Docker. At this point, the setup does not use any helper software other than GitLab to manage the monorepo. In most cases this is sufficient, but if you would like to add other tools to the process, this is possible without much effort. The tools we are going to use are:
- Hetzner (or any other hosting provider that can host Virtual Machines)