DevOps for Updating and Optimizing Infrastructure (physical servers)
Preliminary technical specification for a DevOps engineer
The full specification will be drawn up after the contractor consults with the team's technical lead, since the details, the optimal number of servers, and their configuration are not yet settled.
Project background
Physical servers are used, and we plan to keep using them: comparable configurations at cloud providers would be significantly more expensive and, in my opinion, more complex to configure. For our workload it proved more cost-effective to order one large, powerful server, which typically serves about 2 million unique visits a day and, by my estimate, has roughly five times that capacity in reserve, plus a backup server in another data center that we can switch to in case of a critical failure.
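To put the quoted load in perspective, the daily figure can be converted into a per-second rate. This is back-of-the-envelope arithmetic under stated assumptions: traffic is treated as evenly spread over 24 hours, and the five-fold reserve is the author's estimate rather than a measurement. Real traffic peaks well above the daily average, so the peak rate should be taken from access logs before sizing new servers.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

def visits_per_second(visits_per_day: float) -> float:
    """Average rate, assuming traffic is spread evenly over 24 hours."""
    return visits_per_day / SECONDS_PER_DAY

def capacity_ceiling(visits_per_day: float, reserve_factor: float) -> float:
    """Average rate the server could sustain if it really has `reserve_factor` times headroom."""
    return visits_per_second(visits_per_day) * reserve_factor

DAILY_VISITS = 2_000_000  # figure from the description above
RESERVE_FACTOR = 5        # the author's own estimate, not a benchmark result

print(f"current average: {visits_per_second(DAILY_VISITS):.1f} visits/s")                  # 23.1
print(f"implied ceiling: {capacity_ceiling(DAILY_VISITS, RESERVE_FACTOR):.1f} visits/s")  # 115.7
```

Each visit usually triggers several HTTP requests (pages, API calls, assets), so the request rate the stack must handle is a multiple of these numbers.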
Issues with the existing solution:
- The current server's SSDs need replacing; they have been raising warnings for some time (this is actually what prompted us to look for a DevOps engineer). We are considering two options: temporarily switch to the backup server, replace the disks, and do a clean install of a new distribution and the entire application; or buy 2-3 new servers, deploy the full architecture on them, and switch traffic over. The second option is more convenient, gives us more control and room for testing, and lets us get faster disks and processors; its downside is the extra setup fees charged for new servers. A third option is to deploy the architecture on one or two new servers, switch traffic to them, then replace the disks in the old server and keep it as a backup with MySQL replication.
- The server runs CentOS 7, which is outdated (end of life as of June 30, 2024). It may be worth upgrading to Rocky Linux or AlmaLinux, or even migrating to Debian (this question is part of the consultation).
- ISPManager remains for historical reasons and now hinders management more than it helps. A control panel like this has largely lost its purpose; configuration will be done directly in Nginx.
- Docker is used only for MySQL; all other applications were installed via yum, with all the drawbacks that entails (unexpected updates, no reproducible managed environment, each application has to be configured by hand).
- The server hosts at least two interconnected projects and one loosely related project. Most likely it makes sense to keep them on one server, but splitting them across several servers is an option if that brings clear advantages.
- Current setup: one main server and one auxiliary server in another data center. In case of a failure, switching to the backup is done manually through a panel we built, which changes the IP addresses in DNS via the Cloudflare API. The auxiliary server runs as a replica. ClickHouse is not replicated: when the project architecture was created, replication required installing ZooKeeper as well, which is very finicky, so the simpler solution was a self-contained log instance on the backup server that is written to in parallel. We need to decide whether to keep this status quo or deploy a three-node ClickHouse cluster coordinated by ClickHouse Keeper.
- There is no adequate server monitoring (CPU load, disk usage, number of hung connections, request processing time). For monitoring we want open-source solutions rather than SaaS subscriptions, which would significantly increase server maintenance costs.
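The three-node ClickHouse Keeper option (two nodes in data center #1, one in data center #2) has a property worth checking before choosing it. ClickHouse Keeper uses the Raft consensus protocol, so it keeps working only while a majority of its nodes can reach each other. The sketch below, with hypothetical node names, counts which single data center outage such a layout survives.

```python
from collections import Counter

def quorum(total_nodes: int) -> int:
    """Smallest majority of a Raft ensemble such as ClickHouse Keeper."""
    return total_nodes // 2 + 1

def survives_dc_loss(placement: dict[str, str]) -> dict[str, bool]:
    """For each data center, report whether the nodes left after losing it still form a quorum.

    `placement` maps a node name to the data center it runs in.
    """
    per_dc = Counter(placement.values())
    total = len(placement)
    needed = quorum(total)
    return {dc: total - count >= needed for dc, count in per_dc.items()}

# Hypothetical node names; the 2 + 1 split is the layout proposed in the text.
layout = {"keeper1": "dc1", "keeper2": "dc1", "keeper3": "dc2"}
print(quorum(len(layout)))        # 2
print(survives_dc_loss(layout))   # {'dc1': False, 'dc2': True}
```

Losing data center #1 leaves one node out of three, below the quorum of two, so ReplicatedMergeTree tables become read-only until Keeper recovers; losing data center #2 is tolerated. With nodes in only two locations, one location always holds the majority, so surviving the loss of either one requires a Keeper node in a third location.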
Tasks
- Consultation: describe the current project, its issues, and possible solutions.
- Design the server architecture. Decide on the MySQL model (currently one-way replication) and the ClickHouse model (a cluster with two servers in data center #1 and one in data center #2, or the current model of two independent servers with parallel writes).
- Create Docker Compose files. A Docker image for PHP-FPM already exists, so parts of it may be reusable. We need to decide between a single Nginx serving all three projects (with a separate config folder per project, each holding that project's domains) and a front Nginx proxying requests to per-project Nginx containers (which would have to be checked for any added latency). Required images: Nginx, MySQL, ClickHouse, PHP-FPM, Redis. Long-running processes are currently managed with systemctl (systemd); should we consider supervisord or another approach?
- Develop a way to automate adding new domains (including issuing Let's Encrypt certificates for them and adding them to the Nginx configs automatically).
- Consultation on MySQL: should each project have its own MySQL instance, or should they share one? The business data in MySQL is relatively small, but there are several large log tables. Advise whether the log tables should move to a separate database or even a separate MySQL instance with its own settings. This could speed up building a new replica: currently that requires a full backup of tens of gigabytes, which then takes several hours to restore, and that delay could be critical.
- Decide on the number of servers, their specifications, and their locations. We may need to buy 1, 2, or 3 additional servers. We will handle the purchasing ourselves; the task is to find the best balance between price and performance.
- Tune the server for the fastest possible response times: kernel network parameters for the application's workload, scheduler settings, CPU and memory operating modes, and possibly the RAID configuration if the default setup is suboptimal.
- Prepare a solution for automatic failover to the backup server (part of it will be implemented together with the development team, but help with setting up replication and advice on the best approach may be needed).
- Tune the MySQL configuration for optimal performance on the target hardware.
- We currently use a Deployer script that, on request, fetches the master branch and deploys it with rollback support, restarting all necessary processes. Review this setup and explore whether a better approach would suit the development team.
- Discuss setting up a test environment where managers and testers can try out new features.
- Install a monitoring system (discuss existing solutions, preferably without aggressive monetization, ideally self-hosted rather than SaaS, such as Grafana + Prometheus, Zabbix, or others). It should give an overview of all servers (load, traffic volume, average response times for specific endpoints) and send notifications in critical situations.
- A backup solution exists, but it may need to be adapted to the new configuration, along with scripts for restoring data from backups faster.
- Document all of the above thoroughly (especially the server tuning details and the procedure for deploying a new application instance), so that the technical lead, or another sufficiently competent person if the system administrator is unavailable, can reproduce every step when needed. In other words, the system should not be a black box of the "someone set it up several years ago, nobody knows why it was done this way and not another, and it is unclear how to modify it or move it to another server" kind.
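For the task of automating new domains, one possible shape is a small script that obtains a certificate with certbot's webroot method and renders an HTTPS server block into the project's own config folder. The sketch below only builds the command and the config text. The webroot path, the /etc/nginx/projects/<project>/ layout, the PHP-FPM upstream name and the e-mail address are assumptions, and it presumes Nginx already serves /.well-known/acme-challenge/ over plain HTTP from that webroot, which the webroot method requires.

```python
from pathlib import Path

# Hypothetical locations; adjust to the layout chosen for the containers.
ACME_WEBROOT = "/var/www/letsencrypt"
NGINX_PROJECTS_DIR = Path("/etc/nginx/projects")

def certbot_command(domains: list[str], email: str) -> list[str]:
    """certbot invocation that proves control of the domains via the webroot method."""
    cmd = ["certbot", "certonly", "--webroot", "-w", ACME_WEBROOT,
           "--non-interactive", "--agree-tos", "-m", email]
    for domain in domains:
        cmd += ["-d", domain]
    return cmd

def nginx_server_block(domains: list[str], project_root: str, php_upstream: str) -> str:
    """HTTPS server block; certbot stores the certificate under the first domain's name."""
    live = f"/etc/letsencrypt/live/{domains[0]}"
    return f"""server {{
    listen 443 ssl;
    server_name {' '.join(domains)};
    ssl_certificate     {live}/fullchain.pem;
    ssl_certificate_key {live}/privkey.pem;
    root {project_root};
    index index.php;

    location / {{
        try_files $uri /index.php$is_args$args;
    }}

    location ~ \\.php$ {{
        include fastcgi_params;
        fastcgi_param SCRIPT_FILENAME $document_root$fastcgi_script_name;
        fastcgi_pass {php_upstream};
    }}
}}
"""

def config_path(project: str, domains: list[str]) -> Path:
    """One file per primary domain inside the project's config folder (assumed layout)."""
    return NGINX_PROJECTS_DIR / project / f"{domains[0]}.conf"

if __name__ == "__main__":
    domains = ["example.com", "www.example.com"]
    print(" ".join(certbot_command(domains, "admin@example.com")))
    print(config_path("project-a", domains))
    print(nginx_server_block(domains, "/var/www/project-a/public", "php-fpm-a:9000"))
```

A wrapper would run the certbot command, write the text to config_path(...), and reload Nginx only if `nginx -t` passes, so a mistake in one project's domain cannot take the other projects down. Renewals are separate: `certbot renew` run on a schedule, with a `--deploy-hook` that reloads Nginx.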
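For the kernel tuning task, network parameters are ordinary sysctl keys, each readable as a file under /proc/sys. A minimal sketch that compares current values with candidate ones and renders a drop-in file for /etc/sysctl.d/ (loaded with `sysctl --system`). The values are hypothetical starting points to benchmark against, not recommendations, and the file name in the comment is also an assumption.

```python
from pathlib import Path
from typing import Optional

# Hypothetical starting values to benchmark, not tuned recommendations.
CANDIDATES = {
    "net.core.somaxconn": "4096",
    "net.ipv4.tcp_max_syn_backlog": "8192",
    "net.core.netdev_max_backlog": "16384",
    "net.ipv4.ip_local_port_range": "10240 65535",
    "net.ipv4.tcp_tw_reuse": "1",
}

def proc_path(key: str) -> Path:
    """sysctl keys map to files under /proc/sys with dots replaced by slashes."""
    return Path("/proc/sys") / key.replace(".", "/")

def current_value(key: str) -> Optional[str]:
    """Current value with whitespace normalised, or None if it cannot be read."""
    try:
        return " ".join(proc_path(key).read_text().split())
    except OSError:
        return None  # not Linux, or the key is absent in this kernel/namespace

def render_sysctl_conf(values: dict[str, str]) -> str:
    """Contents for a drop-in such as /etc/sysctl.d/90-app.conf (hypothetical name)."""
    return "".join(f"{key} = {value}\n" for key, value in sorted(values.items()))

if __name__ == "__main__":
    for key, wanted in CANDIDATES.items():
        print(f"{key}: current={current_value(key)!r} candidate={wanted!r}")
    print(render_sysctl_conf(CANDIDATES))
```

Two details matter once Docker is involved: keys such as net.core.somaxconn are set per network namespace, so containers need them through Compose's `sysctls` option, and Nginx's own `listen ... backlog=` (511 by default on Linux) must be raised too, or a larger somaxconn has no effect on it.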
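For the automatic failover task, the existing panel already switches IP addresses through the Cloudflare API, so automating it is mostly a question of when to switch. A minimal sketch, assuming a monitor that records health-check results and a scoped API token: fail over only after several consecutive failed checks, then PATCH the DNS record to the backup IP. The zone ID, record ID, token, threshold and IP address are placeholders, and the request is built but deliberately not sent.

```python
import json
import urllib.request

CF_API = "https://api.cloudflare.com/client/v4"

def should_fail_over(recent_checks: list[bool], threshold: int = 3) -> bool:
    """True only after `threshold` consecutive failed health checks (True means healthy)."""
    if len(recent_checks) < threshold:
        return False
    return not any(recent_checks[-threshold:])

def build_dns_update(zone_id: str, record_id: str, new_ip: str, token: str) -> urllib.request.Request:
    """Request that points an existing DNS record at `new_ip` (Cloudflare API v4, PATCH)."""
    body = json.dumps({"content": new_ip}).encode()
    return urllib.request.Request(
        f"{CF_API}/zones/{zone_id}/dns_records/{record_id}",
        data=body,
        method="PATCH",
        headers={"Authorization": f"Bearer {token}", "Content-Type": "application/json"},
    )

if __name__ == "__main__":
    history = [True, True, False, False, False]  # hypothetical health-check results
    request = build_dns_update("ZONE_ID", "RECORD_ID", "203.0.113.10", "API_TOKEN")
    if should_fail_over(history):
        print("would send", request.get_method(), request.full_url)
        # urllib.request.urlopen(request, timeout=10)  # uncomment to actually switch
```

Two cautions apply whatever triggers the switch. The checker should run from a third location, otherwise a network problem between it and the main data center looks like a server failure. And the MySQL replica must be promoted (replication stopped, read_only disabled) before writes reach it; if the old primary comes back and keeps accepting writes, the two databases diverge. For records proxied through Cloudflare, visitors follow the change almost immediately; unproxied records are subject to their TTL.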
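For the deployment discussion: Deployer keeps each release in its own directory under releases/ and points a current symlink at the active one, which is what makes rollback cheap. The sketch below reproduces that idea in isolation (it is not Deployer's own code), assuming release directories are named with increasing integers. The key detail is switching the symlink atomically: create a temporary link and rename it over the old one, so no request ever sees a missing current path.

```python
import os
import tempfile
from pathlib import Path

def activate_release(deploy_root: Path, release: str) -> None:
    """Point deploy_root/current at releases/<release> atomically."""
    target = deploy_root / "releases" / release
    if not target.is_dir():
        raise FileNotFoundError(target)
    tmp_link = deploy_root / "current.tmp"
    if tmp_link.is_symlink() or tmp_link.exists():
        tmp_link.unlink()  # leftover from an interrupted switch
    tmp_link.symlink_to(target)
    # rename() replaces the old link in a single step on POSIX filesystems
    os.replace(tmp_link, deploy_root / "current")

def rollback(deploy_root: Path) -> str:
    """Switch current to the release numbered just below the active one."""
    releases = sorted(
        (p.name for p in (deploy_root / "releases").iterdir() if p.is_dir()),
        key=int,  # assumption: release names are integers, so "10" sorts after "9"
    )
    active = (deploy_root / "current").resolve().name
    position = releases.index(active)
    if position == 0:
        raise RuntimeError("no earlier release to roll back to")
    previous = releases[position - 1]
    activate_release(deploy_root, previous)
    return previous

if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        for name in ("1", "2", "3"):
            (root / "releases" / name).mkdir(parents=True)
        activate_release(root, "3")
        print("active:", (root / "current").resolve().name)  # active: 3
        print("rolled back to:", rollback(root))            # rolled back to: 2
```

Whatever tool is kept, PHP-FPM's OPcache and realpath cache can keep serving the old release after the symlink changes; reloading PHP-FPM, or passing $realpath_root instead of $document_root in SCRIPT_FILENAME, avoids that.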
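For the monitoring task, "average request time on specific endpoints" can come straight from Nginx, whose $request_time variable records how long each request took. Prometheus exporters or Grafana dashboards would normally compute this continuously; the sketch below only shows what the metric is, using a hypothetical log_format named timed and sample lines invented for illustration.

```python
import re
from collections import defaultdict
from statistics import mean

# Assumed custom Nginx format (hypothetical name "timed"):
#   log_format timed '$remote_addr [$time_local] "$request" $status $request_time';
LINE_RE = re.compile(
    r'^(?P<addr>\S+) \[(?P<time>[^\]]+)\] "(?P<method>[A-Z]+) (?P<uri>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<request_time>\d+\.\d+)$'
)

def endpoint_latency(lines):
    """Average $request_time (seconds) and request count per path, query string removed."""
    samples = defaultdict(list)
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip lines written in other formats
        path = match["uri"].split("?", 1)[0]
        samples[path].append(float(match["request_time"]))
    return {path: (round(mean(times), 3), len(times)) for path, times in samples.items()}

sample = [
    '203.0.113.5 [10/Oct/2024:13:55:36 +0000] "GET /api/items?page=2 HTTP/1.1" 200 0.120',
    '203.0.113.6 [10/Oct/2024:13:55:37 +0000] "GET /api/items HTTP/1.1" 200 0.080',
    '203.0.113.7 [10/Oct/2024:13:55:38 +0000] "POST /login HTTP/1.1" 302 0.015',
]
print(endpoint_latency(sample))  # {'/api/items': (0.1, 2), '/login': (0.015, 1)}
```

Averages hide slow outliers, so alert thresholds are usually set on percentiles (p95 or p99) rather than on the mean.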
-
Hello, I can help set everything up. The price is negotiable. I have extensive DevOps experience.
-
Good day. I am ready to advise you on ways to solve your task and to carry it out to a high standard using best practices. I already have several ideas for it. We can discuss the details in private messages.
-
Hi,
I hope you are safe and in a good mood.
Thank you for such a detailed description; it looks like you have a clear vision of the infrastructure you want.
On several important points it matches my own view, which matters for successful cooperation.
However, this level of detail is enough to spark interest in learning more, but not enough to make a reasonable bid.
I offer my services:
Specialization: infrastructure based on dedicated Linux servers, containerization, security, control, and total cost of ownership.
I work as a registered sole proprietor (FOP) under a service agreement, with a non-disclosure agreement and monthly certificates of completed work:
• Maintaining and updating documentation (or its equivalent) for the server infrastructure;
• Improving the fault tolerance of the infrastructure and of the applications (and their components) running on it;
• Improving the availability of the infrastructure and of the applications (and their components) running on it;
• Eliminating the causes and consequences of incidents at the server infrastructure level;
• Integrating tools that help diagnose faults at the level of the server infrastructure and of the applications (and their components) running on it;
• Improving information security at the level of the server infrastructure and of the applications (and their components) running on it;
• Consulting on and researching new technologies;
• Integrating a backup system and periodically verifying that backups work;
• Automating processes and development stages;
• Defining and describing recurring processes, drawing up operating procedures;
• Negotiating with data center technical support;
• Operating and supporting the server infrastructure.
I also offer training for junior DevOps engineers who can stand in for me.
I currently work with several companies that remain anonymous (NDA): high-load web applications, dedicated servers.
--
Thank you for your time,
Sincerely, Влад.