Using Kubespray to deploy a k8s cluster to Terraform provisioned AWS EC2 instances

We did a cluster installation previously using kubeadm and a lot of manual work. I remember what a hassle it was, so let's do one that is a bit less hands-on, with less suffering.

Perhaps.

In short, we are going to do the following:

  • Provision some servers on AWS (3 EC2 nodes in total, 1 of them a master) via Terraform
  • Use shell scripts to run Kubespray
  • Make some minor config changes

We are going to be using scripts for most of the operations, and you can find them in the repo below if you are interested.

If you've checked the above repo, there are 2 scripts: main.sh and spray_dem_kubes.sh. main.sh runs first and prepares our AWS EC2 instances with Terraform; the second one handles Kubespray and its business. If you need info on Terraform, see here. Let's begin.

I'm using a CentOS 7 local VM as my terminal server. I cloned my repo to a folder of my choice; mine is under /opt/aws_tests.

Notice the terraformec2.pem file here. It is needed for accessing our EC2 instances once we create them, and for all other SSH operations. Access your AWS EC2 dashboard and navigate to Key Pairs. Create one with the name below, or alternatively change the .tf files under the /terra folder to use whatever key name you selected.

After creation a .pem file will be downloaded; it will be central to everything that follows. Place it next to main.sh on your terminal server.
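SSH refuses to use a world-readable private key, so it is worth tightening the file's permissions right away. A small sketch (the IP in the comment is a placeholder):

```shell
# Restrict the downloaded key so ssh will accept it.
# The check lets this run harmlessly if the file isn't there yet.
PEM=terraformec2.pem
if [ -f "$PEM" ]; then
  chmod 400 "$PEM"
fi

# Later, once an instance is up, connections look like:
# ssh -i terraformec2.pem ec2-user@<instance_public_ip>
```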

Since we will be using Terraform to provision our instances, we need to create an account that is allowed to do so. Navigate to the IAM screen in AWS.

Create a user, then take its access key ID and secret access key and put them into a secret.tfvars file under the terra folder, with the variable names seen below.
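For reference, the secret.tfvars file might look like the sketch below. The values are placeholders, and the file should never be committed to version control:

```hcl
# terra/secret.tfvars -- placeholder values for the IAM user created above
access_key = "AKIAXXXXXXXXXXXXXXXX"
secret_key = "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx"
```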

That's about it for the manual bits.

Now let's talk about main.sh, below.

Skipping the help bits: the script first installs Terraform if it isn't present, checks that the .pem file exists, then inits, plans and applies the .tf config files, and finally reads the public_ip details as outputs to be used as inputs for the Kubespray script.
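For illustration, the Terraform portion of that flow boils down to something like the function below. This is a sketch, not the repo's actual script; it assumes terraform is already on the PATH and secret.tfvars is filled in:

```shell
provision() {
  # Refuse to run without the key pair file next to main.sh
  if [ ! -f terraformec2.pem ]; then
    echo "terraformec2.pem is missing" >&2
    return 1
  fi
  cd terra || return 1
  # Download providers, preview the changes, then apply the saved plan
  terraform init
  terraform plan -var-file=secret.tfvars -out=tfplan
  terraform apply tfplan
}
```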

Now let's see the files in the terra folder.

First, the standard_aws.tf file. It grabs the required provider during init and feeds the access_key and secret_key variables to the aws provider. My selected region is eu-central-1; you can choose whatever region you want.

### **************STANDARD AWS START****************** ###
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "3.41.0"
    }
  }
}
variable "access_key" {
  description = "The access_key of our user defined in IAM"
  type        = string
  sensitive   = true
}
variable "secret_key" {
  description = "The secret_key of our user defined in IAM"
  type        = string
  sensitive   = true
}
provider "aws" {
  region     = "eu-central-1"
  access_key = var.access_key
  secret_key = var.secret_key
}
### **************STANDARD AWS END****************** ###

Next we have vars.tf, with variables for use with the data sources. Since we will be using an Amazon Linux image, the default filter values are set to match it.

variable "datasource_ami_owner" {
  default = "amazon"
  type    = string
}
variable "datasource_ami_name_filter" {
  default = "amzn2-ami-hvm*"
  type    = string
}

datasources.tf is shown below. The aws_ami.base_image data source grabs the latest AMI ID satisfying the filter we set.

data "aws_ami" "base_image" {
  most_recent = true
  owners      = [var.datasource_ami_owner]
  filter {
    name   = "name"
    values = [var.datasource_ami_name_filter]
  }
}

sec_group.tf creates a custom security group with ingress and egress open to everyone. You can tighten these settings to create more controlled access.

resource "aws_security_group" "sec_group_block" {
  name = "custom_sec_group"
  ingress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }
  egress {
    from_port        = 0
    to_port          = 0
    protocol         = "-1"
    cidr_blocks      = ["0.0.0.0/0"]
    ipv6_cidr_blocks = ["::/0"]
  }
  tags = {
    Name = "custom_sec_group"
  }
}

Next, we have ec2.tf. It declares 3 EC2 instances, pulling the AMI ID from the data source we mentioned before, the security group ID from the sec_group resource we created, and key_name from the key pair we manually created earlier.

And then, to reach these instances from a static IP, we create an EIP for each and associate it with its instance using the aws_eip_association resource. Finally, to be able to use the public IPs of the generated instances from outside, we declare an output for each one.

resource "aws_instance" "test_cluster_m1" {
  ami                    = data.aws_ami.base_image.id
  instance_type          = "t2.small"
  key_name               = "terraformec2"
  vpc_security_group_ids = [aws_security_group.sec_group_block.id]
  tags = {
    Name = "test_cluster_m1"
  }
}
resource "aws_instance" "test_cluster_w1" {
  ami                    = data.aws_ami.base_image.id
  instance_type          = "t2.small"
  key_name               = "terraformec2"
  vpc_security_group_ids = [aws_security_group.sec_group_block.id]
  tags = {
    Name = "test_cluster_w1"
  }
}
resource "aws_instance" "test_cluster_w2" {
  ami                    = data.aws_ami.base_image.id
  instance_type          = "t2.small"
  key_name               = "terraformec2"
  vpc_security_group_ids = [aws_security_group.sec_group_block.id]
  tags = {
    Name = "test_cluster_w2"
  }
}
## eip
resource "aws_eip" "eip_m1" {
  vpc = true
}
resource "aws_eip" "eip_w1" {
  vpc = true
}
resource "aws_eip" "eip_w2" {
  vpc = true
}
resource "aws_eip_association" "eip_assoc_m1" {
  instance_id   = aws_instance.test_cluster_m1.id
  allocation_id = aws_eip.eip_m1.id
}
resource "aws_eip_association" "eip_assoc_w1" {
  instance_id   = aws_instance.test_cluster_w1.id
  allocation_id = aws_eip.eip_w1.id
}
resource "aws_eip_association" "eip_assoc_w2" {
  instance_id   = aws_instance.test_cluster_w2.id
  allocation_id = aws_eip.eip_w2.id
}
###########
output "instance_test_cluster_w1_ip" {
  value = aws_instance.test_cluster_w1.public_ip
}
output "instance_test_cluster_w2_ip" {
  value = aws_instance.test_cluster_w2.public_ip
}
output "instance_test_cluster_m1_ip" {
  value = aws_instance.test_cluster_m1.public_ip
}

After a successful run of main.sh, at the very end we read the outputs from the state file via the terraform output… command and use them as inputs for the Kubespray script.

echo " "
echo "Run kubespray setup"
echo "**************************************"
# -raw (terraform >= 0.14) strips the quotes terraform would otherwise
# wrap around each value
m1_ip="$(terraform output -raw instance_test_cluster_m1_ip)"
w1_ip="$(terraform output -raw instance_test_cluster_w1_ip)"
w2_ip="$(terraform output -raw instance_test_cluster_w2_ip)"
cd .. && echo "moving to $(pwd)"
sh ./spray_dem_kubes.sh "${m1_ip}" \
  "${w1_ip}" \
  "${w2_ip}"

Moving on to the Kubespray script: it takes 3 arguments as inputs, namely the master, worker1 and worker2 IPs. We check for the .pem file again, then download Kubespray from its repo, which creates a kubespray folder. Moving into the new folder, we use pip to install all the necessary requirements.

Afterwards we copy the sample inventory under /kubespray/inventory/sample to another folder in the same directory, then use the bundled inventory builder to create the initial config files.
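Those two steps look roughly like the function below, adapted from the Kubespray README. It assumes we are inside the kubespray checkout, cluster_name is set earlier in the script, and the three IPs arrive as arguments:

```shell
build_inventory() {
  # $1..$3: master, worker1 and worker2 public IPs
  cp -rfp inventory/sample "inventory/${cluster_name}"
  # The inventory builder turns a list of IPs into a starter hosts.yaml
  CONFIG_FILE="inventory/${cluster_name}/hosts.yaml" \
    python3 contrib/inventory_builder/inventory.py "$@"
}
```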

Now we have to modify the following files as per our needs; the most important ones, in my opinion, are inventory.ini and hosts.yaml:

vi inventory/${cluster_name}/group_vars/all/all.yml \
&& vi inventory/${cluster_name}/group_vars/k8s_cluster/k8s-cluster.yml \
&& vi inventory/${cluster_name}/inventory.ini \
&& vi inventory/${cluster_name}/hosts.yaml

Our hosts.yaml file describes the basic shape of our cluster: which nodes it consists of, which will be the master (kube_control_plane), which will host etcd, and which will run as worker nodes (we can make the master run as a worker as well).
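An illustrative hosts.yaml for this 3-node layout might look like the following; the node names and group layout follow Kubespray's generated format, and the IPs are placeholders:

```yaml
all:
  hosts:
    node1:
      ansible_host: <m1_public_ip>
      ip: <m1_private_ip>
      access_ip: <m1_public_ip>
    node2:
      ansible_host: <w1_public_ip>
      ip: <w1_private_ip>
      access_ip: <w1_public_ip>
    node3:
      ansible_host: <w2_public_ip>
      ip: <w2_private_ip>
      access_ip: <w2_public_ip>
  children:
    kube_control_plane:
      hosts:
        node1:
    kube_node:
      hosts:
        node1:
        node2:
        node3:
    etcd:
      hosts:
        node1:
    k8s_cluster:
      children:
        kube_control_plane:
        kube_node:
```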

And then we have the inventory.ini file, where we do essentially the same thing for some reason.

Finally, we start the playbook with the path to our hosts.yaml, the .pem file for access to our target hosts, --become to run as root, and --user=ec2-user to log in as that user (normally you wouldn't need it).

ansible-playbook -i inventory/${cluster_name}/hosts.yaml --private-key=terraformec2.pem --become --become-user=root --user=ec2-user cluster.yml

At the end, if all goes well, our cluster will be up and running.

That's about it, thanks for reading. I tried my best to keep it as hands-free as possible. There are lots of things still to learn; feel free to share your thoughts.

Problems

etcd may occasionally cause a problem if your nodes have different public and private IPs. Hop onto the master node and check the error. We can do so with:

systemctl status etcd
# and then if needed
journalctl -xe

Checking the logs, we can see etcd somehow picked up the wrong IPs everywhere. Check the etcd.env file on your master and fix it to match.

Jun 06 13:37:41 node1 etcd[24785]: 2021-06-06 13:37:41.712220 C | etcdmain: --initial-advertise-peer-urls has https://18.195.248.104:2380 but missing from --initial-cluster=etcd1=https://172.31.39.183:2380,
Jun 06 13:37:41 node1 dockerd[7744]: time="2021-06-06T13:37:41.735905556Z" level=info msg="ignoring event" container=37eb7499bc5c1a26339ced02ecf9aed01ef4084f5b8b6b625e74ab856524c746 module=libcontainerd nam
Jun 06 13:37:41 node1 containerd[6640]: time="2021-06-06T13:37:41.737190264Z" level=info msg="shim disconnected" id=37eb7499bc5c1a26339ced02ecf9aed01ef4084f5b8b6b625e74ab856524c746
Jun 06 13:37:41 node1 containerd[6640]: time="2021-06-06T13:37:41.737541582Z" level=error msg="copy shim log" error="read /proc/self/fd/17: file already closed"
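The lines to reconcile live in etcd's environment file on the master (/etc/etcd.env in a default Kubespray install). A sketch of a consistent pair, using the private IP from the log above; this is an excerpt for illustration, not a complete file:

```
# /etc/etcd.env (excerpt) -- the advertised peer URL must appear in the
# initial cluster list, so both should use the same (private) IP
ETCD_INITIAL_ADVERTISE_PEER_URLS=https://172.31.39.183:2380
ETCD_INITIAL_CLUSTER=etcd1=https://172.31.39.183:2380
```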

After your changes, restart to apply them.

systemctl restart etcd

If something goes wrong, like kubectl not being found, we can add it manually to the master.
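One way to do that manually, sketched as a function; the version is an example, so pin it to whatever your cluster runs:

```shell
install_kubectl() {
  # Example version -- match it to your cluster
  ver="v1.21.1"
  curl -LO "https://dl.k8s.io/release/${ver}/bin/linux/amd64/kubectl"
  sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
  kubectl version --client
}
```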
