Bangalore CentOS Dojo, 2014

The first CentOS Dojo in India took place in Bangalore on Saturday, 15th November 2014, at the Red Hat Bangalore office. Red Hat sponsored the event.

I was a co-organizer of the Dojo along with Dominic and Karanbir Singh. Around 90 people RSVPed for the event, but only around 40 (mostly system administrators and new users) attended.

The first talk was by Aditya Patawari on “An introduction to Docker and Project Atomic”. The talk included a demo and introduced the audience to Docker and Atomic Host. Most of the attendees had questions about Docker, as they had either used it or heard about it. There were also some questions about the differences between CoreOS and Project Atomic. The slides are available at http://www.slideshare.net/AdityaPatawari/docker-centosdojo. Overall, this talk gave a fair idea of Docker and Project Atomic.

The second talk was “Be Secure with SELinux Gyan” by Rejy M Cyriac. This session was about troubleshooting SELinux issues, with an introduction to creating custom SELinux policy modules. Rejy made the talk interesting by distributing SELinux stickers to attendees who asked interesting questions or answered his questions. Slides can be found here.

After these two talks we took a lunch break of around one hour. During the break we distributed the CentOS T-shirts and got a chance to socialize with the attendees.

The first session after lunch was “Scale out storage on CentOS using GlusterFS” by Raghavendra Talur. The talk introduced the audience to GlusterFS and its important high-level concepts, and included a demo using packages from the CentOS Storage SIG. Slides can be found at slideshare.

The next session was “Network Debugging” by Jijesh Kalliyat. This talk covered almost all the basic concepts and the network diagnostic tools required to troubleshoot a network issue. It also included a demo of using Wireshark and tcpdump to debug network issues. Slides are available here.

Before the next talk, we took a short break and clicked some group pictures of everyone present at the Dojo.

The last session was “Systemd on CentOS” by Saifi Khan. The talk covered a lot of areas, e.g. a comparison between SysVinit and systemd, concurrency at scale, how systemd is more scalable than other available init systems, some similarity of design principles with CoreOS, and how it is better suited to Linux container technology. Saifi also talked about how systemd has saved his system from becoming unusable. His liking for systemd was quite evident from the talk and his enthusiasm.

Overall, it was an awesome experience participating in the Dojo, as it covered a wide variety of topics that are important for deploying CentOS for various purposes.

Bangalore Dojo link: http://wiki.centos.org/Events/Dojo/Bangalore2014

Group photo from the Bangalore Dojo, 2014. You can see happy faces there 🙂

GlusterFS VFS plugin for Samba

Here are the topics this blog is going to cover.

  • Samba Server
  • Samba VFS
  • Libgfapi
  • GlusterFS VFS plugin for Samba and libgfapi
  • Without GlusterFS VFS plugin
  • FUSE mount vs VFS plugin

About Samba Server:

The Samba server runs on Unix and Linux/GNU operating systems. Windows clients can talk to Linux/GNU/Unix systems through the Samba server, which provides interoperability between Windows and Linux/Unix systems. It was initially created to provide printer sharing and file sharing between Unix/Linux and Windows. Today the Samba project does much more than just file and printer sharing.

The Samba server works as a semantic translation engine. Windows clients talk in Windows syntax, i.e. the SMB protocol, while Unix/Linux/GNU file systems understand POSIX requests. Samba converts Windows syntax to *nix/GNU syntax and vice versa.

This article is about Samba integration with GlusterFS. For specific details I have taken the example of GlusterFS deployed on Linux/GNU.

If you have never heard of the Samba project before, you should read more about it before going further into this post.

Here are some important links/pointers for further study:

  1. what is Samba?
  2. Samba introduction

Samba VFS:

Samba code is very modular in nature. The Samba VFS code is divided into two parts, i.e. the Samba VFS layer and the VFS modules.

The purpose of the Samba VFS layer is to act as an interface between the Samba server and the layers below it. When the Samba server gets requests from Windows clients over the SMB protocol, it passes them to the Samba VFS modules.

A Samba VFS module, i.e. a plugin, is a shared library (.so) that implements some or all of the functions which the Samba VFS layer, i.e. the interface, makes available. Samba VFS modules can be stacked on each other (if they are designed to be stacked).

For more about the Samba VFS layer, please refer to http://unix4.com/w/writing-a-samba-vfs-richard-sharpe-2-oct-2011-e6438-pdf.pdf

The Samba VFS layer passes each request to the VFS modules. If the Samba share is backed by a native Linux/Unix file system, the call goes to the default VFS module, which forwards it to the system layer, i.e. the operating system. For a user space file system like GlusterFS, the VFS layer calls are implemented through a VFS module, i.e. the VFS plugin for GlusterFS. The plugin redirects the requests (i.e. fops) to the GlusterFS APIs, i.e. libgfapi. It implements or maps all VFS layer calls using libgfapi.

libgfapi:

libgfapi (i.e. the GlusterFS API) is a set of APIs which can talk directly to GlusterFS. Libgfapi is another access method for GlusterFS, like NFS, SMB and FUSE. Libgfapi bindings are available for C, Python, Go and more programming languages. Applications can be developed which use GlusterFS directly, without a GlusterFS volume mount.

GlusterFS VFS plugin for Samba and libgfapi:

Here is the schematic diagram of how communication works between different layers.

gluster-samba-vfs-plugin

Samba Server:  This represents Samba Server and Samba VFS layer

VFS plugin for GlusterFS: This implements or maps relevant VFS layer fops to libgfapi calls.

glusterd: Management daemon of a GlusterFS node, i.e. server.

glusterfsd: Brick process of a GlusterFS node, i.e. server.

Client requests come to the Samba server, and the Samba server redirects the calls to the GlusterFS VFS plugin through the Samba VFS layer. The VFS plugin calls the relevant libgfapi functions. Libgfapi acts as a client: it contacts glusterd for the vol file information (i.e. information about the gluster volume, translators and involved nodes), then forwards the requests to the appropriate glusterfsd, i.e. brick processes, where the requests actually get serviced.

If you want to know the specifics of the setup for sharing a GlusterFS volume through the Samba VFS plugin, refer to the link below.

https://lalatendumohanty.wordpress.com/2014/02/11/using-glusterfs-with-samba-and-samba-vfs-plugin-for-glusterfs-on-fedora-20/

Without GlusterFS VFS plugin: 

Without the GlusterFS VFS plugin, we can still share a GlusterFS volume through the Samba server. This can be done through the native glusterfs mount, i.e. FUSE (file system in user space). We need to mount the volume using FUSE, i.e. the glusterfs native mount, on the same machine where the Samba server is running, then share the mount point using the Samba server. As we are not using the VFS plugin for GlusterFS here, Samba will treat the mounted GlusterFS volume as a native file system. The default VFS module will be used and the file system calls will be sent to the operating system. The flow is the same as for any native file system shared through Samba.
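
The FUSE-based approach described above could look roughly like the sketch below. The mount point /mnt/glusterfs, the share name testvol-fuse and the helper render_fuse_share are my own illustrative choices, not part of any tested setup. The helper only prints an smb.conf stanza. The commands that need root are left as comments.

```shell
#!/bin/sh
# Sketch: share a FUSE-mounted GlusterFS volume through Samba without the VFS plugin.
# Assumptions: the mount point, the share name and this helper are illustrative.

# Print an smb.conf stanza for a share backed by a plain directory.
# There is no "vfs objects = glusterfs" line, so Samba uses its default VFS module.
render_fuse_share() {
    share_name=$1
    mount_point=$2
    printf '[%s]\n' "$share_name"
    printf 'comment = GlusterFS volume shared through a FUSE mount\n'
    printf 'path = %s\n' "$mount_point"
    printf 'read only = No\n'
    printf 'guest ok = Yes\n'
}

# Typical usage, as root on the Samba server (left commented out):
#   mkdir -p /mnt/glusterfs
#   mount -t glusterfs Server1_IP:/testvol /mnt/glusterfs
#   render_fuse_share testvol-fuse /mnt/glusterfs >> /etc/samba/smb.conf
#   systemctl restart smb
```

Note that “path” here is a real directory on the Samba server, whereas with the VFS plugin “path” is relative to the root of the Gluster volume.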

FUSE mount vs VFS plugin:

If you are not familiar with file systems in user space,  please read about FUSE i.e. file system in user space.

For FUSE mounts, file system fops from the Samba server go to the user space FUSE mount point -> kernel VFS -> /dev/fuse -> GlusterFS and come back along the same path. Refer to the diagrams below for details. Consider the Samba server as an application which runs on the FUSE mount point.

Fuse_Mount

Fuse mount architecture for GlusterFS

You can observe the context switches that happen between user space and kernel space in the above architecture. This is going to be a key differentiating factor when compared with the libgfapi based VFS plugin.

For the Samba VFS plugin implementation, see the diagram below. With the plugin, Samba calls get converted to libgfapi calls and libgfapi forwards the requests to GlusterFS.

libgfapi

Libgfapi architecture for GlusterFS

The above pictures are copied from this presentation:

Advantages of libgfapi based Samba plugin vs FUSE mount:

  • With libgfapi, there are no kernel VFS layer context switches. This results in performance benefits compared to a FUSE mount.
  • With a separate Samba VFS module, i.e. plugin, features which native Linux file systems do not support (e.g. more NTFS functionality) can be provided in GlusterFS and supported through Samba.

Using GlusterFS With GlusterFS Samba vfs plugin on Fedora

This blog covers the steps and implementation details to use GlusterFS Samba VFS plugin.

Please refer to the link below if you are looking for architectural information about the GlusterFS Samba VFS plugin and the difference between a FUSE mount and the Samba VFS plugin.

https://lalatendumohanty.wordpress.com/2014/04/20/glusterfs-vfs-plugin-for-samba/

I have set up a two-node GlusterFS cluster with Fedora 20 (minimal install) VMs. Each VM has 3 separate XFS partitions of 100GB each.
One of the Gluster nodes is used as the Samba server in this setup.

I originally tested this with Fedora 20, but this example should work fine with later Fedora releases, i.e. F21 and F22.

GlusterFS Version: glusterfs-3.4.2-1.fc20.x86_64

Samba version:  samba-4.1.3-2.fc20.x86_64

After installation, the output of the “df -h” command looked like below on the VMs
$df -h
Filesystem                            Size  Used Avail Use% Mounted on
/dev/mapper/fedora_dhcp159--242-root   50G  2.2G   45G   5% /
devtmpfs                              2.0G     0  2.0G   0% /dev
tmpfs                                 2.0G     0  2.0G   0% /dev/shm
tmpfs                                 2.0G  432K  2.0G   1% /run
tmpfs                                 2.0G     0  2.0G   0% /sys/fs/cgroup
tmpfs                                 2.0G     0  2.0G   0% /tmp
/dev/vda1                             477M  103M  345M  23% /boot
/dev/mapper/fedora_dhcp159--242-home   45G   52M   43G   1% /home
/dev/mapper/gluster_vg1-gluster_lv1   100G  539M  100G   1% /gluster/brick1
/dev/mapper/gluster_vg2-gluster_lv2   100G  406M  100G   1% /gluster/brick2
/dev/mapper/gluster_vg3-gluster_lv3   100G   33M  100G   1% /gluster/brick3

You can use the following commands to create the XFS partitions
1. pvcreate /dev/vdb
2. vgcreate VG_NAME /dev/vdb
3. lvcreate -n LV_NAME -l 100%PVS VG_NAME /dev/vdb
4. mkfs.xfs -i size=512 LV_PATH
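
As a rough sketch, the four steps can be wrapped in a small helper that prints the commands for one disk, so that they can be reviewed before anything is run. The helper name brick_commands and the final mkdir/mount steps are my additions. The volume group, logical volume and mount point names are taken from the df output above.

```shell
#!/bin/sh
# Sketch: print (not run) the commands that turn one disk into a mounted XFS brick.
# Assumptions: this helper and the mkdir/mount steps are illustrative additions.

brick_commands() {
    device=$1
    vg=$2
    lv=$3
    mount_point=$4
    lv_path="/dev/$vg/$lv"    # LVM exposes a logical volume as /dev/VG/LV
    echo "pvcreate $device"
    echo "vgcreate $vg $device"
    echo "lvcreate -n $lv -l 100%PVS $vg $device"
    echo "mkfs.xfs -i size=512 $lv_path"
    echo "mkdir -p $mount_point"
    echo "mount $lv_path $mount_point"
}

# Review the output first; to execute it as root, pipe it to a shell:
#   brick_commands /dev/vdb gluster_vg1 gluster_lv1 /gluster/brick1 | sh -e
```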

The following steps need to be performed, and packages installed, on each node (Fedora 20 in my case)

#Change SELinux to either “permissive” or “disabled” mode

# To put SELinux in permissive mode
$setenforce 0

#To see the current mode of SELinux

$getenforce

SELinux policy rules for Gluster are present in recent Fedora releases, e.g. F21, F22 or later, so SELinux should work fine with Gluster there.

#Remove all iptables rules, so that they do not interfere with Gluster

$iptables -F

yum install glusterfs-server
yum install samba-vfs-glusterfs
yum install samba-client

#samba-vfs-glusterfs RPMs for CentOS, RHEL, Fedora19/18 are available at http://download.gluster.org/pub/gluster/glusterfs/samba/

#To start glusterd and auto start it after boot
$systemctl start glusterd
$systemctl enable glusterd
$systemctl status glusterd

#To start smb and auto start it after boot
$systemctl start smb
$systemctl enable smb
$systemctl status smb

#Create gluster volume and start it. (Run the below commands from Server1_IP)

$gluster peer probe Server2_IP
$gluster peer status
Number of Peers: 1

Hostname: Server2_IP
Port: 24007
Uuid: aa6f71d9-0dfe-4261-a2cd-5f281632aaeb
State: Peer in Cluster (Connected)
$gluster v create testvol Server2_IP:/gluster/brick1/testvol-b1 Server1_IP:/gluster/brick1/testvol-b2
$gluster v start testvol
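
Before creating the volume from a script, it can help to confirm that the peer really is connected. The sketch below counts connected peers in the “gluster peer status” output. The helper count_connected_peers is hypothetical, and it assumes the output format matches the sample shown above.

```shell
#!/bin/sh
# Sketch: count peers that "gluster peer status" reports as connected.
# Assumption: the output format matches the sample shown in this post.

count_connected_peers() {
    # Reads "gluster peer status" output on stdin and prints the number of
    # "State:" lines that say the peer is in the cluster and connected.
    grep -c '^State: Peer in Cluster (Connected)' || true
}

# Usage on a node (left commented out, needs a running glusterd):
#   connected=$(gluster peer status | count_connected_peers)
#   [ "$connected" -ge 1 ] || echo "no connected peers yet" >&2
```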

#Modify smb.conf for Samba share

$vi /etc/samba/smb.conf

#
[testvol]
comment = For samba share of volume testvol
path = /
read only = No
guest ok = Yes
kernel share modes = No
vfs objects = glusterfs
glusterfs:loglevel = 7
glusterfs:logfile = /var/log/samba/glusterfs-testvol.log
glusterfs:volume = testvol

#For debug logs you can change the log levels to 10 e.g: “glusterfs:loglevel = 10”

# Do not miss “kernel share modes = No”, else you won’t be able to write anything into the share

#verify that your changes are correctly understood by Samba
$testparm -s
Load smb config files from /etc/samba/smb.conf
rlimit_max: increasing rlimit_max (1024) to minimum Windows limit (16384)
Processing section "[homes]"
Processing section "[printers]"
Processing section "[testvol]"
Loaded services file OK.
Server role: ROLE_STANDALONE
[global]
workgroup = MYGROUP
server string = Samba Server Version %v
log file = /var/log/samba/log.%m
max log size = 50
idmap config * : backend = tdb
cups options = raw

[homes]
comment = Home Directories
read only = No
browseable = No

[printers]
comment = All Printers
path = /var/spool/samba
printable = Yes
print ok = Yes
browseable = No

[testvol]
comment = For samba share of volume testvol
path = /
read only = No
guest ok = Yes
kernel share modes = No
vfs objects = glusterfs
glusterfs:loglevel = 10
glusterfs:logfile = /var/log/samba/glusterfs-testvol.log
glusterfs:volume = testvol

#Restart the Samba service. This is not a compulsory step, as Samba picks up the latest smb.conf for new connections. But to make sure it uses the latest smb.conf, restart the service.
$systemctl  restart smb

#Set smbpasswd for root. This will be used for mounting the volume/Samba share on the client
$smbpasswd -a root

#Mount the cifs share using following command and it is ready for use 🙂
mount -t cifs -o username=root,password=<smbpassword> //Server1_IP/testvol /mnt/cifs
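
Putting the password on the command line leaves it in the shell history and the process list. One possible alternative, sketched below, is the credentials= option of mount.cifs, which reads the user name and password from a file. The helper write_cifs_credentials and the file path /root/.smb-testvol are my own illustrative choices.

```shell
#!/bin/sh
# Sketch: keep the Samba password out of the mount command line.
# Assumptions: this helper and the credentials file path are illustrative.

write_cifs_credentials() {
    cred_file=$1
    user=$2
    password=$3
    # Write the "key=value" lines that mount.cifs reads from a credentials=
    # file, then make sure only the owner can read or write the file.
    ( umask 077; printf 'username=%s\npassword=%s\n' "$user" "$password" > "$cred_file" ) &&
        chmod 600 "$cred_file"
}

# Usage as root on the client (left commented out):
#   write_cifs_credentials /root/.smb-testvol root 'your-smb-password'
#   mount -t cifs -o credentials=/root/.smb-testvol //Server1_IP/testvol /mnt/cifs
```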

GlusterFS volume tuning for volume shared through Samba:

  • The Gluster volume needs the server.allow-insecure option set: “gluster volume set volname server.allow-insecure on”
  • In /etc/glusterfs/glusterd.vol on each Gluster node,
    add “option rpc-auth-allow-insecure on”
  • Restart glusterd on each node.
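
The glusterd.vol edit can be scripted. The sketch below is one possible way to do it, assuming the file closes its “volume management” block with a line starting with “end-volume”, as the stock file does. The helper add_allow_insecure is hypothetical. It works as a filter, so the original file is only replaced once you redirect the output yourself.

```shell
#!/bin/sh
# Sketch: add "option rpc-auth-allow-insecure on" to a glusterd.vol file.
# Assumption: the block is closed by a line starting with "end-volume".

add_allow_insecure() {
    # Reads glusterd.vol on stdin and writes the updated file to stdout.
    # If the option is already set, the input is passed through unchanged.
    awk '
        /^[ \t]*option[ \t]+rpc-auth-allow-insecure/ { seen = 1 }
        /^end-volume/ && !seen { print "    option rpc-auth-allow-insecure on" }
        { print }
    '
}

# Usage as root on each node (left commented out):
#   cp /etc/glusterfs/glusterd.vol /etc/glusterfs/glusterd.vol.bak
#   add_allow_insecure < /etc/glusterfs/glusterd.vol.bak > /etc/glusterfs/glusterd.vol
#   systemctl restart glusterd
#   gluster volume set testvol server.allow-insecure on
```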

For setups where Samba server and Gluster nodes need to be on different machines:

# put “glusterfs:volfile_server = <server name/ip>” in the smb.conf settings for the specific  volume

e.g:

[testvol]
comment = For samba share of volume testvol
path = /
read only = No
guest ok = Yes
kernel share modes = No
vfs objects = glusterfs
glusterfs:loglevel = 7
glusterfs:logfile = /var/log/samba/glusterfs-testvol.log

glusterfs:volfile_server = <server name/ip>
glusterfs:volume = testvol

#Here are the packages that were installed on the nodes

rpm -qa | grep gluster
glusterfs-libs-3.4.2-1.fc20.x86_64
glusterfs-api-3.4.2-1.fc20.x86_64
glusterfs-3.4.2-1.fc20.x86_64
glusterfs-cli-3.4.2-1.fc20.x86_64
glusterfs-server-3.4.2-1.fc20.x86_64
samba-vfs-glusterfs-4.1.3-2.fc20.x86_64
glusterfs-devel-3.4.2-1.fc20.x86_64
glusterfs-fuse-3.4.2-1.fc20.x86_64
glusterfs-api-devel-3.4.2-1.fc20.x86_64

[root@dhcp159-242 ~]# rpm -qa | grep samba
samba-client-4.1.3-2.fc20.x86_64
samba-4.1.3-2.fc20.x86_64
samba-vfs-glusterfs-4.1.3-2.fc20.x86_64
samba-libs-4.1.3-2.fc20.x86_64
samba-common-4.1.3-2.fc20.x86_64

Note: The same smb.conf entries should work with CentOS6 too.

Is the open source/community model the better way?

After reading my previous blogs (blog1 and blog2), you might be wondering whether the open source/free software/community development model helps to create better software. I am going to shed some light on that in this post.

Before going further, I want to talk about the community development model. In a community development model anybody can participate in the software development, irrespective of race, religion, nationality, gender, educational qualification and social status. Anybody who uses the software, develops it, fixes bugs, creates documentation, maintains the infrastructure for the project or otherwise contributes to its success is part of the community. In a community project, the community decides the road map for the project. This actually changes the nature of the project; we will discuss how later in this post. However, for the community, i.e. everybody, to participate in the project, the source code must be made available to them. This is why source code availability is a bare necessity: source code access, i.e. open source, is a precondition for the community development model. Without access to the source code, we can’t follow a community development model.

I am not sure whether all of you know how software is developed in a company. If you do, you can skip the paragraph below. Otherwise, let’s first discuss how a software product typically gets developed in a proprietary company. Then we will compare it with community development.

A company sells software to solve a problem or a set of problems, or to offer a better solution than an existing one. Before selling the product, they have to develop it. As part of the development process, they hire people to do market research: what competing products are available for the problem they are trying to solve, and what should their own approach to the problem be? Then they hire software engineers and put them in an R&D lab or “a development lab” to create it. These engineers are responsible for writing and testing the code for the product. They are not allowed to share information about the product or its code with the outside world. When development is done and the product is ready, the company starts selling it to its customers. After the first version of the software, there might be requirements for new features and improvements to go into subsequent versions, to make the product better or more competitive with similar products, so that the company can make more profit selling it.

However, an open source/community project usually gets started on the initiative of an individual or a group of people, to solve a problem for themselves. They make the source code available to others in the belief that it will be helpful to them too. If others find it useful, they use it. When people use software, they might find issues with it. They report the issues to the developer group or fix them themselves. Some of them add new features according to their needs. As gratitude for the initial help they received in the form of the software, they merge the new code/features into the original software and make it available for others to use. Gradually a community is formed. The people with the most interest in and knowledge of the project take up the role of maintaining it. A maintainer is essentially a project leader whose responsibility is to oversee the project’s growth, to coordinate among community members, and to understand the community’s expectations of the project, among lots of other things. As the project grows, community members decide in a democratic way which features need to go into the software, which hardware they want to run it on, and what the future road map should be. This leads to the development of the features people need most, i.e. those which solve their problems, not some fancy feature which some company executive thought would be useful for them. It also leads to better support for a wide range of hardware, as it is easier for community members to port the code to different hardware when the source code is available. Proprietary companies, in contrast, support selected hardware which gives them the maximum user base and profit. In a community the aim is to support everybody’s hardware so that everybody benefits from it, rather than to maximize money or profits.

Most of the time, open source/community software has better interoperability with other open source software, because the goal is to collaborate and benefit from each other, which ultimately benefits the community. This leads to better integration between different software projects and results in a better product or software ecosystem. This is not the case with proprietary software, where such decisions depend on profit margin, future scope, the relationship between the vendors (i.e. if the software comes from different companies) and so on. Have you started seeing the difference? 🙂

Even though community projects start with the minimum required features, they gradually become an incubation ground for innovation and new ideas. Researchers, academicians, computer scientists, corporations and governments use existing open source projects to develop something new for their own purposes. Let’s take an example. A computer scientist doing research on distributed computing comes up with a new algorithm which improves distributed computing. Now he wants to implement and test his algorithm. Does he need to develop a new distributed system to implement his idea, or can he put his algorithm into an existing open source distributed system? The answer is pretty simple. He takes the source code from an open source project (something like Linux/GNU here) and implements his algorithm in it. It is up to him whether he merges the code into the existing code base and makes it available to others, or keeps it to himself. But in most cases people give it back to the source from which they took the initial code. Giving code away for free doesn’t mean they are not gaining anything. Code in a popular community project gives its author far more credibility, popularity, reach and respect, alongside his research publication, and if he wants he can still make money out of it. There are lots of examples of PhD papers/subjects becoming famous community/open source/free software projects.

The graph shows how community developed software overtakes proprietary software in terms of innovation in the long run.

CommunityDevelopedSoftware

I copied the below lines from Debian Linux/GNU’s about page [1]

You may be wondering: why would people spend hours of their own time to write software, carefully package it, and then give it all away? The answers are as varied as the people who contribute. Some people like to help others. Many write programs to learn more about computers. More and more people are looking for ways to avoid the inflated price of software. A growing crowd contribute as a thank you for all the great free software they’ve received from others. Many in academia create free software to help get the results of their research into wider use. Businesses help maintain free software so they can have a say in how it develops — there’s no quicker way to get a new feature than to implement it yourself! Of course, a lot of us just find it great fun

When you are in a culture where others help you without any selfish motive, your attitude towards others also changes. You become helpful to others too. However, not everybody is kind enough to give back the enhancements they make to the source code. For those cases we have open source licenses like the GPL [2], which require them to give the enhancements back to the community that gave them the initial source code if they are selling/commercializing the software with those enhancements.

Sometimes organisations contribute to community projects or start community projects, e.g. Linux/GNU, Mozilla Firefox, Fedora, openSUSE, Chrome, OpenStack, Xen virtualization, because they understand the benefit of the community development model. We also have examples of individuals or groups of people/companies starting open source projects.

Following are the positive sides of a community driven/free software/opensource project.

  • More choice of hardware and platform. Most open source software projects support a wide range of hardware.
  • The life span of the software tends to be very long, as it is easier to fix it or contribute a feature than to create a new project/software.
  • It is easier to customize open source software according to your needs and taste. You can remove unwanted features, which keeps its IT footprint optimal.
  • It is much less likely to contain viruses or spyware, as the source code is available for everyone to see, so suspicious code rarely gets into the project and can be easily removed if it does.
  • Better interoperability, as it is easier to integrate it with other software.
  • The quality of the code in open source projects is often better than in closed source ones, as the code is reviewed/read by more people. The source also tends to be more modular because of the distributed way of development.
  • Helps to spread knowledge, as source code is a great source of knowledge. You can learn from others’ work.
  • It helps to avoid vendor lock-in. A company giving you commercial support for open source/free software can’t hold a monopoly on the software. You are always free to move the support to some other company or hire engineers to support the software, as the source code is publicly available.
  • Cost is usually lower for community driven software when you need commercial support for it. This helps organisations cut down their IT cost, which in turn lowers the cost of their product or service.
  • Minimizes software piracy. The model allows everybody to use the community version of the software at no cost, so there is no need for piracy.
  • Does not take away the freedom of users regarding how or where they want to use it.
  • Helps to create a better culture, where collaboration with others plays a key role.
  • Encourages innovation, as there is no need to reinvent the wheel and we can focus on new things.

I am quoting Linus Torvalds  on open source. He has actually summarized it nicely.

“Me, I just don’t care about proprietary software. It’s not “evil” or “immoral,” it just doesn’t matter. I think that Open Source can do better, and I’m willing to put my money where my mouth is by working on Open Source, but it’s not a crusade – it’s just a superior way of working together and generating code.

It’s superior because it’s a lot more fun and because it makes cooperation much easier (no silly NDA’s or artificial barriers to innovation like in a proprietary setting), and I think Open Source is the right thing to do the same way I believe science is better than alchemy. Like science, Open Source allows people to build on a solid base of previous knowledge, without some silly hiding.

But I don’t think you need to think that alchemy is “evil.” It’s just pointless because you can obviously never do as well in a closed environment as you can with open scientific methods”

to-compete-or-collaborate

The topic is a very big one and it is hard to cover it in a single blog post. It is quite possible that I have missed some obvious points. So if you have any suggestions, kindly put them in the comments. I would be happy to pick them up and put them into the post.

[1] http://www.debian.org/intro/about#what

[2] http://www.gnu.org/licenses/gpl.html