GlusterFS RPMs for 3.4.6beta1 are available for testing

RPMs for 3.4.6beta1 are available for testing at http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.4.6beta1/

The above repo contains RPMs for CentOS/EL 5, 6 and 7 and Fedora 19, 20, 21 and 22. Packages for other platforms/distributions will be available once they are built.

If you find any issues with the RPMs, please report them to the Gluster community at http://www.gluster.org/community/index.html

GlusterFS VFS plugin for Samba

Here are the topics this blog post covers:

  • Samba Server
  • Samba VFS
  • Libgfapi
  • GlusterFS VFS plugin for Samba and libgfapi
  • Without GlusterFS VFS plugin
  • FUSE mount vs VFS plugin

About Samba Server:

The Samba server runs on Unix and GNU/Linux operating systems. Windows clients can talk to Linux/Unix systems through the Samba server; it provides interoperability between Windows and Linux/Unix systems. Initially it was created to provide file and printer sharing between Unix/Linux and Windows, but today the Samba project does much more than just file and printer sharing.

The Samba server works as a semantic translation engine: Windows clients talk in Windows syntax, i.e. the SMB protocol, while Unix/Linux file-systems understand requests in POSIX. Samba converts Windows syntax to *nix/GNU syntax and vice versa.

This article is about Samba integration with GlusterFS. For the specific details I have taken the example of GlusterFS deployed on GNU/Linux.

If you have never heard of the Samba project before, you should read more about it before going further into this blog post.

Here are some important links/pointers for further study:

  1. What is Samba?
  2. Samba introduction

Samba VFS:

The Samba code is very modular in nature. The Samba VFS code is divided into two parts: the Samba VFS layer and the VFS modules.

The purpose of the Samba VFS layer is to act as an interface between the Samba server and the layers below it. When the Samba server gets requests from Windows clients through the SMB protocol, it passes them to the Samba VFS modules.

A Samba VFS module, i.e. a plugin, is a shared library (.so) that implements some or all of the functions which the Samba VFS layer, i.e. the interface, makes available. Samba VFS modules can be stacked on each other (if they are designed to be stacked).

For more about the Samba VFS layer, please refer to http://unix4.com/w/writing-a-samba-vfs-richard-sharpe-2-oct-2011-e6438-pdf.pdf

The Samba VFS layer passes the request to the VFS modules. If the Samba share is for a native Linux/Unix file-system, the call goes to the default VFS module, which forwards it to the system layer, i.e. the operating system. For a user-space file-system like GlusterFS, the VFS layer calls are implemented through a VFS module, i.e. the VFS plugin for GlusterFS. The plugin redirects the requests (i.e. fops) to the GlusterFS API, i.e. libgfapi; it implements or maps all VFS layer calls using libgfapi, as sketched below.
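
To make that mapping concrete, here is a heavily simplified, illustrative sketch of the kind of translation such a module performs. The function below is made up purely for illustration; the real vfs_glusterfs module shipped with Samba works on Samba's own handle and file structures, and its signatures vary between Samba versions.

/* Illustrative sketch only -- not the real vfs_glusterfs module.
 * It shows just the libgfapi side of mapping an "open" fop. */
#include <fcntl.h>
#include <sys/types.h>
#include <glusterfs/api/glfs.h>

static glfs_fd_t *example_open_fop(glfs_t *fs, const char *path,
                                   int flags, mode_t mode)
{
        /* The real module receives Samba structures (vfs_handle_struct,
         * files_struct, ...); here only the libgfapi calls are shown. */
        if (flags & O_CREAT)
                return glfs_creat(fs, path, flags, mode);
        return glfs_open(fs, path, flags);
}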

libgfapi:

libgfapi (i.e. the GlusterFS API) is a set of APIs which can talk to GlusterFS directly. Libgfapi is another access method for GlusterFS, alongside NFS, SMB and FUSE. Libgfapi bindings are available for C, Python, Go and other programming languages. Applications can be developed which use GlusterFS directly, without a GlusterFS volume mount.
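
To illustrate how an application can use a volume through libgfapi without mounting it, here is a minimal C sketch. The volume name "testvol", the server name "server1" and the file path are placeholders, and error handling is kept minimal; with glusterfs-api-devel installed it can typically be built with "gcc gfapi-example.c -lgfapi".

#include <stdio.h>
#include <fcntl.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
        /* "testvol" and "server1" are placeholders for your volume and node. */
        glfs_t *fs = glfs_new("testvol");
        if (!fs)
                return 1;

        /* Ask glusterd on server1:24007 for the volfile, then connect to the bricks. */
        glfs_set_volfile_server(fs, "tcp", "server1", 24007);
        if (glfs_init(fs) != 0) {
                glfs_fini(fs);
                return 1;
        }

        glfs_fd_t *fd = glfs_creat(fs, "/hello.txt", O_WRONLY, 0644);
        if (fd) {
                glfs_write(fd, "hello from libgfapi\n", 20, 0);
                glfs_close(fd);
        }

        glfs_fini(fs);
        return 0;
}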

 GlusterFS VFS plugin for Samba and libgfapi:

Here is the schematic diagram of how communication works between different layers.

[Diagram: communication between the Samba server, the GlusterFS VFS plugin, libgfapi, glusterd and glusterfsd]

Samba Server: This represents the Samba server and the Samba VFS layer.

VFS plugin for GlusterFS: This implements or maps the relevant VFS layer fops to libgfapi calls.

glusterd: The management daemon of a GlusterFS node, i.e. server.

glusterfsd: The brick process of a GlusterFS node, i.e. server.

Client requests come to the Samba server, and the Samba server redirects the calls to GlusterFS's VFS plugin through the Samba VFS layer. The VFS plugin calls the relevant libgfapi functions. Libgfapi acts as a client: it contacts glusterd for volfile information (i.e. information about the gluster volume, translators and involved nodes), then forwards the requests to the appropriate glusterfsd, i.e. the brick processes where the requests actually get serviced.

If you want to know the specifics of sharing a GlusterFS volume through the Samba VFS plugin, refer to the link below.

https://lalatendumohanty.wordpress.com/2014/02/11/using-glusterfs-with-samba-and-samba-vfs-plugin-for-glusterfs-on-fedora-20/

Without GlusterFS VFS plugin: 

Without the GlusterFS VFS plugin, we can still share a GlusterFS volume through the Samba server. This can be done through the native GlusterFS mount, i.e. FUSE (Filesystem in Userspace). We need to mount the volume using FUSE, i.e. the GlusterFS native mount, on the same machine where the Samba server is running, then share the mount point using the Samba server. As we are not using the VFS plugin for GlusterFS here, Samba treats the mounted GlusterFS volume as a native file-system: the default VFS module is used and the file-system calls are sent to the operating system. The flow is the same as for any native file system shared through Samba.

FUSE mount vs VFS plugin:

If you are not familiar with file systems in user space, please read about FUSE, i.e. Filesystem in Userspace.

For FUSE mounts, file-system fops from the Samba server go to the user-space FUSE mount point -> kernel VFS -> /dev/fuse -> GlusterFS, and come back along the same path. Refer to the diagrams below for details; consider the Samba server as an application which runs on the FUSE mount point.

[Diagram: FUSE mount architecture for GlusterFS]

You can observe the process context switches that happen between user and kernel space in the above architecture. This is a key differentiating factor when compared with the libgfapi-based VFS plugin.

For the Samba VFS plugin implementation, see the diagram below. With the plugin, Samba calls get converted to libgfapi calls, and libgfapi forwards the requests to GlusterFS.

[Diagram: libgfapi architecture for GlusterFS]

The above pictures are copied from this presentation:

Advantages of the libgfapi-based Samba plugin vs the FUSE mount:

  • With libgfapi, there are no kernel VFS layer context switches. This results in performance benefits compared to the FUSE mount.
  • With a separate Samba VFS module, i.e. plugin, features (e.g. more NTFS functionality) that native Linux file systems do not support can be provided in GlusterFS and supported through Samba.


Understanding GlusterFS CLI Code – Part 1

The GlusterFS CLI code follows a client-server architecture; we should keep that in mind while trying to understand the CLI framework. “glusterd” acts as the server and the gluster binary (i.e. /usr/sbin/gluster) acts as the client.

In this write-up I have taken the “gluster volume create” command as an example and will provide code snippets, gdb backtraces and Wireshark network traces.

  • All function calls start when “gluster volume create <volume-name> <brick1> <brick2>” is entered on the command line.
  • “gluster”, i.e. main() in cli.c, processes the command-line input and sends it to glusterd with the relevant callback function information, as mentioned below.
  • To be specific, gf_cli_create_volume() in cli/src/cli-rpc-ops.c sends the request to glusterd with the callback function information, i.e. gf_cli_create_volume_cbk(). For more information look at the code snippet below and the gdb backtrace #bt_1 (see #bt_1 below).
    • ret = cli_to_glusterd (&req, frame, gf_cli_create_volume_cbk, (xdrproc_t) xdr_gf_cli_req, dict, GLUSTER_CLI_CREATE_VOLUME, this, cli_rpc_prog, NULL);
    • The CLI contacts glusterd on localhost:24007, as glusterd’s management port is TCP 24007.
  • glusterd uses the GLUSTER_CLI_CREATE_VOLUME procedure identifier passed in the above call to find the relevant handler function so that it can take the execution forward.
  • Once the information is sent to glusterd, the client just waits for the reply. The wait happens in “event_dispatch (ctx->event_pool);” in main() in cli.c. See #bt_2.

There are some other important functions (on the client side of the CLI framework) to check out. If you are debugging any CLI issue, there is a high probability you will come across these functions:

  • cli_cmd_volume_create_cbk():
    • The call to gf_cli_create_volume() goes from here, and the mechanism used to find gf_cli_create_volume() is a little different.
    • Check the rpc_clnt_procedure_t structure and its use in cli_cmd_volume_create_cbk(), i.e. “proc = &cli_rpc_prog->proctable[GLUSTER_CLI_CREATE_VOLUME];” (see the sketch after this list).
  • parse_cmdline() and cli_opt_parse(): These functions parse the command-line input.
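
For readers unfamiliar with this pattern, below is a generic sketch of table-based dispatch of the kind the CLI uses: an array indexed by the operation number, where each entry carries a name and a function pointer. The struct and field names here are made up for illustration and are not GlusterFS's actual rpc_clnt_procedure_t definition.

/* Generic illustration of dispatch through a procedure table. */
#include <stdio.h>

enum { EXAMPLE_OP_CREATE_VOLUME = 0, EXAMPLE_OP_MAX };

typedef struct {
        const char *procname;
        int (*fn)(const char *arg);
} example_procedure_t;

static int example_create_volume(const char *arg)
{
        printf("create volume: %s\n", arg);
        return 0;
}

static example_procedure_t example_proctable[EXAMPLE_OP_MAX] = {
        [EXAMPLE_OP_CREATE_VOLUME] = { "CREATE_VOLUME", example_create_volume },
};

int main(void)
{
        /* Analogous to: proc = &cli_rpc_prog->proctable[GLUSTER_CLI_CREATE_VOLUME]; */
        example_procedure_t *proc = &example_proctable[EXAMPLE_OP_CREATE_VOLUME];
        return proc->fn("testvol");
}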

GDB backtraces of function calls involved in the client side of the framework:

#bt_1:

Breakpoint 1, gf_cli_create_volume (frame=0x664544, this=0x3b9ac83600, data=0x679e48) at cli-rpc-ops.c:3240
3240            gf_cli_req              req = {{0,}};
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.107.el6_4.5.x86_64 keyutils-libs-1.4-4.el6.x86_64 krb5-libs-1.10.3-10.el6_4.6.x86_64 libcom_err-1.41.12-14.el6_4.4.x86_64 libselinux-2.0.94-5.3.el6_4.1.x86_64 libxml2-2.7.6-12.el6_4.1.x86_64 ncurses-libs-5.7-3.20090208.el6.x86_64 openssl-1.0.0-27.el6_4.2.x86_64 readline-6.0-4.el6.x86_64 zlib-1.2.3-29.el6.x86_64
(gdb) bt
#0  gf_cli_create_volume (frame=0x664544, this=0x3b9ac83600, data=0x679e48) at cli-rpc-ops.c:3240
#1  0x0000000000411587 in cli_cmd_volume_create_cbk (state=0x7fffffffe270, word=<value optimized out>, words=<value optimized out>, wordcount=<value optimized out>) at cli-cmd-volume.c:410
#2  0x000000000040aa8b in cli_cmd_process (state=0x7fffffffe270, argc=5, argv=0x7fffffffe460) at cli-cmd.c:140
#3  0x000000000040a510 in cli_batch (d=<value optimized out>) at input.c:34
#4  0x0000003b99a07851 in start_thread () from /lib64/libpthread.so.0
#5  0x0000003b996e894d in clone () from /lib64/libc.so.6

#bt_2:

#0  gf_cli_create_volume_cbk (req=0x68a57c, iov=0x68a5bc, count=1, myframe=0x664544) at cli-rpc-ops.c:762
#1  0x0000003b9b20dd85 in rpc_clnt_handle_reply (clnt=0x68a390, pollin=0x6a3f20) at rpc-clnt.c:772
#2  0x0000003b9b20f327 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x68a3c0, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:905
#3  0x0000003b9b20ab78 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:512
#4  0x00007ffff711fd86 in socket_event_poll_in (this=0x6939b0) at socket.c:2119
#5  0x00007ffff712169d in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x6939b0, poll_in=1, poll_out=0, poll_err=0) at socket.c:2229
#6  0x0000003b9aa62327 in event_dispatch_epoll_handler (event_pool=0x662710) at event-epoll.c:384
#7  event_dispatch_epoll (event_pool=0x662710) at event-epoll.c:445
#8  0x0000000000409891 in main (argc=<value optimized out>, argv=<value optimized out>) at cli.c:666

Code flow for the “volume create” command in “glusterd”, i.e. the server side of the CLI framework:

  • As mentioned in the CLI client’s code flow, the GLUSTER_CLI_CREATE_VOLUME identifier helps glusterd find the relevant handler function for the command. To see how this is done, check the gd_svc_cli_actors structure in glusterd-handler.c. I have copied a small snippet of it below.
    • rpcsvc_actor_t gd_svc_cli_actors[] = {
              [GLUSTER_CLI_PROBE]         = { "CLI_PROBE", GLUSTER_CLI_PROBE, glusterd_handle_cli_probe, NULL, 0, DRC_NA},
              [GLUSTER_CLI_CREATE_VOLUME] = { "CLI_CREATE_VOLUME", GLUSTER_CLI_CREATE_VOLUME, glusterd_handle_create_volume, NULL, 0, DRC_NA},
              ...
  • Hence the call flow is: glusterd_handle_create_volume -> __glusterd_handle_create_volume.
  • In __glusterd_handle_create_volume() all the required validations are done, e.g. whether a volume with the same name already exists, whether a brick is on a separate partition or on the root partition, and the number of bricks. The gfid* for the volume is also generated here.
  • Another important function is gd_sync_task_begin(). Maybe I will go into the details of this function in a future write-up, because as of now I don't understand it completely.
  • Once glusterd creates the volume, it sends the data back to the CLI client. This happens in glusterd-rpc-ops.c:glusterd_op_send_cli_response().
    • glusterd_to_cli (req, cli_rsp, NULL, 0, NULL, xdrproc, ctx);

Network traces captured during the “create volume” command:

::1 -> ::1 Gluster CLI 292 V2 CREATE_VOLUME Call
node1 -> node2 GlusterD Management 200 V2 CLUSTER_LOCK Call
node2->node1 GlusterD Management 168 V2 CLUSTER_LOCK Reply (Call In
node1 -> node2 GlusterD Management 540 V2 STAGE_OP Call
node2 -> node1 GlusterD Management 184 V2 STAGE_OP Reply
127.0.0.1 -> 127.0.0.1 GlusterFS Callback 112 [TCP Previous segment
127.0.0.1 -> 127.0.0.1 GlusterFS Handshake 168 V2 GETSPEC Call
127.0.0.1 -> 127.0.0.1 GlusterFS Callback 112 [TCP Previous segment
127.0.0.1 -> 127.0.0.1 GlusterFS Handshake 160 V2 GETSPEC Call
node1 -> node2 GlusterFS Callback 112 V1 FETCHSPEC Call
::1 -> ::1 GlusterFS Callback 132 V1 FETCHSPEC Call
::1 -> ::1 GlusterFS Handshake 192 V2 GETSPEC Call
::1 -> ::1 GlusterFS Callback 132 [TCP Previous segment
::1 -> ::1 GlusterFS Callback 132 [TCP Previous segment
::1 -> ::1 GlusterFS Callback 132 V1 FETCHSPEC Call
node1 -> node2 GlusterD Management 540 V2 COMMIT_OP Call
127.0.0.1 -> 127.0.0.1 GlusterFS Handshake 984 V2 GETSPEC Reply
127.0.0.1 -> 127.0.0.1 GlusterFS Handshake 1140 V2 GETSPEC Reply
::1 -> ::1 GlusterFS Handshake 1436 V2 GETSPEC Reply
::1 -> ::1 GlusterFS Handshake 180 V2 GETSPEC Call
::1 -> ::1 GlusterFS Handshake 1160 V2 GETSPEC Reply
::1 -> ::1 GlusterFS Handshake 188 V2 GETSPEC Call
::1 -> ::1 GlusterFS Handshake 1004 V2 GETSPEC Reply
node2 -> node1 GlusterFS Callback 112 V1 FETCHSPEC Call
node2 -> node1 GlusterD Management 184 V2 COMMIT_OP Reply
node1 -> node2 GlusterD Management 200 V2 CLUSTER_UNLOCK Call
node2 -> node1 GlusterD Management 168 V2 CLUSTER_UNLOCK Reply
::1 -> ::1 Gluster CLI 464 V2 CREATE_VOLUME Reply

The function calls below happened during the volume create, as seen in /var/log/glusterfs/etc-glusterfs-glusterd.vol.log. This was collected after running glusterd in DEBUG mode.

[glusterd-volume-ops.c:69:__glusterd_handle_create_volume]
[glusterd-utils.c:412:glusterd_check_volume_exists]
[glusterd-utils.c:155:glusterd_lock]
[glusterd-utils.c:412:glusterd_check_volume_exists]
[glusterd-utils.c:597:glusterd_brickinfo_new]
[glusterd-utils.c:659:glusterd_brickinfo_new_from_brick]
[glusterd-utils.c:457:glusterd_volinfo_new]
[glusterd-utils.c:541:glusterd_volume_brickinfos_delete]
[store.c:433:gf_store_handle_destroy]
[glusterd-utils.c:571:glusterd_volinfo_delete]
[glusterd-utils.c:597:glusterd_brickinfo_new]
[glusterd-utils.c:659:glusterd_brickinfo_new_from_brick]
[glusterd-utils.c:457:glusterd_volinfo_new]
[glusterd-utils.c:541:glusterd_volume_brickinfos_delete]
[store.c:433:gf_store_handle_destroy]

***************************************************
[glusterd-utils.c:5244:glusterd_hostname_to_uuid]
[glusterd-utils.c:878:glusterd_volume_brickinfo_get]
[glusterd-utils.c:887:glusterd_volume_brickinfo_get]
[glusterd-op-sm.c:4284:glusterd_op_commit_perform]
[glusterd-op-sm.c:3404:glusterd_op_modify_op_ctx]
[glusterd-rpc-ops.c:193:glusterd_op_send_cli_response]
[socket.c:492:__socket_rwv]
[socket.c:2235:socket_event_handler]

gfid*: a UUID maintained by GlusterFS. GlusterFS uses gfids extensively, and they deserve a separate write-up of their own.
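
A gfid has the form of a standard 128-bit UUID. Purely as an illustration of what such an identifier looks like (this uses libuuid directly, not GlusterFS's own uuid helpers), the snippet below generates and prints one; build with "gcc uuid-example.c -luuid".

#include <stdio.h>
#include <uuid/uuid.h>

int main(void)
{
        uuid_t  u;
        char    str[37];        /* 36 characters plus the terminating NUL */

        uuid_generate(u);       /* random 128-bit UUID, gfid-like in form */
        uuid_unparse(u, str);
        printf("example uuid: %s\n", str);
        return 0;
}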

I hope this gives some insight to anyone trying to understand the GlusterFS CLI framework.

Using GlusterFS with the GlusterFS Samba VFS plugin on Fedora

This blog post covers the steps and implementation details for using the GlusterFS Samba VFS plugin.

Please refer to the link below if you are looking for architectural information on the GlusterFS Samba VFS plugin and the difference between a FUSE mount and the Samba VFS plugin:

https://lalatendumohanty.wordpress.com/2014/04/20/glusterfs-vfs-plugin-for-samba/

I have set up a two-node GlusterFS cluster with Fedora 20 (minimal install) VMs. Each VM has 3 separate XFS partitions of 100GB each.
One of the Gluster nodes is used as the Samba server in this setup.

I originally tested this with Fedora 20, but this example should work fine with the latest Fedora releases, i.e. F21 and F22.

GlusterFS Version: glusterfs-3.4.2-1.fc20.x86_64

Samba version:  samba-4.1.3-2.fc20.x86_64

Post installation, the “df -h” output on the VMs looked like this:
$df -h
Filesystem                            Size  Used Avail Use% Mounted on
/dev/mapper/fedora_dhcp159-242-root   50G  2.2G   45G   5% /
devtmpfs                              2.0G     0  2.0G   0% /dev
tmpfs                                 2.0G     0  2.0G   0% /dev/shm
tmpfs                                 2.0G  432K  2.0G   1% /run
tmpfs                                 2.0G     0  2.0G   0% /sys/fs/cgroup
tmpfs                                 2.0G     0  2.0G   0% /tmp
/dev/vda1                             477M  103M  345M  23% /boot
/dev/mapper/fedora_dhcp159-242-home   45G   52M   43G   1% /home
/dev/mapper/gluster_vg1-gluster_lv1           100G  539M  100G   1% /gluster/brick1
/dev/mapper/gluster_vg2-gluster_lv2           100G  406M  100G   1% /gluster/brick2
/dev/mapper/gluster_vg3-gluster_lv3           100G   33M  100G   1% /gluster/brick3

You can use the following commands to create and mount the XFS partitions (mount points as in the df output above):
1. pvcreate /dev/vdb
2. vgcreate VG_NAME /dev/vdb
3. lvcreate -n LV_NAME -l 100%PVS VG_NAME /dev/vdb
4. mkfs.xfs -i size=512 LV_PATH
5. mkdir -p /gluster/brick1 && mount LV_PATH /gluster/brick1

The following are the steps to be performed and packages to be installed on each node (Fedora 20 in my case).

#Change SELinux to either “permissive” or “disabled” mode

# To put SELinux in permissive mode
$setenforce 0

#To see the current mode of SELinux

$getenforce

SELinux policy rules for Gluster are present in recent Fedora releases, e.g. F21, F22 or later, so SELinux should work fine with Gluster there.

#Remove all iptables rules so that they do not interfere with Gluster

$iptables -F

yum install glusterfs-server
yum install samba-vfs-glusterfs
yum install samba-client

#samba-vfs-glusterfs RPMs for CentOS, RHEL and Fedora 19/18 are available at http://download.gluster.org/pub/gluster/glusterfs/samba/

#To start glusterd and auto start it after boot
$systemctl start glusterd
$systemctl enable glusterd
$systemctl status glusterd

#To start smb and auto start it after boot
$systemctl start smb
$systemctl enable smb
$systemctl status smb

#Create the gluster volume and start it (running the commands below from Server1_IP)

$gluster peer probe Server2_IP
$gluster peer status
Number of Peers: 1

Hostname: Server2_IP
Port: 24007
Uuid: aa6f71d9-0dfe-4261-a2cd-5f281632aaeb
State: Peer in Cluster (Connected)
$gluster v create testvol Server2_IP:/gluster/brick1/testvol-b1 Server1_IP:/gluster/brick1/testvol-b2
$gluster v start testvol

#Modify smb.conf for Samba share

$vi /etc/samba/smb.conf

#
[testvol]
comment = For samba share of volume testvol
path = /
read only = No
guest ok = Yes
kernel share modes = No
vfs objects = glusterfs
glusterfs:loglevel = 7
glusterfs:logfile = /var/log/samba/glusterfs-testvol.log
glusterfs:volume = testvol

#For debug logs you can change the log level to 10, e.g. “glusterfs:loglevel = 10”

# Do not miss “kernel share modes = No”, else you won’t be able to write anything into the share

#verify that your changes are correctly understood by Samba
$testparm -s
Load smb config files from /etc/samba/smb.conf
rlimit_max: increasing rlimit_max (1024) to minimum Windows limit (16384)
Processing section “[homes]”
Processing section “[printers]”
Processing section “[testvol]”
Loaded services file OK.
Server role: ROLE_STANDALONE
[global]
workgroup = MYGROUP
server string = Samba Server Version %v
log file = /var/log/samba/log.%m
max log size = 50
idmap config * : backend = tdb
cups options = raw

[homes]
comment = Home Directories
read only = No
browseable = No

[printers]
comment = All Printers
path = /var/spool/samba
printable = Yes
print ok = Yes
browseable = No

[testvol]
comment = For samba share of volume testvol
path = /
read only = No
guest ok = Yes
kernel share modes = No
vfs objects = glusterfs
glusterfs:loglevel = 10
glusterfs:logfile = /var/log/samba/glusterfs-testvol.log
glusterfs:volume = testvol

#Restart the Samba service. This is not a compulsory step, as Samba picks up the latest smb.conf for new connections, but restart the service to make sure it uses the latest smb.conf.
$systemctl  restart smb

#Set smbpasswd for root. This will be used for mounting the volume/Samba share on the client
$smbpasswd -a root

#Mount the CIFS share using the following command and it is ready for use 🙂
mount -t cifs -o username=root,password=<smbpassword> //Server1_IP/testvol /mnt/cifs

GlusterFS volume tuning for a volume shared through Samba:

  • The gluster volume needs: “gluster volume set volname server.allow-insecure on”
  • In /etc/glusterfs/glusterd.vol on each gluster node,
    add “option rpc-auth-allow-insecure on”
  • Restart glusterd on each node.

For setups where the Samba server and the Gluster nodes need to be on different machines:

# Put “glusterfs:volfile_server = <server name/ip>” in the smb.conf settings for the specific volume

e.g:

[testvol]
comment = For samba share of volume testvol
path = /
read only = No
guest ok = Yes
kernel share modes = No
vfs objects = glusterfs
glusterfs:loglevel = 7
glusterfs:logfile = /var/log/samba/glusterfs-testvol.log

glusterfs:volfile_server = <server name/ip>
glusterfs:volume = testvol

#Here are the packages that were installed on the nodes

rpm -qa | grep gluster
glusterfs-libs-3.4.2-1.fc20.x86_64
glusterfs-api-3.4.2-1.fc20.x86_64
glusterfs-3.4.2-1.fc20.x86_64
glusterfs-cli-3.4.2-1.fc20.x86_64
glusterfs-server-3.4.2-1.fc20.x86_64
samba-vfs-glusterfs-4.1.3-2.fc20.x86_64
glusterfs-devel-3.4.2-1.fc20.x86_64
glusterfs-fuse-3.4.2-1.fc20.x86_64
glusterfs-api-devel-3.4.2-1.fc20.x86_64

[root@dhcp159-242 ~]# rpm -qa | grep samba
samba-client-4.1.3-2.fc20.x86_64
samba-4.1.3-2.fc20.x86_64
samba-vfs-glusterfs-4.1.3-2.fc20.x86_64
samba-libs-4.1.3-2.fc20.x86_64
samba-common-4.1.3-2.fc20.x86_64

Note: The same smb.conf entries should work with CentOS6 too.