UbuntuHelp:SettingUpGPFSHowTo

From https://help.ubuntu.com/community/SettingUpGPFSHowTo

Introduction

GPFS stands for the General Parallel File System. It is a commercial product from IBM, available for purchase for use on AIX and Linux platforms. Linux packages and official support are currently only available for Red Hat and SuSE. If you choose to install GPFS on Ubuntu, it is important to understand that your install will not be supported by IBM. But it may still be useful. :-)

GPFS provides incredible scalability, good performance, and fault tolerance (i.e. individual machines can go down and the filesystem remains accessible to the others). For more information on GPFS, see IBM's announcement letter: http://www-306.ibm.com/common/ssi/OIX.wss?DocURL=http://d03xhttpcl001g.boulder.ibm.com/common/ssi/rep_ca/5/897/ENUS206-095/index.html&InfoType=AN&InfoSubType=CA&InfoDesc=Announcement+Letters&panelurl=&paneltext=

We run Ubuntu as our standard Linux distribution, so I set out to find a way to make GPFS work on Ubuntu. These are the steps that I took, which hopefully will also allow you to produce a working GPFS cluster.

Recent Updates

  • After the initial success in getting this system running, we've run into difficulties under certain circumstances with GPFS hanging on certain nodes, requiring a reset of the node (not just a reboot). This is a kernel plus GPFS "portability layer" related issue. Resolution is pending, but we are also considering Lustre (http://lustre.org/) as an alternative. Our interest in Lustre is not because we doubt we can make GPFS work, but because the level of effort may be significantly less with Lustre, as it is open source, and the Lustre folks are friendlier towards Ubuntu and other non-Red Hat/SuSE distributions.

Hardware Overview

Three machines:

  • box1.example.com
  • box2.example.com
  • box3.example.com

Each machine will have 2 fibre channel cards connecting it to the SAN. We have three volumes presented from the SAN to all three machines.
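
Before going further, it's worth confirming that every node actually sees all three SAN volumes; one quick way to check is something like:

# run on each node; the SAN volumes should appear as SCSI disks
cat /proc/scsi/scsi
ls -l /dev/sd*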

Software Install

OS is Ubuntu Dapper on amd64.

Dependencies

Satisfy package dependencies for building and running:

apt-get install libstdc++5 imake makedepend

Additionally, the GPFS binaries have paths to certain binaries hard coded. Bah! Create links so that the necessary binaries can be found:

test -e /usr/X11R6/bin || sudo ln -s /usr/bin      /usr/X11R6/bin
test -e /bin/sort      || sudo ln -s /usr/bin/sort /bin/sort
test -e /bin/awk       || sudo ln -s /usr/bin/awk  /bin/awk

Purchase

Purchase licenses for use of GPFS from IBM.

Download

Download "IBM General Parallel File System 3.1 English International (C89HWIE)" from the IBM Passport site. The name of the downloaded file is c89hwie.tar. This file holds the same contents that you would find on the x86 and x86_64 CDs.

Extract

tar -xf c89hwie.tar
cd linux_cd/
sudo ./gpfs_install-3.1.0-0_x86_64

After accepting the license, you should now have a directory full of RPMs.

finley@box1:~/linux_cd% ls -1 /usr/lpp/mmfs/3.1/
gpfs.base-3.1.0-0.x86_64.rpm
gpfs.docs-3.1.0-0.noarch.rpm
gpfs.gpl-3.1.0-0.noarch.rpm
gpfs.msg.en_US-3.1.0-0.noarch.rpm
license/
status.dat

Convert RPMs to Debs

Let's turn 'em into debs, eh?
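
The conversion relies on the alien and fakeroot packages, which aren't in the dependency list above, so install them first if needed:

sudo apt-get install alien fakeroot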

cd /tmp
cp /usr/lpp/mmfs/3.1/*.rpm .
fakeroot alien *.rpm
sudo cp *.deb /usr/lpp/mmfs/3.1/

Install Debs

Now we can install them.

sudo dpkg -i /usr/lpp/mmfs/3.1/*.deb

Build GPFS Kernel Modules

They call this the "Linux portability interface". It's an open source module that acts as a wrapper around the proprietary GPFS driver.

Install the build dependencies.

KERNEL_VER_FULL=`uname -r`
KERNEL_VER_SHORT=`uname -r | perl -pi -e 's/(\d+\.\d+\.\d+-\d+).*/$1/'`
sudo apt-get install --reinstall linux-headers-${KERNEL_VER_FULL} linux-headers-${KERNEL_VER_SHORT}
sudo apt-get build-dep linux-headers-${KERNEL_VER_FULL} linux-headers-${KERNEL_VER_SHORT}

Change the perms on their source tree so that you can build as a non-root user (substitute your own username for finley below).

sudo chown -R finley /usr/lpp/mmfs/src/

Apply the "2.6.15.x kernel" patch:

cd /usr/lpp/mmfs/src/
wget http://download.systemimager.org/pub/gpfs/gpfs.with_linux-2.6.15.x.patch.bz2
bunzip2 gpfs.with_linux-2.6.15.x.patch.bz2
patch -p5 < gpfs.with_linux-2.6.15.x.patch

Edit the build config file.

cd /usr/lpp/mmfs/src/
cp config/site.mcr.proto config/site.mcr
vi config/site.mcr  # see /usr/lpp/mmfs/src/README for details
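
As a rough illustration only (the exact names and values for your kernel come from the README, so treat these lines as placeholders rather than something to copy), the settings you typically have to touch in site.mcr look like this:

/* illustrative placeholders -- take the real values from /usr/lpp/mmfs/src/README */
#define GPFS_ARCH_X86_64                /* select the define matching your architecture */
LINUX_DISTRIBUTION = REDHAT_AS_LINUX    /* closest supported value; there is no Ubuntu entry */
/* also set LINUX_KERNEL_VERSION (and any related level defines) as described in the README */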

Do the build.

export SHARKCLONEROOT=/usr/lpp/mmfs/src
cd $SHARKCLONEROOT
make World

Install the modules and binaries.

sudo make InstallImages

Distribute the Install to other GPFS Clients

NOTE: In GPFS vernacular, all participating machines are clients, whether or not they are directly attached to disk that is part of the GPFS filesystem.

NOTE: You may wish to implement "SSH for Root" below prior to doing this step, for convenience.

for i in box2 box3
do
  echo $i
  dir=/usr/lpp/mmfs/
  rsync -av --delete-after $dir/ $i:$dir/
done

Modify your $PATH

To have the GPFS binaries appear in the $PATH, we chose to modify /etc/profile, which affects all users on the system (that are using Bourne-based shells).

Just add the following line to the end of /etc/profile.

PATH=$PATH:/usr/lpp/mmfs/bin
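
To confirm the change took effect, start a new login shell (or source /etc/profile) and check that one of the GPFS commands resolves, for example:

. /etc/profile
type mmlscluster    # should resolve to /usr/lpp/mmfs/bin/mmlscluster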

Configuring the Cluster

SSH for Root

Unfortunately, one of GPFS' shortcomings is a need for all cluster nodes to be able to ssh to all other cluster nodes a) as root, and b) without a password. There are multiple ways to accomplish this. We have chosen to use host based authentication.

/etc/hosts

First, all nodes need to know the addresses of all other nodes. GPFS seems to like the idea of a dedicated network for cluster communication, although this is not strictly necessary. Here we're using a dedicated private network, off a secondary NIC, for each cluster client. As this is a private network in our case, we don't keep this information in DNS. Make sure you have entries in /etc/hosts for each machine in the cluster.
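
For example, each node's /etc/hosts might carry entries like the following (the name-to-address pairings here are taken from the .shosts listing further down; adjust them to your own network):

10.221.160.41   box1-160
10.221.160.42   box3-160
10.221.160.43   box2-160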

/etc/ssh/sshd_config

Here are the relevant ssh server options:

PermitRootLogin          yes
IgnoreRhosts             no
HostbasedAuthentication  yes
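
After changing sshd_config, the SSH daemon has to be restarted for the new settings to take effect; on Dapper, for example:

sudo /etc/init.d/ssh restart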

/etc/ssh/ssh_config

Here are the relevant ssh client options:

HostbasedAuthentication   yes
PreferredAuthentications  hostbased,publickey,keyboard-interactive,password
EnableSSHKeysign          yes

/root/.shosts

For host based authentication of normal users, the changes to ssh_config and sshd_config are sufficient. However, for the root user, it is also necessary to include a ".shosts" file in the root user's home directory. It is recommended that this contain the IP addresses and base host names (as resolved by "getent hosts $ipaddress") for each GPFS client.

root@box1:~# cat /root/.shosts
# Fri Apr 20 15:14:17 CDT 2007
box1-160
10.221.160.41
box3-160
10.221.160.42
box2-160
10.221.160.43

/etc/shosts.equiv

This file allows normal users to take advantage of host-based authentication without having to create their own .shosts files. Its contents are exactly the same as a .shosts file.

# Fri Apr 20 15:14:17 CDT 2007
box1-160
10.221.160.41
box3-160
10.221.160.42
box2-160
10.221.160.43

/etc/ssh/ssh_known_hosts

Having this file properly populated means that users aren't prompted to accept a host's key when connecting to it for the first time.

# Fri Apr 20 15:14:18 CDT 2007
box1-160 ssh-dss AAAAB3NzaC1kc3...
box1-160 ssh-rsa AAAAB3NzaC1yc2...
10.221.160.41 ssh-dss AAAAB3Nza...
10.221.160.41 ssh-rsa AAAAB3Nza...
box3-160 ssh-dss AAAAB3NzaC1kc3...
box3-160 ssh-rsa AAAAB3NzaC1yc2...
10.221.160.42 ssh-dss AAAAB3Nza...
10.221.160.42 ssh-rsa AAAAB3Nza...
box2-160 ssh-dss AAAAB3NzaC1kc3...
box2-160 ssh-rsa AAAAB3NzaC1yc2...
10.221.160.43 ssh-dss AAAAB3Nza...
10.221.160.43 ssh-rsa AAAAB3Nza...
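
One way to populate this file is with ssh-keyscan, run once and then copied to every node, for example:

ssh-keyscan -t rsa,dsa box1-160 box2-160 box3-160 \
    10.221.160.41 10.221.160.42 10.221.160.43 | sudo tee -a /etc/ssh/ssh_known_hosts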

iptables

If you use iptables on your machines, you will want to allow SSH traffic and traffic from the GPFS daemon between all of the cluster nodes. I don't know the exact port ranges the GPFS daemon uses offhand, but I'm sure one could look that up if one were so motivated. For me, I will simply allow all traffic from all nodes to all nodes for now, with a rule such as this on each cluster node:

# GPFS
#-A INPUT-TABLE -m state --state NEW -m tcp -p tcp --dport 1191 -j ACCEPT
-A INPUT-TABLE -m state --state NEW -m tcp -p tcp --source 10.221.160.0/25 -j ACCEPT
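
Before creating the cluster, it's worth verifying that passwordless root SSH works in every direction; something like this, run as root on each node, should print the remote hostnames without any password or host-key prompts:

for node in box1-160 box2-160 box3-160
do
  ssh -o BatchMode=yes $node hostname
done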

Create a NodeFile

The file name is actually "NodeFile". Here are the contents:

box1-160:quorum-manager
box2-160:quorum-manager
box3-160:quorum

Create the Cluster

mmcrcluster -N NodeFile -p box1-160 -s box2-160 -r `which ssh` -R `which scp` -C gpfs-cluster.example.com
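
You can then review the resulting cluster definition with:

mmlscluster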

Start the GPFS Cluster

The cluster needs to be operational prior to creating a file system. So let's tell all the nodes to start participating in the cluster:

mmstartup -a

Verify that they were able to do so:

# mmgetstate -aLv

 Node number  Node name       Quorum  Nodes up  Total nodes  GPFS state  Remarks
------------------------------------------------------------------------------------
       1      box1-160       2        3          3       active      quorum node
       2      box3-160       2        3          3       active      quorum node
       3      box2-160       2        3          3       active      quorum node

Create a DescFile

A DescFile contains information (a Description) about the physical disks in the cluster. Here are the contents of my DescFile:

# DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup:DesiredName:StoragePool
/dev/sdm1:box1-160:box2-160
/dev/sdn1:box2-160:box3-160
/dev/sdo1:box3-160:box1-160

Prepare the Physical Disks as NSDs

NSD stands for Network Shared Disk.

cp DescFile DescFile.orig
mmcrnsd -F DescFile

NOTE: If mmcrnsd refuses to operate on your disks or partitions because they were previously in use, and you know that they are currently NOT in use, then you can add the "-v no" option to the end of the mmcrnsd command above.

After creating the NSDs, you can list them:

root@box1:# mmlsnsd

 File system   Disk name    Primary node             Backup node
---------------------------------------------------------------------------
 (free disk)   gpfs1nsd     box1-160             box2-160
 (free disk)   gpfs2nsd     box2-160             box3-160
 (free disk)   gpfs3nsd     box3-160             box1-160

NOTE: The mmcrnsd command mangles the DescFile, which is why we create a copy of it above. The resultant file looks like this:

# DiskName:PrimaryServer:BackupServer:DiskUsage:FailureGroup:DesiredName:StoragePool
# /dev/sdm1:box1-160:box2-160
gpfs1nsd:::dataAndMetadata:4001::
# /dev/sdn1:box2-160:box3-160
gpfs2nsd:::dataAndMetadata:4003::
# /dev/sdo1:box3-160:box1-160
gpfs3nsd:::dataAndMetadata:4002::

Create the File System

The mangled DescFile is now in an appropriate format for feeding into other commands, such as mmcrfs. So now we can create the filesystem:

mmcrfs /gpfs1 /dev/gpfs1 -F DescFile -B 256K

Here's the output:

# mmcrfs /gpfs1 /dev/gpfs1 -F DescFile -B 256K

The following disks of gpfs1 will be formatted on node box1.example.com:
    gpfs1nsd: size 488281250 KB
    gpfs2nsd: size 488281250 KB
    gpfs3nsd: size 488281250 KB
Formatting file system ...
Disks up to size 2.2 TB can be added to storage pool 'system'.
Creating Inode File
Creating Allocation Maps
Clearing Inode Allocation Map
Clearing Block Allocation Map
Completed creation of file system /dev/gpfs1.
mmcrfs: Propagating the cluster configuration data to all
  affected nodes.  This is an asynchronous process.

Mount the File System

mmmount /gpfs1 -a

Output:

# mmmount /gpfs1 -a
Fri Apr 20 16:23:13 CDT 2007: mmmount: Mounting file systems ...
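
To confirm the filesystem really is mounted everywhere, something like the following works:

df -h /gpfs1
mmlsmount gpfs1 -L    # lists the nodes that have the filesystem mounted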

Author

  • Brian Finley