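CentOS 7: Torque Single-Node Deployment
1. Preparation
Before installing Torque, configure the Linux host name. Most servers default to localhost, and using localhost directly is not recommended.
How to change the Linux host name: http://www.cnblogs.com/smbin/p/8488909.html
Linux system: CentOS 7
Host name: master
System user: root
Torque official download page: http://www.adaptivecomputing.com/support/download-center/torque-download/
Version used by the author: http://wpfilebase.s3.amazonaws.com/torque/torque-6.1.2.tar.gz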
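The host-name requirement above can be checked before starting. The following is a minimal sketch, not part of Torque: name_is_usable is a hypothetical helper that only rejects an empty name and localhost-style names.

```shell
#!/bin/sh
# name_is_usable NAME: succeed unless NAME is empty or a localhost alias.
# Hypothetical helper for illustration; Torque ships nothing by this name.
name_is_usable() {
    case "$1" in
        ""|localhost|localhost.*) return 1 ;;   # empty or localhost-style name
        *) return 0 ;;
    esac
}

# uname -n prints the host name the kernel currently reports.
current=$(uname -n)
if name_is_usable "$current"; then
    echo "host name '$current' looks usable"
else
    echo "host name '$current' should be changed before installing Torque"
fi
```

The check is deliberately narrow: it does not verify that the name resolves, only that it is not one of the defaults the article advises against.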
2. Installing and Configuring Torque
First create a torque folder under /opt, then download the Torque tarball into it and extract it.
[root@mastar ]# cd /opt
[root@mastar ]# mkdir torque
[root@mastar ]# cd torque
[root@mastar torque]# wget http://wpfilebase.s3.amazonaws.com/torque/torque-6.1.2.tar.gz
...... (download output omitted)
[root@mastar torque]# tar -zxvf torque-6.1.2.tar.gz
...... (extraction output omitted)
[root@mastar torque]# cd torque-6.1.2/
[root@mastar torque-6.1.2]#
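Install the build dependencies, then configure, build, and install Torque. The master setting links this host to PBS; master is the host name.
[root@master torque-6.1.2]# yum install libxml2-devel openssl-devel gcc gcc-c++ boost-devel libtool-y //"libtool-y" should read "libtool -y"; the output below reflects the mistyped command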
Loaded plugins: fastestmirror, langpacks
base | 3.6 kB 00:00:00
extras | 3.4 kB 00:00:00
mysql-connectors-community | 2.5 kB 00:00:00
mysql-tools-community | 2.5 kB 00:00:00
mysql56-community | 2.5 kB 00:00:00
updates | 3.4 kB 00:00:00
Determining fastest mirrors
* base: mirrors.cn99.com
* extras: mirrors.tuna.tsinghua.edu.cn
* updates: mirrors.tuna.tsinghua.edu.cn
Package libxml2-devel-2.9.1-6.el7_2.3.x86_64 already installed and latest version
Package 1:openssl-devel-1.0.2k-8.el7.x86_64 already installed and latest version
Package gcc-4.8.5-16.el7_4.1.x86_64 already installed and latest version
Package gcc-c++-4.8.5-16.el7_4.1.x86_64 already installed and latest version
Package boost-devel-1.53.0-27.el7.x86_64 already installed and latest version
No package libtool-y available.
Nothing to do
[root@master torque-6.1.2]# ./configure --prefix=/usr/local/torque --with-scp --with-default-server=master
...... (configure output omitted)
Building components: server=yes mom=yes clients=yes
gui=no drmaa=no pam=no
PBS Machine type : linux
Remote copy : /bin/scp -rpB
PBS home : /var/spool/torque
Default server : master
Unix Domain sockets :
Linux cpusets : no
Tcl : disabled
Tk : disabled
Authentication : trqauthd
configure: WARNING: This compilation has strict compiler options enabled that cause
the build to fail if any compiler warnings are emitted. If this build fails
because of a harmless warning, please report the problem to torqueusers@supercluster.org
and run configure again without --enable-gcc-warnings.
Ready for 'make'.
[root@master torque-6.1.2]# make
...... (build output omitted)
[root@master torque-6.1.2]# make install
...... (install output omitted)
[root@master torque-6.1.2]# make packages
Building packages from /opt/torque/torque-6.1.2/tpackages
rm -rf /opt/torque/torque-6.1.2/tpackages
mkdir /opt/torque/torque-6.1.2/tpackages
Building ./torque-package-server-linux-x86_64.sh ...
libtool: install: warning: remember to run `libtool --finish /usr/local/torque/lib' //this command must be run afterwards: libtool --finish /usr/local/torque/lib
Building ./torque-package-mom-linux-x86_64.sh ...
libtool: install: warning: remember to run `libtool --finish /usr/local/torque/lib'
Building ./torque-package-clients-linux-x86_64.sh ...
libtool: install: warning: remember to run `libtool --finish /usr/local/torque/lib'
Building ./torque-package-devel-linux-x86_64.sh ...
libtool: install: warning: remember to run `libtool --finish /usr/local/torque/lib'
Building ./torque-package-doc-linux-x86_64.sh ...
Done.
The package files are self-extracting packages that can be copied
and executed on your production machines. Use --help for options.
[root@master torque-6.1.2]# libtool --finish /usr/local/torque/lib
libtool: finish: PATH="/usr/lib/jvm/java-1.7.0-openjdk/bin:/usr/local/sbin:/usr/local/bin:/sbin:/bin:/usr/sbin:/usr/bin:/root/bin:/usr/local/torque/bin:/usr/local/torque/sbin:/root/bin:/sbin" ldconfig -n /usr/local/torque/lib
----------------------------------------------------------------------
Libraries have been installed in:
/usr/local/torque/lib
If you ever happen to want to link against installed libraries
in a given directory, LIBDIR, you must either use libtool, and
specify the full pathname of the library, or use the `-LLIBDIR'
flag during linking and do at least one of the following:
- add LIBDIR to the `LD_LIBRARY_PATH' environment variable
during execution
- add LIBDIR to the `LD_RUN_PATH' environment variable
during linking
- use the `-Wl,-rpath -Wl,LIBDIR' linker flag
- have your system administrator add LIBDIR to `/etc/ld.so.conf'
See any operating system documentation about shared libraries for
more information, such as the ld(1) and ld.so(8) manual pages.
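The libtool message lists adding the library directory to the ld.so configuration as one option. A minimal sketch of that option follows, assuming the stock CentOS 7 layout where /etc/ld.so.conf includes /etc/ld.so.conf.d/*.conf. The helper write_ld_conf and the file name torque.conf are illustrative, not something the article used.

```shell
#!/bin/sh
# write_ld_conf FILE DIR: write a one-line ld.so.conf.d fragment naming DIR.
# Hypothetical helper; the fragment file is dedicated, so it is overwritten.
write_ld_conf() {
    conf_file=$1
    lib_dir=$2
    printf '%s\n' "$lib_dir" > "$conf_file"
}

# Demonstrate on a scratch file so nothing under /etc is touched.
demo_conf=$(mktemp)
write_ld_conf "$demo_conf" /usr/local/torque/lib
cat "$demo_conf"
```

On the real host this would be run as root with /etc/ld.so.conf.d/torque.conf as the first argument, followed by ldconfig to rebuild the linker cache.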
Configure the services: pbs_server, pbs_sched, pbs_mom, trqauthd
[root@master torque-6.1.2]# cp contrib/init.d/{pbs_{server,sched,mom},trqauthd} /etc/init.d/
[root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do chkconfig --add $i; chkconfig $i on; done //if prompted y/n, type y and press Enter to continue
Set the Torque environment variables
[root@master torque-6.1.2]# TORQUE=/usr/local/torque
[root@master torque-6.1.2]# echo "TORQUE=$TORQUE" >> /etc/profile
[root@master torque-6.1.2]# echo "export PATH=\$PATH:$TORQUE/bin:$TORQUE/sbin" >> /etc/profile
[root@master torque-6.1.2]# source /etc/profile
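Running the echo commands above more than once appends the same lines to /etc/profile again. A small sketch of an idempotent variant is shown here; append_once is a hypothetical helper, and the demonstration writes to a scratch file rather than /etc/profile.

```shell
#!/bin/sh
# append_once FILE LINE: add LINE to FILE only if that exact line is absent.
append_once() {
    file=$1
    line=$2
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

demo_profile=$(mktemp)
TORQUE=/usr/local/torque
# Run the same two appends twice; the second pass should change nothing.
for attempt in 1 2; do
    append_once "$demo_profile" "TORQUE=$TORQUE"
    append_once "$demo_profile" "export PATH=\$PATH:$TORQUE/bin:$TORQUE/sbin"
done
cat "$demo_profile"
```

With /etc/profile as the target, the result after any number of runs is the same two lines the article adds.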
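Starting as root fails with an error saying the host name the server points to does not match the current host name. I found no solution during installation; a fix I found after installation is at the bottom of this article.
[root@master torque-6.1.2]# ./torque.setup root //try to start as root; error: the pbs_server service is already running
initializing TORQUE (admin: root)
pbs_server already running... run 'qterm' to stop pbs_server and rerun //run qterm to stop the service
[root@master torque-6.1.2]# qterm //the host name the server points to differs from the host name shown in the prompt, so qterm cannot stop it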
Can not resolve name for server mastar. (rc = -2 - )
Cannot resolve specified server host 'mastar'.
qterm: could not connect to server '' (15010) Access from host not allowed, or unknown host
[root@master mom_priv]# ps -e | grep pbs //list the processes, then try stopping the server with kill -9
30505 ? 00:00:00 pbs_server
[root@master mom_priv]# kill -9 30505
[root@master mom_priv]# ps -e | grep pbs
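[root@master torque-6.1.2]# ./torque.setup root //still fails after the server is killed: the host name the server points to does not match the current host name. I confirmed the configure step above was correct:
//'./configure --prefix=/usr/local/torque --with-scp --with-default-server=master' ran without errors. No solution found; I suspect stale cached data on the system.
initializing TORQUE (admin: root) //for now the only workaround is to edit /etc/hosts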
You have selected to start pbs_server in create mode.
If the server database exists it will be overwritten.
do you wish to continue y/(n)?y
Can not resolve name for server mastar. (rc = -2 - )
Cannot resolve specified server host 'mastar'.
qmgr: cannot connect to server (errno=15010) Access from host not allowed, or unknown host
ERROR: cannot set root@master in operators list
Can not resolve name for server mastar. (rc = -2 - )
Cannot resolve specified server host 'mastar'.
qterm: could not connect to server '' (15010) Access from host not allowed, or unknown host
[root@master torque-6.1.2]# vi /etc/hosts //edit /etc/hosts
10.131.101.142 master
10.131.101.142 mastar //add this line
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
[root@master torque-6.1.2]# ./torque.setup root //now succeeds
initializing TORQUE (admin: root)
You have selected to start pbs_server in create mode.
If the server database exists it will be overwritten.
do you wish to continue y/(n)?y //type y
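The workaround depends on both names being present in /etc/hosts. A minimal sketch of a check for that follows; hosts_has_name is a hypothetical helper, and the demonstration uses a scratch file with the entries from the step above rather than the real /etc/hosts.

```shell
#!/bin/sh
# hosts_has_name FILE NAME: succeed if NAME appears as a host name or alias
# (any field after the address) on a line of a hosts-format FILE.
hosts_has_name() {
    awk -v name="$2" '
        { sub(/#.*/, "") }                                   # drop comments
        { for (i = 2; i <= NF; i++) if ($i == name) found = 1 }
        END { exit (found ? 0 : 1) }
    ' "$1"
}

demo_hosts=$(mktemp)
cat > "$demo_hosts" <<'EOF'
10.131.101.142 master
10.131.101.142 mastar
127.0.0.1 localhost localhost.localdomain
EOF

for name in master mastar; do
    if hosts_has_name "$demo_hosts" "$name"; then
        echo "$name: present"
    else
        echo "$name: missing"
    fi
done
```

Pointing the first argument at /etc/hosts gives the same check on the live system.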
Start the pbs_server, pbs_sched, pbs_mom and trqauthd services
[root@master torque-6.1.2]# qterm //stop the server
[root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do service $i start; done
Starting pbs_server (via systemctl): [ OK ]
Starting pbs_sched (via systemctl): [ OK ]
Starting pbs_mom (via systemctl): [ OK ]
Starting trqauthd (via systemctl): [ OK ]
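Specify the compute node
Add the compute node "master" and set its number of CPUs.
The number of CPUs can be checked with "lscpu" or "nproc".
[root@master torque-6.1.2]# vi /var/spool/torque/server_priv/nodes
master np=8 //add this line; no spaces around the equals sign; master is the host name
[root@master torque-6.1.2]# vi /var/spool/torque/mom_priv/config
$pbsserver master //add these two lines; master is the host name
$logevent 255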
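The nodes entry can be generated instead of typed, which avoids stray spaces around the equals sign. This is a sketch only: nodes_line is a hypothetical helper, and using the detected core count is an assumption, since the article sets np=8 by hand.

```shell
#!/bin/sh
# nodes_line HOST NP: print one server_priv/nodes entry with no spaces around '='.
nodes_line() {
    printf '%s np=%s\n' "$1" "$2"
}

# Detect the core count; fall back to getconf where nproc is unavailable.
cores=$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)
nodes_line master "$cores"
```

The printed line is what would go into /var/spool/torque/server_priv/nodes. Passing a fixed number, for example nodes_line master 8, reproduces the entry used in this article.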
Check the PBS processes
[root@master torque-6.1.2]# ps -e | grep pbs
11188 ? 00:00:00 pbs_sched
11215 ? 00:00:00 pbs_mom
29683 ? 00:00:00 pbs_server
[root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do service $i restart; done
Restarting pbs_server (via systemctl): [ OK ]
Restarting pbs_sched (via systemctl): [ OK ]
Restarting pbs_mom (via systemctl): [ OK ]
Restarting trqauthd (via systemctl): [ OK ]
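Note that ps -e | grep pbs cannot show trqauthd, because that name does not contain "pbs". A sketch of a check that covers all four daemons by exact name follows; list_missing is a hypothetical helper, and the demonstration feeds it canned input in place of real ps output.

```shell
#!/bin/sh
# list_missing NAME...: read process names (one per line) on stdin and print
# every NAME that is not among them.
list_missing() {
    running=$(cat)
    for name in "$@"; do
        printf '%s\n' "$running" | grep -qxF -- "$name" || printf '%s\n' "$name"
    done
}

# Canned input standing in for a host where only two daemons are up.
printf 'pbs_server\npbs_mom\n' | list_missing pbs_server pbs_sched pbs_mom trqauthd
```

On a live system the input would come from ps, for example: ps -e -o comm= | list_missing pbs_server pbs_sched pbs_mom trqauthd. Empty output then means all four daemons are running.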
Create the default queue
[root@master torque-6.1.2]# qmgr -c 'create queue master'
[root@master torque-6.1.2]# qmgr -c 'set queue master queue_type= execution'
[root@master torque-6.1.2]# qmgr -c 'set queue master started= true'
[root@master torque-6.1.2]# qmgr -c 'set queue master enabled= true'
[root@master torque-6.1.2]# qmgr -c 'set queue master resources_default.walltime= 240:00:00'
[root@master torque-6.1.2]# qmgr -c 'set queue master resources_default.nodes= 1'
[root@master torque-6.1.2]# qmgr -c 'set server default_queue= master'
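The seven qmgr calls above can also be produced as one batch of directives. The sketch below only prints them; queue_directives is a hypothetical helper, and feeding its output to qmgr on standard input is left as a usage step.

```shell
#!/bin/sh
# queue_directives QUEUE WALLTIME: print qmgr directives, one per line, for an
# execution queue that also becomes the server's default queue.
queue_directives() {
    queue=$1
    walltime=$2
    cat <<EOF
create queue $queue
set queue $queue queue_type = execution
set queue $queue started = true
set queue $queue enabled = true
set queue $queue resources_default.walltime = $walltime
set queue $queue resources_default.nodes = 1
set server default_queue = $queue
EOF
}

queue_directives master 240:00:00
```

Since qmgr reads directives from standard input when no -c option is given, the output can be piped to it as root: queue_directives master 240:00:00 | qmgr. Printing first makes it easy to review the directives before applying them.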
Submit a test job:
[root@master torque-6.1.2]# qnodes //query the state of the compute node
master
state = free
power_state = Running
np = 8
ntype = cluster
status = opsys=linux,uname=Linux master 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64,sessions=3154 3489 41105 41699,nsessions=4,nusers=3,idletime=3198,
totmem=94868512kb,availmem=92195284kb,physmem=32367652kb,ncpus=56,loadave=0.85,gres=,netload=4005925534,state=free,varattr= ,cpuclock=Fixed,macaddr=68:cc:6e:c3:cf:87,version=6.1.2,rectime=1519980694,jobs=
mom_service_port = 15002
mom_manager_port = 15003
[root@master torque-6.1.2]# su master //switch user: this master is a user name, not the host name
[master@master torque-6.1.2]$ echo sleep 10 | qsub
0.master
[master@master torque-6.1.2]$ qstat //query job status
Job ID Name User Time Use S Queue
------------------------- ---------------- --------------- -------- - -----
0.master STDIN master 0 R master
[master@master torque-6.1.2]$ qstat -a -n //show job status and the CPU cores each job occupies
master:
Req'd Req'd Elap
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ --------- --------- - ---------
0.master master master STDIN 12470 1 1 -- 240:00:00 C --
master/0
[master@master torque-6.1.2]$
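Beyond piping a one-liner into qsub, a job is usually submitted as a script with #PBS directives. A minimal sketch follows, assuming the queue named master created earlier. The job name sleep_test and the helper write_test_job are illustrative choices, and the script only writes the job file; it does not submit it.

```shell
#!/bin/sh
# write_test_job PATH: write a small PBS job script to PATH.
write_test_job() {
    cat > "$1" <<'EOF'
#!/bin/sh
#PBS -N sleep_test
#PBS -q master
#PBS -l nodes=1:ppn=1
#PBS -l walltime=00:01:00
sleep 10
echo "finished on $(uname -n)"
EOF
}

job_file=$(mktemp)
write_test_job "$job_file"
cat "$job_file"
```

The file would then be submitted as the non-root user with qsub followed by the file path, and followed with qstat as shown above.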
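Solution to the mismatch between the server's host name and the current host name:
I never found the root cause of this problem, but I suspect a previous Torque installation was not removed cleanly, so data cached at the "Create the default queue" step was still present.
After Torque has been installed successfully, stop Torque.
[root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do service $i stop; done //stop the services: the same loop with start changed to stop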
Stopping pbs_server (via systemctl): [ OK ]
Stopping pbs_sched (via systemctl): [ OK ]
Stopping pbs_mom (via systemctl): [ OK ]
Stopping trqauthd (via systemctl): [ OK ]
[root@master torque-6.1.2]# ./torque.setup root //rerun this step
hostname: master
Currently no servers active. Default server will be listed as active server. Error 15133
Active server name: master pbs_server port is: 15001
trqauthd daemonized - port /tmp/trqauthd-unix
trqauthd successfully started
initializing TORQUE (admin: root)
You have selected to start pbs_server in create mode.
If the server database exists it will be overwritten.
do you wish to continue y/(n)?y //type y
[root@master torque-6.1.2]# vi /var/spool/torque/server_priv/nodes
master np=8 //no spaces around the equals sign
[root@master torque-6.1.2]# qterm //shut down pbs_server (qterm stops only the server; the other daemons keep running)
[root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do service $i start; done //start the services again
Starting pbs_server (via systemctl): [ OK ]
Starting pbs_sched (via systemctl): [ OK ]
Starting pbs_mom (via systemctl): [ OK ]
Starting trqauthd (via systemctl): [ OK ]
[root@master torque-6.1.2]# qnodes //query node status; fails because trqauthd is not running
socket_connect_unix failed: 15137
qnodes: cannot connect to server master, error=15137 (could not connect to trqauthd)
[root@master torque-6.1.2]# for i in pbs_server pbs_sched pbs_mom trqauthd; do service $i restart; done //restart the services
Restarting pbs_server (via systemctl): [ OK ]
Restarting pbs_sched (via systemctl): [ OK ]
Restarting pbs_mom (via systemctl): [ OK ]
Restarting trqauthd (via systemctl): [ OK ]
[root@master torque-6.1.2]# qnodes //query node status; succeeds
master
state = free
power_state = Running
np = 8
ntype = cluster
status = opsys=linux,uname=Linux master 3.10.0-514.el7.x86_64 #1 SMP Tue Nov 22 16:42:41 UTC 2016 x86_64,sessions=3154 3489 10903 41105 41699,nsessions=5,nusers=4,idletime=5287,totmem=94868512kb,
availmem=92236268kb,physmem=32367652kb,ncpus=56,loadave=0.01,gres=,netload=8920006882,state=free,varattr= ,cpuclock=Fixed,macaddr=68:cc:6e:c3:cf:87,version=6.1.2,rectime=1519982783,jobs=
mom_service_port = 15002
mom_manager_port = 15003
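One place worth inspecting when the stale name appears is the server_name file under the PBS home (/var/spool/torque in this build), which records the server name the client commands contact. Whether that file was the cause here is an assumption the article does not confirm. The sketch below compares a file's first line with an expected name; server_name_matches is a hypothetical helper, and the demonstration uses a scratch file.

```shell
#!/bin/sh
# server_name_matches FILE NAME: succeed if the first line of FILE equals NAME.
# Returns 2 when FILE is missing or unreadable.
server_name_matches() {
    [ -r "$1" ] || return 2
    recorded=$(head -n 1 "$1")
    [ "$recorded" = "$2" ]
}

# Scratch file standing in for a server_name file that still holds the old name.
demo_name=$(mktemp)
echo mastar > "$demo_name"
if server_name_matches "$demo_name" master; then
    echo "server_name agrees with the host name"
else
    echo "server_name differs from the host name"
fi
```

On the real host the call would be server_name_matches /var/spool/torque/server_name "$(uname -n)".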