The Health of This Role's Host Is Bad. The Following Health Tests Are Bad: Clock Offset.

Recently a large batch of Tencent Cloud servers suddenly started firing clock offset alerts. For example:

alert summary: ['The health of role NodeManager (hadoop0xx) has become bad.'] ,content: The health test result for NODE_MANAGER_HOST_HEALTH has become bad: The health of this role's host is bad. The following health tests are bad: clock offset.

The first 14 Hadoop servers were fine, but hadoop015 through hadoop020 alerted constantly.
A quick Google search pointed to a clock synchronization problem. For a quick fix, I backed up /etc/ntp.conf on hadoop015 and then copied /etc/ntp.conf from hadoop001 over to hadoop015.
After watching for a while, nothing seemed to change on the server, so I checked the status of the ntp service:

service ntpd status

(screenshot: output of service ntpd status)

The new settings apparently had not taken effect, so I restarted ntpd:

service ntpd stop
service ntpd start

After about a minute, refreshing the Cloudera Manager page showed that the agent on hadoop015 was healthy again.
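To confirm the offset is actually shrinking, rather than only waiting for the Cloudera Manager page to refresh, ntpq can show the peer list and current offset. A quick check (not part of the original steps):

ntpq -p                      # the offset column (in ms) should drop towards 0
ntpdate -q ntp.sjtu.edu.cn   # optional one-off comparison against the time server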

Using the same approach, I updated the ntp configuration on the other affected servers as well.

Afterwards I checked /etc/ntp.conf on hadoop001: the time server is configured as ntp.sjtu.edu.cn, i.e. Shanghai Jiao Tong University's NTP server, which evidently is more reliable than Tencent's own.
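For reference, the part of the configuration that matters is just the server line. A minimal sketch (the iburst option and driftfile path are my assumptions, not copied from hadoop001's actual file):

# /etc/ntp.conf (excerpt)
driftfile /var/lib/ntp/drift
# synchronize against Shanghai Jiao Tong University's public NTP server
server ntp.sjtu.edu.cn iburst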

The following two blog posts were very helpful; see:
https://blog.csdn.net/freedomboy319/article/details/46710255
https://segmentfault.com/a/1190000015682109

Parsing URL Parameters into a JSON DataFrame with Spark

import org.apache.spark.sql.types.{StringType, StructType}
import org.apache.spark.sql.functions._
// already in scope in spark-shell; needed for $"..." and the Map encoder
import spark.implicits._

// schema for the query-string fields we want to keep
val schema = new StructType()
  .add("sns", StringType, true)
  .add("tit", StringType, true)
  .add("e_t", StringType, true)
  .add("product", StringType, true)

// build the input path for the previous hour: /data/nginx/origin/q_gif/<month>/<day>/<yyyyMMddHH>*
val dateFormat = new java.text.SimpleDateFormat("yyyyMMddHHmmss")
val cal = java.util.Calendar.getInstance()
cal.add(java.util.Calendar.HOUR, -1)
val lastHour = dateFormat.format(cal.getTime())
val month = lastHour.substring(0, 6)
val day = lastHour.substring(6, 8)
val hour = lastHour.substring(0, 10)
val path = "/data/nginx/origin/q_gif/" + month + "/" + day + "/" + hour + "*"

val df = spark.read.textFile(path)

// each line is '|'-separated; field 3 is the request URI, and its query string starts after the first 7 characters
val dfArr = df.flatMap { line =>
  val arr = line.split("\\|")
  if (arr != null && arr.length >= 4 && arr(3).length > 7) {
    val argArr = arr(3).substring(7).split("&")
    // keep only well-formed key=value pairs
    val result = argArr.flatMap { argLine =>
      val pair = argLine.split("=")
      if (pair.length == 2) Some((pair(0), pair(1))) else None
    }
    Some(result.toMap)
  } else {
    None
  }
}

// the Dataset[Map[String, String]] column is called "value" by default;
// serialize each map to a JSON string, then parse it back against the schema
val jsonStringDf = dfArr.withColumn("mapfield", to_json($"value")).select("mapfield")
val dfJSON = jsonStringDf.withColumn("jsonData", from_json(col("mapfield"), schema)).select("jsonData.*")
dfJSON.repartition(10).write.mode("overwrite").format("parquet").save("/tmp/data/testgzip")

[Emerg] Duplicate Listen Options for 0.0.0.0:80

Just hit this same issue, but the duplicate default_server directive was not the only cause of this message.

You can only use the backlog parameter on one of the listen directives for a given address:port.

Example
site 1:

server {
    listen 80 default_server backlog=2048;
    server_name www.example.com;
    location / {
        proxy_pass http://www_server;
    }
}

site 2:

server {
    listen 80;  ## DO NOT DUPLICATE THE SETTINGS 'default_server backlog=2048;'
    server_name blogs.example.com;
    location / {
        proxy_pass http://blog_server;
    }
}

backlog can only be set on one of the server blocks for a given listen address and port.
Reference: https://stackoverflow.com/questions/13676809/serving-two-sites-from-one-server-with-nginx

Batch Renaming Files on Linux

Linux ships with a built-in rename command; you can Google its usage, but it is fairly limited. There is a much more powerful Perl version of rename for batch renaming, which unfortunately conflicts with the command that comes with the system.

Don't worry though, follow along.

1 Install the Perl version of rename

yum -y install prename

2 Batch rename files with prename

prename 's/log/log.bak/' *

This replaces the text log with log.bak in every file name.
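prename also has a dry-run flag, which is worth using before touching real files (a quick sketch; -n only prints what would be renamed without doing it):

prename -n 's/log/log.bak/' *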

Installing MySQL on Linux

Check whether the MySQL that ships with CentOS is already installed:

yum list installed | grep mysql

Uninstall any previously installed MySQL:

yum -y remove mysql-libs.x86_64

Check the MySQL versions available in the yum repositories (the CentOS machine needs a working network connection):

yum list | grep mysql    # or: yum -y list mysql*

Install (on CentOS 7 the yum repositories provide MariaDB in place of MySQL):

yum install mariadb-server mariadb
systemctl start mariadb    # start the service
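Optionally (not part of the original steps), enable MariaDB to start on boot and confirm it is running:

systemctl enable mariadb
systemctl status mariadb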

Change the root password:

In one terminal, run: sudo mysqld_safe --skip-grant-tables --skip-networking &
In another terminal, log in with: mysql -u root

Then change the password in one of the following ways:

update mysql.user set PASSWORD=PASSWORD('123456') where user='root' and host='localhost';
update mysql.user set authentication_string=PASSWORD('123456') where user='root' and host='localhost';
flush privileges;
exit;

ALTER USER 'root'@'localhost' IDENTIFIED BY '123456';
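Which statement applies depends on the server version: older releases store the password in the Password column, newer ones in authentication_string, and ALTER USER ... IDENTIFIED BY is the supported form on MySQL 5.7+ / MariaDB 10.2+. After the change, stop the mysqld_safe instance and start the service normally.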

Check the MySQL version:

select version();

Create a test database:

create database test character set utf8;

Create a user and grant privileges:

create user test@localhost identified by '123456';
grant all privileges on *.* to test@localhost identified by '123456';
grant all privileges on *.* to test@'%' identified by '123456';
flush privileges;

Check:

[root@snails ~]# netstat -ano |grep 3306
tcp 0 0 0.0.0.0:3306 0.0.0.0:* LISTEN off (0.00/0/0)
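You can also verify that the new account actually works (a quick check, using the user and password created above):

mysql -u test -p123456 -e "select version();"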

Tuning LogServer Performance

1 TCP: out of memory – consider tuning tcp_mem

2018/05/08 08:56:29 [error] 27#27: *225448 readv() failed (104: Connection reset by peer) while reading upstream, client: 47.xx.xx.43, server: cdn.xxx.com, request: "GET /home/css/img/area-214f1779.png HTTP/1.1", upstream: "http://xx.xx.xx.xx:80/home/css/img/area-214f1779.png", host: "cdn.xxx.com", referrer: "http://cdn.xxx.com/home/css/index-1.0.9.css"

This may be caused by the TCP stack running out of memory for connections.
A more detailed way to inspect it is sketched below.
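A minimal sketch of how to check and raise the limits (the values here are illustrative, not recommendations from the original post):

# sockets currently in use vs. the tcp_mem thresholds (all counted in pages)
cat /proc/net/sockstat
cat /proc/sys/net/ipv4/tcp_mem

# raise the low / pressure / high thresholds; persist in /etc/sysctl.conf if it helps
sysctl -w net.ipv4.tcp_mem="383865 511820 767730"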

2 time_wait
https://blog.51cto.com/hld1992/2285410
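A common way to look at TIME_WAIT pressure and the related kernel knobs (again a hedged sketch, not the exact settings from the linked post):

# count connections by state; a very large TIME_WAIT count is the symptom
ss -ant | awk 'NR>1 {print $1}' | sort | uniq -c

# knobs usually discussed for TIME_WAIT
sysctl net.ipv4.tcp_tw_reuse net.ipv4.tcp_fin_timeout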

Installing and Deploying Zabbix

Zabbix installation

Install by following the instructions at https://www.zabbix.com/download.

Notes

  1. When installing with yum -y install zabbix-server-mysql zabbix-web-mysql zabbix-nginx-conf zabbix-agent, the command may fail. Just retry a few times; each attempt downloads a bit more and it eventually completes.
  2. For the listen port in /etc/nginx/conf.d/zabbix.conf, avoid port 80 if you can. Many programs default to 80, so changing it to another port avoids conflicts (see the sketch after this list).
  3. server_name can be left unset for now, i.e. kept commented out.
  4. In /etc/php-fpm.d/zabbix.conf, uncomment and set the right timezone for you; Asia/Shanghai is the usual choice here.
  5. The default username and password are Admin / zabbix.
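A minimal sketch of the relevant lines in /etc/nginx/conf.d/zabbix.conf (8080 is an arbitrary example port, not a value from the original notes):

# /etc/nginx/conf.d/zabbix.conf (excerpt)
server {
    listen 8080;                       # anything other than 80 to avoid conflicts
    # server_name zabbix.example.com;  # can stay commented out initially
    ...
}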

Adding DingTalk alerts

Administration -> Media types -> Create media type

Reference: https://www.cnblogs.com/yinzhengjie/p/10372566.html
Reference: https://blog.51cto.com/m51cto/2051945
Reference: https://segmentfault.com/q/1010000003894661
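The media type ultimately calls an alert script that posts to a DingTalk robot webhook. A minimal sketch of such a script (the script path and access_token are placeholders; the scripts in the linked posts are more complete):

#!/bin/bash
# place this under the AlertScriptsPath configured in zabbix_server.conf (exact path is an assumption)
# Zabbix passes the recipient, subject and message as $1, $2, $3
WEBHOOK="https://oapi.dingtalk.com/robot/send?access_token=YOUR_TOKEN"
curl -s -H 'Content-Type: application/json' \
     -d "{\"msgtype\": \"text\", \"text\": {\"content\": \"$2\n$3\"}}" \
     "$WEBHOOK"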

Increasing the Replication Factor of a Kafka Topic

1. Check the topic's current replica distribution

[hadoop@hadoop006 ~]$ kafka-topics --zookeeper hadoop002:2181 --describe --topic tracker_view

2. Write the JSON file for adding replicas to the topic

vim addReplicasToTracker_view.json
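The file follows the standard kafka-reassign-partitions format; a minimal sketch of what it typically looks like (the partition count and broker ids below are made up, not the actual assignment for this cluster):

{
  "version": 1,
  "partitions": [
    {"topic": "tracker_view", "partition": 0, "replicas": [1, 2]},
    {"topic": "tracker_view", "partition": 1, "replicas": [2, 3]}
  ]
}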

Read More