基于海明距离的加权平均值人职匹配模型(Sqlserver2014/16内存表实现)
最近给某大学网站制作一个功能,需要给全校所有的学生提供就业单位发布职位的自动匹配,学生登陆就业网,就可以查看适合自己的职位,进而可以在线投递。
全校有几万名学生,注册企业发布的职位也有上万,如何在很短的时间内(不影响学生访问网站),通过建立好的匹配模型迅速的对学生——职位进行匹配?
- 建模篇
我以前给银行开发过房地产自动估价软件,按照标准做法是用欧几里得贴近算法或者海明贴近度,但是那种算法太复杂,属于应用数学的范畴,需要依靠精准的建模。我们就业办的老师是在实战上打拼的,没有高深的理论基础,所以建模必须简单。根据调研,发现使用海明距离的结合加权的算法比较简单,也容易转化成匹配度的百分比。具体算法出自2010年3月《计算机工程》中的《模糊匹配中的匹配度计算方法》:
这种算法的基本思路是每个项目都有一定的权重分值,然后按照学生和职位每个项目的匹配度,乘以权重分值再除以总的权重,就是学生对于每个职位的匹配情况。
学生在就业网上设置过职位搜索器的,按照职位搜索器中设置的项目进行匹配,有这么些项目:
比如“薪资”一项的匹配方法如下:
如果学生没有设置过搜索器的,则使用另一套维度,其中使用了就业数据的大数据分析方法,介绍略。
- 实现篇
网站基于.net C# + Sqlserver,每次要对所有的有效职位进行匹配,如此大的计算量使用传统方法肯定会慢,甚至造成性能瓶颈。因此思考采用两种方式提高计算效率:
- 使用MongoDB。
- 使用Sqlserver2014或者SqlServer2016的内存优化表。
MongoDB是典型的NoSQL数据库,交换数据是json格式,这种数据库存取的速度非常快,没有Sqlserver那些复杂的权限、并发、锁、存储引擎,因此很适合作为高吞吐量的数据存储方式;
微软在Sqlserver2014和Sqlserver2016中开发了内存优化表和本地编译存储过程,两者也有很好的性能表现
(顺便吐槽一下网上有人说Oracle一句命令就可以把表升级为内存表,一句命令就可以把存储过程升级成本地编译存储过程,而Sqlserver这方面限制太多,内存表不能建索引、不能建Check....(2016版可以),而本地编译存储过程的限制更是多得多,不能用function,不能用游标,不能用链接数据库........。我想这是两种数据库不同的实现机制形成的,在《SQL编程风格》151页中描述:“T-SQL是一个简单的一遍扫描的编译器,以C和Algol语言模型创建......Oracle中的PL/SQL是以ADA和SQL/PSM为模型创建的,它是一种复杂语言,可以用来开发应用程序。”所以Oralce的存储过程要升级简直易如反掌。)
我们还是先选择SqlServer的内存表作为数据缓冲池,本来想使用本地编译存储过程实现模型匹配算法,但是限制实在太多,所以只好使用普通的存储过程。每天有批处理把职位数据同步到内存表里,然后学生登录后进行计算,每周还进行职位的推送。
应大家要求公布算法代码:(因为和学校签订过保密协议,所以删除部分行,请谅解)
基本的建表、初始化数据脚本:
--建表
CREATE DATABASE [DataAnalysis]
CONTAINMENT = NONE
ON PRIMARY
( NAME = N'DataAnalysis', FILENAME = N'd:\DATA\DataAnalysis.mdf' , SIZE = 5120KB , MAXSIZE = UNLIMITED, FILEGROWTH = 1024KB ),
FILEGROUP [DataAnalysisFileGroup] CONTAINS MEMORY_OPTIMIZED_DATA DEFAULT
( NAME = N'DataAnalysisContainer', FILENAME = N'd:\DATA\HashCollisionsContainer' , MAXSIZE = UNLIMITED)
LOG ON
( NAME = N'DataAnalysis_log', FILENAME = N'd:\DATA\DataAnalysis_log.ldf' , SIZE = 2304KB , MAXSIZE = 2048GB , FILEGROWTH = 10%)
GO ALTER DATABASE [DataAnalysis] SET COMPATIBILITY_LEVEL = 120
GO Use [DataAnalysis]
GO if exists(select * from sysobjects where id=object_id('EnterPrisePositions'))
DROP TABLE EnterPrisePositions
GO CREATE TABLE EnterPrisePositions
(
[ID] [int] IDENTITY(1,1) NOT NULL Primary Key NONCLUSTERED HASH WITH (BUCKET_COUNT = 4096),
[EntID] uniqueidentifier NOT NULL,
[EntUserID] uniqueidentifier NOT NULL,
[EntName] [nvarchar](80) NOT NULL,
[PosiID] [uniqueidentifier] NOT NULL,
[PosiName] [nvarchar](40) NULL,
[JobTypeID] [uniqueidentifier] NULL,
--.......
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY)
GO INSERT INTO [EnterPrisePositions] ([EntID], [EntUserID], [EntName], [OrgCode]..........)
SELECT e.[EntID], p.[EntUserID], e.[EntName], e.[OrgCode], e.[HyID], e.[SubHyID], e.[ThirdHyID],..........
FROM .....[dbo].[Position] p INNER JOIN ......[dbo].[Enterprise] e ON p.[EntUserID] = e.[EntUserID]
WHERE p.DelFlag <> 1 AND p.EffectiveDate <= GetDate() AND GetDate() <= p.ExpiryDate AND e.[CheckFlag] = 1 AND e.[DelFlag] = 0 AND e.[IsBlack] = 0
GO if exists(select * from sysobjects where id=object_id('UserPositionResult'))
DROP TABLE UserPositionResult
GO CREATE TABLE UserPositionResult
(
[ID] [int] IDENTITY(1,1) NOT NULL Primary Key NONCLUSTERED HASH WITH (BUCKET_COUNT = 4096),
[Score] FLOAT,
[PosiID] [uniqueidentifier] NOT NULL,
[PosiName] [nvarchar](40) NULL,
--............
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY)
GO --辅助函数
-- =============================================
-- Author: Ben
-- Create date: 2016-08-08
-- Description: 处理专业ID用
-- =============================================
CREATE FUNCTION [GetPositionSpecialty] (
@String NVARCHAR(150)
) RETURNS @temptable TABLE (ID INT IDENTITY(1,1), Specialty NVARCHAR(8)) AS
BEGIN DECLARE @idx INT=1
DECLARE @slice NVARCHAR(150)
DECLARE @Delimiter NVARCHAR(1) = ','
IF LEN(@String) < 1 OR LEN(ISNULL(@String,'')) = 0
BEGIN
INSERT INTO @temptable(Specialty) VALUES('')
RETURN
END
WHILE @idx != 0
BEGIN
SET @idx = CHARINDEX(@Delimiter,@String)
IF @idx != 0
SET @slice = LEFT(@String,@idx - 1)
ELSE
SET @slice = @String
IF LEN(@slice) > 0
INSERT INTO @temptable(Specialty) VALUES(@slice)
SET @String = RIGHT (@String, LEN(@String) - @idx)
IF LEN(@String) = 0
BREAK
END
RETURN
END GO -- =============================================
-- Author: Ben
-- Create date: 2016-08-08
-- Description: 处理职位搜索器用
-- =============================================
CREATE PROCEDURE [GetStudentSearch]
-- Add the parameters for the stored procedure here
@Ssqtj NVARCHAR(1000), @ProvinceId NCHAR(6) OUTPUT, @Zydm NVARCHAR(10) OUTPUT, @JobNature NVARCHAR(8) OUTPUT, @JobTypeID uniqueidentifier OUTPUT, @SubJobTypeID uniqueidentifier OUTPUT,
@Salary NVARCHAR(30) OUTPUT, @Computer NVARCHAR(50) OUTPUT, @Language NVARCHAR(50) OUTPUT, @Education NVARCHAR(4) OUTPUT, @HyID uniqueidentifier OUTPUT, @SubHyID uniqueidentifier OUTPUT,
@ThirdHyID uniqueidentifier OUTPUT
AS
BEGIN
DECLARE @idx INT=1, @StartPos INT
DECLARE @Delimiter NVARCHAR(1) = '|' IF LEN(@Ssqtj) < 1 OR LEN(ISNULL(@Ssqtj, '')) = 0
RETURN SET @idx = CHARINDEX(@Delimiter, @Ssqtj, @idx)
IF @idx != 0
SET @ProvinceId = LEFT(@Ssqtj, @idx - 1) SET @StartPos = @idx + 1
SET @idx = CHARINDEX(@Delimiter, @Ssqtj, @StartPos)
IF @idx != 0
SET @Zydm = SUBSTRING(@Ssqtj, @StartPos, @idx - @StartPos) SET @StartPos = @idx + 1
SET @ThirdHyID = SUBSTRING(@Ssqtj, @StartPos, 36)
END TRY
BEGIN CATCH
END CATCH
END
GO --重要,参数表
if exists(select * from sysobjects where id=object_id('Parameters'))
DROP TABLE Parameters
GO
CREATE TABLE [Parameters]
(
[ID] [int] IDENTITY(1,1) NOT NULL Primary Key NONCLUSTERED HASH WITH (BUCKET_COUNT = 1024),
[Type] NVARCHAR(50) NOT NULL,
[Set1] NVARCHAR(50) NULL,
[Set2] NVARCHAR(50) NULL,
[Value] FLOAT NOT NULL
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_AND_DATA)
GO INSERT INTO [Parameters] ([Type], [Set1], [Set2], [Value])
SELECT 'Salary', '面议', '面议', 10
UNION ALL
SELECT 'Salary', '面议', '1500~2000', 10
UNION ALL
SELECT 'Salary', '面议', '2000~3000', 10
UNION ALL
SELECT 'Salary', '面议', '3000~4000', 10
UNION ALL
SELECT 'Salary', '面议', '4000~5000', 10
UNION ALL
SELECT 'Salary', '面议', '5000~6000', 10
UNION ALL
SELECT 'Salary', '面议', '6000~7000', 10
UNION ALL
SELECT 'Salary', '面议', '7000~8000', 10
UNION ALL
SELECT 'Salary', '面议', '8000以上', 10 UNION ALL
SELECT 'Salary', '1500~2000', '面议', 10
UNION ALL
SELECT 'Salary', '1500~2000', '1500~2000', 10
UNION ALL
SELECT 'Salary', '1500~2000', '2000~3000', 5
UNION ALL
SELECT 'Salary', '1500~2000', '3000~4000', 2
UNION ALL
SELECT 'Salary', '1500~2000', '4000~5000', 2
UNION ALL
SELECT 'Salary', '1500~2000', '5000~6000', 2
UNION ALL
SELECT 'Salary', '1500~2000', '6000~7000', 4
UNION ALL
SELECT 'Salary', '1500~2000', '7000~8000', 3
UNION ALL
SELECT 'Salary', '1500~2000', '8000以上', 2 UNION ALL
SELECT 'Salary', '2000~3000', '面议', 10
UNION ALL
SELECT 'Salary', '2000~3000', '1500~2000', 10
UNION ALL
SELECT 'Salary', '2000~3000', '2000~3000', 10
UNION ALL
SELECT 'Salary', '2000~3000', '3000~4000', 6
UNION ALL
SELECT 'Salary', '2000~3000', '4000~5000', 5
UNION ALL
SELECT 'Salary', '2000~3000', '5000~6000', 4
UNION ALL
SELECT 'Salary', '2000~3000', '6000~7000', 5
UNION ALL
SELECT 'Salary', '2000~3000', '7000~8000', 4
UNION ALL
SELECT 'Salary', '2000~3000', '8000以上', 3 UNION ALL
SELECT 'Salary', '3000~4000', '面议', 10
UNION ALL
SELECT 'Salary', '3000~4000', '1500~2000', 10
UNION ALL
SELECT 'Salary', '3000~4000', '2000~3000', 10
UNION ALL
SELECT 'Salary', '3000~4000', '3000~4000', 10
UNION ALL
SELECT 'Salary', '3000~4000', '4000~5000', 8
UNION ALL
SELECT 'Salary', '3000~4000', '5000~6000', 6
UNION ALL
SELECT 'Salary', '3000~4000', '6000~7000', 6
UNION ALL
SELECT 'Salary', '3000~4000', '7000~8000', 5
UNION ALL
SELECT 'Salary', '3000~4000', '8000以上', 4 --........ UNION ALL
SELECT 'Weight', 'HasNoSearch', 'Education', 20
UNION ALL
SELECT 'Weight', 'HasNoSearch', 'Profession', 20
UNION ALL
SELECT 'Weight', 'HasNoSearch', 'Industry', 8
UNION ALL
SELECT 'Weight', 'HasNoSearch', 'Enterprise', 12
GO --其他临时表,大数据分析
CREATE TABLE IndutryRanking
(
Gzydm NVARCHAR(10) COLLATE Chinese_PRC_Stroke_90_BIN2 NOT NULL, SubIndustry uniqueidentifier NOT NULL, Ranking TINYINT NOT NULL,
CONSTRAINT [PK_IndutryRanking] PRIMARY KEY NONCLUSTERED HASH
(
Gzydm ,
SubIndustry
)WITH ( BUCKET_COUNT = 2048) CREATE TABLE EnterpriseRanking
(
Gzydm NVARCHAR(10) COLLATE Chinese_PRC_Stroke_90_BIN2 NOT NULL, Zzjgdm NVARCHAR(10) COLLATE Chinese_PRC_Stroke_90_BIN2 NOT NULL, Ranking FLOAT NOT NULL,
CONSTRAINT [PK_EnterpriseRanking] PRIMARY KEY NONCLUSTERED HASH
(
Gzydm ,
Zzjgdm
)WITH ( BUCKET_COUNT = 131072)
) WITH (MEMORY_OPTIMIZED = ON, DURABILITY = SCHEMA_ONLY)
GO INSERT INTO EnterpriseRanking (Gzydm, Zzjgdm, Ranking)
SELECT Gzydm, Zzjgdm, LOG(Count(*)+1, 1.8)
FROM ..........[dbo].[vm_AllSyEmployment]
WHERE Gzydm IS NOT NULL AND ZZJGDM IS NOT NULL
GROUP BY Gzydm,Zzjgdm
ORDER BY Count(*) DESC GO CREATE FUNCTION [dbo].[GetCurrentBynf]()
RETURNS char(4)
AS
BEGIN
-- Declare the return variable here
DECLARE @Bynf int -- Add the T-SQL statements to compute the return value here
SELECT @Bynf = YEAR(GETDATE()) IF MONTH(GETDATE()) >= 9 OR ( MONTH(GETDATE()) = 8 AND DAY(GETDATE()) >25 )
BEGIN
SET @Bynf = @Bynf + 1
END RETURN CAST(@Bynf AS char(4))
END GO
做个批处理,每次重启数据库的时候把数据加入,每天定时更新相关数据表
职位匹配脚本:
CREATE PROCEDURE RetirePositionsByXsxh
--参数
-- Add the parameters for the stored procedure here
@Xsxh NVARCHAR(20), @ResultType INT = 0, @PageSize INT = 99999, @StartPage INT = 0, @ReleaseDateRange SMALLINT = 9999
WITH ENCRYPTION
AS
BEGIN
-- SET NOCOUNT ON added to prevent extra result sets from
-- interfering with SELECT statements.
SET NOCOUNT ON; --声明变量
-- Insert statements for procedure here
DECLARE @Ssqtj NVARCHAR(1000), @TableRows int, @PositionRows int,
@ProvinceId NCHAR(6), @Zydm NVARCHAR(10), @JobNature NVARCHAR(8), @JobTypeID uniqueidentifier, @SubJobTypeID uniqueidentifier,@Salary NVARCHAR(30),
@Computer NVARCHAR(50), @Language NVARCHAR(50), @Education NVARCHAR(4), @HyID uniqueidentifier, @SubHyID uniqueidentifier, @ThirdHyID uniqueidentifier,
--........ DECLARE @StudentSsqtj TABLE (SsqtjID INT IDENTITY(1,1) NOT NULL Primary Key, Ssqtj NVARCHAR(1000))
DECLARE @PositionSpecialty TABLE (ID INT NOT NULL Primary Key, Specialty NVARCHAR(1000))
--........... INSERT INTO @StudentSsqtj (Ssqtj)
SELECT Ssqtj
FROM .......[dbo].[PosiSearch] WITH (SNAPSHOT)
WHERE Xsxh = @Xsxh SELECT @TableRows = Count(*) FROM @StudentSsqtj
SELECT @PositionRows = Count(*) FROM [EnterPrisePositions] IF @TableRows > 0
BEGIN
SELECT @Salary_Weight = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Weight' AND [Set1] = 'HasSearch' AND [Set2] = 'Salary'
SELECT @JobLocation_Weight = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Weight' AND [Set1] = 'HasSearch' AND [Set2] = 'JobLocation'
SELECT @Education_Weight = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Weight' AND [Set1] = 'HasSearch' AND [Set2] = 'Education'
--........ SELECT @JobLocation_Same_Province = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Province' AND [Set1] = 'SameProvince'
SELECT @JobLocation_Same_City = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Province' AND [Set1] = 'SameCity'
SELECT @JobLocation_Not_Same = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Province' AND [Set1] = 'NotSame'
SELECT @Education_Same = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Education' AND [Set1] = 'Same'
SELECT @Education_Not_Same = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Education' AND [Set1] = 'NotSame'
--.......... WHILE @TableRows != 0
BEGIN
SELECT @Ssqtj = Ssqtj
FROM @StudentSsqtj
WHERE SsqtjID = @TableRows --取职位搜索器
EXEC [GetStudentSearch] @Ssqtj, @ProvinceId OUTPUT, @Zydm OUTPUT, @JobNature OUTPUT, @JobTypeID OUTPUT , @SubJobTypeID OUTPUT, @Salary OUTPUT,
@Computer OUTPUT, @Language OUTPUT, @Education OUTPUT, @HyID OUTPUT, @SubHyID OUTPUT,@ThirdHyID OUTPUT WHILE @PositionRows != 0
BEGIN --取每个职位
SELECT @Pos_HyID = [HyID], @Pos_SubHyID = [SubHyID], @Pos_ThirdHyID = --.......
FROM [EnterPrisePositions]
WHERE ID = @PositionRows AND DATEDIFF(hour, ReleaseDate, GETDATE()) <= 24 * @ReleaseDateRange IF @@ROWCOUNT <> 0
BEGIN
SET @Salary_Score = NULL
SET @JobLocation_Score = NULL
SET @Education_Score = NULL
SET @Profession_Score = NULL
SET @Industry_Score = NULL
SET @JobNature_Score = NULL
SET @JobType_Score = NULL
SET @Computer_Score = NULL
SET @Language_Score = NULL
SET @CurrentValue = 0
SET @WeightSummary = 0
SET @ParaValue = NULL
SET @CurrentScore = NULL --计算薪资 SELECT @ParaValue = [Value] FROM [Parameters] WHERE [Type] = 'Salary' AND [Set1] = @Pos_Salary AND [Set2] = @Salary
IF @ParaValue IS NOT NULL
BEGIN
SET @Salary_Score = ABS(10 - @ParaValue)*@Salary_Weight
END IF @ProvinceId <> ''
BEGIN
IF (LEFT(@Pos_ProvinceId,4) = LEFT(@ProvinceId, 4) OR (
LEFT(@Pos_ProvinceId,2) = LEFT(@ProvinceId, 2) AND LEFT(@Pos_ProvinceId,2) IN ('','','','')))
AND @JobLocation_Same_City IS NOT NULL
BEGIN
SET @JobLocation_Score = ABS(10 - @JobLocation_Same_City)*@JobLocation_Weight
END
ELSE IF LEFT(@Pos_ProvinceId,2) = LEFT(@ProvinceId, 2) AND @JobLocation_Same_Province IS NOT NULL
BEGIN
SET @JobLocation_Score = ABS(10 - @JobLocation_Same_Province)*@JobLocation_Weight
END
ELSE IF @JobLocation_Not_Same IS NOT NULL
BEGIN
SET @JobLocation_Score = ABS(10 - @JobLocation_Not_Same)*@JobLocation_Weight
END
END --计算学历
IF @Education NOT IN ('', '不限')
BEGIN
IF @Pos_Education = @Education
BEGIN
SET @Education_Score = ABS(10 - @Education_Same) * @Education_Weight
END
ELSE
BEGIN
SET @Education_Score = ABS(10 - @Education_Not_Same)*@Education_Weight
END
END --计算专业
IF @Zydm <> ''
BEGIN
DELETE FROM @PositionSpecialty
INSERT INTO @PositionSpecialty (ID, Specialty)
SELECT ID, Specialty FROM [GetPositionSpecialty](@Pos_SpecialtyIds)
SELECT @Specialty_Rows = Count(*) FROM @PositionSpecialty
WHILE @Specialty_Rows <> 0
BEGIN
SELECT @Pos_SpecialtyId = Specialty FROM @PositionSpecialty WHERE ID = @Specialty_Rows
IF @Pos_SpecialtyId = '' OR LEFT(@Pos_SpecialtyId,2) = LEFT(@Zydm,2)
BEGIN
SET @Profession_Score = ABS(10 - @Profession_Match) * @Profession_Weight
END
ELSE
BEGIN
IF @Profession_Score <> 0 OR @Profession_Score IS NULL
SET @Profession_Score = ABS(10 - @Profession_Not_Match) * @Profession_Weight
END
SET @Specialty_Rows = @Specialty_Rows - 1
END END --其他略 --计算匹配度,加到匹配度临时表
IF @Salary_Score IS NOT NULL
BEGIN
SET @CurrentValue = @CurrentValue + @Salary_Score
SET @WeightSummary = @WeightSummary + @Salary_Weight
END
IF @JobLocation_Score IS NOT NULL
BEGIN
SET @CurrentValue = @CurrentValue + @JobLocation_Score
SET @WeightSummary = @WeightSummary + @JobLocation_Weight
END
IF @Education_Score IS NOT NULL
BEGIN
SET @CurrentValue = @CurrentValue + @Education_Score
SET @WeightSummary = @WeightSummary + @Education_Weight
END IF @WeightSummary != 0
BEGIN
SET @CurrentScore = @CurrentValue / @WeightSummary
SET @ParaValue = NULL
SELECT @ParaValue = [Score] FROM @PositionScore WHERE ID = @PositionRows
BEGIN
IF @ParaValue IS NULL
BEGIN
INSERT INTO @PositionScore(ID, Score, Salary_Score, Province_Score, Education_Score , Profession_Score , Industry_Score , JobNature_Score , JobType_Score , Computer_Score , Language_Score)
VALUES (@PositionRows, @CurrentScore, @Salary_Score, @JobLocation_Score,@Education_Score , @Profession_Score , @Industry_Score , @JobNature_Score , @JobType_Score , @Computer_Score , @Language_Score)
END
ELSE
BEGIN
IF @CurrentScore < @ParaValue
BEGIN
UPDATE @PositionScore SET Score = @CurrentScore WHERE ID = @PositionRows
END
END
END
END
END
SET @PositionRows = @PositionRows - 1
END --SELECT @ProvinceId, @Zydm, @JobNature, @JobTypeID, @SubJobTypeID,@Salary, @Computer, @Language , @Education, @HyID, @SubHyID, @ThirdHyID
SET @TableRows = @TableRows - 1
END
END
ELSE --学生没有建立职位搜索器
BEGIN
SELECT @Top1_Industry = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Industry' AND [Set1] = 'Top1'
SELECT @Top2_Industry = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Industry' AND [Set1] = 'Top2'
--.......
SELECT @Enterprise_Weight = [Value] FROM [Parameters] WITH (SNAPSHOT) WHERE [Type] = 'Weight' AND [Set1] = 'HasNoSearch' AND [Set2] = 'Enterprise' SELECT @Education = (CASE Xldm WHEN '' THEN '研究生' WHEN '' THEN '研究生' WHEN '' THEN '本科' WHEN '' THEN '高职' ELSE '' END), @Zydm = Gzydm
FROM .......[dbo].[StudentBasic] WITH (SNAPSHOT)
WHERE Xsxh = @Xsxh WHILE @PositionRows != 0
BEGIN --基本处理方法同前,也是一个个项目 SELECT @Pos_SubHyID = [SubHyID], @Pos_Education = [Education], @Pos_SpecialtyIds = [SpecialtyIds], @Pos_Zzjgdm = [OrgCode]
FROM [EnterPrisePositions] WITH(SNAPSHOT)
WHERE ID = @PositionRows
AND DATEDIFF(hour, ReleaseDate, GETDATE()) <= 24 * @ReleaseDateRange IF @@ROWCOUNT <> 0
BEGIN
SET @Education_Score = NULL
SET @Profession_Score = NULL
SET @Industry_Score = NULL
SET @Enterprise_Score = NULL --.......... IF @WeightSummary != 0
BEGIN
SET @CurrentScore = @CurrentValue / @WeightSummary
SET @ParaValue = NULL
SELECT @ParaValue = [Score] FROM @PositionScore WHERE ID = @PositionRows
BEGIN
IF @ParaValue IS NULL
BEGIN
INSERT INTO @PositionScore(ID, Score, Education_Score , Profession_Score , Industry_Score , Enterprise_Score)
VALUES (@PositionRows, @CurrentScore, @Education_Score , @Profession_Score , @Industry_Score , @Enterprise_Score)
END
ELSE
BEGIN
IF @CurrentScore < @ParaValue
BEGIN
UPDATE @PositionScore SET Score = @CurrentScore WHERE ID = @PositionRows
END
END
END
END
END SET @PositionRows = @PositionRows - 1
END --SELECT @Education, @Zydm END --根据各种输入参数输出
IF @ResultType = 1
SELECT s.*, p.*
FROM [EnterPrisePositions] P WITH (SNAPSHOT) LEFT OUTER JOIN @PositionScore S ON p.ID = s.ID
WHERE s.Score IS NOT NULL
ORDER BY Score
offset @PageSize*@StartPage rows fetch next @PageSize rows only --Sql2012的新的分页特性,效率很高
ELSE IF @ResultType = 2
BEGIN
if exists(select * from sysobjects where id=object_id('UserPositionResult'))
DROP TABLE UserPositionResult CREATE TABLE UserPositionResult
(
[ID] [int] IDENTITY(1,1) NOT NULL Primary Key NONCLUSTERED HASH WITH (BUCKET_COUNT = 4096),
[Score] FLOAT,
[PosiID] [uniqueidentifier] NOT NULL,
[PosiName] [nvarchar](40) NULL,
[EntName] [nvarchar](80) NOT NULL,
--...... WHERE s.Score IS NOT NULL
ORDER BY Score
END
ELSE
SELECT s.Score, p.PosiID, p.PosiName, p.EntName, p.EntUserID, p.JobNature, p.Number, p.Salary, p.Education, p.Specialty
FROM [EnterPrisePositions] P WITH (SNAPSHOT) LEFT OUTER JOIN @PositionScore S ON p.ID = s.ID
WHERE s.Score IS NOT NULL
ORDER BY Score
offset @PageSize*@StartPage rows fetch next @PageSize rows only --输出总记录数
SELECT Count(*) AS [TotalCount] FROM [EnterPrisePositions] P WITH (SNAPSHOT) LEFT OUTER JOIN @PositionScore S ON p.ID = s.ID
WHERE s.Score IS NOT NULL
END
GO
Sqlserver内存表的表现还是令人满意的,所有的职位匹配都计算一遍1秒还不到。想想哈希桶的威力是比较大,以前是B树索引,现在直接把时间复杂度近似降低到了O(1),对再大的数据量也是如此(需要设置合适的哈希桶数值)。
计算时的截图:
加大数据样本,到了上万条,也是1秒钟搞定。
心得:
- 对于不需要持久化的数据库,Sqlserver的内存表是最佳选择,建表的时候使用DURABILITY = SCHEMA_ONLY选项,不写日志,读写速度扛扛的;
- 长的存储过程,千万不要使用游标,性能极其低下,而且或产生一大堆的锁,影响其他进程。(改用While循环);
- 对内存表读取使用SnapShot隔离级别;对普通读取,实时数据准确度要求不高的情况下(比如数据分析)使用nolock隔离级别。
前台页面显示样式如下:
我们还做了邮件推送,定期给学生推送职位。
希望这篇文章起到抛砖引玉的作用,能够听取大家的建议。
基于海明距离的加权平均值人职匹配模型(Sqlserver2014/16内存表实现)的更多相关文章
- SpringBoot入门教程(五)Java基于MySQL实现附近的人
“附近的人”这个功能估计都不陌生,与之类似的功能最开始是在各大地图应用上接触过,比如搜附近的电影院,附近的超市等等.然而真正让附近的人火遍大江南北的应该是微信"附近的人"这个功能, ...
- 【佛山市选2013】JZOJ2020年8月7日提高组T3 海明距离
[佛山市选2013]JZOJ2020年8月7日提高组T3 海明距离 题目 描述 对于二进制串a,b,他们之间的海明距离是指两个串异或之后串中1的个数.异或的规则为: 0 XOR 0 = 0 1 XOR ...
- 海量数据相似度计算之simhash和海明距离
通过 采集系统 我们采集了大量文本数据,但是文本中有很多重复数据影响我们对于结果的分析.分析前我们需要对这些数据去除重复,如何选择和设计文本的去重算法?常见的有余弦夹角算法.欧式距离.Jaccard相 ...
- 64. 海明距离(Hamming Distance)
[本文链接] http://www.cnblogs.com/hellogiser/p/hamming-distance.html [介绍] 在信息领域,两个长度相等的字符串的海明距离是在相同位置上不同 ...
- 在线白板,基于socket.io的多人在线协作工具
首发:个人博客,更新&纠错&回复 是昨天这篇博文留的尾巴,socket.io库的使用练习,成品地址在这里. 代码已经上传到github,传送门.可以开俩浏览器看效果. 现实意义是俩人在 ...
- Matlab计算两集合间的海明距离
一.问题描述 B1[1 2 3 4 5 6 7 8 9] B2[12 13 14 21 31 41 51 1 1 81 1 1] 两个十进制矩阵,行数不一样,分别是n1和n2,列数必须一致,为nwo ...
- 使用simhash以及海明距离判断内容相似程度
算法简介 SimHash也即相似hash,是一类特殊的信息指纹,常用来比较文章的相似度,与传统hash相比,传统hash只负责将原始内容尽量随机的映射为一个特征值,并保证相同的内容一定具有相同的特征值 ...
- python海明距离 - 5IVI4I_I_60Y的日志 - 网易博客
python海明距离 - 5IVI4I_I_60Y的日志 - 网易博客 python海明距离 2009-10-01 09:50:41| 分类: Python | 标签: |举报 |字号大中小 ...
- 基于PaddlePaddle的语义匹配模型DAM,让聊天机器人实现完美回复 |
来源商业新知网,原标题:让聊天机器人完美回复 | 基于PaddlePaddle的语义匹配模型DAM 语义匹配 语义匹配是NLP的一项重要应用.无论是问答系统.对话系统还是智能客服,都可以认为是问题和回 ...
随机推荐
- React native 的弹出层(输入)效果
/*弹出层测试*/ import React,{Component} from 'react'; import { StyleSheet, View, Image, Text, TouchableOp ...
- 从DataGridView导出Excel
从DataGridView导出Excel的两种情况,不多说,直接记录代码(新建类,直接引用传入参数). using System; using System.Collections.Generic; ...
- Lua字符串库
1. 基础字符串函数: 字符串库中有一些函数非常简单,如: 1). string.len(s) 返回字符串s的长度: 2). string.rep(s,n) 返回字符串s重复n次的结 ...
- css 设置圆角
CSS3 圆角(border-radius) -moz(例如 -moz-border-radius)用于Firefox -webkit(例如:-webkit-border-radius)用于Safar ...
- JavaScript 函数声明,函数表达式,匿名函数,立即执行函数之区别
函数声明:function fnName () {-};使用function关键字声明一个函数,再指定一个函数名,叫函数声明. 函数表达式 var fnName = function () {-};使 ...
- 关于 apue.h 的使用方法
unix中有很多地方使用到apue.h .apue.h是作者自己写的一个头文件,这个文件怎么用,晚上有很多方法,但是经过尝试大多不好用. 经过本人尝试,可以把src.3e.tar.gz 的代码解压到 ...
- source insight增加注释宏
打开base项目,下main.c文件里加入以下代码 对于在一行内的局部注释, 没有找到解除注释的宏, macro Comments_orUn_gx() { //用杠星注释多行,或取消杠星注释 ,不选中 ...
- 将时间转换为xxx天前 xxx..前
<?php echo (time_fitle('2016-08-17 17:00:00')); function time_fitle($date){ date_default_timezone ...
- Java 获取汉字拼音的方法
package lius.util; import java.io.Serializable; import java.util.ArrayList; public class JString ...
- css 居中问题
水平居中有text-align:center 和 margin:0 auto,前者是针对父元素进行设置而后者则是对子元素.margin:0 auto; 这里实现的前提是子元素没有float 浮动起来, ...