POJ

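The idea behind the ranking is to compute an avgAcceptRate from the totalSubmittedNumber and totalAcceptedNumber of all problems in the selected range: avgAcceptRate = totalAcceptedNumber / totalSubmittedNumber.

Each problem then gets a value: value = acceptedNumber / avgAcceptRate + submittedNumber.

avgAcceptRate is used because the number of accepted solutions should probably carry more weight than the number of submissions. Since avgAcceptRate is at most 1, dividing by it multiplies acceptedNumber by a factor of 1 / avgAcceptRate, which is at least 1.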
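A minimal sketch of the formula with made-up numbers. The class name `ValueExample` and the sample figures are assumptions chosen for illustration, not POJ data:

```java
class ValueExample {
    // value = acceptedNumber / avgAcceptRate + submittedNumber
    static double value(int acceptedNumber, int submittedNumber, double avgAcceptRate) {
        return acceptedNumber / avgAcceptRate + submittedNumber;
    }

    public static void main(String[] args) {
        // Hypothetical pool with an average accept rate of 0.25, so every
        // accepted solution counts 1 / 0.25 = 4 times as much as a submission.
        double avgAcceptRate = 0.25;
        System.out.println(value(50, 200, avgAcceptRate)); // 50 / 0.25 + 200 = 400.0
        System.out.println(value(10, 200, avgAcceptRate)); // 10 / 0.25 + 200 = 240.0
    }
}
```

Two problems with the same number of submissions are separated by how many of those submissions were accepted, and the one with more accepted solutions ends up with the larger value.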

Of course there are other ways to compute value, for example using the acceptance ratio shown on POJ's volume list page: http://poj.org/problemlist

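As of today (2016.01.22) the problem ids on POJ span [1000, 4054]. Fetching all of them back to back may make the server respond with a 504 error, so I only crawl the pages of a small batch of problems at a time.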
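One way to plan such batches is to split the id range into fixed-size chunks up front. This is only a sketch: the class `BatchPlanner`, the method `splitIntoBatches`, and the batch size of 100 are assumptions for illustration, not part of the original program.

```java
import java.util.ArrayList;
import java.util.List;

class BatchPlanner {
    // Split the inclusive id range [start, end] into consecutive inclusive
    // sub-ranges that each contain at most batchSize ids.
    static List<int[]> splitIntoBatches(int start, int end, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<int[]> batches = new ArrayList<>();
        for (int from = start; from <= end; from += batchSize) {
            int to = Math.min(from + batchSize - 1, end);
            batches.add(new int[] {from, to});
        }
        return batches;
    }

    public static void main(String[] args) {
        List<int[]> batches = splitIntoBatches(1000, 4054, 100);
        System.out.println(batches.size() + " batches");
        for (int[] batch : batches) {
            System.out.println("[" + batch[0] + ", " + batch[1] + "]");
        }
    }
}
```

Each batch can then be used as startProblemId and endProblemId for one run, with a pause between runs.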

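My program crawls one problem page at a time; for example, problem 1000 lives at http://poj.org/problem?id=1000. With too many consecutive requests, the server may answer with a 504 error. Some workarounds that I think could help:

(1) Request fewer pages per run. The sample below, for instance, only requests the problems in the id range [1000, 1099].

(2) When a 504 error occurs, call Thread.sleep(...) and wait for a while before continuing.

(3) Change the User-Agent.

(4) Crawl the volume list pages (such as http://poj.org/problemlist?volume=2) instead of the individual problem pages. That way only 31 pages need to be fetched in total.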
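Workarounds (2) and (3) can be sketched roughly as follows. This is not part of the original program: the class `PoliteFetcher`, the methods `retryOn504` and `statusOf`, the doubling sleep interval, the timeouts, and the User-Agent string are all assumptions for illustration, and whether POJ actually reacts to the User-Agent is untested here.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.function.IntSupplier;

class PoliteFetcher {
    // Run the attempt until it returns something other than 504 or until
    // maxAttempts is used up, sleeping (and doubling the sleep) between tries.
    static int retryOn504(IntSupplier attempt, int maxAttempts, long initialSleepMillis) {
        long sleepMillis = initialSleepMillis;
        int status = -1;
        for (int i = 1; i <= maxAttempts; i++) {
            status = attempt.getAsInt();
            if (status != HttpURLConnection.HTTP_GATEWAY_TIMEOUT) {
                return status;
            }
            if (i < maxAttempts) {
                try {
                    Thread.sleep(sleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // keep the interrupt flag and stop retrying
                    return status;
                }
                sleepMillis *= 2;
            }
        }
        return status;
    }

    // Open the URL with an explicit User-Agent header and return the HTTP status code.
    static int statusOf(String urlString, String userAgent) {
        try {
            HttpURLConnection connection = (HttpURLConnection) new URL(urlString).openConnection();
            connection.setRequestProperty("User-Agent", userAgent);
            connection.setConnectTimeout(10_000);
            connection.setReadTimeout(10_000);
            try {
                return connection.getResponseCode();
            } finally {
                connection.disconnect();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        String url = "http://poj.org/problem?id=1000";
        // Hypothetical User-Agent value; any descriptive string would do.
        String userAgent = "Mozilla/5.0 (compatible; PojProblemSort)";
        int status = retryOn504(() -> statusOf(url, userAgent), 5, 1000L);
        System.out.println(url + " -> HTTP " + status);
    }
}
```

Passing the attempt as an IntSupplier keeps the retry loop independent of the network, so it can be tried out with a fake attempt before pointing it at the real site.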

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.util.ArrayList;
    import java.util.Collections;
    import java.util.Comparator;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class PojProblemSort {

        // Average accept rate over all problems that were fetched successfully.
        private static double avgAcceptRate;

        private static List<ProblemObject> problemObjectList = new ArrayList<ProblemObject>();

        // Fetch the HTML source of a page.
        private static String getPageContent(String urlString) throws Exception {
            URL url = new URL(urlString);
            HttpURLConnection urlConnection = (HttpURLConnection) url.openConnection();
            StringBuilder content = new StringBuilder();
            // try-with-resources closes the reader (and the underlying stream) when done.
            try (BufferedReader reader = new BufferedReader(
                    new InputStreamReader(urlConnection.getInputStream(), "UTF-8"))) {
                String line;
                while ((line = reader.readLine()) != null) {
                    content.append(line).append("\n");
                }
            }
            return content.toString();
        }

        // Build the URL of the page for the problem with id problemId.
        private static String getProblemUrl(int problemId) {
            return "http://poj.org/problem?id=" + problemId;
        }

        // Basic information about a single problem.
        static class ProblemObject {
            int problemId;
            String title;
            int submittedNumber;
            int acceptedNumber;
            double value;

            public ProblemObject(int problemId, String title, int submittedNumber, int acceptedNumber) {
                this.problemId = problemId;
                this.title = title;
                this.submittedNumber = submittedNumber;
                this.acceptedNumber = acceptedNumber;
            }

            // Must be called after avgAcceptRate has been computed.
            public void calculateValue() {
                value = (double) acceptedNumber / avgAcceptRate + submittedNumber;
            }

            @Override
            public String toString() {
                return problemId + ":\t" + title + "\t" + acceptedNumber + "/" + submittedNumber +
                        " (" + ((double) acceptedNumber / submittedNumber) + ")\t" + value;
            }
        }

        // Return the first capture group of regex in content, trimmed, or null if there is no match.
        private static String findFirst(String regex, String content) {
            Matcher matcher = Pattern.compile(regex).matcher(content);
            return matcher.find() ? matcher.group(1).trim() : null;
        }

        // Download one problem page and extract its title and statistics.
        // Returns null if any of the expected fields is missing.
        private static ProblemObject parseProblemInformation(int problemId) throws Exception {
            String content = getPageContent(getProblemUrl(problemId));

            String title = findFirst("<title>([^<]*)", content);
            String submitted = findFirst("<b>Total Submissions:</b>([^<]*)", content);
            String accepted = findFirst("<b>Accepted:</b>([^<]*)", content);
            if (title == null || submitted == null || accepted == null) {
                return null;
            }

            int submittedNumber = Integer.parseInt(submitted);
            int acceptedNumber = Integer.parseInt(accepted);

            // debug
            // System.out.println(problemId + "\t" + title + "\t" + submittedNumber + "\t" + acceptedNumber);

            return new ProblemObject(problemId, title, submittedNumber, acceptedNumber);
        }

        public static void main(String[] args) {
            int startProblemId = 1000;
            int endProblemId = 1099;
            long totalAcceptedNumber = 0;
            long totalSubmittedNumber = 0;
            for (int problemId = startProblemId; problemId <= endProblemId; problemId++) {
                System.out.println("problem id: " + problemId);
                ProblemObject problemObject = null;
                try {
                    problemObject = parseProblemInformation(problemId);
                } catch (Exception e) {
                    // A failed page (e.g. a 504 response) is reported and skipped.
                    e.printStackTrace();
                }
                if (problemObject != null) {
                    totalAcceptedNumber += problemObject.acceptedNumber;
                    totalSubmittedNumber += problemObject.submittedNumber;
                    problemObjectList.add(problemObject);
                }
            }
            if (totalSubmittedNumber == 0) {
                // Nothing was fetched, so avgAcceptRate would be a division by zero.
                System.out.println("No problem data fetched.");
                return;
            }
            avgAcceptRate = (double) totalAcceptedNumber / (double) totalSubmittedNumber;
            for (ProblemObject problemObject : problemObjectList) {
                problemObject.calculateValue();
            }
            // Sort by value in descending order. Double.compare returns 0 for equal
            // values, which the Comparator contract requires.
            Collections.sort(problemObjectList, new Comparator<ProblemObject>() {

                @Override
                public int compare(ProblemObject o1, ProblemObject o2) {
                    return Double.compare(o2.value, o1.value);
                }
            });
            for (ProblemObject problemObject : problemObjectList) {
                System.out.println(problemObject);
            }
        }
    }

PojProblemSort.java
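The regex extraction in parseProblemInformation can be tried offline against a small hand-written HTML fragment. In this sketch the fragment is an assumption modeled on the patterns used in the program, not a copy of a real POJ page, and the names `StatParserDemo` and `extractCount` are made up for illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class StatParserDemo {
    // Return the integer that follows "<b>label</b>" in html, or -1 if the
    // label is missing or is not followed by a number.
    static int extractCount(String html, String label) {
        Pattern pattern = Pattern.compile("<b>" + Pattern.quote(label) + "</b>([^<]*)");
        Matcher matcher = pattern.matcher(html);
        if (!matcher.find()) {
            return -1;
        }
        try {
            return Integer.parseInt(matcher.group(1).trim());
        } catch (NumberFormatException e) {
            return -1;
        }
    }

    public static void main(String[] args) {
        // Hypothetical fragment shaped like the text the program's patterns expect.
        String html = "<title>1000 -- A+B Problem</title>"
                + "<td><b>Total Submissions:</b> 376297</td>"
                + "<td><b>Accepted:</b> 210240</td>";
        System.out.println(extractCount(html, "Total Submissions:")); // 376297
        System.out.println(extractCount(html, "Accepted:"));          // 210240
    }
}
```

Testing the patterns this way avoids hitting the server while the parsing code is still being adjusted.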

Sorted result for the first 100 problems (in descending order of value, i.e. from easiest to hardest):

1000: 1000 -- A+B Problem 210240/376297 (0.558707616590087) 968725.9801441709
1002: 1002 -- 487-3279 47583/267412 (0.17793891074446921) 401494.70625095174
1004: 1004 -- Financial Management 63395/169851 (0.37323889762203344) 348489.86603995296
1003: 1003 -- Hangover 55430/113487 (0.4884259871174672) 269681.53181788145
1001: 1001 -- Exponentiation 37121/152436 (0.24351859140885354) 257038.1507416846
1006: 1006 -- Biorhythms 39323/124555 (0.3157079201958974) 235362.10039102565
1011: 1011 -- Sticks 31019/132077 (0.23485542524436503) 219484.50825291115
1005: 1005 -- I Think I Need a Houseboat 41460/95226 (0.4353852939323294) 212054.88849304285
1007: 1007 -- DNA Sorting 37034/92137 (0.40194492983274904) 196493.9960552665
1088: 1088 -- 滑雪 32502/86863 (0.3741754256703084) 178449.40940185427
1061: 1061 -- 青蛙的约会 19476/101103 (0.19263523337586422) 155983.8353181501
1008: 1008 -- Maya Calendar 22303/72459 (0.30780165334879034) 135305.95369175915
1014: 1014 -- Dividing 17011/65484 (0.2597733797568872) 113418.78586963704
1050: 1050 -- To the Max 23751/44853 (0.5295297973379707) 111780.22939214329
1012: 1012 -- Joseph 19386/50923 (0.3806924179643776) 105550.22702185549
1017: 1017 -- Packets 16573/48917 (0.33879837275384833) 95617.55882767003
1013: 1013 -- Counterfeit Dollar 13654/43045 (0.3172029271692415) 81520.1964178487
1062: 1062 -- 昂贵的聘礼 12454/42515 (0.29293190638598143) 77608.75246725412
1046: 1046 -- Color Me Less 15811/32557 (0.4856405688484811) 77110.34191904246
1067: 1067 -- 取石子游戏 12876/38484 (0.3345806049267228) 74766.89358987988
1028: 1028 -- Web Navigation 14297/32029 (0.4463767210965063) 72316.08680137564
1019: 1019 -- Number Sequence 10602/36691 (0.2889536943664659) 66566.05730350314
1077: 1077 -- Eight 12333/28242 (0.4366900361164224) 62994.79020223583
1068: 1068 -- Parencodings 13885/23686 (0.5862112640378283) 62812.12437833816
1094: 1094 -- Sorting It All Out 10853/31220 (0.34762972453555413) 61802.34266316918
1042: 1042 -- Gone Fishing 9626/31817 (0.3025426658704466) 58941.81622368621
1083: 1083 -- Moving Tables 9683/29050 (0.33332185886402754) 56335.43481133946
1064: 1064 -- Cable master 6959/32595 (0.21349900291455745) 52204.55704348975
1018: 1018 -- Communication System 9182/25736 (0.3567764998445757) 51609.68196196622
1035: 1035 -- Spell checker 8515/23384 (0.36413787204926445) 47378.1626994274
1080: 1080 -- Human Gene Functions 10192/18345 (0.5555737258108476) 47064.73062038333
1015: 1015 -- Jury Compromise 6998/26666 (0.26243156078901975) 46385.45397188408
1065: 1065 -- Wooden Sticks 8579/20337 (0.4218419629247185) 44511.506376792444
1059: 1059 -- Chutes and Ladders 870/39303 (0.022135714830928938) 41754.54686418107
1032: 1032 -- Parliament 7536/17839 (0.42244520432759686) 39074.46800973399
1016: 1016 -- Numbers That Count 6556/19550 (0.3353452685421995) 38023.95545008174
1045: 1045 -- Bode Plot 8612/13702 (0.6285213837396001) 37969.49608543379
1026: 1026 -- Cipher 5680/20750 (0.2737349397590361) 36755.50136614769
1009: 1009 -- Edge Detection 4603/19800 (0.23247474747474747) 32770.655420489056
1029: 1029 -- False coin 5019/17937 (0.2798126777053019) 32079.889323361844
1010: 1010 -- STAMPS 4955/17340 (0.28575547866205303) 31302.545645996797
1020: 1020 -- Anniversary Cake 5282/16175 (0.3265533230293663) 31058.98912253382
1087: 1087 -- A Plug for UNIX 5183/15316 (0.33840428310263776) 29921.01999660977
1056: 1056 -- IMMEDIATE DECODABILITY 6049/12635 (0.47874950534230315) 29680.29538095553
1047: 1047 -- Round and Round We Go 5794/12407 (0.46699443862335777) 28733.73854145418
1036: 1036 -- Gangsters 3366/12034 (0.2797074954296161) 21518.950281417805
1054: 1054 -- The Troublesome Frog 3332/11187 (0.2978457137749173) 20576.142702817626
1051: 1051 -- P,MTHBGWB 4434/7747 (0.5723505873241255) 20241.435397446985
1038: 1038 -- Bugs Integrated, Inc. 3721/9716 (0.38297653355290245) 20201.294116802037
1023: 1023 -- The Fun Number System 3482/10241 (0.3400058588028513) 20052.823196641948
1039: 1039 -- Pipe 3032/9889 (0.3066032965921731) 18432.78171516898
1095: 1095 -- Trees Made to Order 4025/7055 (0.5705173635719348) 18396.926584285997
1091: 1091 -- 跳蚤 2818/9372 (0.3006828851899274) 17312.757543979613
1063: 1063 -- Flip and Shift 3343/7132 (0.46873247335950646) 16552.139272364744
1089: 1089 -- Intervals 3080/7780 (0.39588688946015427) 16459.039473192766
1041: 1041 -- John's trip 2713/8061 (0.33655873961047017) 15705.881198302588
1037: 1037 -- A decorative fence 2620/7021 (0.37316621563879787) 14403.819292131506
1066: 1066 -- Treasure Hunt 2558/6166 (0.41485566007135904) 13374.111354684119
1082: 1082 -- Calendar Game 2495/5259 (0.4744247955885149) 12289.585547277904
1027: 1027 -- The Same Game 1970/5254 (0.37495241720593836) 10805.203818892775
1060: 1060 -- Modular multiplication of polynomials 1978/4375 (0.4521142857142857) 9948.746778563403
1040: 1040 -- Transportation 1753/4313 (0.4064456294922328) 9252.72603782692
1099: 1099 -- Square Ice 1600/4104 (0.3898635477582846) 8612.59193412611
1079: 1079 -- Ratio 1500/4046 (0.3707365299060801) 8272.80493824323
1033: 1033 -- Defragment 1414/4067 (0.3476764199655766) 8051.468121783951
1021: 1021 -- 2D-Nim 1572/3483 (0.45133505598621876) 7912.691575278904
1084: 1084 -- Square Destroyer 1492/3487 (0.42787496415256665) 7691.261978572598
1053: 1053 -- Set Me 1397/3051 (0.4578826614224844) 6987.56433248386
1031: 1031 -- Fence 1150/3423 (0.3359626059012562) 6663.550452653142
1090: 1090 -- Chain 1104/3350 (0.3295522388059702) 6460.928434547017
1085: 1085 -- Triangle War 1197/3036 (0.3942687747035573) 6408.990340718097
1034: 1034 -- The dog task 1165/2896 (0.4022790055248619) 6178.818502035574
1049: 1049 -- Microprocessor Simulation 922/3161 (0.29167984814931985) 5759.0761020401715
1044: 1044 -- Date bugs 879/2984 (0.2945710455764075) 5460.907693810532
1043: 1043 -- What's In A Name? 908/2531 (0.3587514816278151) 5089.625922616568
1057: 1057 -- FILE MAPPING 1031/2101 (0.49071870537839124) 5006.223927552513
1024: 1024 -- Tester Program 872/2511 (0.34727200318598167) 4968.182604098731
1022: 1022 -- Packing Unit 4D Cubes 747/2266 (0.3296557811120918) 4370.948859245128
1093: 1093 -- Formatting Text 651/2461 (0.2645266151970744) 4295.433343197561
1048: 1048 -- Follow My Logic 583/2042 (0.28550440744368266) 3684.818185997202
1025: 1025 -- Department 441/1854 (0.23786407766990292) 3096.6806518435096
1069: 1069 -- The Bermuda Triangle 620/1292 (0.47987616099071206) 3039.079374473868
1078: 1078 -- Gizilch 561/1455 (0.38556701030927837) 3035.8250469029676
1096: 1096 -- Space Station Shielding 533/1511 (0.35274652547981467) 3012.9246880557607
1030: 1030 -- Rating 400/1730 (0.23121387283236994) 2857.1479835315276
1058: 1058 -- The Gourmet Club 465/1524 (0.3051181102362205) 2834.309530855401
1092: 1092 -- Farmland 502/1315 (0.3817490494296578) 2729.5707193320673
1071: 1071 -- Illusive Chase 531/1181 (0.44961896697713805) 2677.288948138103
1072: 1072 -- Puzzle Out 319/1167 (0.27335047129391604) 2065.9005168663934
1097: 1097 -- Roads Scholar 318/859 (0.370197904540163) 1755.0826469075646
1055: 1055 -- BULK MAILING 278/961 (0.2892819979188345) 1744.367848554412
1074: 1074 -- Parallel Expectations 275/894 (0.3076062639821029) 1668.9142386779254
1052: 1052 -- Plato's Blocks 309/767 (0.4028683181225554) 1637.7218172781052
1086: 1086 -- Unscrambling Images 294/749 (0.3925233644859813) 1577.4537678956729
1081: 1081 -- You Who? 268/794 (0.33753148614609574) 1549.1891489661236
1076: 1076 -- Bowl 201/915 (0.21967213114754097) 1481.3918617245927
1073: 1073 -- The Willy Memorial Program 207/710 (0.2915492957746479) 1293.2990814775656
1075: 1075 -- University Entrance Examination 185/624 (0.296474358974359) 1145.3059423833315
1098: 1098 -- Robots 178/566 (0.31448763250883394) 1067.5808526715298
1070: 1070 -- Deformed Wheel 143/652 (0.21932515337423314) 1054.9554041125211
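As a sanity check on the listing, the avgAcceptRate used in this run can be recovered from any single row by rearranging the formula to avgAcceptRate = acceptedNumber / (value - submittedNumber), and then used to recompute another row. The sketch below does this with the rows for problems 1000 and 1002; the class and method names are assumptions for illustration:

```java
class ValueCheck {
    // Rearranged formula: avgAcceptRate = acceptedNumber / (value - submittedNumber)
    static double recoverAvgAcceptRate(int acceptedNumber, int submittedNumber, double value) {
        return acceptedNumber / (value - submittedNumber);
    }

    // Original formula: value = acceptedNumber / avgAcceptRate + submittedNumber
    static double value(int acceptedNumber, int submittedNumber, double avgAcceptRate) {
        return acceptedNumber / avgAcceptRate + submittedNumber;
    }

    public static void main(String[] args) {
        // Row for problem 1000: 210240/376297, value 968725.9801441709
        double avgAcceptRate = recoverAvgAcceptRate(210240, 376297, 968725.9801441709);
        System.out.println("avgAcceptRate is about " + avgAcceptRate);

        // Row for problem 1002: 47583/267412, listed value 401494.70625095174
        System.out.println("recomputed value for 1002: " + value(47583, 267412, avgAcceptRate));
    }
}
```

The recovered rate comes out at roughly 0.3549, and the recomputed value for problem 1002 agrees with the listed one up to floating-point rounding.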
