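1 Introduction to the PageRank algorithm

1.1 PageRank assumptions

  Quantity assumption: every page votes for the pages it links to. If a page has n outgoing links, it splits its vote evenly and gives each linked page 1/n of a vote.

  Quality assumption: the larger a page's PageRank value, the more its votes count. In practice, the page's PageRank value is used as the weight of its votes.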
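The two assumptions combine into a single rule: a page with rank r and n outgoing links sends r / n to each linked page. The following is a minimal sketch of that rule; the class name VoteSplit, the method votes, and the sample page are hypothetical and chosen only for illustration.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class VoteSplit {
    /** Returns the weighted vote that one page casts for each page it links to. */
    static Map<String, Double> votes(double rank, List<String> links) {
        Map<String, Double> out = new LinkedHashMap<>();
        for (String to : links) {
            // Quantity assumption: 1/n per link. Quality assumption: weighted by rank.
            out.merge(to, rank / links.size(), Double::sum);
        }
        return out;
    }

    public static void main(String[] args) {
        // Hypothetical page with rank 0.4 and links to B and C.
        System.out.println(votes(0.4, List.of("B", "C")));
    }
}
```

With two links, each target receives half of the page's rank.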

1.2 Matrix representation


After enough iterations, the PR values converge to a stable vector.
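The convergence can be checked on a small graph. The sketch below assumes the usual matrix form of the two assumptions in section 1.1: M[to][from] is 1/n when page "from" has n links and one of them points to "to", and each iteration computes PR(k+1) = M * PR(k). The three-page graph (A links to B and C, B links to C, C links to A) and all names are hypothetical.

```java
import java.util.Arrays;

class PowerIteration {
    // Rows and columns are ordered A, B, C; column j holds the votes cast by page j.
    static final double[][] EXAMPLE = {
        {0.0, 0.0, 1.0},   // A receives all of C's vote
        {0.5, 0.0, 0.0},   // B receives half of A's vote
        {0.5, 1.0, 0.0}    // C receives half of A's vote and all of B's
    };

    /** One iteration: returns M * pr. */
    static double[] step(double[][] m, double[] pr) {
        double[] next = new double[pr.length];
        for (int row = 0; row < m.length; row++) {
            for (int col = 0; col < pr.length; col++) {
                next[row] += m[row][col] * pr[col];
            }
        }
        return next;
    }

    /** Applies step() the given number of times. */
    static double[] iterate(double[][] m, double[] pr, int rounds) {
        for (int i = 0; i < rounds; i++) {
            pr = step(m, pr);
        }
        return pr;
    }

    public static void main(String[] args) {
        double[] start = {1.0 / 3, 1.0 / 3, 1.0 / 3};
        System.out.println(Arrays.toString(iterate(EXAMPLE, start, 100)));
    }
}
```

For this graph the values settle near 0.4 for A, 0.2 for B, and 0.4 for C, which is the vector that M maps to itself.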

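1.3 Dead ends and spider traps

Dead ends: if a page has no outgoing links, the PR values eventually converge to all zeros.

Spider traps: if a page links only to itself, the PR values eventually converge to 1 for that page and 0 for every other page.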
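Both effects show up on a two-page graph. The sketch below assumes the plain update PR(k+1) = M * PR(k), where column j of M holds the votes cast by page j; the matrices and names are hypothetical examples.

```java
import java.util.Arrays;

class TrapDemo {
    // A links to B; B has no outgoing links (dead end), so B's column is all zeros.
    static final double[][] DEAD_END = {{0, 0}, {1, 0}};
    // A links to B; B links only to itself (spider trap).
    static final double[][] SPIDER_TRAP = {{0, 0}, {1, 1}};

    /** Applies PR <- M * PR the given number of times. */
    static double[] iterate(double[][] m, double[] pr, int rounds) {
        for (int r = 0; r < rounds; r++) {
            double[] next = new double[pr.length];
            for (int row = 0; row < m.length; row++) {
                for (int col = 0; col < pr.length; col++) {
                    next[row] += m[row][col] * pr[col];
                }
            }
            pr = next;
        }
        return pr;
    }

    public static void main(String[] args) {
        double[] start = {0.5, 0.5};
        System.out.println(Arrays.toString(iterate(DEAD_END, start, 5)));    // prints [0.0, 0.0]
        System.out.println(Arrays.toString(iterate(SPIDER_TRAP, start, 5))); // prints [0.0, 1.0]
    }
}
```

In the dead-end case the rank that reaches B is never passed on, so the total leaks away; in the spider-trap case B keeps everything it receives.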

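Solution: introduce a coefficient beta and blend each page's previous PR value into the update. As implemented by the code in section 2.5, one iteration computes PR(n+1) = (1 - beta) * M * PR(n) + beta * PR(n), where M is the transition matrix.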
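As a rough illustration, the update that the section 2.5 code applies (UnitMultiplication scales the matrix product by 1 - beta, UnitSum adds beta times the current PR value) can be written in memory as follows. The class name, the sample matrix, and the choice of beta = 0.2 (the code's default) are assumptions made for this sketch.

```java
import java.util.Arrays;

class DampedIteration {
    // Hypothetical graph: A links to B and C, B links to C, C links to A.
    static final double[][] EXAMPLE = {
        {0.0, 0.0, 1.0},
        {0.5, 0.0, 0.0},
        {0.5, 1.0, 0.0}
    };

    /** One iteration of PR(n+1) = (1 - beta) * M * PR(n) + beta * PR(n). */
    static double[] step(double[][] m, double[] pr, double beta) {
        double[] next = new double[pr.length];
        for (int row = 0; row < m.length; row++) {
            double product = 0;
            for (int col = 0; col < pr.length; col++) {
                product += m[row][col] * pr[col];
            }
            next[row] = (1 - beta) * product + beta * pr[row];
        }
        return next;
    }

    /** Applies step() the given number of times. */
    static double[] iterate(double[][] m, double[] pr, double beta, int rounds) {
        for (int i = 0; i < rounds; i++) {
            pr = step(m, pr, beta);
        }
        return pr;
    }

    public static void main(String[] args) {
        double[] start = {1.0 / 3, 1.0 / 3, 1.0 / 3};
        System.out.println(Arrays.toString(iterate(EXAMPLE, start, 0.2, 100)));
    }
}
```

On this sample graph, which has no dead ends or spider traps, the iteration settles near 0.4, 0.2, 0.4.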

2 MapReduce workflow

2.1 Input data format

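  relation.txt: one line per page, containing the page ID, a tab, and a comma-separated list of the pages it links to.

  PR.txt: one line per page, containing the page ID, a tab, and the page's current PR value.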
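Judging from how the mappers in section 2.5 split each line (a tab between the page ID and the rest, commas between link targets), the two files can be parsed as in this sketch. The sample lines and the names InputLines, targets, and rank are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class InputLines {
    /** Parses a relation line such as "1\t2,3" into its link targets. */
    static List<String> targets(String relationLine) {
        String[] fromTo = relationLine.trim().split("\t");
        if (fromTo.length < 2 || fromTo[1].trim().isEmpty()) {
            return List.of(); // page without outgoing links
        }
        return Arrays.asList(fromTo[1].split(","));
    }

    /** Parses a PR line such as "1\t0.25" into its rank value. */
    static double rank(String prLine) {
        return Double.parseDouble(prLine.trim().split("\t")[1]);
    }

    public static void main(String[] args) {
        System.out.println(targets("1\t2,3"));
        System.out.println(rank("1\t0.25"));
    }
}
```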

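2.2 Overall flow

  Each iteration runs two MapReduce jobs in sequence: MR1 multiplies the transition matrix by the PR values unit by unit, and MR2 sums the products for each page. The driver repeats this pair of jobs for a fixed number of iterations.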

2.3 MR1

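  mapper1 reads relation.txt, splits the data into units (one per link), computes the transition probability of each unit, and emits it keyed by the unit's column index (the source page).

  mapper2 reads PR.txt, splits it into units, and emits each one keyed by its row index (the page ID).

  The reducer multiplies the PR value it receives by each transition probability, multiplies the product by (1 - beta), and writes the result to HDFS keyed by row index (the target page).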
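The arithmetic of MR1 can be tried without a cluster. The sketch below is an in-memory stand-in, not Hadoop code: it skips the shuffle and looks the PR value up directly, but it produces the same products the reducer would write. The class name Mr1Sketch and the sample data are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class Mr1Sketch {
    /** Returns, per target page, the products transition * PR * (1 - beta). */
    static Map<String, List<Double>> run(List<String> relationLines,
                                         Map<String, Double> pr, double beta) {
        Map<String, List<Double>> out = new TreeMap<>();
        for (String line : relationLines) {
            String[] fromTo = line.trim().split("\t");
            if (fromTo.length < 2 || fromTo[1].trim().isEmpty()) {
                continue; // no outgoing links, nothing to emit
            }
            String[] tos = fromTo[1].split(",");
            double prUnit = pr.getOrDefault(fromTo[0], 0.0);
            for (String to : tos) {
                double product = (1.0 / tos.length) * prUnit * (1 - beta);
                out.computeIfAbsent(to, k -> new ArrayList<>()).add(product);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> relations = List.of("A\tB,C", "B\tC", "C\tA");
        Map<String, Double> pr = Map.of("A", 0.4, "B", 0.2, "C", 0.4);
        System.out.println(run(relations, pr, 0.2));
    }
}
```

Page C receives two products here, one from A and one from B; adding them up is left to MR2.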

    

2.4 MR2

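  mapper1 reads the MR1 output from HDFS and passes it on to the reducer.

  mapper2 reads PR.txt and multiplies each unit by beta before sending it to the reducer.

  Each reducer adds up all the products it receives for its key, which yields one row of the result.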
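The summation can likewise be sketched in memory. This is a stand-in for the job rather than Hadoop code, and it leaves out the rounding to four decimal places that the reducer in section 2.5 applies. The class name Mr2Sketch and the sample numbers are hypothetical.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class Mr2Sketch {
    /** Sums the MR1 products per page and adds beta * PR for the same page. */
    static Map<String, Double> run(Map<String, List<Double>> products,
                                   Map<String, Double> pr, double beta) {
        Map<String, Double> sums = new TreeMap<>();
        // Role of mapper1: pass the MR1 products through unchanged.
        for (Map.Entry<String, List<Double>> e : products.entrySet()) {
            for (double p : e.getValue()) {
                sums.merge(e.getKey(), p, Double::sum);
            }
        }
        // Role of mapper2: contribute beta times the current PR value.
        for (Map.Entry<String, Double> e : pr.entrySet()) {
            sums.merge(e.getKey(), beta * e.getValue(), Double::sum);
        }
        return sums;
    }

    public static void main(String[] args) {
        Map<String, List<Double>> products = Map.of(
            "A", List.of(0.32), "B", List.of(0.16), "C", List.of(0.16, 0.16));
        Map<String, Double> pr = Map.of("A", 0.4, "B", 0.2, "C", 0.4);
        System.out.println(run(products, pr, 0.2));
    }
}
```

With these inputs each page ends up with the PR value it started with (up to floating-point rounding), because the sample vector is already stable.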

    

2.5 Main code

UnitMultiplication.java

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

public class UnitMultiplication {

    // Reads "from\tto1,to2,..." and emits (from, "to=probability") for every link.
    public static class TransitionMapper extends Mapper<Object, Text, Text, Text> {

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString().trim();
            String[] fromTo = line.split("\t");

            // Skip pages without outgoing links (dead ends).
            if (fromTo.length < 2 || fromTo[1].trim().isEmpty()) {
                return;
            }
            String from = fromTo[0];
            String[] tos = fromTo[1].split(",");
            for (String to : tos) {
                context.write(new Text(from), new Text(to + "=" + 1.0 / tos.length));
            }
        }
    }

    // Reads "page\tpr" and emits (page, pr).
    public static class PRMapper extends Mapper<Object, Text, Text, Text> {

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String[] pr = value.toString().trim().split("\t");
            context.write(new Text(pr[0]), new Text(pr[1]));
        }
    }

    public static class MultiplicationReducer extends Reducer<Text, Text, Text, Text> {

        float beta;

        @Override
        public void setup(Context context) {
            Configuration conf = context.getConfiguration();
            beta = conf.getFloat("beta", 0.2f);
        }

        @Override
        public void reduce(Text key, Iterable<Text> values, Context context)
                throws IOException, InterruptedException {
            // Values containing "=" are transition units; the remaining value is the PR of this page.
            List<String> transitionUnit = new ArrayList<String>();
            double prUnit = 0;
            for (Text value : values) {
                if (value.toString().contains("=")) {
                    transitionUnit.add(value.toString());
                } else {
                    prUnit = Double.parseDouble(value.toString());
                }
            }
            for (String unit : transitionUnit) {
                String[] toAndProbability = unit.split("=");
                String outputKey = toAndProbability[0];
                double relation = Double.parseDouble(toAndProbability[1]);
                // transition matrix * pageRank matrix * (1-beta)
                String outputValue = String.valueOf(relation * prUnit * (1 - beta));
                context.write(new Text(outputKey), new Text(outputValue));
            }
        }
    }

    // args[0]: transition input, args[1]: PR input, args[2]: output dir, args[3]: beta
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setFloat("beta", Float.parseFloat(args[3]));
        Job job = Job.getInstance(conf);
        job.setJarByClass(UnitMultiplication.class);

        job.setReducerClass(MultiplicationReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        // MultipleInputs binds each input path to its own mapper, so no ChainMapper is needed.
        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, TransitionMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, PRMapper.class);

        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        job.waitForCompletion(true);
    }
}

UnitSum.java

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.DoubleWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.MultipleInputs;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;
import java.text.DecimalFormat;

public class UnitSum {

    // Passes the UnitMultiplication output ("page\tsubRank") through unchanged.
    public static class PassMapper extends Mapper<Object, Text, Text, DoubleWritable> {

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String[] pageSubrank = value.toString().split("\t");
            double subRank = Double.parseDouble(pageSubrank[1]);
            context.write(new Text(pageSubrank[0]), new DoubleWritable(subRank));
        }
    }

    // Reads the current PageRank file and emits beta * pr, which is added to the sum.
    public static class BetaMapper extends Mapper<Object, Text, Text, DoubleWritable> {

        float beta;

        @Override
        public void setup(Context context) {
            Configuration conf = context.getConfiguration();
            beta = conf.getFloat("beta", 0.2f);
        }

        @Override
        public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
            String[] pageRank = value.toString().split("\t");
            double betaRank = Double.parseDouble(pageRank[1]) * beta;
            context.write(new Text(pageRank[0]), new DoubleWritable(betaRank));
        }
    }

    public static class SumReducer extends Reducer<Text, DoubleWritable, Text, DoubleWritable> {

        @Override
        public void reduce(Text key, Iterable<DoubleWritable> values, Context context)
                throws IOException, InterruptedException {
            double sum = 0;
            for (DoubleWritable value : values) {
                sum += value.get();
            }
            // Round the result to four decimal places.
            DecimalFormat df = new DecimalFormat("#.0000");
            sum = Double.parseDouble(df.format(sum));
            context.write(key, new DoubleWritable(sum));
        }
    }

    // args[0]: UnitMultiplication output, args[1]: PR input, args[2]: output dir, args[3]: beta
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.setFloat("beta", Float.parseFloat(args[3]));
        Job job = Job.getInstance(conf);
        job.setJarByClass(UnitSum.class);

        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(DoubleWritable.class);

        // MultipleInputs binds each input path to its own mapper, so no ChainMapper is needed.
        MultipleInputs.addInputPath(job, new Path(args[0]), TextInputFormat.class, PassMapper.class);
        MultipleInputs.addInputPath(job, new Path(args[1]), TextInputFormat.class, BetaMapper.class);

        FileOutputFormat.setOutputPath(job, new Path(args[2]));
        job.waitForCompletion(true);
    }
}

Driver.java

public class Driver {

    public static void main(String[] args) throws Exception {
        // args[0]: dir of transition.txt
        // args[1]: dir of PageRank.txt (iteration i reads args[1] + i and writes args[1] + (i + 1))
        // args[2]: dir of the UnitMultiplication result (iteration i uses args[2] + i)
        // args[3]: number of iterations
        // args[4]: value of beta
        String transitionMatrix = args[0];
        String prMatrix = args[1];
        String unitState = args[2];
        int count = Integer.parseInt(args[3]);
        String beta = args[4];

        for (int i = 0; i < count; i++) {
            // MR1: transition matrix * PR(i) * (1 - beta)
            String[] args1 = {transitionMatrix, prMatrix + i, unitState + i, beta};
            UnitMultiplication.main(args1);
            // MR2: sum the products and add beta * PR(i) to produce PR(i + 1)
            String[] args2 = {unitState + i, prMatrix + i, prMatrix + (i + 1), beta};
            UnitSum.main(args2);
        }
    }
}
