Assignment #4
Course: ISA 414
Points: 100
Due date: November 18th, 2019, before 11:59 pm
Submission instructions: this assignment is to be done individually. All your
answers should be in a single R script. Your code must be well formulated (i.e., no
errors) and sound (i.e., it does what the question asks it to do). In particular, the
grader must be able to open your .R file using RStudio and run the code without
running into errors. Code with errors may receive zero points. Submit the final
document on Canvas before the due date.
Question 1: suppose you are responsible for developing the code that computes the
number of times each word appears on Twitter every day. One can use these
frequencies, for example, as input when calculating the daily trending words.
Clearly, Twitter’s data are massive. Being an expert in Hadoop, you quickly realize
that you can use MapReduce to complete your task. I highly encourage you to use
Remote Desktop Connection to complete this question since all of the required
libraries are already installed there, and these libraries are not straightforward to
install. Also, make sure you set the version of R to 3.4.3.
a) To test your solution, you will be working with a sample of Twitter’s data. Start
by loading the file tweets_asst_4.csv (available on Canvas) into R using the
read.csv function (remember to set the argument stringsAsFactors = FALSE).
Next, upload the resulting data frame to HDFS using the function to.dfs. [10
points]
b) Define a map function to solve your task. Hint: you might want to consider keys
created by combining (“pasting”) the date a tweet was tweeted with each word in
the tweet. For example, for a tweet tweeted on April 10th, 2017 containing the
word “spider”, a possible key returned by the map function would be
2017-04-10_spider. [20 points]
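The key construction in the hint can be prototyped locally in plain R before Hadoop is involved. The sketch below uses a hypothetical helper and is not the required solution. It assumes the tweet dates and texts are available as character vectors; the column names `date` and `text` are assumptions, not taken from tweets_asst_4.csv. It also treats a word as a run of the letters a to z after lower-casing, which ignores digits and non-ASCII letters.

```r
# Hypothetical helper: build "date_word" keys from tweet dates and texts.
# Simplifying assumption: a word is a run of letters a-z after lower-casing.
make_date_word_keys <- function(dates, texts) {
  # Split each tweet on any run of non-letter characters
  words <- strsplit(tolower(texts), "[^a-z]+")
  # Repeat each tweet's date once per word in that tweet, then paste
  keys <- paste(rep(dates, lengths(words)), unlist(words), sep = "_")
  # Drop keys built from empty tokens (e.g., a tweet starting with punctuation)
  keys[!grepl("_$", keys)]
}

# Toy tweets (hypothetical data; the column names are assumptions)
tweets <- data.frame(
  date = c("2017-04-10", "2017-04-10"),
  text = c("Spider spotted!", "spider season"),
  stringsAsFactors = FALSE
)
keys <- make_date_word_keys(tweets$date, tweets$text)
print(keys)
```

Inside rmr2, a map function built on this helper might look like `function(k, v) keyval(make_date_word_keys(v$date, v$text), 1)`. It emits a count of 1 per key, as in the usual word-counting pattern. The `v$date` and `v$text` column names remain assumptions.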
c) Define a reduce function that counts the number of times each word appears on
Twitter per day. Hint: see the reduce function in the word-counting example we
covered in class. [10 points]
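The reduce logic mirrors word counting. Because the map step emits a 1 for every occurrence, summing the values that share a key gives that word's count on that day. The sketch below simulates the framework's grouping (shuffle) step locally with split(), so the logic can be checked without Hadoop. The keys are hypothetical toy values and the helper name is illustrative.

```r
# Reduce logic: each value is a 1 emitted by the map step, so the sum of
# the values for one key is that word's count on that day
reduce_counts <- function(key, values) {
  data.frame(key = key, count = sum(values), stringsAsFactors = FALSE)
}

# Simulated map output (hypothetical keys), each paired with the value 1
keys <- c("2017-04-10_spider", "2017-04-10_spotted",
          "2017-04-10_spider", "2017-04-10_season")
values <- rep(1, length(keys))

# Group values by key (the role the shuffle plays in Hadoop), then reduce
groups <- split(values, keys)
result <- do.call(rbind, Map(reduce_counts, names(groups), groups))
rownames(result) <- NULL
print(result)
```

In rmr2, the corresponding reduce function would typically be written as `function(k, counts) keyval(k, sum(counts))`, following the standard rmr2 word-count pattern.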
d) Run the mapreduce function using the data in a), the map function in b), and the
reduce function in c). Then retrieve the final output from HDFS and display
it as a data frame (table). [10 points]
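Each output key packs a date and a word together, so splitting it back apart makes the final table easier to read. This is a minimal sketch that assumes the keys follow the YYYY-MM-DD_word format from part b). With rmr2, the keys and counts would presumably be taken from the from.dfs() result via the keys() and values() accessors. The toy values and the helper name below are hypothetical.

```r
# Hypothetical helper: turn "date_word" keys and their counts into a table.
# With rmr2, `keys` and `counts` would come from keys(out) and values(out),
# where out is the result of from.dfs(); here they are toy values.
keys_to_table <- function(keys, counts) {
  data.frame(
    date  = sub("_.*$", "", keys),     # text before the first underscore
    word  = sub("^[^_]*_", "", keys),  # text after the first underscore
    count = counts,
    stringsAsFactors = FALSE
  )
}

daily_counts <- keys_to_table(c("2017-04-10_season", "2017-04-10_spider"), c(1, 2))
print(daily_counts)
```

Splitting at the first underscore is safe here because a YYYY-MM-DD date contains no underscore.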
Question 2 – real-life case study: bol.com
A 2015 study sponsored by the Dutch electronic-commerce company bol.com, led
by Arthur Carvalho (previously: Rotterdam School of Management – Erasmus
University; currently: Farmer School of Business – Miami University) and Esther
Hundepool (PwC), investigated some of the factors that affect customers’
willingness-to-buy in B2C e-commerce environments. The case below is an
adaptation of the above study.
Business Understanding:
Over the past 20 years, the Internet has changed the way consumers buy goods and/or
services. Ranging from groceries to vacation packages and clothing, more and more
people are using the Internet to shop online. The online selling of products and/or
services by businesses to consumers is often defined as business-to-consumer (B2C)
electronic commerce (e-commerce).
E-commerce makes up a big share of the retail industry, often providing more
product choices and faster delivery time than "bricks-and-mortar" retailers do. The
transactions related to B2C e-commerce in Western Europe totaled 177.7 billion
euros in 2013, an increase of 12 percent when compared to the previous year.
Another interesting fact is that 95 million consumers in Western Europe bought
goods and/or services online in 2013. The total e-commerce sales in the United
States amounted to 1,233 billion US dollars in 2013. It is clear that e-commerce is a
booming business, which creates an extensive array of research opportunities, e.g.,
understanding the factors that influence customers' willingness-to-buy in B2C e-commerce
environments.
One can argue that trust perception is one of the biggest barriers for consumers to
engage in electronic commerce. A potential lack of trust will likely discourage
consumers from participating in online shopping. Therefore, it is interesting to study how
to manage trust in e-commerce environments as well as to study the influence of
different types of trust on consumers' willingness-to-buy online.
In addition to trust perception, risk perception can be another challenging factor in
e-commerce. Different types of risk perception are likely to influence consumers'
attitude towards online transactions.
Finally, consumers' demographic traits might also influence their online shopping
behavior.
The goal of this study is to investigate the variables that significantly influence,
either positively or negatively, customers’ willingness-to-buy in B2C e-commerce
environments. Following the above background sketch, one can
formulate the underlying business problem as:
What are the determinants of customers' willingness-to-buy in B2C e-commerce
environments?
In particular, this study aims at measuring the effects of perceived risk and perceived
trust on consumers' willingness-to-buy online. As e-commerce sales are expected to
continue growing over the years, understanding these factors, and how to effectively
deal with them, will play a crucial role in online strategies of companies engaging
in e-commerce.
Data Understanding:
The data in this study were collected by means of an electronic survey developed in
partnership with PwC and bol.com. To illustrate the process of online shopping, the
survey started by showing the respondents a 5-minute video of actual browsing and
shopping on bol.com, the number one online retailer in the
Netherlands. Specifically, after exhibiting some features of the website, the video
showed a search for and a purchase of a digital camera.
When the video was over, the survey showed a web page from bol.com containing
a detailed description of the purchased camera. Following the video and product
description, the survey measured three dimensions of perceived risk and three
dimensions of perceived trust using five question-items per dimension. The six
dimensions are: Perceived Product Risk (PPR), Perceived Informational Risk (PIR),
Perceived Economic Risk (PER), Perceived Integrity (PI), Perceived Safety (PS),
and Perceived Benevolence (PB).
Next, the survey measured the main dimension of interest, Willingness-to-Buy
(WTB), using five question-items. All the question-items used a 0-100 scale. Think
about a chosen scale-value as the likelihood (represented in percentage values) that
the respondent agrees with a statement in the question-item. At the end, the survey
collected demographic information, such as respondents' age, income, and gender.
The survey was available from March 17th, 2015 to April 18th, 2015. We invited
participants via social networks and by sending emails to subject pools from
Rotterdam School of Management at Erasmus University, and the office of the
company PricewaterhouseCoopers (PwC) located in Rotterdam (the Netherlands).
In total, 360 participants started the survey.
After the data collection phase, we prepared the resulting data set for subsequent
analysis by removing all incomplete survey responses, which left a total of
199 complete observations in the data set, a completion rate of 55.28% (199/360).
We show below the structure of the survey we used to collect data (translated from Dutch):
• Perceived Product Risk (PPR)
- PPR_1: I think this product will perform as expected.
- PPR_2: The product purchased will likely not perform as expected.
- PPR_3: I think it is difficult to judge the quality of this product adequately.
- PPR_4: In case of a product purchase on this website, it is likely to fail the
performance requirements originally intended.
- PPR_5: I believe the likelihood is high that something is wrong with the
performance of this product.
• Perceived Informational Risk (PIR)
- PIR_1: It is clear to me whether Bol.com intends to give my personal
information to third parties.
- PIR_2: I believe this website will protect my personal information from
exposure to third parties.
- PIR_3: I believe Bol.com does not intend to misuse the personal
information provided by me.
- PIR_4: I believe Bol.com will protect and store my personal information
correctly.
- PIR_5: I believe Bol.com is likely to misuse my personal information.
• Perceived Economic Risk (PER)
- PER_1: Purchasing from this website would involve economic risk (fraud,
hard to return).
- PER_2: I believe I can return this product and get a refund easily.
- PER_3: I believe there is a high chance that I stand to lose money if I
purchase this product.
- PER_4: When I purchase this item from Bol.com I have the chance of
financial loss.
- PER_5: I believe there is a great chance I do not receive the intended
product.
• Perceived Integrity (PI)
- PI_1: Bol.com acts sincere in dealing with their customers.
- PI_2: I believe this online shop is honest to their customers.
- PI_3: I believe Bol.com would keep its promise.
- PI_4: I would characterize Bol.com as honest.
- PI_5: Bol.com acts truthful in dealing with their customers.
• Perceived Safety (PS)
- PS_1: I believe this online shop has sufficient technical capacity to ensure
my data cannot be intercepted by hackers.
- PS_2: I believe this online shop shows great concern for the security of
any of the transactions.
- PS_3: I think this online shop has mechanisms to ensure the safe
transmission of my information.
- PS_4: I believe to have a safe transaction when purchasing from Bol.com.
- PS_5: Purchasing from this online shop is safe.
• Perceived Benevolence (PB)
- PB_1: When problems occur, I believe this website will be prepared to
solve my problems.
- PB_2: In case of a problem, I believe it will be easy to report a complaint
to this website.
- PB_3: I believe, when required, Bol.com would do its best to offer help.
- PB_4: In case of a problem, I believe this website will make all the
necessary efforts to solve it.
- PB_5: I believe this online shop keeps the well-being of the consumer
needs in mind.
• Willingness to Buy (WTB)
- WTB_1: The likelihood that I would shop at this online shop is high.
- WTB_2: I would consider buying this product at this price.
- WTB_3: I would be willing to recommend this online shop to friends.
- WTB_4: I would be willing to buy at this online shop.
- WTB_5: It is likely that I will purchase at this online shop.
• Demographics:
- Gender: What is your gender?
  • Male
  • Female
- Age: What is your age?
  • Below 18 years old
  • Between 18 and 25 years old
  • Between 26 and 35 years old
  • Between 36 and 45 years old
  • Between 46 and 55 years old
  • Above 55 years old
- Income: What is your current yearly income?
  • Less than $20,000
  • Between $20,000 and $35,000
  • Between $35,000 and $50,000
  • Between $50,000 and $65,000
  • More than $65,000
  • I prefer not to say
Data Preparation:
It is now time to analyze our data in order to provide an answer to the business
problem. From now on, you will be using Spark in conjunction with the
R programming language. I highly encourage you to use Remote Desktop
Connection to complete this question. Make sure you set the version of R to 3.6.1.
Then, run the following commands to install sparklyr and a local copy of Spark 2.0.2:
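install.packages("sparklyr")  # also installs dplyr, which sparklyr depends on
library(sparklyr)             # spark_install() is only available once sparklyr is loaded
spark_install(version = "2.0.2")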
a) Start by downloading the data set bol.csv from Canvas. Next, run the following
commands to load the data locally, connect to a Spark cluster, and send the survey
data to the Spark cluster. [0 points]
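library("sparklyr")
library("dplyr")
# Load the survey responses locally
survey_data <- read.csv("bol.csv")
# Connect to a local Spark instance running version 2.0.2
sc <- spark_connect(master = "local", version = "2.0.2")
# Copy the data frame to Spark as a table named "survey"
survey_tbl <- copy_to(sc, survey_data, "survey", overwrite = TRUE)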
Unless otherwise stated, all the following questions must be answered with code that
is executed on the Spark cluster. You should expect to use functions from the R
package dplyr in conjunction with Spark.
b) Note that PPR_1, PIR_5, and PER_2 are worded in the opposite direction from the
other items in their dimensions (constructs). For example, the scale of PPR_1 is
increasing in positivity (a higher value means less perceived risk), whereas the scales
of PPR_2, PPR_3, PPR_4, and PPR_5 are decreasing in positivity. Hence, you have to
transform the scales for the sake of consistency. The goal of this preprocessing step
is to have all risk-related variables using scales in increasing negativity, and all
trust-related variables using scales in increasing positivity. To do so, transform
(mutate) the variables PPR_1, PIR_1, PIR_2, PIR_3, PIR_4, and PER_2 by subtracting
their original values from 100, e.g., the new values of PPR_1 must be equal to 100
minus the old values. For PIR, the odd item out is PIR_5, but because PIR_5 already
runs in the risk direction (increasing negativity), it is the other four PIR items that
must be reversed. These transformations should change the data set in the Spark
cluster. [10 points]
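The reversal rule (new value = 100 minus old value) can be checked on a small local data frame before running it on Spark. The sketch below uses base R and hypothetical toy responses, not the survey data. Only the list of items to reverse comes from the question; the helper name is illustrative.

```r
# Items to reverse, as listed in part b)
reversed_items <- c("PPR_1", "PIR_1", "PIR_2", "PIR_3", "PIR_4", "PER_2")

# Hypothetical toy responses on the 0-100 scale (not real survey data)
toy <- data.frame(PPR_1 = c(80, 30), PIR_1 = c(90, 10), PIR_2 = c(70, 50),
                  PIR_3 = c(60, 40), PIR_4 = c(100, 0), PER_2 = c(25, 75),
                  PIR_5 = c(20, 85))

# Reverse each listed item: new value = 100 - old value; other columns untouched
reverse_scale <- function(df, items) {
  df[items] <- lapply(df[items], function(x) 100 - x)
  df
}

fixed <- reverse_scale(toy, reversed_items)
print(fixed)
```

On the Spark table, the same rule would be expressed with dplyr's mutate(), e.g., mutate(PPR_1 = 100 - PPR_1) for each listed item. sparklyr translates such expressions into Spark SQL.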
c) After fixing the scales, it is now time to create our variables. Remember that we
measured each risk and trust dimension using five question-items. Since the
question-items are highly subjective, one should expect that the respondents’
answers contain some “random component”. A common approach to reducing some
of this “randomness” is to average the values of the question-items within each
dimension. In practice, one would have to perform reliability analysis and check for
internal consistency before doing so (e.g., performing a confirmatory factor analysis
and calculating Cronbach’s alpha), but this is beyond the scope of this assignment.
Using the mutate function from dplyr, add the following features to the data set in
the Spark cluster: [10 points]
PPR = (PPR_1 + PPR_2 + PPR_3 + PPR_4 + PPR_5)/5
PIR = (PIR_1 + PIR_2 + PIR_3 + PIR_4 + PIR_5)/5
PER = (PER_1 + PER_2 + PER_3 + PER_4 + PER_5)/5
PI = (PI_1 + PI_2 + PI_3 + PI_4 + PI_5)/5
PS = (PS_1 + PS_2 + PS_3 + PS_4 + PS_5)/5
PB = (PB_1 + PB_2 + PB_3 + PB_4 + PB_5)/5
WTB = (WTB_1 + WTB_2 + WTB_3 + WTB_4 + WTB_5)/5
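The averaging formulas above follow one pattern, so they can be generated from the dimension prefixes instead of typed out by hand. The sketch below is a minimal base-R version on hypothetical toy data, and the helper name is illustrative. For each dimension it builds the exact item names PREFIX_1 to PREFIX_5 and stores their row mean in a new column named after the prefix.

```r
# Hypothetical helper: add one column per dimension holding the mean of
# its five items, named PREFIX_1 ... PREFIX_5
add_dimension_means <- function(df, prefixes) {
  for (p in prefixes) {
    items <- paste0(p, "_", 1:5)     # exact names, e.g. "PI_1", never "PIR_1"
    df[[p]] <- rowMeans(df[items])   # (item_1 + ... + item_5) / 5 per row
  }
  df
}

# Toy WTB responses (hypothetical, not survey data)
toy <- data.frame(WTB_1 = c(10, 100), WTB_2 = c(20, 100), WTB_3 = c(30, 100),
                  WTB_4 = c(40, 100), WTB_5 = c(50, 100))
out <- add_dimension_means(toy, "WTB")
print(out)
# On a full local copy of the survey, the call would pass all seven prefixes:
# add_dimension_means(survey_data, c("PPR", "PIR", "PER", "PI", "PS", "PB", "WTB"))
```

Building exact item names matters here because a prefix match such as dplyr's starts_with("PI") would also pick up the PIR_ columns. On the Spark table itself, the formulas can be written directly inside mutate() as listed above.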
Data Modeling:
d) Next, you will build an explanatory model that tries to relate the risk and trust
dimensions to willingness-to-buy. To simplify the analysis, ignore the demographic
variables in the data set. Using the ml_linear_regression function from the sparklyr
package, build a linear regression model where the dependent variable is WTB and
the independent variables are PPR, PIR, PER, PI, PS, and PB. Apply the summary
function to your model to retrieve coefficients and associated p-values. [10 points]
Conclusion:
e) Given the coefficients and p-values from above, which actions would you suggest
bol.com take to increase consumers’ willingness-to-buy? List and carefully
explain at least three features that bol.com could add to its website to alleviate some
significant risk and trust perception issues, e.g., money-back guarantees to reduce
perceived economic risk, online reviews to decrease perceived product risk, etc.
(sloppy answers will receive zero points) [20 points]
