In two previous blog posts I discussed some techniques for visualizing relationships involving two or three variables and a large number of cases. In this tutorial I will extend that discussion to show some techniques that can be used on large datase…
The R qgraph Package: Using R to Visualize Complex Relationships Among Variables in a Large Dataset, Part One. A Tutorial by D. M. Wiig, Professor of Political Science, Grand View University. In my most recent tutorials I have discussed the use of the …
factoextra is an R package that makes it easy to extract and visualize the output of exploratory multivariate data analyses, including: Principal Component Analysis (PCA), which is used to summarize the information contained in a continuous (i.e., quantitati…
http://www.clemson.edu/economics/faculty/wilson/R-tutorial/Introduction.html https://www.youtube.com/watch?v=cX532N_XLIs&list=PLqzoL9-eJTNBDdKgJgJzaQcY6OXmsXAHU…
A Complete Tutorial on Tree Based Modeling from Scratch (in R & Python). Manish Saraswat, April 12, 2016. Introduction: Tree-based learning algorithms are considered to be one of the best and most used s…
Today is a good day to start parallelizing your code. I’ve been using the parallel package since its integration with R (v. 2.14.0), and it’s much easier than it first seems. In this post I’ll go through the basics of implementing parallel computat…
It has been possible for some years to launch a web map from within R. A number of packages for doing this are available, including: RgoogleMaps, an interface to the Google Maps API; leafletR, an early package for creating Leaflet maps with R; rCharts,…
ABSTRACT Recent technological advancements have led to a deluge of data from distinct domains (e.g., health care and scientific sensors, user-generated data, Internet and financial companies, and supply chain systems) over the past two decades. The…
Article title: A Tale of Three Apache Spark APIs: RDDs, DataFrames, and Datasets. When to use them and why. About the author: Jules S. Damji is Databricks' evangelist in the Apache Spark community. He is also…
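I recently decided to start writing an R package for multi-omics (metagenomics/16S/transcriptomics/proteomics/metabolomics) association analysis. To avoid reinventing the wheel, before starting I did a quick online survey of existing R packages; some of them are listed below: 1. mixOmics is probably the best-known R package in the multi-omics field. It has a dedicated team, has been developed for more than ten years, and is fairly highly cited. Website: http://mixomics.org/ Paper: mixOmics: An R package for 'omics feature selection and multiple data integration Gi…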