  • Uploader: k6_855698
  • File size: 11.6KB
  • File format: zip
  • Upload date: 2022-04-22 02:39
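eda.utilsR: Because data rarely comes ready for machine learning and analysis right away, this package aims to speed up cleaning and initial exploratory data analysis (EDA). It focuses on handling outliers and missing values, scaling, and correlation visualization. Installation: the released version can be installed from CRAN with `install.packages("eda.utilsR")`, and the development version with `devtools::install_github("UBC-MDS/eda.utilsR")` (after `install.packages("devtools")` if needed). Functions: the package contains four functions. `cor_map` plots a correlation matrix of the numeric columns in a data frame; `outlier_identifier` identifies and handles outliers; `scale` scales numeric values in the dataset; `imputer` imputes missing values.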
eda_utilsR-main.zip
  • eda.utilsR-main
  • R
  • cor_map.R
    1KB
  • imputer.R
    412B
  • scale.R
    604B
  • man
  • outlier.R
    1KB
  • imputer.Rd
    560B
  • CONTRIBUTING.rst
    3KB
  • NAMESPACE
    62B
  • DESCRIPTION
    579B
  • .Rbuildignore
    56B
  • LICENSE
    41B
  • LICENSE.md
    1KB
  • README.md
    3KB
  • CONDUCT.rst
    3.5KB
  • eda.utilsR.Rproj
    303B
  • .gitignore
    29B
  • README.Rmd
    3.1KB
Content overview
<!-- README.md is generated from README.Rmd. Please edit that file -->

# eda.utilsR

<!-- badges: start -->
<!-- badges: end -->

As data rarely comes ready to be used and analyzed for machine learning right away, this package aims to help speed up the process of cleaning and doing initial exploratory data analysis (EDA). The package focuses on the tasks of dealing with outliers and missing values, scaling, and correlation visualization.

## Installation

You can install the released version of eda.utilsR from [CRAN](https://CRAN.R-project.org) with:

``` r
install.packages("eda.utilsR")
```

And the development version from [GitHub](https://github.com/) with:

``` r
# install.packages("devtools")
devtools::install_github("UBC-MDS/eda.utilsR")
```

## Functions

The four functions contained in this package are as follows:

- `cor_map`: A function to plot a correlation matrix of the numeric columns in the dataframe
- `outlier_identifier`: A function to identify and deal with outliers
- `scale`: A function to scale numerical values in the dataset
- `imputer`: A function to impute missing values

## Our Place in the R Ecosystem

While R packages with similar functionality exist, this package aims to reduce the amount of code necessary for these functions and outputs. Packages with similar functionality are as follows:

- [scaler](https://www.rdocumentation.org/packages/coop/versions/0.6-2/topics/scaler)
- [ggcorr](https://www.rdocumentation.org/packages/GGally/versions/1.5.0/topics/ggcorr)
- [mice](https://cran.r-project.org/web/packages/mice/mice.pdf)
- [OutlierDetection](https://cran.r-project.org/web/packages/OutlierDetection/OutlierDetection.pdf)

## Dependencies

- TBD

## Usage

The eda.utilsR package helps you carry out exploratory data analysis. It includes multiple custom functions for performing initial exploratory analysis on any input data, describing the structure of the data and the relationships present in it.
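As a rough illustration of the four tasks named under Functions (imputation, outlier identification, scaling, and correlation), the base R sketch below uses hypothetical helper names (`impute_median`, `flag_outliers_iqr`, `standardize_columns`) and a made-up data frame. It does not call eda.utilsR and makes no claim about the signatures or methods of `imputer`, `outlier_identifier`, `scale`, or `cor_map`; median imputation, 1.5 × IQR fences, and z-score standardization are assumptions chosen only for illustration.

```r
# Hypothetical helpers written in base R; these are NOT the eda.utilsR API.

# Toy data frame (made up) with one missing value and one extreme value.
df <- data.frame(
  height = c(150, 160, 170, 180, 400),
  weight = c(50, NA, 70, 80, 90)
)

# Replace missing values in each numeric column with that column's median.
impute_median <- function(data) {
  for (col in names(data)) {
    if (is.numeric(data[[col]])) {
      missing <- is.na(data[[col]])
      data[[col]][missing] <- median(data[[col]], na.rm = TRUE)
    }
  }
  data
}

# Return TRUE for values outside the 1.5 * IQR fences.
flag_outliers_iqr <- function(x) {
  q <- quantile(x, c(0.25, 0.75), na.rm = TRUE)
  fence <- 1.5 * (q[[2]] - q[[1]])
  x < q[[1]] - fence | x > q[[2]] + fence
}

# Standardize numeric columns to mean 0 and standard deviation 1.
standardize_columns <- function(data) {
  numeric_cols <- vapply(data, is.numeric, logical(1))
  data[numeric_cols] <- lapply(
    data[numeric_cols],
    function(x) (x - mean(x)) / sd(x)
  )
  data
}

imputed <- impute_median(df)                         # fill the missing weight
outlier_flags <- flag_outliers_iqr(imputed$height)   # logical vector, one per row
scaled <- standardize_columns(imputed)               # z-scores per column
cor_matrix <- cor(imputed)                           # Pearson correlation matrix
```

In this toy data, 400 is the only `height` value flagged, and the missing `weight` is filled with 75, the median of the observed weights.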
The generated output can be obtained in both object and graphical form.

eda.utilsR can:

- Diagnose data quality: Resolve skewed data by identifying missing data and outliers and providing a corresponding remedy.
- Discover data: Plot a correlation matrix to help explore and understand the data and find scenarios for performing the analysis.
- Prepare for machine learning: Perform column transformations and derive scalers automatically to meet downstream machine learning needs.

## Documentation

The official documentation is hosted on Read the Docs: <https://eda.utilsR.readthedocs.io/en/latest/>

## Contributors

This package is authored by Chuang Wang, Fatime Selimi, Jiacheng Wang, and Micah Kwok as part of the course project in DSCI-524 (UBC-MDS program). You can see the list of all contributors in the [contributors tab](https://github.com/UBC-MDS/eda.utilsR/graphs/contributors).

We welcome and recognize all contributions. If you wish to participate, please review our [contributing guidelines](https://github.com/UBC-MDS/eda.utilsR/blob/main/CONTRIBUTING.rst).