around-dataengineering

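Category: Artificial Intelligence / Neural Networks / Deep Learning
Development tool: Python
File size: 11815KB
Downloads: 0
Upload date: 2022-10-17 16:10:25
Uploader: sh-1993
Description: around-dataengineering, a data engineering and machine learning knowledge hub
(around-dataengineering, A Data Engineering & Machine Learning Knowledge Hub)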

File list:
.DS_Store (8196, 2022-10-18)
A Data Engineering Story.pptx (762204, 2022-10-18)
WhyDataOrchestration.pdf (650273, 2022-10-18)
dags (0, 2022-10-18)
dags\Dockerfile (168, 2022-10-18)
dags\example_pod.py (1947, 2022-10-18)
dags\python_print.yaml (227, 2022-10-18)
dags\test.py (57, 2022-10-18)
docs (0, 2022-10-18)
docs\2021_technologies (0, 2022-10-18)
docs\2021_technologies\index.md (176209, 2022-10-18)
docs\2phasecommit_ds.md (1724, 2022-10-18)
docs\Algorithms behind Data Storage System.md (8593, 2022-10-18)
docs\Azure_Csomos_DB.md (1324, 2022-10-18)
docs\Dapper_DistributedTracing.md (2310, 2022-10-18)
docs\Large_scale_in-memory.md (6336, 2022-10-18)
docs\Metadata_management.md (5925, 2022-10-18)
docs\Paxos_vs_Raft.md (2813, 2022-10-18)
docs\Reading_ardound_database.md (1020, 2022-10-18)
docs\RocksDB_WTH.md (4331, 2022-10-18)
docs\SQL_Preparation.md (1138, 2022-10-18)
docs\ZippyDB.md (4851, 2022-10-18)
docs\_config.yml (48, 2022-10-18)
docs\amundsen_review.md (3110, 2022-10-18)
docs\apache_iceberg_read.md (1083, 2022-10-18)
docs\apacheparquet.md (163, 2022-10-18)
docs\blog1 (0, 2022-10-18)
docs\blog1\index.md (8914, 2022-10-18)
docs\cockroachdb.md (5822, 2022-10-18)
docs\consistent_hashing.md (3019, 2022-10-18)
docs\dblog_netflix_cdc.md (3164, 2022-10-18)
docs\de_getting_complicated.md (2211, 2022-10-18)
... ...

# A Very Long, Never-Ending Learning around Data Engineering & Machine Learning

![](https://github.com/abhishek-ch/around-dataengineering/blob/master/sketchnotes/background.jpeg)

## New Tech

* [Dragonfly is a faster Redis or Memcached alternative that I recently tried](https://www.linkedin.com/posts/iamabhishekchoudhary_github-dragonflydbdragonfly-a-modern-activity-6***7806780424617***4-sfOk?utm_source=share&utm_medium=member_desktop)

## Interesting Reads

* [How to choose a Distributed Database](https://github.com/abhishek-ch/around-dataengineering/blob/master/docs/how_to_choose_db.md)
* [Cockroach DB Architecture](https://www.linkedin.com/posts/iamabhishekchoudhary_cockroachdbarchitecture-activity-6834117247510855680-VeCF)
* [Amundsen Review](https://github.com/abhishek-ch/around-dataengineering/blob/master/docs/amundsen_review.md)
* [Deep Dive - Foundation DB](https://github.com/abhishek-ch/around-dataengineering/blob/master/docs/foundationdb.md)
* [The What, Why, and When of Single-Table Design with DynamoDB](https://www.alexdebrie.com/posts/dynamodb-single-table/)
* [How To Manage And Monitor Apache Spark On Kubernetes](https://www.lightbend.com/blog/how-to-manage-monitor-spark-on-kubernetes-deep-dive-kubernetes-operator-for-spark)
* [Git is hard: screwing up is easy, and figuring out how to fix your mistakes is ***ing impossible](https://ohshitgit.com/)
* [8 Practical Use Cases of Change Data Capture](https://medium.com/event-driven-utopia/8-practical-use-cases-of-change-data-capture-8f059da4c3b7)
* [Apache Iceberg - Links](https://github.com/abhishek-ch/around-dataengineering/blob/master/docs/apache_iceberg_read.md)
* [Kubernetes Port Forwarding Manager](https://www.linkedin.com/posts/iamabhishekchoudhary_kubernetes-kuberforwarder-dataengineering-activity-6821345311655555072-aEqx)
* [Querying Parquet with Precision using DuckDB - Much faster compared to Pandas](https://www.linkedin.com/posts/iamabhishekchoudhary_querying-parquet-with-precision-using-duckdb-activity-6821024627578466304-TgRS)
* [What is Apache Pinot - Use Cases & Architecture](https://www.linkedin.com/posts/iamabhishekchoudhary_building-latency-sensitive-user-facing-analytics-activity-6818810335286370304-gZAQ)
* [Change Data Streaming Patterns in Distributed Systems](https://www.linkedin.com/posts/iamabhishekchoudhary_cdcpatterns-activity-6828573352563654656-CDS1)
* [Cuckoo Hashing - An alternative to chaining and linear probing for collision handling](https://www.linkedin.com/posts/iamabhishekchoudhary_cuckoo-hashing-activity-6832911622126784512-3rZB)
* [Riak Database](https://github.com/abhishek-ch/around-dataengineering/blob/master/docs/riakdb.md)
* [Database Indexing](https://www.linkedin.com/posts/iamabhishekchoudhary_databaseindexes-activity-6844509295476924416-Jzto)
* [Parallel Databases using Map Reduce](https://www.linkedin.com/posts/iamabhishekchoudhary_parallel-databases-using-map-reduce-activity-6858620336083136512-NtTz)
* [REST vs GraphQL](https://www.linkedin.com/posts/iamabhishekchoudhary_restapi-vs-graphql-activity-68797129732843***288-O8ad)
* [Linux Namespaces & Control Groups (cgroups)](https://www.linkedin.com/posts/iamabhishekchoudhary_namespaces-and-cgroups-the-basis-of-linux-activity-6897116562046672897-B5va)
* [SQL Lexical Structure](https://www.linkedin.com/posts/iamabhishekchoudhary_database-golang-dataengineering-activity-6911963969658220544-LCxB)
* [Everything about the Linux kernel](https://www.linkedin.com/posts/iamabhishekchoudhary_github-0xaxlinux-insides-a-little-bit-activity-6932325041955061760-QlwG?utm_source=linkedin_share&utm_medium=member_desktop_web)

## Weekly Digest

* [How #dataengineering gets complicated over time](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineering-realtime-streamprocessing-activity-6808383006567383040-xY-g)
* [What is eBPF - Sandboxing Programs inside the #linux Kernel](https://www.linkedin.com/posts/iamabhishekchoudhary_facebook-google-isovalent-microsoft-and-activity-6831817260462608384-p6QN0)
* [Absolute Basic Explanation of SSTable & Log Structured Merge Trees - Sorted String Table & Faster Random Writes](https://www.linkedin.com/posts/iamabhishekchoudhary_datastructures-cassandra-dataengineering-activity-6833629108484784128-u-bN)

## The Data Engineering

#### Level 0

* [Getting started with #dataengineering Volume 6](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineering-datapipelines-dag-activity-677025***33363***0288-7biE)
* [Getting started with Dataengineering Volume 5](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineering-infrastructure-docker-activity-67634295247***975104-lVvy)
* [Getting started with Data Engineering, volume 4](https://github.com/abhishek-ch/around-dataengineering/blob/master/sketchnotes/Getting_started_with_de_vol4.PNG)
* [Getting started with Data Engineering, volume 3](https://github.com/abhishek-ch/around-dataengineering/blob/master/sketchnotes/Getting_started_with_de_vol3.png)
* [Getting started with Data Engineering, volume 2](https://github.com/abhishek-ch/around-dataengineering/blob/master/sketchnotes/Getting_started_with_de_vol2.png)
* [Getting started with Data Engineering, volume 1](https://github.com/abhishek-ch/around-dataengineering/blob/master/sketchnotes/gettingstart_dataengg.png)
* [Getting started with #dataengineering from basics](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineering-mapreduce-docker-activity-6803350978432180224-GnWW)
* [Apache Airflow 2.0](https://github.com/abhishek-ch/around-dataengineering/blob/master/sketchnotes/Airflow_2_0.jpg)
* [Some Interesting essentials while learning Apache Airflow](https://github.com/abhishek-ch/around-dataengineering/blob/master/sketchnotes/airflow_checklist.png)
* [Dagster Release 0.10.0 - Everything about Exactly-once, Fault-Tolerant Scheduling - Extremely Important Release](https://www.linkedin.com/posts/iamabhishekchoudhary_dagster-machinelearning-kubernetes-activity-6757931107041255424-KAG1)
* [#getdbt or Data Build Tool interface across all major Data Workflow Management Platforms](https://www.linkedin.com/posts/iamabhishekchoudhary_getdbt-apacheairflow-dagster-activity-6750389785275240448-N8t_)
* [Apache Superset - An #opensource Fully Featured Business Intelligence Application](https://www.linkedin.com/posts/iamabhishekchoudhary_apache-superset-10-is-out-activity-6759044550578229248-rerO)
* [The Hop Orchestration Platform, or Apache #Hop (Incubating), aims to facilitate all aspects of data and metadata orchestration](https://www.linkedin.com/posts/iamabhishekchoudhary_hop-apachespark-apacheflink-activity-6761***1585437396993-ydHg)
* [Apache Iceberg Partitioning is way better than Hive! Hidden Partitioning makes everything easier!](https://www.linkedin.com/posts/iamabhishekchoudhary_apacheiceberg-dataengineering-bigdata-activity-67***56***90650238976-0skJ)
* [Trino aka #prestosql is different from Apache Spark SQL - Exclusively designed for Distributed SQL](https://www.linkedin.com/posts/iamabhishekchoudhary_prestosql-bigdata-apachespark-activity-67***906382325010432-qg00)
* [Apache Spark is NOT a Map but an MPP/MPI Engine](https://www.linkedin.com/posts/iamabhishekchoudhary_apachespark-mapreduce-dataengineering-activity-676555056052447***16-2c7f)
* [Apache Hudi - Design Principles](https://www.linkedin.com/posts/iamabhishekchoudhary_apachehudi-hadoop-streamprocessing-activity-6767445258792947712-m9c2)
* [OpenTelemetry specification V1.0](https://www.linkedin.com/posts/iamabhishekchoudhary_opentelemetry-specification-v100-tracing-activity-6767783538310873089-N9q5)
* [Everything Around PySpark Pandas UDF](https://www.linkedin.com/posts/iamabhishekchoudhary_revisitingpandasudf-activity-6775378803897221120-rS3z)
* [Important skill-set of a Dataengineer - Reduce Cost](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineer-cloud-dataengineering-activity-6783023471485091840-2QfG)
* [Everything on PyFlink - Python with Apache Flink](https://www.linkedin.com/posts/iamabhishekchoudhary_apacheflink-dataengineering-python-activity-6785834728302948353-0I0R)
* [Delta Lake Cheat Sheet](https://www.linkedin.com/posts/iamabhishekchoudhary_deltalakecheatsheet-activity-6787337208899678208-LSG3)
* [Dataengineering schedule breakdown, a very flexible estimate](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineering-distributedsystems-mlops-activity-6791336683653***8384-RY3_)
* [Parquet - Introduction & Design, An OpenSource File Format](https://www.linkedin.com/posts/iamabhishekchoudhary_apacheparquet-protocolbuffers-dataengineering-activity-6823244946166857729-0CZf)
* [SQL - Avoiding Antipatterns](https://www.linkedin.com/posts/iamabhishekchoudhary_sqlantipattern-activity-682***91257410396160-xX5P)
* [Explaining Apache Kafka - In children's book format](https://www.linkedin.com/posts/iamabhishekchoudhary_apachekafka-streamprocessing-dataengineering-activity-68115384087631216***-5qpL)
* [The Perfect #dataengineering: Top INVALID Reasons behind #datapipelines failures](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineering-datapipelines-data-activity-6839946812015603712-FnZq)
* [What is ETL](https://www.linkedin.com/posts/iamabhishekchoudhary_etl-introduction-ugcPost-68***433492600635392-sbBl)
* [What is Proxy & Reverse Proxy](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineering-activity-6907655591100354560-5DDk)

#### Level 1

* [DataEngg Skills to work with DataScience](https://github.com/abhishek-ch/around-dataengineering/blob/master/sketchnotes/DE_skills_work_with_DS.jpg)
* [Data Quality, A necessity for Data Driven Projects](https://github.com/abhishek-ch/around-dataengineering/blob/master/sketchnotes/arond_dq.png)
* [Essential Cloud Skills for Data Engineering](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineering-machinelearnig-aws-activity-6758329553036345346-b-40)
* [Open Source Technologies in Data Engineering](https://github.com/abhishek-ch/around-dataengineering/blob/master/sketchnotes/DE_OS.jpg)
* [Kubernetes Fundamentals Required as a Data Engineer](https://github.com/abhishek-ch/around-dataengineering/blob/master/sketchnotes/kubernetes_fundamentals.png)
* [Apache Superset, OSS Business Intelligence for 2021](https://www.linkedin.com/posts/iamabhishekchoudhary_apache-superset-10-is-out-activity-6759044550578229248-rerO)
* [#apachekafka as a Database - Summary on both sides, Arguments, Trade-offs & exceptional quotes](https://www.linkedin.com/posts/iamabhishekchoudhary_kafka-as-a-database-yes-or-no-a-summary-activity-6757228852923158528-qVBc)
* [Processing Guarantees in #apachekafka - The best resource](https://www.linkedin.com/posts/iamabhishekchoudhary_processing-guarantees-in-kafka-activity-6756149463359791104-sJcD)
* [Change Data Analysis with Debezium and Apache Pinot](https://www.linkedin.com/posts/iamabhishekchoudhary_apachepinot-debezium-eventsourcing-activity-6754311102718382080-gWL1)
* [Optimizing Apache Kafka Producers & Consumers](https://www.linkedin.com/posts/iamabhishekchoudhary_apachekafka-strimzi-streamprocessing-activity-6753977207800037376-kNWi)
* [Redpanda - A non-JVM Streaming Platform for mission critical workloads](https://www.linkedin.com/posts/iamabhishekchoudhary_redpanda-activity-6749***0173656559616-Ol6C)
* [Apache Hudi - Turn Batch Jobs to Incremental Model | Complete file management on a Data Lake](https://www.linkedin.com/posts/iamabhishekchoudhary_building-a-large-scale-transactional-data-activity-6760855681277***0672-kiZF)
* [Apache Iceberg - an open table format for huge analytic datasets](https://www.linkedin.com/posts/iamabhishekchoudhary_apache-iceberg-activity-6760465145954131968-v0tp)
* [Ballista - Distributed computing platform built primarily on Rust and powered by Apache Arrow](https://www.linkedin.com/posts/iamabhishekchoudhary_ballista-a-distributed-compute-platform-activity-6763021778580246528-Zwj8)
* [ZooKeeper, a distributed, open-source coordination service for distributed applications](https://www.linkedin.com/posts/iamabhishekchoudhary_zookeeperpaper-activity-67***214157366620160-84rM)
* [Apache Iceberg - Partition Evolution, it's simple but it's so amazing](https://www.linkedin.com/posts/iamabhishekchoudhary_apacheiceberg-dataengineering-datascience-activity-6766299661784403968-T_-i)
* [Apache Kafka without ZooKeeper Sneak Peek](https://www.linkedin.com/posts/iamabhishekchoudhary_apachekafka-distributedsystems-zookeeper-activity-6782957154392432***0-3s5r)
* [Why Data Discovery is important for Data Engineering](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineering-bigdata-datagovernance-activity-679752***72681771008-C-XM)
* [Queue vs Log - Event driven Architecture](https://www.linkedin.com/posts/iamabhishekchoudhary_eventsourcing-distributedsystems-bigdata-activity-67***5545668005232***-xa88)
* [Database Indexing](https://www.linkedin.com/posts/iamabhishekchoudhary_postgres-indexes-for-newbies-activity-688***45127477563392-04Kp)

#### Level 1.1

* [Multiple criteria search at scale with Apache Pinot & Theta Sketches](https://www.linkedin.com/posts/iamabhishekchoudhary_solving-for-the-cardinality-of-set-intersection-activity-678***46326893969408-4exe)
* [VM vs Containers - Similar but Different](https://www.linkedin.com/posts/iamabhishekchoudhary_docker-dataengineering-bigdata-activity-6790586***2504724480-Zq0e)
* [State of Trino aka PrestoSQL](https://www.linkedin.com/posts/iamabhishekchoudhary_trino-aka-prestosql-state-activity-6790920739471069184-Ipcq)
* [ETL is an extremely important component for any modern business](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineering-datawarehouse-bigdata-activity-6789114504010625024-V-lz)
* [Top 5 ways to complicate a #dataengineering pipeline/application](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineering-functionalprogramming-bigdata-activity-6802551801179631616-u6a0)
* [Leader election is commonly used aka Master/Namenode/Leader/Driver](https://www.linkedin.com/posts/iamabhishekchoudhary_leaderelection-activity-6802979035270938624-VYYF)
* [Dagster vs Airflow - A comparison](https://www.linkedin.com/posts/iamabhishekchoudhary_moving-past-airflow-why-dagster-is-the-next-generation-activity-6800808471768969216-__wj)
* [About Single Source of Truth in DataEngineering](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineering-machinelearning-dataanalytics-activity-6801541***6638733312-81gM)
* [Change Data Capture for Distributed Databases](https://www.linkedin.com/posts/iamabhishekchoudhary_change-data-capture-for-distributed-databases-activity-6821429243944206336-HiJL)
* [Deep Dive on Why Apache Iceberg for Change Data Capture, using Apache Flink](https://www.linkedin.com/posts/iamabhishekchoudhary_apacheiceberg-deltalake-realtime-activity-6813803694627336192-W7M7)
* [OpenMetadata is an Open Standard for Metadata. A Single place to Discover, Collaborate, and Get your data right](https://www.linkedin.com/posts/iamabhishekchoudhary_metadata-datamesh-amundsen-activity-683322376769***45120-8U5h)
* [About Lakehouse](https://www.linkedin.com/posts/iamabhishekchoudhary_datawarehouse-lakehouse-machinelearning-activity-6839506317***8941056-d23r)
* [etcd - A distributed, reliable key-value store for the most critical data of a distributed system](https://www.linkedin.com/posts/iamabhishekchoudhary_etcd-distributed-reliable-key-value-store-activity-6854393688827711488-1mts)
* [What is Redis](https://www.linkedin.com/posts/iamabhishekchoudhary_what-is-redis-data-store-activity-6859358023492***8960-YxoZ)
* [What is Hive](https://www.linkedin.com/posts/iamabhishekchoudhary_what-is-apache-hive-activity-6862273737388032000-nRSq)
* [What is Data Warehouse - An Introduction](https://www.linkedin.com/posts/iamabhishekchoudhary_what-is-data-warehouse-activity-6862625346521583616-x6Yl)
* [Fundamentals of Designing Data Warehouse](https://www.linkedin.com/posts/iamabhishekchoudhary_fundamentals-of-designing-data-warehouse-activity-6865875658413793280-3Cln)
* [Database Relational Model - A way of looking at Data](https://www.linkedin.com/posts/iamabhishekchoudhary_database-relational-model-activity-6888768611230511104-Eyu4)
* [Data Engineering Infrastructure Notes](https://www.linkedin.com/posts/iamabhishekchoudhary_data-engineering-infrastructure-notes-activity-6901541636677910528-jjxL)

#### Dataengineering Core

* [A Data Engineering Story - The Beginning](https://github.com/abhishek-ch/around-dataengineering/blob/master/docs/blog1/index.md)
* [Data Engineering - More towards Data Science or Data Analytics or ...](https://github.com/abhishek-ch/around-dataengineering/tree/blog2)
* [Data Engineering Interview Patterns](https://github.com/abhishek-ch/around-dataengineering/blob/master/sketchnotes/DE_Interview.jpg)
* [Basic Checklists while learning Apache Spark](https://github.com/abhishek-ch/around-dataengineering/blob/master/sketchnotes/spark_checklist.png)
* [#apachespark for Distributed Analytics or #businessinteligence Platform - Worth it or not?](https://www.linkedin.com/posts/iamabhishekchoudhary_apachespark-businessinteligence-analytics-activity-6754429***80213872***-kA7V)
* [Apache Beam for Search: An Introduction & Addressing the challenge of the Time Problem](https://www.linkedin.com/posts/iamabhishekchoudhary_apachebeam-apachekafka-dataengineering-activity-6753591516708573184-1-p6)
* [Nextflow is a Workflow Manager exclusively for #bioinformatics](https://www.linkedin.com/posts/iamabhishekchoudhary_bioinformatics-distributedsystems-kubernetes-activity-6749***4503558238209-ufTg)
* [#apachespark Project Zen Update - Making PySpark Better](https://www.linkedin.com/posts/iamabhishekchoudhary_project-zen-improving-apache-spark-for-python-activity-6748904711069474816-3iQN)
* [Design - Exactly Once Delivery & Transactional Messaging in #apachekafka](https://www.linkedin.com/posts/iamabhishekchoudhary_exactlyoncekafka-activity-6762359693802299392-Gch3)
* [Underrated but important skill of a Data Engineer](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineer-datascience-dataengineering-activity-6775777891335***7232-6573)
* [Fallacies of Distributed Systems](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineering-distributedsystems-cloud-activity-677***397685919***976-aE6h)
* [As a Data Engineer, some Essentials I did which really helped Data Scientists and the Team](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineer-datascientists-kubernetes-activity-6779784194651504***0-zIUP)
* [A very normal Data Engineering work](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineering-python-kubernetes-activity-6772461203500384256-yrAH)
* [What can go wrong in Distributed Data Systems](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineers-dataengineering-machinelearning-activity-6771387632715935744-wZCk)
* [Architect and build a #machinelearning use case end to end using Amazon SageMaker](https://www.linkedin.com/posts/iamabhishekchoudhary_architect-and-build-the-full-machine-learning-activity-6775727947325214721-iYAW)
* [Around Data Discovery or Metadata Management Platforms](https://www.linkedin.com/posts/iamabhishekchoudhary_datadiscovery-metadatamanagement-datascientists-activity-6777238261258682368-fkGa)
* [Amazon S3 Object Lambda - Provide Different Views of Data to Multiple Applications](https://www.linkedin.com/posts/iamabhishekchoudhary_introducing-amazon-s3-object-lambda-use-activity-6778582018067378176-fzvy)
* [Full Stack Data Engineer](https://www.linkedin.com/posts/iamabhishekchoudhary_hadoop-apachespark-kubernetes-activity-6781152322971074560-VWvK)
* [Data cleaning is hard, but why?](https://www.linkedin.com/posts/iamabhishekchoudhary_whys-it-hard-to-teach-data-cleaning-activity-6782684736637693953-JiLg)
* [Most exciting things about #dataengineering](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineering-distributedsystems-machinelearning-activity-6783666378680401920-E23H)
* [The real impact of Disks on #rocksdb State Backend in Apache Flink](https://www.linkedin.com/posts/iamabhishekchoudhary_the-impact-of-disks-on-rocksdb-state-backend-activity-6784115938338906112-FTVY)
* [Tips for Distributed System High Availability](https://www.linkedin.com/posts/iamabhishekchoudhary_tips-for-high-availability-activity-6785942817693855744-8KFF)
* [Interesting way of collaboration between a Dataengineer & Datascientist](https://www.linkedin.com/posts/iamabhishekchoudhary_dataengineer-datascientist-datascience-activity-6786211019***0369152-QrFF)
* [Building Dis ... ...
