M1 DS21 - Introduction to Data Science and Big Data Analysis





Team Teaching
INTRODUCTION TO DATA SCIENCE AND BIG DATA ANALYSIS
UNIVERSITAS GUNADARMA



Agenda
1) Overview of data science
2) Data science graduate profiles & data science teams
3) The relationship between data science, big data, AI, machine learning & deep learning



NIST DEFINITION OF DATA SCIENCE
NIST definition of data science (2018): Data science is the extraction of useful knowledge directly from data through a process of discovery, or of hypothesis formulation and hypothesis testing.
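To make "hypothesis formulation and hypothesis testing" more concrete, here is a minimal Python sketch of one such test: a two-sided permutation test for a difference in means. The function name `permutation_test`, its parameters, and the sample numbers are hypothetical illustrations, not part of the NIST definition.

```python
import random
from statistics import mean


def permutation_test(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in means (hypothetical helper).

    H0: both groups come from the same distribution, so the group labels are
    exchangeable. Returns (observed difference, estimated p-value).
    """
    rng = random.Random(seed)  # fixed seed so the estimate is reproducible
    observed = mean(group_a) - mean(group_b)
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # randomly reassign the group labels
        diff = mean(pooled[:n_a]) - mean(pooled[n_a:])
        if abs(diff) >= abs(observed):
            extreme += 1
    # Add-one correction keeps the Monte Carlo p-value estimate strictly positive.
    p_value = (extreme + 1) / (n_permutations + 1)
    return observed, p_value


if __name__ == "__main__":
    # Hypothetical measurements for a control group and a treatment group.
    control = [12.1, 11.8, 12.4, 12.0, 11.9]
    treatment = [13.0, 12.9, 13.4, 13.1, 12.8]
    diff, p = permutation_test(control, treatment)
    print(f"observed difference = {diff:.2f}, p-value ~ {p:.4f}")
```

A small p-value means a difference at least this large would rarely arise if H0 were true. The significance level (for example 0.05) should be fixed before the data are examined.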



OVERVIEW OF DATA SCIENCE



SESSION 1



WHAT IS DATA SCIENCE



[Figure: roles related to data science: Programmer, Statistician, Business Analyst, Data Scientist]






DATA SCIENCE: A MULTIDISCIPLINARY FIELD



THE DATA SCIENCE LIFE CYCLE



COMPONENTS OF DATA SCIENCE



DATA SCIENTIST SKILL SET AND ROLES



MAIN APPLICATIONS OF DATA SCIENCE






THE DATA SCIENCE PROCESS



NIST DEFINITION OF A DATA SCIENTIST
Definitions by the NIST Big Data WG (NIST SP 1500, 2015):
• A data scientist is a practitioner who has sufficient knowledge in the overlapping regimes of expertise in business needs, domain knowledge, analytical skills, and programming and systems engineering expertise to manage the end-to-end scientific method process through each stage in the big data lifecycle.
• Data science is the empirical synthesis of actionable knowledge and technologies required to handle data from raw data through the complete data lifecycle process.



ROLES OF A DATA SCIENTIST



CHARACTERISTICS OF A DATA SCIENTIST



MODERN DATA SCIENTIST






DATA SCIENTIST CAREER OPTIONS



TYPICAL DATA SCIENTIST PROJECTS



CAREER PATH



DATA SCIENTIST VS DATA ANALYST



DATA SCIENTIST VS STATISTICIAN



DATA SCIENCE GRADUATE PROFILES AND DATA SCIENCE TEAMS



SESSION 2



GRADUATE PROFILES OF THE DATA SCIENCE STUDY PROGRAM
The Data Science professional profiles belong to the family of data-related occupations. They are defined as an extension of the occupation taxonomy of ESCO (European Skills, Competences, Qualifications and Occupations). The proposed new occupations are placed in four top-level classification groups:
1) Managers, for managerial roles
2) Professionals, for application developers and infrastructure engineers
3) Technicians and associate professionals, for operators and technicians
4) Clerical support workers, for data curators and data stewards
Degree levels used below: D3 = three-year diploma, S1 = bachelor's degree, S2 = master's degree.



GRADUATE PROFILES OF THE DATA SCIENCE STUDY PROGRAM: 1. Managers (S2)

Role / Task description

A. Data science (group) manager or data/analytics department manager
Proposes, plans and manages functional and technical evolutions of the data science operations within the relevant domain (technical, research, business).

B. Data science infrastructure manager or research infrastructure data storage facilities manager
Proposes, plans and manages functional and technical evolutions of the big data infrastructure within the relevant domain (technical, research, business).

C. Research infrastructure manager or research infrastructure data storage facilities manager
Proposes, plans and manages functional and technical evolutions of the research infrastructure within the relevant scientific domain.



GRADUATE PROFILES OF THE DATA SCIENCE STUDY PROGRAM: 2. Professionals (Data science professionals)

Role / Task description

A. Data scientist (S2)
Data scientists find and interpret rich data sources, manage large amounts of data, merge data sources, ensure consistency of datasets and create visualizations to aid in understanding data. They build mathematical models, present and communicate data insights and findings to specialists and scientists, and recommend ways to apply the data.

B. Data science researcher (S2)
A data science researcher applies the scientific discovery research process, including hypothesis formulation and hypothesis testing, to obtain actionable knowledge related to a scientific problem or business process, or to reveal hidden relations between multiple processes.

C. Data science architect, or system architect, or applications architect (S1 or S2)
Designs and maintains the architecture of data science applications and facilities. Creates relevant data models and process workflows.



GRADUATE PROFILES OF THE DATA SCIENCE STUDY PROGRAM: 2. Professionals (Data science professionals, continued)

Role / Task description

D. Data science (application) programmer/engineer, or scientific programmer, data engineer (S1 or S2)
Designs/develops/codes large data analytics applications to support scientific or enterprise/business processes.

E. (Big) Data analyst (S1 or S2)
Analyses a large variety of data to extract information about system, service, or organization performance and presents it in a usable/actionable form.

F. Business analyst (S1)
Analyses a large variety of data and information systems to improve business performance.



GRADUATE PROFILES OF THE DATA SCIENCE STUDY PROGRAM: 2. Professionals (Data science technology professionals)

Role / Task description

A. Data steward (S1)
Plans, implements and manages (research) data input, storage, search, and presentation; creates data models for domain-specific data; supports and advises domain scientists/researchers during the whole research cycle and data management life cycle.

B. Digital data curator, or digital curator, digital archivist, digital librarian (S1)
Finds, selects, organizes, and shares (exhibits) digital data collections; maintains their integrity, up-to-date status, freshness, and discoverability.

C. Data librarian (S1)
Data librarians perform or support one or more of the following: acquisition (collection development), organization (cataloguing and metadata), and the implementation of appropriate user services. Data librarians apply traditional librarianship principles and practices to data management, including data citation, digital object identifiers (DOIs), ethics, and metadata.



GRADUATE PROFILES OF THE DATA SCIENCE STUDY PROGRAM: 2. Professionals (Data science technology professionals, continued)

Role / Task description

D. Data archivist or digital archivist (S1)
Maintains historically significant collections of datasets, documents, records, and other electronic data, and seeks out new items for archiving.



GRADUATE PROFILES OF THE DATA SCIENCE STUDY PROGRAM: 2. Professionals (Database and network professionals)

Role / Task description

Large-scale (cloud) data storage designers and administrators

A. Large-scale (cloud) database designer (data engineer, data architect) (S1)
Designs/develops/codes large-scale databases and their use in domain/subject-specific applications according to the customer's needs.

B. Large-scale (cloud) database administrator
Designs and implements, or monitors and maintains, large-scale cloud databases.

C. Scientific database administrator (S1)
Designs and implements, or monitors and maintains, large-scale scientific databases.



GRADUATE PROFILES OF THE DATA SCIENCE STUDY PROGRAM: 3. Technicians and associate professionals

Role / Task description

Data infrastructure engineers and technicians

A. Big data facilities operator (D3 or S1)
Manages the daily operation of facilities and resources and responds to customer requests. Includes all operations related to data management and the data life cycle.

B. Large-scale (cloud) data storage operator (D3 or S1)
Manages the daily operation of cloud storage, including operations related to the data life cycle, and responds to requests from storage users.

C. Scientific database operator (D3 to S1)
Manages the daily operation of scientific databases, including operations related to the data life cycle, and responds to requests from database users.



GRADUATE PROFILES OF THE DATA SCIENCE STUDY PROGRAM: 4. Clerical support workers

Role / Task description

Data and information entry and access

A. Data entry/access desk/terminal workers (D3)
Enter data into data management systems, reading it directly from sources or documents or obtaining it from people/users.

B. Data entry field workers (D3)
Do the same work in the field when collecting data from disconnected sensors or by direct counting or reading.

C. User support data services (D3)
Support users in entering their data into government services and user-facing applications.



DATA SCIENCE PROFESSIONS FAMILY (EDISON DATA SCIENCE FRAMEWORK)
• Managers: Chief Data Officer (CDO), Data Science (group/dept) manager, Data Science infrastructure manager, Research Infrastructure manager
• Professionals: Data Scientist, Data Science Researcher, Data Science Architect, Data Science (applications) programmer/engineer, Data Analyst, Business Analyst, etc.
• Professionals (database): large-scale (cloud) database designers and administrators, scientific database designers and administrators
• Professionals and clerical (data handling/management): Data Stewards, Digital Data Curators, Digital Librarians, Data Archivists
• Technicians and associate professionals: Big Data facilities operators, scientific database/infrastructure operators
Icons used: credit to https://www.datacamp.com/community/tutorials/data-science-industry-infographic



EDISON – Education for Data Intensive Science to Open New science frontiers



BUILDING A DATA SCIENCE TEAM



THE RELATIONSHIP BETWEEN DATA SCIENCE, BIG DATA, AI, MACHINE LEARNING AND DEEP LEARNING



SESSION 3



THE CURRENT RELATIONSHIP BETWEEN DS, BD, AI, ML AND DL



Source: adapted from Ian Goodfellow et al. (2016) and Matthew Mayo (2016)



MACHINE LEARNING TECHNIQUES
There are three main types of machine learning techniques:
• Supervised learning
• Unsupervised learning
• Reinforcement learning
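As a rough illustration of the difference between the first two techniques, the Python sketch below trains a nearest-centroid classifier on labelled points (supervised) and runs plain k-means on the same points without using the labels (unsupervised). The function names, the seeding of k-means with the first k points, and the data are hypothetical simplifications. Reinforcement learning, which learns from rewards obtained by interacting with an environment, is not shown.

```python
from math import dist
from statistics import mean


def centroid(points):
    """Component-wise mean of a list of 2-D points."""
    return (mean(p[0] for p in points), mean(p[1] for p in points))


# Supervised learning: every training point comes with a label.
def fit_nearest_centroid(points, labels):
    """Learn one centroid per class label."""
    return {
        c: centroid([p for p, lab in zip(points, labels) if lab == c])
        for c in sorted(set(labels))
    }


def predict_nearest_centroid(model, point):
    """Predict the label whose centroid is closest to `point`."""
    return min(model, key=lambda c: dist(model[c], point))


# Unsupervised learning: only the points are given; groups must be discovered.
def k_means(points, k, n_iter=10):
    """Plain k-means; seeding with the first k points is a simplifying assumption."""
    centers = list(points[:k])
    clusters = []
    for _ in range(n_iter):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: dist(centers[i], p))
            clusters[nearest].append(p)
        # Keep the previous center if a cluster ends up empty.
        centers = [centroid(c) if c else centers[i] for i, c in enumerate(clusters)]
    return centers, clusters


if __name__ == "__main__":
    points = [(1.0, 1.0), (1.5, 2.0), (1.2, 0.8), (8.0, 8.0), (8.5, 9.0), (9.0, 8.2)]
    labels = ["a", "a", "a", "b", "b", "b"]  # hypothetical labels
    model = fit_nearest_centroid(points, labels)
    print(predict_nearest_centroid(model, (2.0, 1.5)))  # -> a
    centers, clusters = k_means(points, k=2)  # labels are not used here
    print(clusters)
```

The supervised model can only predict the labels it was given. k-means returns unnamed groups, and a person still has to interpret what each cluster means.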



MACHINE LEARNING TASK CATEGORIES



1. Classification
2. Regression
3. Clustering
4. Anomaly detection
5. Association
6. Recommendation
7. Dimensionality reduction
8. Computer vision
9. Text analytics
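Two of the task categories above, regression and anomaly detection, can be sketched in a few lines of standard-library Python. This sketch assumes one-feature ordinary least squares for regression and a simple z-score rule for anomaly detection. Both are textbook techniques chosen for illustration, and the function names, threshold, and data are hypothetical.

```python
from statistics import mean, pstdev


def fit_simple_linear_regression(xs, ys):
    """Ordinary least squares for y = slope * x + intercept with one feature."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return slope, intercept


def zscore_anomalies(values, threshold=2.0):
    """Return values lying more than `threshold` population standard deviations from the mean."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # all values identical: nothing stands out
    return [v for v in values if abs(v - mu) / sigma > threshold]


if __name__ == "__main__":
    # Hypothetical data: y is exactly 2x + 1.
    print(fit_simple_linear_regression([1, 2, 3, 4, 5], [3, 5, 7, 9, 11]))  # -> (2.0, 1.0)
    # Hypothetical sensor readings with one obvious outlier.
    print(zscore_anomalies([10, 11, 9, 10, 12, 10, 11, 50]))  # -> [50]
```

With only n values, the largest possible population z-score is sqrt(n - 1). A threshold of 3 could therefore never flag anything among the eight readings above, which is why this sketch uses 2.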



THE MACHINE LEARNING PROCESS
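The process diagram from this slide is not recoverable, so the Python sketch below assumes a common workflow. It splits the data into training and test sets, fits preprocessing on the training set only, trains a model, and evaluates the model on held-out data. The 1-nearest-neighbour model, the min-max scaling, and all function names are illustrative assumptions.

```python
import random
from math import dist


def train_test_split(rows, test_fraction=0.25, seed=42):
    """Shuffle a copy of the (features, label) rows and split off a test set."""
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def fit_min_max_scaler(train_features):
    """Learn per-feature min/max from the training set only, to avoid test-set leakage."""
    lows = [min(col) for col in zip(*train_features)]
    highs = [max(col) for col in zip(*train_features)]

    def transform(x):
        # Constant features are mapped to 0.0 instead of dividing by zero.
        return tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(x, lows, highs)
        )

    return transform


def predict_1nn(train_rows, x):
    """1-nearest-neighbour: return the label of the closest training example."""
    return min(train_rows, key=lambda row: dist(row[0], x))[1]


def accuracy(train_rows, test_rows):
    """Fraction of test rows whose label the 1-NN model predicts correctly."""
    hits = sum(predict_1nn(train_rows, x) == y for x, y in test_rows)
    return hits / len(test_rows)


def run_pipeline(rows):
    """Split -> preprocess -> train -> evaluate, returning test-set accuracy."""
    train, test = train_test_split(rows)
    scale = fit_min_max_scaler([x for x, _ in train])
    # "Training" a 1-NN model just means storing the scaled training examples.
    train_scaled = [(scale(x), y) for x, y in train]
    test_scaled = [(scale(x), y) for x, y in test]
    return accuracy(train_scaled, test_scaled)


if __name__ == "__main__":
    # Hypothetical two-class data set: ((feature_1, feature_2), label).
    rows = []
    for i in range(10):
        rows.append(((1 + 0.1 * i, 10 + i), "a"))
        rows.append(((5 + 0.1 * i, 50 + i), "b"))
    print("test accuracy:", run_pipeline(rows))
```

Fitting the scaler only on the training rows is the key design choice here. If the test rows were used for scaling, information from the test set would leak into the model, and the measured accuracy would be optimistic.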



IMPLEMENTATION TOOLS: MATLAB
• MATLAB https://www.mathworks.com/products/matlab.html
• Commercial; latest version at the time of writing: R2020a
• Available toolboxes for AI, data science, and statistics:
  • Statistics and Machine Learning Toolbox
  • Deep Learning Toolbox
  • Reinforcement Learning Toolbox
  • Text Analytics Toolbox
  • Predictive Maintenance Toolbox



• MATLAB books: https://drive.google.com/drive/folders/1qHLqc2kYrI7REC2UClijIZhrzICmm8AF?usp=sharing
• Deep Learning with MATLAB books: https://drive.google.com/drive/folders/1QuU9tAMPFXPwM4WmSBRiSYQoj8aA9Wg?usp=sharing



IMPLEMENTATION TOOLS: RAPIDMINER
• RapidMiner https://rapidminer.com/
• A data science software platform, developed by the company of the same name, that provides an integrated environment for data preparation, machine learning, deep learning, text mining, and predictive analytics.
• Used for business and commercial applications as well as for research, education, training, rapid prototyping, and application development. It supports all steps of the machine learning process, including data preparation, results visualization, model validation, and optimization.
• RapidMiner is developed on an open core model. RapidMiner Studio Free Edition, which is limited to 1 logical processor and 10,000 data rows, is available under the AGPL license. RapidMiner Studio 9.7: https://my.rapidminer.com/nexus/account/index.html#downloads
• Commercial pricing starts at $2,500 and is available from the developer.



• RapidMiner books: https://drive.google.com/drive/folders/1ln2R4ryr2qj_IwbkZZT_T9wTyvpuhaN?usp=sharing



IMPLEMENTATION TOOLS: RSTUDIO



WHY USE THE R LANGUAGE?
• R is a free, open-source software environment and programming language developed at the University of Auckland and made freely available in 1995 as an environment for statistical computing and graphics (Ihaka and Gentleman, 1996).
• Since then, R has become one of the dominant software environments for data analysis and is used by a variety of scientific disciplines, including soil science, ecology, and geoinformatics (Environmetrics CRAN Task View; Spatial CRAN Task View).
• R is particularly popular for its graphical capabilities, but it is also prized for its GIS capabilities, which make it relatively easy to generate raster-based models.
• More recently, R has also gained several packages designed specifically for analyzing soil data.






R LANGUAGE BOOKS



IMPLEMENTATION TOOLS: PYTHON, JUPYTER, ANACONDA
• Python
  • Version 3.8.x
  • Available IDE: Spyder https://www.spyder-ide.org/
  • Interactive tool: Jupyter (Project Jupyter exists to develop open-source software, open standards, and services for interactive computing across dozens of programming languages.) https://jupyter.org/
  • Toolkit: Anaconda (the open-source Individual Edition (Distribution) is the easiest way to perform Python/R data science and machine learning on a single machine. Developed for solo practitioners, it is the toolkit that equips you to work with thousands of open-source packages and libraries.) https://www.anaconda.com/
• Google Colab
  Colaboratory, or "Colab" for short, allows you to write and execute Python in your browser, with:
  • Zero configuration required
  • Free access to GPUs
  • Easy sharing
  Whether you're a student, a data scientist or an AI researcher, Colab can make your work easier. https://colab.research.google.com/notebooks/intro.ipynb
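One quick way to confirm that a Jupyter, Anaconda, or Colab environment matches the setup described above is to check the Python version and to see which data science packages can be imported. The sketch below uses only the standard library (`sys.version_info`, `importlib.util.find_spec`). The function name `environment_report` and the default package list are hypothetical choices.

```python
import importlib.util
import sys


def environment_report(packages=("numpy", "pandas", "sklearn", "matplotlib")):
    """Report the Python version and which of the given packages can be imported."""
    return {
        "python": "{}.{}.{}".format(*sys.version_info[:3]),
        "python_at_least_3_8": sys.version_info >= (3, 8),
        # find_spec returns None when a top-level package is not installed.
        "packages": {name: importlib.util.find_spec(name) is not None for name in packages},
    }


if __name__ == "__main__":
    for key, value in environment_report().items():
        print(key, value)
```

The same code can be pasted into a notebook cell. Note that scikit-learn is imported under the name `sklearn`, which is why that name appears in the package list.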



IMPLEMENTATION TOOLS: ANACONDA



IMPLEMENTATION TOOLS: JUPYTER



BOOK LINKS
• Big data and data science: https://drive.google.com/drive/folders/18jbNHjUWsRor8W64oNDxggOd_yWqMzHs?usp=sharing
• Deep learning and machine learning: https://drive.google.com/drive/folders/1hJE5OJhg35R7LC7_bHy99ccoY7CJ3nO?usp=sharing
• Python: https://drive.google.com/drive/folders/1zqr5GPjQhP96XqKcWMxcmeWiMAAZ1iVx?usp=sharing



BOOK LINKS
• scikit-learn user guide, Mar 01, 2019: https://drive.google.com/drive/folders/1rRsU6WdnPUlT3d9NcsTuk2f6N92PLZkc?usp=sharing



Thank You