
Mastering Machine Learning with Python: Foundations and Key Concepts

Published: 2024-11-04

In today's era of Artificial Intelligence (AI), scaling businesses and streamlining workflows has never been easier or more accessible. AI and machine learning help companies make informed decisions by forecasting likely outcomes from historical data, often with just a few lines of code. Before taking a significant risk, wouldn't it help to know whether it's worth it? Have you ever wondered how AI and machine learning models are trained to make such accurate predictions?
In this article, we will explore, hands-on, how to create a machine-learning model that can make predictions from our input data. Join me on this journey as we delve into these principles together.
This is the first part of a series on mastering machine learning, focusing on the foundations and key concepts. In the second part, we will dive deeper into advanced techniques and real-world applications.

Introduction:

Machine Learning (ML) essentially means training a model to solve problems. It involves feeding large amounts of data (input data) to a model so it can learn and discover patterns in that data. A model's accuracy depends heavily on the quantity and quality of the data it is fed, as well as on the choice of algorithm and how that algorithm is configured.

Machine learning extends beyond making predictions for enterprises. It powers innovations like self-driving cars, robotics, and much more. With continuous advancements in ML, there's no telling what incredible achievements lie ahead. It's simply amazing, right?

It's no surprise that Python remains one of the most sought-after programming languages for machine learning. Its vast ecosystem of libraries, such as Scikit-Learn and Pandas, and its easy-to-read syntax make it ideal for ML tasks. Python offers a simple, well-structured environment that lets developers focus on solving problems rather than on boilerplate. As an open-source language, it benefits from contributions worldwide, which makes it even better suited to data science and machine learning.

Fundamentals Of Machine Learning

Machine Learning (ML) is a vast and complex field that requires years of continuous learning and practice. While it's impossible to cover everything in this article, let's look into some important fundamentals of machine learning, specifically:

  • Supervised Machine Learning: As the name suggests, supervised machine learning involves guidance in the form of labeled examples. You give the machine input data (i), with each input paired with its correct output (j), and an algorithm learns a function that maps inputs to outputs. Once trained, the model can predict the output (j) for a new input (i) it has not seen before. Supervised ML can be further classified into:

Regression: The model predicts a continuous numerical output (j) from the input (i). For example, a regression algorithm can be used to predict the price of an item based on its size and other features.

Classification: The model predicts which category (class) an input belongs to by learning the attributes that distinguish each group. For example, it could predict whether a product review is positive, negative, or neutral.

  • Unsupervised Machine Learning: Unsupervised machine learning works with unlabeled data. Unlike supervised learning, where models are trained on labeled data, unsupervised learning algorithms identify patterns and relationships in data without prior knowledge of the outcomes. For example, an unsupervised algorithm could group customers based on their purchasing behavior.
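To make the three categories concrete, here is a minimal sketch using Scikit-Learn on tiny made-up datasets. All numbers, feature meanings, and labels below are hypothetical. The first two models are supervised because every training input comes with its correct output. The third is unsupervised because KMeans sees only the inputs and has to find the groups on its own.

```python
from sklearn.cluster import KMeans
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeClassifier

# --- Supervised regression: predict a continuous output (price) ---
# Hypothetical data: item size -> price (here price is exactly 3 x size).
sizes = [[50], [80], [100], [120]]
prices = [150, 240, 300, 360]
regressor = LinearRegression().fit(sizes, prices)
predicted_price = regressor.predict([[90]])[0]
print(f"Predicted price for size 90: {predicted_price:.1f}")

# --- Supervised classification: predict a category (review sentiment) ---
# Hypothetical features: [positive words, negative words] in each review.
review_features = [[5, 0], [4, 1], [0, 4], [1, 5], [2, 2], [3, 3]]
review_labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
classifier = DecisionTreeClassifier(random_state=0).fit(review_features, review_labels)
predicted_label = classifier.predict([[6, 0]])[0]
print(f"Predicted sentiment: {predicted_label}")

# --- Unsupervised clustering: group customers without any labels ---
# Hypothetical features: [orders per month, average basket value].
customers = [[1, 20], [2, 25], [1, 22], [10, 200], [12, 210], [11, 190]]
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(customers)
print(f"Cluster assigned to each customer: {kmeans.labels_}")
```

The made-up prices are exactly three times the size, so the regression should predict about 270 for a size of 90. The cluster numbers KMeans assigns (0 or 1) are arbitrary. What matters is that the first three customers end up in one cluster and the last three in the other.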

Setting Up Your Environment

When setting up your environment to create your first model, it's essential to understand some basic steps in ML and familiarize yourself with the libraries and tools we will explore in this article.

Steps in Machine Learning:

  1. Import the Data: Gather the data you need for your analysis.
  2. Clean the Data: Ensure your data is in good and complete shape by handling missing values and correcting inconsistencies.
  3. Split the Data: Divide the data into training and test sets.
  4. Create a Model: Choose your preferred algorithm to analyze the data and build your model.
  5. Train the Model: Use the training set to teach your model.
  6. Make Predictions: Use the test set to make predictions with your trained model.
  7. Evaluate and Improve: Assess the model's performance and refine it based on the outputs.
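As a rough sketch of how these seven steps translate into code, the example below runs them end to end on the Iris dataset that ships with Scikit-Learn. Iris is used here only because it needs no download. It is not the bookstore data used later in this article, and it has no missing values, so the cleaning step is only a placeholder.

```python
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# 1. Import the data (as a pandas DataFrame: 4 feature columns + "target").
iris = load_iris(as_frame=True)
data = iris.frame

# 2. Clean the data (Iris is already clean; dropna() is shown for illustration).
data = data.dropna()

# Separate the features (X) from the label we want to predict (y).
X = data.drop(columns=["target"])
y = data["target"]

# 3. Split the data: 80% for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# 4. Create a model.
model = DecisionTreeClassifier(random_state=42)

# 5. Train the model on the training set.
model.fit(X_train, y_train)

# 6. Make predictions on the unseen test set.
predictions = model.predict(X_test)

# 7. Evaluate: the fraction of test samples predicted correctly.
score = accuracy_score(y_test, predictions)
print(f"Accuracy: {score:.2f}")
```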

Common Libraries and Tools:

  • NumPy: Known for providing multidimensional arrays, NumPy is fundamental for numerical computations.

  • Pandas: A data analysis library that offers data frames (two-dimensional data structures similar to Excel spreadsheets) with rows and columns.

  • Matplotlib: Matplotlib is a two-dimensional plotting library for creating graphs and plots.

  • Scikit-Learn: One of the most popular machine learning libraries, providing most common algorithms, such as decision trees, neural networks, and more.
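Here is a quick look at what the first three libraries do, using made-up sample values. NumPy holds the numbers, pandas labels them as a table, and Matplotlib draws them. Scikit-Learn appears in the model-building code later.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# NumPy: a 2-D array (3 rows x 2 columns) with fast element-wise math.
values = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
doubled = values * 2  # multiplies every element, no explicit loop needed

# pandas: wrap the same numbers in a DataFrame with named columns.
df = pd.DataFrame(values, columns=["x", "y"])
print(df.describe())  # count, mean, std, min, quartiles, max per column

# Matplotlib: plot column y against column x.
fig, ax = plt.subplots()
ax.plot(df["x"], df["y"], marker="o")
ax.set_xlabel("x")
ax.set_ylabel("y")
# In Jupyter the figure is shown automatically; in a script, call plt.show().
```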

Recommended Development Environment:

Standard editors such as VS Code, or running scripts from a terminal, can make it harder to inspect data while you write code. For our learning purposes, the recommended environment is Jupyter Notebook. It provides an interactive platform where you can write and execute code, visualize data, and document the process in one place.

Step-by-Step Setup:

Download Anaconda:
Anaconda is a popular distribution of Python and R for scientific computing and data science. It includes the Jupyter Notebook and other essential tools.

Download the Anaconda installer from the official Anaconda website.
Install Anaconda:
Follow the installation instructions based on your operating system (Windows, macOS, or Linux).
After the installation is complete, you will have access to the Anaconda Navigator, which is a graphical interface for managing your Anaconda packages, environments, and notebooks.
Launching Jupyter Notebook:


Open the Anaconda Navigator
In the Navigator, click on the "Environments" tab.
Select the "base (root)" environment, and then click "Open with Terminal" or "Open Terminal" (the exact wording may vary depending on the OS).
In the terminal window that opens, type the command jupyter notebook and press Enter.


This command will launch the Jupyter Notebook server and automatically open a new tab in your default web browser, displaying the Jupyter Notebook interface.

Using Jupyter Notebook:

The browser window will show a file directory where you can navigate to your project folder or create new notebooks.
Click "New" and select "Python 3" (or the appropriate kernel) to create a new Jupyter Notebook.
You can now start writing and executing your code in the cells of the notebook. The interface allows you to document your code, visualize data, and explore datasets interactively.
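A good first cell is a quick check that the libraries introduced earlier are installed in the notebook's kernel. This is a small sketch. The version numbers you see will depend on your Anaconda installation.

```python
import matplotlib
import numpy as np
import pandas as pd
import sklearn

# Each of these packages exposes its installed version as __version__.
versions = {
    "NumPy": np.__version__,
    "pandas": pd.__version__,
    "Matplotlib": matplotlib.__version__,
    "Scikit-Learn": sklearn.__version__,
}
for name, version in versions.items():
    print(f"{name}: {version}")
```

If an import fails with ModuleNotFoundError, the package is missing from the active environment. You can install it from the Anaconda terminal, for example with conda install scikit-learn.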


Building Your First Machine Learning Model

When building your first model, keep in mind the steps in machine learning discussed earlier:

  1. Import the Data
  2. Clean the Data
  3. Split the Data
  4. Create a Model
  5. Train the Model
  6. Make Predictions
  7. Evaluate and Improve

Now, let's assume a scenario involving an online bookstore where users sign up and provide information such as their name, age, and gender. Based on each user's profile, we want to build a model that recommends books they are likely to buy, helping to boost sales.

First, we need to feed the model with sample data from existing users. The model will learn patterns from this data to make predictions. When a new user signs up, we can tell the model, "Hey, we have a new user with this profile. What kind of book are they likely to be interested in?" The model will then recommend, for instance, a history or a romance novel, and based on that, we can make personalized suggestions to the user.
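The "new user" question can be sketched as follows. Everything here is hypothetical and chosen only for illustration: the columns age and gender (encoded as 0 or 1), the genres, and the eight example users. None of it comes from the tutorial's dataset. The new profile is passed as a one-row DataFrame with the same columns the model was trained on.

```python
import pandas as pd
from sklearn.tree import DecisionTreeClassifier

# Hypothetical existing users: profile columns plus the genre each one bought.
users = pd.DataFrame({
    "age":    [22, 24, 26, 28, 40, 45, 50, 55],
    "gender": [0, 1, 0, 1, 0, 1, 0, 1],   # hypothetical 0/1 encoding
    "genre":  ["Fantasy"] * 4 + ["History"] * 4,
})

# Learn patterns linking the profile (age, gender) to the genre.
model = DecisionTreeClassifier(random_state=0)
model.fit(users[["age", "gender"]], users["genre"])

# "Hey, we have a new user with this profile. What are they likely to read?"
new_user = pd.DataFrame({"age": [47], "gender": [0]})
recommendation = model.predict(new_user)[0]
print(f"Recommend a {recommendation} book")
```

In this toy data, age alone separates the two genres, so the tree recommends History for a 47-year-old.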

Let's break down the process step-by-step:

  1. Import the Data: Load the dataset containing user profiles and their book preferences.
  2. Clean the Data: Handle missing values, correct inconsistencies, and prepare the data for analysis.
  3. Split the Data: Divide the dataset into training and testing sets to evaluate the model's performance.
  4. Create a Model: Choose a suitable machine learning algorithm to build the recommendation model.
  5. Train the Model: Train the model using the training data to learn the patterns and relationships within the data.
  6. Make Predictions: Use the trained model to predict book preferences for new users based on their profiles.
  7. Evaluate and Improve: Assess the model's accuracy using the testing data and refine it to improve its performance.
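The Clean the Data step is described here but not shown in the code later in this article. Below is a minimal sketch of what it might involve. It uses a small made-up CSV, read from a string with io.StringIO so the example runs without a file. The columns age, gender, and Genre are assumptions, not the real dataset's schema. The code fills a missing age, normalizes inconsistent gender spellings into numeric codes (decision trees in Scikit-Learn need numeric features), and drops a duplicated row.

```python
import io

import pandas as pd

# Hypothetical raw data with typical problems: a missing age,
# inconsistent gender spellings, and a duplicated row.
raw_csv = """age,gender,Genre
25,male,Fantasy
31,F,History
,female,Romance
31,F,History
42,M,History
"""
book_data = pd.read_csv(io.StringIO(raw_csv))
print(book_data.isnull().sum())  # count missing values per column

# Handle missing values: fill the missing age with the median age.
book_data["age"] = book_data["age"].fillna(book_data["age"].median())

# Correct inconsistencies: map every spelling to one code (hypothetical 1 = male, 0 = female).
gender_codes = {"male": 1, "m": 1, "female": 0, "f": 0}
book_data["gender"] = book_data["gender"].str.strip().str.lower().map(gender_codes)

# Remove exact duplicate rows (done after normalizing, so spellings match).
book_data = book_data.drop_duplicates()

print(book_data)
```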

By following these steps, you will be able to build a machine-learning model that effectively recommends books to users, enhancing their experience and boosting sales for the online bookstore. You can gain access to the datasets used in this tutorial here.

Let's walk through a sample code snippet to illustrate the process of testing the accuracy of the model:

  • Import the necessary libraries:
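```python
import pandas as pd                                    # data loading and manipulation
from sklearn.tree import DecisionTreeClassifier        # the decision tree algorithm
from sklearn.model_selection import train_test_split   # splits data into train/test sets
from sklearn.metrics import accuracy_score             # measures prediction accuracy
```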

We start by importing the essential libraries. pandas is used for data manipulation and analysis, while DecisionTreeClassifier, train_test_split, and accuracy_score are from Scikit-learn, a popular machine learning library.

  • Load the dataset:
```python
# Load the CSV file into a pandas DataFrame.
book_data = pd.read_csv('book_Data.csv')
```

Read the dataset from a CSV file into a pandas DataFrame.

  • Prepare the data:
```python
# Features (X): every column except 'Genre'. Target (y): the 'Genre' column.
X = book_data.drop(columns=['Genre'])
y = book_data['Genre']
```

Create a feature matrix X by dropping the 'Genre' column from the dataset and a target vector y containing the 'Genre' column.

  • Split the data:
```python
# Hold out 20% of the rows for testing; train on the remaining 80%.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
```

Split the data into training and testing sets with 80% for training and 20% for testing.

  • Initialize and train the model:
```python
# Create a decision tree classifier and fit it to the training data.
model = DecisionTreeClassifier()
model.fit(X_train, y_train)
```

Initialize the DecisionTreeClassifier model and train it using the training data.

  • Make predictions and evaluate the model:
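```python
# Predict genres for the test rows, then compare them with the true labels.
predictions = model.predict(X_test)
score = accuracy_score(y_test, predictions)
print(score)  # fraction of test rows predicted correctly (0 to 1)
```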

Make predictions on the test data and calculate the accuracy of the model by comparing the test labels to the predictions. Finally, print the accuracy score to the console.

In summary, the code reads the dataset from a CSV file into a pandas DataFrame, prepares a feature matrix X and a target vector y, and splits the data into training and testing sets. It then initializes and trains a DecisionTreeClassifier, makes predictions on the test data, and calculates the model's accuracy by comparing the test labels to the predictions.

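The results will vary depending on the data you use, and even between runs on the same data. This is because train_test_split shuffles the rows randomly before splitting, and DecisionTreeClassifier also has a random component unless its random_state is fixed. For instance, in the output below the accuracy score displayed is 0.7, but it may show a different value, such as 0.5, when the code is run again or with a different dataset. The score ranges from 0 to 1 (the fraction of test samples predicted correctly), and a higher score indicates a more accurate model.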
Output:

[Screenshot: console output showing an accuracy score of 0.7]
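There are two common ways to get steadier accuracy numbers, sketched below. The first is to fix random_state, so every run uses the same split and the same tree; adding stratify=y also keeps each genre's share roughly the same in both sets. The second is 5-fold cross-validation with Scikit-Learn's cross_val_score, which averages the score over several different splits. The stand-in data (columns age, gender, Genre) is hypothetical so the sketch runs without book_Data.csv. With the real file, you would build X and y from pd.read_csv as shown earlier.

```python
import pandas as pd
from sklearn.metrics import accuracy_score
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for book_Data.csv (gender encoded as 0/1).
book_data = pd.DataFrame({
    "age":    [21, 23, 25, 27, 29, 33, 35, 38, 41, 44, 47, 52, 55, 58, 61],
    "gender": [1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1, 0, 1],
    "Genre":  ["Fantasy"] * 5 + ["Romance"] * 5 + ["History"] * 5,
})
X = book_data.drop(columns=["Genre"])
y = book_data["Genre"]

# Option 1: fix random_state so every run uses the same split and the same tree.
# stratify=y keeps each genre's share roughly equal in the training and test sets.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y
)
model = DecisionTreeClassifier(random_state=42)
model.fit(X_train, y_train)
score = accuracy_score(y_test, model.predict(X_test))
print(f"Single-split accuracy: {score:.2f}")

# Option 2: 5-fold cross-validation. Each fold serves once as the test set
# while the model trains on the other four; averaging smooths out lucky splits.
cv_scores = cross_val_score(DecisionTreeClassifier(random_state=42), X, y, cv=5)
print("Accuracy per fold:", cv_scores.round(2))
print(f"Mean accuracy: {cv_scores.mean():.2f}")
```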

Data Preprocessing:

Now that you've successfully created your model, keep in mind that the data used to train it is crucial to the accuracy and reliability of its predictions. In Mastering Data Analysis: Unveiling the Power of Fairness and Bias in Information, I discussed extensively the importance of data cleaning and ensuring data fairness. Depending on what you intend to do with your model, you should consider whether your data is fair and free of bias. Data cleaning is a vital part of machine learning, because it ensures that your model is trained on accurate, unbiased data. Some key data-quality and ethical considerations are:

  1. Removing Outliers: Ensure that the data does not contain extreme values that could skew the model's predictions.

  2. Handling Missing Values: Address any missing data points to avoid inaccurate predictions.

  3. Standardizing Data: Make sure the data is in a consistent format, allowing the model to interpret it correctly.

  4. Balancing the Dataset: Ensure that your dataset represents all categories fairly to avoid bias in predictions.

  5. Ensuring Data Fairness: Check for any biases in your data that could lead to unfair predictions and take steps to mitigate them.
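Here is a minimal sketch of three of these checks using pandas and Scikit-Learn on made-up data. The columns age and Genre and all values are hypothetical. The sketch drops outliers with the common 1.5 * IQR rule, standardizes a numeric column with StandardScaler, and checks class balance with value_counts. The IQR rule is one convention among several, and deciding whether an extreme value is an error or a genuine data point is a judgment call.

```python
import pandas as pd
from sklearn.preprocessing import StandardScaler

# Hypothetical user data; the age of 120 is a suspicious outlier.
data = pd.DataFrame({
    "age":   [22, 25, 27, 30, 31, 34, 36, 120],
    "Genre": ["Fantasy", "Fantasy", "Romance", "History",
              "Fantasy", "Fantasy", "Fantasy", "History"],
})

# 1. Removing outliers: keep ages within 1.5 * IQR of the middle 50%.
q1, q3 = data["age"].quantile([0.25, 0.75])
iqr = q3 - q1
in_range = data["age"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)
data = data[in_range].copy()

# 3. Standardizing: rescale age to mean 0 and standard deviation 1.
scaler = StandardScaler()
data["age_scaled"] = scaler.fit_transform(data[["age"]]).ravel()

# 4. Balancing: inspect each genre's share of the examples.
print(data["Genre"].value_counts(normalize=True))
```

If one genre dominates the counts, you can gather more examples of the rare genres or pass class_weight="balanced" to DecisionTreeClassifier, which weights classes inversely to their frequency. Decision trees are not sensitive to feature scale. Standardizing matters more for distance-based or gradient-based models such as KMeans or logistic regression.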

By addressing these considerations, you help ensure that your model is not only accurate but also fair and reliable, providing meaningful predictions.

Conclusion:

Machine learning is a powerful tool that can transform data into valuable insights and predictions. In this article, we explored the fundamentals of machine learning, focusing on supervised and unsupervised learning, and demonstrated how to set up your environment and build a simple machine learning model using Python and its libraries. By following these steps and experimenting with different algorithms and datasets, you can unlock the potential of machine learning to solve complex problems and make data-driven decisions.

In the next part of this series, we will dive deeper into advanced techniques and real-world applications of machine learning, exploring topics such as feature engineering, model evaluation, and optimization. Stay tuned for more insights and practical examples to enhance your machine-learning journey.

Additional Resources:

  • Programming with Mosh

  • Machine Learning Tutorial, GeeksforGeeks

Reprint notice: This article is reprinted from https://dev.to/eztosin/mastering-machine-learning-with-python-foundations-and-key-concepts-54di?1. In case of infringement, please contact [email protected] to have it removed.