Importing ML Libraries

ML Libraries Explained

A Python library is a collection of related modules: bundles of code that can be reused across different programs. Libraries make Python programming simpler and more convenient, because we don't need to write the same code again and again for every program.

Python libraries play a vital role in fields such as machine learning, data science, and data visualization.

Commonly used libraries

  1. TensorFlow: This library was developed by the Google Brain team. It is an open-source library for high-performance numerical computation, used mainly to build and train machine learning and deep learning models. It provides a large number of tensor operations. Researchers also use it to solve complex computations in mathematics and physics.
  2. Matplotlib: This library is used for plotting numerical data, which is why it is widely used in data analysis. It is also open source and produces high-quality figures such as pie charts, histograms, scatter plots, and line graphs.
  3. Pandas: Pandas is an important library for data scientists. It is an open-source data analysis library that provides flexible, high-level data structures (Series and DataFrame) and a variety of analysis tools. It simplifies data analysis, data manipulation, and data cleaning. Pandas supports operations such as sorting, re-indexing, iteration, concatenation, data conversion, visualization, and aggregation.
  4. NumPy: The name “NumPy” stands for “Numerical Python”. It is one of the most commonly used Python libraries. It is a numerical computing library that supports large, multi-dimensional arrays and matrices, and it provides built-in mathematical functions for efficient computation. Many other libraries, including TensorFlow, interoperate with NumPy arrays when working with tensors. The array interface is one of the key features of this library.
  5. SciPy: The name “SciPy” stands for “Scientific Python”. It is an open-source library for high-level scientific computation. It is built on top of NumPy and works with it to handle complex computations: NumPy provides the array type and basic operations such as sorting and indexing, while SciPy adds numerical routines for tasks such as optimization, integration, interpolation, linear algebra, and statistics. It is widely used by application developers and engineers.
  6. Scrapy: Scrapy is an open-source framework for extracting data from websites. It provides fast web crawling and high-level screen scraping, and it can also be used for data mining and automated testing.
  7. Scikit-learn: Scikit-learn is a widely used open-source machine learning library. It supports various supervised and unsupervised algorithms for tasks such as regression, classification, and clustering. It is built on top of NumPy and SciPy.
  8. PyGame: This library provides an easy interface to the Simple DirectMedia Layer (SDL), a platform-independent library for graphics, audio, and input. It is used to develop video games in Python using these graphics and audio capabilities.
  9. PyTorch: PyTorch is a popular open-source machine learning library built around optimized tensor computations. It offers rich APIs for tensor computation with strong GPU acceleration and is widely used to build and train neural networks.
  10. PyBrain: The name “PyBrain” stands for Python-Based Reinforcement Learning, Artificial Intelligence and Neural Network Library. It is an open-source library aimed at beginners in machine learning. It provides fast, easy-to-use algorithms for machine learning tasks, and because it is flexible and easy to understand, it is helpful for developers who are new to research.
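The name you import is not always the same as the library's project name: scikit-learn is imported as sklearn and PyTorch as torch, for example. As a minimal sketch using only the standard library, the helper below checks which of the libraries above are installed without actually importing them. The IMPORT_NAMES mapping and the available_libraries function are illustrative names chosen for this example, not part of any library.

```python
import importlib.util

# Illustrative mapping: library name as listed above -> module name used in `import`.
IMPORT_NAMES = {
    "TensorFlow": "tensorflow",
    "Matplotlib": "matplotlib",
    "Pandas": "pandas",
    "NumPy": "numpy",
    "SciPy": "scipy",
    "Scrapy": "scrapy",
    "scikit-learn": "sklearn",
    "PyGame": "pygame",
    "PyTorch": "torch",
    "PyBrain": "pybrain",
}


def available_libraries(import_names):
    """Hypothetical helper: map each library to True if its top-level module can be found.

    importlib.util.find_spec locates a top-level module without executing it,
    so this check is cheap and has no side effects.
    """
    return {lib: importlib.util.find_spec(mod) is not None
            for lib, mod in import_names.items()}


for lib, installed in available_libraries(IMPORT_NAMES).items():
    print(f"{lib:12s} {'installed' if installed else 'not installed'}")
```

Once a library is known to be available, the usual community conventions are import numpy as np, import pandas as pd, and import matplotlib.pyplot as plt.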

Specialized Machine Learning Modules (sklearn)

  1. sklearn.model_selection provides the following classes, functions, hyper-parameter optimizers, and validation tools:
    1. Splitter Classes
    2. Splitter Functions
      1. train_test_split - Split arrays or matrices into random train and test subsets
      2. check_cv - Input checker utility for building a cross-validator.
    3. Hyper-parameter optimizers
      1. GridSearchCV - Exhaustive search over specified parameter values for an estimator.
    4. Model validation
      1. cross_val_score - Evaluate a score by cross-validation.
    5. Visualization
  2. sklearn.preprocessing provides:
    1. OneHotEncoder - Encode categorical features as a one-hot numeric array.
    2. StandardScaler - Standardize features by removing the mean and scaling to unit variance.
  3. sklearn.impute provides:
    1. SimpleImputer - Univariate imputer for completing missing values with simple strategies.
  4. sklearn.pipeline provides:
    1. Pipeline - Pipeline of transforms with a final estimator.
  5. sklearn.feature_selection provides:
    1. SelectKBest - Select features according to the k highest scores.
  6. sklearn.compose provides:
    1. ColumnTransformer - Applies transformers to columns of an array or pandas DataFrame.
  7. sklearn.metrics provides:
    1. accuracy_score - Accuracy classification score.
    2. classification_report - Build a text report showing the main classification metrics.
    3. roc_auc_score - Compute Area Under the Receiver Operating Characteristic Curve (ROC AUC) from prediction scores.
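    4. plot_roc_curve - Plot a Receiver Operating Characteristic (ROC) curve. It was deprecated in scikit-learn 1.0 and removed in 1.2; newer versions use RocCurveDisplay instead.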
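To show how these modules fit together, here is a minimal sketch that chains them into one workflow. It assumes scikit-learn (and therefore NumPy) is installed. The toy dataset from make_classification, the injected missing values, the extra categorical column, the choice of SVC as the final estimator, and the parameter grid are all illustrative assumptions, not something prescribed by the list above.

```python
import numpy as np
from sklearn.compose import ColumnTransformer
from sklearn.datasets import make_classification
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.impute import SimpleImputer
from sklearn.metrics import accuracy_score, classification_report, roc_auc_score
from sklearn.model_selection import GridSearchCV, check_cv, cross_val_score, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler
from sklearn.svm import SVC

# Hypothetical toy data: 5 numeric features plus 1 categorical column with codes 0, 1, 2.
X_num, y = make_classification(n_samples=300, n_features=5, n_informative=3,
                               n_redundant=0, random_state=0)
rng = np.random.default_rng(0)
X_num[rng.random(X_num.shape) < 0.05] = np.nan  # blank out ~5% of numeric values
category = rng.integers(0, 3, size=(300, 1)).astype(float)
X = np.hstack([X_num, category])  # columns 0-4 numeric, column 5 categorical

# Numeric columns: fill missing values with the column mean, then standardize.
# Categorical column: one-hot encode (categories unseen during fit become all zeros).
preprocess = ColumnTransformer([
    ("num", Pipeline([("impute", SimpleImputer(strategy="mean")),
                      ("scale", StandardScaler())]), [0, 1, 2, 3, 4]),
    ("cat", OneHotEncoder(handle_unknown="ignore"), [5]),
])

model = Pipeline([
    ("preprocess", preprocess),
    ("select", SelectKBest(score_func=f_classif, k=5)),  # keep the k highest-scoring features
    ("clf", SVC()),
])

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0)

# check_cv turns the integer 5 into a splitter; for a classifier with a binary
# target it returns StratifiedKFold(n_splits=5).
cv = check_cv(5, y_train, classifier=True)

# Exhaustive search over the number of selected features and the SVC C parameter.
param_grid = {"select__k": [3, 5, "all"], "clf__C": [0.1, 1.0, 10.0]}
search = GridSearchCV(model, param_grid, cv=cv, scoring="accuracy")
search.fit(X_train, y_train)

cv_scores = cross_val_score(search.best_estimator_, X_train, y_train, cv=cv)
y_pred = search.predict(X_test)
test_accuracy = accuracy_score(y_test, y_pred)
# ROC AUC needs continuous scores; SVC provides them through decision_function.
test_auc = roc_auc_score(y_test, search.decision_function(X_test))

print("best parameters:", search.best_params_)
print(f"cross-validated accuracy: {cv_scores.mean():.3f}")
print(f"test accuracy: {test_accuracy:.3f}")
print(f"test ROC AUC: {test_auc:.3f}")
print(classification_report(y_test, y_pred))
```

Putting the imputer, scaler, encoder, and feature selector inside the Pipeline means GridSearchCV and cross_val_score refit them on each training fold only, so the validation folds never influence preprocessing. On scikit-learn 1.2 or later, RocCurveDisplay.from_estimator(search, X_test, y_test) plays the role plot_roc_curve used to (it requires Matplotlib).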

sklearn Algorithms

  1. neighbors - module implementing the k-nearest neighbors algorithm (for example, KNeighborsClassifier)
  2. svm - module that includes Support Vector Machine algorithms
  3. SVC - C-Support Vector Classification (a class in the svm module)
  4. DecisionTreeClassifier - A decision tree classifier.
  5. RandomForestClassifier - A random forest classifier.
  6. AdaBoostClassifier - An AdaBoost classifier.
  7. GradientBoostingClassifier - Gradient Boosting for classification
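All of the algorithms listed above share scikit-learn's fit/predict interface, so they can be compared side by side. The sketch below assumes scikit-learn is installed and uses a synthetic dataset from make_classification; the dataset size, the hyperparameters, and the choice to standardize features only for the distance- and margin-based models (k-NN and SVC) are illustrative assumptions.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical synthetic binary classification problem.
X, y = make_classification(n_samples=400, n_features=10, n_informative=5, random_state=42)

# k-NN and SVC are sensitive to feature scale, so they get a StandardScaler first;
# tree-based models do not need scaling.
classifiers = {
    "k-nearest neighbors": make_pipeline(StandardScaler(), KNeighborsClassifier(n_neighbors=5)),
    "SVC": make_pipeline(StandardScaler(), SVC()),
    "decision tree": DecisionTreeClassifier(random_state=42),
    "random forest": RandomForestClassifier(n_estimators=100, random_state=42),
    "AdaBoost": AdaBoostClassifier(random_state=42),
    "gradient boosting": GradientBoostingClassifier(random_state=42),
}

results = {}
for name, clf in classifiers.items():
    # Same 5-fold cross-validation for every model, scored by accuracy.
    scores = cross_val_score(clf, X, y, cv=5, scoring="accuracy")
    results[name] = scores.mean()
    print(f"{name:20s} {scores.mean():.3f} +/- {scores.std():.3f}")
```

Because every estimator goes through the same cross_val_score call, swapping one model for another is a one-line change. Scores on synthetic data say little about which algorithm will perform best on a real dataset.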
