Electronics and Telecommunication¶

- Jayabharathi Hari(https://www.jayabharathi-hari.com/)¶

DataSet:

• CONTEXT: A communications equipment manufacturing company has a product which is responsible for emitting informative signals. Company wants to build a machine learning model which can help the company to predict the equipment’s signal quality using various parameters.

• DATA DESCRIPTION: The data set contains information on various signal tests performed:

Parameters: Various measurable signal parameters.
Signal_Quality: Final signal strength or quality

• PROJECT OBJECTIVE: To build a classifier which can use the given parameters to determine the signal strength or quality.

1. Exploratory data analysis

In [ ]:

from google.colab import drive
drive.mount('/content/drive')

Drive already mounted at /content/drive; to attempt to forcibly remount, call drive.mount("/content/drive", force_remount=True).

In [ ]:

import numpy as np
import pandas as pd
import seaborn as sns
import scipy.stats as stats
import matplotlib.pyplot as plt
from tensorflow import keras
%matplotlib inline
import tensorflow as tf
tf.__version__

Out[ ]:

'2.13.0'

In [ ]:

import h5py
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler, OneHotEncoder
from sklearn.metrics import confusion_matrix
from keras import backend as k
from tensorflow.keras.utils import to_categorical
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense,Dropout,ReLU,Reshape,BatchNormalization
from tensorflow.keras import regularizers, optimizers
from sklearn.metrics import r2_score
from tensorflow.keras.models import load_model
from tensorflow.keras.optimizers import Adam, SGD
from tensorflow.keras.callbacks import ModelCheckpoint
import math
# intialize random no: generator
import random
random.seed(7)
import warnings
warnings.filterwarnings("ignore")

In [ ]:

df= pd.read_csv('/content/drive/MyDrive/GL_files/GL_Dataset/Signal.csv')
df.sample(4)

In [ ]:

print("Shape:Columns&Rows",df.shape)
print("\nSize:",df.size)
print("\nUnique Values in Signal Strength:",df['Signal_Strength'].unique())

Shape:Columns&Rows (1599, 12)

Size: 19188

Unique Values in Signal Strength: [5 6 7 4 8 3]

In [ ]:

#Since it has only 6 classes converting the values from 0 to 5 inplace of 3 to 8
to_replace = {3:0,4:1,5:2,6:3,7:4,8:5}
df=df.replace({'Signal_Strength': to_replace})

In [ ]:

df['Signal_Strength'].unique()

Out[ ]:

array([2, 3, 4, 1, 5, 0])

In [ ]:

df.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1599 entries, 0 to 1598
Data columns (total 12 columns):
 #   Column           Non-Null Count  Dtype  
---  ------           --------------  -----  
 0   Parameter 1      1599 non-null   float64
 1   Parameter 2      1599 non-null   float64
 2   Parameter 3      1599 non-null   float64
 3   Parameter 4      1599 non-null   float64
 4   Parameter 5      1599 non-null   float64
 5   Parameter 6      1599 non-null   float64
 6   Parameter 7      1599 non-null   float64
 7   Parameter 8      1599 non-null   float64
 8   Parameter 9      1599 non-null   float64
 9   Parameter 10     1599 non-null   float64
 10  Parameter 11     1599 non-null   float64
 11  Signal_Strength  1599 non-null   int64  
dtypes: float64(11), int64(1)
memory usage: 150.0 KB

In [ ]:

df.describe().T

Out[ ]:

	count	mean	std	min	25%	50%	75%	max
Parameter 1	1599.0	8.319637	1.741096	4.60000	7.1000	7.90000	9.200000	15.90000
Parameter 2	1599.0	0.527821	0.179060	0.12000	0.3900	0.52000	0.640000	1.58000
Parameter 3	1599.0	0.270976	0.194801	0.00000	0.0900	0.26000	0.420000	1.00000
Parameter 4	1599.0	2.538806	1.409928	0.90000	1.9000	2.20000	2.600000	15.50000
Parameter 5	1599.0	0.087467	0.047065	0.01200	0.0700	0.07900	0.090000	0.61100
Parameter 6	1599.0	15.874922	10.460157	1.00000	7.0000	14.00000	21.000000	72.00000
Parameter 7	1599.0	46.467792	32.895324	6.00000	22.0000	38.00000	62.000000	289.00000
Parameter 8	1599.0	0.996747	0.001887	0.99007	0.9956	0.99675	0.997835	1.00369
Parameter 9	1599.0	3.311113	0.154386	2.74000	3.2100	3.31000	3.400000	4.01000
Parameter 10	1599.0	0.658149	0.169507	0.33000	0.5500	0.62000	0.730000	2.00000
Parameter 11	1599.0	10.422983	1.065668	8.40000	9.5000	10.20000	11.100000	14.90000
Signal_Strength	1599.0	2.636023	0.807569	0.00000	2.0000	3.00000	3.000000	5.00000

In [ ]:

df['Signal_Strength'].value_counts().sort_values()

Out[ ]:

0     10
5     18
1     53
4    199
3    638
2    681
Name: Signal_Strength, dtype: int64

In [ ]:

duplicates = df.duplicated()

In [ ]:

duplicates

Out[ ]:

0       False
1       False
2       False
3       False
4        True
        ...  
1594    False
1595    False
1596     True
1597    False
1598    False
Length: 1599, dtype: bool

In [ ]:

num_duplicates =duplicates.sum()
print(f"No: of duplicate : {num_duplicates}")

No: of duplicate : 240

In [ ]:

df.drop_duplicates(inplace=True)

In [ ]:

duplicates=df.duplicated()

In [ ]:

duplicates

Out[ ]:

0       False
1       False
2       False
3       False
5       False
        ...  
1593    False
1594    False
1595    False
1597    False
1598    False
Length: 1359, dtype: bool

In [ ]:

num_duplicates_postimputation =duplicates.sum()
print(f"No: of duplicate : {num_duplicates_postimputation}") #We have removed the duplicates

No: of duplicate : 0

In [ ]:

df.describe().T

Out[ ]:

	count	mean	std	min	25%	50%	75%	max
Parameter 1	1359.0	8.310596	1.736990	4.60000	7.1000	7.9000	9.20000	15.90000
Parameter 2	1359.0	0.529478	0.183031	0.12000	0.3900	0.5200	0.64000	1.58000
Parameter 3	1359.0	0.272333	0.195537	0.00000	0.0900	0.2600	0.43000	1.00000
Parameter 4	1359.0	2.523400	1.352314	0.90000	1.9000	2.2000	2.60000	15.50000
Parameter 5	1359.0	0.088124	0.049377	0.01200	0.0700	0.0790	0.09100	0.61100
Parameter 6	1359.0	15.893304	10.447270	1.00000	7.0000	14.0000	21.00000	72.00000
Parameter 7	1359.0	46.825975	33.408946	6.00000	22.0000	38.0000	63.00000	289.00000
Parameter 8	1359.0	0.996709	0.001869	0.99007	0.9956	0.9967	0.99782	1.00369
Parameter 9	1359.0	3.309787	0.155036	2.74000	3.2100	3.3100	3.40000	4.01000
Parameter 10	1359.0	0.658705	0.170667	0.33000	0.5500	0.6200	0.73000	2.00000
Parameter 11	1359.0	10.432315	1.082065	8.40000	9.5000	10.2000	11.10000	14.90000
Signal_Strength	1359.0	2.623252	0.823578	0.00000	2.0000	3.0000	3.00000	5.00000

In [ ]:

Correlation=df.corr()
Correlation

Out[ ]:

	Parameter 1	Parameter 2	Parameter 3	Parameter 4	Parameter 5	Parameter 6	Parameter 7	Parameter 8	Parameter 9	Parameter 10	Parameter 11	Signal_Strength
Parameter 1	1.000000	-0.255124	0.667437	0.111025	0.085886	-0.140580	-0.103777	0.670195	-0.686685	0.190269	-0.061596	0.119024
Parameter 2	-0.255124	1.000000	-0.551248	-0.002449	0.055154	-0.020945	0.071701	0.023943	0.247111	-0.256948	-0.197812	-0.395214
Parameter 3	0.667437	-0.551248	1.000000	0.143892	0.210195	-0.048004	0.047358	0.357962	-0.550310	0.326062	0.105108	0.228057
Parameter 4	0.111025	-0.002449	0.143892	1.000000	0.026656	0.160527	0.201038	0.324522	-0.083143	-0.011837	0.063281	0.013640
Parameter 5	0.085886	0.055154	0.210195	0.026656	1.000000	0.000749	0.045773	0.193592	-0.270893	0.394557	-0.223824	-0.130988
Parameter 6	-0.140580	-0.020945	-0.048004	0.160527	0.000749	1.000000	0.667246	-0.018071	0.056631	0.054126	-0.080125	-0.050463
Parameter 7	-0.103777	0.071701	0.047358	0.201038	0.045773	0.667246	1.000000	0.078141	-0.079257	0.035291	-0.217829	-0.177855
Parameter 8	0.670195	0.023943	0.357962	0.324522	0.193592	-0.018071	0.078141	1.000000	-0.355617	0.146036	-0.504995	-0.184252
Parameter 9	-0.686685	0.247111	-0.550310	-0.083143	-0.270893	0.056631	-0.079257	-0.355617	1.000000	-0.214134	0.213418	-0.055245
Parameter 10	0.190269	-0.256948	0.326062	-0.011837	0.394557	0.054126	0.035291	0.146036	-0.214134	1.000000	0.091621	0.248835
Parameter 11	-0.061596	-0.197812	0.105108	0.063281	-0.223824	-0.080125	-0.217829	-0.504995	0.213418	0.091621	1.000000	0.480343
Signal_Strength	0.119024	-0.395214	0.228057	0.013640	-0.130988	-0.050463	-0.177855	-0.184252	-0.055245	0.248835	0.480343	1.000000

In [ ]:

import seaborn as sns
import matplotlib.pyplot as plt

plt.subplots(figsize=(20,8))
sns.heatmap(Correlation)
plt.show()

No description has been provided for this image

In [ ]:

sns.pairplot(df,diag_kind ='kde')
plt.show()
#Checking on the corelation

In [ ]:

df.plot(kind='box')
plt.show()

In [ ]:

##Performing univariate, bivariate and multivariate analysis
# Feature Importance

# Independent variables
X=df.drop('Signal_Strength',axis=1)

# Target variable
Y=df['Signal_Strength']


from sklearn.ensemble import ExtraTreesClassifier
import matplotlib.pyplot as plt
model = ExtraTreesClassifier()
model.fit(X,Y)

#using inbuilt class "feature_importances" of tree based classifiers
print(model.feature_importances_)

#plotting graph of feature importances
feat_importances = pd.Series(model.feature_importances_, index=X.columns)
feat_importances.nlargest(10).plot(kind='barh')
plt.show()

#Observation: Most Effective - Parameter 11

[0.07673646 0.0960267  0.08050729 0.07999156 0.07986296 0.07372861
 0.10095128 0.084817   0.07564383 0.09979721 0.15193711]

In [ ]:

out_df = df.drop('Signal_Strength',axis=1)
out_df.shape

Out[ ]:

(1359, 11)

In [ ]:

#  function to treat outliers
def detect_treat_outliers(df,operation):
    cols=[]
    IQR_list=[]
    lower_boundary_list=[]
    upper_boundary_list=[]
    outliers_count=[]
    for col in df.columns:
        print('col',col)
        if((df[col].dtype =='int64' or df[col].dtype =='float64') and (col != 'HR')):
            #print('Inside if')
            IQR = df[col].quantile(0.75) - df[col].quantile(0.25)
            lower_boundary = df[col].quantile(0.25) - (1.5 * IQR)
            upper_boundary = df[col].quantile(0.75) + (1.5 * IQR)
            up_cnt = df[df[col]>upper_boundary][col].shape[0]
            #print('Upper count=',up_cnt)
            lw_cnt = df[df[col]<lower_boundary][col].shape[0]
            #print('lower count=',lw_cnt)
            if(up_cnt+lw_cnt) > 0:
                cols.append(col)
                IQR_list.append(IQR)
                lower_boundary_list.append(lower_boundary)
                upper_boundary_list.append(upper_boundary)
                outliers_count.append(up_cnt+lw_cnt)
                if operation == 'update':
                    df.loc[df[col] > upper_boundary,col] = upper_boundary
                    df.loc[df[col] < lower_boundary,col] = lower_boundary
                else:
                    pass
            else:
                pass
    newdf = pd.DataFrame(list(zip(cols,IQR_list,lower_boundary_list,upper_boundary_list,outliers_count)),columns=['Features','IQR','Lower Boundary','Upper Boundary','Outlier Count'])
    #print('Df=',newdf)
    #print('Columns having outliers=',cols)
    if operation == 'update':
        return (len(cols),df)
    else:
        return (len(cols),newdf)

In [ ]:

#Removing outliers by replacing the data below lower and above upper whisker
count,new_out_df=detect_treat_outliers(out_df,'update')
if count>0:
    print('Updated dataset')
    newdf=df

col Parameter 1
col Parameter 2
col Parameter 3
col Parameter 4
col Parameter 5
col Parameter 6
col Parameter 7
col Parameter 8
col Parameter 9
col Parameter 10
col Parameter 11
Updated dataset

In [ ]:

new_out_df.describe().T

Out[ ]:

	count	mean	std	min	25%	50%	75%	max
Parameter 1	1359.0	8.284069	1.658319	4.60000	7.1000	7.9000	9.20000	12.35000
Parameter 2	1359.0	0.527840	0.177262	0.12000	0.3900	0.5200	0.64000	1.01500
Parameter 3	1359.0	0.272288	0.195379	0.00000	0.0900	0.2600	0.43000	0.94000
Parameter 4	1359.0	2.324099	0.607558	0.90000	1.9000	2.2000	2.60000	3.65000
Parameter 5	1359.0	0.081323	0.018486	0.03850	0.0700	0.0790	0.09100	0.12250
Parameter 6	1359.0	15.714496	9.852641	1.00000	7.0000	14.0000	21.00000	42.00000
Parameter 7	1359.0	46.092715	30.877994	6.00000	22.0000	38.0000	63.00000	124.50000
Parameter 8	1359.0	0.996707	0.001798	0.99227	0.9956	0.9967	0.99782	1.00115
Parameter 9	1359.0	3.308889	0.149982	2.92500	3.2100	3.3100	3.40000	3.68500
Parameter 10	1359.0	0.649963	0.137403	0.33000	0.5500	0.6200	0.73000	1.00000
Parameter 11	1359.0	10.428734	1.070647	8.40000	9.5000	10.2000	11.10000	13.50000

Df has 1599 rows and 12 columns
All parameters are floating point and

signal strength is an integer 3. STD higher for parmeter 7 (32.8953)and lower for parameter 4. There are no null values in the data 5. Duplicates were there and removed 6. there is high correlation between Parameter 6 and 7 and have no correlation with other parameters. 7. Parameter 3 ranges between 0 and 1. 8. Maximum value of Parameter 5 is 0.6
9. Removed the Outliers. 10. Parameter 1 is positively correlated to Parameter 3 and Parameter 8 and negatively correlated to Parameter 2 and Parameter 9. Parameter 4 has very low correlation with other Parameters.

2. Data preprocessing¶

In [ ]:

#Split the data into X & Y.
X = new_out_df
y = df['Signal_Strength']

In [ ]:

#Split the data into train & test with 70:30 proportion.[
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=.30, random_state=4)

In [ ]:

#shape of all the 4 variables
print('Shape of X_train', X_train.shape)
print('Shape of X_test', X_test.shape)
print('Shape of y_test', y_test.shape)
print('Shape of y_train', y_train.shape)

Shape of X_train (951, 11)
Shape of X_test (408, 11)
Shape of y_test (408,)
Shape of y_train (951,)

In [ ]:

#Verifying if train and test are in sync
print('Are test and train data in sync?',(X_train.index==y_train.index).all() and (X_test.index== y_test.index).all())

Are test and train data in sync? True

In [ ]:

##Normalise the train test
#Scaling training data
X_train = StandardScaler().fit_transform(X_train)
# Scaling testing data
X_test = StandardScaler().fit_transform(X_test)

In [ ]:

#Transform Labels into format acceptable by Neural Network / Converting y data into categorical (one-hot encoding)
trainY = tf.keras.utils.to_categorical(y_train,num_classes=6)
testY = tf.keras.utils.to_categorical(y_test,num_classes=6)

Model Training & Evaluation using Neural Network¶

In [ ]:

num_features =11
num_classes = 6

#model= tf.keras.models.Sequential()
#creates a sequential model

In [ ]:

model = Sequential()

In [ ]:

model.add(Dense(11, activation='relu',input_shape = (11,)))
model.add(Dense(11, activation='relu'))
model.add(Dense(6, activation='softmax'))

In [ ]:

model.summary()

Model: "sequential_8"
_________________________________________________________________
 Layer (type)                Output Shape              Param #   
=================================================================
 dense_31 (Dense)            (None, 11)                132       
                                                                 
 dense_32 (Dense)            (None, 11)                132       
                                                                 
 dense_33 (Dense)            (None, 6)                 72        
                                                                 
=================================================================
Total params: 336 (1.31 KB)
Trainable params: 336 (1.31 KB)
Non-trainable params: 0 (0.00 Byte)
_________________________________________________________________

In [ ]:

#model.compile(optimizer= 'sgd', loss= 'mse', metrics=['accuracy'])
model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])

In [ ]:

history = model.fit(X_train, trainY,validation_split=.3, epochs=20, batch_size=25)

Epoch 1/20
27/27 [==============================] - 1s 16ms/step - loss: 1.7453 - accuracy: 0.2526 - val_loss: 1.6909 - val_accuracy: 0.2797
Epoch 2/20
27/27 [==============================] - 0s 5ms/step - loss: 1.6151 - accuracy: 0.3098 - val_loss: 1.5860 - val_accuracy: 0.3497
Epoch 3/20
27/27 [==============================] - 0s 5ms/step - loss: 1.5063 - accuracy: 0.3504 - val_loss: 1.4896 - val_accuracy: 0.4231
Epoch 4/20
27/27 [==============================] - 0s 5ms/step - loss: 1.4065 - accuracy: 0.4241 - val_loss: 1.4000 - val_accuracy: 0.4580
Epoch 5/20
27/27 [==============================] - 0s 5ms/step - loss: 1.3157 - accuracy: 0.4692 - val_loss: 1.3231 - val_accuracy: 0.4965
Epoch 6/20
27/27 [==============================] - 0s 6ms/step - loss: 1.2393 - accuracy: 0.5008 - val_loss: 1.2642 - val_accuracy: 0.5245
Epoch 7/20
27/27 [==============================] - 0s 5ms/step - loss: 1.1788 - accuracy: 0.5233 - val_loss: 1.2141 - val_accuracy: 0.5315
Epoch 8/20
27/27 [==============================] - 0s 6ms/step - loss: 1.1315 - accuracy: 0.5383 - val_loss: 1.1742 - val_accuracy: 0.5420
Epoch 9/20
27/27 [==============================] - 0s 6ms/step - loss: 1.0941 - accuracy: 0.5398 - val_loss: 1.1482 - val_accuracy: 0.5385
Epoch 10/20
27/27 [==============================] - 0s 6ms/step - loss: 1.0653 - accuracy: 0.5519 - val_loss: 1.1307 - val_accuracy: 0.5420
Epoch 11/20
27/27 [==============================] - 0s 7ms/step - loss: 1.0434 - accuracy: 0.5714 - val_loss: 1.1177 - val_accuracy: 0.5455
Epoch 12/20
27/27 [==============================] - 0s 5ms/step - loss: 1.0264 - accuracy: 0.5744 - val_loss: 1.1101 - val_accuracy: 0.5420
Epoch 13/20
27/27 [==============================] - 0s 5ms/step - loss: 1.0122 - accuracy: 0.5789 - val_loss: 1.1043 - val_accuracy: 0.5420
Epoch 14/20
27/27 [==============================] - 0s 5ms/step - loss: 1.0012 - accuracy: 0.5759 - val_loss: 1.1009 - val_accuracy: 0.5350
Epoch 15/20
27/27 [==============================] - 0s 7ms/step - loss: 0.9918 - accuracy: 0.5955 - val_loss: 1.0984 - val_accuracy: 0.5350
Epoch 16/20
27/27 [==============================] - 0s 7ms/step - loss: 0.9829 - accuracy: 0.5805 - val_loss: 1.0954 - val_accuracy: 0.5350
Epoch 17/20
27/27 [==============================] - 0s 5ms/step - loss: 0.9749 - accuracy: 0.5820 - val_loss: 1.0936 - val_accuracy: 0.5420
Epoch 18/20
27/27 [==============================] - 0s 5ms/step - loss: 0.9687 - accuracy: 0.5925 - val_loss: 1.0904 - val_accuracy: 0.5420
Epoch 19/20
27/27 [==============================] - 0s 5ms/step - loss: 0.9635 - accuracy: 0.5880 - val_loss: 1.0882 - val_accuracy: 0.5455
Epoch 20/20
27/27 [==============================] - 0s 5ms/step - loss: 0.9567 - accuracy: 0.5925 - val_loss: 1.0861 - val_accuracy: 0.5385

In [ ]:

#The model is slightly overfitting since the training accuracy is higher than the validation accuracy
# Evaluate the model
loss, accuracy = model.evaluate(X_test, testY)
print("Test loss:", loss)
print("Test accuracy:", accuracy)

13/13 [==============================] - 0s 2ms/step - loss: 1.0177 - accuracy: 0.5784
Test loss: 1.0176830291748047
Test accuracy: 0.5784313678741455

In [ ]:

loss_train = Network_Classifier.history['loss']
loss_val = Network_Classifier.history['val_loss']
epochs = range(1,EPOCH+1)
plt.plot(epochs, loss_train, 'g', label='Training loss')
plt.plot(epochs, loss_val, 'b', label='validation loss')
plt.title('Training and Validation loss')
plt.xlabel('Epochs')
plt.ylabel('Loss')
plt.legend()
plt.show()

In [ ]:

Acc_train = Network_Classifier.history['accuracy']
Acc_val = Network_Classifier.history['val_accuracy']
epochs = range(1,EPOCH+1)
plt.plot(epochs, Acc_train, 'g', label='Training accuracy')
plt.plot(epochs, Acc_val, 'b', label='validation accuracy')
plt.title('Training and Validation accuracy')
plt.xlabel('Epochs')
plt.ylabel('accuracy')
plt.legend()
plt.show()