JupyterLab Extension for INFACore

Parse name

You can parse a person's first name, surname, and gender to determine the gender score and gender status for the name.
If you know the gender for a name, the function uses the gender-specific score to determine the gender. Acceptable input values for male and female are M and F.
If you do not know the gender, the function uses the higher of the male and female scores to determine the status. The rule also calculates the probable gender based on the first name input and provides a confidence score based on how frequently the name occurs as male or female.
A gender is assigned a score only if the probability of the name being male or female is 70% or more. Unknown genders always have a confidence score of zero.
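The selection rules described above can be sketched in plain Python. This is an illustration only, not the INFACore implementation: the function name `pick_gender`, its parameters, the "U" label for an unknown gender, and the tie-breaking choice are all assumptions made for the example. It also assumes that the 70% threshold applies only when the gender is not supplied.

```python
def pick_gender(male_prob, female_prob, known_gender=None, threshold=0.70):
    """Hypothetical sketch of the gender selection rules (not the INFACore code).

    Returns a (gender, score) tuple, where gender is "M", "F", or "U" (unknown).
    """
    # Known gender: use the score that belongs to that gender.
    if known_gender == "M":
        return "M", male_prob
    if known_gender == "F":
        return "F", female_prob

    # Unknown gender: take the higher of the two scores.
    # Assumption: a tie is resolved in favor of "M" purely for determinism.
    if male_prob >= female_prob:
        gender, score = "M", male_prob
    else:
        gender, score = "F", female_prob

    # Assumption: below the 70% threshold the gender stays unknown,
    # and unknown genders always carry a confidence score of zero.
    if score < threshold:
        return "U", 0.0
    return gender, score


print(pick_gender(0.96, 0.10))        # gender not supplied, male score is higher
print(pick_gender(0.50, 0.60))        # neither score reaches the threshold
print(pick_gender(0.50, 0.99, "F"))   # gender supplied as F
```

The threshold is exposed as a parameter only to make the 70% rule visible in the sketch. The documented function does not necessarily offer such a setting.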
To perform this operation, select the Parse Name function and specify the data object variable. Then enter the first name, surname, and gender column names that you want to parse.
For example, the input flat file includes columns for first names, surnames, and gender.
The following snippet is the input code when you apply the function:
import informatica.infacore as ic

FF_DV = ic.get_data_source("Flat File").get_connection("DR_FlatFile").get_data_object("input.csv")
dqf = ic.DataQualityFunctions()
result = dqf.parse_name(FF_DV, "FirstName", "Surname", "Gender")
df_reader = ic.DataFrameReader(result)
p_df = df_reader.to_pandas()
p_df.head()
The function parses the data and returns the following gender score and gender status:
In_Firstname: [["James","Mary","Ishika"]]
In_Surname: [["Thomson","Patricia","Garg"]]
In_Gender: [["M","F","F"]]
R1_male_first_name_prob: [[0.9611602416284911,0.862213129336417,0.5]]
R1_female_first_name_prob: [[0.09618367642795013,0.9994692271687158,0.9994110718492345]]
R2_male_surname_prob: [[0.9810939357907253,0.06919354838709678,0.9996677740863787]]
R2_female_surname_prob: [[0.9999596024884867,0.00023361381512597722,0.9996677740863787]]
R_male_name_parse_prob: [[0.9999983675038314,0.31748599893598634,0.9996677740863787]]
R_female_name_parse_prob: [[0.999620522413733,0.9929068110415156,0.9999998041624873]]
Status: [["Probably Valid","Uncommon Name","Probably Valid"]]
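Because the result is listed column by column, it can help to pair the values up per record when you review the statuses. The following is a minimal sketch in plain Python that uses values copied from the output above. The `columns` dictionary and the `needs_review` list are names introduced for this example and are not part of INFACore.

```python
# Values copied from the sample output; only a subset of columns is shown.
columns = {
    "In_Firstname": ["James", "Mary", "Ishika"],
    "In_Surname": ["Thomson", "Patricia", "Garg"],
    "In_Gender": ["M", "F", "F"],
    "Status": ["Probably Valid", "Uncommon Name", "Probably Valid"],
}

# Transpose the column lists into one dictionary per input record.
rows = [dict(zip(columns, values)) for values in zip(*columns.values())]

# Keep the records whose status is anything other than "Probably Valid".
needs_review = [row for row in rows if row["Status"] != "Probably Valid"]

for row in needs_review:
    print(row["In_Firstname"], row["In_Surname"], row["Status"])
```

With the sample values, only the record for Mary Patricia is listed, because its status is Uncommon Name.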
