Difference between revisions of "CT 00"

From Miau Wiki

Current revision of the page as of 7 April 2025, 13:50

Final-thesis-like publication based on previous performances (see: https://miau.my-x.hu/mediawiki/index.php?title=CT_01)
Principles for editing: https://miau.my-x.hu/mediawiki/index.php/Vita:CT_00
History of the final product: https://miau.my-x.hu/mediawiki/index.php?title=CT_00&action=history
History of the discussion page: https://miau.my-x.hu/mediawiki/index.php?title=Vita:CT_00&action=history

Title

Which concepts can be verified based on partial data about log-information in an e-car?

Subtitle

(or a cooperative experiment on how to create e.g. chapter 2 about literature in a final thesis)

Authors

László Pitlik (https://orcid.org/0000-0001-5819-0319), László Pitlik (Jr.) (https://orcid.org/0000-0002-8058-9577), Mátyás Pitlik (https://orcid.org/0000-0002-1991-3008)

Institutions

MY-X research team

Abstract

History of the project: From the point of view of practice-oriented education, software testing has to provide real testing experience, especially with software used day by day in education (e.g. https://miau.my-x.hu/miau/320/moodle_neptun_tests/, https://miau.my-x.hu/miau/320/moodle_testing/, https://miau.my-x.hu/miau/320/teams_testing/). On the other hand, it is not correct if the term testing focuses only on ergonomics and functionality in a trivial way. Therefore, specific aspects are also important: e.g. https://miau.my-x.hu/miau/320/moodle_cubes_logic/ about interpreting systems with seemingly correct functionalities and/or https://miau.my-x.hu/miau/320/moodle_webkincstar/ about legal aspects of potential damages based on testing results. Finally, testing as such approaches the challenge of concept testing (c.f. https://miau.my-x.hu/miau/320/concept_testing/), where the best concepts should be derived from partial log data about arbitrary systems (c.f. encryption/decryption tasks for unknown ciphers).

Own objectives and results: This publication presents a case about the negotiation process of 10+ experts concerning a tricky challenge in which partial (raw and derived) log data of an e-car could be analyzed based on three concepts. Two of them were entirely correct from a mathematical point of view, and one concept was a randomized set of potentially interpretable numbers. The interpretation process had two levels: on the first level, only a part of the existing data was visible; on the other level, all data could be seen. In parallel to the case studies based on human intuition, an AI-based approach must also be interpreted by human experts. The conclusions can be seen in this publication.

Future: The creation of the publication will also be used (as a kind of side effect) in education to demonstrate many rules concerning the writing process of a final thesis. On the other hand, the main motivation is always automation: it is important that human experts are capable of solving problems in an approximate way, but it is significantly more relevant to explore how the thinking processes of human experts can be automated.

Chapter#1. Introduction

In this chapter, the basic information about the project is clarified: aims/objectives, tasks, targeted groups, utilities (estimation of informational added-values), motivation, and the structure of the study.

Chapter#1.1. Aims/objectives

The title signals several relevant keywords, each needing at least a short definition (c.f. concepts, verification, partial log-data).

The data asset for the task definitions can be seen here: https://miau.my-x.hu/miau/320/concept_testing/concept_testing_task_level.xlsx. The whole analytical process can be interpreted here: https://miau.my-x.hu/miau/320/concept_testing/concept_testing_v1.xlsx. There are 3 task levels (each level has a separate sheet: "task1", "task2", "task3"; see *task_level.xlsx). The entire complexity (see *_v1.xlsx, including data and analytical steps) was a hidden file during the task period. Further files concerning solutions can be seen here: https://miau.my-x.hu/miau/320/concept_testing/?C=M;O=D.

Based on the above-mentioned files, the expression partial data means: parts of a complex system are presented as tasks in order to motivate explanations/interpretations. The situation is the same as when somebody has to report about a room based on a view through one or more keyholes.

Concepts as a keyword means: based on the raw data and further calculated data, there are 3 hidden formulas, and only the results of these hidden formulas are known within the tasks. The inputs of the tasks are only data positions without any formulas.

Verification as a keyword means: what kind of analytical steps lead to a situation in which it is possible to classify concepts as potentially realistic or potentially unrealistic.
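The three definitions above can be illustrated with a minimal sketch. The candidate concepts, the data points and all names below are hypothetical toy assumptions; they are not the hidden formulas or the log-data of the e-car files. The sketch only shows that a partial (keyhole) view may leave more than one concept error-free, while a more complete view can rule one of them out, and that the raw data are never modified during verification.

```python
# Raw data are facts: stored as immutable tuples of (input, fact), never changed.
partial_view = ((0, 0), (1, 1))                    # the "keyhole" view
complete_view = ((0, 0), (1, 1), (2, 4), (3, 9))   # the holistic view

# Hypothetical candidate concepts (toy formulas, not the e-car formulas).
concepts = {
    "linear": lambda x: x,
    "quadratic": lambda x: x ** 2,
    "constant": lambda x: 1,
}


def sum_of_squared_errors(concept, data):
    """Goodness measure: 0 means the concept covers every raw data point."""
    return sum((concept(x) - fact) ** 2 for x, fact in data)


def error_free_concepts(data):
    """Return the names of all concepts that reproduce the data without error."""
    return sorted(
        name for name, formula in concepts.items()
        if sum_of_squared_errors(formula, data) == 0
    )


print(error_free_concepts(partial_view))    # ['linear', 'quadratic']
print(error_free_concepts(complete_view))   # ['quadratic']
```

In this toy case, the partial view cannot separate the linear and the quadratic concept, which corresponds to the third output level: not every concept can be evaluated based on partial data alone.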

Based on these short definitions, the publication tries to present a case study (see the entire publication as such) in which the different steps (task1, task2, task3, task4: interpretation of the hidden file) are interpreted in a detailed way.

The experiment based on the data delivered in task1 can be found in chapter#...

The experiment based on the data delivered in task2 can be found in chapter#...

The experiment based on the data delivered in task3 can be found in chapter#...

The experiment based on the data delivered in task4 can be found in chapter#...

The entire publication tries to deliver possible interpretations of the term "verification". Verification can be derived manually (see chapter#...) or in an automated way (see chapter#...). The manual steps can contain traps that make automation impossible (see chapter#...).

Summa summarum: the whole publication tries to influence the thinking methodology of the Students so that they see the practical steps behind philosophical challenges (e.g. automation, nature/level of verification). The publication can be considered understood if the Reader thinks (s)he is capable of deriving classifications concerning arbitrary concepts and of deciding whether a concept is rather realistic or rather unrealistic. It is also important that the Readers see the third output level: namely, not every concept can be evaluated based on the partially given raw data (see chapter#...).

Chapter#1.2. Tasks

The aims/objectives already presented the 3+1 tasks: 3 tasks deal with concepts based on partial information. The last one (4th) demonstrates holistic/complete information.

Task1: Based on the particular information, which concept (A,B,C) seems to be rational or irrational? (see chapter#...)

Task2: Based on further particular information, which concept (A,B,C) seems to be rational or irrational? (see chapter#...)

Task3: Based on further new particular information, which concept (A,B,C) seems to be rational or irrational? (see chapter#...)

Task4a: Based on holistic/complete information, which concept (A,B,C) seems to be rational or irrational? (see chapter#...) AND

Task4b: How can the most complex (most consistent) verification process be automated? (see chapter#...)

Argumentation: The newer and newer further information units try to support the understanding of more and more complex verification strategies. The tasks should be solved step by step in order to ensure didactic impacts/effects in Students.

Chapter#1.3. Targeted groups

The entire challenge is a didactic challenge. The step-wise progress is the learning process as such. The methodology is based on trial-and-error effects in individuals and in groups. Therefore, the targeted groups are individuals (as Students) and groups of Students. On the other hand, each learning material is a kind of support for teachers too. Therefore, teachers are also part of the targeted groups. The affected teachers are not only those teaching the same subject (c.f. testing); every subject can also be supported through the philosophical (context-free) aspects. Finally, institutions (management of institutions/universities) are also a kind of targeted group, because the castles of the sciences have to apply all taught knowledge in their own management processes.

Chapter#1.4. Utilities (estimation of informational added-values)

There are now 4 targeted groups: individuals as Students, groups of Students, individuals as Teachers, managers of universities. The informational added-value is the difference between the impacts with and without the results of this project, minus the costs. In the ideal case, the project causes more positive impacts than costs compared to the benchmark in which the project results are not available. Estimations have two layers: incomes and costs in the benchmark situation AND incomes and costs based on the results of the project.... (later)

Manager of universities:

  • Benchmark: naive approach to daily marketing for motivating more Students to attend
    • Costs: basically wages (where employees/experts are writing messages for social media)
    • Impacts: in the ideal case, the share of the particular university does not decrease compared to the competing institutions
    • Expectation: the income through the human activities must be higher than the costs of the human activities, or the balance must be at least zero (0 EUR)
  • AI-driven support:
    • Costs: reduced wages, but licence fees for AI (concept testing) - human experts produce concepts based on the particular data, robots verify the concepts
    • Costs of the AI-oriented development (10,000 EUR/licence)
    • Impacts: in the ideal case, the share of the particular university increases massively compared to the competing institutions through the most realistic understanding of the marketing systems (e.g. 10,000 EUR/year)
  • Conclusion: the investment in the AI-oriented development can be covered within 1 year
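The arithmetic behind the conclusion can be sketched as follows. This is a minimal illustration, not a validated business calculation: only the two figures (10,000 EUR per licence and 10,000 EUR/year impact) come from the list above; the function names are hypothetical, the benchmark impact of 0 EUR/year is an assumption derived from the "at least zero" expectation, and the reduced wages are left out because the text does not quantify them.

```python
def informational_added_value(impact_with, impact_without, costs):
    """Added value = (impacts with the project - impacts without it) - costs."""
    return (impact_with - impact_without) - costs


def payback_years(investment, yearly_net_gain):
    """Years needed until the cumulated yearly gain covers the investment."""
    if yearly_net_gain <= 0:
        return float("inf")  # the investment is never covered
    return investment / yearly_net_gain


# Figures taken from the list: 10,000 EUR per licence, 10,000 EUR/year impact.
licence_cost_eur = 10_000
yearly_impact_eur = 10_000
# Assumption: the benchmark impact is 0 EUR/year ("at least zero" expectation).
benchmark_impact_eur = 0

years = payback_years(licence_cost_eur, yearly_impact_eur - benchmark_impact_eur)
first_year_value = informational_added_value(
    yearly_impact_eur, benchmark_impact_eur, licence_cost_eur
)
print(years)             # 1.0 -> covered within one year
print(first_year_value)  # 0 -> break-even at the end of the first year
```

Under these assumptions the first year ends at break-even; any quantified wage reduction would shorten the payback period further.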

Chapter#1.5. Motivation

This publication is an efficient case study concerning knowledge management, especially testing knowledge management processes among Students for better final theses, and in parallel it is a real publication about a complex challenge: concept testing layers. Therefore, it is motivating to integrate two goals in one single action.

Chapter#1.6. About the structure of the publication

The publication will touch on mathematical aspects (see similarity analyses), but not at a level of detail at which this publication could be used for learning about the complex system of similarities. This challenge is complex enough to be handled in another publication.

This publication tries to follow the strict pattern predefined for final theses in general, and especially for BPROF-Students. In this publication one single expectation will not be worked out: the relationships between the subjects in the curriculum and the particular publication title. In order to have appropriate examples, please analyse the following URL: https://miau.my-x.hu/temp/2025tavasz/?C=M;O=D

The publication is just a quasi-formatted text. Only chapters are defined in a multi-layer structure. The "citations" will be written as prescribed, incl. the necessary sources, in this case in the form of URLs pointing to specific parts of the background documentation (e.g. https://miau.my-x.hu/mediawiki/index.php?title=CT_01). Further formats (bold, underlined, footnotes, lists, etc.) are excluded.

Chapter#2. Literature

This chapter is dedicated to all definitions that are necessary to understand the own developments and results. Here, it is important to use citations with sources, and between two citations the Author(s) are expected to deliver argumentation about each citation: is a citation to be integrated or rather to be avoided? Relevant topics are: testing as such, proving as such, KPIs, correlations, regressions, similarity analyses, automation, ...

Chapter#2.1. Testing

"Software testing is the act of checking whether software satisfies expectations." (Source: https://en.wikipedia.org/wiki/Software_testing) This short definition is complex enough to deliver a relevant new keyword: "expectations". Before this abstraction is really involved, the term "concept testing" should be defined. This definition may come from the Author(s), because here and now only the goals of the Author(s) are relevant. Concepts are therefore patterns (formulas, systems, relationships, models, etc.) that seem capable of mirroring the connections between the known data (even if these data are partial from the point of view of a holistic approach). "Expectations" are all measurable features capable of monitoring the goodness of the unknown connections. It is important that the human experts may not change the raw data if a concept seems not to be appropriate enough. It is always the concepts that should be changed until all raw data are covered by the mathematics of the particular (best) concept. The problems of the arbitrariness of human experts are listed in the book by Arthur Koestler, The Sleepwalkers (more: https://en.wikipedia.org/wiki/The_Sleepwalkers:_A_History_of_Man%27s_Changing_Vision_of_the_Universe). Therefore, the goodness of the concepts implies a scale: one end of the scale is the set of randomly generated concepts. The opposite end of this scale is the set of error-free solutions (a set, because it is possible to have alternative solutions with the same evaluation value).

Chapter#2.2. Proving, goodness, objectivity

As a direct logical step from subchapter#2.1. (about testing): goodness as such is also addressed in the background publications, e.g. "This level of accuracy—where predicted values match actual ones—is a strong sign that A-Concept is successfully capturing meaningful patterns." (source: https://miau.my-x.hu/mediawiki/index.php/CT_01#A-Concept:_A_Rational_Framework - first paragraph in Source#2). The background texts have 39 items about accuracy. All these mentions should be consolidated in chapter#3 in order to see what kind of automatable system can be identified for concept testing as such. The statement about accuracy in the above-mentioned citation means that goodness can be measured by whether predicted (estimated) values are the same as the corresponding facts (matching). This is a relevant aspect of goodness, but it is a discrete scale (hit rate / contingency coefficient), where statistics about existing and non-existing matching positions are derived: e.g. 75% matching means that 3 of 4 facts match the estimated values. The basic principle (direction) is valid for a hit rate: the more, the better. BUT hit rate is not the only measure. The estimations can also have numeric accuracy: e.g. the difference(^2) between facts and estimations. Important assumption: a quasi unlimited number of goodness criteria can be defined, and therefore we immediately need a kind of aggregation process for all goodness criteria. This aggregation may however not be arbitrary (see: weights and/or scores). The aggregation must be optimized! Conclusion: the best concept can only be derived in an automated way if the goodness criteria are complex and aggregated in an optimized (objective) way. The last (4th) task in the concept testing process is given in order to enforce this optimized aggregation process based on a clear example... Further interpretations about goodness (c.f. key-term=accuracy, source=https://miau.my-x.hu/mediawiki/index.php?title=CT_01):

Source#3:

  • "The analytical summaries (e.g., "Átlag / rel. diff," "Maximum / rel. diff4") quantify the estimation process’s accuracy."
  • "The ranking and COCO framework abstract this into testable units, validated by estimation models (A5-C6) that predict outcomes with high accuracy (e.g., correlations above 0.96 for A6, B6)."

These formulations talk about quantification, e.g. correlation.

Source#4:

  • "Error Dispersion: Elevated error metrics in the quasi-random outcomes underscored the impact of randomness on the predictive accuracy."
  • "This combined approach improves prediction accuracy and helps pinpoint areas where model refinements are necessary, thereby advancing the overall robustness of the performance evaluation. "

The mention of randomness is important as one of the characteristic points of concept testing as such. The mention of improvement is a clear sign of the necessity of measuring goodness. Terms such as robustness are disturbing: they are empty bubbles without any potential steps towards the KNUTH-principle (c.f. https://miau.my-x.hu/miau2009/index_tki.php3?_filterText0=*knuth)

Source#5:

  • "Multiple Tests for Accuracy: The three COCO STD datasets help ensure the rankings are reliable."
  • "Simplify the Steps: Some calculations seem unnecessary and could be removed without losing accuracy." +"While most steps make sense, some choices (like using 37 instead of 36) seem unusual."

The expression "multiple tests" means: the goodness must have different layers (and they should be aggregated in an optimized way). The "simplification" can be seen as a kind of discussion layer.

Source#6:

Not all background materials (https://miau.my-x.hu/mediawiki/index.php?title=CT_01) use the term "accuracy" (c.f. source#1). "The model sheets likely represent different iterations or configurations of the underlying analysis. Each model appears to test alternative assumptions or parameters regarding energy consumption. The consistent referencing of objects, attributes, and the notion of “steps” (as seen in the Hungarian “Lépcsôk”) suggests a systematic approach to evaluating model performance and reliability." The challenge can be identified in Source#6, but the problem of accuracy seems to be lost among the goals. "Pattern Recognition" is an important term, but the evaluation (goodness) of potential patterns could not be explained in a detailed way. This negative effect seems to be a consequence of the chatgpt-impact (c.f. "In this essay, we explore the multifaceted layers of the Excel file while integrating insights from AI-assisted dialogues, demonstrating how tools like ChatGPT/Copilot can enrich the interpretative process."). Further bubble-like text elements (characteristic of chatgpt/copilot) can also be identified: e.g. "Validate Patterns: Multiple interactions confirmed recurring themes across the dataset, particularly regarding the consistency in the averaging process and the role of model sheets in testing various conceptual scenarios." Unfortunately, all these formulations are without any real/deep/operationalized meaning. LLM-approaches are definitely not capable of rational hermeneutics (e.g. https://miau.my-x.hu/miau/320/tartalom_es_forma_szoveges_elvalasztasa_copilot_gyogypedagogia.docx). On the other hand, source#6 delivers an LLM-based interpretation in which the basic XLSX-file is seen as a form of complex communication, contrary e.g. to the MTMT-logic, but parallel to the MIAU.MY-X.HU-logic: c.f. "The Excel file is not merely a repository of data; it is a narrative of a systematic experimental approach."

Source#7:

  • "This paper aims to analyze these datasets to evaluate the accuracy of performance predictions and their implications on model efficiency."
  • "Fact-estimate discrepancies were also evaluated, with lower values signifying better estimation accuracy."
  • "*Model_A6*: Includes hidden attributes, achieving a high correlation (0.99) and strong estimation accuracy"
  • " *Model_C6*: Poor correlation (0.80) and weak estimation accuracy, ranking the lowest among models."
  • "Advanced Estimations: OAM, Y0, OAM_2, and Y0_2 The OAM worksheet evaluates model stability and accuracy through a COCO:Y0 engine estimation. "
  • "Conclusion The dataset analysis reveals critical insights into the accuracy and efficiency of various e-car models. Models A6 and B6 exhibit the highest reliability based on correlation and estimation accuracy, while Model C6 underperforms significantly. "

The term "underperforms significantly" is a logical trap: significance should be important (c.f. special KPI), but the basic XLSX-file does not contain any classic significance analyses. The term "model efficiency" seems to be important, but buzzwords like "efficiency" are empty bubbles if no operationalization/definition is given. The interpretation/evaluation/ranking of the concept variations (A-B-C) can be identified, but without an automatable flow chart of the realistic detailed steps. The real role of the COCO Y0-models could not be derived, unfortunately. It is important that concepts (A, B) could have the same "accuracy", while concept C is definitely less robust (whatever robustness means).

Chapter#2.3. KPIs

Matching-oriented KPIs:

  • hit rates: "A false positive is an error in binary classification in which a test result incorrectly indicates the presence of a condition (such as a disease when the disease is not present), while a false negative is the opposite error, where the test result incorrectly indicates the absence of a condition when it is actually present. These are the two kinds of errors in a binary test, in contrast to the two kinds of correct result (a true positive and a true negative)." (https://en.wikipedia.org/wiki/False_positives_and_false_negatives)
  • further classifications: Matching can be interpreted not only between already/really existing pairs of values. Artificial benchmarks can also be integrated into a goodness structure: e.g. matching of dynamic processes (facts vs. estimations): increasing:increasing, decreasing:decreasing, increasing:decreasing, decreasing:increasing compared to the previous values. Benchmarks can be defined in quasi arbitrary ways.
  • ...

All matching-oriented KPIs are relevant!
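The two matching-oriented KPIs listed above can be sketched in a few lines. The function names and the sample numbers are hypothetical assumptions chosen only for illustration; they are not taken from the XLSX files. The sketch assumes non-empty series of equal length.

```python
def hit_rate(facts, estimations, tolerance=0.0):
    """Share of positions where the estimation matches the fact."""
    hits = sum(1 for f, e in zip(facts, estimations) if abs(f - e) <= tolerance)
    return hits / len(facts)


def direction(values):
    """Change compared to the previous value: +1 (up), -1 (down) or 0 (same)."""
    return [(b > a) - (b < a) for a, b in zip(values, values[1:])]


def direction_hit_rate(facts, estimations):
    """Share of steps where fact and estimation move in the same direction."""
    pairs = list(zip(direction(facts), direction(estimations)))
    return sum(1 for f, e in pairs if f == e) / len(pairs)


# Hypothetical sample series (assumption, for illustration only).
facts = [10, 12, 11, 15]
estimations = [10, 12, 13, 15]

print(hit_rate(facts, estimations))            # 0.75 -> 3 of 4 facts are matched
print(direction_hit_rate(facts, estimations))  # 2 of 3 steps share the direction
```

The optional tolerance parameter is a design choice of this sketch: with measured values, an exact match may be too strict, and the tolerance then becomes one more benchmark that has to be defined explicitly.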

Numeric KPIs:

  • sum of absolute differences between facts and estimations
  • sum of quadratic difference between facts and estimations: e.g. Excel: SUMSQ() - "Returns the sum of the squares of the arguments." (https://support.microsoft.com/en-us/office/sumsq-function-e3313c02-51cc-4963-aae6-31442d9ec307) where the arguments are the differences between facts and estimations
  • correlation: "In statistics, correlation or dependence is any statistical relationship, whether causal or not, between two random variables or bivariate data. Although in the broadest sense, "correlation" may indicate any type of association, in statistics it usually refers to the degree to which a pair of variables are linearly related. Familiar examples of dependent phenomena include the correlation between the height of parents and their offspring, and the correlation between the price of a good and the quantity the consumers are willing to purchase, as it is depicted in the demand curve." (https://en.wikipedia.org/wiki/Correlation) The correlation is a more complex abstraction, than e.g. SUMSQ.
  • significance: It could be important, but here and now it is not operationalized.
  • efficiency: It could be important, but here and now it is not operationalized.
  • ...

All numeric KPIs are relevant! The own consolidation (system model, system plan) has to clarify an automatable process for testing/evaluating concepts.
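A minimal sketch of the first three numeric KPIs, assuming two equally long, non-constant series of facts and estimations. The function names and the sample numbers are hypothetical; the correlation is the plain Pearson coefficient written out by hand, which is one common reading of "correlation" and not necessarily the variant used in the background files.

```python
import math


def sum_abs_diff(facts, estimations):
    """Sum of absolute differences between facts and estimations."""
    return sum(abs(f - e) for f, e in zip(facts, estimations))


def sum_sq_diff(facts, estimations):
    """Sum of squared differences (like SUMSQ() applied to the differences)."""
    return sum((f - e) ** 2 for f, e in zip(facts, estimations))


def pearson(facts, estimations):
    """Pearson correlation coefficient between facts and estimations."""
    n = len(facts)
    mean_f = sum(facts) / n
    mean_e = sum(estimations) / n
    cov = sum((f - mean_f) * (e - mean_e) for f, e in zip(facts, estimations))
    var_f = sum((f - mean_f) ** 2 for f in facts)
    var_e = sum((e - mean_e) ** 2 for e in estimations)
    return cov / math.sqrt(var_f * var_e)


# Hypothetical sample series (assumption, for illustration only).
facts = [10, 12, 11, 15]
estimations = [10, 12, 13, 15]

print(sum_abs_diff(facts, estimations))  # 2
print(sum_sq_diff(facts, estimations))   # 4
print(round(pearson(facts, estimations), 3))
```

The three values live on different scales and point in different directions (smaller is better for the sums, larger is better for the correlation), which is why an aggregation step is needed before concepts can be ranked.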

Chapter#2.4. ...

Chapter#2.5. ...

Chapter#2.6. ...

Chapter#2.7. ...

Chapter#3. Own developments

...

Chapter#3.x Automation

Chapter#3.x Testing

Chapter#3.x IT-security aspects

Chapter#4. Discussions

Chapter#5. Conclusions

Chapter#6. Future

Chapter#7. Summary

Chapter#8. Annexes

Chapter#8.1. Abbreviations

Chapter#8.2. Figures

Chapter#8.3. References

Chapter#8.4. Conversations with LLMs