I need your help with the case study below.
The following dataset contains process start and stop events collected from individual Windows-based desktop computers and servers. Each event is on a separate line in the form of “time, user@domain, computer, process name, start/end” and represents a process event at the given time.
Well-known system accounts (SYSTEM, Local Service) were not de-identified, though any well-known administrator accounts were still de-identified. The specific timeframe used is not disclosed for security purposes. All data starts at a time epoch of 1, with a time resolution of 1 second. A sample of the data is shown below, and the full file is attached as well.
1,B553$@DOM1,C553,P16,Start
1,B553$@DOM1,C553,P25,End
1,B553$@DOM1,C553,P25,Start
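For reference, here is a minimal sketch of how records in this format could be parsed in Python. It assumes one event per line, exactly five comma-separated fields, and a single "@" separating user from domain; the names ProcEvent and parse_event are made up for illustration and are not part of the dataset.

```python
from typing import NamedTuple


class ProcEvent(NamedTuple):
    """One process event (hypothetical container, not part of the dataset)."""
    time: int       # seconds since the start of the capture (epoch of 1)
    user: str       # de-identified user or computer account, e.g. "B553$"
    domain: str     # e.g. "DOM1"
    computer: str   # e.g. "C553"
    process: str    # e.g. "P16"
    action: str     # "Start" or "End"


def parse_event(line: str) -> ProcEvent:
    """Parse one 'time,user@domain,computer,process,Start|End' record."""
    # Assumption: no field contains a comma, so a plain split is enough.
    time_s, account, computer, process, action = [f.strip() for f in line.split(",")]
    # Split the account at the first "@"; domain is "" if there is none.
    user, _, domain = account.partition("@")
    return ProcEvent(int(time_s), user, domain, computer, process, action)


# The three sample records from the post.
sample = """1,B553$@DOM1,C553,P16,Start
1,B553$@DOM1,C553,P25,End
1,B553$@DOM1,C553,P25,Start"""

events = [parse_event(line) for line in sample.splitlines() if line.strip()]
```

Once the events are in this form, they can be grouped by user, computer, or process for further analysis.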
Can anybody help me analyze the above dataset and identify clusters of users and processes based on their execution? The goal is to understand what is going on in these local systems inside the internal network.
proctest.zip (821.2 KB)
I did some analysis and thought of using the K-Means algorithm for clustering, but it seems K-Means works with numeric data only, and in this case the data is string-based.
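One possible way around the string problem, sketched here as an assumption rather than a tested solution, is to turn the events into numeric features first, for example by counting how many times each user started each process. The function name user_process_matrix is hypothetical, and the last two input rows are made up purely to show a second user; they are not from the dataset.

```python
from collections import Counter, defaultdict


def user_process_matrix(lines):
    """Build a user-by-process matrix of 'Start' counts from raw event lines."""
    counts = defaultdict(Counter)
    for line in lines:
        fields = [f.strip() for f in line.split(",")]
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        _time, account, _computer, process, action = fields
        if action == "Start":
            counts[account][process] += 1

    users = sorted(counts)
    processes = sorted({p for c in counts.values() for p in c})
    # One numeric row per user, one column per process (0 if never started).
    matrix = [[counts[u][p] for p in processes] for u in users]
    return users, processes, matrix


lines = [
    # Sample records from the post.
    "1,B553$@DOM1,C553,P16,Start",
    "1,B553$@DOM1,C553,P25,End",
    "1,B553$@DOM1,C553,P25,Start",
    # Made-up rows for illustration only (NOT from the dataset).
    "2,U1@DOM1,C1,P16,Start",
    "3,U1@DOM1,C1,P16,Start",
]

users, processes, matrix = user_process_matrix(lines)
```

The resulting matrix is purely numeric, so it could be passed to a K-Means implementation such as scikit-learn's KMeans(n_clusters=k).fit(matrix), where k is a choice that would need to be tuned. Transposing the matrix would give one row per process, for clustering processes instead of users. Raw counts may be dominated by very active accounts, so normalizing each row is probably worth trying.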