September is over. The first three quarters of 2015 are over. This has been a very important year so far – difficult, but revealing. Everything has been about change, healing and renewal.
This was a hard choice – it took many months to reach the conclusion that this was what I needed to do.
Most people have gone through strong programming: they think you have to be ‘successful’ at something. Success is externally defined, anyhow (as opposed to satisfaction, which we define for ourselves), and therefore you are supposed to study a certain field in college, then use that at work to build your career in the same field… and keep doing the same thing.
I was never like that – I didn’t go to college, I didn’t study as an ‘engineer’. I just saw there was a market opportunity to find a job when I started, studied on the job, and eventually excelled at it. But it never was *the* road. It was just one road; it has served me well so far, but it was just one thing I tried, and it worked out. How did it start? As a pre-teen I had been interested in computers, then left that for a while, did ‘normal’ high school (in Italy at the time, this was really non-technological), then I tried to study sociology for a little bit – I really enjoyed the Cultural Anthropology lessons there, and we were smoking good weed with some folks outside of the university, but I really could not be bothered to spend the following 5 or 10 years of my life just studying and ‘hanging around’ – I wanted money and independence to move out of my parents’ house.
So, without much fanfare, I revived my IT knowledge: upgraded my skills from the ‘hobbyist’ world of the Commodore 64 and Amiga scene (I had been passionate about modems and the BBS world then), looked at the PC world of the time, rode the ‘Internet wave’ and applied for a simple job at an IT company.
A lot of my friends were either not even searching for a job, with the excuse that there weren’t any, or spending time at university – in a time of change when all the university-level jobs were taken anyway, so that would have meant waiting even longer after they had finished studying… I am not even sure they realized this until much later. But I just applied, played my cards, and got my job.
When I went to sign it, they also reminded me they expected hard work at the simplest and humblest level: I would have to fix PCs and printers, and help users with networking issues and tasks like those – at a customer of theirs, a big company. I was ready to roll up my sleeves and help that IT department however I was capable, and I did. It all grew from there.
And that’s how my IT career started. I learned all I know of IT on the job, by working my ass off, studying extra hours, watching older and more expert colleagues, and gaining experience.
I am not an engineer. I am, at most, a mechanic. I did learn a lot about companies and the market, languages, designs, politics, and the human and technical factors in software engineering and the IT marketplace, over the course of the past 18 years.
But when I started, I was just trying to lend an honest hand, to get paid some money in return – isn’t that what work was about?
Over time IT got out of control. Like Venom in the Marvel comics, which made its appearance as a costume that Spider-Man started wearing… and then slowly took over, as the ‘costume’ was in reality some sort of alien symbiotic organism (a parasite, really).
You might be wondering what I mean. From the outside I was a successful Senior Program Manager of a ‘hot’ Microsoft product. Someone must have mistaken my diligence and hard work for ‘talent’ or ‘desire for a career’ – but it never was that. I got pushed up, taught never to turn down ‘opportunities’.
First and foremost, I am taking time for myself and my family:
I am reading (and writing)
I am cooking again
I have been catching up on sleep – and have dreams again
I am helping my father-in-law build a shed in his yard
We bought a 14-year-old Volkswagen van that we are turning into a camper
I have not stopped building guitars – in fact I am getting set up to do it ‘seriously’, so I am also standing up a separate site to promote that activity
I am making music and discovering new music and instruments
I am meeting new people and new situations
There’s a lot of folks out there who either think I am crazy (they might be right, but I am happy this way), or think this is some sort of lateral move – I am not searching for another IT job, thanks. Stop the noise on LinkedIn please: I don’t fit in your algorithms, I just made you believe I did, all these years.
I just saw that my former colleague (PFE) Tristan has posted an interesting note about the use of SetSPN “-A” vs SetSPN “-S”. I normally don’t repost other people’s content, but I thought this would be useful, as there are a few SPNs used in OpsMgr and it is not always easy to get them all right… and you can find a few tricks I was not aware of by reading his post.
People were already collecting logs with MOM, so why not the security log? Some people were doing that, but it did not scale well enough; for this reason, a few years ago Eric Fitzgerald announced that he was working on Microsoft Audit Collection System. At the time the tool had no interface… and the rest is history: it has been integrated into System Center Operations Manager. Even so, ACS remains a lesser-known component of OpsMgr.
There are a number of resources on the web that are worth mentioning and linking to:
and, of course, many more – I cannot link them all.
As for myself, I have been playing with ACS since those early beta days (before I joined Microsoft and before going back to MOM, when I was working in Security), but I never really blogged about this piece.
Since I have been doing quite a lot of work around ACS again lately, I thought it might be worth consolidating some thoughts about it – hence this post.
Anatomy of an “Online” Sizing Calculation
What I would like to explain here is the strategy and process I go through when analyzing the data stored in an ACS database, in order to determine a filtering strategy: what to keep and what not to keep, by applying a filter on the ACS Collector.
So, the first thing I usually start with is one of the many “ACS sizer” Excel spreadsheets around… which usually tell you that you need more space than is really necessary, basically giving you a “worst case” scenario. I don’t know how some people can actually do this from a purely theoretical point of view; I usually prefer a bottom-up approach: I look at the actual data that ACS is collecting without filters, and start from there for a better, more accurate sizing.
In the case of a new install this is easy – you just turn ACS on, set the retention to a few days (one or two weeks maximum), give the DB plenty of space to make sure it will make it, add all your forwarders… sit back and wait.
Then you come back 2 weeks later and start looking at the data that has been collected.
What/How much data are we collecting?
First of all, if we have not changed the default settings, the grooming and partitioning algorithm will create new partitioned tables every day. So my first step is to see how big each “partition” is.
But… what is a partition, anyway? A partition is a set of 4 tables joined together:
dtEvent_GUID
dtEventData_GUID
dtPrincipal_GUID
dtStrings_GUID
where GUID is a new GUID every day, and of course the 4 tables that make up a daily partition will have the same GUID.
The dtPartition table contains a list of all partitions and their GUIDs, together with their start and closing time.
Just to get a rough estimate we can ignore the space used by the last three tables – which are usually very small – and only use the dtEvent_GUID table to get the number of events for that day, and use the stored procedure “sp_spaceused” against that same table to get an overall idea of how much space that day is taking in the database.
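Rather than running sp_spaceused by hand for each day, the loop below is a minimal sketch of how this step can be automated. I am assuming here that the dtPartition columns are called PartitionId, Status, PartitionStartTime and PartitionCloseTime, and that PartitionId is stored in the same underscore form used in the table names – verify those against your own ACS database before using it:

declare @guid varchar(40)
declare @table sysname
--walk the list of daily partitions, newest first (column names assumed, see above)
declare partitions cursor for
select PartitionId from dtPartition order by PartitionStartTime desc
open partitions
fetch next from partitions into @guid
while @@fetch_status = 0
begin
    --rows and reserved KB for that day's main event table
    set @table = 'dtEvent_' + @guid
    exec sp_spaceused @table
    fetch next from partitions into @guid
end
close partitions
deallocate partitions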
By following this process, I come up with something like the following:
Partition ID | Status | Partition Start Time | Partition Close Time | Rows | Reserved KB | Total MB
9b45a567_c848_4a32_9c35_39b402ea0ee2 | 0 | 2/1/2010 2:00 | 2/1/2010 2:00 | 29,749,366 | 7,663,488 | 7,484
8d8c8ee1_4c5c_4dea_b6df_82233c52e346 | 2 | 1/31/2010 2:00 | 2/1/2010 2:00 | 28,067,438 | 9,076,904 | 8,864
34ce995b_689b_46ae_b9d3_c644cfb66e01 | 2 | 1/30/2010 2:00 | 1/31/2010 2:00 | 30,485,110 | 9,857,896 | 9,627
bb7ea5d3_f751_473a_a835_1d1d42683039 | 2 | 1/29/2010 2:00 | 1/30/2010 2:00 | 48,464,952 | 15,670,792 | 15,304
ee262692_beae_4d81_8079_470a54567946 | 2 | 1/28/2010 2:00 | 1/29/2010 2:00 | 48,980,178 | 15,836,416 | 15,465
7984b5b8_ddea_4e9c_9e51_0ee7a413b4c9 | 2 | 1/27/2010 2:00 | 1/28/2010 2:00 | 51,295,777 | 16,585,408 | 16,197
d93b9f0e_2ec3_4f61_b5e0_b600bbe173d2 | 2 | 1/26/2010 2:00 | 1/27/2010 2:00 | 53,385,239 | 17,262,232 | 16,858
8ce1b69a_7839_4a05_8785_29fd6bfeda5f | 2 | 1/25/2010 2:00 | 1/26/2010 2:00 | 55,997,546 | 18,105,840 | 17,681
19aeb336_252d_4099_9a55_81895bfe5860 | 2 | 1/24/2010 2:00 | 1/24/2010 2:00 | 28,525,304 | 7,345,120 | 7,173
1cf70e01_3465_44dc_9d5c_4f3700dc408a | 2 | 1/23/2010 2:00 | 1/23/2010 2:00 | 26,046,092 | 6,673,472 | 6,517
f5ec207f_158c_47a8_b15f_8aab177a6305 | 2 | 1/22/2010 2:00 | 1/22/2010 2:00 | 47,818,322 | 12,302,208 | 12,014
b48dabe6_a483_4c60_bb4d_93b7d3549b3e | 2 | 1/21/2010 2:00 | 1/21/2010 2:00 | 55,060,150 | 14,155,392 | 13,824
efe66c10_0cf2_4327_adbf_bebb97551c93 | 2 | 1/20/2010 2:00 | 1/20/2010 2:00 | 58,322,217 | 15,029,216 | 14,677
0231463e_8d50_4a42_a834_baf55e6b4dcd | 2 | 1/19/2010 2:00 | 1/19/2010 2:00 | 61,257,393 | 15,741,248 | 15,372
510acc08_dc59_482e_a353_bfae1f85e648 | 2 | 1/18/2010 2:00 | 1/18/2010 2:00 | 64,579,122 | 16,612,512 | 16,223
If you have just installed ACS and let it run without filters with your agents for a couple of weeks, you should get some numbers like those above for your “couple of weeks” of analysis. If you graph your numbers in Excel (both size and number of rows/events per day) you should get some similar lines that show a pattern or trend:
So, in my example above, we can clearly observe a “weekly” pattern (Monday to Friday being busier than the weekend) and we can see that – for that environment – the biggest partition is roughly 17GB. If we round this up to 20GB – and also consider that the weekends are much quieter – we can forecast 20*7 = 140GB per week. This includes an excess “buffer” which will let the system survive event storms, should they happen. We also always recommend keeping some free space to allow for re-indexing operations.
In fact, especially when collecting everything without filters, the daily size is a lot less predictable: imagine worms “trying out” administrator accounts’ passwords, and so on… those things can easily create event storms.
Anyway, in the example above, the customer would have liked to keep 6 MONTHS (180 days) of data online, which would become 20*180 = 3600GB – THREE AND A HALF TERABYTES! Therefore we need a filtering strategy – and badly – to reduce this size.
[edited on May 7th 2010 – if you want to automate the above analysis and produce a table and graphs like those just shown, you should look at my following post.]
Filtering Strategies
Ok, so we need to look at WHAT actually makes up the amount of events we are collecting without filters. As I wrote above, I usually run queries to get this type of information.
I will not get into HOW TO write a filter here – a collector’s filter is a WMI notification query, and how to configure it is already described pretty well elsewhere.
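Just to give an idea of the shape, though: the filter is a WQL query against the AdtsEvent class, applied on the collector with AdtAdmin.exe /setquery. The line below is only a sketch dropping two of the noisy logon/logoff event IDs we will meet later – double-check the class name, the available fields and the exact AdtAdmin syntax in the ACS documentation for your version before using anything like it:

--sketch of a collector-side WQL filter (verify class/field names in the ACS documentation)
select * from AdtsEvent where NOT (EventId = 538 OR EventId = 540)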
Here, instead, I want to walk through the process and the queries I use to understand where the noise comes from and what could be filtered – and get an estimate of how much space we could save by filtering one way or another.
Number of Events per User
--event count by User (with Percentages)
declare @total float
select @total = count(HeaderUser) from AdtServer.dvHeader
select count(HeaderUser), HeaderUser,
    cast(convert(float,(count(HeaderUser)) / (convert(float,@total)) * 100) as decimal(10,2))
from AdtServer.dvHeader
group by HeaderUser
order by count(HeaderUser) desc
In our example above, over the 14 days we were observing, we obtained percentages like the following ones:
#evt | HeaderUser Account | Percent
204,904,332 | SYSTEM | 40.79 %
18,811,139 | LOCAL SERVICE | 3.74 %
14,883,946 | ANONYMOUS LOGON | 2.96 %
10,536,317 | appintrauser | 2.09 %
5,590,434 | mossfarmusr | …
Just by looking at this, it is pretty clear that by filtering out events tracked by the accounts “SYSTEM”, “LOCAL SERVICE” and “ANONYMOUS LOGON”, we would save over 45% of the disk space!
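If you want to double-check that number before touching any filter, a small variation of the query above computes the combined share of those accounts directly (same dvHeader view and column names used throughout this post):

--combined share of the noisiest built-in accounts
declare @total float
select @total = count(Id) from AdtServer.dvHeader
select cast(convert(float, count(Id)) / @total * 100 as decimal(10,2)) as PercentOfAll
from AdtServer.dvHeader
where HeaderUser in ('SYSTEM', 'LOCAL SERVICE', 'ANONYMOUS LOGON')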
Number of Events by EventID
Similarly, we can look at how different Event IDs have different weights on the total amount of events tracked in the database:
--event count by ID (with Percentages)
declare @total float
select @total = count(EventId) from AdtServer.dvHeader
select count(EventId), EventId,
    cast(convert(float,(count(EventId)) / (convert(float,@total)) * 100) as decimal(10,2))
from AdtServer.dvHeader
group by EventId
order by count(EventId) desc
We would get some similar information here:
Event ID | Meaning | Sum of events | Percent
538 | A user logged off | 99,494,648 | 27.63
540 | Successful Network Logon | 97,819,640 | 27.16
672 | Authentication Ticket Request | 52,281,129 | 14.52
680 | Account Used for Logon by (Windows 2000) | 35,141,235 | 9.76
576 | Specified privileges were added to a user’s access token. | 26,154,761 | 7.26
8086 | Custom Application ID | 18,789,599 | 5.21
673 | Service Ticket Request | 10,641,090 | 2.95
675 | Pre-Authentication Failed | 7,890,823 | 2.19
552 | Logon attempt using explicit credentials | 4,143,741 | 1.15
539 | Logon Failure – Account locked out | 2,383,809 | 0.66
528 | Successful Logon | 1,764,697 | 0.49
Also, do not forget that ACS provides some reports to do this type of analysis out of the box, even if in my experience they are generally slower – on large datasets – than the queries provided here. Also, a number of reports have been buggy over time, so I just prefer to run queries and be on the safe side.
Below is an example of such a report (run against a different environment – just in case you were wondering why the numbers are not the same ones :-)):
The numbers and percentages we got from the two queries above should already point us in the right direction about what we might want to adjust, either directly in the Windows auditing policy and/or by deciding whether there is something we want to filter out at the collector level (here you should ask yourself: “if they aren’t worth collecting, are they worth generating?” – but I digress).
Also, a permutation of the above two queries lets you see which user is generating the most “noise” for some events and not others… for example:
--event distribution for a specific user (change @user) - with percentages for the user
--and compared with the total #events in the DB
declare @user varchar(255)
set @user = 'SYSTEM'
declare @total float
select @total = count(Id) from AdtServer.dvHeader
declare @totalforuser float
select @totalforuser = count(Id) from AdtServer.dvHeader where HeaderUser = @user
select count(Id), EventID,
    cast(convert(float,(count(Id)) / convert(float,@totalforuser) * 100) as decimal(10,2)) as PercentageForUser,
    cast(convert(float,(count(Id)) / (convert(float,@total)) * 100) as decimal(10,2)) as PercentageTotal
from AdtServer.dvHeader
where HeaderUser = @user
group by EventID
order by count(Id) desc
The above is particularly important, as we might want to filter out a number of events for the SYSTEM account (e.g. logons that occur when starting and stopping services) but keep other events that are tracked by the SYSTEM account too, such as an administrator having wiped the Security Log clean (event 517) – which might be something you want to keep:
Of course the number of 517 events will only be a tiny fraction of the total tracked by the SYSTEM account, and we can still filter the other ones out.
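To verify that on your own data, the same permutation can be narrowed down to event 517 only – a quick sketch:

--how many 'audit log cleared' (517) events vs. everything tracked by the SYSTEM account
select count(Id) as Events517 from AdtServer.dvHeader where HeaderUser = 'SYSTEM' and EventID = 517
select count(Id) as AllSystemEvents from AdtServer.dvHeader where HeaderUser = 'SYSTEM'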
Number of Events by EventID and by User
We could also combine the two approaches above – by EventID and by User:
select count(Id),HeaderUser, EventId
from AdtServer.dvHeader
group by HeaderUser, EventId
order by count(Id) desc
This will produce a table like the following one
which can be easily copied/pasted into Excel in order to produce a pivot Table:
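Alternatively, if you prefer to stay in SQL rather than Excel, the same cross-tab can be sketched with the PIVOT operator – here restricted to a handful of the event IDs we have already met (swap in your own list):

--events per user, pivoted by a few chosen event IDs
select HeaderUser, [538] as Logoffs, [540] as NetworkLogons, [672] as TicketRequests
from (select HeaderUser, EventId, Id from AdtServer.dvHeader) as src
pivot (count(Id) for EventId in ([538], [540], [672])) as pvt
order by HeaderUser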
Cluster EventLog Replication
One more aspect that is less widely known, but I think is worth showing, is the way clusters behave with ACS. I don’t mean all clusters… but if you keep the “eventlog replication” feature of clusters enabled (you should disable it anyway, also from a monitoring perspective, but I digress), each cluster node’s security eventlog will contain events not just for itself, but for all the other nodes as well.
I have not found a reliable way to filter these out – other than disabling eventlog replication altogether.
Anyway, just to get an idea of how much this type of “duplicate” event weighs on the total, I use the following query, which tells you how many events for each machine are tracked by another machine:
--to spot machines that are cluster nodes with eventlog replication and write duplicate events (slow)
select Count(Id) as Total,
    replace(right(AgentMachine, (len(AgentMachine) - patindex('%\%',AgentMachine))),'$','') as ForwarderMachine,
    EventMachine
from AdtServer.dvHeader
--where ForwarderMachine <> EventMachine
group by EventMachine, replace(right(AgentMachine, (len(AgentMachine) - patindex('%\%',AgentMachine))),'$','')
order by ForwarderMachine, EventMachine
Those presented above are just some of the approaches I usually look into first. Of course there are many more. Here I am including the same queries already shown in action, plus a few more that can be useful in this process.
I have even considered building a page with all these queries – a bit like those that Kevin is collecting for OpsMgr (we actually wrote some of them together when building the OpsMgr Health Check)… shall I move the queries below to such a page? I thought I’d list them here and give some background on how I normally use them, to start off with.
Some more Useful Queries
--top event ids
select count(EventId), EventId
from AdtServer.dvHeader
group by EventId
order by count(EventId) desc

--event count by ID (with Percentages)
declare @total float
select @total = count(EventId) from AdtServer.dvHeader
select count(EventId), EventId,
    cast(convert(float,(count(EventId)) / (convert(float,@total)) * 100) as decimal(10,2))
from AdtServer.dvHeader
group by EventId
order by count(EventId) desc

--which machines have ever written event 538
select distinct EventMachine, count(EventId) as total
from AdtServer.dvHeader
where EventID = 538
group by EventMachine

--machines
select * from dtMachine

--machines (more readable)
select replace(right(Description, (len(Description) - patindex('%\%',Description))),'$','')
from dtMachine

--events by machine
select count(EventMachine), EventMachine
from AdtServer.dvHeader
group by EventMachine

--rows where the EventMachine field is not available (typically events written by ACS itself for checkpointing)
select * from AdtServer.dvHeader where EventMachine = 'n/a'

--event count by day
select convert(varchar(20), CreationTime, 102) as Date, count(EventMachine) as total
from AdtServer.dvHeader
group by convert(varchar(20), CreationTime, 102)
order by convert(varchar(20), CreationTime, 102)

--event count by day and by machine
select convert(varchar(20), CreationTime, 102) as Date, EventMachine, count(EventMachine) as total
from AdtServer.dvHeader
group by EventMachine, convert(varchar(20), CreationTime, 102)
order by convert(varchar(20), CreationTime, 102)

--event count by machine and by date (distinguishes between AgentMachine and EventMachine)
select convert(varchar(10), CreationTime, 102), Count(Id), EventMachine, AgentMachine
from AdtServer.dvHeader
group by convert(varchar(10), CreationTime, 102), EventMachine, AgentMachine
order by convert(varchar(10), CreationTime, 102) desc, EventMachine

--event count by User
select count(Id), HeaderUser
from AdtServer.dvHeader
group by HeaderUser
order by count(Id) desc

--event count by User (with Percentages)
declare @total float
select @total = count(HeaderUser) from AdtServer.dvHeader
select count(HeaderUser), HeaderUser,
    cast(convert(float,(count(HeaderUser)) / (convert(float,@total)) * 100) as decimal(10,2))
from AdtServer.dvHeader
group by HeaderUser
order by count(HeaderUser) desc

--event distribution for a specific user (change @user) - with percentages for the user and compared with the total #events in the DB
declare @user varchar(255)
set @user = 'SYSTEM'
declare @total float
select @total = count(Id) from AdtServer.dvHeader
declare @totalforuser float
select @totalforuser = count(Id) from AdtServer.dvHeader where HeaderUser = @user
select count(Id), EventID,
    cast(convert(float,(count(Id)) / convert(float,@totalforuser) * 100) as decimal(10,2)) as PercentageForUser,
    cast(convert(float,(count(Id)) / (convert(float,@total)) * 100) as decimal(10,2)) as PercentageTotal
from AdtServer.dvHeader
where HeaderUser = @user
group by EventID
order by count(Id) desc

--to spot machines that write duplicate events (such as cluster nodes with eventlog replication enabled)
select Count(Id), EventMachine, AgentMachine
from AdtServer.dvHeader
group by EventMachine, AgentMachine
order by EventMachine

--to spot machines that are cluster nodes with eventlog replication and write duplicate events (better but slower)
select Count(Id) as Total,
    replace(right(AgentMachine, (len(AgentMachine) - patindex('%\%',AgentMachine))),'$','') as ForwarderMachine,
    EventMachine
from AdtServer.dvHeader
--where ForwarderMachine <> EventMachine
group by EventMachine, replace(right(AgentMachine, (len(AgentMachine) - patindex('%\%',AgentMachine))),'$','')
order by ForwarderMachine, EventMachine

--which user and from which machine is the target of elevation (network service doing "runas" is a 552 event)
select count(Id), EventMachine, TargetUser
from AdtServer.dvHeader
where HeaderUser = 'NETWORK SERVICE' and EventID = 552
group by EventMachine, TargetUser
order by count(Id) desc

--by hour, minute and user (change the timestamps)... useful to see which users are active in a given time period...
--helpful to spot "peaks" of activity such as password brute force attacks, or other activities limited in time
select datepart(hour, CreationTime) as Hours, datepart(minute, CreationTime) as Minutes, HeaderUser, count(Id) as total
from AdtServer.dvHeader
where CreationTime < '2010-02-22T16:00:00.000' and CreationTime > '2010-02-22T15:00:00.000'
group by datepart(hour, CreationTime), datepart(minute, CreationTime), HeaderUser
order by datepart(hour, CreationTime), datepart(minute, CreationTime), HeaderUser
I am seriously considering giving Asirra a try. It is an interesting project from Microsoft Research for a HIP (Human Interactive Proof) that uses info from petfinder.com and asks users to tell pictures of cats apart from pictures of dogs. There is also a WordPress plugin, in the best and newest “we want to interoperate” fashion that we are finally getting at Microsoft (this has always been the way to go, IMHO, and BTW).
Facebook Terms of Service state that it is forbidden to “[…] use automated scripts to collect information from or otherwise interact with the Service or the Site […]”
I am quite sure there are a lot of people writing “official” applications (that is, using the “platform API” and so on) that are collecting A LOT of information about the users who install their applications. They are being sent the info about those visitors by Facebook, they are storing it, they might do whatever they please with it (study it, sell it to spammers, to marketers, to making-money-assholes) and nobody will ever notice, because it sits on their servers and nobody can check that.
But a script that changes your status from remote – since this is not functionality they CHOSE to expose in their API – now THAT is a big issue. Doh! It’s just plain ridiculous, but that’s how it is.
[…] 4) Except as provided in Section 2.A.6 below, you may not continue to use, and must immediately remove from any Facebook Platform Application and any Data Repository in your possession or under your control, any Facebook Properties not explicitly identified as being storable indefinitely in the Facebook Platform Documentation within 24 hours after the time at which you obtained the data, or such other time as Facebook may specify to you from time to time;
5) You may store and use indefinitely any Facebook Properties that are explicitly identified as being storable indefinitely in the Facebook Platform Documentation; provided, however, that except as provided in Section 2.A.6 below, you may not continue to use, and must immediately remove from any Facebook Platform Application and any Data Repository in your possession or under your control, any such Facebook Properties: (a) if Facebook ceases to explicitly identify the same as being storable indefinitely in the Facebook Platform Documentation; (b) upon notice from Facebook (including if we notify you that a particular Facebook User has requested that their information be made inaccessible to that Facebook Platform Application); or (c) upon any termination of this Agreement or of your use of or participation in Facebook Platform; […] You will not directly or indirectly sell, export, re-export, transfer, divert, or otherwise dispose of any Facebook Properties to any country (or national thereof) without obtaining any required prior authorizations from the appropriate government authorities; […]
Are we sure everybody is playing by these rules, when every Facebook “application” really runs on the developer’s server? How do you know that they are really storing only what you want them to store, and deleting what you want them to delete? Everybody knows how difficult it is to really “delete” digital content once it has come into existence… who knows how many copies of this database/social graph are floating around?
Of course that is not an issue because people don’t talk about it enough. But a script that changes your status – now, THAT is a very terrible thing.
I just don’t get this “political correctness”. It must be me.
Oh, no… look! It’s not only me! I had read this post of Dare’s, but I probably overlooked the last bit of it… because he did point out the hypocrisy going on:
[…] Or (5) the information returned by FQL about a user contains no contact information (no email address, no IM screen names, no telephone numbers, no street address) so it is pretty useless as a way to utilize one’s friends list with applications besides Facebook since there is no way to cross-reference your friends using any personally identifiable association that would exist in another service.
When it comes to contact lists (i.e. the social graph), Facebook is a roach motel. Lots of information about user relationships goes in but there’s no way for users or applications to get it out easily. Whenever an application like FacebookSync comes along which helps users do this, it is quickly shut down for violating their Terms of Use. Hypocrisy? Indeed. […]
[…] I will point out that 9 times out of 10 when you hear geeks talking about social network portability or similar buzzwords they are really talking about sending people spam because someone they know joined some social networking site. I also wonder how many people realize that these fly-by-night social networking sites that they happily hand over their log-in credentials to so they can spam their friends also share the list of email addresses thus obtained with services that resell to spammers? […] how do you prevent badly behaved applications like Quechup from taking control away from your users? At the end of the day your users might end up thinking you sold their email addresses to spammers when in truth it was the insecure practices of the people who they’d shared their email addresses with that got them in that mess. This is one of the few reasons I can understand why Facebook takes such a hypocritical approach. 🙂 […]
Thanks, Dare, for mentioning hypocrisy. Thanks for calling things by their name. I do understand their approach, I just don’t agree with it.
I did pull my small application off the Internet because I have a family to maintain and I don’t want to have legal troubles with Facebook. Sorry to all those who found it handy. No, I cannot even give it to you by email. It’s gone. I am sorry. For freedom of speech, especially, I am sorry.
I keep finding way too much software that claims to interact with Web 2.0 sites or services, and connect here or there… while still forgetting one basic, simple rule: let people use a proxy.
Most programmers, for some reason, just assume that since they are directly connected to the Internet, everybody is. Which isn’t always the case. Most companies have proxies and will only let you out on port 80 – by using their proxy.
…which in turn is one of the reasons why most applications now “talk” and tunnel whatever application protocol on top of HTTP… still, a lot of software simply “forgets” or doesn’t care about providing a simple “use proxy” checkbox, which would translate into two or three extra lines of code… three lines which I personally usually include in my projects, and I am not even a *developer*!! (but that might explain why I *think* of it… I come from a security and networking background :-))
Anyway. I keep finding this thing over and over again. Both in simple, hobbyist, sample code and in complex, big, expensive enterprise software. The last time I got pissed off about a piece of code missing this feature was some days ago, when testing http://www.codeplex.com/FacebookToolkit. The previous time was during Windows Vista beta testing (I had found a similar issue in Beta 2, and had it fixed for RC1).
Actually, I am being polite saying it is “missing a feature”. To be honest, I think missing this “feature” should be considered a bug: every piece of software using HTTP *should* include the possibility to pass through a proxy (and don’t forget about AUTHENTICATED proxies), or the purpose of using HTTP in the first place is defeated!!
Developers!!! You have to remember people ARE behind proxies!!!!!
I have been quite hooked into Facebook for the last couple of days, figuring out what it can and cannot do. It can do a lot. The possibility to inject code and brand new application into it is absolutely awesome.
PopFly lets you create mashups and even custom blocks, and I liked that too. But you have to use fancy, shiny Silverlight (which is very cool indeed, but probably not *always* necessary) and you can only create blocks using Javascript. Sure, as someone has already written, the meaning of AJAX is “Javascript now works”. I can understand (even if I don’t know them for sure) the reasons behind certain choices. But I find it limiting. Maybe it is because I don’t like Javascript. That must be it.
Facebook, instead, empowers you to inject code into their social networking framework. Any code. In whatever language you like. They started with PHP, but you can plug in whatever you like: Java, Ruby, Perl… you can even have your application running on your own server, while still providing a seamless experience inside of Facebook. This opens up millions of possibilities, and I got fascinated by that.
At the same time, the paranoid part of myself has been thinking about the security implications of it. This open platform is cool, but it also sounds like a framework for cross-site scripting (XSS) attacks. Sure, you can “report” an application made by a third party that does something weird… but who will really notice if all that happens under the hood is that your cookies get stolen (and someone accesses your bank account)? Will you figure out it happened because you wanted to see the “dancing pigs” loaded in your profile? Or will you figure it out at all?
This said, I set aside my fear for a while and delved into coding. What I have learned in the last couple of years, having slowly moved away from security engagements, is to relax. When I was working constantly with security I was a lot more paranoid. Now I care much less, and I live a lot more.
So I developed a couple of quick and simple apps running from this very server into Facebook, and I started by using the PHP5 library they provide, so as to be able to follow the examples first and figure out how it all works.
An interesting interview with a personality of the security community of some years ago has been published by Antonio `s4tan` Parata. It is very interesting to read, in RFP’s own words, an analysis of how people’s views on security have changed.
I particularly enjoyed the following passage:
[…] Antonio “s4tan” Parata (ap): Hi Rain Forest Puppy, many thanks for this interview. You are considered one of the fathers of web security and the inventor of the SQL injection attack. Anyway in the year 2003 you decided to publicly retire from the security field (to get more infos http://www.wiretrip.net/rfp/txt/evolution.txt). Can you briefly sum your decision?
Rain Forest Puppy (rfp): My decision to retire from the public eye was based on a lot of reasons; overall, the amount of resources & energy required to release and maintain advisories and tools was just getting to be too large. It wasn’t fun anymore–and why pursue a hobby if you’re not enjoying it?
Plus, the security industry was becoming commercialized. Advisories and exploits are now bought and sold; performing security research in the first place can land you in legal waters. The intellectual value of the security research performed has been reduced to a single severity rating, which…if not high enough…causes the entire research to be dismissed. I really enjoy security from the intellectual angle; to me, it’s all just a big mental challenge…a puzzle, if you will. So when the creativity and intellectual aspect of it started to fade away, I decided to go with it. […]
I do back up this point of view: “why pursue a hobby if you’re not enjoying it?”.
Creativity and the intellectual aspects of security do still interest me; it is just the market around them that has changed. That’s also part of why I started doing more System Management again – at least I have fun thinking and tinkering, integrating, scripting and composing…
[…] The intellectual value of the security research performed has been reduced to a single severity rating […] I really enjoy security from the intellectual angle; to me, it’s all just a big mental challenge…a puzzle, if you will […]
His point is expressed beautifully.
But he does not talk only about the security community and market; he also has some interesting thoughts on open and closed source software:
ap: You are the author of the libwhisker library (http://www.wiretrip.net/rfp/lw.asp), widely used to create assessment perl scripts. What do you think about nowadays products related to web application assessment? What about some open source software (like parosproxy or nessus) changed to closed-source?
rfp: I have to choose my words carefully, because I very recently started working for a security software vendor.
Having had open source projects, I will say this: it is very hard to bootstrap a development community, and achieve the same level of polish, quality (as in QA), and implementation thoroughness as a commercial product. This isn’t necessarily because commercial software vendors are better coders; the dynamics are just different.
Open source coders are usually working on their own donated time. That means contributions are often catch-can and best-effort. Open source (when not sponsored by a commercial entity) are typically limited in resources (with time being the critical one).
[…]
All I care about is whether the tool works and/or gets the job done. I’ve spent so much wasted time trying to get a screwdriver to do a hammer’s job, and vice versa. I really don’t care if a tool is open source or commercial; I let the job dictate the tool, and not the other way around. Of course, there are certain artificial restrictions on this (like price limitations), but in general, I think there are some things that currently only exist in free & open source tools, and there are some things that currently only exist in commercial tools.
So use both wisely and get the best of both worlds. 🙂